From ron3200 at gmail.com Fri May 1 00:59:03 2015 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 30 Apr 2015 18:59:03 -0400 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: References: Message-ID: On 04/30/2015 04:37 PM, Oscar Benjamin wrote: > With PEP 492 it seems that I would get something like: > >>>> >>>async def af(): pass >>>> >>>ag = af() >>>> >>>ag > > It seems harder to think of a good name for ag though. A waiter? or awaiter? As in a-wait-ing an awaiter. Maybe there's a restaurant/food way of describing how it works. :-) I'm not sure I have the use correct. But I think we need to use "await af()" when calling an async function. Cheers, Ron From greg.ewing at canterbury.ac.nz Fri May 1 02:26:53 2015 From: greg.ewing at canterbury.ac.nz (Greg) Date: Fri, 01 May 2015 12:26:53 +1200 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> Message-ID: <5542C84D.4000107@canterbury.ac.nz> On 1/05/2015 5:31 a.m., Guido van Rossum wrote: > Ah. But 'async for' is not meant to introduce parallelism or > concurrency. This kind of confusion is why I'm not all that enamoured of using the word "async" the way PEP 492 does. But since there seems to be prior art for it in other languages now, I suppose there are at least some people out there who won't be confused by it. -- Greg From steve at pearwood.info Fri May 1 02:35:52 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 1 May 2015 10:35:52 +1000 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> Message-ID: <20150501003551.GG5663@ando.pearwood.info> On Thu, Apr 30, 2015 at 07:12:11PM +0200, Todd wrote: > On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano > wrote: > > A parallel version of map makes sense, because the semantics of map are > > well defined: given a function f and a sequence [a, b, c, ...] it > > creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f > > is a pure-function which is side-effect free (if it isn't, you're going > > to have a bad time). The specific order in which a, b, c etc. are > > processed doesn't matter. If it does matter, then map is the wrong way > > to process it. > > > > > multiprocessing.Pool.map guarantees ordering. It is > multiprocessing.Pool.imap_unordered that doesn't. I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime. The point I am making is that map() doesn't have any connotations of the order of execution, where as for loops have a very strong connotation of executing the block in a specific sequence. People don't tend to use map with a function with side-effects: map(lambda i: print(i) or i, range(100)) will return [0, 1, 2, ..., 99] but it may not print 0 1 2 3 ... in that order. But with a for-loop, it would be quite surprising if for i in range(100): print(i) printed the values out of order. In my opinion, sticking "mypool" in front of the "for i" doesn't change the fact that adding parallelism to a for loop would be surprising and hard to reason about. 
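A quick sketch of what I mean (purely illustrative; the interleaving of the
printed lines will differ from run to run, but the returned list never does):

import multiprocessing
import random
import time

def slow_identity(i):
    time.sleep(random.random() / 10)   # simulate work of unpredictable length
    print("computing", i)              # may appear in any order
    return i

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        print(pool.map(slow_identity, range(10)))   # always [0, 1, ..., 9]
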
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar. -- Steve From yselivanov.ml at gmail.com Fri May 1 02:54:42 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 30 Apr 2015 20:54:42 -0400 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <20150501003551.GG5663@ando.pearwood.info> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> Message-ID: <5542CED2.6080108@gmail.com> On 2015-04-30 8:35 PM, Steven D'Aprano wrote: >> multiprocessing.Pool.map guarantees ordering. It is >> >multiprocessing.Pool.imap_unordered that doesn't. > I don't think it guarantees ordering in the sense I'm referring to. It > guarantees that the returned result will be [f(a), f(b), f(c), ...] in > that order, but not that f(a) will be calculated before f(b), which is > calculated before f(c), ... and so on. That's the point of parallelism: > if f(a) takes a long time to complete, another worker may have completed > f(b) in the meantime. This is an *excellent* point. Yury From ethan at stoneleaf.us Fri May 1 03:02:01 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 30 Apr 2015 18:02:01 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <5542CED2.6080108@gmail.com> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com> Message-ID: <20150501010201.GL10248@stoneleaf.us> On 04/30, Yury Selivanov wrote: > On 2015-04-30 8:35 PM, Steven D'Aprano wrote: >> I don't think it guarantees ordering in the sense I'm referring to. It >> guarantees that the returned result will be [f(a), f(b), f(c), ...] in >> that order, but not that f(a) will be calculated before f(b), which is >> calculated before f(c), ... and so on. That's the point of parallelism: >> if f(a) takes a long time to complete, another worker may have completed >> f(b) in the meantime. > > This is an *excellent* point. So, PEP 492 asynch for also guarantees that the loop runs in order, one at a time, with one loop finishing before the next one starts? *sigh* How disappointing. -- ~Ethan~ From yselivanov.ml at gmail.com Fri May 1 03:07:50 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 30 Apr 2015 21:07:50 -0400 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <20150501010201.GL10248@stoneleaf.us> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com> <20150501010201.GL10248@stoneleaf.us> Message-ID: <5542D1E6.80307@gmail.com> On 2015-04-30 9:02 PM, Ethan Furman wrote: > On 04/30, Yury Selivanov wrote: >> On 2015-04-30 8:35 PM, Steven D'Aprano wrote: >>> I don't think it guarantees ordering in the sense I'm referring to. It >>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in >>> that order, but not that f(a) will be calculated before f(b), which is >>> calculated before f(c), ... and so on. That's the point of parallelism: >>> if f(a) takes a long time to complete, another worker may have completed >>> f(b) in the meantime. >> This is an *excellent* point. > So, PEP 492 asynch for also guarantees that the loop runs in order, one at > a time, with one loop finishing before the next one starts? > > *sigh* > > How disappointing. > No. Nothing prevents you from scheduling asynchronous parallel computation, or prefetching more data. 
Since __anext__ is an awaitable you can do that. Steven's point is that Todd's proposal isn't that straightforward to apply. Yury From guido at python.org Fri May 1 05:29:22 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Apr 2015 20:29:22 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <5542D1E6.80307@gmail.com> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com> <20150501010201.GL10248@stoneleaf.us> <5542D1E6.80307@gmail.com> Message-ID: On Thu, Apr 30, 2015 at 6:07 PM, Yury Selivanov wrote: > On 2015-04-30 9:02 PM, Ethan Furman wrote: > >> On 04/30, Yury Selivanov wrote: >> >>> On 2015-04-30 8:35 PM, Steven D'Aprano wrote: >>> >>>> I don't think it guarantees ordering in the sense I'm referring to. It >>>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in >>>> that order, but not that f(a) will be calculated before f(b), which is >>>> calculated before f(c), ... and so on. That's the point of parallelism: >>>> if f(a) takes a long time to complete, another worker may have completed >>>> f(b) in the meantime. >>>> >>> This is an *excellent* point. >>> >> So, PEP 492 asynch for also guarantees that the loop runs in order, one at >> a time, with one loop finishing before the next one starts? >> >> *sigh* >> >> How disappointing. >> >> > > No. Nothing prevents you from scheduling asynchronous > parallel computation, or prefetching more data. Since > __anext__ is an awaitable you can do that. > That's not Ethan's point. The 'async for' statement indeed is a sequential loop: e.g. if you write async for rec in db_cursor: print(rec) you are guaranteed that the records are printed in the order in which they are produced by the database cursor. There is no implicit parallellism of the execution of the loop bodies. Of course you can introduce parallelism, but you have to be explicit about it, e.g. by calling some async function for each record *without* awaiting for the result, e.g. collecting the awaitables in a separate list and then using e.g. the gather() operation from the asyncio package: async def process_record(rec): print(rec) fs = [] for rec in db_cursor: fs.append(process_record(rec)) await asyncio.gather(*fs) This may print the records in arbitrary order. Note that unlike threads, you don't need locks, since there is no worry about parallel access to sys.stdout by print(). The print() function does not guarantee atomicity when it writes to sys.stdout, and in a threaded version of the above code you might occasionally see two records followed by two \n characters, because threads can be arbitrarily interleaved. Task switching between coroutines only happens at await (or yield [from] :-) and at the await points specified by PEP 492 in the 'async for' and 'async with' statements. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Fri May 1 07:24:39 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 01 May 2015 17:24:39 +1200 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: References: Message-ID: <55430E17.7030404@canterbury.ac.nz> Ron Adam wrote: > A waiter? > or awaiter? > > As in a-wait-ing an awaiter. The waiter would be the function executing the await operator, not the thing it's operating on. In a restaurant, waiters wait on customers. 
But calling an awaitable object a "customer" doesn't seem right at all. -- Greg From ram at rachum.com Fri May 1 10:12:19 2015 From: ram at rachum.com (Ram Rachum) Date: Fri, 1 May 2015 11:12:19 +0300 Subject: [Python-ideas] Add `Executor.filter` Message-ID: Hi, What do you think about adding a method: `Executor.filter`? I was using something like this: my_things = [thing for thing in things if some_condition(thing)] But the problem was that `some_condition` took a long time to run waiting on I/O, which is a great candidate for parallelizing with ThreadPoolExecutor. I made it work using `Executor.map` and some improvizing, but it would be nicer if I could do: with concurrent.futures.ThreadPoolExecutor(100) as executor: my_things = executor.filter(some_condition, things) And have the condition run in parallel on all the threads. What do you think? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri May 1 13:13:41 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 1 May 2015 04:13:41 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> Message-ID: On Apr 30, 2015, at 10:54, Paul Moore wrote: > >> On 30 April 2015 at 18:31, Guido van Rossum wrote: >> PEP 492 is only meant to make code easier to read and write that's already >> written to use coroutines (e.g. using the asyncio library, but not limited >> to that). > > OK, that's fair. To an outsider like me it feels like a lot of new > syntax to support a very specific use case. But that's because I don't > really have a feel for what you mean when you note "but not limited to > that". Are there any good examples or use cases for coroutines that > are *not* asyncio-based? IIRC, the original asyncio PEP has links to Greg Ewing's posts that demonstrated how you could use yield from coroutines for various purposes, including asynchronous I/O, but also things like many-actor simulations, with pretty detailed examples. > And assuming you are saying that PEP 482 > should help for those as well, could it include a non-asyncio example? > My immediate reaction is that the keywords "async" and "await" will > seem a little odd in a non-asyncio context. > > Paul > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Fri May 1 13:19:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 1 May 2015 04:19:16 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <20150501003551.GG5663@ando.pearwood.info> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> Message-ID: On Apr 30, 2015, at 17:35, Steven D'Aprano wrote: > >> On Thu, Apr 30, 2015 at 07:12:11PM +0200, Todd wrote: >> On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano >> wrote: > >>> A parallel version of map makes sense, because the semantics of map are >>> well defined: given a function f and a sequence [a, b, c, ...] it >>> creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f >>> is a pure-function which is side-effect free (if it isn't, you're going >>> to have a bad time). The specific order in which a, b, c etc. are >>> processed doesn't matter. If it does matter, then map is the wrong way >>> to process it. 
>> multiprocessing.Pool.map guarantees ordering. It is >> multiprocessing.Pool.imap_unordered that doesn't. > > I don't think it guarantees ordering in the sense I'm referring to. It > guarantees that the returned result will be [f(a), f(b), f(c), ...] in > that order, but not that f(a) will be calculated before f(b), which is > calculated before f(c), ... and so on. That's the point of parallelism: > if f(a) takes a long time to complete, another worker may have completed > f(b) in the meantime. > > The point I am making is that map() doesn't have any connotations of the > order of execution, where as for loops have a very strong connotation of > executing the block in a specific sequence. People don't tend to use map > with a function with side-effects: > > map(lambda i: print(i) or i, range(100)) > > will return [0, 1, 2, ..., 99] but it may not print 0 1 2 3 ... in that > order. But with a for-loop, it would be quite surprising if > > for i in range(100): > print(i) > > printed the values out of order. In my opinion, sticking "mypool" in > front of the "for i" doesn't change the fact that adding parallelism to > a for loop would be surprising and hard to reason about. > > If you still wish to argue for this, one thing which may help your case > is if you can identify other programming languages that have already > done something similar. The obvious thing to look at here seems to be OpenMP's parallel for. I haven't used it in a long time, but IIRC, in the C bindings, you use it something like: #pragma omp_parallel_for for (int i=0; i!=100; ++i) { lots_of_work(i); } ... and it turns it into something like: for (int i=0; i!=100; ++i) { queue_put(current_team_queue, processed loop body thingy); } queue_wait(current_team_queue, 100); > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Fri May 1 13:24:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 1 May 2015 04:24:55 -0700 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: <55430E17.7030404@canterbury.ac.nz> References: <55430E17.7030404@canterbury.ac.nz> Message-ID: On Apr 30, 2015, at 22:24, Greg Ewing wrote: > > Ron Adam wrote: > >> A waiter? >> or awaiter? >> As in a-wait-ing an awaiter. > > The waiter would be the function executing the await > operator, not the thing it's operating on. > > In a restaurant, waiters wait on customers. But calling > an awaitable object a "customer" doesn't seem right > at all. Well, the only thing in the restaurant besides the waiter and the customers is the Vikings, so I guess the restaurant metaphor doesn't work... Anyway, if I understand the problem, the main confusion is that we use "coroutine" both to mean a thing that can be suspended and resumed, and a function that returns such a thing. Why not just "coroutine" and "coroutine function", just as with "generator" and "generator function". If the issue is that there are other things that are coroutines besides the coroutine type... well, there are plenty of things that are iterators that are all of unrelated types, and has anyone ever been confused by that? (Of course people have been confused by iterator vs. iterable, but that's a different issue, and one that doesn't have a parallel here.) 
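To spell out the parallel I mean, assuming PEP 492's proposed syntax (just a
sketch, it doesn't run on any released Python yet):

def gen_fn():            # a generator function
    yield 1

g = gen_fn()             # calling it returns a generator

async def coro_fn():     # a "coroutine function" under the PEP
    return 1

c = coro_fn()            # calling it returns a coroutine object
# and inside another coroutine you'd consume it with:
#     result = await c
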
> -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From oscar.j.benjamin at gmail.com Fri May 1 13:49:44 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 1 May 2015 12:49:44 +0100 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: References: <55430E17.7030404@canterbury.ac.nz> Message-ID: On 1 May 2015 at 12:24, Andrew Barnert via Python-ideas wrote: > Anyway, if I understand the problem, the main confusion is that we use "coroutine" both to mean a thing that can be suspended and resumed, and a function that returns such a thing. Why not just "coroutine" and "coroutine function", just as with "generator" and "generator function". That's the terminology in the asyncio docs I guess: https://docs.python.org/3/library/asyncio-task.html#coroutine ... except that there it is referring to decorated generator functions. That feels like a category error to me because coroutines are a generalisation a functions so if anything is the coroutine itself then it is the async def function rather than the object it returns but I guess if that's what's already being used. > If the issue is that there are other things that are coroutines besides the coroutine type... well, there are plenty of things that are iterators that are all of unrelated types, and has anyone ever been confused by that? (Of course people have been confused by iterator vs. iterable, but that's a different issue, and one that doesn't have a parallel here.) There is no concrete "iterator" type. The use of iterator as a type is explicitly intended to refer to a range of different types of objects analogous to using an interface in Java. The PEP proposes at the same time that the word coroutine should be both a generic term for objects exposing a certain interface and also the term for a specific language construct: the function resulting from an async def statement. So if I say that something is a "coroutine" it's really not clear what that means. It could mean an an asyncio.coroutine generator function, it could mean an async def function or it could mean both. Worse it could mean the object returned by either of those types of functions. -- Oscar From stefan_ml at behnel.de Fri May 1 13:58:29 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 01 May 2015 13:58:29 +0200 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> Message-ID: Guido van Rossum schrieb am 30.04.2015 um 19:31: > But 'async for' is not meant to introduce parallelism or concurrency. Well, the fact that it's not *meant* for that doesn't mean you can't use it for that. It allows an iterator (name it coroutine if you want) to suspend and return control to the outer caller to wait for the next item. What the caller does in order to get that item is completely up to itself. It could be called "asyncio" and do some I/O in order to get data, but it can equally well be a multi-threading setup that grabs data from a queue connected to a pool of threads. Granted, this implies an inversion of control in that it's the caller that provides the thread-pool and not the user, but it's not like it's unprecedented to work with a 'global' pool of pre-instantiated threads (or processes, for that matter) in order to avoid startup overhead. 
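Sketched with the protocol the PEP proposes plus asyncio's default thread
pool, that could look roughly like this (blocking_fetch is just a stand-in
for whatever blocking call actually produces the data):

import asyncio

class ThreadPoolIterator:
    # Each item is produced by a worker thread; the consumer just sees an
    # ordinary "async for" loop and never touches the pool itself.
    def __init__(self, blocking_fetch, keys, loop=None):
        self.blocking_fetch = blocking_fetch
        self.keys = iter(keys)
        self.loop = loop or asyncio.get_event_loop()

    async def __aiter__(self):           # awaitable, per the PEP draft
        return self

    async def __anext__(self):
        try:
            key = next(self.keys)
        except StopIteration:
            raise StopAsyncIteration
        # run_in_executor(None, ...) uses the loop's default thread pool
        return await self.loop.run_in_executor(None, self.blocking_fetch, key)

async def consume(blocking_fetch, keys):
    async for value in ThreadPoolIterator(blocking_fetch, keys):
        print(value)
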
Stefan From steve at pearwood.info Fri May 1 14:22:22 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 1 May 2015 22:22:22 +1000 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: References: <55430E17.7030404@canterbury.ac.nz> Message-ID: <20150501122222.GK5663@ando.pearwood.info> On Fri, May 01, 2015 at 12:49:44PM +0100, Oscar Benjamin wrote: > So if I say that something is a "coroutine" it's really not clear what > that means. It could mean an an asyncio.coroutine generator function, > it could mean an async def function or it could mean both. Worse it > could mean the object returned by either of those types of functions. I'm sympathetic to your concerns, and I raised a similar issue earlier. But, it's not entirely without precedence. We already use "generator" to mean both the generator-function and the generator-iterator returned from the generator-function. We use "decorator" to mean both the function and the @ syntax. Sometimes we distinguish between classes and objects (instances), sometimes we say that classes are objects, and sometimes we say that classes are instances of the metaclass. "Method" can refer to either the function object inside a class or the method instance after the descriptor protocol has run. And of course, once you start comparing terms from multiple languages, the whole thing just gets worse (contrast what Haskell considers a functor with what C++ considers a functor). It's regretable when language is ambiguous, but sometimes a little bit of ambiguity is the lesser of the evils. Human beings are usually good at interpreting that given sufficient context and understanding. If there is no good alternative to coroutine, we'll need some good documentation to disambiguate the meanings. -- Steve From guido at python.org Fri May 1 17:08:06 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 1 May 2015 08:08:06 -0700 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: Sounds like should be an easy patch. Of course, needs to work for ProcessPoolExecutor too. On Fri, May 1, 2015 at 1:12 AM, Ram Rachum wrote: > Hi, > > What do you think about adding a method: `Executor.filter`? > > I was using something like this: > > my_things = [thing for thing in things if some_condition(thing)] > > > But the problem was that `some_condition` took a long time to run waiting > on I/O, which is a great candidate for parallelizing with > ThreadPoolExecutor. I made it work using `Executor.map` and some > improvizing, but it would be nicer if I could do: > > with concurrent.futures.ThreadPoolExecutor(100) as executor: > my_things = executor.filter(some_condition, things) > > And have the condition run in parallel on all the threads. > > What do you think? > > > Thanks, > Ram. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Fri May 1 17:12:58 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 1 May 2015 08:12:58 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> Message-ID: On Fri, May 1, 2015 at 4:13 AM, Andrew Barnert wrote: > IIRC, the original asyncio PEP has links to Greg Ewing's posts that > demonstrated how you could use yield from coroutines for various purposes, > including asynchronous I/O, but also things like many-actor simulations, > with pretty detailed examples. http://www.cosc.canterbury.ac.nz/greg.ewing/python/yield-from/yield_from.html It has two small examples of *generator iterators* that can be nicely refactored using yield-from (no need to switch to async there), but the only meaty example using a trampoline is a scheduler for multiplexed I/O. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Fri May 1 17:13:35 2015 From: ram at rachum.com (Ram Rachum) Date: Fri, 1 May 2015 18:13:35 +0300 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: I envisioned it being implemented directly on `Executor`, so it'll automatically apply to all executor types. (I'll be happy to write the implementation if we have a general feeling that this is a desired feature.) On Fri, May 1, 2015 at 6:08 PM, Guido van Rossum wrote: > Sounds like should be an easy patch. Of course, needs to work for > ProcessPoolExecutor too. > > On Fri, May 1, 2015 at 1:12 AM, Ram Rachum wrote: > >> Hi, >> >> What do you think about adding a method: `Executor.filter`? >> >> I was using something like this: >> >> my_things = [thing for thing in things if some_condition(thing)] >> >> >> But the problem was that `some_condition` took a long time to run waiting >> on I/O, which is a great candidate for parallelizing with >> ThreadPoolExecutor. I made it work using `Executor.map` and some >> improvizing, but it would be nicer if I could do: >> >> with concurrent.futures.ThreadPoolExecutor(100) as executor: >> my_things = executor.filter(some_condition, things) >> >> And have the condition run in parallel on all the threads. >> >> What do you think? >> >> >> Thanks, >> Ram. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Fri May 1 11:25:51 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 1 May 2015 12:25:51 +0300 Subject: [Python-ideas] Top 10 Python modules that need a redesign Was: Geo coordinates conversion in stdlib In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 7:18 AM, Chris Angelico wrote: > On Fri, Apr 3, 2015 at 7:33 PM, anatoly techtonik > wrote: > > Author is me, so you can ask directly. Why I didn't propose to redesign? > > Because people will assume that somebody will need to write PEP and will > > force me to write one. I don't believe in "redesign by specification" > like > > current PEP process assumes and people accuse me of being lazy and > trolling > > them, because I don't want to write the PEPs. 
Damn, I believe in > iterative > > development and evolution, and I failed to persuade coredevs that > practices > > digged up by people under the "agile" label is not some sort of corporate > > bullshit. So it is not my problem now. I did all I am capable of. > > Why, exactly, is it that you don't want to author a PEP? Is it because > you don't have the time to devote to chairing the discussion and all? > Don't have time and limited energy for such discussions. Switching to discussion requires unloading all other information, remembering the last point, tracking what people think. If you switch to discussion few days later (because you don't have time) it needs more time to refresh the data about the state. This is highly inefficient. Expanding on that below.. > If so, you could quite possibly persuade someone else to. I'd be > willing to take on the job; convince me that your core idea is worth > pursuing (and make clear to me precisely what your core idea is), and > I could do the grunt-work of writing. But you say that you "don't > *believe in*" the process, which suggests a more philosophical > objection. What's the issue, here? Why are you holding back from such > a plan? *cue the troll music* > I don't believe in the process, right. I need data. How many people actually read the PEPs through the end? How many say that they fully support the PEP decision? How many people read the diffs after they've read the PEP and can validate that none of their previous usage cases were broken? I assume that None. That's my belief, but I'd be happy to see that data that proves me wrong. I also don't believe in the PEP process, because I can't even validate my own usage cases using the layout of information proposed by the PEP. PEP is a compression and optimization of the various usage cases expressed in verbal form that is easy to implement, but not easy to understand or argue about decisions. Especially about ones that seem not-well-thought, because of the flawed process above. I also have problems with reading specifications without diagrams and with drawing concepts on a virtual canvas in my head. I also find that some stuff in PEP is confusing, but there is no channel like StackOverflow to ask question about design decisions. Maybe I am just a poor reader, but that is the reality. I'd prefer cookbook to PEP approach. > There are many Pythons in the world. You can't just hack on CPython > and expect everything to follow on from there. Someone has to explain > to the Jython folks what they'll have to do to be compatible. Someone > has to write something up so MicroPython can run the same code that > CPython does. Someone, somewhere, has to be able to ensure that > Brython users aren't caught out by your proposed change. PEPs provide > that. (They also provide useful pointers for the "What's New" lists, > eg PEP 441.) > > So, are you proposing a change to Python? Then propose it. > The concept of "proposal" is completely fine. But the form is dated and ineffective. And I can't deal with people who are afraid of new concepts and can't see a rationale behind the buzzwords like agile, story, roadmap, user experience. These are all the de-facto tools of the new generation, and if somebody prefers to ride the steam engine, I don't mind, but me personally don't have the lifetime to move so slow. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From techtonik at gmail.com Fri May 1 11:44:09 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 1 May 2015 12:44:09 +0300 Subject: [Python-ideas] Top 10 Python modules that need a redesign Was: Geo coordinates conversion in stdlib In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 9:25 PM, Mark Lawrence wrote: > On 04/04/2015 05:18, Chris Angelico wrote: > >> On Fri, Apr 3, 2015 at 7:33 PM, anatoly techtonik >> wrote: >> >>> Author is me, so you can ask directly. Why I didn't propose to redesign? >>> Because people will assume that somebody will need to write PEP and will >>> force me to write one. I don't believe in "redesign by specification" >>> like >>> current PEP process assumes and people accuse me of being lazy and >>> trolling >>> them, because I don't want to write the PEPs. Damn, I believe in >>> iterative >>> development and evolution, and I failed to persuade coredevs that >>> practices >>> digged up by people under the "agile" label is not some sort of corporate >>> bullshit. So it is not my problem now. I did all I am capable of. >>> >> >> Why, exactly, is it that you don't want to author a PEP? Is it because >> you don't have the time to devote to chairing the discussion and all? >> If so, you could quite possibly persuade someone else to. I'd be >> willing to take on the job; convince me that your core idea is worth >> pursuing (and make clear to me precisely what your core idea is), and >> I could do the grunt-work of writing. But you say that you "don't >> *believe in*" the process, which suggests a more philosophical >> objection. What's the issue, here? Why are you holding back from such >> a plan? *cue the troll music* >> >> There are many Pythons in the world. You can't just hack on CPython >> and expect everything to follow on from there. Someone has to explain >> to the Jython folks what they'll have to do to be compatible. Someone >> has to write something up so MicroPython can run the same code that >> CPython does. Someone, somewhere, has to be able to ensure that >> Brython users aren't caught out by your proposed change. PEPs provide >> that. (They also provide useful pointers for the "What's New" lists, >> eg PEP 441.) >> >> So, are you proposing a change to Python? Then propose it. >> >> ChrisA >> >> > I don't understand why people bother with this gentleman. All talk, no > action, but expects others to do his bidding. I would say "Please go take > a running jump", but that would get me into trouble with the CoC > aficionados, so I won't. > What action can I do if I point that CLA is invalid, and nobody answers to my call? I don't agree that people are signing it without understanding the content in detail, and I got banned for it. I sent a few patches to tracker, but what's the point if people are afraid to apply even the doc fixes. Instead of obeying the order of copyright lawyers from the paper age, the role of any Internet Community is to understand and guard its own interests and protect its way of doing things. Instead of that, the community is just places a roadblock, because "lawyers know better". Anti-offtopic. If you want to see, what I do, and want to enable some of the big things that can come up in the future, please help resolve this issue with Jinja2, Python 2 and setdefaultencoding utf-8 - http://issues.roundup-tracker.org/issue2550811 - just as a core developer, send us a patch that we should commit to enable Roundup work with Jinja2 again. 
This a key to add "modules" field to tracker to track patches submitted to different modules (using modstats.py from https://bitbucket.org/techtonik/python-stdlib) and split the work for different interested parties. This key lower the barrier to entry by removing the need to learn XML and TAL stuff from designers who want to experiment with Python tracker to add stuff, like marking modules that need a redesign. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joseph.martinot-lagarde at m4x.org Fri May 1 17:52:32 2015 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Fri, 01 May 2015 17:52:32 +0200 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <20150501003551.GG5663@ando.pearwood.info> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> Message-ID: <5543A140.6010406@m4x.org> Le 01/05/2015 02:35, Steven D'Aprano a ?crit : > > If you still wish to argue for this, one thing which may help your case > is if you can identify other programming languages that have already > done something similar. > > Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp: from cython.parallel import prange cdef int func(Py_ssize_t n): cdef Py_ssize_t i for i in prange(n, nogil=True): if i == 8: with gil: raise Exception() elif i == 4: break elif i == 2: return i This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html Joseph From guido at python.org Fri May 1 18:56:04 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 1 May 2015 09:56:04 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <5543A140.6010406@m4x.org> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org> Message-ID: On Fri, May 1, 2015 at 8:52 AM, Joseph Martinot-Lagarde < joseph.martinot-lagarde at m4x.org> wrote: > Le 01/05/2015 02:35, Steven D'Aprano a ?crit : > >> >> If you still wish to argue for this, one thing which may help your case >> is if you can identify other programming languages that have already >> done something similar. >> >> >> Cython has prange. It replaces range() in the for loop but runs the loop > body in parallel using openmp: > > from cython.parallel import prange > > cdef int func(Py_ssize_t n): > cdef Py_ssize_t i > > for i in prange(n, nogil=True): > if i == 8: > with gil: > raise Exception() > elif i == 4: > break > elif i == 2: > return i > > This is an example from the cython documentation: > http://docs.cython.org/src/userguide/parallelism.html > Interesting. I'm trying to imagine how this could be implemented in CPython by turning the for-loop body into a coroutine. It would be a complicated transformation because of the interaction with local variables in the code surrounding the for-loop. Perhaps the compiler could mark all such variables as implicitly nonlocal. The Cython example also shows other interesting issues -- what should return or break do? In any case, I don't want this idea to distract the PEP 492 discussion -- it's a much thornier problem, and maybe coroutine concurrency isn't what we should be after here -- the use cases here seem to be true (GIL-free) parallelism. I'm imagining that pyparallel has already solved this (if it has solved anything :-). 
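FWIW, the manual version of that transformation is what people already do
with a pool, and it shows where the trouble is: once the body is hoisted
into a function it can no longer rebind the surrounding locals, and break
or return in the body stop meaning anything. Rough sketch (lots_of_work is
a placeholder):

from concurrent.futures import ProcessPoolExecutor

def lots_of_work(i):               # placeholder loop body
    return i * i

def parallel_loop(n):
    # hand-written equivalent of "for i in prange(n): lots_of_work(i)"
    with ProcessPoolExecutor() as pool:
        return list(pool.map(lots_of_work, range(n)))

if __name__ == '__main__':
    print(parallel_loop(10))
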
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Fri May 1 19:03:47 2015 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 01 May 2015 13:03:47 -0400 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: <55430E17.7030404@canterbury.ac.nz> References: <55430E17.7030404@canterbury.ac.nz> Message-ID: On 05/01/2015 01:24 AM, Greg Ewing wrote: > Ron Adam wrote: > >> A waiter? >> or awaiter? >> >> As in a-wait-ing an awaiter. > > The waiter would be the function executing the await > operator, not the thing it's operating on. > In a restaurant, waiters wait on customers. But calling > an awaitable object a "customer" doesn't seem right > at all. Guido has been using awaitable over in python-dev. Lets see how that works... In a restaurant, a waiter serves food. An awaitable is a specific kind of waiter.. One that may wait for other waiters to serve their customers (table) food before they serve your food, even though your order may have happened before another tables order was taken. Each awaitable only serves one table, and never takes orders or serves food to any other table. In a normal python restaurant without awaitables, each waiter must takes your order, and then serve your food, before any other waiter can take an order and serve it's customer food. The consumer is the caller of the expression. We can think of restaurant tables as function frames. The "awaiter" keyword here just makes sure an awaiter is qualified to serve food in this async restaurant. We don't want the Vikings serving food, do we. ;-) Of course someone needs to get the tables filled. That's where the maitre d' comes in. He uses an "async for", or "async with", statement to fill all the tables with customers and keeps them happy. That's not perfect, but I think it gets the general concepts correct and makes them easier to think about. (At least for me.) Cheers, Ron From joseph.martinot-lagarde at m4x.org Fri May 1 20:52:20 2015 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Fri, 01 May 2015 20:52:20 +0200 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org> Message-ID: <5543CB64.8000802@m4x.org> Le 01/05/2015 18:56, Guido van Rossum a ?crit : > On Fri, May 1, 2015 at 8:52 AM, Joseph Martinot-Lagarde > > wrote: > > Le 01/05/2015 02:35, Steven D'Aprano a ?crit : > > > If you still wish to argue for this, one thing which may help > your case > is if you can identify other programming languages that have already > done something similar. > > > Cython has prange. It replaces range() in the for loop but runs the > loop body in parallel using openmp: > > from cython.parallel import prange > > cdef int func(Py_ssize_t n): > cdef Py_ssize_t i > > for i in prange(n, nogil=True): > if i == 8: > with gil: > raise Exception() > elif i == 4: > break > elif i == 2: > return i > > This is an example from the cython documentation: > http://docs.cython.org/src/userguide/parallelism.html > > > Interesting. I'm trying to imagine how this could be implemented in > CPython by turning the for-loop body into a coroutine. It would be a > complicated transformation because of the interaction with local > variables in the code surrounding the for-loop. Perhaps the compiler > could mark all such variables as implicitly nonlocal. 
The Cython example > also shows other interesting issues -- what should return or break do? About return and break in cython, there is a section in the documentation: "For prange() this means that the loop body is skipped after the first break, return or exception for any subsequent iteration in any thread. It is undefined which value shall be returned if multiple different values may be returned, as the iterations are in no particular order." > > In any case, I don't want this idea to distract the PEP 492 discussion > -- it's a much thornier problem, and maybe coroutine concurrency isn't > what we should be after here -- the use cases here seem to be true > (GIL-free) parallelism. I'm imagining that pyparallel has already solved > this (if it has solved anything :-). > > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From abarnert at yahoo.com Sat May 2 00:39:21 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 1 May 2015 15:39:21 -0700 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: On May 1, 2015, at 08:13, Ram Rachum wrote: > > I envisioned it being implemented directly on `Executor`, so it'll automatically apply to all executor types. (I'll be happy to write the implementation if we have a general feeling that this is a desired feature.) I'd say just write it if you want it. If it turns out to be so trivial everyone decides it's unnecessary to add, you've only wasted 10 minutes. If it turns out to be tricky enough to take more time, that in itself will be a great argument that it should be added so users don't screw it up themselves. Plus, of course, even if it gets rejected, you'll have the code you want for your own project. :) > >> On Fri, May 1, 2015 at 6:08 PM, Guido van Rossum wrote: >> Sounds like should be an easy patch. Of course, needs to work for ProcessPoolExecutor too. >> >>> On Fri, May 1, 2015 at 1:12 AM, Ram Rachum wrote: >>> Hi, >>> >>> What do you think about adding a method: `Executor.filter`? >>> >>> I was using something like this: >>> >>> my_things = [thing for thing in things if some_condition(thing)] >>> >>> But the problem was that `some_condition` took a long time to run waiting on I/O, which is a great candidate for parallelizing with ThreadPoolExecutor. I made it work using `Executor.map` and some improvizing, but it would be nicer if I could do: >>> >>> with concurrent.futures.ThreadPoolExecutor(100) as executor: >>> my_things = executor.filter(some_condition, things) >>> >>> And have the condition run in parallel on all the threads. >>> >>> What do you think? >>> >>> >>> Thanks, >>> Ram. >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
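(To give a sense of scale: ignoring timeouts and cancellation, a minimal
version is only a few lines on top of the map() that Executor already has,
and it keeps the input order:)

def executor_filter(executor, predicate, iterable):
    # evaluate the predicate for every item in parallel, then keep the
    # items whose predicate came back true, in input order
    items = list(iterable)
    return [item for item, keep in zip(items, executor.map(predicate, items))
            if keep]
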
URL: From abarnert at yahoo.com Sat May 2 00:52:26 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 1 May 2015 15:52:26 -0700 Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects In-Reply-To: References: <55430E17.7030404@canterbury.ac.nz> Message-ID: <47F4D662-5A3C-4781-850A-3BACE6D65A5B@yahoo.com> On May 1, 2015, at 04:49, Oscar Benjamin wrote: > > On 1 May 2015 at 12:24, Andrew Barnert via Python-ideas > wrote: >> Anyway, if I understand the problem, the main confusion is that we use "coroutine" both to mean a thing that can be suspended and resumed, and a function that returns such a thing. Why not just "coroutine" and "coroutine function", just as with "generator" and "generator function". > > That's the terminology in the asyncio docs I guess: > https://docs.python.org/3/library/asyncio-task.html#coroutine > ... except that there it is referring to decorated generator functions. > > That feels like a category error to me because coroutines are a > generalisation a functions so if anything is the coroutine itself then > it is the async def function rather than the object it returns but I > guess if that's what's already being used. I agree with this last point, and the "cofunction" terminology handled that better... In practice, I don't think this kind of thing usually causes much of a problem. For example, when first learning Swift, you have to learn that an iterator isn't really an iterator, it's a generalized index, but within the first day you've already forgotten the issue and you're just using iterators. It's no worse than switching back and forth between Self and C++, which both have things that as reasonably accurately called "iterators" but nevertheless work completely differently. But maybe the best thing to do here is look at the terminology used in the F# papers (which I think introduced the await/async idea), and then see if the same terminology is used in practice in more widespread languages like C# that borrowed the idea, and if so just go with that. Even if it's wrong, it'll be the same wrong that everyone else is learning, and if we don't have something clearly better... >> If the issue is that there are other things that are coroutines besides the coroutine type... well, there are plenty of things that are iterators that are all of unrelated types, and has anyone ever been confused by that? (Of course people have been confused by iterator vs. iterable, but that's a different issue, and one that doesn't have a parallel here.) > > There is no concrete "iterator" type. The use of iterator as a type is > explicitly intended to refer to a range of different types of objects > analogous to using an interface in Java. > > The PEP proposes at the same time that the word coroutine should be > both a generic term for objects exposing a certain interface and also > the term for a specific language construct: the function resulting > from an async def statement. > > So if I say that something is a "coroutine" it's really not clear what > that means. It could mean an an asyncio.coroutine generator function, > it could mean an async def function or it could mean both. Worse it > could mean the object returned by either of those types of functions. 
> > > -- > Oscar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ram at rachum.com Sat May 2 11:25:30 2015 From: ram at rachum.com (Ram Rachum) Date: Sat, 2 May 2015 12:25:30 +0300 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: Okay, I implemented it. Might be getting something wrong because I've never worked with the internals of this module before. See attached file for a demonstration, and here's the code for just the method: def filter(self, fn, iterable, timeout=None): if timeout is not None: end_time = timeout + time.time() items_and_futures = [ (item, self.submit(fn, item)) for item in iterable ] # Yield must be hidden in closure so that the futures are submitted # before the first iterator value is required. def result_iterator(): try: for item, future in items_and_futures: if timeout is None: result = future.result() else: result = future.result(end_time - time.time()) if result: yield item finally: for _, future in items_and_futures: future.cancel() return result_iterator() On Sat, May 2, 2015 at 1:39 AM, Andrew Barnert wrote: > On May 1, 2015, at 08:13, Ram Rachum wrote: > > I envisioned it being implemented directly on `Executor`, so it'll > automatically apply to all executor types. (I'll be happy to write the > implementation if we have a general feeling that this is a desired feature.) > > > I'd say just write it if you want it. If it turns out to be so trivial > everyone decides it's unnecessary to add, you've only wasted 10 minutes. If > it turns out to be tricky enough to take more time, that in itself will be > a great argument that it should be added so users don't screw it up > themselves. > > Plus, of course, even if it gets rejected, you'll have the code you want > for your own project. :) > > > On Fri, May 1, 2015 at 6:08 PM, Guido van Rossum wrote: > >> Sounds like should be an easy patch. Of course, needs to work for >> ProcessPoolExecutor too. >> >> On Fri, May 1, 2015 at 1:12 AM, Ram Rachum wrote: >> >>> Hi, >>> >>> What do you think about adding a method: `Executor.filter`? >>> >>> I was using something like this: >>> >>> my_things = [thing for thing in things if some_condition(thing)] >>> >>> >>> But the problem was that `some_condition` took a long time to run >>> waiting on I/O, which is a great candidate for parallelizing with >>> ThreadPoolExecutor. I made it work using `Executor.map` and some >>> improvizing, but it would be nicer if I could do: >>> >>> with concurrent.futures.ThreadPoolExecutor(100) as executor: >>> my_things = executor.filter(some_condition, things) >>> >>> And have the condition run in parallel on all the threads. >>> >>> What do you think? >>> >>> >>> Thanks, >>> Ram. >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- import concurrent.futures import time import requests class NiceExecutorMixin: def filter(self, fn, iterable, timeout=None): if timeout is not None: end_time = timeout + time.time() items_and_futures = [ (item, self.submit(fn, item)) for item in iterable ] # Yield must be hidden in closure so that the futures are submitted # before the first iterator value is required. def result_iterator(): try: for item, future in items_and_futures: if timeout is None: result = future.result() else: result = future.result(end_time - time.time()) if result: yield item finally: for _, future in items_and_futures: future.cancel() return result_iterator() class MyThreadPoolExecutor(NiceExecutorMixin, concurrent.futures.ThreadPoolExecutor): pass def has_wikipedia_page(name): response = requests.get( 'http://en.wikipedia.org/wiki/%s' % name.replace(' ', '_') ) return response.status_code == 200 if __name__ == '__main__': people = ( 'Barack Obama', 'Shimon Peres', 'Justin Bieber', 'Some guy I saw on the street', 'Steve Buscemi', 'My first-grade teacher', 'Gandhi' ) people_who_have_wikipedia_pages = ( 'Barack Obama', 'Shimon Peres', 'Justin Bieber', 'Steve Buscemi', 'Gandhi' ) # assert tuple(filter(has_wikipedia_page, people_who_have_wikipedia_pages)) \ # == people_who_have_wikipedia_pages with MyThreadPoolExecutor(100) as executor: executor_filter_result = tuple( executor.filter(has_wikipedia_page, people) ) print(executor_filter_result) assert executor_filter_result == people_who_have_wikipedia_pages From stephen at xemacs.org Sat May 2 15:30:44 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 02 May 2015 22:30:44 +0900 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> Message-ID: <87y4l7nm2j.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > mypool for item in items: > do_something_here > do_something_else > do_yet_another_thing > > I'm assuming that's the OP's intention (it's certainly mine) is that > the "mypool for" loop works something like > > def _work(item): > do_something_here > do_something_else > do_yet_another_thing > for _ in mypool.map(_work, items): > # Wait for the subprocesses > pass I would think that given a pool of processors, the pool's .map method itself would implement the distribution. In fact the Pool ABC would probably provide several variations on the map method (eg, a mapreduce implementation, a map-to-list implementation, and a map-is-generator implementation depending on the treatment of results of the _work computation (if any). I don't see a need for syntax here. Aside: Doesn't the "Wait for the subprocesses" belong outside the for suite? From abarnert at yahoo.com Sat May 2 18:16:33 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 2 May 2015 09:16:33 -0700 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <5543A140.6010406@m4x.org> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org> Message-ID: On May 1, 2015, at 08:52, Joseph Martinot-Lagarde wrote: > > Le 01/05/2015 02:35, Steven D'Aprano a ?crit : >> >> If you still wish to argue for this, one thing which may help your case >> is if you can identify other programming languages that have already >> done something similar. > Cython has prange. 
It replaces range() in the for loop but runs the loop body in parallel using openmp: I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP has to do with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax. Of course that doesn't guarantee that something similar to prange could be built for Python 3.5's Pool, Executor, etc. types without changes, but if even if it can't, a change to the iterator protocol to make prange bulldable doesn't seem as disruptive as a change to the basic syntax of the for loop. (Unless there just is no reasonable change to the protocol that could work.) > from cython.parallel import prange > > cdef int func(Py_ssize_t n): > cdef Py_ssize_t i > > for i in prange(n, nogil=True): > if i == 8: > with gil: > raise Exception() > elif i == 4: > break > elif i == 2: > return i > > This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html > > Joseph > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ron3200 at gmail.com Sat May 2 18:45:17 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 02 May 2015 12:45:17 -0400 Subject: [Python-ideas] awaiting ... was Re: More general "for" loop handling In-Reply-To: <5542D1E6.80307@gmail.com> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com> <20150501010201.GL10248@stoneleaf.us> <5542D1E6.80307@gmail.com> Message-ID: On 04/30/2015 09:07 PM, Yury Selivanov wrote: > On 2015-04-30 9:02 PM, Ethan Furman wrote: >> On 04/30, Yury Selivanov wrote: >>> On 2015-04-30 8:35 PM, Steven D'Aprano wrote: >>>> I don't think it guarantees ordering in the sense I'm referring to. It >>>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in >>>> that order, but not that f(a) will be calculated before f(b), which is >>>> calculated before f(c), ... and so on. That's the point of parallelism: >>>> if f(a) takes a long time to complete, another worker may have completed >>>> f(b) in the meantime. >>> This is an *excellent* point. >> So, PEP 492 asynch for also guarantees that the loop runs in order, one at >> a time, with one loop finishing before the next one starts? >> >> *sigh* >> >> How disappointing. > No. Nothing prevents you from scheduling asynchronous > parallel computation, or prefetching more data. Since > __anext__ is an awaitable you can do that. > > Steven's point is that Todd's proposal isn't that > straightforward to apply. Initialising several coroutines at once still doesn't seem clear/clean to me. Or maybe I'm just not getting that part yet. Here is what I would like. :-) values = awaiting [awaitable, awaitable, ...] a, b, ... = awaiting (awaitable, awaitable, ...) This doesn't have the issues of order because a list of values is returned with the same order of the awaitables. But the awaitables are scheduled in parallel. A regular for loop could still do these in order, but would pause when it gets to a values that haven't returned/resolved yet. That would probably be expected. Awaiting sets would be different... they are unordered. So we can use a set and get the items that become available as they become available... 
for x in awaiting {awaitable, awaitable, ...}: print(x) x would print in an arbitrary order, but that would be what I would expect here. :-) The body could have await calls in it, and so it could cooperate along with the awaiting set. Of course if it's only a few statements, that probably wouldn't make much difference. This seems like it's both explicit and simple to think about. It also seems like it might not be that hard to do, I think most of the parts are already worked out. One option is to allow await to work with iterables in this way. But the awaiting keyword would make the code clearer and error messages nicer. Cheers, Ron From ron3200 at gmail.com Sat May 2 20:12:51 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 02 May 2015 14:12:51 -0400 Subject: [Python-ideas] awaiting iterables Message-ID: (I had posted this in the "more general 'for' loop" thread, but this really is a different idea from that.) Initialising several coroutines at once still doesn't seem clear/clean to me. Here is what I would like. values = awaiting [awaitable, awaitable, ...] a, b, ... = awaiting (awaitable, awaitable, ...) This doesn't have the issues of order because a list of values is returned with the same order of the awaitables. But the awaitables are scheduled in parallel. A regular for loop could still do these in order, but would pause when it gets to a values that hasn't returned/resolved yet. That would probably be expected. for x in awaiting [awaitable, awaitable, ...]: print(x) X is printed in the order of the awaitables. Awaiting sets would be different... they are unordered. So we could use a set and get the items that become available as they become available... for x in awaiting {awaitable, awaitable, ...}: print(x) x would print in an arbitrary order, but that would be what I would expect here. The body could have await calls in it, and so it could cooperate along with the awaiting set of awaitablers. Of course if the for body is only a few statements, that probably wouldn't make much difference. This seems like it's both explicit and simple to think about. It also seems like it might not be that hard to do, I think most of the parts are already worked out. One option is to allow await to work with iterables in this way. But the awaiting keyword would make the code clearer and error messages nicer. The last piece of the puzzle is how to specify the current coroutine manager/runner. import asyncio with asyncio.coroutine_loop(): main() That seems simple enough. It prettty much abstracts out all the coroutine specific stuff to three keyword. async, await, and awaiting. Are async for and async with needed if we have awaiting? Can they be impelented in terms of awaiting? import asyncio async def factorial(name, number): f = 1 for i in range(2, number+1): print("Task %s: Compute factorial(%s)..." % (name, i)) await yielding() f *= i print("Task %s: factorial(%s) = %s" % (name, number, f)) with asyncio.coroutine_loop(): awaiting [ factorial("A", 2), factorial("B", 3), factorial("C", 4)] Compared to the example in asyncio docs... import asyncio @asyncio.coroutine def factorial(name, number): f = 1 for i in range(2, number+1): print("Task %s: Compute factorial(%s)..." 
% (name, i)) yield from asyncio.sleep(1) f *= i print("Task %s: factorial(%s) = %s" % (name, number, f)) loop = asyncio.get_event_loop() tasks = [ asyncio.async(factorial("A", 2)), asyncio.async(factorial("B", 3)), asyncio.async(factorial("C", 4))] loop.run_until_complete(asyncio.wait(tasks)) loop.close() Cheers, Ron From piotr.jerzy.jurkiewicz at gmail.com Sat May 2 23:24:58 2015 From: piotr.jerzy.jurkiewicz at gmail.com (Piotr Jurkiewicz) Date: Sat, 02 May 2015 23:24:58 +0200 Subject: [Python-ideas] awaiting iterables In-Reply-To: References: Message-ID: <554540AA.3080002@gmail.com> There are three modes in which you can await multiple coroutines: - iterate over results as they become ready - await till all are done - await till any is done For example C# has helper functions WhenAll and WhenAny for that: await Task.WhenAll(tasks_list); await Task.WhenAny(tasks_list); I can imagine the set of three functions being exposed to user to control waiting for multiple coroutines: asynctools.as_done() # returns asynchronous iterator for iterating over the results of coroutines as they complete asynctools.all_done() # returns a future aggregating results from the given coroutine objects, which awaited returns list of results (like asyncio.gather()) asynctools.any_done() # returns a future, which awaited returns result of first completed coroutine Example: from asynctools import as_done, all_done, any_done corobj0 = async_sql_query("SELECT...") corobj1 = async_memcached_get("someid") corobj2 = async_http_get("http://python.org") # ------------------------------------------------ # Iterate over results as coroutines complete # using async iterator await for result in as_done([corobj0, corobj1, corobj2]): print(result) # ------------------------------------------------ # Await for results of all coroutines # using async iterator results = [] await for result in as_done([corobj0, corobj1, corobj2]): results.append(result) # or using shorthand all_done() results = await all_done([corobj0, corobj1, corobj2]) # ------------------------------------------------ # Await for a result of first completed coroutine # using async iterator await for result in as_done([corobj0, corobj1, corobj2]): first_result = result break # or using shorthand any_done() first_result = await any_done([corobj0, corobj1, corobj2]) Piotr From guido at python.org Sat May 2 23:29:59 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 2 May 2015 14:29:59 -0700 Subject: [Python-ideas] awaiting iterables In-Reply-To: <554540AA.3080002@gmail.com> References: <554540AA.3080002@gmail.com> Message-ID: The asyncio package already has this functionality; check out wait() (it has various options), as_completed(), gather(). 
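Roughly, the three modes map onto the existing API like this. A quick
sketch, assuming make_coros() is a hypothetical helper that returns a
*fresh* list of coroutine objects on every call (a coroutine object can
only be awaited once), written in today's yield-from style:

    import asyncio

    @asyncio.coroutine
    def results_as_ready(make_coros):
        # Mode 1: handle each result as soon as it becomes available.
        for fut in asyncio.as_completed(make_coros()):
            result = yield from fut
            print(result)

    @asyncio.coroutine
    def all_results(make_coros):
        # Mode 2: wait until all are done; results keep the input order.
        return (yield from asyncio.gather(*make_coros()))

    @asyncio.coroutine
    def first_result(make_coros):
        # Mode 3: wait until any one is done, cancel the rest.
        done, pending = yield from asyncio.wait(
            make_coros(), return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        return next(iter(done)).result()
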
On Sat, May 2, 2015 at 2:24 PM, Piotr Jurkiewicz < piotr.jerzy.jurkiewicz at gmail.com> wrote: > There are three modes in which you can await multiple coroutines: > - iterate over results as they become ready > - await till all are done > - await till any is done > > For example C# has helper functions WhenAll and WhenAny for that: > > await Task.WhenAll(tasks_list); > await Task.WhenAny(tasks_list); > > I can imagine the set of three functions being exposed to user to control > waiting for multiple coroutines: > > asynctools.as_done() # returns asynchronous iterator for iterating over > the results of coroutines as they complete > > asynctools.all_done() # returns a future aggregating results from the > given coroutine objects, which awaited returns list of results (like > asyncio.gather()) > > asynctools.any_done() # returns a future, which awaited returns result of > first completed coroutine > > Example: > > from asynctools import as_done, all_done, any_done > > corobj0 = async_sql_query("SELECT...") > corobj1 = async_memcached_get("someid") > corobj2 = async_http_get("http://python.org") > > # ------------------------------------------------ > > # Iterate over results as coroutines complete > # using async iterator > > await for result in as_done([corobj0, corobj1, corobj2]): > print(result) > > # ------------------------------------------------ > > # Await for results of all coroutines > # using async iterator > > results = [] > await for result in as_done([corobj0, corobj1, corobj2]): > results.append(result) > > # or using shorthand all_done() > > results = await all_done([corobj0, corobj1, corobj2]) > > # ------------------------------------------------ > > # Await for a result of first completed coroutine > # using async iterator > > await for result in as_done([corobj0, corobj1, corobj2]): > first_result = result > break > > # or using shorthand any_done() > > first_result = await any_done([corobj0, corobj1, corobj2]) > > Piotr > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From piotr.jerzy.jurkiewicz at gmail.com Sun May 3 00:18:05 2015 From: piotr.jerzy.jurkiewicz at gmail.com (Piotr Jurkiewicz) Date: Sun, 03 May 2015 00:18:05 +0200 Subject: [Python-ideas] awaiting iterables In-Reply-To: References: <554540AA.3080002@gmail.com> Message-ID: <55454D1D.3080500@gmail.com> I know that. But the problem with wait() is that it returns Tasks, not their results directly. So user has to unpack them manually. Furthermore, after introduction of `await`, its name will become problematic. It will reassembles `await` too much and can cause a confusion. Its usage would result in an awkward 'await wait()'. There is a function gather(*coros_or_futures) which returns results list directly, like the function all_done() I proposed. But there is no function gather_any(*coros_or_futures), to return just a result of the first done coroutine. (One can achieve it with wait(return_when=FIRST_COMPLETED) but as mentioned before, it does not return a result directly, so there is no symmetry with gather()) Function as_completed() returns indeed an iterator over the futures as they complete, but it is not compatible with the 'async for' protocol proposed in PEP 492. So new function has to be created anyway. 
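For what it's worth, here is a rough sketch of what an as_done() speaking
the proposed async-iterator protocol could look like, built on top of
as_completed() (the name and the exact __aiter__ details are only
assumptions about the draft):

    import asyncio

    class as_done:
        def __init__(self, coros_or_futures):
            # as_completed() already yields futures in completion order.
            self._iter = asyncio.as_completed(list(coros_or_futures))

        def __aiter__(self):
            # The draft may want this to return an awaitable instead.
            return self

        async def __anext__(self):
            try:
                fut = next(self._iter)
            except StopIteration:
                raise StopAsyncIteration
            return await fut

The other two helpers would then be thin wrappers over gather() and
wait(return_when=FIRST_COMPLETED).
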
Therefore I deliberately placed these functions in a new asynctools module, not in the asyncio module: to emphasize that they are supposed to be used with the new-style coroutines, proposed in PEP 492. I wanted to achieve simplicity (by returning results directly) and symmetry (all_done()/any_done()). Piotr On 2015-05-02 23:29, Guido van Rossum wrote: > The asyncio package already has this functionality; check out wait() (it > has various options), as_completed(), gather(). From guido at python.org Sun May 3 02:27:51 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 2 May 2015 17:27:51 -0700 Subject: [Python-ideas] awaiting iterables In-Reply-To: <55454D1D.3080500@gmail.com> References: <554540AA.3080002@gmail.com> <55454D1D.3080500@gmail.com> Message-ID: You can try to place these in a separate module, but in the end they still depend on asyncio. You'll find out why when you try to implement any of them. Don't dismiss the effort that went into asyncio too lightly. On Sat, May 2, 2015 at 3:18 PM, Piotr Jurkiewicz < piotr.jerzy.jurkiewicz at gmail.com> wrote: > I know that. But the problem with wait() is that it returns Tasks, not > their results directly. So user has to unpack them manually. > > Furthermore, after introduction of `await`, its name will become > problematic. It will reassembles `await` too much and can cause a > confusion. Its usage would result in an awkward 'await wait()'. > > There is a function gather(*coros_or_futures) which returns results list > directly, like the function all_done() I proposed. > > But there is no function gather_any(*coros_or_futures), to return just a > result of the first done coroutine. (One can achieve it with > wait(return_when=FIRST_COMPLETED) but as mentioned before, it does not > return a result directly, so there is no symmetry with gather()) > > Function as_completed() returns indeed an iterator over the futures as > they complete, but it is not compatible with the 'async for' protocol > proposed in PEP 492. So new function has to be created anyway. > > Therefore I deliberately placed these functions in a new asynctools > module, not in the asyncio module: to emphasize that they are supposed to > be used with the new-style coroutines, proposed in PEP 492. > > I wanted to achieve simplicity (by returning results directly) and > symmetry (all_done()/any_done()). > > Piotr > > > On 2015-05-02 23:29, Guido van Rossum wrote: > >> The asyncio package already has this functionality; check out wait() (it >> has various options), as_completed(), gather(). >> > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From joseph.martinot-lagarde at m4x.org Sun May 3 23:52:32 2015 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Sun, 03 May 2015 23:52:32 +0200 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org> Message-ID: <554698A0.6030709@m4x.org> Le 02/05/2015 18:16, Andrew Barnert via Python-ideas a ?crit : > On May 1, 2015, at 08:52, Joseph Martinot-Lagarde wrote: >> >> Le 01/05/2015 02:35, Steven D'Aprano a ?crit : >>> >>> If you still wish to argue for this, one thing which may help your case >>> is if you can identify other programming languages that have already >>> done something similar. >> Cython has prange. 
It replaces range() in the for loop but runs the loop body in parallel using openmp: > > I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP has to do with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax. Cython uses python syntax but the behavior is different. This is especially obvious seeing how break and return are managed, where the difference in not only in the iterator. > > Of course that doesn't guarantee that something similar to prange could be built for Python 3.5's Pool, Executor, etc. types without changes, but if even if it can't, a change to the iterator protocol to make prange bulldable doesn't seem as disruptive as a change to the basic syntax of the for loop. (Unless there just is no reasonable change to the protocol that could work.) > >> from cython.parallel import prange >> >> cdef int func(Py_ssize_t n): >> cdef Py_ssize_t i >> >> for i in prange(n, nogil=True): >> if i == 8: >> with gil: >> raise Exception() >> elif i == 4: >> break >> elif i == 2: >> return i >> >> This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html >> >> Joseph >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From joseph.martinot-lagarde at m4x.org Sun May 3 23:55:04 2015 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Sun, 03 May 2015 23:55:04 +0200 Subject: [Python-ideas] More general "for" loop handling In-Reply-To: <554698A0.6030709@m4x.org> References: <20150430113644.GC5663@ando.pearwood.info> <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org> <554698A0.6030709@m4x.org> Message-ID: <55469938.6010501@m4x.org> Le 03/05/2015 23:52, Joseph Martinot-Lagarde a ?crit : > Le 02/05/2015 18:16, Andrew Barnert via Python-ideas a ?crit : >> On May 1, 2015, at 08:52, Joseph Martinot-Lagarde >> wrote: >>> >>> Le 01/05/2015 02:35, Steven D'Aprano a ?crit : >>>> >>>> If you still wish to argue for this, one thing which may help your case >>>> is if you can identify other programming languages that have already >>>> done something similar. >>> Cython has prange. It replaces range() in the for loop but runs the >>> loop body in parallel using openmp: >> >> I think that's pretty good evidence that this proposal (I meant the >> syntax for loop modifiers, not "some way to do loops in parallel would >> be nice") isn't needed. What OpenMP has to do with loop modifier >> syntax, Cython can do with just a special iterator in normal Python >> syntax. > > Cython uses python syntax but the behavior is different. This is > especially obvious seeing how break and return are managed, where the > difference in not only in the iterator. > Sorry, ignore my last email. I agree that no new *syntax* is needed. >> >> Of course that doesn't guarantee that something similar to prange >> could be built for Python 3.5's Pool, Executor, etc. 
types without >> changes, but if even if it can't, a change to the iterator protocol to >> make prange bulldable doesn't seem as disruptive as a change to the >> basic syntax of the for loop. (Unless there just is no reasonable >> change to the protocol that could work.) >> >>> from cython.parallel import prange >>> >>> cdef int func(Py_ssize_t n): >>> cdef Py_ssize_t i >>> >>> for i in prange(n, nogil=True): >>> if i == 8: >>> with gil: >>> raise Exception() >>> elif i == 4: >>> break >>> elif i == 2: >>> return i >>> >>> This is an example from the cython documentation: >>> http://docs.cython.org/src/userguide/parallelism.html >>> >>> Joseph >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From storchaka at gmail.com Mon May 4 10:15:47 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 04 May 2015 11:15:47 +0300 Subject: [Python-ideas] Processing surrogates in Message-ID: Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but Python allows them in Unicode strings for different purposes. 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain surrogate characters. This data can came from other programs, including Python 2. 2) To represent undecodable bytes in ASCII-compatible encoding with the "surrogateescape" error handlers. So surrogate characters can be obtained from "surrogateescape" or "surrogatepass" error handlers or created manually with chr() or %c. Some encodings (UTF-7, unicode-escape) also allows surrogate characters. But on output the surrogate characters can cause fail. In issue18814 proposed several functions to work with surrogate and astral characters. All these functions takes a string and returns a string. * rehandle_surrogatepass(string, errors) Handles surrogate characters (U+D800-U+DFFF) with specified error handler. E.g. rehandle_surrogatepass('?\udcba', 'strict') -> error rehandle_surrogatepass('?\udcba', 'ignore') -> '?' rehandle_surrogatepass('?\udcba', 'replace') -> '?\ufffd' rehandle_surrogatepass('?\udcba', 'backslashreplace') -> '?\\udcba' * rehandle_surrogateescape(string, errors) Handles non-ASCII bytes encoded with surrogate characters in range U+DC80-U+DCFF with specified error handler. Surrogate characters outside of range U+DC80-U+DCFF cause error. E.g. rehandle_surrogateescape('?\udcba', 'strict') -> error rehandle_surrogateescape('?\udcba', 'ignore') -> '?' rehandle_surrogateescape('?\udcba', 'replace') -> '?\ufffd' rehandle_surrogateescape('?\udcba', 'backslashreplace') -> '?\\xba' * handle_astrals(string, errors) Handles non-BMP characters (U+10000-U+10FFFF) with specified error handler. E.g. handle_astrals('?\U00012345', 'strict') -> error handle_astrals('?\U00012345', 'ignore') -> '?' 
handle_astrals('?\U00012345', 'replace') -> '?\ufffd' handle_astrals('?\U00012345', 'backslashreplace') -> '?\\U00012345' * decompose_astrals(string) Converts non-BMP characters (U+10000-U+10FFFF) to surrogate pairs. E.g. decompose_astrals('?\U00012345') -> '?\ud808\udf45' * compose_surrogate_pairs(string) Converts surrogate pairs to non-BMP characters. E.g. compose_surrogate_pairs('?\ud808\udf45') -> '?\U00012345' Function names are preliminary and discussable! Location (currently the codecs module) is discussable. Interface is discussable. These functions revive UnicodeTranslateError, not used currently (but handled with several error handlers). Proposed patch provides Python implementation in the codecs module, but after discussion I'll provide much more efficient (O(1) in best case) C implementation. From python at mrabarnett.plus.com Mon May 4 20:18:32 2015 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 04 May 2015 19:18:32 +0100 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: Message-ID: <5547B7F8.8070405@mrabarnett.plus.com> On 2015-05-04 09:15, Serhiy Storchaka wrote: > Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but > Python allows them in Unicode strings for different purposes. > > 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain > surrogate characters. This data can came from other programs, including > Python 2. > > 2) To represent undecodable bytes in ASCII-compatible encoding with the > "surrogateescape" error handlers. > > So surrogate characters can be obtained from "surrogateescape" or > "surrogatepass" error handlers or created manually with chr() or %c. > Some encodings (UTF-7, unicode-escape) also allows surrogate characters. > > But on output the surrogate characters can cause fail. > > In issue18814 proposed several functions to work with surrogate and > astral characters. All these functions takes a string and returns a string. > > * rehandle_surrogatepass(string, errors) > > Handles surrogate characters (U+D800-U+DFFF) with specified error > handler. E.g. > > rehandle_surrogatepass('?\udcba', 'strict') -> error > rehandle_surrogatepass('?\udcba', 'ignore') -> '?' > rehandle_surrogatepass('?\udcba', 'replace') -> '?\ufffd' > rehandle_surrogatepass('?\udcba', 'backslashreplace') -> '?\\udcba' > > * rehandle_surrogateescape(string, errors) > > Handles non-ASCII bytes encoded with surrogate characters in range > U+DC80-U+DCFF with specified error handler. Surrogate characters outside > of range U+DC80-U+DCFF cause error. E.g. > > rehandle_surrogateescape('?\udcba', 'strict') -> error > rehandle_surrogateescape('?\udcba', 'ignore') -> '?' > rehandle_surrogateescape('?\udcba', 'replace') -> '?\ufffd' > rehandle_surrogateescape('?\udcba', 'backslashreplace') -> '?\\xba' > It looks like the first 3 are the same as rehandle_surrogatepass, so couldn't they be merged somehow? handle_surrogates('?\udcba', 'strict') -> error handle_surrogates('?\udcba', 'ignore') -> '?' handle_surrogates('?\udcba', 'replace') -> '?\ufffd' handle_surrogates('?\udcba', 'backslashreplace') -> '?\\udcba' handle_surrogates('?\udcba', 'surrogatereplace') -> '?\\xba' > * handle_astrals(string, errors) > > Handles non-BMP characters (U+10000-U+10FFFF) with specified error > handler. E.g. > > handle_astrals('?\U00012345', 'strict') -> error > handle_astrals('?\U00012345', 'ignore') -> '?' 
> handle_astrals('?\U00012345', 'replace') -> '?\ufffd' > handle_astrals('?\U00012345', 'backslashreplace') -> '?\\U00012345' > > * decompose_astrals(string) > > Converts non-BMP characters (U+10000-U+10FFFF) to surrogate pairs. E.g. > > decompose_astrals('?\U00012345') -> '?\ud808\udf45' > > * compose_surrogate_pairs(string) > > Converts surrogate pairs to non-BMP characters. E.g. > > compose_surrogate_pairs('?\ud808\udf45') -> '?\U00012345' > Perhaps this should be called "compose_astrals". > Function names are preliminary and discussable! Location (currently the > codecs module) is discussable. Interface is discussable. > > These functions revive UnicodeTranslateError, not used currently (but > handled with several error handlers). > > Proposed patch provides Python implementation in the codecs module, but > after discussion I'll provide much more efficient (O(1) in best case) C > implementation. > From storchaka at gmail.com Mon May 4 21:12:34 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 04 May 2015 22:12:34 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <5547B7F8.8070405@mrabarnett.plus.com> References: <5547B7F8.8070405@mrabarnett.plus.com> Message-ID: On 04.05.15 21:18, MRAB wrote: > On 2015-05-04 09:15, Serhiy Storchaka wrote: >> * rehandle_surrogatepass(string, errors) >> >> Handles surrogate characters (U+D800-U+DFFF) with specified error >> handler. E.g. >> >> rehandle_surrogatepass('?\udcba', 'strict') -> error >> rehandle_surrogatepass('?\udcba', 'ignore') -> '?' >> rehandle_surrogatepass('?\udcba', 'replace') -> '?\ufffd' >> rehandle_surrogatepass('?\udcba', 'backslashreplace') -> '?\\udcba' >> >> * rehandle_surrogateescape(string, errors) >> >> Handles non-ASCII bytes encoded with surrogate characters in range >> U+DC80-U+DCFF with specified error handler. Surrogate characters outside >> of range U+DC80-U+DCFF cause error. E.g. >> >> rehandle_surrogateescape('?\udcba', 'strict') -> error >> rehandle_surrogateescape('?\udcba', 'ignore') -> '?' >> rehandle_surrogateescape('?\udcba', 'replace') -> '?\ufffd' >> rehandle_surrogateescape('?\udcba', 'backslashreplace') -> '?\\xba' >> > It looks like the first 3 are the same as rehandle_surrogatepass, so > couldn't they be merged somehow? > > handle_surrogates('?\udcba', 'strict') -> error > handle_surrogates('?\udcba', 'ignore') -> '?' > handle_surrogates('?\udcba', 'replace') -> '?\ufffd' > handle_surrogates('?\udcba', 'backslashreplace') -> '?\\udcba' > handle_surrogates('?\udcba', 'surrogatereplace') -> '?\\xba' These functions work with arbitrary error handlers, that support UnicodeTranslateError (for rehandle_surrogatepass) or UnicodeDecodeError (for rehandle_surrogateescape). They behave differently for surrogate characters outside of range U+DC80-U+DCFF. handle_surrogates() needs new error handler "surrogatereplace". >> * compose_surrogate_pairs(string) >> >> Converts surrogate pairs to non-BMP characters. E.g. >> >> compose_surrogate_pairs('?\ud808\udf45') -> '?\U00012345' >> > Perhaps this should be called "compose_astrals". May be. Or "compose_non_bmp". I have no preferences and opened this topic mainly for bikeshedding names. From stephen at xemacs.org Mon May 4 23:21:30 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 05 May 2015 06:21:30 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: Message-ID: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > In issue18814 proposed several functions to work with surrogate and > astral characters. All these functions takes a string and returns a > string. What's the use case? As far as I can see, in recent Python 3 PEP 393 is implemented, so non-BMP characters are represented as themselves, not as surrogate pairs. In a PEP 393-enabled Python, the only surrogates should be those due to surrogateescape error handling on input, and chr(). If you don't like the former, be careful about your use of surrogateescape, and the latter is clearly a "consenting adults" issue. Also, you mention that such surrogate characters can be received as input, which is true, but the standard codecs should already be treating those as errors. So as far as I can see, the existing codecs and error handlers already can deal with any case I might run into in practice. From storchaka at gmail.com Mon May 4 23:57:56 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 05 May 2015 00:57:56 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 05.05.15 00:21, Stephen J. Turnbull wrote: > Serhiy Storchaka writes: > > In issue18814 proposed several functions to work with surrogate and > > astral characters. All these functions takes a string and returns a > > string. > > What's the use case? As far as I can see, in recent Python 3 PEP 393 > is implemented, so non-BMP characters are represented as themselves, > not as surrogate pairs. In a PEP 393-enabled Python, the only > surrogates should be those due to surrogateescape error handling on > input, and chr(). If you don't like the former, be careful about your > use of surrogateescape, and the latter is clearly a "consenting > adults" issue. Use cases include programs that use tkinter (common build of Tcl/Tk don't accept non-BMP characters), email or wsgiref. > Also, you mention that such surrogate characters can be received as > input, which is true, but the standard codecs should already be > treating those as errors. Usually surrogate characters came from decoding with "surrogatepass" or "surrogateescape" error handlers. That is why Nick proposed names rehandle_surrogatepass and rehandle_surrogateescape. > So as far as I can see, the existing codecs and error handlers already > can deal with any case I might run into in practice. See issue18814. It is not so easy to get desirable result. Perhaps the simplest and most efficient way is to use regular expressions, and it is used in Python implementations, but C implementation can be much more efficient. From techtonik at gmail.com Sat May 2 09:48:42 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 2 May 2015 09:48:42 +0200 Subject: [Python-ideas] Support 1.x notation in version specifiers Message-ID: pip team said they won't support setting limit for major version of package being installed in the way below until it is supported by PEP 440. pip install patch==1.x The current way ==1.* conflicts with system shell expansion and the other way is not known / not intuitive. https://github.com/pypa/pip/issues/2737#issuecomment-97621684 -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From phd at phdru.name Tue May 5 10:21:57 2015 From: phd at phdru.name (Oleg Broytman) Date: Tue, 5 May 2015 10:21:57 +0200 Subject: [Python-ideas] Support 1.x notation in version specifiers In-Reply-To: References: Message-ID: <20150505082157.GA15195@phdru.name> Hi! On Sat, May 02, 2015 at 09:48:42AM +0200, anatoly techtonik wrote: > pip team said they won't support setting limit for major version > of package being installed in the way below until it is supported > by PEP 440. > > pip install patch==1.x This syntax (1.x) is even less intuitive for me. > The current way ==1.* conflicts with system shell expansion Other comparison operators (< and >) conflict with shell redirection. And nobody cares because one can always quote shell metacharacters. pip install patch==1.\* pip install patch=='1.*' pip install 'patch==1.*' pip install 'patch>=1,<2' > and the other way is not known / not intuitive. > > https://github.com/pypa/pip/issues/2737#issuecomment-97621684 > -- > anatoly t. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From stephen at xemacs.org Tue May 5 10:23:52 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 05 May 2015 17:23:52 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > Use cases include programs that use tkinter (common build of Tcl/Tk > don't accept non-BMP characters), email or wsgiref. So, consider Tcl/Tk. If you use it for input, no problem, it *can't* produce non-BMP characters. So you're using it for output. If knowing that your design involves tkinter, you deduce you must not accept non-BMP characters on input, where's your problem? And ... you looked twice at your proposal? You have basically reproduced the codec error handling API for .decode and .encode in a bunch to str2str "rehandle" functions. In other words, you need to know as much to use "rehandle_*" properly as you do to use .decode and .encode. I do not see a win for the programmer who is mostly innocent of encoding knowledge. What you're going to see is what Ezio points out in issue18814: With Python 2 I've seen lot of people blindingly trying .decode when .encode failed (and the other way around) whenever they were getting an UnicodeError[...]. I'm afraid that confused developers will try to (mis)use redecode as a workaround to attempt to fix something that shouldn't be broken in the first place, without actually understanding what the real problem is. If we apply these rehandle_* thumbs to the holes in the I18N dike, it's just going to spring more leaks elsewhere. > See issue18814. It is not so easy to get desirable result. That's because it is damn hard to get desirable results, end of story, nothing to see here, move along, people, move along! The only way available to consistently get desirable results is a Swiftian "Modest Proposal": euthanize all those miserable folks using non-UTF-8 encodings, and start the world over again. Seriously, I see nothing in issue18814 except frustration. There's no plausible account of how these new functions are going to enable naive programmers to get better results, just complaints that the current situation is unbearable. 
I can't speak to wsgiref, but in email I think David is overly worried about efficiency: in most mail flows, the occasional need to mess with surrogates is going to be far overshadowed by spam/virus filtering and authentication (DKIM signature verification and DMARC/DKIM/SPF DNS lookups) on pretty much all real mailflows. So this proposal merely amounts to reintroduction of the Python 2 str confusion into Python 3. It is dangerous *precisely because* the current situation is so frustrating. These functions will not be used by "consenting adults", in most cases. Those with sufficient knowledge for "informed consent" also know enough to decode encoded text ASAP, and encode internal text ALAP, with appropriate handlers, in the first place. Rather, these str2str functions will be used by programmers at the ends of their ropes desperate to suppress "those damned Unicode errors" by any means available. In fact, they are most likely to be used and recommended by *library* writers, because they're the ones who are least like to have control over input, or to know their clients' requirements for output. "Just use rehandle_* to ameliorate the errors" is going to be far too tempting for them to resist. That Nick, of all people, supports this proposal is to me just confirmation that it's frustration, and only frustration, speaking here. He used to be one of the strongest supporters of keeping "native text" (Unicode) and "encoded text" separate by keeping the latter in bytes. From rosuav at gmail.com Tue May 5 11:17:33 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 5 May 2015 19:17:33 +1000 Subject: [Python-ideas] Support 1.x notation in version specifiers In-Reply-To: <20150505082157.GA15195@phdru.name> References: <20150505082157.GA15195@phdru.name> Message-ID: On Tue, May 5, 2015 at 6:21 PM, Oleg Broytman wrote: >> The current way ==1.* conflicts with system shell expansion > > Other comparison operators (< and >) conflict with shell redirection. > And nobody cares because one can always quote shell metacharacters. > > pip install patch==1.\* > pip install patch=='1.*' > pip install 'patch==1.*' > pip install 'patch>=1,<2' Plus, you can stick anything you like into a requirements.txt and simply 'pip install -r requirements.txt'. That's a safe option - not least since it lets you manage your dependencies in source control. ChrisA From abarnert at yahoo.com Tue May 5 11:56:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 May 2015 02:56:34 -0700 Subject: [Python-ideas] Support 1.x notation in version specifiers In-Reply-To: References: Message-ID: <0B613961-E396-4BDF-8ED8-55D1E70B5426@yahoo.com> On May 2, 2015, at 00:48, anatoly techtonik wrote: > > pip team said they won't support setting limit for major version > of package being installed in the way below until it is supported > by PEP 440. I think that's misrepresenting them. They explained why it isn't needed, and threw in an "anyway, it's not up to us"; they didn't say "sounds like a good idea, but you have to fix the PEP first". Also, if you can't use pip 6.0 or later to take advantage of the already-working syntax that they recommended you use, how would you be able to use your new syntax even if it did get added? > pip install patch==1.x > > The current way ==1.* conflicts with system shell expansion > and the other way is not known / not intuitive. > > https://github.com/pypa/pip/issues/2737#issuecomment-97621684 > -- > anatoly t. 
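(For what it's worth, and assuming I am reading PEP 440 correctly, the
compatible-release operator already expresses this without any shell-glob
characters:

    pip install 'patch~=1.0'    # equivalent to >= 1.0, == 1.*

or the specifier can simply live in a requirements.txt.)
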
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue May 5 12:00:53 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 May 2015 03:00:53 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com> On May 5, 2015, at 01:23, Stephen J. Turnbull wrote: > > Serhiy Storchaka writes: > >> Use cases include programs that use tkinter (common build of Tcl/Tk >> don't accept non-BMP characters), email or wsgiref. > > So, consider Tcl/Tk. If you use it for input, no problem, it *can't* > produce non-BMP characters. So you're using it for output. If > knowing that your design involves tkinter, you deduce you must not > accept non-BMP characters on input, where's your problem? The real issue with tkinter (and similar cases that can't handle BMP) is that they're actually UCS-2, and we paper over that by pretending the interface is Unicode. Maybe it would be better to wrap the low-level interfaces in `bytes` rather than `str` and put an explicit `.encode('UCS-2')` in the higher-level interfaces (or even in user code?) to make the problem obvious and debuggable rather than just pretending the problem doesn't exist? (I'm not sure if we actually have a UCS-2 codec, but if not, it's trivial to write--it's just UTF-16 without surrogates.) > And ... you looked twice at your proposal? You have basically > reproduced the codec error handling API for .decode and .encode in a > bunch to str2str "rehandle" functions. In other words, you need to > know as much to use "rehandle_*" properly as you do to use .decode and > .encode. I do not see a win for the programmer who is mostly innocent > of encoding knowledge. What you're going to see is what Ezio points > out in issue18814: > > With Python 2 I've seen lot of people blindingly trying .decode > when .encode failed (and the other way around) whenever they were > getting an UnicodeError[...]. > > I'm afraid that confused developers will try to (mis)use redecode > as a workaround to attempt to fix something that shouldn't be > broken in the first place, without actually understanding what the > real problem is. > > If we apply these rehandle_* thumbs to the holes in the I18N dike, > it's just going to spring more leaks elsewhere. > >> See issue18814. It is not so easy to get desirable result. > > That's because it is damn hard to get desirable results, end of story, > nothing to see here, move along, people, move along! The only way > available to consistently get desirable results is a Swiftian "Modest > Proposal": euthanize all those miserable folks using non-UTF-8 > encodings, and start the world over again. > > Seriously, I see nothing in issue18814 except frustration. There's no > plausible account of how these new functions are going to enable naive > programmers to get better results, just complaints that the current > situation is unbearable. 
I can't speak to wsgiref, but in email I > think David is overly worried about efficiency: in most mail flows, > the occasional need to mess with surrogates is going to be far > overshadowed by spam/virus filtering and authentication (DKIM > signature verification and DMARC/DKIM/SPF DNS lookups) on pretty much > all real mailflows. > > So this proposal merely amounts to reintroduction of the Python 2 str > confusion into Python 3. It is dangerous *precisely because* the > current situation is so frustrating. These functions will not be used > by "consenting adults", in most cases. Those with sufficient > knowledge for "informed consent" also know enough to decode encoded > text ASAP, and encode internal text ALAP, with appropriate handlers, > in the first place. > > Rather, these str2str functions will be used by programmers at the > ends of their ropes desperate to suppress "those damned Unicode > errors" by any means available. In fact, they are most likely to be > used and recommended by *library* writers, because they're the ones > who are least like to have control over input, or to know their > clients' requirements for output. "Just use rehandle_* to ameliorate > the errors" is going to be far too tempting for them to resist. > > That Nick, of all people, supports this proposal is to me just > confirmation that it's frustration, and only frustration, speaking > here. He used to be one of the strongest supporters of keeping > "native text" (Unicode) and "encoded text" separate by keeping the > latter in bytes. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From stephen at xemacs.org Tue May 5 12:46:41 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 05 May 2015 19:46:41 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com> Message-ID: <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > (I'm not sure if we actually have a UCS-2 codec, but if not, it's > trivial to write--it's just UTF-16 without surrogates.) The PEP 393 machinery knows when astral characters are introduced because it has to widen the representation. That might be a more convenient place to raise an exception on non-BMP characters. From koos.zevenhoven at aalto.fi Tue May 5 15:55:56 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Tue, 5 May 2015 16:55:56 +0300 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) Message-ID: <5548CBEC.3000303@aalto.fi> Hi all! I am excited about seeing what's going on with asyncio and PEP492 etc. I really like that Python is becoming more suitable for the increasing amount of async code and that the distinction between async functions and generators is increasing. In addition, however, I would also like to see the async functions and methods come even closer to regular functions and methods. This is something that is keeping me from using asyncio at the moment even if I would like to. Below I'll try to explain what and why, and a little bit of how. 
If it is not clear, please ask :) Motivation: One of the best things about asyncio and coroutines/async functions is that you can write asynchronous code as if it were synchronous, the difference in many places being just the use of "await" ("yield from") when calling something that may end up doing IO (somewhere down the function call chain) and that the code is run from an event loop. When writing a package that does IO, you have the option to make it either synchronous or asynchronous. Regardless of the choice, the code will look roughly the same. But what if you want to be able to do both? Should you maintain two versions, one with "async" and "await" everywhere and one without? Besides the keywords "async" and "await", async code of course differs from synchronous code by the functions/coroutines that are used for IO at the end of the function call chain. Here, I mean the end (close to) where the "yield" expressions are hidden in the async versions. At the other end of the calling chain, async code needs the event loop and associated framework (almost always asyncio?) which hides all the async scheduling fanciness etc. I'm not sure about the terminology, but I will use "L end" and "Y end" to refer to the two ends here. (L for event Loop; Y for Yield) The Y and L ends need to be compatible with each other for the code to work. While asyncio and the standard library might provide both ends in many cases, there can also be situations where a package would want to work with different combinations of L and Y end, or completely without an event loop, i.e. synchronously. In a very simple example, one might want to wrap different implementations of sleep() in a function that would pick the right one depending on the context. Perhaps something like this: async def any_sleep(seconds): if __async__.framework is None: time.sleep(1) elif __async__.framework is asyncio: await asyncio.sleep(1) else: raise RuntimeError("Was called with an unsupported async framework.") [You could of course replace sleep() with socket IO or whatever, but sleep is nice and simple. Also, a larger library would probably have a whole chain of async functions and methods before calling something like this] But if await is only allowed inside "async def", then how can any_sleep() be conveniently run in non-async code? Also, there is nothing like __async__.framework. Below, I describe what I think a potential solution might look like. Potential solution: This is simplified version; for instance, as "awaitables", I consider only async function objects here. I describe the idea in three parts: (1) next(...): Add a keyword argument "async_framework" (or whatever) to next(...) with a default value of None. When an async framework, typically asyncio, starts an async function object (coroutine) with a call to next(...), it would do something like next(coro, async_framework = asyncio). Here, asyncio could of course be replaced with any object that identifies the framework. This information would then be somehow attached to the async function object. (2) __async__.framework or something similar: Add something like __async__ that has an attribute such as .framework that allows the code inside the async function to access the information passed to next(...) by the framework (L end) using the keyword argument of next [see (1)]. (3) Generalized "await": [When the world is ready:] Allow using "await" anywhere, not just within async functions. 
Inside async functions, the behavior of "await" would be the same as in PEP492, with the addition that it would somehow propagate the __async__.framework value to the awaited coroutine. Outside async functions, "await" would do roughly the same as this function: def await(async_func_obj): try: next(async_func_obj) # same as next(async_func_obj, async_framework = None) except StopIteration as si: return si.value raise RuntimeError("The function does not support synchronous execution") (This function would, of course, work in Python 3.4, but it would be mostly useless because the async functions would not know that they are being called in a 'synchronous program'. IIUC, this *function* would be valid even with PEP492, but having this as a function would be ugly in the long run.) Some random thoughts: With this addition to Python, one could write libraries that work both async and non-async. When await is not inside async def, one would expect it to potentially do blocking IO, just like an await inside async def would suggest that there is a yield/suspend somewhere in there. For testing, I tried to see if there is a reasonable way to make a hack with __async__.framework that could be set by next(), but did not find an obvious way. For instance, coro.gi_frame.f_locals is read-only, I believe. An alternative to this approach could be that await would implicitly start a temporary event loop for running the coroutine, but how would it know which event loop? This might also have a huge performance overhead. Relation to PEP492: This of course still needs more thinking, but I wanted to post it here now in case there is desire to prepare for something like this already in PEP492. It is not completely clear if/how this would need to affect PEP492, but some things come to mind. For example, this could potentially remove the need for __aenter__, __aiter__, etc. or even "async for" and "async with". If __aenter__ is defined as "async def", then a with statement would do an "await" on it, and the context manager would have __async__.framework (or whatever it would be called) available, for determining what behavior is appropriate. Was this clear enough to understand which problem(s) this would be solving and how? I'd be happy to hear about any thoughts on this :). Best regards, Koos From gmludo at gmail.com Tue May 5 16:57:36 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Tue, 5 May 2015 16:57:36 +0200 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: <5548CBEC.3000303@aalto.fi> References: <5548CBEC.3000303@aalto.fi> Message-ID: Hi Koos, 2015-05-05 15:55 GMT+02:00 Koos Zevenhoven : > With this addition to Python, one could write libraries that work both > async and non-async. When await is not inside async def, one would expect > it to potentially do blocking IO, just like an await inside async def would > suggest that there is a yield/suspend somewhere in there. To be honest with you, I'd this type of ideas back in my mind, but for now, I've no suggestion to avoid end-developer nor low-developer nightmares. For example, we may detect if it's async or not if you have: result = await response.payload() or result = response.payload() The issue I see with that and certainly already explained during PEP492 discussions, is that it will be difficult for the developer to spot where he is forgotten await keyword, because he won't have errors. 
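For example, with a hypothetical payload() coroutine:

    async def payload():
        return b'<html>...</html>'

    async def handler():
        data = payload()        # 'await' forgotten: data is a coroutine
                                # object, nothing runs, no error is raised here
        data = await payload()  # correct: data is the bytes result

The broken form only fails much later (a "was never awaited" warning at
garbage-collection time, or an AttributeError somewhere downstream), which
is hard to trace back to the missing keyword.
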
Moreover, in the use cases where async is less efficient that sync, it should be interesting to be possible, maybe with a context manager to define a block of code where all await are in fact sync (without to use event loop). But, even if a talentuous low-developper find a solution to implement this idea, because I'm not sure it's technically possible, in fact it will more easier even for end-developers to use the sync library version of this need. FYI, I've made an yocto library for my company where I need to be sync for some use cases and async for some other use cases. For the sync and async public API where the business logic behind most functions are identical, I've followed the same pattern as in Python-LDAP: http://www.python-ldap.org/doc/html/ldap.html#sending-ldap-requests I've postfixed all sync functions by "_s". For a more complex library, it may possible to have two differents classes with explicit names. At least to me, it's enough to work efficiently, explicit is better than implicit ;-) -- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Tue May 5 17:49:47 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Tue, 5 May 2015 18:49:47 +0300 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: References: <5548CBEC.3000303@aalto.fi> Message-ID: <5548E69B.3000902@aalto.fi> On 2015-05-05 17:57, Ludovic Gasc wrote: > > For example, we may detect if it's async or not if you have: result = > await response.payload() or result = response.payload() > The issue I see with that and certainly already explained > during PEP492 discussions, is that it will be difficult for the > developer to spot where he is forgotten await keyword, because he > won't have errors. > Thank you for your email! I've been following quite a bit of the PEP492 discussions, but not sure if I have missed something. If there is something about await outside async def that goes further than "It is a SyntaxError to use await outside of an async def function (like it is a SyntaxError to use yield outside of def function.)", which is directly from the PEP, I've missed that. A link or pointer would be helpful. In any case, I think I understand the problem you are referring to, but is that any different from forgetting a postfix "_s" in the approach you mention below? > Moreover, in the use cases where async is less efficient that sync, it > should be interesting to be possible, maybe with a context manager to > define a block of code where all await are in fact sync (without to > use event loop). But, even if a talentuous low-developper find a > solution to implement this idea, because I'm not sure it's technically > possible, in fact it will more easier even for end-developers to use > the sync library version of this need. Surely that is possible, although may of course be hard to implement :). I think this is related to this earlier suggestion by Joshua Bartlett (which I do like): https://mail.python.org/pipermail/python-ideas/2013-January/018519.html However, I don't think it solves *this* problem. It would just become a more verbose version of what I suggested. > > FYI, I've made an yocto library for my company where I need to be sync > for some use cases and async for some other use cases. 
> For the sync and async public API where the business logic behind most > functions are identical, I've followed the same pattern as in > Python-LDAP: > http://www.python-ldap.org/doc/html/ldap.html#sending-ldap-requests > I've postfixed all sync functions by "_s". > > For a more complex library, it may possible to have two differents > classes with explicit names. > > At least to me, it's enough to work efficiently, explicit is better > than implicit ;-) > In my mind, this is not at all about explicit vs. implicit. It is mostly about letting the coroutines know what kind of context they are being run from. Anyway, I'm pretty sure there are plenty of people in the Python community who don't think efficiency is enough, but that is a matter of personal preference. I want everything, and that's why I'm using Python ;). -- Koos -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue May 5 19:28:46 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 6 May 2015 03:28:46 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: Message-ID: <20150505172845.GF5663@ando.pearwood.info> On Mon, May 04, 2015 at 11:15:47AM +0300, Serhiy Storchaka wrote: > Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but > Python allows them in Unicode strings for different purposes. > > 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain > surrogate characters. This data can came from other programs, including > Python 2. Can you give a simple example of a Python 2 program that provides output that Python 3 will read as surrogates? > 2) To represent undecodable bytes in ASCII-compatible encoding with the > "surrogateescape" error handlers. > > So surrogate characters can be obtained from "surrogateescape" or > "surrogatepass" error handlers or created manually with chr() or %c. > > Some encodings (UTF-7, unicode-escape) also allows surrogate characters. Also UTF-16, and possible others. I'm not entirely sure, but I think that this is a mistake, if not a bug. I think that *no* UTF encoding should allow lone surrogates to escape through encoding. But I not entirely sure, so I won't argue that now -- besides, it's irrelevant to the proposal. > But on output the surrogate characters can cause fail. What do you mean by "on output"? Do you mean when printing? > In issue18814 proposed several functions to work with surrogate and > astral characters. All these functions takes a string and returns a string. I like the idea of having better surrogate and astral character handling, but I don't think I like your suggested API of using functions for this. I think this is better handled as str-to-str codecs. Unfortunately, there is still no concensus of the much-debated return of str-to-str and byte-to-byte codecs via the str.encode and byte.decode methods. At one point people were talking about adding a separate method (transform?) to handle them, but that seems to have been forgotten. Fortunately the codecs module handles them just fine: py> codecs.encode("Hello world", "rot-13") 'Uryyb jbeyq' I propose, instead of your function/method rehandle_surrogatepass(), we add a pair of str-to-str codecs: codecs.encode(mystring, 'remove_surrogates', errors='strict') codecs.encode(mystring, 'remove_astrals', errors='strict') For the first one, if the string has no surrogates, it returns the string unchanged. If it contains any surrogates, the error handler runs in the usual fashion. 
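To make the intended behaviour concrete, here is a rough pure-Python
emulation of the first one as a plain function (a hypothetical helper, not
a real codec, hard-coding a few handlers instead of hooking into the
error-handler registry):

    def remove_surrogates(s, errors='strict'):
        out = []
        for ch in s:
            if '\ud800' <= ch <= '\udfff':
                if errors == 'strict':
                    raise ValueError('surrogate not allowed: %r' % ch)
                elif errors == 'ignore':
                    pass
                elif errors == 'replace':
                    out.append('\ufffd')
                elif errors == 'backslashreplace':
                    out.append('\\u%04x' % ord(ch))
                else:
                    raise LookupError('unknown error handler: %r' % errors)
            else:
                out.append(ch)
        return ''.join(out)
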
The second is exactly the same, except it checks for astral characters. For the avoidance of doubt: * surrogates are code points in the range U+D800 to U+DFFF inclusive; * astrals are characters from the Supplementary Multilingual Planes, that is code points U+10000 and above. Advantage of using codecs: - there's no arguments about where to put it (is it a str method? a function? in the string module? some other module? where?) - we can use the usual codec machinery, rather than duplicate it; - people already understand that codecs and error handles go together; Disadvantage: - have to use codec.encode instead of str.encode. It is slightly sad that there is still no entirely obvious way to call str-to-str codecs from the encode method, but since this is a fairly advanced and unusual use-case, I don't think it is a problem that we have to use the codecs module. > * decompose_astrals(string) > * compose_surrogate_pairs(string) I'm not sure about those. I have to think about them. -- Steve From abarnert at yahoo.com Tue May 5 19:33:28 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 May 2015 10:33:28 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com> <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <72DABA4D-EA98-46CC-824B-BA3AF1785B04@yahoo.com> On May 5, 2015, at 03:46, Stephen J. Turnbull wrote: > > Andrew Barnert writes: > >> (I'm not sure if we actually have a UCS-2 codec, but if not, it's >> trivial to write--it's just UTF-16 without surrogates.) > > The PEP 393 machinery knows when astral characters are introduced > because it has to widen the representation. That might be a more > convenient place to raise an exception on non-BMP characters. > But the PEP 393 machinery doesn't know when it's dealing with strings that are ultimately destined for a UCS-2 application, any more than it can know when it's dealing with strings that have to be pure ASCII or CP1252 or any other character set. If you want to print emoji to a CP1252 console or write them to a Shift-JIS text file, you get an error from an explicit or implicit `str.encode` that you can debug. If you want to display emoji in a Tkinter GUI, it should be exactly the same. The only reason it isn't is that we pretend "narrow Unicode" is a real thing and implicitly convert to UTF-16 instead of making the code explicitly specify UCS-2 or UTF-16 as appropriate. From abarnert at yahoo.com Tue May 5 20:00:03 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 May 2015 18:00:03 +0000 (UTC) Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: <5548CBEC.3000303@aalto.fi> References: <5548CBEC.3000303@aalto.fi> Message-ID: <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com> It seems like it might be a lot easier to approach this from the other end: Is it possible to write a decorator that takes an async coroutine function, strips out all the awaits, and returns a regular sync function? If so, all you need to do is write everything as async, and then users can "from spam import sync as spam" or "from spam import async as spam" (where async just imports all the real functions, while sync imports them and calls the decorator on all of them). That also avoids the need to have all the looking up the event loop, switching between different code branches, etc. 
inside every function at runtime. (Not that it matters for the performance of sleep(1), but it might matter for the performance of other functions?and, more importantly, it might make the implementation of those functions simpler and easier to debug through.) On Tuesday, May 5, 2015 7:01 AM, Koos Zevenhoven wrote: Hi all! I am excited about seeing what's going on with asyncio and PEP492 etc. I really like that Python is becoming more suitable for the increasing amount of async code and that the distinction between async functions and generators is increasing. In addition, however, I would also like to see the async functions and methods come even closer to regular functions and methods. This is something that is keeping me from using asyncio at the moment even if I would like to. Below I'll try to explain what and why, and a little bit of how. If it is not clear, please ask :) Motivation: One of the best things about asyncio and coroutines/async functions is that you can write asynchronous code as if it were synchronous, the difference in many places being just the use of "await" ("yield from") when calling something that may end up doing IO (somewhere down the function call chain) and that the code is run from an event loop. When writing a package that does IO, you have the option to make it either synchronous or asynchronous. Regardless of the choice, the code will look roughly the same. But what if you want to be able to do both? Should you maintain two versions, one with "async" and "await" everywhere and one without? Besides the keywords "async" and "await", async code of course differs from synchronous code by the functions/coroutines that are used for IO at the end of the function call chain. Here, I mean the end (close to) where the "yield" expressions are hidden in the async versions. At the other end of the calling chain, async code needs the event loop and associated framework (almost always asyncio?) which hides all the async scheduling fanciness etc. I'm not sure about the terminology, but I will use "L end" and "Y end" to refer to the two ends here. (L for event Loop; Y for Yield) The Y and L ends need to be compatible with each other for the code to work. While asyncio and the standard library might provide both ends in many cases, there can also be situations where a package would want to work with different combinations of L and Y end, or completely without an event loop, i.e. synchronously. In a very simple example, one might want to wrap different implementations of sleep() in a function that would pick the right one depending on the context. Perhaps something like this: ? async def any_sleep(seconds): ? ? ? if __async__.framework is None: ? ? ? ? ? time.sleep(1) ? ? ? elif __async__.framework is asyncio: ? ? ? ? ? await asyncio.sleep(1) ? ? ? else: ? ? ? ? ? raise RuntimeError("Was called with an unsupported async framework.") [You could of course replace sleep() with socket IO or whatever, but sleep is nice and simple. Also, a larger library would probably have a whole chain of async functions and methods before calling something like this] But if await is only allowed inside "async def", then how can any_sleep() be conveniently run in non-async code? Also, there is nothing like __async__.framework. Below, I describe what I think a potential solution might look like. Potential solution: This is simplified version; for instance, as "awaitables", I consider only async function objects here. 
I describe the idea in three parts: (1) next(...): Add a keyword argument "async_framework" (or whatever) to next(...) with a default value of None. When an async framework, typically asyncio, starts an async function object (coroutine) with a call to next(...), it would do something like next(coro, async_framework = asyncio). Here, asyncio could of course be replaced with any object that identifies the framework. This information would then be somehow attached to the async function object. (2) __async__.framework or something similar: Add something like __async__ that has an attribute such as .framework that allows the code inside the async function to access the information passed to next(...) by the framework (L end) using the keyword argument of next [see (1)]. (3) Generalized "await": [When the world is ready:] Allow using "await" anywhere, not just within async functions. Inside async functions, the behavior of "await" would be the same as in PEP492, with the addition that it would somehow propagate the __async__.framework value to the awaited coroutine. Outside async functions, "await" would do roughly the same as this function: ? def await(async_func_obj): ? ? ? try: ? ? ? ? ? next(async_func_obj)? # same as next(async_func_obj, async_framework = None) ? ? ? except StopIteration as si: ? ? ? ? ? return si.value ? ? ? raise RuntimeError("The function does not support synchronous execution") (This function would, of course, work in Python 3.4, but it would be mostly useless because the async functions would not know that they are being called in a 'synchronous program'. IIUC, this *function* would be valid even with PEP492, but having this as a function would be ugly in the long run.) Some random thoughts: With this addition to Python, one could write libraries that work both async and non-async. When await is not inside async def, one would expect it to potentially do blocking IO, just like an await inside async def would suggest that there is a yield/suspend somewhere in there. For testing, I tried to see if there is a reasonable way to make a hack with __async__.framework that could be set by next(), but did not find an obvious way. For instance, coro.gi_frame.f_locals is read-only, I believe. An alternative to this approach could be that await would implicitly start a temporary event loop for running the coroutine, but how would it know which event loop? This might also have a huge performance overhead. Relation to PEP492: This of course still needs more thinking, but I wanted to post it here now in case there is desire to prepare for something like this already in PEP492. It is not completely clear if/how this would need to affect PEP492, but some things come to mind. For example, this could potentially remove the need for __aenter__, __aiter__, etc. or even "async for" and "async with". If __aenter__ is defined as "async def", then a with statement would do an "await" on it, and the context manager would have __async__.framework (or whatever it would be called) available, for determining what behavior is appropriate. Was this clear enough to understand which problem(s) this would be solving and how? I'd be happy to hear about any thoughts on this :). Best regards, Koos _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Tue May 5 20:48:45 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 5 May 2015 11:48:45 -0700 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com> References: <5548CBEC.3000303@aalto.fi> <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com> Message-ID: Quick notes: - I don't think it's really possible to write realistic async code independently from an async framework. - For synchronous code that wants to use some async code, the pattern is simple: asyncio.get_event_loop().run_until_complete(some_async_call(args, etc)) - We can probably wrap this in a convenience helper function so you can just write: asyncio.sync_wait(some_async_call(args, etc)) - Note that this will fail (and rightly so!) if called when the event loop is already running. On Tue, May 5, 2015 at 11:00 AM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > It seems like it might be a lot easier to approach this from the other > end: Is it possible to write a decorator that takes an async coroutine > function, strips out all the awaits, and returns a regular sync function? > If so, all you need to do is write everything as async, and then users can > "from spam import sync as spam" or "from spam import async as spam" (where > async just imports all the real functions, while sync imports them and > calls the decorator on all of them). > > That also avoids the need to have all the looking up the event loop, > switching between different code branches, etc. inside every function at > runtime. (Not that it matters for the performance of sleep(1), but it might > matter for the performance of other functions?and, more importantly, it > might make the implementation of those functions simpler and easier to > debug through.) > > > > On Tuesday, May 5, 2015 7:01 AM, Koos Zevenhoven < > koos.zevenhoven at aalto.fi> wrote: > > > > Hi all! > > I am excited about seeing what's going on with asyncio and PEP492 etc. I > really like that Python is becoming more suitable for the increasing > amount of async code and that the distinction between async functions > and generators is increasing. > > In addition, however, I would also like to see the async functions and > methods come even closer to regular functions and methods. This is > something that is keeping me from using asyncio at the moment even if I > would like to. Below I'll try to explain what and why, and a little bit > of how. If it is not clear, please ask :) > > Motivation: > > One of the best things about asyncio and coroutines/async functions is > that you can write asynchronous code as if it were synchronous, the > difference in many places being just the use of "await" ("yield from") > when calling something that may end up doing IO (somewhere down the > function call chain) and that the code is run from an event loop. > > When writing a package that does IO, you have the option to make it > either synchronous or asynchronous. Regardless of the choice, the code > will look roughly the same. But what if you want to be able to do both? > Should you maintain two versions, one with "async" and "await" > everywhere and one without? > > Besides the keywords "async" and "await", async code of course differs > from synchronous code by the functions/coroutines that are used for IO > at the end of the function call chain. Here, I mean the end (close to) > where the "yield" expressions are hidden in the async versions. 
At the > other end of the calling chain, async code needs the event loop and > associated framework (almost always asyncio?) which hides all the async > scheduling fanciness etc. I'm not sure about the terminology, but I will > use "L end" and "Y end" to refer to the two ends here. (L for event > Loop; Y for Yield) > > The Y and L ends need to be compatible with each other for the code to > work. While asyncio and the standard library might provide both ends in > many cases, there can also be situations where a package would want to > work with different combinations of L and Y end, or completely without > an event loop, i.e. synchronously. > > In a very simple example, one might want to wrap different > implementations of sleep() in a function that would pick the right one > depending on the context. Perhaps something like this: > > async def any_sleep(seconds): > if __async__.framework is None: > time.sleep(1) > elif __async__.framework is asyncio: > await asyncio.sleep(1) > else: > raise RuntimeError("Was called with an unsupported async > framework.") > > [You could of course replace sleep() with socket IO or whatever, but > sleep is nice and simple. Also, a larger library would probably have a > whole chain of async functions and methods before calling something like > this] > > But if await is only allowed inside "async def", then how can > any_sleep() be conveniently run in non-async code? Also, there is > nothing like __async__.framework. Below, I describe what I think a > potential solution might look like. > > > > Potential solution: > > This is simplified version; for instance, as "awaitables", I consider > only async function objects here. I describe the idea in three parts: > > (1) next(...): > > Add a keyword argument "async_framework" (or whatever) to next(...) with > a default value of None. When an async framework, typically asyncio, > starts an async function object (coroutine) with a call to next(...), it > would do something like next(coro, async_framework = asyncio). Here, > asyncio could of course be replaced with any object that identifies the > framework. This information would then be somehow attached to the async > function object. > > > (2) __async__.framework or something similar: > > Add something like __async__ that has an attribute such as .framework > that allows the code inside the async function to access the information > passed to next(...) by the framework (L end) using the keyword argument > of next [see (1)]. > > (3) Generalized "await": > > [When the world is ready:] Allow using "await" anywhere, not just within > async functions. Inside async functions, the behavior of "await" would > be the same as in PEP492, with the addition that it would somehow > propagate the __async__.framework value to the awaited coroutine. > Outside async functions, "await" would do roughly the same as this > function: > > def await(async_func_obj): > try: > next(async_func_obj) # same as next(async_func_obj, > async_framework = None) > except StopIteration as si: > return si.value > raise RuntimeError("The function does not support synchronous > execution") > > (This function would, of course, work in Python 3.4, but it would be > mostly useless because the async functions would not know that they are > being called in a 'synchronous program'. IIUC, this *function* would be > valid even with PEP492, but having this as a function would be ugly in > the long run.) 
> > > Some random thoughts: > > With this addition to Python, one could write libraries that work both > async and non-async. When await is not inside async def, one would > expect it to potentially do blocking IO, just like an await inside async > def would suggest that there is a yield/suspend somewhere in there. > > For testing, I tried to see if there is a reasonable way to make a hack > with __async__.framework that could be set by next(), but did not find > an obvious way. For instance, coro.gi_frame.f_locals is read-only, I > believe. > > An alternative to this approach could be that await would implicitly > start a temporary event loop for running the coroutine, but how would it > know which event loop? This might also have a huge performance overhead. > > Relation to PEP492: > > This of course still needs more thinking, but I wanted to post it here > now in case there is desire to prepare for something like this already > in PEP492. It is not completely clear if/how this would need to affect > PEP492, but some things come to mind. For example, this could > potentially remove the need for __aenter__, __aiter__, etc. or even > "async for" and "async with". If __aenter__ is defined as "async def", > then a with statement would do an "await" on it, and the context manager > would have __async__.framework (or whatever it would be called) > available, for determining what behavior is appropriate. > > Was this clear enough to understand which problem(s) this would be > solving and how? I'd be happy to hear about any thoughts on this :). > > > Best regards, > Koos > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue May 5 21:21:37 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 May 2015 05:21:37 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5 May 2015 at 18:23, Stephen J. Turnbull wrote: > So this proposal merely amounts to reintroduction of the Python 2 str > confusion into Python 3. It is dangerous *precisely because* the > current situation is so frustrating. These functions will not be used > by "consenting adults", in most cases. Those with sufficient > knowledge for "informed consent" also know enough to decode encoded > text ASAP, and encode internal text ALAP, with appropriate handlers, > in the first place. > > Rather, these str2str functions will be used by programmers at the > ends of their ropes desperate to suppress "those damned Unicode > errors" by any means available. In fact, they are most likely to be > used and recommended by *library* writers, because they're the ones > who are least like to have control over input, or to know their > clients' requirements for output. "Just use rehandle_* to ameliorate > the errors" is going to be far too tempting for them to resist. 
The primary intended audience is Linux distribution developers using Python 3 as the system Python. I agree misuse in other contexts is a risk, but consider assisting the migration of the Linux ecosystem from Python 2 to Python 3 sufficiently important that it's worth our while taking that risk. > That Nick, of all people, supports this proposal is to me just > confirmation that it's frustration, and only frustration, speaking > here. He used to be one of the strongest supporters of keeping > "native text" (Unicode) and "encoded text" separate by keeping the > latter in bytes. It's not frustration (at least, I don't think it is), it's a proposal for advanced tooling to deal properly with legacy *nix systems that either: a. use a locale encoding other than UTF-8; or b. don't reliably set the locale encoding for system services and cron jobs (which anecdotally appears to amount to "aren't using systemd" in the current crop of *nix init systems) If a developer only cares about Windows, Mac OS X, or modern systemd based *nix systems that use UTF-8 as the system locale, and they never set "LANG=C" before running a Python program, then these new functions will be completely irrelevant to them. (I've also submitted a request to the glibc team to make C.UTF-8 universally available, reducing the need to use "LANG=C", and they're amenable to the idea, but it requires someone to work on preparing and submitting a patch: https://sourceware.org/bugzilla/show_bug.cgi?id=17318) If, however, a developer wants to handle "LANG=C", or other non-UTF-8 locales reliably across the full spectrum of *nix systems in Python 3, they need a way to cope with system data that they *know* has been decoded incorrectly by the interpreter, as we'll potentially do exactly that for environment variables, command line arguments, stdin/stdout/stderr and more if we get bad locale encoding settings from the OS (such as when "LANG=C" is specified, or the init system simply doesn't set a locale at all and hence CPython falls back to the POSIX default of ASCII). Python 2 lets users sweep a lot of that under the rug, as the data at least round trips within the system, but you get unexpected mojibake in some cases (especially when taking local data and pushing it out over the network). Since these boundary decoding issues don't arise on properly configured modern *nix systems, we've been able to take advantage of that by moving Python 3 towards a more pragmatic and distro-friendly approach in coping with legacy *nix platforms and behaviours, primarily by starting to use "surrogateescape" by default on a few more system interfaces (e.g. on the standard streams when the OS *claims* that the locale encoding is ASCII, which we now assume to indicate a configuration error, which we can at least work around for roundtripping purposes so that "os.listdir()" works reliably at the interactive prompt). 
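To make that round-tripping concrete: "surrogateescape" smuggles each undecodable byte through as a lone surrogate in the U+DC80-U+DCFF range, so the original bytes can always be recovered exactly (the file name below is invented for the example):

    raw = b'caf\xe9.txt'                     # latin-1 bytes; invalid as ASCII/UTF-8
    name = raw.decode('ascii', 'surrogateescape')
    print(ascii(name))                       # 'caf\udce9.txt' - the 0xE9 byte became U+DCE9
    assert name.encode('ascii', 'surrogateescape') == raw   # exact round trip
    # os.fsdecode()/os.fsencode() apply the same trick with the filesystem
    # encoding, which is what lets os.listdir() round-trip odd file names.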
This change in approach (heavily influenced by the parallel "Python 3 as the default system Python" efforts in Ubuntu and Fedora) *has* moved us back towards an increased risk of introducing mojibake in legacy environments, but the nature of that trade-off has changed markedly from the situation back in 2009 (let alone 2006): * most popular modern Linux systems use systemd with the UTF-8 locale, which "just works" from a boundary encoding/decoding perspective (it's closely akin to the situation we've had on Mac OS X from the dawn of Python 3) * even without systemd, most modern *nix systems at least default to the UTF-8 locale, which works reliably for user processes in the absence of an explicit setting like "LANG=C", even if service daemons and cron jobs can be a bit sketchier in terms of the locale settings they receive * for legacy environments migrating from Python 2 without upgrading the underlying OS, our emphasis has shifted to tolerating "bug compatibility" at the Python level in order to ease migration, as the most appropriate long term solution for those environments is now to upgrade their OS such that it more reliably provides correct locale encoding settings to the Python 3 interpreter (which wasn't a generally available option back when Python 3 first launched) Armin Ronacher (as ever) provides a good explanation of the system interface problems that can arise in Python 3 with bad locale encoding settings here: http://click.pocoo.org/4/python3/#python3-surrogates In my view, the critical helper function for this purpose is actually "handle_surrogateescape", as that's the one that lets us readily adapt from the incorrectly specified ASCII locale encoding to any other ASCII-compatible system encoding once we've bootstrapped into a full Python environment which has more options for figuring out a suitable encoding than just looking at the locale setting provided by the C runtime. It's also the function that serves to provide the primary "hook" where we can hang documentation of this platform specific boundary encoding/decoding issue. The other suggested functions are then more about providing a "peek behind the curtain" API for folks that want to *use Python* to explore some of the ins and outs of Unicode surrogate handling. Surrogates and astrals really aren't that complicated, but we've historically hidden them away as "dark magic not to be understood by mere mortals". In reality, they're just different ways of composing sequences of integers to represent text, and the suggested APIs are designed to expose that in a way we haven't done in the past. I can't actually think of a practical purpose for them other than teaching people the basics of how Unicode representations work, but demystifying that seems sufficiently worthwhile to me that I'm not opposed to their inclusion (bear in mind I'm also the current "dis" module maintainer, and a contributor to the "inspect", so I'm a big fan of exposing underlying concepts like this in a way that lets people play with them programmatically for learning purposes). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eric at trueblade.com Tue May 5 23:03:33 2015 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 05 May 2015 17:03:33 -0400 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) 
In-Reply-To: References: <5548CBEC.3000303@aalto.fi> <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com> Message-ID: <55493025.10209@trueblade.com> On 5/5/2015 2:48 PM, Guido van Rossum wrote: > Quick notes: > - I don't think it's really possible to write realistic async code > independently from an async framework. > - For synchronous code that wants to use some async code, the pattern is > simple: > asyncio.get_event_loop().run_until_complete(some_async_call(args, etc)) > - We can probably wrap this in a convenience helper function so you can > just write: > asyncio.sync_wait(some_async_call(args, etc)) > - Note that this will fail (and rightly so!) if called when the event > loop is already running. If we're going through all of the effort to elevate await and async def to syntax, then can't the interpreter also be aware if it's running an event loop? Then, if we are running an event loop, await becomes "yield from", using the event loop. But if we're not running an event loop, then await becomes a blocking wait, using some version of run_until_complete, whether really from asyncio or baked into the interpreter. This way, I can write my library code as being async, but it's still usable from non-async code (although it would need to be called with await, of course). I'll admit I haven't thought this all the way through, and I'm still reading through PEP 492. But if I can write my async code as if it were blocking using await, why can't it really be blocking, too? Eric. From guido at python.org Tue May 5 23:27:29 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 5 May 2015 14:27:29 -0700 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: <55493025.10209@trueblade.com> References: <5548CBEC.3000303@aalto.fi> <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com> <55493025.10209@trueblade.com> Message-ID: No, we can't, because the async/await are interpreted by the *compiler*, while the presence of an event loop is a condition of the *runtime*. On Tue, May 5, 2015 at 2:03 PM, Eric V. Smith wrote: > On 5/5/2015 2:48 PM, Guido van Rossum wrote: > > Quick notes: > > - I don't think it's really possible to write realistic async code > > independently from an async framework. > > - For synchronous code that wants to use some async code, the pattern is > > simple: > > asyncio.get_event_loop().run_until_complete(some_async_call(args, > etc)) > > - We can probably wrap this in a convenience helper function so you can > > just write: > > asyncio.sync_wait(some_async_call(args, etc)) > > - Note that this will fail (and rightly so!) if called when the event > > loop is already running. > > If we're going through all of the effort to elevate await and async def > to syntax, then can't the interpreter also be aware if it's running an > event loop? Then, if we are running an event loop, await becomes "yield > from", using the event loop. But if we're not running an event loop, > then await becomes a blocking wait, using some version of > run_until_complete, whether really from asyncio or baked into the > interpreter. > > This way, I can write my library code as being async, but it's still > usable from non-async code (although it would need to be called with > await, of course). > > I'll admit I haven't thought this all the way through, and I'm still > reading through PEP 492. But if I can write my async code as if it were > blocking using await, why can't it really be blocking, too? > > Eric. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Wed May 6 00:23:19 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Wed, 6 May 2015 01:23:19 +0300 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: References: <5548CBEC.3000303@aalto.fi> <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com> Message-ID: <554942D7.6080107@aalto.fi> Hi Guido and Andrew, Thank you for your prompt responses! On 5.5.2015 21:48, Guido van Rossum wrote: > Quick notes: > - I don't think it's really possible to write realistic async code > independently from an async framework. And since there is asyncio in the standard library, I would assume there typically is no reason to do that either(?) However, as a side effect of my proposal, there would still be a way to use an if statement to pick the right async code to match the framework, along with matching the non-async version :). Speaking of side effects, I think the same "__async__" variable might also naturally provide this: https://mail.python.org/pipermail/python-ideas/2015-April/033152.html By the way, if I understand your first note, it might be the same as my "The Y and L ends need to be compatible with each other for the code to work." Sorry about the terminology. I hope the explanations of Y and L are somewhat understandable. > - For synchronous code that wants to use some async code, the pattern > is simple: > asyncio.get_event_loop().run_until_complete(some_async_call(args, etc)) > - We can probably wrap this in a convenience helper function so you > can just write: > asyncio.sync_wait(some_async_call(args, etc)) This is what is keeping me from using asyncio. Ignoring performance overhead, if in any synchronous script (or interactive prompt or ipython notebook) all calls to my library would look like that, I will happily use my 2.7 version that uses threads. Well, I admit that the part about "happily" is not completely true in my case. Instead, I would be quite happy typing "await ", since awaiting the function call (to finish/return a value) is exactly what I would be doing, regardless of whether there is an event loop or not. > - Note that this will fail (and rightly so!) if called when the event > loop is already running. > Regarding my proposal, there would still be a way for libraries to provide this functionality, if desired :). Please see also the comments below. > On Tue, May 5, 2015 at 11:00 AM, Andrew Barnert via Python-ideas > > wrote: > > It seems like it might be a lot easier to approach this from the > other end: Is it possible to write a decorator that takes an async > coroutine function, strips out all the awaits, and returns a > regular sync function? If so, all you need to do is write > everything as async, and then users can "from spam import sync as > spam" or "from spam import async as spam" (where async just > imports all the real functions, while sync imports them and calls > the decorator on all of them). > Interesting idea. If this is possible, it would solve part of the issue, but the "Y end" (sorry) of the chain may still need to be done by hand. 
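For what it's worth, a minimal version of that "sync variant" idea can already be written today without stripping anything, simply by driving the coroutine with the run_until_complete() pattern Guido mentioned (a sketch only; the name as_sync is made up, and like run_until_complete() itself it fails if an event loop is already running):

    import asyncio
    import functools

    def as_sync(coro_func):
        # Wrap a coroutine function so that calling it blocks until the
        # coroutine has finished, using a throwaway trip through the loop.
        @functools.wraps(coro_func)
        def wrapper(*args, **kwargs):
            loop = asyncio.get_event_loop()
            return loop.run_until_complete(coro_func(*args, **kwargs))
        return wrapper

    sync_sleep = as_sync(asyncio.sleep)
    sync_sleep(1)       # blocks for one second, no explicit loop in sight

Of course, this only gives a blocking API; it does not let the coroutine itself know whether it is being run synchronously, which is what the __async__.framework idea is about.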
> > That also avoids the need to have all the looking up the event > loop, switching between different code branches, etc. inside every > function at runtime. (Not that it matters for the performance of > sleep(1), but it might matter for the performance of other > functions?and, more importantly, it might make the implementation > of those functions simpler and easier to debug through.) > > This could indeed save some if statements at runtime. Note that the if statements would not be inside every function, but only in the ones that do the actual IO. For instance, some 3rd-party library might use wrappers around socket send and socket recv to choose between sync and async versions, and that might be all the IO it needs to build several layers of async code. Even better, had someone taken the time to provide these if statements inside the standard library, the whole 3rd-party async library would just magically work also in synchronous code :). Best regards, Koos > > On Tuesday, May 5, 2015 7:01 AM, Koos Zevenhoven > > wrote: > > > > Hi all! > > I am excited about seeing what's going on with asyncio and > PEP492 etc. I > really like that Python is becoming more suitable for the > increasing > amount of async code and that the distinction between async > functions > and generators is increasing. > > In addition, however, I would also like to see the async > functions and > methods come even closer to regular functions and methods. > This is > something that is keeping me from using asyncio at the moment > even if I > would like to. Below I'll try to explain what and why, and a > little bit > of how. If it is not clear, please ask :) > > Motivation: > > One of the best things about asyncio and coroutines/async > functions is > that you can write asynchronous code as if it were > synchronous, the > difference in many places being just the use of "await" > ("yield from") > when calling something that may end up doing IO (somewhere > down the > function call chain) and that the code is run from an event loop. > > When writing a package that does IO, you have the option to > make it > either synchronous or asynchronous. Regardless of the choice, > the code > will look roughly the same. But what if you want to be able to > do both? > Should you maintain two versions, one with "async" and "await" > everywhere and one without? > > Besides the keywords "async" and "await", async code of course > differs > from synchronous code by the functions/coroutines that are > used for IO > at the end of the function call chain. Here, I mean the end > (close to) > where the "yield" expressions are hidden in the async > versions. At the > other end of the calling chain, async code needs the event > loop and > associated framework (almost always asyncio?) which hides all > the async > scheduling fanciness etc. I'm not sure about the terminology, > but I will > use "L end" and "Y end" to refer to the two ends here. (L for > event > Loop; Y for Yield) > > The Y and L ends need to be compatible with each other for the > code to > work. While asyncio and the standard library might provide > both ends in > many cases, there can also be situations where a package would > want to > work with different combinations of L and Y end, or completely > without > an event loop, i.e. synchronously. > > In a very simple example, one might want to wrap different > implementations of sleep() in a function that would pick the > right one > depending on the context. 
Perhaps something like this: > > async def any_sleep(seconds): > if __async__.framework is None: > time.sleep(1) > elif __async__.framework is asyncio: > await asyncio.sleep(1) > else: > raise RuntimeError("Was called with an unsupported > async > framework.") > > [You could of course replace sleep() with socket IO or > whatever, but > sleep is nice and simple. Also, a larger library would > probably have a > whole chain of async functions and methods before calling > something like > this] > > But if await is only allowed inside "async def", then how can > any_sleep() be conveniently run in non-async code? Also, there is > nothing like __async__.framework. Below, I describe what I > think a > potential solution might look like. > > > > Potential solution: > > This is simplified version; for instance, as "awaitables", I > consider > only async function objects here. I describe the idea in three > parts: > > (1) next(...): > > Add a keyword argument "async_framework" (or whatever) to > next(...) with > a default value of None. When an async framework, typically > asyncio, > starts an async function object (coroutine) with a call to > next(...), it > would do something like next(coro, async_framework = asyncio). > Here, > asyncio could of course be replaced with any object that > identifies the > framework. This information would then be somehow attached to > the async > function object. > > > (2) __async__.framework or something similar: > > Add something like __async__ that has an attribute such as > .framework > that allows the code inside the async function to access the > information > passed to next(...) by the framework (L end) using the keyword > argument > of next [see (1)]. > > (3) Generalized "await": > > [When the world is ready:] Allow using "await" anywhere, not > just within > async functions. Inside async functions, the behavior of > "await" would > be the same as in PEP492, with the addition that it would somehow > propagate the __async__.framework value to the awaited coroutine. > Outside async functions, "await" would do roughly the same as > this function: > > def await(async_func_obj): > try: > next(async_func_obj) # same as next(async_func_obj, > async_framework = None) > except StopIteration as si: > return si.value > raise RuntimeError("The function does not support > synchronous > execution") > > (This function would, of course, work in Python 3.4, but it > would be > mostly useless because the async functions would not know that > they are > being called in a 'synchronous program'. IIUC, this *function* > would be > valid even with PEP492, but having this as a function would be > ugly in > the long run.) > > > Some random thoughts: > > With this addition to Python, one could write libraries that > work both > async and non-async. When await is not inside async def, one > would > expect it to potentially do blocking IO, just like an await > inside async > def would suggest that there is a yield/suspend somewhere in > there. > > For testing, I tried to see if there is a reasonable way to > make a hack > with __async__.framework that could be set by next(), but did > not find > an obvious way. For instance, coro.gi_frame.f_locals is > read-only, I > believe. > > An alternative to this approach could be that await would > implicitly > start a temporary event loop for running the coroutine, but > how would it > know which event loop? This might also have a huge performance > overhead. 
> > Relation to PEP492: > > This of course still needs more thinking, but I wanted to post > it here > now in case there is desire to prepare for something like this > already > in PEP492. It is not completely clear if/how this would need > to affect > PEP492, but some things come to mind. For example, this could > potentially remove the need for __aenter__, __aiter__, etc. or > even > "async for" and "async with". If __aenter__ is defined as > "async def", > then a with statement would do an "await" on it, and the > context manager > would have __async__.framework (or whatever it would be called) > available, for determining what behavior is appropriate. > > Was this clear enough to understand which problem(s) this > would be > solving and how? I'd be happy to hear about any thoughts on > this :). > > > Best regards, > Koos > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Wed May 6 01:19:04 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Wed, 6 May 2015 02:19:04 +0300 Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?) In-Reply-To: <5477_1430834452_5548CD13_5477_4819_1_5548CBEC.3000303@aalto.fi> References: <5477_1430834452_5548CD13_5477_4819_1_5548CBEC.3000303@aalto.fi> Message-ID: <55494FE8.8050703@aalto.fi> Hi all, I noticed a typo in my first email (had written__aenter__ instead of __enter__). I fixed the typo below. -- Koos On 5.5.2015 16:55, Koos Zevenhoven wrote: > > Relation to PEP492: > > This of course still needs more thinking, but I wanted to post it here > now in case there is desire to prepare for something like this already > in PEP492. It is not completely clear if/how this would need to affect > PEP492, but some things come to mind. For example, this could > potentially remove the need for __aenter__, __aiter__, etc. or even > "async for" and "async with". If __enter__ is defined as "async def", > then a with statement would do an "await" on it, and the context > manager would have __async__.framework (or whatever it would be > called) available, for determining what behavior is appropriate. > From abarnert at yahoo.com Wed May 6 06:00:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 May 2015 21:00:29 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4D8FF17C-1D0B-42C8-A55F-0479A652321F@yahoo.com> On May 5, 2015, at 12:21, Nick Coghlan wrote: > >> On 5 May 2015 at 18:23, Stephen J. Turnbull wrote: >> So this proposal merely amounts to reintroduction of the Python 2 str >> confusion into Python 3. It is dangerous *precisely because* the >> current situation is so frustrating. These functions will not be used >> by "consenting adults", in most cases. Those with sufficient >> knowledge for "informed consent" also know enough to decode encoded >> text ASAP, and encode internal text ALAP, with appropriate handlers, >> in the first place. 
>> >> Rather, these str2str functions will be used by programmers at the >> ends of their ropes desperate to suppress "those damned Unicode >> errors" by any means available. In fact, they are most likely to be >> used and recommended by *library* writers, because they're the ones >> who are least like to have control over input, or to know their >> clients' requirements for output. "Just use rehandle_* to ameliorate >> the errors" is going to be far too tempting for them to resist. > > The primary intended audience is Linux distribution developers using > Python 3 as the system Python. I agree misuse in other contexts is a > risk, but consider assisting the migration of the Linux ecosystem from > Python 2 to Python 3 sufficiently important that it's worth our while > taking that risk. In this case, the "unfortunate" fact that all these functions have to be "buried" in codecs instead of more discoverable sounds like a _good_ thing, not a problem. The Fedora and Ubuntu people will know where to find them, other linux distros will follow their lead, and the kind of end-user developers that Stephen is worried about who just like to throw in random encode and decode calls until their one test case on their one machine works will never even notice them and will still be encouraged to actually do the right thing. >> That Nick, of all people, supports this proposal is to me just >> confirmation that it's frustration, and only frustration, speaking >> here. He used to be one of the strongest supporters of keeping >> "native text" (Unicode) and "encoded text" separate by keeping the >> latter in bytes. > > It's not frustration (at least, I don't think it is), it's a proposal > for advanced tooling to deal properly with legacy *nix systems that > either: > > a. use a locale encoding other than UTF-8; or > b. don't reliably set the locale encoding for system services and cron > jobs (which anecdotally appears to amount to "aren't using systemd" in > the current crop of *nix init systems) It seems like launchd systems are as good as systemd systems here. Or are you not considering OS X a *nix? I suppose given than the timeline for Apple to switch to Python 3 as the default Python is "maybe it'll happen, but we'll never tell you until a month before the public beta", it isn't really all that relevant... > If a developer only cares about Windows, Mac OS X, or modern systemd > based *nix systems that use UTF-8 as the system locale, and they never > set "LANG=C" before running a Python program, then these new functions > will be completely irrelevant to them. (I've also submitted a request > to the glibc team to make C.UTF-8 universally available, reducing the > need to use "LANG=C", and they're amenable to the idea, but it > requires someone to work on preparing and submitting a patch: > https://sourceware.org/bugzilla/show_bug.cgi?id=17318) > > If, however, a developer wants to handle "LANG=C", or other non-UTF-8 > locales reliably across the full spectrum of *nix systems in Python 3, > they need a way to cope with system data that they *know* has been > decoded incorrectly by the interpreter, as we'll potentially do > exactly that for environment variables, command line arguments, > stdin/stdout/stderr and more if we get bad locale encoding settings > from the OS (such as when "LANG=C" is specified, or the init system > simply doesn't set a locale at all and hence CPython falls back to the > POSIX default of ASCII). 
> > Python 2 lets users sweep a lot of that under the rug, as the data at > least round trips within the system, but you get unexpected mojibake > in some cases (especially when taking local data and pushing it out > over the network). > > Since these boundary decoding issues don't arise on properly > configured modern *nix systems, we've been able to take advantage of > that by moving Python 3 towards a more pragmatic and distro-friendly > approach in coping with legacy *nix platforms and behaviours, > primarily by starting to use "surrogateescape" by default on a few > more system interfaces (e.g. on the standard streams when the OS > *claims* that the locale encoding is ASCII, which we now assume to > indicate a configuration error, which we can at least work around for > roundtripping purposes so that "os.listdir()" works reliably at the > interactive prompt). > > This change in approach (heavily influenced by the parallel "Python 3 > as the default system Python" efforts in Ubuntu and Fedora) *has* > moved us back towards an increased risk of introducing mojibake in > legacy environments, but the nature of that trade-off has changed > markedly from the situation back in 2009 (let alone 2006): > > * most popular modern Linux systems use systemd with the UTF-8 locale, > which "just works" from a boundary encoding/decoding perspective (it's > closely akin to the situation we've had on Mac OS X from the dawn of > Python 3) > * even without systemd, most modern *nix systems at least default to > the UTF-8 locale, which works reliably for user processes in the > absence of an explicit setting like "LANG=C", even if service daemons > and cron jobs can be a bit sketchier in terms of the locale settings > they receive > * for legacy environments migrating from Python 2 without upgrading > the underlying OS, our emphasis has shifted to tolerating "bug > compatibility" at the Python level in order to ease migration, as the > most appropriate long term solution for those environments is now to > upgrade their OS such that it more reliably provides correct locale > encoding settings to the Python 3 interpreter (which wasn't a > generally available option back when Python 3 first launched) > > Armin Ronacher (as ever) provides a good explanation of the system > interface problems that can arise in Python 3 with bad locale encoding > settings here: http://click.pocoo.org/4/python3/#python3-surrogates > > In my view, the critical helper function for this purpose is actually > "handle_surrogateescape", as that's the one that lets us readily adapt > from the incorrectly specified ASCII locale encoding to any other > ASCII-compatible system encoding once we've bootstrapped into a full > Python environment which has more options for figuring out a suitable > encoding than just looking at the locale setting provided by the C > runtime. It's also the function that serves to provide the primary > "hook" where we can hang documentation of this platform specific > boundary encoding/decoding issue. > > The other suggested functions are then more about providing a "peek > behind the curtain" API for folks that want to *use Python* to explore > some of the ins and outs of Unicode surrogate handling. Surrogates and > astrals really aren't that complicated, but we've historically hidden > them away as "dark magic not to be understood by mere mortals". 
I thought most linux 2.x system pythons were wide builds, and there definitely aren't any UTF-16 system interfaces like there are on Windows (which misleadingly calls them "Unicode", which we abet by not making people .encode('utf-16') in some of the places where they'd have to .encode('utf-8') on Mac and Linux...). So I'm surprised there's a problem here at all. The only issues a Linux user is likely to ever see should be with surrogate escapes, not real surrogates, right? > In > reality, they're just different ways of composing sequences of > integers to represent text, and the suggested APIs are designed to > expose that in a way we haven't done in the past. I can't actually > think of a practical purpose for them other than teaching people the > basics of how Unicode representations work, but demystifying that > seems sufficiently worthwhile to me that I'm not opposed to their > inclusion (bear in mind I'm also the current "dis" module maintainer, > and a contributor to the "inspect", so I'm a big fan of exposing > underlying concepts like this in a way that lets people play with them > programmatically for learning purposes). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From stephen at xemacs.org Wed May 6 08:36:23 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2015 15:36:23 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <72DABA4D-EA98-46CC-824B-BA3AF1785B04@yahoo.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com> <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp> <72DABA4D-EA98-46CC-824B-BA3AF1785B04@yahoo.com> Message-ID: <87twvq43h4.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > But the PEP 393 machinery doesn't know when it's dealing with > strings that are ultimately destined for a UCS-2 application, > any more than it can know when it's dealing with strings that have > to be pure ASCII or CP1252 or any other character set. Of course[1] it doesn't, and that's why I say the whole issue is just frustration speaking. Whatever we do, it's going to require that the programmers know what they're doing, or they're just throwing their garbage in the neighbor's yard. With respect to doing the check in the str machinery, you can provide an option that tells PEP 393 str to raise an "OutOfRepertoireError" (subclass of UnicodeError) on introduction of astral characters to an instance of str, or provide an API to ask an instance if it's wide enough to accomodate astral characters. Either way, the programmer needs to design and implement the application to use those features, and that's hard. "Toto! I don't think we're in Kansas anymore!" > If you want to print emoji to a CP1252 console or write them to a > Shift-JIS text file, you get an error from an explicit or implicit > `str.encode` that you can debug. Yup, and these proposals for str2str conversions propose to sneak data with unknown meaning into the application as if it were well-formed. This is just like assuming the modular arithmetic that is performed in registers is actually mathematical integer arithmetic. You'll almost never get burned. Isn't that good enough? 
That's not for me to say, but apparently, "small integer arithmetic" is *not* good enough for Python. Footnotes: [1] In the current implementation. We could provide a fontconfig- like charset facility to describe repertoire restrictions in str, and code to enforce it. But this is a delicate question. Users almost always hate repertoire restrictions when imposed for the programmer's convenience: they want to insert emoji, or write foreign words correctly, or cut-and-paste from email or web pages, or whatever. And of course the restrictions may vary depending on the output media. From stephen at xemacs.org Wed May 6 09:56:36 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2015 16:56:36 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > If a developer only cares about Windows, Mac OS X, or modern systemd > based *nix systems that use UTF-8 as the system locale, and they never > set "LANG=C" before running a Python program, then these new functions > will be completely irrelevant to them. "Irrelevant" is wildly optimistic. They are a gift from heaven for programmers who are avoiding developing Unicode skills. Don't tell me those skills are expensive -- I know, I sweat blood and spilt milk to acquire them. Nevertheless, without acquiring a modicum of those skills, use of these proposed APIs is just what Ezio described: applying any random thing that might work, to shut up those annoying Unicode errors. But these *will* *appear* to work, because they are *designed* to smuggle the unprintable all the way to the output medium by giving it a printable encoding. You'll only find out that it was done incorrectly when the user goes "achtung! mojibake!", and that will be way too late. > If, however, a developer wants to handle "LANG=C", or other non-UTF-8 > locales reliably across the full spectrum of *nix systems in Python 3, > they need a way to cope with system data that they *know* has been > decoded incorrectly by the interpreter, But if so, why is this being discussed as a visible addition to the Python API? AFAICS, .decode('ascii', errors=surrogateescape) plus some variant on for encoding in plausible_encoding_by_likelihood_list: try: s = input.encode('ascii', errors='surrogateescape') s = s.decode(encoding, errors='strict') break except UnicodeError: continue is all you really need inside of the Python init sequence. That is how I read your opinion, too. > The other suggested functions are then more about providing a "peek > behind the curtain" API for folks that want to *use Python* to explore > some of the ins and outs of Unicode surrogate handling. I just don't see a need. .encode and .decode already give you all the tools you need for exploring, and they do so in a way that tells you via the type whether you're looking at abstract text or at the representation. It doesn't get better than this! And if the APIs merely exposed the internal representation that would be one thing. But they don't, and the people who are saying, "I'm not an expert on Unicode but this looks great!" are clearly interested in mutating str instances to be something more palatable to the requisite modules and I/O systems they need to use, but which aren't prepared for astral characters or proper handling of surrogateescapes. 
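For reference, the re-decoding loop sketched earlier in this message, written out so that it actually runs (the candidate list is hypothetical; a real program would build it from what it knows about the platform):

    def redecode(text, candidates=('utf-8', 'latin-1')):
        # 'text' is assumed to have been decoded as ASCII + surrogateescape,
        # so this recovers the original byte string exactly.
        raw = text.encode('ascii', errors='surrogateescape')
        for encoding in candidates:
            try:
                return raw.decode(encoding, errors='strict')
            except UnicodeError:
                continue
        return text        # nothing fit; keep the escaped form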
> I can't actually think of a practical purpose for them other than > teaching people the basics of how Unicode representations work, I agree, but it seems to me that a lot of people are already scheming to use them for practical purposes. Serhiy mentions tkinter, email, and wsgiref, and David lusts after them for email. From levkivskyi at gmail.com Wed May 6 15:15:38 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 15:15:38 +0200 Subject: [Python-ideas] (no subject) Message-ID: Dear all, The matrix multiplication operator @ is going to be introduced in Python 3.5 and I am thinking about the following idea: The semantics of matrix multiplication is the composition of the corresponding linear transformations. A linear transformation is a particular example of a more general concept - functions. The latter are frequently composed with ("wrap") each other. For example: plot(real(sqrt(data))) However, it is not very readable in case of many wrapping layers. Therefore, it could be useful to employ the matrix multiplication operator @ for indication of function composition. This could be done by such (simplified) decorator: class composable: def __init__(self, func): self.func = func def __call__(self, arg): return self.func(arg) def __matmul__(self, other): def composition(*args, **kwargs): return self.func(other(*args, **kwargs)) return composable(composition) I think using such decorator with functions that are going to be deeply wrapped could improve readability. You could compare (note that only the outermost function should be decorated): plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) (data_array) I think the latter is more readable, also compare def sunique(lst): return sorted(list(set(lst))) vs. sunique = sorted @ list @ set Apart from readability, there are following pros of the proposed decorator: 1. Similar semantics as for matrix multiplication. 2. Same symbol for composition as for decorators. 3. The symbol @ resembles mathematical notation for function composition: ? I think it could be a good idea to add such a decorator to the stdlib functools module. -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Wed May 6 15:20:15 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 15:20:15 +0200 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) Message-ID: Dear all, The matrix multiplication operator @ is going to be introduced in Python 3.5 and I am thinking about the following idea: The semantics of matrix multiplication is the composition of the corresponding linear transformations. A linear transformation is a particular example of a more general concept - functions. The latter are frequently composed with ("wrap") each other. For example: plot(real(sqrt(data))) However, it is not very readable in case of many wrapping layers. Therefore, it could be useful to employ the matrix multiplication operator @ for indication of function composition. This could be done by such (simplified) decorator: class composable: def __init__(self, func): self.func = func def __call__(self, arg): return self.func(arg) def __matmul__(self, other): def composition(*args, **kwargs): return self.func(other(*args, **kwargs)) return composable(composition) I think using such decorator with functions that are going to be deeply wrapped could improve readability. 
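For instance, a small hedged sketch of the decorator in use (the example functions are throwaway, and this assumes the composable class defined above):

    @composable
    def increment(x):
        return x + 1

    def double(x):                      # plain, undecorated function
        return 2 * x

    inc_of_double = increment @ double  # increment(double(x))
    assert inc_of_double(3) == 7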
You could compare (note that only the outermost function should be decorated): plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) (data_array) I think the latter is more readable, also compare def sunique(lst): return sorted(list(set(lst))) vs. sunique = sorted @ list @ set Apart from readability, there are following pros of the proposed decorator: 1. Similar semantics as for matrix multiplication. 2. Same symbol for composition as for decorators. 3. The symbol @ resembles mathematical notation for function composition: ? I think it could be a good idea to add such a decorator to the stdlib functools module. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed May 6 15:59:45 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 06:59:45 -0700 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: This was discussed when the proposal to add @ for matrix multiplication came up, so you should first read that thread and make sure you have answers to all of the issues that came up before proposing it again. Off the top of my head: Python functions don't just take 1 parameter, they take any number of parameters, possibly including optional parameters, keyword-only, *args, **kwargs, etc. There are a dozen different compose implementations on PyPI and ActiveState that handle these differently. Which one is "right"? The design you describe can be easily implemented as a third-party library. Why not do so, put it on PyPI, see if you get any traction and any ideas for improvement, and then suggest it for the stdlib? The same thing is already doable today using a different operator--and, again, there are a dozen implementations. Why isn't anyone using them? Thinking in terms of function composition requires a higher level of abstraction than thinking in terms of lambda expressions. That's one of the reasons people perceive Haskell to be a harder language to learn than Lisp or Python. Of course learning Haskell is rewarding--but being easy to learn is one of Python's major strengths. Python doesn't have a static optimizing compiler that can avoid building 4 temporary function objects to evaluate (plot @ sorted @ sqrt @ real) (data_array), so it will make your code significantly less efficient. Is @ for composition and () for application really sufficient to write point free code in general without auto-curried functions, operator sectioning, reverse compose, reverse apply, etc.? Most of the examples people use in describing the feature from Haskell have a (+ 1) or (== x) or take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead of (a->b, [a]) -> [b]. Sent from my iPhone > On May 6, 2015, at 06:15, Ivan Levkivskyi wrote: > > Dear all, > > The matrix multiplication operator @ is going to be introduced in Python 3.5 and I am thinking about the following idea: > > The semantics of matrix multiplication is the composition of the corresponding linear transformations. > A linear transformation is a particular example of a more general concept - functions. > The latter are frequently composed with ("wrap") each other. For example: > > plot(real(sqrt(data))) > > However, it is not very readable in case of many wrapping layers. Therefore, it could be useful to employ > the matrix multiplication operator @ for indication of function composition. 
This could be done by such (simplified) decorator: > > class composable: > > def __init__(self, func): > self.func = func > > def __call__(self, arg): > return self.func(arg) > > def __matmul__(self, other): > def composition(*args, **kwargs): > return self.func(other(*args, **kwargs)) > return composable(composition) > > I think using such decorator with functions that are going to be deeply wrapped > could improve readability. > You could compare (note that only the outermost function should be decorated): > > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) (data_array) > > I think the latter is more readable, also compare > > def sunique(lst): > return sorted(list(set(lst))) > > vs. > > sunique = sorted @ list @ set > > Apart from readability, there are following pros of the proposed decorator: > > 1. Similar semantics as for matrix multiplication. > 2. Same symbol for composition as for decorators. > 3. The symbol @ resembles mathematical notation for function composition: ? > > I think it could be a good idea to add such a decorator to the stdlib functools module. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From guettliml at thomas-guettler.de Wed May 6 16:05:00 2015 From: guettliml at thomas-guettler.de (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=) Date: Wed, 06 May 2015 16:05:00 +0200 Subject: [Python-ideas] Policy for altering sys.path Message-ID: <554A1F8C.1040005@thomas-guettler.de> I am missing a policy how sys.path should be altered. We run a custom sub class of list in sys.path. We set it in sitecustomize.py This instance get replace by a common list in lines like this: sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path The above line is from pip, it similar things happen in a lot of packages. Before trying to solve this with code, I think the python community should agree an a policy for altering sys.path. What can I do to this done? We use Python 2.7. Related: http://bugs.python.org/issue24135 Regards, Thomas G?ttler From erik.m.bray at gmail.com Wed May 6 16:10:22 2015 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 6 May 2015 10:10:22 -0400 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi wrote: > Dear all, > > The matrix multiplication operator @ is going to be introduced in Python 3.5 > and I am thinking about the following idea: > > The semantics of matrix multiplication is the composition of the > corresponding linear transformations. > A linear transformation is a particular example of a more general concept - > functions. > The latter are frequently composed with ("wrap") each other. For example: > > plot(real(sqrt(data))) > > However, it is not very readable in case of many wrapping layers. Therefore, > it could be useful to employ > the matrix multiplication operator @ for indication of function composition. 
> This could be done by such (simplified) decorator: > > class composable: > > def __init__(self, func): > self.func = func > > def __call__(self, arg): > return self.func(arg) > > def __matmul__(self, other): > def composition(*args, **kwargs): > return self.func(other(*args, **kwargs)) > return composable(composition) > > I think using such decorator with functions that are going to be deeply > wrapped > could improve readability. > You could compare (note that only the outermost function should be > decorated): > > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) > (data_array) > > I think the latter is more readable, also compare > > def sunique(lst): > return sorted(list(set(lst))) > > vs. > > sunique = sorted @ list @ set > > Apart from readability, there are following pros of the proposed decorator: > > 1. Similar semantics as for matrix multiplication. > 2. Same symbol for composition as for decorators. > 3. The symbol @ resembles mathematical notation for function composition: ? > > I think it could be a good idea to add such a decorator to the stdlib > functools module. In the astropy.modeling package, which consists largely of collection of fancy wrappers around analytic functions, we used the pipe operator | (that is, __or__) to implement function composition, as demonstrated here: http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition I do like the idea of using the new @ operator for this purpose--it makes sense as a generalization of linear operators, and it just looks a little more like the circle operator often used for functional composition. On the other hand I'm also fond of the choice to use |, for the similarity to UNIX shell pipe operations, as long as it can't be confused with __or__. Point being something like this could be implemented now with __or__. I think this is simple enough that it doesn't need to be in the stdlib, especially if there are different ways people would like to do this. But I do like the idea. Erik From steve at pearwood.info Wed May 6 16:51:35 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 7 May 2015 00:51:35 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: Message-ID: <20150506145131.GL5663@ando.pearwood.info> On Wed, May 06, 2015 at 03:15:38PM +0200, Ivan Levkivskyi wrote: > Dear all, > > The matrix multiplication operator @ is going to be introduced in Python > 3.5 and I am thinking about the following idea: > > The semantics of matrix multiplication is the composition of the > corresponding linear transformations. > A linear transformation is a particular example of a more general concept - > functions. > The latter are frequently composed with ("wrap") each other. For example: > > plot(real(sqrt(data))) > > However, it is not very readable in case of many wrapping layers. > Therefore, it could be useful to employ > the matrix multiplication operator @ for indication of function > composition. This could be done by such (simplified) decorator: I like the idea of @ as a function compose operator. 
There have been many requests and attempts at support for function composition: http://code.activestate.com/recipes/574458-composable-functions/ http://code.activestate.com/recipes/52902-function-composition/ http://code.activestate.com/recipes/528929-dynamic-function-composition-decorator/ http://blog.o1iver.net/2011/08/09/python-function-composition.html https://mail.python.org/pipermail/python-dev/2009-August/091161.html http://stackoverflow.com/questions/2281693/is-it-a-good-idea-to-have-a-syntax-sugar-to-function-composition-in-python The last one is notable, as it floundered in part on the lack of a good operator. I think @ makes a good operator for function composition. I think that there are some questions that would need to be answered. For instance, given some composition: f = math.sin @ (lambda x: x**2) what would f.__name__ return? What about str(f)? Do the composed functions: (spam @ eggs @ cheese)(x) perform acceptibly compared to the traditional syntax? spam(eggs(cheese(x)) -- Steve From levkivskyi at gmail.com Wed May 6 17:05:05 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 17:05:05 +0200 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: Dear Andrew, Thank you for pointing out the previous discussion, I have overlooked it. (Btw, I have found your post about the infix operators, that is a great idea). Also, It turns out that astropy uses a very similar idea for function composition. I agree that there are indeed to much ambiguities about the "right way", and thus it is not good for stdlib. However, implementing only one decorator as a third-party library is not good idea as well. You are right that no one will install such library. Probably, it would be better to combine it with other functionality like @infix (via overloading __or__ or __rshift__), @auto_curry, etc. Thank you for the feedback! On 6 May 2015 at 15:59, Andrew Barnert wrote: > This was discussed when the proposal to add @ for matrix multiplication > came up, so you should first read that thread and make sure you have > answers to all of the issues that came up before proposing it again. > > Off the top of my head: > > Python functions don't just take 1 parameter, they take any number of > parameters, possibly including optional parameters, keyword-only, *args, > **kwargs, etc. There are a dozen different compose implementations on PyPI > and ActiveState that handle these differently. Which one is "right"? > > The design you describe can be easily implemented as a third-party > library. Why not do so, put it on PyPI, see if you get any traction and any > ideas for improvement, and then suggest it for the stdlib? > > The same thing is already doable today using a different operator--and, > again, there are a dozen implementations. Why isn't anyone using them? > > Thinking in terms of function composition requires a higher level of > abstraction than thinking in terms of lambda expressions. That's one of the > reasons people perceive Haskell to be a harder language to learn than Lisp > or Python. Of course learning Haskell is rewarding--but being easy to learn > is one of Python's major strengths. > > Python doesn't have a static optimizing compiler that can avoid building 4 > temporary function objects to evaluate (plot @ sorted @ sqrt @ real) > (data_array), so it will make your code significantly less efficient. 
> > Is @ for composition and () for application really sufficient to write > point free code in general without auto-curried functions, operator > sectioning, reverse compose, reverse apply, etc.? Most of the examples > people use in describing the feature from Haskell have a (+ 1) or (== x) or > take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead > of (a->b, [a]) -> [b]. > > Sent from my iPhone > > > On May 6, 2015, at 06:15, Ivan Levkivskyi wrote: > > > > Dear all, > > > > The matrix multiplication operator @ is going to be introduced in Python > 3.5 and I am thinking about the following idea: > > > > The semantics of matrix multiplication is the composition of the > corresponding linear transformations. > > A linear transformation is a particular example of a more general > concept - functions. > > The latter are frequently composed with ("wrap") each other. For example: > > > > plot(real(sqrt(data))) > > > > However, it is not very readable in case of many wrapping layers. > Therefore, it could be useful to employ > > the matrix multiplication operator @ for indication of function > composition. This could be done by such (simplified) decorator: > > > > class composable: > > > > def __init__(self, func): > > self.func = func > > > > def __call__(self, arg): > > return self.func(arg) > > > > def __matmul__(self, other): > > def composition(*args, **kwargs): > > return self.func(other(*args, **kwargs)) > > return composable(composition) > > > > I think using such decorator with functions that are going to be deeply > wrapped > > could improve readability. > > You could compare (note that only the outermost function should be > decorated): > > > > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) > (data_array) > > > > I think the latter is more readable, also compare > > > > def sunique(lst): > > return sorted(list(set(lst))) > > > > vs. > > > > sunique = sorted @ list @ set > > > > Apart from readability, there are following pros of the proposed > decorator: > > > > 1. Similar semantics as for matrix multiplication. > > 2. Same symbol for composition as for decorators. > > 3. The symbol @ resembles mathematical notation for function > composition: ? > > > > I think it could be a good idea to add such a decorator to the stdlib > functools module. > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed May 6 17:07:52 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 May 2015 16:07:52 +0100 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554A1F8C.1040005@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> Message-ID: On 6 May 2015 at 15:05, Thomas G?ttler wrote: > I am missing a policy how sys.path should be altered. Well, the docs say that applications can modify sys.path as needed. Generally, applications modify sys.path in place via sys.path[:] = whatever, but that's not mandated as far as I know. > We run a custom sub class of list in sys.path. We set it in sitecustomize.py Can you explain why? It seems pretty risky to expect that no applications will replace sys.path. 
I understand that you're proposing that we say that applications shouldn't do that - but just saying so won't change the many applications already out there. > This instance get replace by a common list in lines like this: > > sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path > > The above line is from pip, it similar things happen in a lot of packages. How does the fact that pip does that cause a problem? The sys.path modification is only in effect while pip is running, and no code in pip relies on sys.path being an instance of your custom class. > Before trying to solve this with code, I think the python community should > agree an a policy for altering sys.path. I can't imagine that happening, and even if it does, it won't make any difference because a new policy won't change existing code. It won't even affect new code unless people know about it (which isn't certain - I doubt many people read the documentation that closely). > What can I do to this done? I doubt you can. A PR for pip that changes the above line to modify sys.path in place would probably get accepted (I can't see any reason why it wouldn't), and I guess you could do the same for any other code you find. But as for persuading the Python programming community not to replace sys.path in any code, that seems unlikely to happen. > We use Python 2.7 If you were using 3.x, then it's (barely) conceivable that making sys.path read-only (so people could only modify it in-place) could be done as a new feature, but (a) it would be a major backward compatibility break, so there would have to be a strong justification, and (b) it would stop you from replacing sys.path with your custom class in the first place, so it wouldn't solve your issue. Which also raises the question, why do you believe it's OK to forbid other people to replace sys.path when that's what you're doing in your sitecustomize code? That seems self-contradictory... Paul From rosuav at gmail.com Wed May 6 17:11:16 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 7 May 2015 01:11:16 +1000 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554A1F8C.1040005@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> Message-ID: On Thu, May 7, 2015 at 12:05 AM, Thomas G?ttler wrote: > We run a custom sub class of list in sys.path. We set it in sitecustomize.py > > This instance get replace by a common list in lines like this: > > sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path Forgive the obtuse question, but wouldn't an __radd__ method resolve this for you? ChrisA From levkivskyi at gmail.com Wed May 6 17:21:28 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 17:21:28 +0200 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: Dear Erik, Thank you for the link! I agree that this idea is too raw for stdlib (there are problems with many argument functions, keyword arguments, etc.) Concerning the shell | vs. matrix @ I think it is a good idea to have both... but with different order. I mean in shell logic f | g means g (f (x)), while for matrix multiplication f @ g means f(g(x)). The former is probably more natural for people with more "programming" background, while the latter is more natural for people with a "scientific" background. We could now do good for both, since we now have a new operator. 
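A rough sketch of what supporting both orders on one wrapper might look like (the class and function names here are invented, not a settled design):

    class Composable:
        def __init__(self, func):
            self.func = func

        def __call__(self, arg):
            return self.func(arg)

        def __matmul__(self, other):    # f @ g  ->  f(g(x)), "matrix" order
            return Composable(lambda x: self.func(other(x)))

        def __or__(self, other):        # f | g  ->  g(f(x)), "shell pipe" order
            return Composable(lambda x: other(self.func(x)))

    add_one = Composable(lambda x: x + 1)
    double = lambda x: 2 * x

    assert (add_one @ double)(3) == 7   # add_one(double(3))
    assert (add_one | double)(3) == 8   # double(add_one(3))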
On 6 May 2015 at 16:10, Erik Bray wrote: > On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi > wrote: > > Dear all, > > > > The matrix multiplication operator @ is going to be introduced in Python > 3.5 > > and I am thinking about the following idea: > > > > The semantics of matrix multiplication is the composition of the > > corresponding linear transformations. > > A linear transformation is a particular example of a more general > concept - > > functions. > > The latter are frequently composed with ("wrap") each other. For example: > > > > plot(real(sqrt(data))) > > > > However, it is not very readable in case of many wrapping layers. > Therefore, > > it could be useful to employ > > the matrix multiplication operator @ for indication of function > composition. > > This could be done by such (simplified) decorator: > > > > class composable: > > > > def __init__(self, func): > > self.func = func > > > > def __call__(self, arg): > > return self.func(arg) > > > > def __matmul__(self, other): > > def composition(*args, **kwargs): > > return self.func(other(*args, **kwargs)) > > return composable(composition) > > > > I think using such decorator with functions that are going to be deeply > > wrapped > > could improve readability. > > You could compare (note that only the outermost function should be > > decorated): > > > > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) > > (data_array) > > > > I think the latter is more readable, also compare > > > > def sunique(lst): > > return sorted(list(set(lst))) > > > > vs. > > > > sunique = sorted @ list @ set > > > > Apart from readability, there are following pros of the proposed > decorator: > > > > 1. Similar semantics as for matrix multiplication. > > 2. Same symbol for composition as for decorators. > > 3. The symbol @ resembles mathematical notation for function > composition: ? > > > > I think it could be a good idea to add such a decorator to the stdlib > > functools module. > > In the astropy.modeling package, which consists largely of collection > of fancy wrappers around analytic functions, > we used the pipe operator | (that is, __or__) to implement function > composition, as demonstrated here: > > > http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition > > I do like the idea of using the new @ operator for this purpose--it > makes sense as a generalization of linear operators, > and it just looks a little more like the circle operator often used > for functional composition. On the other hand > I'm also fond of the choice to use |, for the similarity to UNIX shell > pipe operations, as long as it can't be confused with > __or__. Point being something like this could be implemented now with > __or__. > > I think this is simple enough that it doesn't need to be in the > stdlib, especially if there are different ways people > would like to do this. But I do like the idea. > > Erik > -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Wed May 6 17:30:56 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 17:30:56 +0200 Subject: [Python-ideas] Function composition Message-ID: Dear Steve, Thank you for the feedback and for the links! I think that both (f at g).__name__ and str(f at g) should be f.__name__ + ' @ ' + g.__name__ and str(f) + ' @ ' +str(g) Concerning the performance, I think that it could be poor, and I don't know yet how to improve this. 
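Something along these lines might work for carrying the names through (just a sketch, assuming composition is handled by a wrapper class; nothing here is settled):

    import math

    class Composable:
        def __init__(self, func, name=None):
            self.func = func
            self.__name__ = name or getattr(func, '__name__', repr(func))

        def __call__(self, arg):
            return self.func(arg)

        def __repr__(self):
            return self.__name__

        def __matmul__(self, other):
            name = '{} @ {}'.format(self.__name__,
                                    getattr(other, '__name__', repr(other)))
            return Composable(lambda x: self.func(other(x)), name)

    f = Composable(math.sin) @ (lambda x: x ** 2)
    print(f.__name__)   # sin @ <lambda>
    print(f(2.0))       # same as math.sin(4.0)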
> > Dear all, > > > > The matrix multiplication operator @ is going to be introduced in Python > > 3.5 and I am thinking about the following idea: > > > > The semantics of matrix multiplication is the composition of the > > corresponding linear transformations. > > A linear transformation is a particular example of a more general concept - > > functions. > > The latter are frequently composed with ("wrap") each other. For example: > > > > plot(real(sqrt(data))) > > > > However, it is not very readable in case of many wrapping layers. > > Therefore, it could be useful to employ > > the matrix multiplication operator @ for indication of function > > composition. This could be done by such (simplified) decorator: > > I like the idea of @ as a function compose operator. > > There have been many requests and attempts at support for function > composition: > > http://code.activestate.com/recipes/574458-composable-functions/ > > http://code.activestate.com/recipes/52902-function-composition/ > > http://code.activestate.com/recipes/528929-dynamic-function-composition-decorator/ > > http://blog.o1iver.net/2011/08/09/python-function-composition.html > > https://mail.python.org/pipermail/python-dev/2009-August/091161.html > > http://stackoverflow.com/questions/2281693/is-it-a-good-idea-to-have-a-syntax-sugar-to-function-composition-in-python > > > The last one is notable, as it floundered in part on the lack of a good > operator. I think @ makes a good operator for function composition. > > > I think that there are some questions that would need to be answered. > For instance, given some composition: > > f = math.sin @ (lambda x: x**2) > > what would f.__name__ return? What about str(f)? > > > Do the composed functions: > > (spam @ eggs @ cheese)(x) > > perform acceptibly compared to the traditional syntax? > > spam(eggs(cheese(x)) > > > > -- > Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Wed May 6 17:42:17 2015 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 6 May 2015 11:42:17 -0400 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi wrote: > Dear Erik, > > Thank you for the link! I agree that this idea is too raw for stdlib (there > are problems with many argument functions, keyword arguments, etc.) > Concerning the shell | vs. matrix @ I think it is a good idea to have > both... but with different order. > I mean in shell logic f | g means g (f (x)), while for matrix multiplication > f @ g means f(g(x)). > The former is probably more natural for people with more "programming" > background, while the latter is more natural for people with a "scientific" > background. > We could now do good for both, since we now have a new operator. Absolutely! I've found that it takes a little work sometimes for scientific users to wrap their heads around the g | f syntax. Once Python 3.5 is out I might add support for "f @ g" as well, though I'm wary of having more than one way to do it. Worth trying out though, so thanks for the idea. 
Erik > On 6 May 2015 at 16:10, Erik Bray wrote: >> >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi >> wrote: >> > Dear all, >> > >> > The matrix multiplication operator @ is going to be introduced in Python >> > 3.5 >> > and I am thinking about the following idea: >> > >> > The semantics of matrix multiplication is the composition of the >> > corresponding linear transformations. >> > A linear transformation is a particular example of a more general >> > concept - >> > functions. >> > The latter are frequently composed with ("wrap") each other. For >> > example: >> > >> > plot(real(sqrt(data))) >> > >> > However, it is not very readable in case of many wrapping layers. >> > Therefore, >> > it could be useful to employ >> > the matrix multiplication operator @ for indication of function >> > composition. >> > This could be done by such (simplified) decorator: >> > >> > class composable: >> > >> > def __init__(self, func): >> > self.func = func >> > >> > def __call__(self, arg): >> > return self.func(arg) >> > >> > def __matmul__(self, other): >> > def composition(*args, **kwargs): >> > return self.func(other(*args, **kwargs)) >> > return composable(composition) >> > >> > I think using such decorator with functions that are going to be deeply >> > wrapped >> > could improve readability. >> > You could compare (note that only the outermost function should be >> > decorated): >> > >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) >> > (data_array) >> > >> > I think the latter is more readable, also compare >> > >> > def sunique(lst): >> > return sorted(list(set(lst))) >> > >> > vs. >> > >> > sunique = sorted @ list @ set >> > >> > Apart from readability, there are following pros of the proposed >> > decorator: >> > >> > 1. Similar semantics as for matrix multiplication. >> > 2. Same symbol for composition as for decorators. >> > 3. The symbol @ resembles mathematical notation for function >> > composition: ? >> > >> > I think it could be a good idea to add such a decorator to the stdlib >> > functools module. >> >> In the astropy.modeling package, which consists largely of collection >> of fancy wrappers around analytic functions, >> we used the pipe operator | (that is, __or__) to implement function >> composition, as demonstrated here: >> >> >> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition >> >> I do like the idea of using the new @ operator for this purpose--it >> makes sense as a generalization of linear operators, >> and it just looks a little more like the circle operator often used >> for functional composition. On the other hand >> I'm also fond of the choice to use |, for the similarity to UNIX shell >> pipe operations, as long as it can't be confused with >> __or__. Point being something like this could be implemented now with >> __or__. >> >> I think this is simple enough that it doesn't need to be in the >> stdlib, especially if there are different ways people >> would like to do this. But I do like the idea. >> >> Erik > > From steve at pearwood.info Wed May 6 17:48:11 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 7 May 2015 01:48:11 +1000 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: <20150506154811.GM5663@ando.pearwood.info> On Wed, May 06, 2015 at 06:59:45AM -0700, Andrew Barnert via Python-ideas wrote: > Python functions don't just take 1 parameter, they take any number of > parameters, possibly including optional parameters, keyword-only, > *args, **kwargs, etc. 
Maybe Haskell programmers are used to functions which all take one argument, and f(a, b, c) is syntactic sugar for f(a)(b)(c), but I doubt anyone else is. When we Python programmers manually compose a function today, by writing an expression or a new function, we have to deal with the exact same problems. There's nothing new about the programmer needing to ensure that the function signatures are compatible: def spam(a, b, c): return a+b+c def eggs(x, y, z): return x*y/z def composed(*args): return eggs(spam(*args)) # doesn't work It is the programmer's responsibility to compose compatible functions. Why should it be a fatal flaw that the same limitation applies to a composition operator? Besides, with Argument Clinic, it's possible that the @ operator could catch incompatible signatures ahead of time. > There are a dozen different compose > implementations on PyPI and ActiveState that handle these differently. That is good evidence that this is functionality that people want. > Which one is "right"? Perhaps all of them? Perhaps none of them? There are lots of buggy or badly designed functions and classes on the internet. Perhaps that suggests that the std lib should solve it right once and for all. > The design you describe can be easily implemented as a third-party > library. Why not do so, put it on PyPI, see if you get any traction > and any ideas for improvement, and then suggest it for the stdlib? I agree that this idea needs to have some real use before it can be added to the std lib, but see below for a counter-objection to the PyPI objection. > The same thing is already doable today using a different > operator--and, again, there are a dozen implementations. Why isn't > anyone using them? It takes a certain amount of effort for people to discover and use a third-party library: one has to find a library, or libraries, determine that it is mature, decide which competing library to use, determine if the licence is suitable, download and install it. This "activiation energy" is insignificant if the library does something big, say, like numpy, or nltk, or even medium sized. But for a library that provides effectively a single function, that activation energy is a barrier to entry. It's not that the function isn't useful, or that people wouldn't use it if it were already available. It's just that the effort to get it is too much bother. People will do without, or re-invent the wheel. (Re-inventing the wheel is at least fun. Searching PyPI and reading licences is not.) > Thinking in terms of function composition requires a higher level of > abstraction than thinking in terms of lambda expressions. Do you think its harder than, say, the "async for" feature that's just been approved by Guido? Compared to asynchronous code, I would say function composition is trivial. Anyone who can learn the correspondence (a @ b)(arg) <=> a(b(arg)) can deal with it. > Python doesn't have a static optimizing compiler that can avoid > building 4 temporary function objects to evaluate (plot @ sorted @ > sqrt @ real) (data_array), so it will make your code significantly > less efficient. Why would it necessarily have to create 4 temporary function objects? Besides, the rules for optimization apply here too: don't dismiss something as too slow until you've measured it :-) We shouldn't care about the cost of the @ operator itself, only the cost of calling the composed functions. Building the Composed object generally happens only once, while calling it generally happens many times. 
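For what it's worth, here is a hedged sketch of a composed object that flattens the chain once at construction time, so that each call is a single loop instead of a stack of nested closures (the class name and design are illustrative only):

    import math

    class Composed:
        def __init__(self, *funcs):
            flat = []
            for f in funcs:
                flat.extend(f.funcs if isinstance(f, Composed) else (f,))
            self.funcs = tuple(flat)           # outermost function first

        def __call__(self, arg):
            for f in reversed(self.funcs):     # apply innermost first
                arg = f(arg)
            return arg

        def __matmul__(self, other):
            return Composed(self, other)

    pipeline = Composed(math.sqrt) @ abs @ float
    assert pipeline("-4") == 2.0               # math.sqrt(abs(float("-4")))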
> Is @ for composition and () for application really sufficient to write > point free code in general without auto-curried functions, operator > sectioning, reverse compose, reverse apply, etc.? Most of the examples > people use in describing the feature from Haskell have a (+ 1) or (== > x) or take advantage of map-type functions being (a->b) -> ([a] -> > [b]) instead of (a->b, [a]) -> [b]. See, now *that's* why people consider Haskell to be difficult: it is based on areas of mathematics which even maths graduates may never have come across. But function composition is taught in high school. (At least in Australia, and I expect Europe and Japan.) It's a nice, simple and useful functional tool, like partial. -- Steve From guido at python.org Wed May 6 17:48:08 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 6 May 2015 08:48:08 -0700 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: I realize this is still python-ideas, but does this really leave functions with multiple arguments completely out of the picture (except as the first stage in the pipeline)? On Wed, May 6, 2015 at 8:42 AM, Erik Bray wrote: > On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi > wrote: > > Dear Erik, > > > > Thank you for the link! I agree that this idea is too raw for stdlib > (there > > are problems with many argument functions, keyword arguments, etc.) > > Concerning the shell | vs. matrix @ I think it is a good idea to have > > both... but with different order. > > I mean in shell logic f | g means g (f (x)), while for matrix > multiplication > > f @ g means f(g(x)). > > The former is probably more natural for people with more "programming" > > background, while the latter is more natural for people with a > "scientific" > > background. > > We could now do good for both, since we now have a new operator. > > Absolutely! I've found that it takes a little work sometimes for > scientific users to wrap > their heads around the > > g | f > > syntax. Once Python 3.5 is out I might add support for "f @ g" as > well, though I'm wary > of having more than one way to do it. Worth trying out though, so > thanks for the idea. > > Erik > > > On 6 May 2015 at 16:10, Erik Bray wrote: > >> > >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi > >> wrote: > >> > Dear all, > >> > > >> > The matrix multiplication operator @ is going to be introduced in > Python > >> > 3.5 > >> > and I am thinking about the following idea: > >> > > >> > The semantics of matrix multiplication is the composition of the > >> > corresponding linear transformations. > >> > A linear transformation is a particular example of a more general > >> > concept - > >> > functions. > >> > The latter are frequently composed with ("wrap") each other. For > >> > example: > >> > > >> > plot(real(sqrt(data))) > >> > > >> > However, it is not very readable in case of many wrapping layers. > >> > Therefore, > >> > it could be useful to employ > >> > the matrix multiplication operator @ for indication of function > >> > composition. 
> >> > This could be done by such (simplified) decorator: > >> > > >> > class composable: > >> > > >> > def __init__(self, func): > >> > self.func = func > >> > > >> > def __call__(self, arg): > >> > return self.func(arg) > >> > > >> > def __matmul__(self, other): > >> > def composition(*args, **kwargs): > >> > return self.func(other(*args, **kwargs)) > >> > return composable(composition) > >> > > >> > I think using such decorator with functions that are going to be > deeply > >> > wrapped > >> > could improve readability. > >> > You could compare (note that only the outermost function should be > >> > decorated): > >> > > >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) > >> > (data_array) > >> > > >> > I think the latter is more readable, also compare > >> > > >> > def sunique(lst): > >> > return sorted(list(set(lst))) > >> > > >> > vs. > >> > > >> > sunique = sorted @ list @ set > >> > > >> > Apart from readability, there are following pros of the proposed > >> > decorator: > >> > > >> > 1. Similar semantics as for matrix multiplication. > >> > 2. Same symbol for composition as for decorators. > >> > 3. The symbol @ resembles mathematical notation for function > >> > composition: ? > >> > > >> > I think it could be a good idea to add such a decorator to the stdlib > >> > functools module. > >> > >> In the astropy.modeling package, which consists largely of collection > >> of fancy wrappers around analytic functions, > >> we used the pipe operator | (that is, __or__) to implement function > >> composition, as demonstrated here: > >> > >> > >> > http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition > >> > >> I do like the idea of using the new @ operator for this purpose--it > >> makes sense as a generalization of linear operators, > >> and it just looks a little more like the circle operator often used > >> for functional composition. On the other hand > >> I'm also fond of the choice to use |, for the similarity to UNIX shell > >> pipe operations, as long as it can't be confused with > >> __or__. Point being something like this could be implemented now with > >> __or__. > >> > >> I think this is simple enough that it doesn't need to be in the > >> stdlib, especially if there are different ways people > >> would like to do this. But I do like the idea. > >> > >> Erik > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed May 6 18:01:26 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 6 May 2015 09:01:26 -0700 Subject: [Python-ideas] (no subject) In-Reply-To: <20150506154811.GM5663@ando.pearwood.info> References: <20150506154811.GM5663@ando.pearwood.info> Message-ID: On Wed, May 6, 2015 at 8:48 AM, Steven D'Aprano wrote: > Compared to asynchronous code, I would say function composition is > trivial. Anyone who can learn the correspondence > > (a @ b)(arg) <=> a(b(arg)) > > can deal with it. Personally, I can certainly "deal" with it, but it'll never come naturally to me. As soon as I see code like this I have to mentally pick it apart and rewrite it in the more familiar form before I understand what's going on. 
Maybe if I needed this frequently I'd learn to fly with it, but I just don't see the need that often. I see things like f().g() much more often than f(g()). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Wed May 6 18:04:10 2015 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 6 May 2015 12:04:10 -0400 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 11:48 AM, Guido van Rossum wrote: > I realize this is still python-ideas, but does this really leave functions > with multiple arguments completely out of the picture (except as the first > stage in the pipeline)? I'm not sure exactly what this is in response to, but in the case of astropy.modeling, any function at any point in the chain can take multiple arguments, as long as the function its is composed with returns the right number of outputs (as a tuple). There is also a "Mapping" object that allows remapping arguments so if for example you want to swap the return values of one function before passing them as inputs to the next function. You can also duplicate outputs, drop outputs, inline new ones, etc. It can get quite arbitrarily complex, and there aren't enough facilities in place yet to visualize complicated function compositions as I would like. But this is already being put to good use. Nothing about this is particular to astropy.modeling--the same approach could be used in a generic function composition operator. Erik > On Wed, May 6, 2015 at 8:42 AM, Erik Bray wrote: >> >> On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi >> wrote: >> > Dear Erik, >> > >> > Thank you for the link! I agree that this idea is too raw for stdlib >> > (there >> > are problems with many argument functions, keyword arguments, etc.) >> > Concerning the shell | vs. matrix @ I think it is a good idea to have >> > both... but with different order. >> > I mean in shell logic f | g means g (f (x)), while for matrix >> > multiplication >> > f @ g means f(g(x)). >> > The former is probably more natural for people with more "programming" >> > background, while the latter is more natural for people with a >> > "scientific" >> > background. >> > We could now do good for both, since we now have a new operator. >> >> Absolutely! I've found that it takes a little work sometimes for >> scientific users to wrap >> their heads around the >> >> g | f >> >> syntax. Once Python 3.5 is out I might add support for "f @ g" as >> well, though I'm wary >> of having more than one way to do it. Worth trying out though, so >> thanks for the idea. >> >> Erik >> >> > On 6 May 2015 at 16:10, Erik Bray wrote: >> >> >> >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi >> >> wrote: >> >> > Dear all, >> >> > >> >> > The matrix multiplication operator @ is going to be introduced in >> >> > Python >> >> > 3.5 >> >> > and I am thinking about the following idea: >> >> > >> >> > The semantics of matrix multiplication is the composition of the >> >> > corresponding linear transformations. >> >> > A linear transformation is a particular example of a more general >> >> > concept - >> >> > functions. >> >> > The latter are frequently composed with ("wrap") each other. For >> >> > example: >> >> > >> >> > plot(real(sqrt(data))) >> >> > >> >> > However, it is not very readable in case of many wrapping layers. 
>> >> > Therefore, >> >> > it could be useful to employ >> >> > the matrix multiplication operator @ for indication of function >> >> > composition. >> >> > This could be done by such (simplified) decorator: >> >> > >> >> > class composable: >> >> > >> >> > def __init__(self, func): >> >> > self.func = func >> >> > >> >> > def __call__(self, arg): >> >> > return self.func(arg) >> >> > >> >> > def __matmul__(self, other): >> >> > def composition(*args, **kwargs): >> >> > return self.func(other(*args, **kwargs)) >> >> > return composable(composition) >> >> > >> >> > I think using such decorator with functions that are going to be >> >> > deeply >> >> > wrapped >> >> > could improve readability. >> >> > You could compare (note that only the outermost function should be >> >> > decorated): >> >> > >> >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ >> >> > real) >> >> > (data_array) >> >> > >> >> > I think the latter is more readable, also compare >> >> > >> >> > def sunique(lst): >> >> > return sorted(list(set(lst))) >> >> > >> >> > vs. >> >> > >> >> > sunique = sorted @ list @ set >> >> > >> >> > Apart from readability, there are following pros of the proposed >> >> > decorator: >> >> > >> >> > 1. Similar semantics as for matrix multiplication. >> >> > 2. Same symbol for composition as for decorators. >> >> > 3. The symbol @ resembles mathematical notation for function >> >> > composition: ? >> >> > >> >> > I think it could be a good idea to add such a decorator to the stdlib >> >> > functools module. >> >> >> >> In the astropy.modeling package, which consists largely of collection >> >> of fancy wrappers around analytic functions, >> >> we used the pipe operator | (that is, __or__) to implement function >> >> composition, as demonstrated here: >> >> >> >> >> >> >> >> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition >> >> >> >> I do like the idea of using the new @ operator for this purpose--it >> >> makes sense as a generalization of linear operators, >> >> and it just looks a little more like the circle operator often used >> >> for functional composition. On the other hand >> >> I'm also fond of the choice to use |, for the similarity to UNIX shell >> >> pipe operations, as long as it can't be confused with >> >> __or__. Point being something like this could be implemented now with >> >> __or__. >> >> >> >> I think this is simple enough that it doesn't need to be in the >> >> stdlib, especially if there are different ways people >> >> would like to do this. But I do like the idea. 
>> >> >> >> Erik >> > >> > >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > --Guido van Rossum (python.org/~guido) From levkivskyi at gmail.com Wed May 6 18:10:22 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 18:10:22 +0200 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: Dear Guido, My original idea was to make the composable functions auto-curried (similar to proposed here http://code.activestate.com/recipes/52902-function-composition/ as pointed out by Steve) so that my_fun = square @ add(1) my_fun(x) evaluates to square(add(1,x)) On 6 May 2015 at 17:48, Guido van Rossum wrote: > I realize this is still python-ideas, but does this really leave functions > with multiple arguments completely out of the picture (except as the first > stage in the pipeline)? > > On Wed, May 6, 2015 at 8:42 AM, Erik Bray wrote: > >> On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi >> wrote: >> > Dear Erik, >> > >> > Thank you for the link! I agree that this idea is too raw for stdlib >> (there >> > are problems with many argument functions, keyword arguments, etc.) >> > Concerning the shell | vs. matrix @ I think it is a good idea to have >> > both... but with different order. >> > I mean in shell logic f | g means g (f (x)), while for matrix >> multiplication >> > f @ g means f(g(x)). >> > The former is probably more natural for people with more "programming" >> > background, while the latter is more natural for people with a >> "scientific" >> > background. >> > We could now do good for both, since we now have a new operator. >> >> Absolutely! I've found that it takes a little work sometimes for >> scientific users to wrap >> their heads around the >> >> g | f >> >> syntax. Once Python 3.5 is out I might add support for "f @ g" as >> well, though I'm wary >> of having more than one way to do it. Worth trying out though, so >> thanks for the idea. >> >> Erik >> >> > On 6 May 2015 at 16:10, Erik Bray wrote: >> >> >> >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi >> >> wrote: >> >> > Dear all, >> >> > >> >> > The matrix multiplication operator @ is going to be introduced in >> Python >> >> > 3.5 >> >> > and I am thinking about the following idea: >> >> > >> >> > The semantics of matrix multiplication is the composition of the >> >> > corresponding linear transformations. >> >> > A linear transformation is a particular example of a more general >> >> > concept - >> >> > functions. >> >> > The latter are frequently composed with ("wrap") each other. For >> >> > example: >> >> > >> >> > plot(real(sqrt(data))) >> >> > >> >> > However, it is not very readable in case of many wrapping layers. >> >> > Therefore, >> >> > it could be useful to employ >> >> > the matrix multiplication operator @ for indication of function >> >> > composition. 
>> >> > This could be done by such (simplified) decorator: >> >> > >> >> > class composable: >> >> > >> >> > def __init__(self, func): >> >> > self.func = func >> >> > >> >> > def __call__(self, arg): >> >> > return self.func(arg) >> >> > >> >> > def __matmul__(self, other): >> >> > def composition(*args, **kwargs): >> >> > return self.func(other(*args, **kwargs)) >> >> > return composable(composition) >> >> > >> >> > I think using such decorator with functions that are going to be >> deeply >> >> > wrapped >> >> > could improve readability. >> >> > You could compare (note that only the outermost function should be >> >> > decorated): >> >> > >> >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ >> real) >> >> > (data_array) >> >> > >> >> > I think the latter is more readable, also compare >> >> > >> >> > def sunique(lst): >> >> > return sorted(list(set(lst))) >> >> > >> >> > vs. >> >> > >> >> > sunique = sorted @ list @ set >> >> > >> >> > Apart from readability, there are following pros of the proposed >> >> > decorator: >> >> > >> >> > 1. Similar semantics as for matrix multiplication. >> >> > 2. Same symbol for composition as for decorators. >> >> > 3. The symbol @ resembles mathematical notation for function >> >> > composition: ? >> >> > >> >> > I think it could be a good idea to add such a decorator to the stdlib >> >> > functools module. >> >> >> >> In the astropy.modeling package, which consists largely of collection >> >> of fancy wrappers around analytic functions, >> >> we used the pipe operator | (that is, __or__) to implement function >> >> composition, as demonstrated here: >> >> >> >> >> >> >> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition >> >> >> >> I do like the idea of using the new @ operator for this purpose--it >> >> makes sense as a generalization of linear operators, >> >> and it just looks a little more like the circle operator often used >> >> for functional composition. On the other hand >> >> I'm also fond of the choice to use |, for the similarity to UNIX shell >> >> pipe operations, as long as it can't be confused with >> >> __or__. Point being something like this could be implemented now with >> >> __or__. >> >> >> >> I think this is simple enough that it doesn't need to be in the >> >> stdlib, especially if there are different ways people >> >> would like to do this. But I do like the idea. >> >> >> >> Erik >> > >> > >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed May 6 18:15:58 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 7 May 2015 02:15:58 +1000 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On Thu, May 7, 2015 at 2:10 AM, Ivan Levkivskyi wrote: > Dear Guido, > > My original idea was to make the composable functions auto-curried (similar > to proposed here > http://code.activestate.com/recipes/52902-function-composition/ as pointed > out by Steve) so that > > my_fun = square @ add(1) > my_fun(x) > > evaluates to > > square(add(1,x)) Hmm. This would require that your composable functions autocurry, which may be tricky to do in the general case. 
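For illustration, a very naive sketch of the auto-curry half (all names invented); the caveats in the comments are exactly where the general case gets tricky:

    import functools
    import inspect

    def curried(func):
        # Very naive auto-curry: collect arguments until func's arity is met.
        arity = len(inspect.signature(func).parameters)

        def gather(*args):
            if len(args) >= arity:
                return func(*args)
            return functools.partial(gather, *args)

        return functools.wraps(func)(gather)

    @curried
    def add(a, b):
        return a + b

    add_one = add(1)        # partially applied
    assert add_one(5) == 6
    # inspect.signature() fails for many builtins, and *args/**kwargs have
    # no fixed arity -- that is the "general case" where this breaks down.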
It also requires that the right hand function be composable, unlike in your earlier example. ChrisA From steve at pearwood.info Wed May 6 18:17:55 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 7 May 2015 02:17:55 +1000 Subject: [Python-ideas] (no subject) In-Reply-To: References: <20150506154811.GM5663@ando.pearwood.info> Message-ID: <20150506161754.GN5663@ando.pearwood.info> On Wed, May 06, 2015 at 09:01:26AM -0700, Guido van Rossum wrote: > On Wed, May 6, 2015 at 8:48 AM, Steven D'Aprano wrote: > > > Compared to asynchronous code, I would say function composition is > > trivial. Anyone who can learn the correspondence > > > > (a @ b)(arg) <=> a(b(arg)) > > > > can deal with it. > > > Personally, I can certainly "deal" with it, but it'll never come naturally > to me. As soon as I see code like this I have to mentally pick it apart and > rewrite it in the more familiar form before I understand what's going on. Yes, I remember you dislike reduce() as well :-) > Maybe if I needed this frequently I'd learn to fly with it, but I just > don't see the need that often. I see things like f().g() much more often > than f(g()). Perhaps it has something to do with my background in mathematics, I see things like f(g()) all the time. I will admit that, going back to my teens in high school, it took a little while for the "f o g" notation to really sink in. I remember writing study notes to learn it. But I still have to look up the order of list.insert() every single time I use it, and I can never remember whether functools.partial applies its arguments from the left or the right, and let's not even mention list comps with more than one "for" clause. Some things just never become entirely comfortable to some people. I look forward to many weeks or months trying to wrap my head around asyncronous programming too :-) -- Steve From ericsnowcurrently at gmail.com Wed May 6 18:23:09 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 6 May 2015 10:23:09 -0600 Subject: [Python-ideas] discouraging direct use of the C-API Message-ID: A big blocker to making certain sweeping changes to CPython (e.g. ref-counting) is compatibility with the vast body of C extension modules out there that use the C-API. While there are certainly drastic long-term solutions to that problem, there is one thing we can do in the short-term that would at least get the ball rolling. We can put a big red note at the top of every page of the C-API docs that encourages folks to either use CFFI or Cython. Thoughts? -eric From levkivskyi at gmail.com Wed May 6 18:23:25 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 18:23:25 +0200 Subject: [Python-ideas] (no subject) Message-ID: I should clarify why I would like to have the possibility to easily compose functions. I am a physicist (not a real programmer), and in my code I often compose functions. To do this I need to write something like def new_func(x): return f(g(h(x))) This means I see f(g(h())) quite often and I would prefer to see f @ g @ h instead. > > Compared to asynchronous code, I would say function composition is > > trivial. Anyone who can learn the correspondence > > > > (a @ b)(arg) <=> a(b(arg)) > > > > can deal with it. > > > Personally, I can certainly "deal" with it, but it'll never come naturally > to me. As soon as I see code like this I have to mentally pick it apart and > rewrite it in the more familiar form before I understand what's going on. 
> > Maybe if I needed this frequently I'd learn to fly with it, but I just > don't see the need that often. I see things like f().g() much more often > than f(g()). > > -- > --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Wed May 6 18:36:29 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 06 May 2015 18:36:29 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: Eric Snow schrieb am 06.05.2015 um 18:23: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. I've been advocating that for years now: leave the low-level stuff to the experts. (There's a reason why Cython code is usually faster than C-API code.) Not sure how big, fat and red the warning needs to be, but a big +1 from me. Stefan From guido at python.org Wed May 6 18:41:10 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 6 May 2015 09:41:10 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 9:23 AM, Eric Snow wrote: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. > > Thoughts? > I think Cython is already used by those people who benefit from it. As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have some really outdated versions in the CPython tree and nobody wants to step in and upgrade these to the latest CFFI, for some reason (such as that that would actually break a lot of code because the latest version is so different from the version we currently include?). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Wed May 6 18:45:05 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 6 May 2015 18:45:05 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 6:36 PM, Stefan Behnel wrote: > Eric Snow schrieb am 06.05.2015 um 18:23: >> A big blocker to making certain sweeping changes to CPython (e.g. >> ref-counting) is compatibility with the vast body of C extension >> modules out there that use the C-API. While there are certainly >> drastic long-term solutions to that problem, there is one thing we can >> do in the short-term that would at least get the ball rolling. We can >> put a big red note at the top of every page of the C-API docs that >> encourages folks to either use CFFI or Cython. > > I've been advocating that for years now: leave the low-level stuff to the > experts. (There's a reason why Cython code is usually faster than C-API code.) 
> > Not sure how big, fat and red the warning needs to be, but a big +1 from me. Probably not too big. Cython and CFFI are easier to use, so people who know about them, and can afford the extra dependency*, should use them. I a pointer would be enough, perhaps like in https://docs.python.org/3/library/urllib.request.html * Possibly build-time dependency. Or a complete rewrite, in case of existing code. From levkivskyi at gmail.com Wed May 6 18:55:16 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 18:55:16 +0200 Subject: [Python-ideas] Add 'composable' decorator to functools Message-ID: Dear Chris, > > My original idea was to make the composable functions auto-curried (similar > > to proposed here > > http://code.activestate.com/recipes/52902-function-composition/ as pointed > > out by Steve) so that > > > > my_fun = square @ add(1) > > my_fun(x) > > > > evaluates to > > > > square(add(1,x)) > It also requires that > the right hand function be > composable, unlike in your earlier > example. This is true. One can only use single argument "normal" functions. Multiple argument ones should be made "composable". > > ChrisA > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed May 6 18:57:15 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2015 18:57:15 +0200 Subject: [Python-ideas] discouraging direct use of the C-API References: Message-ID: <20150506185715.2083b063@fsol> On Wed, 6 May 2015 10:23:09 -0600 Eric Snow wrote: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. CFFI is only useful for a small subset of stuff people use the C API for (mainly, thin wrappers around external libraries). Cython is a more reasonable suggestion in this context. I would advocate against red warning boxes. Warnings are for potentially dangerous constructs, we use them mainly for security issues. Adding a note and some pointers at the start of the C API docs may be enough. Regards Antoine. From donald at stufft.io Wed May 6 19:13:57 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 6 May 2015 13:13:57 -0400 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <20150506185715.2083b063@fsol> References: <20150506185715.2083b063@fsol> Message-ID: <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io> > On May 6, 2015, at 12:57 PM, Antoine Pitrou wrote: > > On Wed, 6 May 2015 10:23:09 -0600 > Eric Snow > wrote: >> A big blocker to making certain sweeping changes to CPython (e.g. >> ref-counting) is compatibility with the vast body of C extension >> modules out there that use the C-API. While there are certainly >> drastic long-term solutions to that problem, there is one thing we can >> do in the short-term that would at least get the ball rolling. We can >> put a big red note at the top of every page of the C-API docs that >> encourages folks to either use CFFI or Cython. > > CFFI is only useful for a small subset of stuff people use the C API for > (mainly, thin wrappers around external libraries). Cython is a more > reasonable suggestion in this context. 
You can write stuff in C itself for cffi too, it?s not just for C bindings, an example would be the .c?s and .h?s for padding and constant time compare in the cryptography project [1]. [1] https://github.com/pyca/cryptography/tree/master/src/cryptography/hazmat/primitives/src --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From p.f.moore at gmail.com Wed May 6 20:44:25 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 May 2015 19:44:25 +0100 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: On 6 May 2015 at 17:23, Ivan Levkivskyi wrote: > I should clarify why I would like to have the possibility to easily compose > functions. > I am a physicist (not a real programmer), and in my code I often compose > functions. > > To do this I need to write something like > > def new_func(x): > return f(g(h(x))) > > This means I see f(g(h())) quite often and I would prefer to see f @ g @ h > instead. I appreciate that it's orthogonal to the proposal, but would a utility function like this be useful? def compose(*fns): def composed(x): for f in reversed(fns): x = f(x) return x return composed comp = compose(f, g, h) # comp(x) = f(g(h(x))) Paul From p.f.moore at gmail.com Wed May 6 20:56:50 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 May 2015 19:56:50 +0100 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On 6 May 2015 at 17:41, Guido van Rossum wrote: > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have > some really outdated versions in the CPython tree and nobody wants to step > in and upgrade these to the latest CFFI, for some reason (such as that that > would actually break a lot of code because the latest version is so > different from the version we currently include?). I think you are referring to libffi (used by ctypes) here rather than cffi. Libffi isn't really relevant to the original topic, but the big issue there is that ctypes on Windows uses a patched copy of a pretty old version of libffi. The reason it does is to trap stack usage errors so that it can give a ValueError if you call a C function with the wrong number of arguments. libffi doesn't offer a way to do this, so migrating to the latest libffi would mean hacking out that code, and a loss of the stack checking (which is currently tested for, so although it's not exactly an API guarantee, it would be a compatibility break). Otherwise, it's mostly a matter of getting the build steps to work. Zach Ware got 32-bit builds going, and his approach (use git bash to run configure and keep a copy of the results) should in principle be fine for 64-bit, but I stalled because I've no way of testing a libffi build short of building the whole of Python and running the ctypes tests, which is both heavy handed and likely to obscure the root cause of any actual bugs found that way :-( The big problem is that ctypes with the current embedded libffi is "good enough" on Windows, which is where the bulk of ctypes usage occurs. So there's no compelling reason to put in the work to upgrade it. And (as usual) few people with the necessary expertise (in this case, Windows, C, Unix-style build processes, and assembler-level calling conventions - a pretty impressive mix!). 
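For what it's worth, the argument-count checking described above is easy to see at the interactive prompt on a 32-bit Windows build; the snippet below is only an illustration of that behaviour, using a standard kernel32 call, and it does not apply to 64-bit builds where stdcall and cdecl are the same.

    import ctypes

    kernel32 = ctypes.windll.kernel32   # stdcall functions, 32-bit Windows only
    kernel32.GetModuleHandleA(None)     # one argument, as expected: works
    kernel32.GetModuleHandleA(None, 0)  # extra argument: raises ValueError
                                        # ("Procedure probably called with
                                        # too many arguments")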
Paul From guido at python.org Wed May 6 21:28:35 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 6 May 2015 12:28:35 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 11:56 AM, Paul Moore wrote: > On 6 May 2015 at 17:41, Guido van Rossum wrote: > > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have > > some really outdated versions in the CPython tree and nobody wants to > step > > in and upgrade these to the latest CFFI, for some reason (such as that > that > > would actually break a lot of code because the latest version is so > > different from the version we currently include?). > > I think you are referring to libffi (used by ctypes) here rather than cffi. > Oh dear. I think you're right. :-( Forget I said anything. Naming is hard. I'm still not sure how realistic it is to try and deprecate the C API. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaiser.yann at gmail.com Wed May 6 21:32:54 2015 From: kaiser.yann at gmail.com (Yann Kaiser) Date: Wed, 06 May 2015 19:32:54 +0000 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On Wed, 6 May 2015 at 09:10 Ivan Levkivskyi wrote: > Dear Guido, > > My original idea was to make the composable functions auto-curried > (similar to proposed here > http://code.activestate.com/recipes/52902-function-composition/ as > pointed out by Steve) so that > > my_fun = square @ add(1) > my_fun(x) > > evaluates to > > square(add(1,x)) > This breaks the (IMO) fundamental expectation that z = add(1) my_fun = square @ z is equivalent to my_fun = square @ add(1) -Yann -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at the-compiler.org Wed May 6 21:46:25 2015 From: me at the-compiler.org (Florian Bruhin) Date: Wed, 6 May 2015 21:46:25 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <20150506194625.GO429@tonks> * Guido van Rossum [2015-05-06 12:28:35 -0700]: > On Wed, May 6, 2015 at 11:56 AM, Paul Moore wrote: > > > On 6 May 2015 at 17:41, Guido van Rossum wrote: > > > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have > > > some really outdated versions in the CPython tree and nobody wants to > > step > > > in and upgrade these to the latest CFFI, for some reason (such as that > > that > > > would actually break a lot of code because the latest version is so > > > different from the version we currently include?). > > > > I think you are referring to libffi (used by ctypes) here rather than cffi. > > > > Oh dear. I think you're right. :-( > > Forget I said anything. Naming is hard. > > I'm still not sure how realistic it is to try and deprecate the C API. I don't think it should be *deprecated*, but I agree the documentation should point out the (probably better) alternatives. From time to time, in the #python IRC channel there are people who want to use C libraries with Python and try doing so with the C API because they aren't aware of the alternatives. Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From jsbueno at python.org.br Wed May 6 21:49:04 2015 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Wed, 6 May 2015 16:49:04 -0300 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On 6 May 2015 at 16:28, Guido van Rossum wrote: > On Wed, May 6, 2015 at 11:56 AM, Paul Moore wrote: >> >> On 6 May 2015 at 17:41, Guido van Rossum wrote: >> > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have >> > some really outdated versions in the CPython tree and nobody wants to >> > step >> > in and upgrade these to the latest CFFI, for some reason (such as that >> > that >> > would actually break a lot of code because the latest version is so >> > different from the version we currently include?). >> >> I think you are referring to libffi (used by ctypes) here rather than >> cffi. > > > Oh dear. I think you're right. :-( > > Forget I said anything. Naming is hard. > > I'm still not sure how realistic it is to try and deprecate the C API. I am also not sure, but I feel like it would be a __huge__ step forward for other implementations and Python as a language instead of the particular cPython Software Product. Today, many people still see using the C API as "the way" to extend Python, which implies in most extensions created being invalid for all other implementations of the language. (Ok, I actually don't know if cython modules could be called from, say Pypy or Jython, but even if not, I suppose a "jcython" and "pycython" could be made available in the future) (fist-google-entry says cython has at least partial support for pypy already) ? js -><- From kaiser.yann at gmail.com Wed May 6 21:50:41 2015 From: kaiser.yann at gmail.com (Yann Kaiser) Date: Wed, 06 May 2015 19:50:41 +0000 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On Wed, 6 May 2015 at 08:49 Guido van Rossum wrote: > I realize this is still python-ideas, but does this really leave functions > with multiple arguments completely out of the picture (except as the first > stage in the pipeline)? > To provide some alternative ideas, what I did in sigtools.wrappers.Combination[1] was to replace the first argument with the return value of the previous call while always using the same remaining (positional and keyword) arguments. In code: def __call__(self, arg, *args, **kwargs): for function in self.functions: arg = function(arg, *args, **kwargs) return arg With this you can even use functions that use different parameters, at the cost of less strictness: def func1(arg, *, kw1, **kwargs): ... def func2(arg, *, kw2, **kwargs): ... That class is more of a demo for sigtools.signatures.merge[2] rather than something spawned out of a need however. [1] http://sigtools.readthedocs.org/en/latest/#sigtools.wrappers.Combination [2] http://sigtools.readthedocs.org/en/latest/#sigtools.signatures.merge -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed May 6 21:51:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 12:51:29 -0700 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com> On May 6, 2015, at 08:05, Ivan Levkivskyi wrote: > > Dear Andrew, > > Thank you for pointing out the previous discussion, I have overlooked it. 
(Btw, I have found your post about the infix operators, that is a great idea). Well, nobody else seemed to like that idea, which may be a warning sign about this one. :) > Also, It turns out that astropy uses a very similar idea for function composition. > > I agree that there are indeed to much ambiguities about the "right way", and thus it is not good for stdlib. However, implementing only one decorator as a third-party library is not good idea as well. > You are right that no one will install such library. Probably, it would be better to combine it with other functionality like @infix (via overloading __or__ or __rshift__), @auto_curry, etc. Actually, many of the implementations on PyPI are part of "miscellaneous functional tools" libraries that do combine it with such things. And they still have practically no users. There are plenty of libraries that, despite being on PyPI and not mentioned anywhere in the standard docs, still have a lot of users. In fact, much of what's in the Python stdlib today (json, sqlite3, ElementTree, statistics, enum, multiprocessing, ...) started off that way. And there may be more people using requests or NumPy or Django than a lot of parts of the stdlib. "Nobody will use it unless it's in the stdlib" doesn't cut it anymore in the days of most Python installations including pip, the stdlib docs referencing libraries on PyPI, etc. If something isn't getting traction on PyPI, either people really don't want it--in which case there's nothing to do--or someone really needs to evangelize it--in which case you should start doing that, rather than proposing yet another implementation that will just gather dust. Finally, I think you've ignored an important part of my message--which is probably my fault for not making it clearer. Code that deals in abstract functional terms is harder for many people to think about. Not just novices (unless you want to call Guido a novice). Languages that make it easier to write such code are harder languages to read. So, making it easier to write such code in Python may not be a win. And the reason I brought up all those other abstract features in Haskell is that they tie together with composition very closely. Most of the best examples anyone can come up with for how compose makes code easier to read also include curried functions, operator sections, composing the apply operator itself, and so on. They're all really cool ideas that can simplify your logic--but only if you're willing to think on that more abstract plane. Adding all of that to Python would make it harder to learn. Not adding it to Python would make compose not very useful. (Which is why the various implementations are languishing without users.) > Thank you for the feedback! > > >> On 6 May 2015 at 15:59, Andrew Barnert wrote: >> This was discussed when the proposal to add @ for matrix multiplication came up, so you should first read that thread and make sure you have answers to all of the issues that came up before proposing it again. >> >> Off the top of my head: >> >> Python functions don't just take 1 parameter, they take any number of parameters, possibly including optional parameters, keyword-only, *args, **kwargs, etc. There are a dozen different compose implementations on PyPI and ActiveState that handle these differently. Which one is "right"? >> >> The design you describe can be easily implemented as a third-party library. Why not do so, put it on PyPI, see if you get any traction and any ideas for improvement, and then suggest it for the stdlib? 
>> >> The same thing is already doable today using a different operator--and, again, there are a dozen implementations. Why isn't anyone using them? >> >> Thinking in terms of function composition requires a higher level of abstraction than thinking in terms of lambda expressions. That's one of the reasons people perceive Haskell to be a harder language to learn than Lisp or Python. Of course learning Haskell is rewarding--but being easy to learn is one of Python's major strengths. >> >> Python doesn't have a static optimizing compiler that can avoid building 4 temporary function objects to evaluate (plot @ sorted @ sqrt @ real) (data_array), so it will make your code significantly less efficient. >> >> Is @ for composition and () for application really sufficient to write point free code in general without auto-curried functions, operator sectioning, reverse compose, reverse apply, etc.? Most of the examples people use in describing the feature from Haskell have a (+ 1) or (== x) or take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead of (a->b, [a]) -> [b]. >> >> Sent from my iPhone >> >> > On May 6, 2015, at 06:15, Ivan Levkivskyi wrote: >> > >> > Dear all, >> > >> > The matrix multiplication operator @ is going to be introduced in Python 3.5 and I am thinking about the following idea: >> > >> > The semantics of matrix multiplication is the composition of the corresponding linear transformations. >> > A linear transformation is a particular example of a more general concept - functions. >> > The latter are frequently composed with ("wrap") each other. For example: >> > >> > plot(real(sqrt(data))) >> > >> > However, it is not very readable in case of many wrapping layers. Therefore, it could be useful to employ >> > the matrix multiplication operator @ for indication of function composition. This could be done by such (simplified) decorator: >> > >> > class composable: >> > >> > def __init__(self, func): >> > self.func = func >> > >> > def __call__(self, arg): >> > return self.func(arg) >> > >> > def __matmul__(self, other): >> > def composition(*args, **kwargs): >> > return self.func(other(*args, **kwargs)) >> > return composable(composition) >> > >> > I think using such decorator with functions that are going to be deeply wrapped >> > could improve readability. >> > You could compare (note that only the outermost function should be decorated): >> > >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) (data_array) >> > >> > I think the latter is more readable, also compare >> > >> > def sunique(lst): >> > return sorted(list(set(lst))) >> > >> > vs. >> > >> > sunique = sorted @ list @ set >> > >> > Apart from readability, there are following pros of the proposed decorator: >> > >> > 1. Similar semantics as for matrix multiplication. >> > 2. Same symbol for composition as for decorators. >> > 3. The symbol @ resembles mathematical notation for function composition: ? >> > >> > I think it could be a good idea to add such a decorator to the stdlib functools module. >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Wed May 6 21:57:57 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 12:57:57 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <2612484C-4BA0-460E-8913-2F294310107B@yahoo.com> On May 6, 2015, at 12:49, Joao S. O. Bueno wrote: > >> On 6 May 2015 at 16:28, Guido van Rossum wrote: >>> On Wed, May 6, 2015 at 11:56 AM, Paul Moore wrote: >>> >>>> On 6 May 2015 at 17:41, Guido van Rossum wrote: >>>> As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have >>>> some really outdated versions in the CPython tree and nobody wants to >>>> step >>>> in and upgrade these to the latest CFFI, for some reason (such as that >>>> that >>>> would actually break a lot of code because the latest version is so >>>> different from the version we currently include?). >>> >>> I think you are referring to libffi (used by ctypes) here rather than >>> cffi. >> >> >> Oh dear. I think you're right. :-( >> >> Forget I said anything. Naming is hard. >> >> I'm still not sure how realistic it is to try and deprecate the C API. > > I am also not sure, but I feel like it would be a __huge__ step > forward for other implementations > and Python as a language instead of the particular cPython Software Product. > > Today, many people still see using the C API as "the way" to extend Python, > which implies in most extensions created being invalid for all other > implementations of the language. > > (Ok, I actually don't know if cython modules > could be called from, say Pypy or Jython, but even if not, I suppose > a "jcython" and "pycython" could be made available in the future) > (fist-google-entry says cython has at least partial support for pypy already) Yes, but PyPy also has pretty good support for C API extensions, too. For Jython and IronPython, I'm not sure what the answer could be. Could Cython automatically build JNI thingies and wrap them up in Java thingies to expose them to Jython even in theory? Even if that worked, what about something like Skulpt? Compile to C code and jsctypes wrappers? So I think there's a limit to what you can expect out of "extending Python the language" in any generic way. > ? > js > -><- > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Wed May 6 21:59:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 12:59:16 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On May 6, 2015, at 09:23, Eric Snow wrote: > > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. Does this mean you also want to discourage boost::python, SIP, SWIG, etc., which as far as I know come down to automatically building C API extensions, and would need to be completely rewritten if you wanted to make them work a different way? > Thoughts? 
> > -eric > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Wed May 6 22:10:10 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 13:10:10 -0700 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: On May 6, 2015, at 12:50, Yann Kaiser wrote: > >> On Wed, 6 May 2015 at 08:49 Guido van Rossum wrote: >> I realize this is still python-ideas, but does this really leave functions with multiple arguments completely out of the picture (except as the first stage in the pipeline)? > > To provide some alternative ideas, what I did in sigtools.wrappers.Combination[1] was to replace the first argument with the return value of the previous call while always using the same remaining (positional and keyword) arguments This is exactly my point about there not being one obvious right answer for dealing with multiple arguments. Among the choices are: * Don't allow them. * Auto-*-unpack return values into multiple arguments. * Compose on the first argument, pass others along the chain. * Auto-curry and auto-partial everywhere. * Auto-curry and auto-partial and _also_ auto-*-unpack (which I'd never considered, but it sounds like that's what this thread proposes). They've all got uses. But if you're going to write _the_ compose function, it has to pick one. Also, keep in mind that "auto-curry and auto-partial everything" can't really mean "everything"--unlike Haskell, Python can't partial operator expressions, and we've still got *args and keyword-only params and **kw, and we've got C functions that aren't argclinic'd yet, and so on. (If that all seems like obscure edge cases, consider that most proxy implementations, forwarding functions, etc. work by just taking and passing along *args, **kw, not by inspecting and binding the signature of the wrappee.) > . In code: > > def __call__(self, arg, *args, **kwargs): > for function in self.functions: > arg = function(arg, *args, **kwargs) > return arg > > With this you can even use functions that use different parameters, at the cost of less strictness: > > def func1(arg, *, kw1, **kwargs): > ... > > def func2(arg, *, kw2, **kwargs): > ... > > That class is more of a demo for sigtools.signatures.merge[2] rather than something spawned out of a need however. > > [1] http://sigtools.readthedocs.org/en/latest/#sigtools.wrappers.Combination > [2] http://sigtools.readthedocs.org/en/latest/#sigtools.signatures.merge > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Wed May 6 22:38:11 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 06 May 2015 21:38:11 +0100 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On 06/05/2015 19:56, Paul Moore wrote: > > Otherwise, it's mostly a matter of getting the build steps to work. 
> Zach Ware got 32-bit builds going, and his approach (use git bash to > run configure and keep a copy of the results) should in principle be > fine for 64-bit, but I stalled because I've no way of testing a libffi > build short of building the whole of Python and running the ctypes > tests, which is both heavy handed and likely to obscure the root cause > of any actual bugs found that way :-( > > Paul Feel free to throw this or any other Windows issues my way. I've all the time in the world to try stuff like this. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From abarnert at yahoo.com Wed May 6 22:40:18 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 13:40:18 -0700 Subject: [Python-ideas] (no subject) In-Reply-To: <20150506154811.GM5663@ando.pearwood.info> References: <20150506154811.GM5663@ando.pearwood.info> Message-ID: <26EE1D3C-0496-41C4-98D8-6D918FEF578A@yahoo.com> Apologies for the split replies; is everyone else seeing this as three separate threads spawned from two copies of the original mail, or is this just Yahoo sucking again? On May 6, 2015, at 08:48, Steven D'Aprano wrote: > >> On Wed, May 06, 2015 at 06:59:45AM -0700, Andrew Barnert via Python-ideas wrote: >> >> Python functions don't just take 1 parameter, they take any number of >> parameters, possibly including optional parameters, keyword-only, >> *args, **kwargs, etc. > > Maybe Haskell programmers are used to functions which all take one > argument, and f(a, b, c) is syntactic sugar for f(a)(b)(c), but I doubt > anyone else is. But that's exactly the point. In Haskell, because f(a, b, c) is syntactic sugar for f(a)(b)(c), it's obvious (to a Haskell programmer, not to a human) what it means to compose f. In Python, it's not at all obvious. Or, worse, it _is_ obvious that you just can't compose f with anything. There is no function whose return value can be passed to a function that takes 3 arguments. (Unless you add some extra rule, like auto-*-unpacking, or passing along *args[1:], **kw up the chain, or you do like Haskell and auto-curry everything.) Which makes composition far less useful in Python than in Haskell. > When we Python programmers manually compose a function > today, by writing an expression or a new function, we have to deal with > the exact same problems. Yes, but we can explicitly decide what to pass for the b and c arguments when writing an expression, and the obvious way to encode that decision is trivially readable. For example: f(g(a), b, c) f(*g(a, b, c)) f(g(a)) # using default values for b and c ... etc. I can't think of any syntax for compose that makes that true. > There's nothing new about the programmer > needing to ensure that the function signatures are compatible: > > def spam(a, b, c): > return a+b+c > > def eggs(x, y, z): > return x*y/z > > def composed(*args): > return eggs(spam(*args)) # doesn't work > > It is the programmer's responsibility to compose compatible functions. > Why should it be a fatal flaw that the same limitation applies to a > composition operator? Sure, just as bad for a compose function as for a compose operator. I'm not suggesting we should add composed to the stdlib instead of @, I'm suggesting we should add neither. > Besides, with Argument Clinic, it's possible that the @ operator could > catch incompatible signatures ahead of time. > > >> There are a dozen different compose >> implementations on PyPI and ActiveState that handle these differently. 
> > That is good evidence that this is functionality that people want. Not if nobody is using any of those implementations. >> Which one is "right"? > > Perhaps all of them? Perhaps none of them? There are lots of buggy or > badly designed functions and classes on the internet. Perhaps that > suggests that the std lib should solve it right once and for all. It's not that they're buggy, it's that there are fundamental design choices that have to be made to fit compose into a language like Python, and none of the options are good enough to be standardized. One project may have good uses for a compose that passes along extra args, another for a compose that *-unpacks, and another for auto-currying. Providing one of those won't help the other two projects at all; all it'll do is collide with the name they wanted to use. >> The design you describe can be easily implemented as a third-party >> library. Why not do so, put it on PyPI, see if you get any traction >> and any ideas for improvement, and then suggest it for the stdlib? > > I agree that this idea needs to have some real use before it can be > added to the std lib, but see below for a counter-objection to the PyPI > objection. > > >> The same thing is already doable today using a different >> operator--and, again, there are a dozen implementations. Why isn't >> anyone using them? > > It takes a certain amount of effort for people to discover and use a > third-party library: one has to find a library, or libraries, determine > that it is mature, decide which competing library to use, determine if > the licence is suitable, download and install it. This "activiation > energy" is insignificant if the library does something big, say, like > numpy, or nltk, or even medium sized. > > But for a library that provides effectively a single function, that > activation energy is a barrier to entry. It's not that the function > isn't useful, or that people wouldn't use it if it were already > available. It's just that the effort to get it is too much bother. > People will do without, or re-invent the wheel. (Re-inventing the wheel > is at least fun. Searching PyPI and reading licences is not.) > > >> Thinking in terms of function composition requires a higher level of >> abstraction than thinking in terms of lambda expressions. > > Do you think its harder than, say, the "async for" feature that's just > been approved by Guido? That's not a fair comparison. Writing proper c10k network code is hard. An extra layer of abstraction that you have to get your head around, but that makes it a lot easier once you do, is a clear win. Calling functions with the result of other functions is easy. An extra layer of abstraction that you have to get your head around, but that makes it possible to write slightly more concise or elegant code once you do, is probably not a win. When I first come back to Python after a bit of time with another language, I have to shift gears and stop reducing compositions and instead loop over explicitly composed expressions, but that means I write more Pythonic code. I don't want Python to enable me to write code that Guido can't understand (unless it's something inherently complex in the first place). > Compared to asynchronous code, I would say function composition is > trivial. Anyone who can learn the correspondence > > (a @ b)(arg) <=> a(b(arg)) > > can deal with it. 
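Concretely, the quoted correspondence needs only a few lines of machinery; the class below is an illustrative sketch for single-argument callables, not the proposed implementation.

    class composable:
        def __init__(self, func):
            self.func = func
        def __call__(self, arg):
            return self.func(arg)
        def __matmul__(self, other):
            return composable(lambda arg: self.func(other(arg)))

    a = composable(lambda x: x + 1)
    b = composable(lambda x: x * 2)
    assert (a @ b)(10) == a(b(10)) == 21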
> > >> Python doesn't have a static optimizing compiler that can avoid >> building 4 temporary function objects to evaluate (plot @ sorted @ >> sqrt @ real) (data_array), so it will make your code significantly >> less efficient. > > Why would it necessarily have to create 4 temporary function objects? > Besides, the rules for optimization apply here too: don't dismiss > something as too slow until you've measured it :-) It's not so much creating the 4 temporary objects as having to call through them every time you want to call the composed function. And, while I obviously haven't measured the vaporware implementation of the current proposal, I have played around with different ways of writing Haskelly code in Python and how they fare without a GHC-style optimizer, and the performance impact is definitely noticeable in almost everything you do. Of course it's possible that the very nature of "playing" vs. writing production code means that I was pushing it a lot farther than anyone would do in real life (nobody's going to invent and apply new combinators in production code, even in Haskell...), so I'll concede that maybe this wouldn't be a problem. But I suspect it will be. > We shouldn't care about the cost of the @ operator itself, only the cost > of calling the composed functions. Building the Composed object > generally happens only once, while calling it generally happens many > times. > > >> Is @ for composition and () for application really sufficient to write >> point free code in general without auto-curried functions, operator >> sectioning, reverse compose, reverse apply, etc.? Most of the examples >> people use in describing the feature from Haskell have a (+ 1) or (== >> x) or take advantage of map-type functions being (a->b) -> ([a] -> >> [b]) instead of (a->b, [a]) -> [b]. > > See, now *that's* why people consider Haskell to be difficult: Hold on. (+ 1) meaning lambda x: x + 1 doesn't require any abstruse graduate-level math. Understanding auto-currying map functions might... But that's kind of my point: most of the best examples for compose involve exactly these kinds of things that you don't want to even try to understand. And notice that the author of this current proposal thinks we should add the same thing to Python. Doesn't that make you worry that maybe compose belongs to the wrong universe? > it is > based on areas of mathematics which even maths graduates may never have > come across. But function composition is taught in high school. (At > least in Australia, and I expect Europe and Japan.) It's a nice, simple > and useful functional tool, like partial. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From me at the-compiler.org Wed May 6 22:58:12 2015 From: me at the-compiler.org (Florian Bruhin) Date: Wed, 6 May 2015 22:58:12 +0200 Subject: [Python-ideas] (no subject) In-Reply-To: <26EE1D3C-0496-41C4-98D8-6D918FEF578A@yahoo.com> References: <20150506154811.GM5663@ando.pearwood.info> <26EE1D3C-0496-41C4-98D8-6D918FEF578A@yahoo.com> Message-ID: <20150506205812.GP429@tonks> * Andrew Barnert via Python-ideas [2015-05-06 13:40:18 -0700]: > Apologies for the split replies; is everyone else seeing this as three separate threads spawned from two copies of the original mail, or is this just Yahoo sucking again? 
Yes - I guess the OP accidentally sent it without a subject, and then re-sent it with a subject. Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From levkivskyi at gmail.com Wed May 6 23:12:49 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 23:12:49 +0200 Subject: [Python-ideas] (no subject) Message-ID: > Apologies for the split replies; is everyone else seeing this > as three separate threads spawned > from two copies of the original mail, or is this just Yahoo sucking again? Probably that is my fault. I have sent first a message via google-group, but then received a message from python-ideas at python.org that my message has not been delivered and sent the second directly. Sorry for that. This is my first post here. > And notice that the author of this current proposal > thinks we should add the same thing to Python. > Doesn't that make you worry that maybe compose > belongs to the wrong universe? I would like to clarify that I don't want to add all Haskell to Python, on the contrary, I wanted to propose a small subset of tools that could be useful. Your position as I understand is that it is not easy: either you get many complex tools, or you get useless tools. Still, I think one could try to find some kind of compromise. -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Wed May 6 23:15:12 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 23:15:12 +0200 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: This is one of the options, but in my opinion an operator (@ that I propose) is clearer than a function On 6 May 2015 at 20:44, Paul Moore wrote: > On 6 May 2015 at 17:23, Ivan Levkivskyi wrote: > > I should clarify why I would like to have the possibility to easily > compose > > functions. > > I am a physicist (not a real programmer), and in my code I often compose > > functions. > > > > To do this I need to write something like > > > > def new_func(x): > > return f(g(h(x))) > > > > This means I see f(g(h())) quite often and I would prefer to see f @ g @ > h > > instead. > > I appreciate that it's orthogonal to the proposal, but would a utility > function like this be useful? > > def compose(*fns): > def composed(x): > for f in reversed(fns): > x = f(x) > return x > return composed > > comp = compose(f, g, h) > # comp(x) = f(g(h(x))) > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Wed May 6 23:25:05 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 23:25:05 +0200 Subject: [Python-ideas] Add 'composable' decorator to functools (with @ matrix multiplication) In-Reply-To: References: Message-ID: Dear Yann, The two options that you mentioned are indeed equivalent (the function application is much tighter that @), but note that z would be a partial-like object. Of course for this to work, not only the first function must be decorated with @composable, but also all multi-argument functions. 
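To make that concrete, here is one rough sketch of such a decorator; the names and the signature-based test are only assumptions, and, as noted earlier in the thread, such a test cannot tell a deliberately curried call apart from a genuinely mistaken one.

    import functools
    import inspect

    class composable:
        def __init__(self, func):
            self.func = func

        def __call__(self, *args, **kwargs):
            try:
                inspect.signature(self.func).bind(*args, **kwargs)
            except TypeError:
                # Cannot bind all parameters yet: curry by returning a
                # partial-like object instead of raising.
                return composable(functools.partial(self.func, *args, **kwargs))
            return self.func(*args, **kwargs)

        def __matmul__(self, other):
            # (f @ g)(x) == f(g(x))
            return composable(lambda *a, **kw: self(other(*a, **kw)))

    @composable
    def add(a, b):
        return a + b

    @composable
    def square(x):
        return x * x

    z = add(1)               # a partial-like object, not a number
    my_fun = square @ z      # equivalent to square @ add(1)
    assert my_fun(3) == 16   # square(add(1, 3))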
On 6 May 2015 at 21:32, Yann Kaiser wrote: > On Wed, 6 May 2015 at 09:10 Ivan Levkivskyi wrote: > >> Dear Guido, >> >> My original idea was to make the composable functions auto-curried >> (similar to proposed here >> http://code.activestate.com/recipes/52902-function-composition/ as >> pointed out by Steve) so that >> >> my_fun = square @ add(1) >> my_fun(x) >> >> evaluates to >> >> square(add(1,x)) >> > > This breaks the (IMO) fundamental expectation that > > z = add(1) > my_fun = square @ z > > is equivalent to > > my_fun = square @ add(1) > > -Yann > -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Wed May 6 23:38:02 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 6 May 2015 23:38:02 +0200 Subject: [Python-ideas] (no subject) In-Reply-To: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com> References: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com> Message-ID: On 6 May 2015 at 21:51, Andrew Barnert wrote: > On May 6, 2015, at 08:05, Ivan Levkivskyi wrote: > > Dear Andrew, > > Thank you for pointing out the previous discussion, I have overlooked it. > (Btw, I have found your post about the infix operators, that is a great > idea). > > > Well, nobody else seemed to like that idea, which may be a warning sign > about this one. :) > > Also, It turns out that astropy uses a very similar idea for function > composition. > > I agree that there are indeed to much ambiguities about the "right way", > and thus it is not good for stdlib. However, implementing only one > decorator as a third-party library is not good idea as well. > You are right that no one will install such library. Probably, it would be > better to combine it with other functionality like @infix (via overloading > __or__ or __rshift__), @auto_curry, etc. > > > Actually, many of the implementations on PyPI are part of "miscellaneous > functional tools" libraries that do combine it with such things. And they > still have practically no users. > > There are plenty of libraries that, despite being on PyPI and not > mentioned anywhere in the standard docs, still have a lot of users. In > fact, much of what's in the Python stdlib today (json, sqlite3, > ElementTree, statistics, enum, multiprocessing, ...) started off that way. > And there may be more people using requests or NumPy or Django than a lot > of parts of the stdlib. "Nobody will use it unless it's in the stdlib" > doesn't cut it anymore in the days of most Python installations including > pip, the stdlib docs referencing libraries on PyPI, etc. If something isn't > getting traction on PyPI, either people really don't want it--in which case > there's nothing to do--or someone really needs to evangelize it--in which > case you should start doing that, rather than proposing yet another > implementation that will just gather dust. > Ok, I will try inspecting all existing approaches to find the one that seems more "right" to me :) In any case that approach could be updated by incorporating matrix @ as a dedicated operator for compositions. At least, it seems that Erik from astropy likes this idea and it is quite natural for people with "scientific" background. > Finally, I think you've ignored an important part of my message--which is > probably my fault for not making it clearer. Code that deals in abstract > functional terms is harder for many people to think about. Not just novices > (unless you want to call Guido a novice). 
Languages that make it easier to > write such code are harder languages to read. So, making it easier to write > such code in Python may not be a win. > > And the reason I brought up all those other abstract features in Haskell > is that they tie together with composition very closely. Most of the best > examples anyone can come up with for how compose makes code easier to read > also include curried functions, operator sections, composing the apply > operator itself, and so on. They're all really cool ideas that can simplify > your logic--but only if you're willing to think on that more abstract > plane. Adding all of that to Python would make it harder to learn. Not > adding it to Python would make compose not very useful. (Which is why the > various implementations are languishing without users.) > Thank you for the feedback! > > > On 6 May 2015 at 15:59, Andrew Barnert wrote: > >> This was discussed when the proposal to add @ for matrix multiplication >> came up, so you should first read that thread and make sure you have >> answers to all of the issues that came up before proposing it again. >> >> Off the top of my head: >> >> Python functions don't just take 1 parameter, they take any number of >> parameters, possibly including optional parameters, keyword-only, *args, >> **kwargs, etc. There are a dozen different compose implementations on PyPI >> and ActiveState that handle these differently. Which one is "right"? >> >> The design you describe can be easily implemented as a third-party >> library. Why not do so, put it on PyPI, see if you get any traction and any >> ideas for improvement, and then suggest it for the stdlib? >> >> The same thing is already doable today using a different operator--and, >> again, there are a dozen implementations. Why isn't anyone using them? >> >> Thinking in terms of function composition requires a higher level of >> abstraction than thinking in terms of lambda expressions. That's one of the >> reasons people perceive Haskell to be a harder language to learn than Lisp >> or Python. Of course learning Haskell is rewarding--but being easy to learn >> is one of Python's major strengths. >> >> Python doesn't have a static optimizing compiler that can avoid building >> 4 temporary function objects to evaluate (plot @ sorted @ sqrt @ real) >> (data_array), so it will make your code significantly less efficient. >> >> Is @ for composition and () for application really sufficient to write >> point free code in general without auto-curried functions, operator >> sectioning, reverse compose, reverse apply, etc.? Most of the examples >> people use in describing the feature from Haskell have a (+ 1) or (== x) or >> take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead >> of (a->b, [a]) -> [b]. >> >> Sent from my iPhone >> >> > On May 6, 2015, at 06:15, Ivan Levkivskyi wrote: >> > >> > Dear all, >> > >> > The matrix multiplication operator @ is going to be introduced in >> Python 3.5 and I am thinking about the following idea: >> > >> > The semantics of matrix multiplication is the composition of the >> corresponding linear transformations. >> > A linear transformation is a particular example of a more general >> concept - functions. >> > The latter are frequently composed with ("wrap") each other. For >> example: >> > >> > plot(real(sqrt(data))) >> > >> > However, it is not very readable in case of many wrapping layers. 
>> Therefore, it could be useful to employ >> > the matrix multiplication operator @ for indication of function >> composition. This could be done by such (simplified) decorator: >> > >> > class composable: >> > >> > def __init__(self, func): >> > self.func = func >> > >> > def __call__(self, arg): >> > return self.func(arg) >> > >> > def __matmul__(self, other): >> > def composition(*args, **kwargs): >> > return self.func(other(*args, **kwargs)) >> > return composable(composition) >> > >> > I think using such decorator with functions that are going to be deeply >> wrapped >> > could improve readability. >> > You could compare (note that only the outermost function should be >> decorated): >> > >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) >> (data_array) >> > >> > I think the latter is more readable, also compare >> > >> > def sunique(lst): >> > return sorted(list(set(lst))) >> > >> > vs. >> > >> > sunique = sorted @ list @ set >> > >> > Apart from readability, there are following pros of the proposed >> decorator: >> > >> > 1. Similar semantics as for matrix multiplication. >> > 2. Same symbol for composition as for decorators. >> > 3. The symbol @ resembles mathematical notation for function >> composition: ? >> > >> > I think it could be a good idea to add such a decorator to the stdlib >> functools module. >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed May 6 23:41:13 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 06 May 2015 23:41:13 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <554A8A79.2040306@egenix.com> On 06.05.2015 18:23, Eric Snow wrote: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. > > Thoughts? Python without the C extensions would hardly have had the success it has. It is widely known as perfect language to glue together different systems and provide integration. Deprecating the C API would mean that you deprecate all those existing C extensions together with the C API. This can hardly be in the interest of Python's quest for world domination :-) BTW: What can be more drastic than deprecating the Python C API ? There are certainly better ways to evolve an API than getting rid of it. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 06 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From donald at stufft.io Wed May 6 23:54:06 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 6 May 2015 17:54:06 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <554A8A79.2040306@egenix.com>
References: <554A8A79.2040306@egenix.com>
Message-ID: 

> On May 6, 2015, at 5:41 PM, M.-A. Lemburg wrote:
>
> On 06.05.2015 18:23, Eric Snow wrote:
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API. While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling. We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
>>
>> Thoughts?
>
> Python without the C extensions would hardly have had the
> success it has. It is widely known as perfect language to
> glue together different systems and provide integration.
>
> Deprecating the C API would mean that you deprecate all
> those existing C extensions together with the C API.
>
> This can hardly be in the interest of Python's quest for
> world domination :-)
>
> BTW: What can be more drastic than deprecating the Python C API ?
> There are certainly better ways to evolve an API than getting
> rid of it.

I think "deprecate" might be a bad word for it; it is more about telling people they should use CFFI (or Python) instead of the C-API, similar to having the urllib.request docs direct people towards the requests project for accessing the internet. CFFI still makes it easy to act as glue between different systems, it just does so in a way that isn't tied to one particular implementation's API and which is generally much easier to work with on top of that.

The biggest problems with CFFI currently are the problems in distributing a CFFI module because of some early decisions, but the CFFI 1.0 work is fixing all of that.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From ericsnowcurrently at gmail.com Thu May 7 00:00:48 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 May 2015 16:00:48 -0600
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: 
References: 
Message-ID: 

On Wed, May 6, 2015 at 1:59 PM, Andrew Barnert wrote:
> On May 6, 2015, at 09:23, Eric Snow wrote:
>>
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API. While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling. We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
>
> Does this mean you also want to discourage boost::python, SIP, SWIG, etc., which as far as I know come down to automatically building C API extensions, and would need to be completely rewritten if you wanted to make them work a different way?

Not really.
I mentioned CFFI and Cython specifically because they are the two that kept coming up in previous discussions related to discouraging use of the C-API. If C extensions were always generated using tools, then only tools would have to adapt to (drastic) changes in the C-API. That would be a much better situation than the status quo since it drastically reduces the impact of changes. -eric From solipsis at pitrou.net Thu May 7 00:16:22 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 7 May 2015 00:16:22 +0200 Subject: [Python-ideas] discouraging direct use of the C-API References: <20150506185715.2083b063@fsol> <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io> Message-ID: <20150507001622.000808af@fsol> On Wed, 6 May 2015 13:13:57 -0400 Donald Stufft wrote: > > > On May 6, 2015, at 12:57 PM, Antoine Pitrou wrote: > > > > On Wed, 6 May 2015 10:23:09 -0600 > > Eric Snow > > wrote: > >> A big blocker to making certain sweeping changes to CPython (e.g. > >> ref-counting) is compatibility with the vast body of C extension > >> modules out there that use the C-API. While there are certainly > >> drastic long-term solutions to that problem, there is one thing we can > >> do in the short-term that would at least get the ball rolling. We can > >> put a big red note at the top of every page of the C-API docs that > >> encourages folks to either use CFFI or Cython. > > > > CFFI is only useful for a small subset of stuff people use the C API for > > (mainly, thin wrappers around external libraries). Cython is a more > > reasonable suggestion in this context. > > You can write stuff in C itself for cffi too, it?s not just for C bindings, > an example would be the .c?s and .h?s for padding and constant time compare > in the cryptography project [1]. That really doesn't change what I said. CFFI is not appropriate to write e.g. actual extension classes. Besides, we have ctypes in the standard library, it would be stupid to recommend CFFI and not ctypes. Regards Antoine. From ericsnowcurrently at gmail.com Thu May 7 00:19:03 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 6 May 2015 16:19:03 -0600 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <554A8A79.2040306@egenix.com> References: <554A8A79.2040306@egenix.com> Message-ID: On Wed, May 6, 2015 at 3:41 PM, M.-A. Lemburg wrote: > Python without the C extensions would hardly have had the > success it has. It is widely known as perfect language to > glue together different systems and provide integration. > > Deprecating the C API would mean that you deprecate all > those existing C extensions together with the C API. As Donald noted, I'm not suggesting that the C-API be deprecated. I was careful in calling it "discouraging direct use of the C-API". :) > > This can hardly be in the interest of Python's quest for > world domination :-) > > BTW: What can be more drastic than deprecating the Python C API ? > There are certainly better ways to evolve an API than getting > rid of it. I'd like to hear more on alternatives. Lately all I've heard is how much better off we'd be if folks used CFFI or tools like Cython to write their extension modules. Regardless of what it is, we should try to find *some* solution that puts us in a position that we can accomplish certain architectural changes, such as moving away from ref-counting. Larry talked about it at the language summit. 
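As a rough illustration of the kind of alternative being recommended, a minimal cffi sketch in its in-line ABI mode could look like the following on a Unix-like system; libm and sqrt are only placeholder choices, not anything prescribed by the proposal:

    # Wrapping a C function without writing any C-API code, using cffi's
    # in-line ABI mode. The library (libm) and the declaration (sqrt) are
    # purely illustrative placeholders.
    from ctypes.util import find_library
    from cffi import FFI

    ffi = FFI()
    ffi.cdef("double sqrt(double x);")      # declaration copied from the man page
    libm = ffi.dlopen(find_library("m"))    # load the shared library by name

    print(libm.sqrt(2.0))                   # 1.4142135623730951

The point of the comparison is that nothing above depends on CPython's internal object layout or reference counting, so an interpreter-level change would not break it.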
-eric From donald at stufft.io Thu May 7 00:27:20 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 6 May 2015 18:27:20 -0400 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <20150507001622.000808af@fsol> References: <20150506185715.2083b063@fsol> <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io> <20150507001622.000808af@fsol> Message-ID: > On May 6, 2015, at 6:16 PM, Antoine Pitrou wrote: > > On Wed, 6 May 2015 13:13:57 -0400 > Donald Stufft wrote: >> >>> On May 6, 2015, at 12:57 PM, Antoine Pitrou wrote: >>> >>> On Wed, 6 May 2015 10:23:09 -0600 >>> Eric Snow >>> wrote: >>>> A big blocker to making certain sweeping changes to CPython (e.g. >>>> ref-counting) is compatibility with the vast body of C extension >>>> modules out there that use the C-API. While there are certainly >>>> drastic long-term solutions to that problem, there is one thing we can >>>> do in the short-term that would at least get the ball rolling. We can >>>> put a big red note at the top of every page of the C-API docs that >>>> encourages folks to either use CFFI or Cython. >>> >>> CFFI is only useful for a small subset of stuff people use the C API for >>> (mainly, thin wrappers around external libraries). Cython is a more >>> reasonable suggestion in this context. >> >> You can write stuff in C itself for cffi too, it?s not just for C bindings, >> an example would be the .c?s and .h?s for padding and constant time compare >> in the cryptography project [1]. > > That really doesn't change what I said. CFFI is not appropriate to > write e.g. actual extension classes. What is an ?actual extension class?? > > Besides, we have ctypes in the standard library, it would be stupid to > recommend CFFI and not ctypes. Besides the fact that ctypes can only work at the ABI level which flat out doesn?t work for a lot of C projects, but even if you?re working at the ABI level ctypes isn?t nearly as nice to use as CFFI is. With ctypes you have to repeat the C declarations using ctypes special snowflake API but with cffi you just re-use the C declarations (for the most part), in most scenarios you can simply copy/paste from the .h files or man pages or what have you. Here?s a decent read: http://eli.thegreenplace.net/2013/03/09/python-ffi-with-ctypes-and-cffi --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From solipsis at pitrou.net Thu May 7 00:34:18 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 7 May 2015 00:34:18 +0200 Subject: [Python-ideas] discouraging direct use of the C-API References: <20150506185715.2083b063@fsol> <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io> <20150507001622.000808af@fsol> Message-ID: <20150507003418.03a78ca0@fsol> On Wed, 6 May 2015 18:27:20 -0400 Donald Stufft wrote: > > > On May 6, 2015, at 6:16 PM, Antoine Pitrou wrote: > > > > On Wed, 6 May 2015 13:13:57 -0400 > > Donald Stufft wrote: > >> > >>> On May 6, 2015, at 12:57 PM, Antoine Pitrou wrote: > >>> > >>> On Wed, 6 May 2015 10:23:09 -0600 > >>> Eric Snow > >>> wrote: > >>>> A big blocker to making certain sweeping changes to CPython (e.g. > >>>> ref-counting) is compatibility with the vast body of C extension > >>>> modules out there that use the C-API. 
While there are certainly > >>>> drastic long-term solutions to that problem, there is one thing we can > >>>> do in the short-term that would at least get the ball rolling. We can > >>>> put a big red note at the top of every page of the C-API docs that > >>>> encourages folks to either use CFFI or Cython. > >>> > >>> CFFI is only useful for a small subset of stuff people use the C API for > >>> (mainly, thin wrappers around external libraries). Cython is a more > >>> reasonable suggestion in this context. > >> > >> You can write stuff in C itself for cffi too, it?s not just for C bindings, > >> an example would be the .c?s and .h?s for padding and constant time compare > >> in the cryptography project [1]. > > > > That really doesn't change what I said. CFFI is not appropriate to > > write e.g. actual extension classes. > > > What is an ?actual extension class?? Uh... Please take a look at the C API manual. Regards Antoine. From chris.barker at noaa.gov Wed May 6 21:24:04 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 6 May 2015 12:24:04 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 9:41 AM, Guido van Rossum wrote: > I think Cython is already used by those people who benefit from it. > I wish that where the case, but I don't think so -- there is a LOT of weight behind the idea of something being "built-in" and/or "official". So folks do still right extensions using the raw C API. Some note recommending Cython in the core docs about the C API would be great. And we don't use Cython in the standard library, do we? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Thu May 7 03:13:11 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 07 May 2015 10:13:11 +0900 Subject: [Python-ideas] (no subject) In-Reply-To: References: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com> Message-ID: <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp> Ivan Levkivskyi writes: > Ok, I will try inspecting all existing approaches to find the one > that seems more "right" to me :) If you do inspect all the approaches you can find, I hope you'll keep notes and publish them, perhaps as a blog article. > In any case that approach could be updated by incorporating matrix > @ as a dedicated operator for compositions. I think rather than "dedicated" you mean "suggested". One of Andrew's main points is that you're unlikely to find more than a small minority agreeing on the "right" approach, no matter which one you choose. > At least, it seems that Erik from astropy likes this idea and it is > quite natural for people with "scientific" background. Sure, but as he also points out, when you know that you're going to be composing only functions of one argument, the Unix pipe symbol is also quite natural (as is Haskell's operator-less notation). While one of my hobbies is category theory (basically, the mathematical theory of composable maps for those not familiar with the term), I find the Unix pipeline somehow easier to think about than abstract composition, although I believe they're equivalent (at least as composition is modeled by category theory). 
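To make the comparison concrete, here is a small sketch reusing the (simplified) composable decorator quoted earlier in the thread; the double/increment functions and the pipeline helper are only illustrative, and the @ operator itself requires Python 3.5+ where PEP 465 is available:

    class composable:
        """Simplified decorator from the proposal: wraps a one-argument callable."""
        def __init__(self, func):
            self.func = func
        def __call__(self, arg):
            return self.func(arg)
        def __matmul__(self, other):
            def composition(*args, **kwargs):
                return self.func(other(*args, **kwargs))
            return composable(composition)

    @composable
    def double(x):
        return 2 * x

    @composable
    def increment(x):
        return x + 1

    print(double(increment(10)))        # plain nesting          -> 22
    print((double @ increment)(10))     # proposed @ composition -> 22

    # The Unix-pipe reading is the same computation written left to right:
    def pipeline(value, *funcs):
        for f in funcs:
            value = f(value)
        return value

    print(pipeline(10, increment, double))   # -> 22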
From rob.cliffe at btinternet.com Thu May 7 03:41:34 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Thu, 07 May 2015 02:41:34 +0100 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <554AC2CE.5040705@btinternet.com> This is no doubt *not* the best platform to raise these thoughts (which are nothing to do with Python - apologies), but I'm not sure where else to go. I watch discussions like this ... I watch posts like this one [Nick's] ... ... And I despair. I really despair. I am a very experienced but old (some would say "dinosaur") programmer. I appreciate the need for Unicode. I really do. I don't understand Unicode and all its complications AT ALL. And I can't help wondering: Why, oh why, do things have to be SO FU*****G COMPLICATED? This thread, for example, is way over my head. And it is typical of many discussions I have stared at, uncomprehendingly. Surely 65536 (2-byte) encodings are enough to express all characters in all the languages in the world, plus all the special characters we need. Why can't there be just *ONE* universal encoding? (Decided upon, no doubt, by some international standards committee. There would surely be enough spare codes for any special characters etc. that might come up in the foreseeable future.) *Is it just historical accident* (partly due to an awkward move from 1-byte ASCII to 2-byte Unicode, implemented in many different places, in many different ways) *that we now have a patchwork of encodings that we strive to fit into some over-complicated scheme*? Or is there *really* some *fundamental reason* why things *can't* be simpler? (Like, REALLY, _*REALLY*_ simple?) Imageine if we were starting to design the 21st century from scratch, throwing away all the history? How would we go about it? (Maybe I'm just naive, but sometimes ... Out of the mouths of babes and sucklings.) Aaaargh! Do I really have to learn all this mumbo-jumbo?! (Forgive me. :-) ) I would be grateful for any enlightenment - thanks in advance. Rob Cliffe On 05/05/2015 20:21, Nick Coghlan wrote: > On 5 May 2015 at 18:23, Stephen J. Turnbull wrote: >> So this proposal merely amounts to reintroduction of the Python 2 str >> confusion into Python 3. It is dangerous *precisely because* the >> current situation is so frustrating. These functions will not be used >> by "consenting adults", in most cases. Those with sufficient >> knowledge for "informed consent" also know enough to decode encoded >> text ASAP, and encode internal text ALAP, with appropriate handlers, >> in the first place. >> >> Rather, these str2str functions will be used by programmers at the >> ends of their ropes desperate to suppress "those damned Unicode >> errors" by any means available. In fact, they are most likely to be >> used and recommended by *library* writers, because they're the ones >> who are least like to have control over input, or to know their >> clients' requirements for output. "Just use rehandle_* to ameliorate >> the errors" is going to be far too tempting for them to resist. > The primary intended audience is Linux distribution developers using > Python 3 as the system Python. I agree misuse in other contexts is a > risk, but consider assisting the migration of the Linux ecosystem from > Python 2 to Python 3 sufficiently important that it's worth our while > taking that risk. 
> >> That Nick, of all people, supports this proposal is to me just >> confirmation that it's frustration, and only frustration, speaking >> here. He used to be one of the strongest supporters of keeping >> "native text" (Unicode) and "encoded text" separate by keeping the >> latter in bytes. > It's not frustration (at least, I don't think it is), it's a proposal > for advanced tooling to deal properly with legacy *nix systems that > either: > > a. use a locale encoding other than UTF-8; or > b. don't reliably set the locale encoding for system services and cron > jobs (which anecdotally appears to amount to "aren't using systemd" in > the current crop of *nix init systems) > > If a developer only cares about Windows, Mac OS X, or modern systemd > based *nix systems that use UTF-8 as the system locale, and they never > set "LANG=C" before running a Python program, then these new functions > will be completely irrelevant to them. (I've also submitted a request > to the glibc team to make C.UTF-8 universally available, reducing the > need to use "LANG=C", and they're amenable to the idea, but it > requires someone to work on preparing and submitting a patch: > https://sourceware.org/bugzilla/show_bug.cgi?id=17318) > > If, however, a developer wants to handle "LANG=C", or other non-UTF-8 > locales reliably across the full spectrum of *nix systems in Python 3, > they need a way to cope with system data that they *know* has been > decoded incorrectly by the interpreter, as we'll potentially do > exactly that for environment variables, command line arguments, > stdin/stdout/stderr and more if we get bad locale encoding settings > from the OS (such as when "LANG=C" is specified, or the init system > simply doesn't set a locale at all and hence CPython falls back to the > POSIX default of ASCII). > > Python 2 lets users sweep a lot of that under the rug, as the data at > least round trips within the system, but you get unexpected mojibake > in some cases (especially when taking local data and pushing it out > over the network). > > Since these boundary decoding issues don't arise on properly > configured modern *nix systems, we've been able to take advantage of > that by moving Python 3 towards a more pragmatic and distro-friendly > approach in coping with legacy *nix platforms and behaviours, > primarily by starting to use "surrogateescape" by default on a few > more system interfaces (e.g. on the standard streams when the OS > *claims* that the locale encoding is ASCII, which we now assume to > indicate a configuration error, which we can at least work around for > roundtripping purposes so that "os.listdir()" works reliably at the > interactive prompt). 
> > This change in approach (heavily influenced by the parallel "Python 3 > as the default system Python" efforts in Ubuntu and Fedora) *has* > moved us back towards an increased risk of introducing mojibake in > legacy environments, but the nature of that trade-off has changed > markedly from the situation back in 2009 (let alone 2006): > > * most popular modern Linux systems use systemd with the UTF-8 locale, > which "just works" from a boundary encoding/decoding perspective (it's > closely akin to the situation we've had on Mac OS X from the dawn of > Python 3) > * even without systemd, most modern *nix systems at least default to > the UTF-8 locale, which works reliably for user processes in the > absence of an explicit setting like "LANG=C", even if service daemons > and cron jobs can be a bit sketchier in terms of the locale settings > they receive > * for legacy environments migrating from Python 2 without upgrading > the underlying OS, our emphasis has shifted to tolerating "bug > compatibility" at the Python level in order to ease migration, as the > most appropriate long term solution for those environments is now to > upgrade their OS such that it more reliably provides correct locale > encoding settings to the Python 3 interpreter (which wasn't a > generally available option back when Python 3 first launched) > > Armin Ronacher (as ever) provides a good explanation of the system > interface problems that can arise in Python 3 with bad locale encoding > settings here: http://click.pocoo.org/4/python3/#python3-surrogates > > In my view, the critical helper function for this purpose is actually > "handle_surrogateescape", as that's the one that lets us readily adapt > from the incorrectly specified ASCII locale encoding to any other > ASCII-compatible system encoding once we've bootstrapped into a full > Python environment which has more options for figuring out a suitable > encoding than just looking at the locale setting provided by the C > runtime. It's also the function that serves to provide the primary > "hook" where we can hang documentation of this platform specific > boundary encoding/decoding issue. > > The other suggested functions are then more about providing a "peek > behind the curtain" API for folks that want to *use Python* to explore > some of the ins and outs of Unicode surrogate handling. Surrogates and > astrals really aren't that complicated, but we've historically hidden > them away as "dark magic not to be understood by mere mortals". In > reality, they're just different ways of composing sequences of > integers to represent text, and the suggested APIs are designed to > expose that in a way we haven't done in the past. I can't actually > think of a practical purpose for them other than teaching people the > basics of how Unicode representations work, but demystifying that > seems sufficiently worthwhile to me that I'm not opposed to their > inclusion (bear in mind I'm also the current "dis" module maintainer, > and a contributor to the "inspect", so I'm a big fan of exposing > underlying concepts like this in a way that lets people play with them > programmatically for learning purposes). > > Cheers, > Nick. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Thu May 7 04:15:20 2015 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 07 May 2015 03:15:20 +0100 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <554AC2CE.5040705@btinternet.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: <554ACAB8.7010006@mrabarnett.plus.com> On 2015-05-07 02:41, Rob Cliffe wrote: > This is no doubt *not* the best platform to raise these thoughts (which > are nothing to do with Python - apologies), but I'm not sure where else > to go. > I watch discussions like this ... > I watch posts like this one [Nick's] ... > ... And I despair. I really despair. > > I am a very experienced but old (some would say "dinosaur") programmer. > I appreciate the need for Unicode. I really do. > I don't understand Unicode and all its complications AT ALL. > And I can't help wondering: > Why, oh why, do things have to be SO FU*****G COMPLICATED? This > thread, for example, is way over my head. And it is typical of many > discussions I have stared at, uncomprehendingly. > Surely 65536 (2-byte) encodings are enough to express all characters in > all the languages in the world, plus all the special characters we need. > Why can't there be just *ONE* universal encoding? (Decided upon, no > doubt, by some international standards committee. There would surely be > enough spare codes for any special characters etc. that might come up in > the foreseeable future.) > > *Is it just historical accident* (partly due to an awkward move from > 1-byte ASCII to 2-byte Unicode, implemented in many different places, in > many different ways) *that we now have a patchwork of encodings that we > strive to fit into some over-complicated scheme*? > Or is there *really* some *fundamental reason* why things *can't* be > simpler? (Like, REALLY, _*REALLY*_ simple?) > Imageine if we were starting to design the 21st century from scratch, > throwing away all the history? How would we go about it? > (Maybe I'm just naive, but sometimes ... Out of the mouths of babes and > sucklings.) > Aaaargh! Do I really have to learn all this mumbo-jumbo?! (Forgive me. > :-) ) > I would be grateful for any enlightenment - thanks in advance. > Rob Cliffe > When Unicode first came out, they thought that 65536 would be enough. When Java was released, for example, it used 16 bits per codepoint. Simple. But it turned out that it wasn't enough. People have been too inventive over thousands of years! There's the matter of accents and other diacritics. Some languages want to add marks to the letters to indicate a different pronunciation, stress, tone, whatever (a character might need more than one!). Having a separate code for each combination would lead to an _lot_ of codes, so a better solution is to add codes that can combine with the base character when displayed. And then there's the matter of writing direction. Some languages go left-to-right, others right-to-left. So, you think it's complicated? Don't blame Unicode, it's just trying to cope with a very messy problem. From mistersheik at gmail.com Thu May 7 04:05:15 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 6 May 2015 19:05:15 -0700 (PDT) Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? Message-ID: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> Since strings are constant, wouldn't it be much faster to implement string slices as a view of other strings? 
For clarity, I'm talking about CPython. I'm not talking about anything the user sees. The string views would still look like regular str instances to the user. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu May 7 05:56:21 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 13:56:21 +1000 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: On 2 May 2015 at 19:25, Ram Rachum wrote: > Okay, I implemented it. Might be getting something wrong because I've never > worked with the internals of this module before. I think this is sufficiently tricky to get right that it's worth adding filter() as a parallel to the existing map() API. However, it did raise a separate question for me: is it currently possible to use Executor.map() and the as_completed() module level function together? Unless I'm missing something, it doesn't look like it, as map() hides the futures from the caller, so you only have something to pass to as_completed() if you invoke submit() directly. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu May 7 06:07:17 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 14:07:17 +1000 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: References: <554A1F8C.1040005@thomas-guettler.de> Message-ID: On 7 May 2015 at 01:11, Chris Angelico wrote: > On Thu, May 7, 2015 at 12:05 AM, Thomas G?ttler > wrote: >> We run a custom sub class of list in sys.path. We set it in sitecustomize.py >> >> This instance get replace by a common list in lines like this: >> >> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path > > Forgive the obtuse question, but wouldn't an __radd__ method resolve > this for you? If the custom subclass is implemented in Python or otherwise implements the C level nb_add slot, yes, if it's implemented in C and only provides sq_concat without nb_add, no (courtesy of http://bugs.python.org/issue11477, which gets the operand precedence dance wrong for sequence types that only implement the sequence methods and not the corresponding numeric ones) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Thu May 7 06:13:05 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 7 May 2015 14:13:05 +1000 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: References: <554A1F8C.1040005@thomas-guettler.de> Message-ID: On Thu, May 7, 2015 at 2:07 PM, Nick Coghlan wrote: > On 7 May 2015 at 01:11, Chris Angelico wrote: >> On Thu, May 7, 2015 at 12:05 AM, Thomas G?ttler >> wrote: >>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py >>> >>> This instance get replace by a common list in lines like this: >>> >>> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path >> >> Forgive the obtuse question, but wouldn't an __radd__ method resolve >> this for you? > > If the custom subclass is implemented in Python or otherwise > implements the C level nb_add slot, yes, if it's implemented in C and > only provides sq_concat without nb_add, no (courtesy of > http://bugs.python.org/issue11477, which gets the operand precedence > dance wrong for sequence types that only implement the sequence > methods and not the corresponding numeric ones) Okay, so it mightn't be quite as simple as I thought, but it should still be in the control of the author of the subclass, right? 
That ought to be easier than trying to stop everyone else from mutating sys.path. ChrisA From ncoghlan at gmail.com Thu May 7 06:22:55 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 14:22:55 +1000 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On 7 May 2015 at 02:23, Eric Snow wrote: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. > > Thoughts? Rather than embedding these recommendations directly in the version specific CPython docs, I'd prefer to see contributions to fill in the incomplete sections in https://packaging.python.org/en/latest/extensions.html with links back to the relevant parts of the C API documentation and docs for other projects (I was able to write the current overview section on that page in a few hours, as I didn't need to do much research for that, but filling in the other sections properly involves significantly more work). That page is already linked from the landing page for the extending & embedding documentation as part of a recommendation to consider the use of third party tools rather than handcrafting your own extension modules: https://docs.python.org/3/extending/index.html#recommended-third-party-tools The landing page for the C API docs links back to the extending & embedding guide, but the link is embedded in the header paragraph rather than being a See Also link: https://docs.python.org/3/c-api/index.html Cheers, Nick. > > -eric > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu May 7 07:27:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 15:27:14 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <554AC2CE.5040705@btinternet.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: On 7 May 2015 at 11:41, Rob Cliffe wrote: > Or is there really some fundamental reason why things can't be simpler? > (Like, REALLY, REALLY simple?) Yep, there are around 7 billion fundamental reasons currently alive, and I have no idea how many that have gone before us: humans :) Unicode is currently messy and complicated because human written communication is messy and complicated, and that inherent complexity didn't go anywhere once we started networking our computers together and digitising our historical records. 
Early versions of Unicode attempted to simplify things by only considering dictionary words in major living languages (which got them under 65k characters), but folks in Asia and elsewhere were understandably upset when the designers attempted to explain why it was OK for a "universal" encoding to not be able to correctly represent the names of people and places, while archivists and historical researchers were similarly unimpressed when the designers tried to explain why their "universal" encoding didn't adequately cover texts that were more than a few decades old. Breaking down the walls between historically silo'ed communications networks then made things even more complicated, as historical proprietary encodings from different telco networks needed to be mapped to the global standard (this last process is a large part of where the assortment of emoji characters in Unicode comes from). However, most of the messiness and complexity in the digital realm actually arises at the boundary between Unicode and *other encodings*. That's why the fact that POSIX still uses ASCII as the default encoding is such a pain, and why Apple instead unilaterally declared that "everything shall be UTF-8" for Mac OS X, while Microsoft and Java eventually settled on new UTF-16 APIs. We can't even assume ASCII compatibility in general, as codecs like Shift-JIS, ISO-2022 and various other East Asian codecs date from an era where international network connectivity simply wasn't a problem encoding designers needed to worry about, so solving *local* computing problems was a much larger concern than compatibility with DARPA's then nascent internet protocols. I wrote an article attempting to summarise some of that history last year: http://developerblog.redhat.com/2014/09/09/transition-to-multilingual-programming-python/ And gave a presentation about it at Australia's OSDC 2014 that connected some of the dots even further back in history: https://www.youtube.com/watch?v=xOadSc69Hrw (I also just noticed my notes for the latter aren't currently online, which is an oversight I'll aim to fix before too long). As things stand, one suggestion I make to folks truly trying to understand why we need Unicode (with all its complexity), is to attempt to learn a foreign language that *doesn't use a latin based script*. My own Japanese is atrociously bad, but it's good enough that I can appreciate just how Anglo-centric most programming languages (including Python) are. I'm also fully cognizant of the fact that as bad as my written and spoken Japanese are, my ability to enter Japanese text into a computer is entirely non-existent. > Imageine if we were starting to design the 21st century from scratch, > throwing away all the history? How would we go about it? We'd invite Japanese, Chinese, Indian, African, etc developers to get involved in the design process much earlier than we did. Ideally back when the Western Union telegraph was first being designed, as the consequences of some of those original binary encoding design choices are still felt today :) http://utf8everywhere.org/ makes the case that the closest we have to that today is UTF-8 + streaming compression, and it's a fairly compelling story. However, it's premised on a world where string processing algorithms are all written to be UTF-8 aware, when a lot of them, including those used in the Python standard library, were in fact written assuming fixed width encodings. 
Hence the Python 3.3 flexible string representation model, where string internal storage is sized according to the largest code point, and you need to use StringIO if you want to avoid having a single higher plane code point significantly increase the memory consumption of your string. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From benjamin at python.org Thu May 7 07:37:33 2015 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 7 May 2015 05:37:33 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Why_don=27t_CPython_strings_implement_sl?= =?utf-8?q?icing_using_a=09view=3F?= References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> Message-ID: Neil Girdhar writes: > > Since strings are constant, wouldn't it be much faster to implement string slices as a view of other strings? Maybe for some workloads, but you can end up keeping a large string alive and taking up memory with such an approach. From ncoghlan at gmail.com Thu May 7 07:55:07 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 15:55:07 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: On 7 May 2015 at 15:27, Nick Coghlan wrote: > On 7 May 2015 at 11:41, Rob Cliffe wrote: >> Or is there really some fundamental reason why things can't be simpler? >> (Like, REALLY, REALLY simple?) > > Yep, there are around 7 billion fundamental reasons currently alive, > and I have no idea how many that have gone before us: humans :) Heh, a message from Stephen off-list made me realise that an info dump of all the reasons the edge cases are hard probably wasn't a good way to answer your question :) What "we're" working towards (where "we" ~= the Unicode consortium + operating system designers + programming language designers) is a world where everything "just works", and computers talk to humans in each human's preferred language (or a collection of languages, depending on what the human is doing), and to each other in Unicode. There are then a whole host of technical and political reasons why it's taking decades to get from the historical point A (where computers talk to humans in at most one language at a time, and don't talk to each other at all) to that desired point B. We'll know we're done with that transition when Unicode becomes almost transparently invisible, and the vast majority of programmers are once again able to just deal with "text" without worrying too much about how it's represented internally (but also having their programs be readily usable in language's other than their own). Python 3 is already a lot closer to that ideal than Python 2 was, but there are still some rough edges to iron out. The ones I'm personally aware of affecting 3.4+ (including the one Serhiy started this thread about) are listed as dependencies of http://bugs.python.org/issue22555 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guettliml at thomas-guettler.de Thu May 7 08:00:09 2015 From: guettliml at thomas-guettler.de (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=) Date: Thu, 07 May 2015 08:00:09 +0200 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: References: <554A1F8C.1040005@thomas-guettler.de> Message-ID: <554AFF69.9050404@thomas-guettler.de> Am 06.05.2015 um 17:07 schrieb Paul Moore: > On 6 May 2015 at 15:05, Thomas G?ttler wrote: >> I am missing a policy how sys.path should be altered. 
> > Well, the docs say that applications can modify sys.path as needed. > Generally, applications modify sys.path in place via sys.path[:] = > whatever, but that's not mandated as far as I know. > >> We run a custom sub class of list in sys.path. We set it in sitecustomize.py > > Can you explain why? I forgot to explain why I use a custom class. Sorry, here is the background. I want sys.path to be ordered: 1. virtualenv 2. /usr/local/ 3. /usr/lib We use virtualenvs with system site-packages. There are many places where sys.path gets altered. The last time we had sys.path problems I tried to write a test which checks that sys.path is the same for cron jobs and web requests. I failed. Too many places, I could not find all the places and the conditions where sys.path got modified in a different way. > It seems pretty risky to expect that no > applications will replace sys.path. I understand that you're proposing > that we say that applications shouldn't do that - but just saying so > won't change the many applications already out there. Of course I know that if we agree on a policy, it won't change existing code overnight. But if there is an official policy, you are able to write bug reports like this: "Please alter sys.path according to the docs. See http://www.python.org/...." Another point: if someone wants to add to sys.path, most of the time the developer inserts the new entries at the front of the list. This can break the ordering if you don't use a custom list class. >> This instance get replace by a common list in lines like this: >> >> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path >> >> The above line is from pip, it similar things happen in a lot of packages. > > How does the fact that pip does that cause a problem? The sys.path > modification is only in effect while pip is running, and no code in > pip relies on sys.path being an instance of your custom class. pip is a special case, since the pip authors say "we don't provide an API". But they have handy methods which we want to use. We use "import pip" and the class of our application's sys.path gets altered. >> Before trying to solve this with code, I think the python community should >> agree an a policy for altering sys.path. > > I can't imagine that happening, and even if it does, it won't make any > difference because a new policy won't change existing code. It won't > even affect new code unless people know about it (which isn't certain > - I doubt many people read the documentation that closely). Code updates will happen step by step. If someone has a problem because their custom list class in sys.path gets replaced, they will write a bug report to the maintainer. A bug report referencing official Python docs has more weight. >> What can I do to this done? > > I doubt you can. > > A PR for pip that changes the above line to modify sys.path in place > would probably get accepted (I can't see any reason why it wouldn't), > and I guess you could do the same for any other code you find. But as > for persuading the Python programming community not to replace > sys.path in any code, that seems unlikely to happen.
> >> We use Python 2.7 > > If you were using 3.x, then it's (barely) conceivable that making > sys.path read-only (so people could only modify it in-place) could be > done as a new feature, but (a) it would be a major backward > compatibility break, so there would have to be a strong justification, > and (b) it would stop you from replacing sys.path with your custom > class in the first place, so it wouldn't solve your issue. > > Which also raises the question, why do you believe it's OK to forbid > other people to replace sys.path when that's what you're doing in your > sitecustomize code? That seems self-contradictory... Yes, you are right this looks self-contradictory. I am the one which is responsible for the set up of the environment. Where is the best place during the interpreter initialization for altering the class of sys.path? I guess it is sitecustomize. After it was executed sys.path should be altered only in-place. Regards, Thomas G?ttler -- http://www.thomas-guettler.de/ From ram at rachum.com Thu May 7 08:02:25 2015 From: ram at rachum.com (Ram Rachum) Date: Thu, 7 May 2015 09:02:25 +0300 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: Funny, I suggested these 2 in the past: https://groups.google.com/forum/m/#!searchin/python-ideas/map_as_completed/python-ideas/ VZBdUbYcQjg https://groups.google.com/forum/m/#!searchin/python-ideas/as_completed/python-ideas/yGADxChihhk Sent from my phone. On 2 May 2015 at 19:25, Ram Rachum wrote: > Okay, I implemented it. Might be getting something wrong because I've never > worked with the internals of this module before. I think this is sufficiently tricky to get right that it's worth adding filter() as a parallel to the existing map() API. However, it did raise a separate question for me: is it currently possible to use Executor.map() and the as_completed() module level function together? Unless I'm missing something, it doesn't look like it, as map() hides the futures from the caller, so you only have something to pass to as_completed() if you invoke submit() directly. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu May 7 08:10:46 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2015 02:10:46 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> Message-ID: On 5/6/2015 10:05 PM, Neil Girdhar wrote: > Since strings are constant, wouldn't it be much faster to implement > string slices as a view of other strings? > > For clarity, I'm talking about CPython. I'm not talking about anything > the user sees. The string views would still look like regular str > instances to the user. The idea has been discussed and rejected. See pydev thread 'The "lazy strings" patch", Oct 2006, for one example. I think the best solution is a separate Seqview class. On the thread above, Josiah Carlson pointed out that he had made such a class that worked with multiple Python versions *and* with any sequence class. The only computation involved is addition of start values to indexes when accessing the underlying object, and that is not specific to strings. There might be something on PyPI already, but PyPI cannot search for compounds such as "string view" (or "lazy string"). 
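A rough sketch of the kind of Seqview class being described (purely illustrative, not Josiah's actual implementation) might be:

    class SeqView:
        """Lazy, read-only view of seq[start:stop:step]; nothing is copied."""
        def __init__(self, seq, start=0, stop=None, step=1):
            self._seq = seq
            # range() applies the same index normalisation a slice would
            self._indices = range(*slice(start, stop, step).indices(len(seq)))

        def __len__(self):
            return len(self._indices)

        def __getitem__(self, i):
            # Integer indexing only, to keep the sketch short; the underlying
            # sequence is consulted lazily on each access.
            return self._seq[self._indices[i]]

        def __iter__(self):
            return (self._seq[i] for i in self._indices)

    s = "hello world"
    v = SeqView(s, 6)                   # view of s[6:]
    print(len(v), v[0], "".join(v))     # 5 w world

Since the only work is adding the offset when indexing, the same class works for strings, bytes, lists or any other sequence.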
The three dict view classes are, obviouly, separate classes. They happen to be created with dict methods. But that is partly for historical reasons -- the methods already existed but returned lists in 2.x. The views were only a change in the output class (and the removal of arbitrary order). The API could have been dict_keys(somedict), with 'dict_keys' a builtin name. So there is nothing actually wrong with Seqview(seq, start, stop, step=1). -- Terry Jan Reedy From ncoghlan at gmail.com Thu May 7 08:22:15 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 16:22:15 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <4D8FF17C-1D0B-42C8-A55F-0479A652321F@yahoo.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <4D8FF17C-1D0B-42C8-A55F-0479A652321F@yahoo.com> Message-ID: On 6 May 2015 at 14:00, Andrew Barnert wrote: > It seems like launchd systems are as good as systemd systems here. Or are you not considering OS X a *nix? > > I suppose given than the timeline for Apple to switch to Python 3 as the default Python is "maybe it'll happen, but we'll never tell you until a month before the public beta", it isn't really all that relevant... We don't look at the locale encoding at all when it comes to system interfaces on Mac OS X - CPython is hardcoded to use UTF-8 instead. While Apple's tight control over their ecosystem alienates me as a consumer, it certainly has its advantages as a developer :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu May 7 08:47:19 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 May 2015 16:47:19 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6 May 2015 at 17:56, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > The other suggested functions are then more about providing a "peek > > behind the curtain" API for folks that want to *use Python* to explore > > some of the ins and outs of Unicode surrogate handling. > > I just don't see a need. .encode and .decode already give you all the > tools you need for exploring, and they do so in a way that tells you > via the type whether you're looking at abstract text or at the > representation. It doesn't get better than this! > > And if the APIs merely exposed the internal representation that would > be one thing. But they don't, and the people who are saying, "I'm not > an expert on Unicode but this looks great!" are clearly interested in > mutating str instances to be something more palatable to the requisite > modules and I/O systems they need to use, but which aren't prepared for > astral characters or proper handling of surrogateescapes. > > > I can't actually think of a practical purpose for them other than > > teaching people the basics of how Unicode representations work, > > I agree, but it seems to me that a lot of people are already scheming > to use them for practical purposes. Serhiy mentions tkinter, email, > and wsgiref, and David lusts after them for email. 
While I personally care about the OS boundary case, that's not the only "the metadata cannot be fully trusted" case that comes up (and yes, I know I'm contradicting what I posted yesterday - I hadn't reread the issue tracker thread at that point, so I'd forgotten the cases the others had mentioned, and hadn't even fully reloaded my own rationale for wanting the feature back into my brain). The key operation to be supported by the proposed APIs is to allow a piece of code to interrogate a string object to ask: "Was this string permissively decoded *and* did that process leave some invalid code points in the string?". Essentially, it's designed to cover the cases where the interpreter (or someone else) is using the "surrogateescape" or "surrogatepass" error handler when decoding some input data to text (I don't believe the interpreter defaults to using surrogatepass anywhere, but we do use surrogateescape in several places). If your code has direct control over the decoding step, you don't need anything new to deal with this appropriately, as you can just change the error handling mode to "strict" and be done with it. However, if you *don't* have control over the decoding step, then a) you can't switch the decoding step to a different error handler (as that's not happening in your code); and b) you don't necessarily know what the assumed encoding was, so your best guess is going to be "hopefully something ASCII compatible", which is going to introduce all kinds of other complexity as you have to start considering what happens for code points outside the surrogate area if you do an encode()/decode() dance in order to apply a different error handler to the smuggled surrogates. Hence the rehandle_surrogatepass() and rehandle_surrogateescape() methods: by default, they will both *throw an exception* if there is improperly decoded data in the input, as they apply the "strict" input error handler instead of whichever one was actually used. This lets you control where such errors are detected (e.g. at the point where the string is first given to your code), rather than having it happen implicitly later when you attempt to encode those strings to bytes. rehandle_surrogateescape() also has the virtue of scanning the supplied string for *other* lone surrogates (created via surrogatepass) and *always* complaining about them (again, at a point you choose, rather than happening unexpectedly elsewhere in the code, often as part of an IO operation). The "errors" argument is then designed to let you apply an arbitrary *input* error handler to surrogates that were originally let through by "surrogatepass" or "surrogateescape" (again, the assumption here is that you don't control the code that did the original decoding). If you decide to throw that improperly decoded data away entirely, you may use "replace" or "ignore" to clean it out. Alternatively, you may use "backslashreplace" (which is now usable on decoding as well as on encoding) to replace the unknown bytes with their hexadecimal representation. Regardless of which specific approach you take, handling surrogates explicitly when a string is passed to you from an API that uses permissive decoding lets you avoid both unexpected UnicodeEncodeError exceptions (if the surrogates end up being encoded with an error handler other than surrogatepass or surrogateescape) or propagating mojibake (if the surrogates are encoded with a suitable error handler, but an encoding that differs from the original). 
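To ground that in code that already works today, here is a small sketch of the situation using only existing codecs and error handlers (the rehandle_* functions themselves are still just a proposal at this point); the byte string is an arbitrary example of latin-1 data decoded at a permissive boundary:

    raw = b"caf\xe9"                               # latin-1 bytes handed over by the OS
    name = raw.decode("utf-8", "surrogateescape")  # what a permissive boundary does
    # name is now 'caf\udce9': the undecodable byte was smuggled in as a lone surrogate

    # Detecting the problem where *you* choose, instead of getting a surprise
    # UnicodeEncodeError much later, currently means re-encoding with "strict":
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        print("string contains improperly decoded data")

    # Cleaning it up means the encode/decode dance described above:
    cleaned = name.encode("utf-8", "surrogateescape").decode("utf-8", "replace")
    print(cleaned == "caf\ufffd")                  # True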
As far as "handle_astrals()" and friends go, I previously suggested on the issue that they could potentially be considered as a separate RFE, as their practical applicability is likely to be limited to cases where you need to deal with a UCS-2 (note: *not* UTF-16) API for some reason. I think they highlight any interesting aspect of what surrogate and astral code points *are*, but they don't have the same input validation use case that rehandle_surrogatepass and rehandle_surrogateescape do. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From me at the-compiler.org Thu May 7 08:48:36 2015 From: me at the-compiler.org (Florian Bruhin) Date: Thu, 7 May 2015 08:48:36 +0200 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554AFF69.9050404@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> Message-ID: <20150507064836.GR429@tonks> * Thomas G?ttler [2015-05-07 08:00:09 +0200]: > Am 06.05.2015 um 17:07 schrieb Paul Moore: > > On 6 May 2015 at 15:05, Thomas G?ttler wrote: > >> I am missing a policy how sys.path should be altered. > > > > Well, the docs say that applications can modify sys.path as needed. > > Generally, applications modify sys.path in place via sys.path[:] = > > whatever, but that's not mandated as far as I know. > > > >> We run a custom sub class of list in sys.path. We set it in sitecustomize.py > > > > Can you explain why? > > I forgot to explain the why I use a custom class. Sorry, here is the background. > > I want sys.path to ordered: > > 1. virtualenv > 2. /usr/local/ > 3. /usr/lib > > We use virtualenvs with system site-packages. > > There are many places where sys.path gets altered. > > The last time we had sys.path problems I tried to write a test > which checks that sys.path is the same for cron jobs and web requests. > I failed. Too many places, I could not find all the places > and the conditions where sys.path got modified in a different way. It looks like you explained *how* you do what you do, but not *why* - what problem is this solving? Why can't you just invoke the virtualenv's python and let python take care of sys.path? $ ./venv/bin/python -c 'import sys; from pprint import pprint; pprint(sys.path)' ['', '/home/user/venv/lib/python2.7', '/home/user/venv/lib/python2.7/plat-x86_64-linux-gnu', '/home/user/venv/lib/python2.7/lib-tk', '/home/user/venv/lib/python2.7/lib-old', '/home/user/venv/lib/python2.7/lib-dynload', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/home/user/venv/local/lib/python2.7/site-packages', '/home/user/venv/lib/python2.7/site-packages'] Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From robertc at robertcollins.net Thu May 7 08:54:02 2015 From: robertc at robertcollins.net (Robert Collins) Date: Thu, 7 May 2015 18:54:02 +1200 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554AFF69.9050404@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> Message-ID: On 7 May 2015 at 18:00, Thomas G?ttler wrote: > Am 06.05.2015 um 17:07 schrieb Paul Moore: > pip is a special case, since the pip authors say "we don't provide an API". > But they have handy methods which we want to use. We use "import pip" > and the class of sys.path of our application gets altered. Submit a PR to move the sys.path changes into something triggered by the CLI entrypoint rather than an import side effect. I see no in-principle issue with that. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From guettliml at thomas-guettler.de Thu May 7 08:59:10 2015 From: guettliml at thomas-guettler.de (=?windows-1252?Q?Thomas_G=FCttler?=) Date: Thu, 07 May 2015 08:59:10 +0200 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <20150507064836.GR429@tonks> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> <20150507064836.GR429@tonks> Message-ID: <554B0D3E.9020708@thomas-guettler.de> Am 07.05.2015 um 08:48 schrieb Florian Bruhin: > * Thomas G?ttler [2015-05-07 08:00:09 +0200]: >> Am 06.05.2015 um 17:07 schrieb Paul Moore: >>> On 6 May 2015 at 15:05, Thomas G?ttler wrote: >>>> I am missing a policy how sys.path should be altered. >>> >>> Well, the docs say that applications can modify sys.path as needed. >>> Generally, applications modify sys.path in place via sys.path[:] = >>> whatever, but that's not mandated as far as I know. >>> >>>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py >>> >>> Can you explain why? >> >> I forgot to explain the why I use a custom class. Sorry, here is the background. >> >> I want sys.path to ordered: >> >> 1. virtualenv >> 2. /usr/local/ >> 3. /usr/lib >> >> We use virtualenvs with system site-packages. >> >> There are many places where sys.path gets altered. >> >> The last time we had sys.path problems I tried to write a test >> which checks that sys.path is the same for cron jobs and web requests. >> I failed. Too many places, I could not find all the places >> and the conditions where sys.path got modified in a different way. > > It looks like you explained *how* you do what you do, but not *why* - > what problem is this solving? Why can't you just invoke the > virtualenv's python and let python take care of sys.path? I want the sys.path be ordered like it, since I want that packages of the inner environment are tried first. Here "inner" means "upper" in the above sys.path order. Example: If a package is installed in the virtualenv with version 2.2 and in global site packages with version 1.0, then I want the interpreter to use the version from virtualenv. Does this explain the *why* enough? If not, please tell me what you want to know. 
Regards, Thomas G?ttler From abarnert at yahoo.com Thu May 7 08:58:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 May 2015 23:58:42 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <40B94090-3EA2-4970-BA48-81A18A61951B@yahoo.com> On May 6, 2015, at 15:00, Eric Snow wrote: > >> On Wed, May 6, 2015 at 1:59 PM, Andrew Barnert wrote: >>> On May 6, 2015, at 09:23, Eric Snow wrote: >>> >>> A big blocker to making certain sweeping changes to CPython (e.g. >>> ref-counting) is compatibility with the vast body of C extension >>> modules out there that use the C-API. While there are certainly >>> drastic long-term solutions to that problem, there is one thing we can >>> do in the short-term that would at least get the ball rolling. We can >>> put a big red note at the top of every page of the C-API docs that >>> encourages folks to either use CFFI or Cython. >> >> Does this mean you also want to discourage boost::python, SIP, SWIG, etc., which as far as I know come down to automatically building C API extensions, and would need to be completely rewritten if you wanted to make them work a different way? > > Not really. I mentioned CFFI and Cython specifically because they are > the two that kept coming up in previous discussions related to > discouraging use of the C-API. If C extensions were always generated > using tools, then only tools would have to adapt to (drastic) changes > in the C-API. That would be a much better situation than the status > quo since it drastically reduces the impact of changes. OK, that makes sense to me. Even if there are a dozen wrappers and wrapper generators (and I think it's more like 4 or 5...), and we had to get buy-in from all of them (or get buy-in from most of them and reluctantly decide to screw over the last one), that's still orders of magnitude easier than getting buy-in from (or screw over) the 69105 people who are currently maintaining or building a C API extension, so it's still a huge win. I'm not sure it would do nearly enough, at least not for a long time (how many of the current top 100 projects on PyPI use C API extensions and would be non-trivial to rewrite?), but obviously you can make the point that if we don't do anything, we'll _never_ get there. From me at the-compiler.org Thu May 7 09:22:23 2015 From: me at the-compiler.org (Florian Bruhin) Date: Thu, 7 May 2015 09:22:23 +0200 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554B0D3E.9020708@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> <20150507064836.GR429@tonks> <554B0D3E.9020708@thomas-guettler.de> Message-ID: <20150507072223.GS429@tonks> * Thomas G?ttler [2015-05-07 08:59:10 +0200]: > > > Am 07.05.2015 um 08:48 schrieb Florian Bruhin: > >* Thomas G?ttler [2015-05-07 08:00:09 +0200]: > >>Am 06.05.2015 um 17:07 schrieb Paul Moore: > >>>On 6 May 2015 at 15:05, Thomas G?ttler wrote: > >>>>I am missing a policy how sys.path should be altered. > >>> > >>>Well, the docs say that applications can modify sys.path as needed. > >>>Generally, applications modify sys.path in place via sys.path[:] = > >>>whatever, but that's not mandated as far as I know. > >>> > >>>>We run a custom sub class of list in sys.path. We set it in sitecustomize.py > >>> > >>>Can you explain why? > >> > >>I forgot to explain the why I use a custom class. Sorry, here is the background. > >> > >>I want sys.path to ordered: > >> > >> 1. virtualenv > >> 2. /usr/local/ > >> 3. 
/usr/lib > >> > >>We use virtualenvs with system site-packages. > >> > >>There are many places where sys.path gets altered. > >> > >>The last time we had sys.path problems I tried to write a test > >>which checks that sys.path is the same for cron jobs and web requests. > >>I failed. Too many places, I could not find all the places > >>and the conditions where sys.path got modified in a different way. > > > >It looks like you explained *how* you do what you do, but not *why* - > >what problem is this solving? Why can't you just invoke the > >virtualenv's python and let python take care of sys.path? > > I want the sys.path be ordered like it, since I want that packages of the inner > environment are tried first. > > Here "inner" means "upper" in the above sys.path order. > > Example: If a package is installed in the virtualenv with version 2.2 and > in global site packages with version 1.0, then I want the interpreter to > use the version from virtualenv. That's already the default virtualenv behaviour: # apt-get install python-requests [...] Unpacking python-requests (2.4.3-6) ... $ ./venv/bin/pip install requests [...] Downloading requests-2.7.0-py2.py3-none-any.whl (470kB): 470kB downloaded $ python -c 'import requests; print requests.__version__' 2.4.3 $ ./venv/bin/python -c 'import requests; print requests.__version__' 2.7.0 > Does this explain the *why* enough? If not, please tell me what you want to know. I'm mainly trying to find out why you're modifying sys.path by hand instead of using what virtualenv already provides. There might be a good reason for that, but to me it seems like you're reinventing the wheel ;) Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From abarnert at yahoo.com Thu May 7 09:24:11 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 7 May 2015 00:24:11 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <554AC2CE.5040705@btinternet.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> On May 6, 2015, at 18:41, Rob Cliffe wrote: > > This is no doubt not the best platform to raise these thoughts (which are nothing to do with Python - apologies), but I'm not sure where else to go. > I watch discussions like this ... > I watch posts like this one [Nick's] ... > ... And I despair. I really despair. > > I am a very experienced but old (some would say "dinosaur") programmer. > I appreciate the need for Unicode. I really do. > I don't understand Unicode and all its complications AT ALL. > And I can't help wondering: > Why, oh why, do things have to be SO FU*****G COMPLICATED? This thread, for example, is way over my head. And it is typical of many discussions I have stared at, uncomprehendingly. > Surely 65536 (2-byte) encodings are enough to express all characters in all the languages in the world, plus all the special characters we need. Ironically, that idea is exactly why there are problems even within the "all-Unicode" world where cp1252 and Big5 and Shift-JIS don't exist. 
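To make that concrete, here is what the "2 bytes is enough" assumption runs into, shown at a Python 3 prompt (any code point above U+FFFF behaves the same way):

>>> ch = "\U0001F600"                # one code point outside the 16-bit range
>>> len(ch)                          # Python 3 sees a single character
1
>>> ch.encode("utf-16-le")           # UTF-16 needs two 16-bit units (a surrogate pair)
b'=\xd8\x00\xde'
>>> len(ch.encode("utf-16-le"))
4
>>> len(ch.encode("utf-8"))          # UTF-8 also needs 4 bytes, but never assumed 2 was enough
4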
Apple, Microsoft, Sun, and a few other vendors jumped on the Unicode bandwagon early and committed themselves to the idea that 2 bytes is enough for everything. When the world discovered that wasn't true, we were stuck with a bunch of APIs that insisted on 2 bytes. Apple was able to partly make a break with that era, but Windows and Java are completely stuck with "Unicode means 16-bit" forever, which is why the whole world is stuck dealing with UTF-16 and surrogates forever. > Why can't there be just ONE universal encoding? There is, UTF-8. Except sometimes you have algorithms that require fixed width, so you need UTF-32. And Java and Windows need UTF-16. And a few Internet protocols need UTF-7. And DNS needs a sort-of-UTF-5 called IDNA. At least everything else can die. Once every document stored in an old IBM code page or similar gets transliterated or goes away. Unfortunately, there are still people creating cp1252 documents every day on brand-new Windows desktops (and there are still people creating filenames on Latin-1 filesystems on older Linux and Unix boxes, but that's dying out a lot faster), so who knows when that day will come. Python can't force it. Even the Unicode committee can't force it (especially since Microsoft is one of the most active members). > (Decided upon, no doubt, by some international standards committee. There would surely be enough spare codes for any special characters etc. that might come up in the foreseeable future.) > > Is it just historical accident (partly due to an awkward move from 1-byte ASCII to 2-byte Unicode, implemented in many different places, in many different ways) that we now have a patchwork of encodings that we strive to fit into some over-complicated scheme? UTF-16 is a historical accident, and UTF-7 and IDNA. And all of the non-Unicode encodings, even more so. > Or is there really some fundamental reason why things can't be simpler? (Like, REALLY, REALLY simple?) We really do need at least UTF-8 and UTF-32. But that's it. And I think that's simple enough. > Imageine if we were starting to design the 21st century from scratch, throwing away all the history? How would we go about it? If we could start over with a clean slate today, I'm pretty sure we would have just one character set, Unicode, and two encodings, UTF-8 and UTF-32, and everyone would be happy (except for a small group in Japan who insist TRON's text model is better, but we can ignore them). In particular, this would mean that in Python, a bytes is either UTF-8, or not text. No need to specify codecs or error handlers, no surrogates (and definitely no surrogate escapes), etc. Plus, we'd have no daylight savings time, no changing timezone boundaries, seamless PyPI failovers, sensible drug laws, cars that run forever using garbage as fuel, no war, no crime, and Netflix would never remove a season when you're on episode 11 out of 13. (Unfortunately, we would still have perl. I don't know why, but I know we would.) > (Maybe I'm just naive, but sometimes ... Out of the mouths of babes and sucklings.) > Aaaargh! Do I really have to learn all this mumbo-jumbo?! (Forgive me. :-) ) > I would be grateful for any enlightenment - thanks in advance. > Rob Cliffe > > >> On 05/05/2015 20:21, Nick Coghlan wrote: >>> On 5 May 2015 at 18:23, Stephen J. Turnbull wrote: >>> So this proposal merely amounts to reintroduction of the Python 2 str >>> confusion into Python 3. It is dangerous *precisely because* the >>> current situation is so frustrating. 
These functions will not be used >>> by "consenting adults", in most cases. Those with sufficient >>> knowledge for "informed consent" also know enough to decode encoded >>> text ASAP, and encode internal text ALAP, with appropriate handlers, >>> in the first place. >>> >>> Rather, these str2str functions will be used by programmers at the >>> ends of their ropes desperate to suppress "those damned Unicode >>> errors" by any means available. In fact, they are most likely to be >>> used and recommended by *library* writers, because they're the ones >>> who are least like to have control over input, or to know their >>> clients' requirements for output. "Just use rehandle_* to ameliorate >>> the errors" is going to be far too tempting for them to resist. >> The primary intended audience is Linux distribution developers using >> Python 3 as the system Python. I agree misuse in other contexts is a >> risk, but consider assisting the migration of the Linux ecosystem from >> Python 2 to Python 3 sufficiently important that it's worth our while >> taking that risk. >> >>> That Nick, of all people, supports this proposal is to me just >>> confirmation that it's frustration, and only frustration, speaking >>> here. He used to be one of the strongest supporters of keeping >>> "native text" (Unicode) and "encoded text" separate by keeping the >>> latter in bytes. >> It's not frustration (at least, I don't think it is), it's a proposal >> for advanced tooling to deal properly with legacy *nix systems that >> either: >> >> a. use a locale encoding other than UTF-8; or >> b. don't reliably set the locale encoding for system services and cron >> jobs (which anecdotally appears to amount to "aren't using systemd" in >> the current crop of *nix init systems) >> >> If a developer only cares about Windows, Mac OS X, or modern systemd >> based *nix systems that use UTF-8 as the system locale, and they never >> set "LANG=C" before running a Python program, then these new functions >> will be completely irrelevant to them. (I've also submitted a request >> to the glibc team to make C.UTF-8 universally available, reducing the >> need to use "LANG=C", and they're amenable to the idea, but it >> requires someone to work on preparing and submitting a patch: >> https://sourceware.org/bugzilla/show_bug.cgi?id=17318) >> >> If, however, a developer wants to handle "LANG=C", or other non-UTF-8 >> locales reliably across the full spectrum of *nix systems in Python 3, >> they need a way to cope with system data that they *know* has been >> decoded incorrectly by the interpreter, as we'll potentially do >> exactly that for environment variables, command line arguments, >> stdin/stdout/stderr and more if we get bad locale encoding settings >> from the OS (such as when "LANG=C" is specified, or the init system >> simply doesn't set a locale at all and hence CPython falls back to the >> POSIX default of ASCII). >> >> Python 2 lets users sweep a lot of that under the rug, as the data at >> least round trips within the system, but you get unexpected mojibake >> in some cases (especially when taking local data and pushing it out >> over the network). >> >> Since these boundary decoding issues don't arise on properly >> configured modern *nix systems, we've been able to take advantage of >> that by moving Python 3 towards a more pragmatic and distro-friendly >> approach in coping with legacy *nix platforms and behaviours, >> primarily by starting to use "surrogateescape" by default on a few >> more system interfaces (e.g. 
on the standard streams when the OS >> *claims* that the locale encoding is ASCII, which we now assume to >> indicate a configuration error, which we can at least work around for >> roundtripping purposes so that "os.listdir()" works reliably at the >> interactive prompt). >> >> This change in approach (heavily influenced by the parallel "Python 3 >> as the default system Python" efforts in Ubuntu and Fedora) *has* >> moved us back towards an increased risk of introducing mojibake in >> legacy environments, but the nature of that trade-off has changed >> markedly from the situation back in 2009 (let alone 2006): >> >> * most popular modern Linux systems use systemd with the UTF-8 locale, >> which "just works" from a boundary encoding/decoding perspective (it's >> closely akin to the situation we've had on Mac OS X from the dawn of >> Python 3) >> * even without systemd, most modern *nix systems at least default to >> the UTF-8 locale, which works reliably for user processes in the >> absence of an explicit setting like "LANG=C", even if service daemons >> and cron jobs can be a bit sketchier in terms of the locale settings >> they receive >> * for legacy environments migrating from Python 2 without upgrading >> the underlying OS, our emphasis has shifted to tolerating "bug >> compatibility" at the Python level in order to ease migration, as the >> most appropriate long term solution for those environments is now to >> upgrade their OS such that it more reliably provides correct locale >> encoding settings to the Python 3 interpreter (which wasn't a >> generally available option back when Python 3 first launched) >> >> Armin Ronacher (as ever) provides a good explanation of the system >> interface problems that can arise in Python 3 with bad locale encoding >> settings here: http://click.pocoo.org/4/python3/#python3-surrogates >> >> In my view, the critical helper function for this purpose is actually >> "handle_surrogateescape", as that's the one that lets us readily adapt >> from the incorrectly specified ASCII locale encoding to any other >> ASCII-compatible system encoding once we've bootstrapped into a full >> Python environment which has more options for figuring out a suitable >> encoding than just looking at the locale setting provided by the C >> runtime. It's also the function that serves to provide the primary >> "hook" where we can hang documentation of this platform specific >> boundary encoding/decoding issue. >> >> The other suggested functions are then more about providing a "peek >> behind the curtain" API for folks that want to *use Python* to explore >> some of the ins and outs of Unicode surrogate handling. Surrogates and >> astrals really aren't that complicated, but we've historically hidden >> them away as "dark magic not to be understood by mere mortals". In >> reality, they're just different ways of composing sequences of >> integers to represent text, and the suggested APIs are designed to >> expose that in a way we haven't done in the past. I can't actually >> think of a practical purpose for them other than teaching people the >> basics of how Unicode representations work, but demystifying that >> seems sufficiently worthwhile to me that I'm not opposed to their >> inclusion (bear in mind I'm also the current "dis" module maintainer, >> and a contributor to the "inspect", so I'm a big fan of exposing >> underlying concepts like this in a way that lets people play with them >> programmatically for learning purposes). >> >> Cheers, >> Nick. 
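For readers who have not met it, the "surrogateescape" handler Nick keeps referring to can be demonstrated in a few lines at a Python 3 prompt: bytes that do not decode are smuggled into the string as lone surrogates and restored on encoding, so the data round-trips even when the claimed encoding is wrong.

>>> raw = b"caf\xe9 \xff"                     # bytes that are not valid UTF-8
>>> text = raw.decode("utf-8", errors="surrogateescape")
>>> text
'caf\udce9 \udcff'
>>> text.encode("utf-8", errors="surrogateescape") == raw   # lossless round trip
True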
>> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu May 7 09:27:18 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 7 May 2015 00:27:18 -0700 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554AFF69.9050404@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> Message-ID: <285FE1BE-21DD-4033-B536-E2A5959A3F59@yahoo.com> On May 6, 2015, at 23:00, Thomas G?ttler wrote: > >> Am 06.05.2015 um 17:07 schrieb Paul Moore: >>> On 6 May 2015 at 15:05, Thomas G?ttler wrote: >>> I am missing a policy how sys.path should be altered. >> >> Well, the docs say that applications can modify sys.path as needed. >> Generally, applications modify sys.path in place via sys.path[:] = >> whatever, but that's not mandated as far as I know. >> >>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py >> >> Can you explain why? > > I forgot to explain the why I use a custom class. Sorry, here is the background. > > I want sys.path to ordered: > > 1. virtualenv > 2. /usr/local/ > 3. /usr/lib Can you instead just leave sys.path alone, and replace the module finder with a subclass that orders the directories in sys.path the way it wants to? That's something a lot fewer packages are likely to screw with. > We use virtualenvs with system site-packages. > > There are many places where sys.path gets altered. > > The last time we had sys.path problems I tried to write a test > which checks that sys.path is the same for cron jobs and web requests. > I failed. Too many places, I could not find all the places > and the conditions where sys.path got modified in a different way. > >> It seems pretty risky to expect that no >> applications will replace sys.path. I understand that you're proposing >> that we say that applications shouldn't do that - but just saying so >> won't change the many applications already out there. > > Of course I know that if we agree on a policy, it wont' change existing code > in one second. But if there is an official policy, you are able to > write bug reports like this "Please alter sys.path according to the docs. See http://www.python.org/...." > > The next thing: If someone wants to add to sys.path, most of the > time the developer inserts its new entries in the front of the list. > > This can break the ordering if you don't use a custom list class. > > > > >>> This instance get replace by a common list in lines like this: >>> >>> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path >>> >>> The above line is from pip, it similar things happen in a lot of packages. >> >> How does the fact that pip does that cause a problem? The sys.path >> modification is only in effect while pip is running, and no code in >> pip relies on sys.path being an instance of your custom class. > > pip is a special case, since the pip authors say "we don't provide an API". > But they have handy methods which we want to use. We use "import pip" > and the class of sys.path of our application gets altered. > > >>> Before trying to solve this with code, I think the python community should >>> agree an a policy for altering sys.path. 
>> >> I can't imagine that happening, and even if it does, it won't make any >> difference because a new policy won't change existing code. It won't >> even affect new code unless people know about it (which isn't certain >> - I doubt many people read the documentation that closely). > > Code updates will happen step by step. > If someone has a problem, since his custom list class in sys.path gets > altered, he will write a bug report to the maintainer. A bug report > referencing official python docs has more weight. > >>> What can I do to this done? >> >> I doubt you can. >> >> A PR for pip that changes the above line to modify sys.path in place >> would probably get accepted (I can't see any reason why it wouldn't), >> and I guess you could do the same for any other code you find. But as >> for persuading the Python programming community not to replace >> sys.path in any code, that seems unlikely to happen. >> >>> We use Python 2.7 >> >> If you were using 3.x, then it's (barely) conceivable that making >> sys.path read-only (so people could only modify it in-place) could be >> done as a new feature, but (a) it would be a major backward >> compatibility break, so there would have to be a strong justification, >> and (b) it would stop you from replacing sys.path with your custom >> class in the first place, so it wouldn't solve your issue. >> >> Which also raises the question, why do you believe it's OK to forbid >> other people to replace sys.path when that's what you're doing in your >> sitecustomize code? That seems self-contradictory... > > Yes, you are right this looks self-contradictory. > I am the one which is responsible for the set up of the environment. > > Where is the best place during the interpreter initialization for > altering the class of sys.path? I guess it is sitecustomize. After > it was executed sys.path should be altered only in-place. > > Regards, > Thomas G?ttler > > > -- > http://www.thomas-guettler.de/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From robertc at robertcollins.net Thu May 7 09:31:09 2015 From: robertc at robertcollins.net (Robert Collins) Date: Thu, 7 May 2015 19:31:09 +1200 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: On 7 May 2015 at 17:55, Nick Coghlan wrote: > On 7 May 2015 at 15:27, Nick Coghlan wrote: >> On 7 May 2015 at 11:41, Rob Cliffe wrote: >>> Or is there really some fundamental reason why things can't be simpler? >>> (Like, REALLY, REALLY simple?) >> >> Yep, there are around 7 billion fundamental reasons currently alive, >> and I have no idea how many that have gone before us: humans :) > > Heh, a message from Stephen off-list made me realise that an info dump > of all the reasons the edge cases are hard probably wasn't a good way > to answer your question :) > > What "we're" working towards (where "we" ~= the Unicode consortium + > operating system designers + programming language designers) is a > world where everything "just works", and computers talk to humans in > each human's preferred language (or a collection of languages, > depending on what the human is doing), and to each other in Unicode. 
> There are then a whole host of technical and political reasons why > it's taking decades to get from the historical point A (where > computers talk to humans in at most one language at a time, and don't > talk to each other at all) to that desired point B. > > We'll know we're done with that transition when Unicode becomes almost > transparently invisible, and the vast majority of programmers are once > again able to just deal with "text" without worrying too much about > how it's represented internally (but also having their programs be > readily usable in language's other than their own). > > Python 3 is already a lot closer to that ideal than Python 2 was, but > there are still some rough edges to iron out. The ones I'm personally > aware of affecting 3.4+ (including the one Serhiy started this thread > about) are listed as dependencies of http://bugs.python.org/issue22555 So, just last week I had to teach pbr how to deal with git commit messages that are not utf8 decodable. Some of the lowest layers of our stacks are willfully hostile to utf8: - Linux itself refuses to consider paths to be anything other than octet sequences [for various reasons, one of which is that it would be a backwards compatibility break to stop handling non-unicode strings, and Linux reallllllly doesn't want to do that, because you'd immediately make some % of data worldwide inaccessible]. - libc is somewhat, but not a lot better - its constrained by Linux - git considers commit messages to be octet sequences, and file paths likewise [for much the same reason as Linux: existing repositories have the data in them, API break to reject it] bzr refused non-unicode paths from day one, and we had a steady stream of users reporting that they couldn't import their history into bzr. One common reason is that they had test data in files on disk that was deliberately non-unicode (e.g. they were testing unicode handling boundary conditions in their software). Overall I believe we made the right choice, because we had relatively little in the way of headaches on Windows and MacOSX. [The most we ran into was the case insanity, plus normalisation forms on MacOSX]. surrogate escaping is a clever hack, and while the underlying layers are staunchly willing to give us crap data, we have a fairly simple choice: - either accept that under some circumstances folk will have to do their own interop shim at the boundary or - do the surrogate escaping hack to centralise the interop shims. The big risk, as already pointed out, is that the interop shims can at most get you mojibake rather than a crash. This isn't a win, its not even beneficial. I am not at all convinced by the distributor and packaging migration to Python3 argument. They have 'python3 -u' available for writing utilities that may be given mojibake input *and be expected to work regardless*. That lets Python3 get up and started and they can choose their own approach to handling the awful: they can just work in bytestrings, never decoding; they can explicitly decode with surrogateescape; they can write their own tooling. 
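A sketch of what such a boundary shim can look like in practice (illustrative only, not pbr's actual code): keep the tool's output as bytes and apply surrogateescape only at the edge.

import subprocess

# Ask git for the latest commit message as raw bytes; it may not be valid UTF-8.
raw = subprocess.check_output(["git", "log", "-1", "--format=%B"])

# Decode at the boundary with surrogateescape so nothing is lost ...
message = raw.decode("utf-8", errors="surrogateescape")

# ... work with it as text, then encode the same way when writing it back out.
assert message.encode("utf-8", errors="surrogateescape") == raw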
-Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From abarnert at yahoo.com Thu May 7 09:46:20 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 7 May 2015 00:46:20 -0700 Subject: [Python-ideas] (no subject) In-Reply-To: <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com> <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4B4608DC-F4FF-420F-8985-39201CFECA8F@yahoo.com> On May 6, 2015, at 18:13, Stephen J. Turnbull wrote: > > Ivan Levkivskyi writes: > >> Ok, I will try inspecting all existing approaches to find the one >> that seems more "right" to me :) > > If you do inspect all the approaches you can find, I hope you'll keep > notes and publish them, perhaps as a blog article. > >> In any case that approach could be updated by incorporating matrix >> @ as a dedicated operator for compositions. > > I think rather than "dedicated" you mean "suggested". One of Andrew's > main points is that you're unlikely to find more than a small minority > agreeing on the "right" approach, no matter which one you choose. Whatever wording you use, I do think it's likely that at least some of the existing libraries would become much more readable just by using @ in place of what they currently use. Even better, It may also turn out that the @ notation just "feels right" with one solution to the argument problem and wrong with another, narrowing down the possibility space. So, I think it's definitely worth pushing the experiments if someone has the time and inclination, so I'm glad Ivan has volunteered. >> At least, it seems that Erik from astropy likes this idea and it is >> quite natural for people with "scientific" background. I forgot to say before, but: it's great to have input from people coming from the MATLAB-y scientific/numeric world like him (I think) rather than just the Haskell/ML-y mathematical/CS world like you (Stephen, I think), as we usually get in these discussions. If there's one option that's universally obviously right to everyone in the first group, maybe everyone in the second group can shut up and deal with it. If not (which I think is likely, but I'll keep an open mind), well, at least we've got broader viewpoints and more data for Ivan's summary. > Sure, but as he also points out, when you know that you're going to be > composing only functions of one argument, the Unix pipe symbol is also > quite natural (as is Haskell's operator-less notation). While one of > my hobbies is category theory (basically, the mathematical theory of > composable maps for those not familiar with the term), I find the Unix > pipeline somehow easier to think about than abstract composition, > although I believe they're equivalent (at least as composition is > modeled by category theory). I think you're right that they're equivalent in theory. But I feel like they're also equivalent in usability and readability (as in for 1/3 simple cases they're both fine, for 1/3 compose looks better, for 1/3 rcompose), but I definitely can't argue for that. What always throws me is that most languages that offer both choose different precedence (and sometimes associativity, too) for them. The consequence seems to be that when I just use compose and rcompose operators without thinking about it, I always get them right, but as soon as I ask myself "which one is like shell pipes?" or "why did I put parens here?" I get confused and have to go take a break before I can write any more code. 
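For anyone who wants to experiment with this sub-thread's idea, a rough sketch of an @-based composition wrapper is easy to put together; this is just one possible spelling (it relies on the @ operator from PEP 465, i.e. Python 3.5+), not a concrete proposal from the thread.

import functools
import math

class composable:
    """Wrap a callable so that f @ g builds the composition f(g(...))."""

    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        def composed(*args, **kwargs):
            return self.func(other(*args, **kwargs))
        composed.__name__ = "%s@%s" % (getattr(self.func, "__name__", "?"),
                                       getattr(other, "__name__", "?"))
        return composable(composed)

f = composable(math.sin) @ (lambda x: x ** 2)
print(f.__name__)   # sin@<lambda>
print(f(2.0))       # same as math.sin(4.0)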
Haskell's operatorless notation is nice because it prevents me from noticing what I'm doing and asking myself those questions. :) From mal at egenix.com Thu May 7 09:56:15 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 07 May 2015 09:56:15 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: <554A8A79.2040306@egenix.com> Message-ID: <554B1A9F.6010606@egenix.com> On 07.05.2015 00:19, Eric Snow wrote: > On Wed, May 6, 2015 at 3:41 PM, M.-A. Lemburg wrote: >> Python without the C extensions would hardly have had the >> success it has. It is widely known as perfect language to >> glue together different systems and provide integration. >> >> Deprecating the C API would mean that you deprecate all >> those existing C extensions together with the C API. > > As Donald noted, I'm not suggesting that the C-API be deprecated. I > was careful in calling it "discouraging direct use of the C-API". :) Looks like that didn't work out when I read your suggestion :-) I'd expect a big red warning on all C API pages to have a similar effect on others. >> This can hardly be in the interest of Python's quest for >> world domination :-) >> >> BTW: What can be more drastic than deprecating the Python C API ? >> There are certainly better ways to evolve an API than getting >> rid of it. > > I'd like to hear more on alternatives. Lately all I've heard is how > much better off we'd be if folks used CFFI or tools like Cython to > write their extension modules. Regardless of what it is, we should > try to find *some* solution that puts us in a position that we can > accomplish certain architectural changes, such as moving away from > ref-counting. Larry talked about it at the language summit. C is pretty flexible when it comes to changing APIs gradually, e.g. you can have macros adjusting signatures for you or small wrapper functions fixing semantics, providing additional arguments, etc. I think it would be better to first investigate possible changes to the C API before recommending putting a layer between Python's C API and its C extensions. Those layers are useful for people who don't want to dive into the C API, but don't work well for those who know the C API and how to use it to give them the best possible performance or best possible integration with Python. I haven't seen Larry's talk, just read a short summary of things he mentioned in that talk. Those looked like a good starting point for discussions. Perhaps we could have GSoC students investigate some of these alternatives ?! Removing the GIL and reference counting will break things, but if there is a way we can reduce this breakage, I think we should definitely go for that approach before saying "oh, no, please don't use our C API". Aside: The fact that we have so many nice C extensions out there is proof that we have a good C API. Even though it is not visible to most Python programmers, it forms a significant part of Python's success. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 07 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From me at the-compiler.org Thu May 7 10:18:44 2015 From: me at the-compiler.org (Florian Bruhin) Date: Thu, 7 May 2015 10:18:44 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <554B1A9F.6010606@egenix.com> References: <554A8A79.2040306@egenix.com> <554B1A9F.6010606@egenix.com> Message-ID: <20150507081844.GT429@tonks> * M.-A. Lemburg [2015-05-07 09:56:15 +0200]: > Aside: The fact that we have so many nice C extensions out > there is proof that we have a good C API. Even though it is > not visible to most Python programmers, it forms a significant > part of Python's success. Are many of those using the C API directly rather than using some bindings generator? Most projects I'm aware of use Cython/cffi/SWIG/... and not the raw C API, which is kind of the whole point here :) Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From p.f.moore at gmail.com Thu May 7 10:31:13 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 7 May 2015 09:31:13 +0100 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <554AFF69.9050404@thomas-guettler.de> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> Message-ID: On 7 May 2015 at 07:00, Thomas G?ttler wrote: >> Can you explain why? > > I forgot to explain the why I use a custom class. Sorry, here is the background. > > I want sys.path to ordered: > > 1. virtualenv > 2. /usr/local/ > 3. /usr/lib > > We use virtualenvs with system site-packages. > > There are many places where sys.path gets altered. > > The last time we had sys.path problems I tried to write a test > which checks that sys.path is the same for cron jobs and web requests. > I failed. Too many places, I could not find all the places > and the conditions where sys.path got modified in a different way. You do understand that by reordering sys.path like this you could easily break code that adds entries to sys.path, by shadowing local modules that the code is *deliberately* trying to put at the start of the path? I'm going to assume you have good reasons for doing this (and for needing to - it seems to me that this is normally the order you'd get by default). But even assuming that, I think your requirement is specialised enough that you shouldn't be expecting other applications to have to cater for it. >> It seems pretty risky to expect that no >> applications will replace sys.path. I understand that you're proposing >> that we say that applications shouldn't do that - but just saying so >> won't change the many applications already out there. > > Of course I know that if we agree on a policy, it wont' change existing code > in one second. But if there is an official policy, you are able to > write bug reports like this "Please alter sys.path according to the docs. See http://www.python.org/...." > > The next thing: If someone wants to add to sys.path, most of the > time the developer inserts its new entries in the front of the list. Generally, I would say that applications have every right to alter sys.path to suit their needs. 
Libraries (typically) shouldn't alter sys.path - in particular on import - without that being part of the documented API. If a library alters sys.path in a way that is a problem, and doesn't document that it's doing so, then I think you have a case for a bug report to that library. At a minimum they should document what they do. Your problem here is that pip is an *application* and so assumes the right to alter sys.path. You seem to be using it as a library, and that's where your problem lies. There *is* a reason we don't support using pip as a library (this wasn't one we'd thought of, but the risk of issues like this certainly was). With luck, now that you've brought this point up, we'll remember if & when we do document a supported pip-as-a-library API, and maybe deal with sys.path differently. Paul PS As I said before, it wouldn't be hard to fix the specific usage you pointed out in pip, and I don't see a problem with submitting an issue to that effect. Your custom subclass may still break pip, even after we make such a change, but that'd be a separate issue with your subclass, not a pip issue ;-) For this thread, though, I'm focusing on your request for a "global policy". From p.f.moore at gmail.com Thu May 7 10:47:15 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 7 May 2015 09:47:15 +0100 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <554B1A9F.6010606@egenix.com> References: <554A8A79.2040306@egenix.com> <554B1A9F.6010606@egenix.com> Message-ID: On 7 May 2015 at 08:56, M.-A. Lemburg wrote: > Aside: The fact that we have so many nice C extensions out > there is proof that we have a good C API. Even though it is > not visible to most Python programmers, it forms a significant > part of Python's success. Agreed. Maybe a useful exercise for someone thinking about this issue would be to survey some of the major projects using the C API out there, and working out what would be involved in switching them to use cffi or Cython. That would give a good idea of the scale of the issue, as well as providing some practical help to projects that would be affected by this sort of recommendation. Good ones to look at would be: - lxml - pywin32 (I refrained from adding scipy and numpy to that list, as that would make this post seem like a troll attempt, which it isn't, but has anyone thought of the implications of a recommendation like this on those projects? OK, they'd probably just ignore it as they have a genuine need for direct use of the C API, but we would be sending pretty mixed messages). I prefer Nick's suggestion of adding better documentation to the packaging user guide. Maybe even to the extent of having a worked example. The article at https://scipy-lectures.github.io/advanced/interfacing_with_c/interfacing_with_c.html is quite a nice overview, although it's heavily numpy-focused and doesn't include cffi. Paul From stefan at bytereef.org Thu May 7 10:54:37 2015 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 7 May 2015 08:54:37 +0000 (UTC) Subject: [Python-ideas] discouraging direct use of the C-API References: Message-ID: Eric Snow writes: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. 
We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. -1. CFFI is much slower than using the C-API directly. Python is a great language by itself, but its excellent C-API is one of the major selling points. As for garbage collection vs. refcounting: I've tried OCaml's C-API and found it 20% slower than Python's. Note that OCaml has a fantastic native code compiler (and the culture is C-friendly), so it seems to be a hard problem. Stefan Krah From caleb.hattingh at gmail.com Thu May 7 11:01:59 2015 From: caleb.hattingh at gmail.com (Caleb Hattingh) Date: Thu, 7 May 2015 19:01:59 +1000 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: > On 7 May 2015, at 6:54 pm, Stefan Krah wrote: > > Eric Snow writes: >> A big blocker to making certain sweeping changes to CPython (e.g. >> ref-counting) is compatibility with the vast body of C extension >> modules out there that use the C-API. While there are certainly >> drastic long-term solutions to that problem, there is one thing we can >> do in the short-term that would at least get the ball rolling. We can >> put a big red note at the top of every page of the C-API docs that >> encourages folks to either use CFFI or Cython. > > -1. CFFI is much slower than using the C-API directly. I am quite interested in this; do you happen have a link to a case study/gist/repo where this has been measured? Even if you can remember people?s names involved or something similar, I could google it myself. Kind regards Caleb From jmcs at jsantos.eu Thu May 7 11:09:59 2015 From: jmcs at jsantos.eu (=?UTF-8?B?Sm/Do28gU2FudG9z?=) Date: Thu, 07 May 2015 09:09:59 +0000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <20150506145131.GL5663@ando.pearwood.info> References: <20150506145131.GL5663@ando.pearwood.info> Message-ID: On Wed, 6 May 2015 at 16:51 Steven D'Aprano wrote: > > I think that there are some questions that would need to be answered. > For instance, given some composition: > > f = math.sin @ (lambda x: x**2) > > what would f.__name__ return? What about str(f)? > Lambdas return '' so maybe something like ''? Then str(f) would be ' at 0xffffffffffff>'. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Thu May 7 11:11:29 2015 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 7 May 2015 09:11:29 +0000 (UTC) Subject: [Python-ideas] discouraging direct use of the C-API References: Message-ID: Caleb Hattingh writes: > > -1. CFFI is much slower than using the C-API directly. > > I am quite interested in this; do you happen have a link to a case study/gist/repo where this has been > measured? Even if you can remember people?s names involved or something similar, I could google it myself. I've measured it here: https://mail.python.org/pipermail/python-dev/2013-December/130772.html CFFI is very nice (superb API), but not for high performance use cases. 
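For context on what is being timed in comparisons like this, here is roughly the smallest possible cffi example, using the dynamic "ABI level" mode (no compiler involved; dlopen(None) is POSIX-only):

import cffi

ffi = cffi.FFI()
ffi.cdef("size_t strlen(const char *s);")   # declare the C function we want
C = ffi.dlopen(None)                        # open the standard C library
print(C.strlen(b"hello world"))             # -> 11; each call crosses the FFI boundary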
Stefan Krah From donald at stufft.io Thu May 7 11:14:18 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 7 May 2015 05:14:18 -0400 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: > On May 7, 2015, at 5:11 AM, Stefan Krah wrote: > > Caleb Hattingh writes: >>> -1. CFFI is much slower than using the C-API directly. >> >> I am quite interested in this; do you happen have a link to a case > study/gist/repo where this has been >> measured? Even if you can remember people?s names involved or something > similar, I could google it myself. > > I've measured it here: > > https://mail.python.org/pipermail/python-dev/2013-December/130772.html > > > CFFI is very nice (superb API), but not for high performance use cases. > Is the source code for this benchmark available anywhere? --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From rosuav at gmail.com Thu May 7 11:41:18 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 7 May 2015 19:41:18 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <20150506145131.GL5663@ando.pearwood.info> Message-ID: On Thu, May 7, 2015 at 7:09 PM, Jo?o Santos wrote: > On Wed, 6 May 2015 at 16:51 Steven D'Aprano wrote: >> >> >> I think that there are some questions that would need to be answered. >> For instance, given some composition: >> >> f = math.sin @ (lambda x: x**2) >> >> what would f.__name__ return? What about str(f)? > > > Lambdas return '' so maybe something like ''? > Then str(f) would be ' at 0xffffffffffff>'. Would be nice to use ">", incorporating both names, but that could get unwieldy once you compose a bunch of functions. ChrisA From stephen at xemacs.org Thu May 7 12:04:34 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 07 May 2015 19:04:34 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: <87bnhw4sb1.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > What "we're" working towards (where "we" ~= the Unicode consortium + > operating system designers + programming language designers) is a > world where everything "just works", and computers talk to humans in > each human's preferred language (or a collection of languages, > depending on what the human is doing), and to each other in Unicode. > There are then a whole host of technical and political reasons And economic -- which really bites here because if it weren't for the good ol' American greenback and that huge GDP and consumption (especially of software) this thread would be all about why GB 18030[1] is so hard. Think about *that* prospect the next time the "complexity of Unicode" starts to bug you. :-) > We'll know we're done with that transition when Unicode becomes almost > transparently invisible, and the vast majority of programmers are once > again able to just deal with "text" without worrying too much about > how it's represented internally That part after the "and" is a misstatement, isn't it? 
Nobody using Python 3 is concerned with how it's represented internally *at all*, because for all the str class cares it *could* be GB 18030, and only ord() (and esoteric features like memoryview) would ever tell you so. And Python 3 programmers *can* treat str as "just text"[2] as long as they stick to pure Python, and don't have to accept or generate encoded text for *external* modules (such as Tcl/Tk) that don't know about (all of) Unicode. Even surrogateescapes only matter when you're dealing with rather unruly input (or a mendacious OS). So it's *still* all about I/O, viz: issue22555. "Unicode" is just the conventional curse word that programmers use when they're thinking "HCI is hard and it sucks and I just wish it would go away!", even though Unicode gets us 90% of the way to the solution. (The other 10% is where us humans go contributing a little peace, love, and understanding. :-) Footnotes: [1] The Chinese standard which has exactly the same character repertoire as Unicode (because it tracks it by design), but instead of grandfathering ISO 8859-1 code points as the first 256 code points of Unicode, it grandfathers GB 2312 (Chinese) as the first few thousand, and has a rather obnoxious variable width representation as a result. [2] With a few exceptions such as dealing with Apple's icky NFD filesystem encoding, and formatting bidirectional strings in reStructuredText (which I haven't tried, but I bet doesn't work very well in tables!) From stefan at bytereef.org Thu May 7 12:15:04 2015 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 7 May 2015 10:15:04 +0000 (UTC) Subject: [Python-ideas] discouraging direct use of the C-API References: Message-ID: Nick Coghlan writes: > Rather than embedding these recommendations directly in the version > specific CPython docs, I'd prefer to see contributions to fill in the > incomplete sections in > https://packaging.python.org/en/latest/extensions.html with links back > to the relevant parts of the C API documentation and docs for other > projects (I was able to write the current overview section on that > page in a few hours, as I didn't need to do much research for that, > but filling in the other sections properly involves significantly more > work). Hmm. I'm getting a twilio.com advertisement on that page. I miss the old python.org... Stefan Krah From steve at pearwood.info Thu May 7 13:30:46 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 7 May 2015 21:30:46 +1000 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <20150507113045.GQ5663@ando.pearwood.info> On Thu, May 07, 2015 at 10:15:04AM +0000, Stefan Krah wrote: > Nick Coghlan writes: > > Rather than embedding these recommendations directly in the version > > specific CPython docs, I'd prefer to see contributions to fill in the > > incomplete sections in > > https://packaging.python.org/en/latest/extensions.html with links back > > to the relevant parts of the C API documentation and docs for other > > projects (I was able to write the current overview section on that > > page in a few hours, as I didn't need to do much research for that, > > but filling in the other sections properly involves significantly more > > work). > > Hmm. I'm getting a twilio.com advertisement on that page. I miss > the old python.org... I see it too. Why is python.org displaying advertisments? 
-- Steve From donald at stufft.io Thu May 7 13:44:16 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 7 May 2015 07:44:16 -0400 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <20150507113045.GQ5663@ando.pearwood.info> References: <20150507113045.GQ5663@ando.pearwood.info> Message-ID: > On May 7, 2015, at 7:30 AM, Steven D'Aprano wrote: > > On Thu, May 07, 2015 at 10:15:04AM +0000, Stefan Krah wrote: >> Nick Coghlan writes: >>> Rather than embedding these recommendations directly in the version >>> specific CPython docs, I'd prefer to see contributions to fill in the >>> incomplete sections in >>> https://packaging.python.org/en/latest/extensions.html with links back >>> to the relevant parts of the C API documentation and docs for other >>> projects (I was able to write the current overview section on that >>> page in a few hours, as I didn't need to do much research for that, >>> but filling in the other sections properly involves significantly more >>> work). >> >> Hmm. I'm getting a twilio.com advertisement on that page. I miss >> the old python.org... > > I see it too. Why is python.org displaying advertisments? > packaging.python.org is hosted on RTD, I guess that RTD added ads to it?s free service. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From p.f.moore at gmail.com Thu May 7 13:50:09 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 7 May 2015 12:50:09 +0100 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: On 7 May 2015 at 10:11, Stefan Krah wrote: > Caleb Hattingh writes: >> > -1. CFFI is much slower than using the C-API directly. >> >> I am quite interested in this; do you happen have a link to a case > study/gist/repo where this has been >> measured? Even if you can remember people?s names involved or something > similar, I could google it myself. > > I've measured it here: > > https://mail.python.org/pipermail/python-dev/2013-December/130772.html > > > CFFI is very nice (superb API), but not for high performance use cases. I'm guessing that benchmark used cffi in the "ABI level" dynamic form that matches ctypes. Did you try the cffi "API level" form that creates a C extension? I'd be curious as to where that falls in performance. Paul From caleb.hattingh at gmail.com Thu May 7 13:58:01 2015 From: caleb.hattingh at gmail.com (Caleb Hattingh) Date: Thu, 7 May 2015 21:58:01 +1000 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <86012306-FEA2-4CAE-8553-FBFFA661654D@gmail.com> > On 7 May 2015, at 9:50 pm, Paul Moore wrote: > > On 7 May 2015 at 10:11, Stefan Krah wrote: >> Caleb Hattingh writes: >>>> -1. CFFI is much slower than using the C-API directly. >>> >>> I am quite interested in this; do you happen have a link to a case >> study/gist/repo where this has been >>> measured? Even if you can remember people?s names involved or something >> similar, I could google it myself. >> >> I've measured it here: >> >> https://mail.python.org/pipermail/python-dev/2013-December/130772.html >> >> CFFI is very nice (superb API), but not for high performance use cases. > > I'm guessing that benchmark used cffi in the "ABI level" dynamic form > that matches ctypes. 
Did you try the cffi "API level" form that > creates a C extension? I'd be curious as to where that falls in > performance. I had a quick look around, @eevee made this comparison some time ago: === ? CPython 2.7 + Cython: 2.0s ? CPython 2.7 + CFFI: 2.7s ? PyPy 2.1 + CFFI: 4.3s That?s the time it takes, from a warm start, to run the test suite. === from http://eev.ee/blog/2013/09/13/cython-versus-cffi/ Kind regards Caleb From storchaka at gmail.com Thu May 7 14:07:49 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 07 May 2015 15:07:49 +0300 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> Message-ID: On 07.05.15 05:05, Neil Girdhar wrote: > Since strings are constant, wouldn't it be much faster to implement > string slices as a view of other strings? > > For clarity, I'm talking about CPython. I'm not talking about anything > the user sees. The string views would still look like regular str > instances to the user. Note that String in Java was implemented as a view of underlying array of chars. This allowed sharing character data and fast (constant time) slicing. But the implementation was changed in Java 7u6. http://java-performance.info/changes-to-string-java-1-7-0_06/ From dw+python-ideas at hmmz.org Thu May 7 14:32:39 2015 From: dw+python-ideas at hmmz.org (David Wilson) Date: Thu, 7 May 2015 12:32:39 +0000 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: Message-ID: <20150507123239.GA1768@k3> On Wed, May 06, 2015 at 10:23:09AM -0600, Eric Snow wrote: > A big blocker to making certain sweeping changes to CPython (e.g. > ref-counting) is compatibility with the vast body of C extension > modules out there that use the C-API. While there are certainly > drastic long-term solutions to that problem, there is one thing we can > do in the short-term that would at least get the ball rolling. We can > put a big red note at the top of every page of the C-API docs that > encourages folks to either use CFFI or Cython. One of CPython's traditional strongholds is its use as an embedded language. I've worked on a bunch of commercial projects using it in this way, often specifically for improved performance/access to interpreter internals, and this is not to mention the numerous free software projects doing similar: gdb, uwsgi, mod_python, Freeswitch, and so on. It might be better to discuss specifics of what should change in the API besides refcounting, and hammer out concrete steps to make those changes happen, since I doubt the C API is ever going to go away, as even if all extension modules were rewritten today its use for embedding would still prevent sweeping changes without upsetting a huge number of users and mature products. 
David From guettliml at thomas-guettler.de Thu May 7 16:51:36 2015 From: guettliml at thomas-guettler.de (=?windows-1252?Q?Thomas_G=FCttler?=) Date: Thu, 07 May 2015 16:51:36 +0200 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: <20150507072223.GS429@tonks> References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> <20150507064836.GR429@tonks> <554B0D3E.9020708@thomas-guettler.de> <20150507072223.GS429@tonks> Message-ID: <554B7BF8.8070508@thomas-guettler.de> Am 07.05.2015 um 09:22 schrieb Florian Bruhin: > * Thomas G?ttler [2015-05-07 08:59:10 +0200]: >> >> >> Am 07.05.2015 um 08:48 schrieb Florian Bruhin: >>> * Thomas G?ttler [2015-05-07 08:00:09 +0200]: >>>> Am 06.05.2015 um 17:07 schrieb Paul Moore: >>>>> On 6 May 2015 at 15:05, Thomas G?ttler wrote: >>>>>> I am missing a policy how sys.path should be altered. >>>>> >>>>> Well, the docs say that applications can modify sys.path as needed. >>>>> Generally, applications modify sys.path in place via sys.path[:] = >>>>> whatever, but that's not mandated as far as I know. >>>>> >>>>>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py >>>>> >>>>> Can you explain why? >>>> >>>> I forgot to explain the why I use a custom class. Sorry, here is the background. >>>> >>>> I want sys.path to ordered: >>>> >>>> 1. virtualenv >>>> 2. /usr/local/ >>>> 3. /usr/lib >>>> >>>> We use virtualenvs with system site-packages. >>>> >>>> There are many places where sys.path gets altered. >>>> >>>> The last time we had sys.path problems I tried to write a test >>>> which checks that sys.path is the same for cron jobs and web requests. >>>> I failed. Too many places, I could not find all the places >>>> and the conditions where sys.path got modified in a different way. >>> >>> It looks like you explained *how* you do what you do, but not *why* - >>> what problem is this solving? Why can't you just invoke the >>> virtualenv's python and let python take care of sys.path? >> >> I want the sys.path be ordered like it, since I want that packages of the inner >> environment are tried first. >> >> Here "inner" means "upper" in the above sys.path order. >> >> Example: If a package is installed in the virtualenv with version 2.2 and >> in global site packages with version 1.0, then I want the interpreter to >> use the version from virtualenv. > > That's already the default virtualenv behaviour: If this is the behaviour in your virtualenv, that's nice for you. In my virtualenv it was not that way. There are a lot of modules which do magic with sys.path. I guess non of them are installed in your virtualenv. Andrew Barnet suggested to alter the module finder. That looks interesting. Thomas From steve at pearwood.info Thu May 7 17:31:24 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 8 May 2015 01:31:24 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <554AC2CE.5040705@btinternet.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> Message-ID: <20150507153123.GT5663@ando.pearwood.info> On Thu, May 07, 2015 at 02:41:34AM +0100, Rob Cliffe wrote: > This is no doubt *not* the best platform to raise these thoughts (which > are nothing to do with Python - apologies), but I'm not sure where else > to go. > I watch discussions like this ... > I watch posts like this one [Nick's] ... > ... And I despair. I really despair. > > I am a very experienced but old (some would say "dinosaur") programmer. 
> I appreciate the need for Unicode. I really do. > I don't understand Unicode and all its complications AT ALL. > And I can't help wondering: > Why, oh why, do things have to be SO FU*****G COMPLICATED? This > thread, for example, is way over my head. And it is typical of many > discussions I have stared at, uncomprehendingly. > Surely 65536 (2-byte) encodings are enough to express all characters in > all the languages in the world, plus all the special characters we need. Not even close. Unicode currently encodes over 74,000 CJK (Chinese/Japanese/Korean) ideographs, which is comfortably larger than 2**16, so no 16-bit encoding can handle the complete range of CJK characters. It will probably take many more years before the entire CJK character set is added to Unicode, simply because the characters left to add are obscure and rare. Some may never be added at all, e.g. in 2007 Taiwan withdrew a submission to add 6,545 characters used as personal names as they were deemed to no longer be in use. That's just *one* writing system. Then we add Latin, Cyrillic (Russian), Greek/Coptic, Arabic, Hebrew, Korea's other writing system Hangul, Thai, and dozens of others. (Fortunately, unlike Chinese characters, the other writing systems typically need only a few dozen or hundred characters, not tens of thousands.) Plus dozens of punctuation marks, symbols from mathematics, linguistics, and much more. And the Unicode Consortium projects that at least another five thousand characters will be added in version 8, and probably more beyond that. So no, two bytes is not enough. Unicode actually fits into 21 bits, which is a bit less than three bytes, but for machine efficiency four bytes will often be used. > Why can't there be just *ONE* universal encoding? (Decided upon, no > doubt, by some international standards committee. There would surely be > enough spare codes for any special characters etc. that might come up in > the foreseeable future.) The problem isn't so much with the Unicode encodings (of which there are only a handful, and most of the time you only use one, UTF-8) but with the dozens and dozens of legacy encodings invented during the dark ages before Unicode. > *Is it just historical accident* (partly due to an awkward move from > 1-byte ASCII to 2-byte Unicode, implemented in many different places, in > many different ways) *that we now have a patchwork of encodings that we > strive to fit into some over-complicated scheme*? Yes, it is a historical accident. In the 1960s, 70s and 80s national governments and companies formed a plethora of one-byte (and occasional two-byte) encodings to support their own languages and symbols. E.g. in the 1980s, Apple used their own idiosyncratic set of 256 characters, which didn't match the 256 characters used on DOS, which was different again from those on Amstrad... Unicode was started in the 1990s to bring order to that chaos. If you think things are complicated with Unicode, they would be much worse without it. > Or is there *really* some *fundamental reason* why things *can't* be > simpler? (Like, REALLY, _*REALLY*_ simple?) 90% of the complexity is due to the history of text encodings on various computer platforms. If people had predicted cheap memory and the Internet back in the early 1960s, perhaps we wouldn't have ended up with ASCII and the dozens of incompatible "Extended ASCII" encodings as we know them today. But the other 90% of the complexity is inherent to human languages. For example, you know what the lower case of "I" is, don't you? 
It's "i". But not in Turkey, which has both a dotted and dotless version: I ? ? i (Strangely, as far as I know, nobody has a dotted J or dotless j.) Consequently, Unicode has a bunch of complexity related to left-to-right and right-to-left writing systems, accents, joiners, variant forms, and other issues. But, unless you're actually writing in a language which needs that, or writing a word-processor application, you can usually ignore all of that and just treat them as "characters". > Imageine if we were starting to design the 21st century from scratch, > throwing away all the history? How would we go about it? Well, for starters I would insist on re-introducing thorn ? and eth ? back into English :-) -- Steve From steve at pearwood.info Thu May 7 17:46:21 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 8 May 2015 01:46:21 +1000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> Message-ID: <20150507154621.GU5663@ando.pearwood.info> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: > Since strings are constant, wouldn't it be much faster to implement string > slices as a view of other strings? String or list views would be *very* useful in situations like this: # Create a massive string s = "some string"*1000000 for c in s[1:]: process(c) which needlessly duplicates almost the entire string just to skip the first char. The same applies to lists or other sequences. But a view would be harmful in this situation: s = "some string"*1000000 t = s[1:2] # a view maskerading as a new string del s Now we keep the entire string alive long after it is needed. How would you solve the first problem without introducing the second? -- Steve From tjreedy at udel.edu Thu May 7 17:58:00 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2015 11:58:00 -0400 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: <20150507113045.GQ5663@ando.pearwood.info> Message-ID: On 5/7/2015 7:44 AM, Donald Stufft wrote: >>>> https://packaging.python.org/en/latest/extensions.html >>> Hmm. I'm getting a twilio.com advertisement on that page. I miss >>> the old python.org... >> I see it too. Why is python.org displaying advertisments? > packaging.python.org is hosted on RTD, I guess that RTD added ads to > it?s free service. I don't see any ad (using Verizon FIOS). I have noscript running, but allowing first readthedocs.com and then grokthedocs.com did not produce an ad. -- Terry Jan Reedy From frankwoodall at gmail.com Thu May 7 18:06:32 2015 From: frankwoodall at gmail.com (Frank Woodall) Date: Thu, 7 May 2015 12:06:32 -0400 Subject: [Python-ideas] Handling lack of permissions/groups with pathlib's rglob Message-ID: Greetings, I am attempting to use pathlib to recursively glob and/or find files. File permissions and groups are all over the place due to poor management of the filesystem which is out of my control. The problem occurs when I lack both permissions and group membership to a directory that rglob attempts to descend into. Rglob throws a KeyError and then a PermissionError and finally stops entirely. I see no way to recover gracefully from this and continue globbing. Is this the expected behavior in this case? The behavior that I want is for rglob to skip directories that I don't have permissions on and to generate the list of everything that it saw/had permissions on. 
The all or nothing nature isn't going to get me very far in this particular case because I'm almost guaranteed to have bad permissions on some directory or another on every run. More specifics: Python: 3.4.1 (and 3.4.3) compiled from source for linux Filesystem I am globbing on: automounted nfs share How to reproduce: mkdir /tmp/path_test && cd /tmp/path_test && mkdir dir1 dir2 dir2/dir3 && touch dir1/file1 dir1/file2 dir2/file1 dir2/file2 dir2/dir3/file1 su chmod 700 dir2/dir3/ chown root:root dir2/dir3/ exit python 3.4.1 from pathlib import Path p = Path('/tmp/path_test') for x in p.rglob('*') : print(x) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu May 7 18:11:08 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2015 12:11:08 -0400 Subject: [Python-ideas] Policy for altering sys.path In-Reply-To: References: <554A1F8C.1040005@thomas-guettler.de> <554AFF69.9050404@thomas-guettler.de> Message-ID: On 5/7/2015 4:31 AM, Paul Moore wrote: > Generally, I would say that applications have every right to alter > sys.path to suit their needs. Libraries (typically) shouldn't alter > sys.path - in particular on import - without that being part of the > documented API. I agree. Altering sys.path is an instance of monkeypatching, as is altering sys.std*. Libraries that automatically do either on import limit their usefulness and should document their behavior. -- Terry Jan Reedy From mistersheik at gmail.com Thu May 7 18:22:40 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 7 May 2015 12:22:40 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <20150507154621.GU5663@ando.pearwood.info> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: One way, is leave CPython as is, and create a string view class as a user. The other is to make the views use weakref to the target string and copy on delete. Anyway, I'm not really bothered about this. Just wanted to see what people thought. You probably shouldn't be using str for really long strings, which is the only time this would matter anyway. On Thu, May 7, 2015 at 11:46 AM, Steven D'Aprano wrote: > On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: > > Since strings are constant, wouldn't it be much faster to implement > string > > slices as a view of other strings? > > String or list views would be *very* useful in situations like this: > > # Create a massive string > s = "some string"*1000000 > for c in s[1:]: > process(c) > > > which needlessly duplicates almost the entire string just to skip the > first char. The same applies to lists or other sequences. > > But a view would be harmful in this situation: > > s = "some string"*1000000 > t = s[1:2] # a view maskerading as a new string > del s > > Now we keep the entire string alive long after it is needed. > > How would you solve the first problem without introducing the second? > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe. 
> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu May 7 18:23:24 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 7 May 2015 18:23:24 +0200 Subject: [Python-ideas] Handling lack of permissions/groups with pathlib's rglob References: Message-ID: <20150507182324.78706543@fsol> Hello Frank, On Thu, 7 May 2015 12:06:32 -0400 Frank Woodall wrote: > The problem occurs when I lack both permissions and group membership to a > directory that rglob attempts to descend into. Rglob throws a KeyError and > then a PermissionError and finally stops entirely. I see no way to recover > gracefully from this and continue globbing. Is this the expected behavior > in this case? It is not unexpected :) Actually, this case was simply not envisioned. I agree that being more laxist could be convenient here. If you want to provide a patch for this, you can start at https://docs.python.org/devguide/ Regards Antoine. > The behavior that I want is for rglob to skip directories that I don't have > permissions on and to generate the list of everything that it saw/had > permissions on. The all or nothing nature isn't going to get me very far in > this particular case because I'm almost guaranteed to have bad permissions > on some directory or another on every run. > > More specifics: Python: 3.4.1 (and 3.4.3) compiled from source for linux > > Filesystem I am globbing on: automounted nfs share > > How to reproduce: > > mkdir /tmp/path_test && cd /tmp/path_test && mkdir dir1 dir2 dir2/dir3 > && touch dir1/file1 dir1/file2 dir2/file1 dir2/file2 dir2/dir3/file1 > su > chmod 700 dir2/dir3/ > chown root:root dir2/dir3/ > exit > > python 3.4.1 > > from pathlib import Path > p = Path('/tmp/path_test') > for x in p.rglob('*') : print(x) > From stefan at bytereef.org Thu May 7 18:54:22 2015 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 7 May 2015 16:54:22 +0000 (UTC) Subject: [Python-ideas] discouraging direct use of the C-API References: <20150507113045.GQ5663@ando.pearwood.info> Message-ID: Terry Reedy writes: > >>>> https://packaging.python.org/en/latest/extensions.html > I don't see any ad (using Verizon FIOS). I have noscript running, but > allowing first readthedocs.com and then grokthedocs.com did not produce > an ad. It's gone now. It was there when I posted earlier (I even clicked through). Stefan Krah From stefan_ml at behnel.de Thu May 7 19:23:41 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 07 May 2015 19:23:41 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: <554A8A79.2040306@egenix.com> <554B1A9F.6010606@egenix.com> Message-ID: Paul Moore schrieb am 07.05.2015 um 10:47: > On 7 May 2015 at 08:56, M.-A. Lemburg wrote: >> Aside: The fact that we have so many nice C extensions out >> there is proof that we have a good C API. Even though it is >> not visible to most Python programmers, it forms a significant >> part of Python's success. Oh, totally. But that doesn't mean people have to manually write code against it, in the same way that you can benefit from excellent processors without writing assembly. 
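To make that concrete, here is a tiny sketch (my own toy example, not code from any particular project) of what "not writing against the C-API by hand" can look like with cffi's ABI mode -- the C function is declared and called from plain Python, and all the argument conversion, error checking and ref-counting boilerplate that a hand-written extension module would need simply isn't there:

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("size_t strlen(const char *s);")  # declare the C function we want
    libc = ffi.dlopen(None)                    # load the standard C library (POSIX)

    print(libc.strlen(b"hello"))               # -> 5

The hand-written equivalent would need a module init function, a PyMethodDef table and PyArg_ParseTuple calls just to wrap that one call.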
> Maybe a useful exercise for someone thinking about this issue > would be to survey some of the major projects using the C API out > there, and working out what would be involved in switching them to use > cffi or Cython. That would give a good idea of the scale of the issue, > as well as providing some practical help to projects that would be > affected by this sort of recommendation. My general answer is that "Python is way easier to write than C", and therefore "rewriting C code in Cython" is a rather fast thing to do (P's and C's set as intended). Often enough, the rewrite also leads to immediate functional improvements because stuff can easily be done in a more general way in Python syntax than in plain C(-API) code. And it's not uncommon that several ref-counting and/or error handling bugs get fixed on the way. When I rewrite C-API code in Cython, the bulk of the time is spent reverse engineering the intended Python semantics from the verbose (and sometimes cryptic) C code. After that, writing them down in Python syntax is quite easy. Once you get used to it, the plain transformation can be done at more than a hundred lines of C code per hour, if it's not overly complex or dense (the usual 5%). If you have a good test suite, debugging the rewritten code should be quite straight forward afterwards. So, if you have a project with 10000 lines of C code, 30% of which uses the C-API, you should be able to rip out the direct usage of the C-API in just a couple of days by rewriting it in Cython. The code size usually drops by a factor of 2-5 that way. That also makes it a reasonable migration path for porting Py2.x C-API code to Py3, for example. I can't speak for cffi, but my guess is that if you know its API well, the fact that it's also Python should keep the rewriting speed in the same ball park as for Cython. So, for code that isn't performance critical, it's certainly a reasonable alternative, with the added benefit of having excellent support in PyPy. > Good ones to look at would be: > - lxml lxml has been written in Cython even before Cython existed (it used to be a patched Pyrex at the time). In fact, writing it in C would have been entirely impossible. Even if the necessary developer resources had been available, writing C code is so difficult in comparison that many of the non-trivial features would never have been implemented. > (I refrained from adding scipy and numpy to that list, as that would > make this post seem like a troll attempt, which it isn't, but has > anyone thought of the implications of a recommendation like this on > those projects? OK, they'd probably just ignore it as they have a > genuine need for direct use of the C API, but we would be sending > pretty mixed messages). Much of scipy and its surrounding tools and libraries are actually written in Cython. At least much of their parts that interact with Python, and often a lot more than just the interface layer. New code in the scientific computing community is commonly written in Cython these days, or uses other tools for JIT or AOT compilation (Numba, numexpr, ...), many of which were themselves partly written in Cython. 
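As a rough illustration of what such a rewrite looks like (a made-up toy example, not code from any of the projects mentioned): a hand-written C-API function that sums up a sequence of numbers has to call PyObject_GetIter, PyNumber_Float and friends itself, check every return value and keep the ref-counting straight; the Cython version is just the Python you would have written anyway, optionally sprinkled with a few static type declarations:

    # valid Cython (and also plain Python):
    def total(values):
        result = 0.0
        for v in values:        # Cython generates the iteration protocol calls
            result += float(v)  # ... and the conversion and error propagation
        return result

Compiled with Cython, that loop expands into the same kind of C-API calls, but they are generated and maintained by the compiler rather than by hand.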
Stefan From levkivskyi at gmail.com Thu May 7 19:50:29 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Thu, 7 May 2015 19:50:29 +0200 Subject: [Python-ideas] (no subject) In-Reply-To: <4B4608DC-F4FF-420F-8985-39201CFECA8F@yahoo.com> References: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com> <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp> <4B4608DC-F4FF-420F-8985-39201CFECA8F@yahoo.com> Message-ID: On May 7, 2015 9:46 AM, "Andrew Barnert" wrote: > > On May 6, 2015, at 18:13, Stephen J. Turnbull wrote: > > > > Ivan Levkivskyi writes: > > > >> Ok, I will try inspecting all existing approaches to find the one > >> that seems more "right" to me :) > > > > If you do inspect all the approaches you can find, I hope you'll keep > > notes and publish them, perhaps as a blog article. > > > >> In any case that approach could be updated by incorporating matrix > >> @ as a dedicated operator for compositions. > > > > I think rather than "dedicated" you mean "suggested". One of Andrew's > > main points is that you're unlikely to find more than a small minority > > agreeing on the "right" approach, no matter which one you choose. > > Whatever wording you use, I do think it's likely that at least some of the existing libraries would become much more readable just by using @ in place of what they currently use. Even better, > It may also turn out that the @ notation just "feels right" with one solution to the argument problem and wrong with another, narrowing down the possibility space. > > So, I think it's definitely worth pushing the experiments if someone has the time and inclination, so I'm glad Ivan has volunteered. > Thank you for encouraging me. It will be definitely an interesting experience to do this. > >> At least, it seems that Erik from astropy likes this idea and it is > >> quite natural for people with "scientific" background. > > I forgot to say before, but: it's great to have input from people coming from the MATLAB-y scientific/numeric world like him (I think) rather than just the Haskell/ML-y mathematical/CS world like you (Stephen, I think), as we usually get in these discussions. If there's one option that's universally obviously right to everyone in the first group, maybe everyone in the second group can shut up and deal with it. If not (which I think is likely, but I'll keep an open mind), well, at least we've got broader viewpoints and more data for Ivan's summary. > > > Sure, but as he also points out, when you know that you're going to be > > composing only functions of one argument, the Unix pipe symbol is also > > quite natural (as is Haskell's operator-less notation). While one of > > my hobbies is category theory (basically, the mathematical theory of > > composable maps for those not familiar with the term), I find the Unix > > pipeline somehow easier to think about than abstract composition, > > although I believe they're equivalent (at least as composition is > > modeled by category theory). > > I think you're right that they're equivalent in theory. > > But I feel like they're also equivalent in usability and readability (as in for 1/3 simple cases they're both fine, for 1/3 compose looks better, for 1/3 rcompose), but I definitely can't argue for that. > > What always throws me is that most languages that offer both choose different precedence (and sometimes associativity, too) for them. 
The consequence seems to be that when I just use compose and rcompose operators without thinking about it, I always get them right, but as soon as I ask myself "which one is like shell pipes?" or "why did I put parens here?" I get confused and have to go take a break before I can write any more code. Haskell's operatorless notation is nice because it prevents me from noticing what I'm doing and asking myself those questions. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Thu May 7 20:01:14 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Thu, 7 May 2015 20:01:14 +0200 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: Message-ID: > On Thu, May 7, 2015 at 7:09 PM, João Santos wrote: > > On Wed, 6 May 2015 at 16:51 Steven D'Aprano wrote: > >> > >> > >> I think that there are some questions that would need to be answered. > >> For instance, given some composition: > >> > >> f = math.sin @ (lambda x: x**2) > >> > >> what would f.__name__ return? What about str(f)? > > > > > > Lambdas return '<lambda>' so maybe something like 'sin @ <lambda>'? > > Then str(f) would be '<function sin @ <lambda> at 0xffffffffffff>'. > > Would be nice to use "<sin @ <lambda>>", incorporating both names, but > that could get unwieldy once you compose a bunch of functions. Maybe it would be better to have '<function sin @ <lambda> at 0xffffffffffff>' for str(f) but 'sin @ <lambda>' for repr(f). So that one can have more info and it would be closer to the ideal obj == eval(repr(obj)). -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu May 7 20:26:06 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2015 14:26:06 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <20150507154621.GU5663@ando.pearwood.info> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: On 5/7/2015 11:46 AM, Steven D'Aprano wrote: > On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: >> Since strings are constant, wouldn't it be much faster to implement string >> slices as a view of other strings? > > String or list views would be *very* useful in situations like this: > > # Create a massive string > s = "some string"*1000000 > for c in s[1:]: > process(c) Easily done without slicing, as discussed on python-list multiple times. it = iter(s) next(it) for c in it: process(c) for s[5555: 399999], use explicit indexes for i in range(5555, 400000): process(s[i]) or use islice. The use case for sequence views is when one needs to keep around both the base sequence and the slices (views). -- Terry Jan Reedy From nad at acm.org Thu May 7 20:27:40 2015 From: nad at acm.org (Ned Deily) Date: Thu, 07 May 2015 11:27:40 -0700 Subject: [Python-ideas] Handling lack of permissions/groups with pathlib's rglob References: <20150507182324.78706543@fsol> Message-ID: In article <20150507182324.78706543 at fsol>, Antoine Pitrou wrote: > On Thu, 7 May 2015 12:06:32 -0400 > Frank Woodall > wrote: > > The problem occurs when I lack both permissions and group membership to a > > directory that rglob attempts to descend into. Rglob throws a KeyError and > > then a PermissionError and finally stops entirely. I see no way to recover > > gracefully from this and continue globbing. Is this the expected behavior > > in this case? > It is not unexpected :) Actually, this case was simply not envisioned. > I agree that being more laxist could be convenient here.
> If you want to provide a patch for this, you can start at > https://docs.python.org/devguide/ Also there is an open issue about this that can be used to attach a patch or further discussion: http://bugs.python.org/issue24120 -- Ned Deily, nad at acm.org From chris.barker at noaa.gov Thu May 7 20:32:31 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 7 May 2015 11:32:31 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> Message-ID: My not-an-expert thoughts on these issues: [NOTE: nested comments, so attribution may be totally confused] Why, oh why, do things have to be SO FU*****G COMPLICATED? > > two reasons: 1) human languages are complicated, and they all have their idiosyncrasies -- some are inherently better suited to machine interpretation, but the real killer is that we want to use multiple languages with one system -- that IS inherently very complicated. 2) legacy decisions and backward compatibility -- this is what makes it impossible to "simply" come up with a single best way to do it (or a few ways, anyway...) > Surely 65536 (2-byte) encodings are enough to express all characters in > all the languages in the world, plus all the special characters we need. > > That was once thought true -- but it turns out it's not -- darn! Though we do think that 4 bytes is plenty, and to some extent I'm confused as to why there isn't more use of UCS-4 -- sure it wastes a lot of space, but everything in computers (memory, cache, disk space, bandwidth) is orders of magnitude larger/faster than it was when the Unicode discussion got started. But people don't like inefficiency and, in fact, as the newer py3 Unicode objects show, we don't need to compromise on that. Or is there really some fundamental reason why things can't be simpler? > (Like, REALLY, REALLY simple?) Well, if there were no legacy systems, it still couldn't be REALLY, REALLY simple (though UCS-4 is close), but there could be a LOT fewer ways to do things: programming languages would have their own internal representation (like Python does), and we would have a small handful of encodings optimized for various things: UCS-4 for ease of use, utf-8 for small disk storage (at least of Euro-centered text), and that would be that. But we do have the legacies to deal with. Apple, Microsoft, Sun, and a few other vendors jumped on the Unicode > bandwagon early and committed themselves to the idea that 2 bytes is enough > for everything. When the world discovered that wasn't true, we were stuck > with a bunch of APIs that insisted on 2 bytes. Apple was able to partly > make a break with that era, but Windows and Java are completely stuck with > "Unicode means 16-bit" forever, which is why the whole world is stuck > dealing with UTF-16 and surrogates forever. > I've read many of the rants about UTF-16, but in fact, it's really not any worse than UTF-8 -- it's kind of a worst of both worlds -- not a set number of bytes per char, but a lot of wasted space (particularly for euro languages), but other than a bit of wasted space, it's just like UTF-8. The problem is not UTF-16 itself, but the fact that a really surprising number of APIs and programmers still think that it's UCS-2, rather than UTF-16 --painful.
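A quick way to see the trap in Python 3 (just a toy illustration, nothing to do with any particular API): any code that treats UTF-16 code units as "one unit per character" silently breaks as soon as a non-BMP character shows up:

    s = "\U0001F600"                        # one code point, outside the BMP
    print(len(s))                           # 1 code point in Python 3
    print(len(s.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (a surrogate pair)
    print(len(s.encode("utf-8")))           # 4 bytes in UTF-8

A UCS-2-minded API sees two "characters" there, and slicing between them leaves a lone surrogate.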
And the fact that, AFAIK, there really is no C++ Unicode type -- at least not one commonly used. Again -- legacy issues. And there are still people creating filenames on Latin-1 filesystems on > older Linux and Unix boxes, > This is the odd one to me -- reading about people's struggles with py3 and *nix filenames -- they argue that *nix is not broken -- and the world should just use char* for filenames and all is well! In fact, maybe it would be easier to handle filenames as char* in some circumstances, but to argue that a system is not broken when you can't know the encoding of filenames, and there may be differently encoded filenames ON THE SAME Filesystem is insane! Of course that is broken! It may be reality, and maybe Py3 needs to do a bit more to accommodate it, but it is broken. In fact, as much as I like to bash Windows, I've had NO problems with assuming filenames in Windows are UTF-16 (as long as we use the "wide char" APIs, sigh), and OS-X's specification of filenames as utf-8 works fine. So Linux really needs to catch up here! UTF-16 is a historical accident, > yeah, but it's not really a killer, either -- the problems come when people assume UTF-16 is UCS-2, just like assuming that utf-8 is ascii (or any one-byte encoding...) We really do need at least UTF-8 and UTF-32. But that's it. And I think > that's simple enough. is UTF-32 the same as UCS-4? Always a bit confused by that. Oh, and endian issues -- *sigh* Aaaargh! Do I really have to learn all this mumbo-jumbo?! (Forgive me. > :-) ) Some of it yes, I'm afraid so -- but probably not the surrogate pair stuff, etc. That stuff is pretty esoteric, and really needs to be understood by people writing APIs -- but for those of us that USE APIs, not so much. For instance, Python's handling Unicode file names almost always "just works" (as long as you stay in Python...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu May 7 21:19:01 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 7 May 2015 12:19:01 -0700 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: References: <554A8A79.2040306@egenix.com> <554B1A9F.6010606@egenix.com> Message-ID: On May 7, 2015 10:24 AM, "Stefan Behnel" wrote: > > Paul Moore schrieb am 07.05.2015 um 10:47: > > > (I refrained from adding scipy and numpy to that list, as that would > > make this post seem like a troll attempt, which it isn't, but has > > anyone thought of the implications of a recommendation like this on > > those projects? OK, they'd probably just ignore it as they have a > > genuine need for direct use of the C API, but we would be sending > > pretty mixed messages). > > Much of scipy and its surrounding tools and libraries are actually written > in Cython. At least much of their parts that interact with Python, and > often a lot more than just the interface layer. New code in the scientific > computing community is commonly written in Cython these days, or uses other > tools for JIT or AOT compilation (Numba, numexpr, ...), many of which were > themselves partly written in Cython. Yeah, I think if anyone talks to the developers of those libraries they will get a very *un*mixed message saying, don't do what we did :-).
One of scipy's GSoC projects this year is even porting a c extension to Cython, and I've been actively investigating the possibility of porting numpy into Cython as well. Mostly for the immediate benefits, but certainly it has occurred to me that in the long run this could potentially provide an escape hatch from CPython. (Numerical people are *very* interested in JITs... and something like Cython provides the unique possibility that if a project like PyPy or pyston added direct support for the language, then one could write a single source file that was fast on cpython b/c it compiled to C, and was even faster on other interpreters because the same source got jitted.) The main obstacle to porting numpy, btw, is that Cython currently assumes that each source file will generate one python extension, and any communication between source files will be via python-level imports. NumPy, of course, has 100,000 lines of C across lots of files that are all built into one extension module, and which happily communicate via direct C function calls. So incrementally porting is impossible without teaching Cython to handle this case a bit better. NumPy is an extreme outlier in this regard though. In particular this is absolutely not a reason to steer *new* projects away from Cython. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu May 7 21:29:57 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 7 May 2015 15:29:57 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: The point is to have a Pythonic way of saying that. Using islice or iterating over a range and indexing is ugly. It would be cleaner to implement a string class that implements fast slicing than those unpythonic pieces of code. Best, Neil On Thu, May 7, 2015 at 2:26 PM, Terry Reedy wrote: > On 5/7/2015 11:46 AM, Steven D'Aprano wrote: > >> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: >> >>> Since strings are constant, wouldn't it be much faster to implement >>> string >>> slices as a view of other strings? >>> >> >> String or list views would be *very* useful in situations like this: >> >> # Create a massive string >> s = "some string"*1000000 >> for c in s[1:]: >> process(c) >> > > Easily done without slicing, as discussed on python-list multiple times. > > it = iter(s) > next(it) > for c in it: process(c) > > for s[5555: 399999], use explicit indexes > > for i in range(5555, 400000): process s[i] > > or use islice. > > The use case for sequence views is when one needs to keep around both the > base sequence and the slices (views). > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Thu May 7 21:37:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 7 May 2015 12:37:22 -0700 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> On May 7, 2015, at 11:26, Terry Reedy wrote: > >> On 5/7/2015 11:46 AM, Steven D'Aprano wrote: >>> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: >>> Since strings are constant, wouldn't it be much faster to implement string >>> slices as a view of other strings? >> >> String or list views would be *very* useful in situations like this: >> >> # Create a massive string >> s = "some string"*1000000 >> for c in s[1:]: >> process(c) > > Easily done without slicing, as discussed on python-list multiple times. > > it = iter(s) > next(it) > for c in it: process(c) > > for s[5555: 399999], use explicit indexes > > for i in range(5555, 400000): process s[i] > > or use islice. > > The use case for sequence views is when one needs to keep around both the base sequence and the slices (views). Or where you need to keep around multiple views at once. Since NumPy has native view-slicing, I suspect we can find a lot of good use cases there. One question: when you slice a view, do you get a copy, or another view? Because if it's the latter, you can write view slices with view(s)[1:] instead of view(s, 1, None), which seems like a big readability win, but on the other hand it means a view doesn't act just like a normal sequence--e.g., v[:] no longer makes a copy. (NumPy does the latter, of course.) > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From edk141 at gmail.com Thu May 7 21:57:08 2015 From: edk141 at gmail.com (Ed Kellett) Date: Thu, 07 May 2015 19:57:08 +0000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> Message-ID: On Thu, 7 May 2015 at 20:40 Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > > One question: when you slice a view, do you get a copy, or another view? > Because if it's the latter, you can write view slices with view(s)[1:] > instead of view(s, 1, None), which seems like a big readability win, but on > the other hand it means a view doesn't act just like a normal > sequence--e.g., v[:] no longer makes a copy. (NumPy does the latter, of > course.) Well, in the context of strings it doesn't matter. (or, in some sense, not copying immutable strings is a viable implementation technique for copying them). CPython already knows that: >>> x = "foo" >>> x is x[:] True Ed Kellett -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Thu May 7 22:02:19 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 07 May 2015 21:02:19 +0100 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? 
In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: On 07/05/2015 20:29, Neil Girdhar wrote: > The point is to have a Pythonic way of saying that. Using islice or > iterating over a range and indexing is ugly. It would be cleaner to > implement a string class that implements fast slicing than those unpythonic > pieces of code. > > Best, > > Neil I don't see anything unpythonic there at all, just standard Python. If you want fast slicing that badly you can write the class unless somebody beats you to it as their itch is more painful. > > On Thu, May 7, 2015 at 2:26 PM, Terry Reedy wrote: > >> On 5/7/2015 11:46 AM, Steven D'Aprano wrote: >> >>> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: >>> >>>> Since strings are constant, wouldn't it be much faster to implement >>>> string >>>> slices as a view of other strings? >>>> >>> >>> String or list views would be *very* useful in situations like this: >>> >>> # Create a massive string >>> s = "some string"*1000000 >>> for c in s[1:]: >>> process(c) >>> >> >> Easily done without slicing, as discussed on python-list multiple times. >> >> it = iter(s) >> next(it) >> for c in it: process(c) >> >> for s[5555: 399999], use explicit indexes >> >> for i in range(5555, 400000): process s[i] >> >> or use islice. >> >> The use case for sequence views is when one needs to keep around both the >> base sequence and the slices (views). >> >> -- >> Terry Jan Reedy >> -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From skip.montanaro at gmail.com Thu May 7 22:12:58 2015 From: skip.montanaro at gmail.com (Skip Montanaro) Date: Thu, 7 May 2015 15:12:58 -0500 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> Message-ID: I haven't seen anyone else mention it, so I will point out: interoperability with C. In C, strings are NUL-terminated. PyStringObject instances do (or used to) have NUL-terminated strings in them. According to unicodeobject.h, that seems still to be the case: typedef struct { /* There are 4 forms of Unicode strings: ... wchar_t *wstr; /* wchar_t representation (*null-terminated*) */ } PyASCIIObject; and: typedef struct { PyASCIIObject _base; Py_ssize_t utf8_length; /* Number of bytes in utf8, *excluding the* * * terminating \0*. */ char *utf8; /* UTF-8 representation (*null-terminated*) */ Py_ssize_t wstr_length; /* Number of code points in wstr, possible * surrogates count as two code points. */ } PyCompactUnicodeObject; The raw string is NUL-terminated, precisely so copying isn't required in most cases before passing to C. Making s[1:-1] a view onto the underlying string data in s would require you to copy the data when you want to pass the view into C so you could tack on that NUL. That happens a lot, so it's likely you wouldn't save much work, and result in a lot more churn in Python's memory allocator. The only place you could avoid the copy is if the view you are dealing with is a strict suffix of s. Skip -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Thu May 7 23:04:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 7 May 2015 14:04:55 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> Message-ID: <08424643-0DB2-4632-A2E5-45C67818E615@yahoo.com> On May 7, 2015, at 11:32, Chris Barker wrote: > > My not-an-expert thoughts on these issues: > > [NOTE: nested comments, so attribution may be totally confused] > >>> Why, oh why, do things have to be SO FU*****G COMPLICATED? > two reasons: > > 1) human languages are complicated, and they all have their idiosyncrasies -- some are inherently better suited to machine interpretation, but the real killer is that we want to use multiple languages with one system -- that IS inherently very complicated. > > 2) legacy decisions an backward compatibility -- this is what makes it impossible to "simply" come up with a single bets way to to do it (or a few ways, anyway...) >>> Surely 65536 (2-byte) encodings are enough to express all characters in all the languages in the world, plus all the special characters we need. > That was once thought true -- but it turns out it's not -- darn! > > Though we do think that 4 bytes is plenty, and to some extent I'm confused as to why there isn't more use of UCS-4 -- sure it wastes a lot of space, but everything in computer (memory, cache, disk space, bandwidth) is orders of magnitudes larger/faster than it was when the Unicode discussion got started. But people don't like inefficiency and, in fact, as the newer py3 Unicode objects shows, we don't need to compromise on that. > >> Or is there really some fundamental reason why things can't be simpler? (Like, REALLY, REALLY simple?) > > > Well, if there were no legacy systems, it still couldn't be REALLY, REALLY simple (though UCS-4 is close), but there could be a LOT fewer ways to do things: programming languages would have their own internal representation (like Python does), and we would have a small handful of encodings optimized for various things: UCS-4 for easy of use, utf-8 for small disk storage (at least of Euro-centered text), and that would be that. But we do have the legacies to deal with. > > >> Apple, Microsoft, Sun, and a few other vendors jumped on the Unicode bandwagon early and committed themselves to the idea that 2 bytes is enough for everything. When the world discovered that wasn't true, we were stuck with a bunch of APIs that insisted on 2 bytes. Apple was able to partly make a break with that era, but Windows and Java are completely stuck with "Unicode means 16-bit" forever, which is why the whole world is stuck dealing with UTF-16 and surrogates forever. > > I've read many of the rants about UTF-16, but in fact, it's really not any worse than UTF-8 -- it's kind of a worst of both worlds -- not a set number of bytes per char, but a lot of wasted space (particularly for euro languages), but other than a bi tof wasted sapce, it's jsut like UTF-8. > > The Problem with is it not UTF-16 itself, but the fact that an really surprising number of APIs and programmers still think that it's UCS-2, rather than UTF-16 --painful. But this makes UTF-16 an attractive nuisance. 
When people use UTF-16, it's not because it happens to save 12% storage or 3% CPU over UTF-8 for some particular corpus, it's because it lets either them or some API they're dealing with pretend Unicode == UCS-2 so they can write buggy code quickly instead of proper code almost as quickly. If we'd never had UCS-2, and invented UTF-16 only now, I don't think anyone would use it; therefore, it would be better it we didn't have it. > And the fact, that AFAIK, ther really is not C++ Unicode type -- at least not one commonly used. I've got no problem with the fact that they defined UTF-8, UTF-16, and UTF-32 types instead of a Unicode type. In a language where strings are just pointers to arrays of characters, what would a Unicode type even mean? > Again -- legacy issues. >> And there are still people creating filenames on Latin-1 filesystems on older Linux and Unix boxes, > > This is the odd one to me -- reading about people's struggles with py3 an *nix filenames -- they argue that *nix is not broken -- and the world should just use char* for filenames and all is well! IN fact, maybe it would be easier to handle filenames as char* in some circumstances, but to argue that a system is not broken when you can't know the encoding of filenames, and there may be differently encoded filenames ON THE SAME Filesystem is insane! of course that is broken! It may be reality, and maybe Py3 needs to do a bit more to accommodate it, but it is broken. > > In fact, as much as I like to bash Windows, I've had NO problems with assuming filenames in Windows are UTF-16 (as long as we use the "wide char" APIs, sigh), and OS-X's specification of filenames as utf-8 works fine. So Linux really needs to catch up here! I _almost_ like OS X's approach here. If you've got files on a filesystem that aren't in UTF-8 (or that a filesystem driver can't transparently represent as UTF-8 because it stores some other static, per-fs, or per-file encoding, like NTFS's static UTF-16-LE), you see those files as UTF-8 anyway. That means some are mojibake. And maybe some either aren't accessible at all, or are accessible through names the filesystem invented that mean nothing. Too bad, here are some tools to repair your broken filesystem if that's a problem for you. The problem is, those tools are only available at way too high a level. If they just put the real bytes for an undecodable filename right in an extra DIRENTRY slot, anyone could easily write tools to help the user fix it that work at the normal filesystem level. ("rename --transcode-from=Latin-1 broken/*" would require adding 11 lines of trivial code to rename.pl, including the lines for processing the flag and dealing with post-transcoding collisions, if that information were available.) But Apple doesn't seem to care about making those tools writable at that level. Which means there's no chance in hell of GNU nor BSD following Apple's lead. So no one's ever going to solve it, we'll just close our eyes and hope that eventually it's as rare a problem as dealing with Atari or EBCDIC source code are today so we can declare it solved-enough-I-guess. >> UTF-16 is a historical accident, > > yeah, but it's not really a killer, either -- the problems come when people assume UTF-16 is UCS-2, just alike assuming that utf-8 is ascii (or any one-byte encoding...) > >> We really do need at least UTF-8 and UTF-32. But that's it. And I think that's simple enough. > > is UTF-32 the same as UCS-4 ? Always a bit confused by that. Technically, UTF-32 is a subset of UCS-4. 
UCS-4 is an encoding of 31-bit values in 4 octets by leaving the top bit 0. UTF-32 is an encoding of 21-bit values in 32 bits by leaving the top 11 bits 0. So if you're using them to transmit Unicode code points (the only use they're defined for), they're identical. > Oh, and endian issues -- *sigh* Yes, big-endian-only is order #13 on my plans if I ever became supreme dictator. (Unless my advisors want to argue about big vs. little; in that case, I give them 4 hours to debate it, then drop them all in the crocodile pit and flip a coin.) >> Aaaargh! Do I really have to learn all this mumbo-jumbo?! (Forgive me. :-) ) > > Some of it yes, I'm afraid so -- but probably not the surrogate pair stuff, etc. That stuff is pretty esoteric, and really needs to be understood by people writing APIs -- but for those of us that USE APIs, not so much. > > For instance, Python's handling Unicode file names almost always "just works" (as long as you stay in Python...) > > > -Chris > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri May 8 00:09:34 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 7 May 2015 18:09:34 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> Message-ID: This was a CPython idea after all, so I was assuming a C implementation, which means that new flags would have to be added to the string object to denote a string slice, etc. Like I said in another message: it's not that important to me though. I was just curious as to why CPython was designed so that string slicing is linear rather than constant time given that strings are constant. I'm getting the impression that the payoff is not worth the complexity. On Thu, May 7, 2015 at 4:12 PM, Skip Montanaro wrote: > I haven't seen anyone else mention it, so I will point out: > interoperability with C. In C, strings are NUL-terminated. PyStringObject > instances do (or used to) have NUL-terminated strings in them. According to > unicodeobject.h, that seems still to be the case: > > typedef struct { > /* There are 4 forms of Unicode strings: > ... > wchar_t *wstr; /* wchar_t representation ( > *null-terminated*) */ > } PyASCIIObject; > > and: > > typedef struct { > PyASCIIObject _base; > Py_ssize_t utf8_length; /* Number of bytes in utf8, *excluding > the* > * * terminating \0*. */ > char *utf8; /* UTF-8 representation (*null-terminated*) > */ > Py_ssize_t wstr_length; /* Number of code points in wstr, possible > * surrogates count as two code points. */ > } PyCompactUnicodeObject; > > The raw string is NUL-terminated, precisely so copying isn't required in > most cases before passing to C. Making s[1:-1] a view onto the underlying > string data in s would require you to copy the data when you want to pass > the view into C so you could tack on that NUL. 
That happens a lot, so it's > likely you wouldn't save much work, and result in a lot more churn in > Python's memory allocator. The only place you could avoid the copy is if > the view you are dealing with is a strict suffix of s. > > Skip > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri May 8 00:30:11 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 08 May 2015 07:30:11 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> Message-ID: <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Barker writes: > I've read many of the rants about UTF-16, but in fact, it's really > not any worse than UTF-8 Yes, it is. It's not ASCII compatible. You can safely use the usual libc string APIs on UTF-8 (except for any that might return only part of a string), but not on UTF-16 (nulls). This is a pretty big advantage for UTF-8 in practice. From rosuav at gmail.com Fri May 8 03:40:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 8 May 2015 11:40:09 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <20150507153123.GT5663@ando.pearwood.info> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <20150507153123.GT5663@ando.pearwood.info> Message-ID: On Fri, May 8, 2015 at 1:31 AM, Steven D'Aprano wrote: > But the other 90% of the complexity is inherent to human languages. For > example, you know what the lower case of "I" is, don't you? It's "i". > But not in Turkey, which has both a dotted and dotless version: > > I ? > ? i > > (Strangely, as far as I know, nobody has a dotted J or dotless j.) > > Consequently, Unicode has a bunch of complexity related to left-to-right > and right-to-left writing systems, accents, joiners, variant forms, and > other issues. But, unless you're actually writing in a language which > needs that, or writing a word-processor application, you can usually > ignore all of that and just treat them as "characters". Or a transliteration script. Imagine you have a whole lot of videos with text over them, and you'd like to transcribe that text into, well, a text file. 
It's pretty easy with Latin-based scripts; just come up with a notation for keying in diacriticals and the handful of other characters (slashed O for Norwegian, D with bar for Vietnamese, etc), then (optionally) perform an NFC transformation, and job's done. Cyrillic, Greek, Elder Futhark, and even IPA, can be handled fairly readily by means of simple reversible transliterations (д becomes d, d becomes д), with a handful of special cases (the Greek sigma has medial (σ) and final (ς) forms, both of which translate into the Latin letter 's'). Korean's hangul syllables are a slightly odd case, because they can be NFC composed from individual letters, but the decomposed forms take up more space on the page, which makes the NFC transformation mandatory: "hanguk" = "\u1112\u1161\u11ab\u1100\u116e\u11a8" = "\ud55c\uad6d" = "Korea" Aside from that, all the complexities are, as Steven says, inherent to human languages. Unicode isn't the problem; Unicode is just reflecting the fact that people write stuff differently. Python also isn't the problem; Python is one of my top two preferred languages for any sort of international work (the other being Pike, and for all the same reasons). >> Imageine if we were starting to design the 21st century from scratch, >> throwing away all the history? How would we go about it? > > Well, for starters I would insist on re-introducing thorn þ and eth ð > back into English :-) Sure, that'll unify us with ancient texts, and with modern Icelandic. But what about other languages with the same sound (IPA: θ)? European Spanish (though not Mexican Spanish) spells it as "z" - English could do the same, given that "s" is able to make the same sound "z" does in English. :) But seriously, the alphabetic languages aren't much of a problem. Unicode can cope with European languages easily. What I'd want to change is to use some form of phonetic system for Chinese and Japanese languages - a system in which the written form does its best to correspond to the spoken form, rather than the massively complex pictorial system now in use. At very least, I'd like to see an alternative written form used for names, in which they're composed of sounds; that way, there'd be a finite set of characters in use, and it'd be far easier for us to cope with them. (The problem of a collision would be no worse than already exists when names are said aloud. Having multiple characters pronounced the same way is a benefit only to the written form.) It's too late now, of course. ChrisA From steve at pearwood.info Fri May 8 03:54:56 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 8 May 2015 11:54:56 +1000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> Message-ID: <20150508015455.GV5663@ando.pearwood.info> On Thu, May 07, 2015 at 03:12:58PM -0500, Skip Montanaro wrote: > I haven't seen anyone else mention it, so I will point out: > interoperability with C. In C, strings are NUL-terminated. PyStringObject > instances do (or used to) have NUL-terminated strings in them. According to > unicodeobject.h, that seems still to be the case: How does that work?
Python strings can contain embedded NULs: s = u"abc\0def" -- Steve From rosuav at gmail.com Fri May 8 04:02:01 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 8 May 2015 12:02:01 +1000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <20150508015455.GV5663@ando.pearwood.info> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> <20150508015455.GV5663@ando.pearwood.info> Message-ID: On Fri, May 8, 2015 at 11:54 AM, Steven D'Aprano wrote: > On Thu, May 07, 2015 at 03:12:58PM -0500, Skip Montanaro wrote: >> I haven't seen anyone else mention it, so I will point out: >> interoperability with C. In C, strings are NUL-terminated. PyStringObject >> instances do (or used to) have NUL-terminated strings in them. According to >> unicodeobject.h, that seems still to be the case: > > How does that work? Python strings can contain embedded NULs: > > s = u"abc\0def" It's a pure convenience. It means that C string operations are guaranteed to terminate; they aren't guaranteed to process the whole string, but they won't run on into random memory. For a lot of cases, that's pretty handy. ChrisA From steve at pearwood.info Fri May 8 04:11:26 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 8 May 2015 12:11:26 +1000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: <20150508021126.GW5663@ando.pearwood.info> On Thu, May 07, 2015 at 02:26:06PM -0400, Terry Reedy wrote: > On 5/7/2015 11:46 AM, Steven D'Aprano wrote: > >On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote: > >>Since strings are constant, wouldn't it be much faster to implement string > >>slices as a view of other strings? > > > >String or list views would be *very* useful in situations like this: > > > ># Create a massive string > >s = "some string"*1000000 > >for c in s[1:]: > > process(c) > > Easily done without slicing, as discussed on python-list multiple times. For some definition of "easy". If all you want is to skip the first item, this is not too bad: > it = iter(s) > next(it) > for c in it: process(c) Skipping the *last* item, on the other hand? for c in s[:-1]: process(c) Yes, it can be done, but its even messier and uglier and a sequence view would make it neat and pretty: it = iter(s) prev = next(it) for c in it: process(prev) prev = c > for s[5555: 399999], use explicit indexes > for i in range(5555, 400000): process s[i] What, are we programming in Fortran, like some sort of Neanderthal? *grins* The point isn't that we cannot solve these problems without views, but that views would let us solve them in a clean Pythonic manner. -- Steve From ernest.moloko at gmail.com Fri May 8 06:35:44 2015 From: ernest.moloko at gmail.com (Lesego Moloko) Date: Fri, 8 May 2015 06:35:44 +0200 Subject: [Python-ideas] Problems with Python Message-ID: Dear all I am python novice and I am experiencing some problems with one of my programs that I converted from Matlab to Python. Is this an appropriate platform to ask such questions? 
I will be awaiting your answer before posting details of my problem, Thank you Regards Lesego Sent from Lesego's iPhone From rosuav at gmail.com Fri May 8 06:38:13 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 8 May 2015 14:38:13 +1000 Subject: [Python-ideas] Problems with Python In-Reply-To: References: Message-ID: On Fri, May 8, 2015 at 2:35 PM, Lesego Moloko wrote: > I am python novice and I am experiencing some problems with one of my programs that I converted from Matlab to Python. Is this an appropriate platform to ask such questions? > > I will be awaiting your answer before posting details of my problem, > This list is for discussion of ideas about future development of the Python language itself. The best place to ask would be python-list at python.org, which is two-way gatewayed with the comp.lang.python newsgroup; you'll find lots of people there who are happy to help out! ChrisA From ben+python at benfinney.id.au Fri May 8 07:15:43 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 08 May 2015 15:15:43 +1000 Subject: [Python-ideas] Problems with Python References: Message-ID: <857fsjd4zk.fsf@benfinney.id.au> Lesego Moloko writes: > I am python novice and I am experiencing some problems with one of my > programs that I converted from Matlab to Python. Is this an > appropriate platform to ask such questions? If you're looking for a free-for-all discussion forum for Python programmers, see . If you're looking for a forum dedicated to teaching Python newcomers, see . These and more Python community forums are documented at . Thanks for asking! -- \ ?The whole area of [treating source code as intellectual | `\ property] is almost assuring a customer that you are not going | _o__) to do any innovation in the future.? ?Gary Barnett | Ben Finney From rustompmody at gmail.com Fri May 8 07:19:50 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Fri, 8 May 2015 10:49:50 +0530 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: On Wed, May 6, 2015 at 6:45 PM, Ivan Levkivskyi wrote: > Dear all, > > The matrix multiplication operator @ is going to be introduced in Python > 3.5 and I am thinking about the following idea: > > The semantics of matrix multiplication is the composition of the > corresponding linear transformations. > A linear transformation is a particular example of a more general concept > - functions. > The latter are frequently composed with ("wrap") each other. For example: > > plot(real(sqrt(data))) > > However, it is not very readable in case of many wrapping layers. > Therefore, it could be useful to employ > the matrix multiplication operator @ for indication of function > composition. This could be done by such (simplified) decorator: > > class composable: > > def __init__(self, func): > self.func = func > > def __call__(self, arg): > return self.func(arg) > > def __matmul__(self, other): > def composition(*args, **kwargs): > return self.func(other(*args, **kwargs)) > return composable(composition) > > I think using such decorator with functions that are going to be deeply > wrapped > could improve readability. > You could compare (note that only the outermost function should be > decorated): > > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) > (data_array) > > I think the latter is more readable, also compare > > def sunique(lst): > return sorted(list(set(lst))) > > vs. 
> > sunique = sorted @ list @ set > I would like to suggest that if composition is in fact added to python its order is 'corrected' ie in math there are two alternative definitions of composition [1] f o g = ? x ? g(f(x)) [2] f o g = ? x ? f(g(x)) [2] is more common but [1] is also used And IMHO [1] is much better for left-to-right reading so your example becomes sunique = set @ list @ sorted which reads as smoothly as a classic Unix pipeline: "Unnamed parameter input to set; output inputted to list; output inputted to sort" -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Fri May 8 09:03:28 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Fri, 8 May 2015 10:03:28 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> Message-ID: <554C5FC0.1070106@aalto.fi> On 8.5.2015 8:19, Rustom Mody wrote: > On Wed, May 6, 2015 at 6:45 PM, Ivan Levkivskyi > wrote: > > > def sunique(lst): > return sorted(list(set(lst))) > > vs. > > sunique = sorted @ list @ set > > > I would like to suggest that if composition is in fact added to python > its order is 'corrected' > ie in math there are two alternative definitions of composition > > [1] f o g = ? x ? g(f(x)) > [2] f o g = ? x ? f(g(x)) > > [2] is more common but [1] is also used > > And IMHO [1] is much better for left-to-right reading so your example > becomes > sunique = set @ list @ sorted > which reads as smoothly as a classic Unix pipeline: > > "Unnamed parameter input to set; output inputted to list; output > inputted to sort" > > While both versions make sense, [2] is the one that resembles the chaining of linear operators or matrices, since column vectors are the convention. For the left-to-right pipeline version, some other operator might be more appropriate. Also, it would then be more clear to also feed x into the pipeline from the left, instead of putting (x) on the right like in a normal function call. As a random example, (root @ mean @ square)(x) would produce the right order for rms when using [2]. -- Koos -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri May 8 09:59:01 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 May 2015 09:59:01 +0200 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com> Message-ID: <554C6CC5.8010605@egenix.com> On 08.05.2015 00:09, Neil Girdhar wrote: > This was a CPython idea after all, so I was assuming a C implementation, > which means that new flags would have to be added to the string object to > denote a string slice, etc. > > Like I said in another message: it's not that important to me though. I > was just curious as to why CPython was designed so that string slicing is > linear rather than constant time given that strings are constant. This was considered very early on in the Unicode type design, but dropped since the problem with such slices is that you have to keep a reference to the original string around which keeps this alive, even if you just use a slice of a few chars from it. 
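For illustration, here is a minimal pure-Python sketch of that lifetime problem (the StrView name and layout are made up just for this example; a C-level implementation would look nothing like this, but the retention issue is the same):

    import sys

    class StrView:
        """Toy slice-view: keeps the whole base string alive."""
        def __init__(self, base, start, stop):
            self._base = base                  # this reference pins the base string
            self._start, self._stop = start, stop
        def __str__(self):
            return self._base[self._start:self._stop]

    s = "some string" * 1000000
    t = StrView(s, 1, 2)                       # a "view" of a single character
    del s                                      # ~11 MB stays alive through t._base
    print(sys.getsizeof(t._base))              # roughly the size of the original string
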
There are some situations where such a slicing mechanism would be nice to have, but in most of those you can simply work on the original string using an offset index. Indeed, working with index tuples into the original string is often a better strategy. You can see this used in mxTextTools: http://www.egenix.com/products/python/mxBase/mxTextTools/ to create high performance text parsing and manipulation tools. > I'm getting the impression that the payoff is not worth the complexity. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From stefan at bytereef.org Fri May 8 11:09:21 2015 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 8 May 2015 09:09:21 +0000 (UTC) Subject: [Python-ideas] discouraging direct use of the C-API References: Message-ID: Paul Moore writes: > > https://mail.python.org/pipermail/python-dev/2013-December/130772.html > > > > > > CFFI is very nice (superb API), but not for high performance use cases. > > I'm guessing that benchmark used cffi in the "ABI level" dynamic form > that matches ctypes. Did you try the cffi "API level" form that > creates a C extension? I'd be curious as to where that falls in > performance. ffi.verify() is only about 10% faster both in pypy and cpython, so it doesn't change much in the posted figures. Stefan Krah From stefan_ml at behnel.de Fri May 8 12:17:40 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 May 2015 12:17:40 +0200 Subject: [Python-ideas] discouraging direct use of the C-API In-Reply-To: <20150507123239.GA1768@k3> References: <20150507123239.GA1768@k3> Message-ID: David Wilson schrieb am 07.05.2015 um 14:32: > On Wed, May 06, 2015 at 10:23:09AM -0600, Eric Snow wrote: >> put a big red note at the top of every page of the C-API docs that >> encourages folks to either use CFFI or Cython. > > One of CPython's traditional strongholds is its use as an embedded > language. I've worked on a bunch of commercial projects using it in this > way, often specifically for improved performance/access to interpreter > internals, and this is not to mention the numerous free software > projects doing similar: gdb, uwsgi, mod_python, Freeswitch, and so on. Ah, yes, there is a big wall in the CPython docs between "extending" and "embedding" that gives users the impression that they are really different concepts. But that's just marketing. They are not. The only difference is that in one case, it's the CPython interpreter that starts up and then calls into native user code, and in the other case, it's user code that starts up and then launches a CPython interpreter. From the moment on where both the user code and the CPython interpreter are running, there is exactly zero difference between the two, and you can use the same tools for interfacing native code with Python code in both cases. 
What this means is that even in an embedding scenario, user code will typically only need to call a tiny set of C-API functions to start up and shut down the interpreter, and then leave all the rest, all the interesting stuff, to tools that do it better. Stefan From storchaka at gmail.com Fri May 8 13:18:07 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 08 May 2015 14:18:07 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <20150505172845.GF5663@ando.pearwood.info> References: <20150505172845.GF5663@ando.pearwood.info> Message-ID: On 05.05.15 20:28, Steven D'Aprano wrote: > On Mon, May 04, 2015 at 11:15:47AM +0300, Serhiy Storchaka wrote: >> Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but >> Python allows them in Unicode strings for different purposes. >> >> 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain >> surrogate characters. This data can came from other programs, including >> Python 2. > > Can you give a simple example of a Python 2 program that provides output > that Python 3 will read as surrogates? f.write(u'?'[:1].encode('utf-8')) json.dump(f, u'?'[:1]) pickle.dump(f, u'?'[:1]) >> 2) To represent undecodable bytes in ASCII-compatible encoding with the >> "surrogateescape" error handlers. >> >> So surrogate characters can be obtained from "surrogateescape" or >> "surrogatepass" error handlers or created manually with chr() or %c. >> >> Some encodings (UTF-7, unicode-escape) also allows surrogate characters. > > Also UTF-16, and possible others. > > I'm not entirely sure, but I think that this is a mistake, if not a > bug. I think that *no* UTF encoding should allow lone surrogates to > escape through encoding. But I not entirely sure, so I won't argue that > now -- besides, it's irrelevant to the proposal. UTF-7 is specified by RFC 2152 and should encode any UCS-2 character. unicode-escape and raw-unicode-escape should encode any Python string. This can't be changed. UTF-8, UTF-16, and UTF-32 don't encode surrogates by default in current Python 3, but encode surrogates in Python 2. The "surrogatepass" error handler was added for compatibility with Python 2. >> But on output the surrogate characters can cause fail. > > What do you mean by "on output"? Do you mean when printing? Printing, writing to text file, passing to C extension, that makes encoding internally, etc. >> In issue18814 proposed several functions to work with surrogate and >> astral characters. All these functions takes a string and returns a string. > > I like the idea of having better surrogate and astral character > handling, but I don't think I like your suggested API of using functions > for this. I think this is better handled as str-to-str codecs. > > Unfortunately, there is still no concensus of the much-debated return of > str-to-str and byte-to-byte codecs via the str.encode and byte.decode > methods. At one point people were talking about adding a separate method > (transform?) to handle them, but that seems to have been forgotten. > Fortunately the codecs module handles them just fine: > > py> codecs.encode("Hello world", "rot-13") > 'Uryyb jbeyq' > > > I propose, instead of your function/method rehandle_surrogatepass(), we > add a pair of str-to-str codecs: > > codecs.encode(mystring, 'remove_surrogates', errors='strict') > codecs.encode(mystring, 'remove_astrals', errors='strict') > > For the first one, if the string has no surrogates, it returns the > string unchanged. 
If it contains any surrogates, the error handler runs > in the usual fashion. > > The second is exactly the same, except it checks for astral characters. > > For the avoidance of doubt: > > * surrogates are code points in the range U+D800 to U+DFFF inclusive; > > * astrals are characters from the Supplementary Multilingual Planes, > that is code points U+10000 and above. > > > Advantage of using codecs: > > - there's no arguments about where to put it (is it a str method? a > function? in the string module? some other module? where?) > > - we can use the usual codec machinery, rather than duplicate it; > > - people already understand that codecs and error handles go together; > > Disadvantage: > > - have to use codec.encode instead of str.encode. > > > It is slightly sad that there is still no entirely obvious way to call > str-to-str codecs from the encode method, but since this is a fairly > advanced and unusual use-case, I don't think it is a problem that we > have to use the codecs module. Disadvantage of using codecs is that "decoding" operation doesn't make sense. If use one global registry for named transformation, it should be separate registry and separate method (str.transform) for one-way str-to-str transformations. In additional to above transformations of surrogates, it can contain transformations "upper", "lower", "title". But this is separate issue. From storchaka at gmail.com Fri May 8 13:54:37 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 08 May 2015 14:54:37 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 05.05.15 11:23, Stephen J. Turnbull wrote: > Serhiy Storchaka writes: > > > Use cases include programs that use tkinter (common build of Tcl/Tk > > don't accept non-BMP characters), email or wsgiref. > > So, consider Tcl/Tk. If you use it for input, no problem, it *can't* > produce non-BMP characters. So you're using it for output. If > knowing that your design involves tkinter, you deduce you must not > accept non-BMP characters on input, where's your problem? With Tcl/Tk all is not so easy. The main issue is with translating from Tcl to Python. Tcl uses at least two representations for strings (UCS-2 and modified UTF-8, and Latin1 in some cases), both can contain invalid codes and implicit conversion from one to other is lossy. Currently there is a way to crash IDLE (and may be other Tkinter applications) by just pasting mailformed data from clipboard. I don't think that my proposal will help Tkinter a lot, but there are requests for such features, and perhaps these functions could help to solve or workaround at least some of Tkinter issues. > And ... you looked twice at your proposal? You have basically > reproduced the codec error handling API for .decode and .encode in a > bunch to str2str "rehandle" functions. Yes, this is the main advantage of proposed functions. They reuse existing error handlers and are extensible by writing new error handlers. > In other words, you need to > know as much to use "rehandle_*" properly as you do to use .decode and > .encode. I do not see a win for the programmer who is mostly innocent > of encoding knowledge. Is it a problem? These functions are for experienced users. Perhaps mostly for authors of libraries and frameworks. > If we apply these rehandle_* thumbs to the holes in the I18N dike, > it's just going to spring more leaks elsewhere. 
There are a lot of butteries included in Python. They can explode if use them incorrectly. Sorry, I don't understand your frustration. From jonathan at slenders.be Fri May 8 14:16:41 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Fri, 8 May 2015 14:16:41 +0200 Subject: [Python-ideas] What is happening with array.array('u') in Python 4? Message-ID: Hi all, What will happen to array.array('u') in Python 4? It is deprecated right now. I remember reading about mutable strings somewhere, but I forgot, and I can't find the discussion. In any case, I need to have a mutable character array, for efficient manipulations. (Not a byte array.) And I need to be able to use the "re" module to search through it. array.array('u') works great in Python 3. Will we still have something like this in Python 4? Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri May 8 14:28:33 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 8 May 2015 22:28:33 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <20150505172845.GF5663@ando.pearwood.info> Message-ID: On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka wrote: >> Can you give a simple example of a Python 2 program that provides output >> that Python 3 will read as surrogates? > > > f.write(u'?'[:1].encode('utf-8')) > json.dump(f, u'?'[:1]) > pickle.dump(f, u'?'[:1]) Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're talking only about the (buggy) narrow builds, in which case you don't need to use string slicing at all. But in that case, all you're doing is using a single "\uNNNN" escape code to create an unmatched surrogate. ChrisA From storchaka at gmail.com Fri May 8 14:32:50 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 08 May 2015 15:32:50 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <20150505172845.GF5663@ando.pearwood.info> Message-ID: On 08.05.15 15:28, Chris Angelico wrote: > On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka wrote: >>> Can you give a simple example of a Python 2 program that provides output >>> that Python 3 will read as surrogates? >> >> >> f.write(u'?'[:1].encode('utf-8')) >> json.dump(f, u'?'[:1]) >> pickle.dump(f, u'?'[:1]) > > Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're > talking only about the (buggy) narrow builds, in which case you don't > need to use string slicing at all. But in that case, all you're doing > is using a single "\uNNNN" escape code to create an unmatched > surrogate. I want to say that that it is easy to unintentionally get a data with encoded lone surrogate in Python 2. From rosuav at gmail.com Fri May 8 14:41:01 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 8 May 2015 22:41:01 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <20150505172845.GF5663@ando.pearwood.info> Message-ID: On Fri, May 8, 2015 at 10:32 PM, Serhiy Storchaka wrote: > On 08.05.15 15:28, Chris Angelico wrote: >> >> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka >> wrote: >>>> >>>> Can you give a simple example of a Python 2 program that provides output >>>> that Python 3 will read as surrogates? >>> >>> >>> >>> f.write(u'?'[:1].encode('utf-8')) >>> json.dump(f, u'?'[:1]) >>> pickle.dump(f, u'?'[:1]) >> >> >> Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're >> talking only about the (buggy) narrow builds, in which case you don't >> need to use string slicing at all. 
But in that case, all you're doing >> is using a single "\uNNNN" escape code to create an unmatched >> surrogate. > > > I want to say that that it is easy to unintentionally get a data with > encoded lone surrogate in Python 2. Only on Windows, where the standard builds are narrow ones. (Also, how hard and how bad would it be to change that, and have all python.org installers produce wide builds?) ChrisA From stefan_ml at behnel.de Fri May 8 14:50:36 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 May 2015 14:50:36 +0200 Subject: [Python-ideas] What is happening with array.array('u') in Python 4? In-Reply-To: References: Message-ID: Jonathan Slenders schrieb am 08.05.2015 um 14:16: > What will happen to array.array('u') in Python 4? It is deprecated right > now. > I remember reading about mutable strings somewhere, but I forgot, and I > can't find the discussion. > > In any case, I need to have a mutable character array, for efficient > manipulations. (Not a byte array.) > And I need to be able to use the "re" module to search through it. > array.array('u') works great in Python 3. Well, for some value of "great" and "works". The problems are that 1) 'u' has a platform dependent size of 16 or 32 bits and 2) it does not match the internal representation of unicode strings. It will thus use surrogate pairs on some platforms and not on others, and converting between Unicode strings and arrays requires an encoding/decoding step. And it also does not seem like the "re" module currently supports searching in unicode arrays (everything else would have been very surprising). ISTM that your best bet is currently to look for a suitable module on PyPI that implements mutable character arrays. I'm sure you're not the only one who needs something like that. The usual suspect would be NumPy, but there may be smaller and simpler tools available. Stefan From mal at egenix.com Fri May 8 15:00:02 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 May 2015 15:00:02 +0200 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <20150505172845.GF5663@ando.pearwood.info> Message-ID: <554CB352.9090208@egenix.com> On 08.05.2015 14:41, Chris Angelico wrote: > On Fri, May 8, 2015 at 10:32 PM, Serhiy Storchaka wrote: >> On 08.05.15 15:28, Chris Angelico wrote: >>> >>> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka >>> wrote: >>>>> >>>>> Can you give a simple example of a Python 2 program that provides output >>>>> that Python 3 will read as surrogates? >>>> >>>> >>>> >>>> f.write(u'?'[:1].encode('utf-8')) >>>> json.dump(f, u'?'[:1]) >>>> pickle.dump(f, u'?'[:1]) >>> >>> >>> Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're >>> talking only about the (buggy) narrow builds, in which case you don't >>> need to use string slicing at all. But in that case, all you're doing >>> is using a single "\uNNNN" escape code to create an unmatched >>> surrogate. >> >> >> I want to say that that it is easy to unintentionally get a data with >> encoded lone surrogate in Python 2. > > Only on Windows, where the standard builds are narrow ones. (Also, how > hard and how bad would it be to change that, and have all python.org > installers produce wide builds?) Not only on Windows. The default Python 2 build is a narrow build. Most Unix distributions explicitly switch on the UCS4 support, so you usually get UCS4 versions on Unix, but the default still is UCS2. 
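A quick way to check which build you are on, and to see why lone surrogates appear so easily there (the specific code point is just an example):

    import sys
    print(hex(sys.maxunicode))   # 0xffff on a narrow build, 0x10ffff on a wide one

    u = u'\U00010000'            # any non-BMP character will do
    print(len(u))                # 1 on a wide build, 2 on a narrow build
    print(repr(u[:1]))           # narrow build: a lone surrogate, u'\ud800'
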
In Python 3.3+ this doesn't matter anymore, since Python selects the storage type based on the string content, so you get UCS2/UCS4 as needed on all platforms. All that said, it's still possible to work with lone surrogates in Python, so Serhiy's example still applies in concept. And slicing surrogates is only one way to break Unicode strings. The many combining characters and annotations offer plenty more :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ron3200 at gmail.com Fri May 8 18:05:54 2015 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 08 May 2015 12:05:54 -0400 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: On 05/08/2015 01:19 AM, Rustom Mody wrote: > On Wed, May 6, 2015 at 6:45 PM, Ivan Levkivskyi > > wrote: > > Dear all, > > The matrix multiplication operator @ is going to be introduced in > Python 3.5 and I am thinking about the following idea: > > The semantics of matrix multiplication is the composition of the > corresponding linear transformations. > A linear transformation is a particular example of a more general > concept - functions. > The latter are frequently composed with ("wrap") each other. For example: > > plot(real(sqrt(data))) > > However, it is not very readable in case of many wrapping layers. > Therefore, it could be useful to employ > the matrix multiplication operator @ for indication of function > composition. This could be done by such (simplified) decorator: > > class composable: > > def __init__(self, func): > self.func = func > > def __call__(self, arg): > return self.func(arg) > > def __matmul__(self, other): > def composition(*args, **kwargs): > return self.func(other(*args, **kwargs)) > return composable(composition) > > I think using such decorator with functions that are going to be deeply > wrapped > could improve readability. > You could compare (note that only the outermost function should be > decorated): > > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) > (data_array) > > I think the latter is more readable, also compare > > def sunique(lst): > return sorted(list(set(lst))) > > vs. > > sunique = sorted @ list @ set > > > I would like to suggest that if composition is in fact added to python its > order is 'corrected' > ie in math there are two alternative definitions of composition > > [1] f o g = ? x ? g(f(x)) > [2] f o g = ? x ? f(g(x)) > > [2] is more common but [1] is also used > > And IMHO [1] is much better for left-to-right reading so your example becomes > sunique = set @ list @ sorted > which reads as smoothly as a classic Unix pipeline: > > "Unnamed parameter input to set; output inputted to list; output inputted > to sort" Here's how I would do it as a function. >>> def apply(data, *fns): ... for f in fns: ... data = f(data) ... return data ... 
>>> apply((8, 9, 8, 4, 5), set, list, sorted) [4, 5, 8, 9] This is a variation of reduce except, it's applying many functions to a single data item rather than applying a single function to many data items. result = apply(data, f, g, e) Which would be the same as... result = e(g(f(data))) Having the order be in alignment with object methods calls is a consistency which can help with learning how to use it. I don't think special syntax has an advantage over a function for this. It may even be a disadvantage. The problem with special syntax is it can't be represented as data easily. That would be counter to a functional style of programming, which seems at odds with the desired feature. (IMO) fns = (set, list, sorted) result = apply(data, *fns) Also having this right next to reduce in functools would work nicely, both for usability and documentation. Cheers, Ron From rosuav at gmail.com Fri May 8 18:13:47 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 9 May 2015 02:13:47 +1000 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: On Sat, May 9, 2015 at 2:05 AM, Ron Adam wrote: > I don't think special syntax has an advantage over a function for this. It > may even be a disadvantage. The problem with special syntax is it can't be > represented as data easily. That would be counter to a functional style of > programming, which seems at odds with the desired feature. (IMO) > > fns = (set, list, sorted) > result = apply(data, *fns) There's no problem with representing it as data; just like elsewhere in Python, you can break out a subexpression and give it a name. (a @ b @ c)(x) # <=> f = (a @ b @ c) f(x) It's no different from method calls: sys.stdout.write("Hello, world!\n") # <=> write = sys.stdout.write write("Hello, world!\n") ChrisA From ron3200 at gmail.com Fri May 8 19:10:49 2015 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 08 May 2015 13:10:49 -0400 Subject: [Python-ideas] Function Composition was:Re: (no subject) In-Reply-To: References: Message-ID: On 05/08/2015 12:13 PM, Chris Angelico wrote: > On Sat, May 9, 2015 at 2:05 AM, Ron Adam wrote: >> >I don't think special syntax has an advantage over a function for this. It >> >may even be a disadvantage. The problem with special syntax is it can't be >> >represented as data easily. That would be counter to a functional style of >> >programming, which seems at odds with the desired feature. (IMO) >> > >> > fns = (set, list, sorted) >> > result = apply(data, *fns) > There's no problem with representing it as data; just like elsewhere > in Python, you can break out a subexpression and give it a name. > > (a @ b @ c)(x) > # <=> > f = (a @ b @ c) > f(x) > > It's no different from method calls: > > sys.stdout.write("Hello, world!\n") > # <=> > write = sys.stdout.write > write("Hello, world!\n") That's good. What advantage over using a function does the syntax have? So far it looks fairly equal. I think a function would be much simpler to implement, document, and maintain. (Unless there are clear advantages to the syntax has that a function doesn't.) Cheers, Ron From abarnert at yahoo.com Fri May 8 21:45:10 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 8 May 2015 12:45:10 -0700 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID: On May 8, 2015, at 09:13, Chris Angelico wrote: > >> On Sat, May 9, 2015 at 2:05 AM, Ron Adam wrote: >> I don't think special syntax has an advantage over a function for this. It >> may even be a disadvantage. 
The problem with special syntax is it can't be >> represented as data easily. That would be counter to a functional style of >> programming, which seems at odds with the desired feature. (IMO) >> >> fns = (set, list, sorted) >> result = apply(data, *fns) > > There's no problem with representing it as data; just like elsewhere > in Python, you can break out a subexpression and give it a name. > > (a @ b @ c)(x) > # <=> > f = (a @ b @ c) > f(x) Except that a@ and @b aren't subexpressions in Python. With a function, you can write them with partial; with an operator... Well, you can write the first with the bound dunder method and the second with partial and the unbound dunder method, but I don't think anyone finds type(b).__mmul__ as readable as @ (not to mention that the former is monomorphic, while the latter, like almost everything in Python, is duck typed, as it should be). > It's no different from method calls: > > sys.stdout.write("Hello, world!\n") > # <=> > write = sys.stdout.write > write("Hello, world!\n") There are other things that are a lot easier to do with a function than an operator, which aren't a problem for methods. For example, you can unpack arguments. How would you write apply(value, *funcs) or compose(*funcs) with an operator without calling reduce on type(funcs[0]).__mmul__) or similar? Of course you can always get around any of these problems by wrapping any operator expression up in a function with lambda--but if that really were good enough in practice, we wouldn't have comprehensions (after all, you can do the same thing with map just by wrapping up the expression with lambda--but nobody ever does that except people trying to use Python as Lisp, and nobody else wants to read their code). This is (a part of) what I meant when I said that just posting Haskell's compose operator without all of the other language features and style idioms that make it so useful won't necessarily give us a useful Python feature. In Haskell, a@ and @b are subexpressions that can be given names and passed around, you can turn @ into a function just by wrapping it in parens rather than having to pull the dunder method off a type, etc. That solves most of these problems automatically, but we don't want to import all of those features into Python. (On the other hand, Haskell only "solves" the *args unpacking issue by just not allowing variable or even optional parameters--I'm not arguing that Haskell is "better" here, just that a compose operator fits into Haskell better than into Python.) From chris.barker at noaa.gov Fri May 8 22:46:20 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 8 May 2015 13:46:20 -0700 Subject: [Python-ideas] What is happening with array.array('u') in Python 4? In-Reply-To: References: Message-ID: On Fri, May 8, 2015 at 5:50 AM, Stefan Behnel wrote: > ISTM that your best bet is currently to look for a suitable module on PyPI > that implements mutable character arrays. I'm sure you're not the only one > who needs something like that. The usual suspect would be NumPy, but there > may be smaller and simpler tools available. Numpy does have mutable character arrays -- and the Unicode version uses 4bytes per char, regardless of platform (and so should array.array!) But I don't think you get much of any of the features of strings, and I doubt that the re module would work with it. A "real" mutable string type might be pretty nice to have , but I think it would be pretty hard to d to get it to do everything a string can do. 
(or maybe not -- I suppose you could cut and paste the regular string cdce, and simply add the mutable part....) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Fri May 8 23:45:35 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 08 May 2015 17:45:35 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <554CB352.9090208@egenix.com> References: <20150505172845.GF5663@ando.pearwood.info> <554CB352.9090208@egenix.com> Message-ID: <1431121535.657430.264648673.5D917DDF@webmail.messagingengine.com> On Fri, May 8, 2015, at 09:00, M.-A. Lemburg wrote: > Not only on Windows. The default Python 2 build is a narrow build. > > Most Unix distributions explicitly switch on the UCS4 support, > so you usually get UCS4 versions on Unix, but the default still > is UCS2. I had always assumed that the build system selects the default build based on the size of wchar_t (2 on windows, 4 on most unix systems). From chris.barker at noaa.gov Fri May 8 22:39:57 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 8 May 2015 13:39:57 -0700 Subject: [Python-ideas] Problems with Python In-Reply-To: <857fsjd4zk.fsf@benfinney.id.au> References: <857fsjd4zk.fsf@benfinney.id.au> Message-ID: and if you are converting from MATLAB, then you probably are, and certainly should, be using numpy, so the numpy list is good start: http://www.numpy.org/ http://mail.scipy.org/mailman/listinfo/numpy-discussion -Chris On Thu, May 7, 2015 at 10:15 PM, Ben Finney wrote: > Lesego Moloko > writes: > > > I am python novice and I am experiencing some problems with one of my > > programs that I converted from Matlab to Python. Is this an > > appropriate platform to ask such questions? > > If you're looking for a free-for-all discussion forum for Python > programmers, see . > > If you're looking for a forum dedicated to teaching Python newcomers, > see . > > These and more Python community forums are documented at > . > > Thanks for asking! > > -- > \ ?The whole area of [treating source code as intellectual | > `\ property] is almost assuring a customer that you are not going | > _o__) to do any innovation in the future.? ?Gary Barnett | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat May 9 04:58:35 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 09 May 2015 11:58:35 +0900 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <554C5FC0.1070106@aalto.fi> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> Message-ID: <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Koos Zevenhoven writes: > As a random example, (root @ mean @ square)(x) would produce the right > order for rms when using [2]. Hardly interesting. :-) The result is an exception, as root and square are conceptually scalar-to-scalar, while mean is sequence-to-scalar. I suppose you could write (root @ mean @ (map square)) (xs), which seems to support your argument. But will all such issues and solutions give the same support? This kind of thing is a conceptual problem that has to be discussed pretty thoroughly (presumably based on experience with implementations) before discussion of order can be conclusive. From koos.zevenhoven at aalto.fi Sat May 9 06:00:52 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Sat, 9 May 2015 07:00:52 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <554D8674.4000004@aalto.fi> On 9.5.2015 5:58, Stephen J. Turnbull wrote: > Koos Zevenhoven writes: > > > As a random example, (root @ mean @ square)(x) would produce the right > > order for rms when using [2]. > > Hardly interesting. :-) The result is an exception, as root and square > are conceptually scalar-to-scalar, while mean is sequence-to-scalar. > > I suppose you could write (root @ mean @ (map square)) (xs), which > seems to support your argument. But will all such issues and > solutions give the same support? This kind of thing is a conceptual > problem that has to be discussed pretty thoroughly (presumably based > on experience with implementations) before discussion of order can be > conclusive. > Well, you're wrong :-) Working code: from numpy import sqrt, mean, square rms = sqrt(mean(square(x))) The point is that people have previously described sqrt(mean(square(x))) as root-mean-squared x, not squared-mean-root x. But yes, as I said, it's just one example. -- Koos From Nikolaus at rath.org Sat May 9 06:04:26 2015 From: Nikolaus at rath.org (Nikolaus Rath) Date: Fri, 08 May 2015 21:04:26 -0700 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <20150507154621.GU5663@ando.pearwood.info> (Steven D'Aprano's message of "Fri, 8 May 2015 01:46:21 +1000") References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> Message-ID: <87mw1es8fp.fsf@vostro.rath.org> On May 07 2015, Steven D'Aprano wrote: > But a view would be harmful in this situation: > > s = "some string"*1000000 > t = s[1:2] # a view maskerading as a new string > del s > > Now we keep the entire string alive long after it is needed. > > How would you solve the first problem without introducing the second? Keep track of the reference count of the underlying string, and if it goes down to one, turn the view into a copy and remove the sliced original? Best, -Nikolaus -- GPG encrypted emails preferred. 
Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F ?Time flies like an arrow, fruit flies like a Banana.? From rosuav at gmail.com Sat May 9 07:01:42 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 9 May 2015 15:01:42 +1000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <87mw1es8fp.fsf@vostro.rath.org> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org> Message-ID: On Sat, May 9, 2015 at 2:04 PM, Nikolaus Rath wrote: > On May 07 2015, Steven D'Aprano wrote: >> But a view would be harmful in this situation: >> >> s = "some string"*1000000 >> t = s[1:2] # a view maskerading as a new string >> del s >> >> Now we keep the entire string alive long after it is needed. >> >> How would you solve the first problem without introducing the second? > > Keep track of the reference count of the underlying string, and if it > goes down to one, turn the view into a copy and remove the sliced > original? > T From rosuav at gmail.com Sat May 9 07:06:07 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 9 May 2015 15:06:07 +1000 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <87mw1es8fp.fsf@vostro.rath.org> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org> Message-ID: On Sat, May 9, 2015 at 2:04 PM, Nikolaus Rath wrote: > On May 07 2015, Steven D'Aprano wrote: >> But a view would be harmful in this situation: >> >> s = "some string"*1000000 >> t = s[1:2] # a view maskerading as a new string >> del s >> >> Now we keep the entire string alive long after it is needed. >> >> How would you solve the first problem without introducing the second? > > Keep track of the reference count of the underlying string, and if it > goes down to one, turn the view into a copy and remove the sliced > original? Oops, mis-sent (stupid touchpad on this new laptop). Trying again. There might be multiple views, so a hard-coded refcount-of-one check wouldn't work. The view would need to keep a weak reference to its underlying string - but not in the sense of the Python weakref module, which doesn't seem to have any notion of "about to be garbage collected", but only "has now been garbage collected". Notably, by the time a callback gets called, it's too late to retrieve information from the callback itself. A modified form of weakref could do it, though; with the understanding that the referents are immutable, and premature transform from view to coalesced slice has no consequence beyond performance, this could be done. Ideally, it'd be an entirely invisible optimization. ChrisA From mistersheik at gmail.com Sat May 9 07:23:15 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 9 May 2015 01:23:15 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org> Message-ID: Exactly. You know, it might be nice to have a recipe that creates a view to any abc.Sequence for when you know that the underlying sequence won't change (or don't care). Something like: class View: ... some_view = View("some string", slice(2, 5)) some_view[0: 2] "me" etc. Also a MutableView class could be used for abc.MutableSequences. 
Best, Neil On Sat, May 9, 2015 at 1:06 AM, Chris Angelico wrote: > On Sat, May 9, 2015 at 2:04 PM, Nikolaus Rath wrote: > > On May 07 2015, Steven D'Aprano owrrOrA at public.gmane.org> wrote: > >> But a view would be harmful in this situation: > >> > >> s = "some string"*1000000 > >> t = s[1:2] # a view maskerading as a new string > >> del s > >> > >> Now we keep the entire string alive long after it is needed. > >> > >> How would you solve the first problem without introducing the second? > > > > Keep track of the reference count of the underlying string, and if it > > goes down to one, turn the view into a copy and remove the sliced > > original? > > Oops, mis-sent (stupid touchpad on this new laptop). Trying again. > > There might be multiple views, so a hard-coded refcount-of-one check > wouldn't work. The view would need to keep a weak reference to its > underlying string - but not in the sense of the Python weakref module, > which doesn't seem to have any notion of "about to be garbage > collected", but only "has now been garbage collected". Notably, by the > time a callback gets called, it's too late to retrieve information > from the callback itself. A modified form of weakref could do it, > though; with the understanding that the referents are immutable, and > premature transform from view to coalesced slice has no consequence > beyond performance, this could be done. > > Ideally, it'd be an entirely invisible optimization. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat May 9 08:40:48 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 09 May 2015 15:40:48 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87383645jj.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > On 05.05.15 11:23, Stephen J. Turnbull wrote: > > Serhiy Storchaka writes: > > > > > Use cases include programs that use tkinter (common build of Tcl/Tk > > > don't accept non-BMP characters), email or wsgiref. > > > > So, consider Tcl/Tk. If you use it for input, no problem, it *can't* > > produce non-BMP characters. So you're using it for output. If > > knowing that your design involves tkinter, you deduce you must not > > accept non-BMP characters on input, where's your problem? > > With Tcl/Tk all is not so easy. I didn't claim *all* was easy; IME Tcl is just easy to break, and not only in its Unicode handling. But dealing with the problem you mentioned at the interface between Python and Tcl/Tk can be done this way. > The main issue is with translating from Tcl to Python. Tcl uses at > least two representations for strings (UCS-2 and modified UTF-8, > and Latin1 in some cases), These are not represented *in Tcl* as Python str, are they? 
If not, they need to be converted with a regular byte-oriented codec, no? Once again, a regular codec with appropriate error handler can deal with it early, and better. So fix Tkinter; it's probably not much harder than documenting the correct use of these functions in dealing with Tkinter. > > And ... you looked twice at your proposal? You have basically > > reproduced the codec error handling API for .decode and .encode in a > > bunch to str2str "rehandle" functions. > > Yes, this is the main advantage of proposed functions. They reuse > existing error handlers and are extensible by writing new error > handlers. They also violate TOOWTDI. In fact, that's their whole purpose. > > In other words, you need to know as much to use "rehandle_*" > > properly as you do to use .decode and .encode. I do not see a > > win for the programmer who is mostly innocent of encoding > > knowledge. > > Is it a problem? These functions are for experienced users. Perhaps > mostly for authors of libraries and frameworks. Yes, it's a problem. You say they're "for" experienced users, but that's a null concept. You intend to make them *available* to all users. Very few users have experience in I18N technology, and those are generally able to chain .encode().decode() correctly, which is conceptually what you're doing anyway (in fact, that's the *implementation* *you* published in issue18814!) OTOH, *most* experienced users have experienced I18N headaches. "To a man with a hammer, every problem looks like a nail" but with this hammer, mostly it's actually a thumb. These functions should only ever be used on input, but in practice programmers under time pressure (and who isn't?) tend to apply bandaids at the point where the problem is detected -- which is output, since Python itself has no problems with lone surrogates or astral characters. As for authors of libraries and frameworks, *they* should *really* should be handling these problems at the external bytes -> internal Unicode interface when the original data, and often metadata or even a human user, is available for interrogation. Not later, when all you have is the resulting radioactive garbage, which you'll end up passing on to the framework users. > > If we apply these rehandle_* thumbs to the holes in the I18N dike, > > it's just going to spring more leaks elsewhere. > > There are a lot of butteries included in Python. They can explode > if use them incorrectly. I think a better analogy is explosive, which can be useful if used safely. :-) If you have to add these functions, *please* do not put them anywhere near the codecs. They are not codecs, they do not transform the representation of data. They change the semantics of the data. Put them in a "validation" submodule of the unicodedata package, or create a new unicodetools package or something like that to hold them. And they should be documented as dangerous because the transformations they perform cannot be inverted to get the original input once the strings produced are passed to other code (unless you also pass the history of transformations as metadata). This matters in applications where the input bytes may have been digitally signed, for example. (I've posted the last two paragraphs in somewhat more precise form to the issue18814.) 
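To make the comparison concrete, the chaining referred to above looks roughly like this (a sketch only, not the actual patch attached to issue18814, and the helper name is just for illustration):

    def rehandle_surrogateescape(s, errors='strict'):
        # Recover the original bytes hidden behind surrogateescape'd
        # code points, then re-decode them under a different policy.
        return s.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

    raw = b'caf\xe9'                               # latin-1 bytes, invalid as UTF-8
    s = raw.decode('utf-8', 'surrogateescape')     # 'caf\udce9'
    print(rehandle_surrogateescape(s, 'replace'))  # 'caf\ufffd'
    rehandle_surrogateescape(s)                    # default 'strict' raises UnicodeDecodeError

Note that once the 'replace' step has run, the original byte (and with it any chance of checking a signature over the original data) is gone, which is exactly the invertibility problem mentioned above.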
From abarnert at yahoo.com Sat May 9 09:21:53 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 00:21:53 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 8, 2015, at 19:58, Stephen J. Turnbull wrote: > > Koos Zevenhoven writes: > >> As a random example, (root @ mean @ square)(x) would produce the right >> order for rms when using [2]. > > Hardly interesting. :-) The result is an exception, as root and square > are conceptually scalar-to-scalar, while mean is sequence-to-scalar. Unless you're using an elementwise square and an array-to-scalar mean, like the ones in NumPy, in which case it works perfectly well... > I suppose you could write (root @ mean @ (map square)) (xs), Actually, you can't. You could write (root @ mean @ partial(map, square))(xs), but that's pretty clearly less readable than root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's been my main argument: Without a full suite of higher-level operators and related syntax, compose alone doesn't do you any good except for toy examples. But Koos's example, even if it was possibly inadvertent, shows that I may be wrong about that. Maybe compose together with element-wise operators actually _is_ sufficient for something beyond toy examples. Of course the fact that we have two groups of people each arguing that obviously the only possible reading of @ is compose/rcompose respectively points out a whole other problem with the idea. If people just were going to have to look up which way it went and learn it through experience, that would be one thing; if everyone already knows intuitively and half of them are wrong, that's a different story... > which > seems to support your argument. But will all such issues and > solutions give the same support? This kind of thing is a conceptual > problem that has to be discussed pretty thoroughly (presumably based > on experience with implementations) before discussion of order can be > conclusive. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Sat May 9 09:31:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 00:31:54 -0700 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: <87mw1es8fp.fsf@vostro.rath.org> References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org> Message-ID: On May 8, 2015, at 21:04, Nikolaus Rath wrote: > >> On May 07 2015, Steven D'Aprano wrote: >> But a view would be harmful in this situation: >> >> s = "some string"*1000000 >> t = s[1:2] # a view maskerading as a new string >> del s >> >> Now we keep the entire string alive long after it is needed. >> >> How would you solve the first problem without introducing the second? > > Keep track of the reference count of the underlying string, and if it > goes down to one, turn the view into a copy and remove the sliced > original? 
It sounds like we're talking about an optimization that, although it could have a big benefit in some not too rare cases, could also have a non-negligible cost in incredibly common cases people use every day. For example, today, "line = line.rstrip()" makes a copy of most of the original string, then discards the original string. With this change, the same line of code builds a view referencing most of line, then gets to some not-quite-a-weakref-destructor, which makes the copy and discards the original string and the view we just built. If line were huge, the small extra alloc and dealloc and refcheck might be unnoticeable noise, but if line is about 70 chars, as it usually will be, I'd expect a much more noticeable difference. And this is exactly the kind of thing you do in a loop 5 million times in a row in Python. Of course I could be wrong; we won't really know until someone actually builds at least an implementation and tests it. From jonathan at slenders.be Sat May 9 09:56:46 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Sat, 9 May 2015 09:56:46 +0200 Subject: [Python-ideas] What is happening with array.array('u') in Python 4? In-Reply-To: References: Message-ID: Thanks a lot, So, apparently it is possible to use a re bytes pattern to search through array.array('u') and it works as well for numpy.chararray. However, I suppose that for doing this you need to have knowledge of the internal encoding, because re.search will actually compare bytes (from the pattern) to unicode chars (from the array). So, the bytes have to be utf32-encoded strings, I suppose. Currently I have not enough knowledge of how Python strings are implemented. I'm convinced that it's a good thing to have mutable strings, but I guess it could indeed be hard to implement. Cheers, Jonathan 2015-05-08 22:46 GMT+02:00 Chris Barker : > On Fri, May 8, 2015 at 5:50 AM, Stefan Behnel wrote: > >> ISTM that your best bet is currently to look for a suitable module on PyPI >> that implements mutable character arrays. I'm sure you're not the only one >> who needs something like that. The usual suspect would be NumPy, but there >> may be smaller and simpler tools available. > > > Numpy does have mutable character arrays -- and the Unicode version uses > 4bytes per char, regardless of platform (and so should array.array!) > > But I don't think you get much of any of the features of strings, and I > doubt that the re module would work with it. > > A "real" mutable string type might be pretty nice to have , but I think it > would be pretty hard to d to get it to do everything a string can do. (or > maybe not -- I suppose you could cut and paste the regular string cdce, and > simply add the mutable part....) > > -Chris > > > > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat May 9 10:36:03 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 09 May 2015 17:36:03 +0900 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > On May 8, 2015, at 19:58, Stephen J. Turnbull wrote: > > > > Koos Zevenhoven writes: > > > >> As a random example, (root @ mean @ square)(x) would produce the right > >> order for rms when using [2]. > > > > Hardly interesting. :-) The result is an exception, as root and square > > are conceptually scalar-to-scalar, while mean is sequence-to-scalar. > > Unless you're using an elementwise square and an array-to-scalar > mean, like the ones in NumPy, Erm, why would square be elementwise and root not? I would suppose that everything is element-wise in Numpy (not a user yet). > in which case it works perfectly well... But that's an aspect of my point (evidently, obscure). Conceptually, as taught in junior high school or so, root and square are scalar-to- scalar. If you are working in a context such as Numpy where it makes sense to assume they are element-wise and thus composable, the context should provide the compose operator(s). Without that context, Koos's example looks like a TypeError. > But Koos's example, even if it was possibly inadvertent, shows that > I may be wrong about that. Maybe compose together with element-wise > operators actually _is_ sufficient for something beyond toy > examples. Of course it is! I didn't really think there was any doubt about that. I thought the question was whether there's enough commonality among such examples to come up with a Pythonic generic definition of compose, or perhaps a sufficiently compelling example to enshrine its definition as the "usual" interpretation in Python (and let other interpretations overload some operator to get that effect in their contexts). > Of course the fact that we have two groups of people each arguing > that obviously the only possible reading of @ is compose/rcompose > respectively points out a whole other problem with the idea. I prefer fgh = f(g(h(-))), but I hardly think it's obvious. Unless you're *not* Dutch. (If it were obvious to a Dutchman, we'd have it already. ) From breamoreboy at yahoo.co.uk Sat May 9 10:51:18 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 09 May 2015 09:51:18 +0100 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org> Message-ID: On 09/05/2015 08:31, Andrew Barnert via Python-ideas wrote: > On May 8, 2015, at 21:04, Nikolaus Rath wrote: >> >>> On May 07 2015, Steven D'Aprano wrote: >>> But a view would be harmful in this situation: >>> >>> s = "some string"*1000000 >>> t = s[1:2] # a view maskerading as a new string >>> del s >>> >>> Now we keep the entire string alive long after it is needed. >>> >>> How would you solve the first problem without introducing the second? >> >> Keep track of the reference count of the underlying string, and if it >> goes down to one, turn the view into a copy and remove the sliced >> original? > > Of course I could be wrong; we won't really know until someone actually builds at least an implementation and tests it. 
Well they can, but I found a major problem with views is that you can't compare them and so can't sort them, thus rendering them useless for a lot of applications. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From mistersheik at gmail.com Sat May 9 10:53:47 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 9 May 2015 04:53:47 -0400 Subject: [Python-ideas] Why don't CPython strings implement slicing using a view? In-Reply-To: References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com> <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org> Message-ID: Why not? You can compare numpy array views, can't you? In [4]: a = np.array([1,2]) In [5]: a[1:] < a[:1] Out[5]: array([False], dtype=bool) On Sat, May 9, 2015 at 4:51 AM, 'Mark Lawrence' via python-ideas < python-ideas at googlegroups.com> wrote: > On 09/05/2015 08:31, Andrew Barnert via Python-ideas wrote: > >> On May 8, 2015, at 21:04, Nikolaus Rath wrote: >> >>> >>> On May 07 2015, Steven D'Aprano >>> owrrOrA at public.gmane.org> wrote: >>>> But a view would be harmful in this situation: >>>> >>>> s = "some string"*1000000 >>>> t = s[1:2] # a view maskerading as a new string >>>> del s >>>> >>>> Now we keep the entire string alive long after it is needed. >>>> >>>> How would you solve the first problem without introducing the second? >>>> >>> >>> Keep track of the reference count of the underlying string, and if it >>> goes down to one, turn the view into a copy and remove the sliced >>> original? >>> >> >> Of course I could be wrong; we won't really know until someone actually >> builds at least an implementation and tests it. >> > > Well they can, but I found a major problem with views is that you can't > compare them and so can't sort them, thus rendering them useless for a lot > of applications. > > -- > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. > > Mark Lawrence > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat May 9 12:19:37 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 03:19:37 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 9, 2015, at 01:36, Stephen J. Turnbull wrote: > > Andrew Barnert writes: >>> On May 8, 2015, at 19:58, Stephen J. Turnbull wrote: >>> >>> Koos Zevenhoven writes: >>> >>>> As a random example, (root @ mean @ square)(x) would produce the right >>>> order for rms when using [2]. 
>>> >>> Hardly interesting. :-) The result is an exception, as root and square >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar. >> >> Unless you're using an elementwise square and an array-to-scalar >> mean, like the ones in NumPy, > > Erm, why would square be elementwise and root not? I would suppose > that everything is element-wise in Numpy (not a user yet). Most functions in NumPy are elementwise when applied to arrays, but can also be applied to scalars. So, square is elementwise because it's called on an array, root is scalar because it's called on a scalar. (In fact, root could also be elementwise--aggregating functions like mean can be applied across just one axis of a 2D or higher array, reducing it by one dimension, if you want.) Before you try it, this sounds like a complicated nightmare that can't possibly work in practice. But play with it for just a few minutes and it's completely natural. (Except for a few cases where you want some array-wide but not element-wise operation, most famously matrix multiplication, which is why we now have the @ operator to play with.) >> in which case it works perfectly well... > > But that's an aspect of my point (evidently, obscure). Conceptually, > as taught in junior high school or so, root and square are scalar-to- > scalar. If you are working in a context such as Numpy where it makes > sense to assume they are element-wise and thus composable, the context > should provide the compose operator(s). I was actually thinking on these lines: what if @ didn't work on types.FunctionType, but did work on numpy.ufunc (the name for the "universal function" type that knows how to broadcast across arrays but also work on scalars)? That's something NumPy could implement without any help from the core language. (Methods are a minor problem here, but it's obvious how to solve them, so I won't get into it.) And if it turned out to be useful all over the place in NumPy, that might turn up some great uses for the idiomatic non-NumPy Python, or it might show that, like elementwise addition, it's really more a part of NumPy than of Python. But of course that's more of a proposal for NumPy than for Python. > Without that context, Koos's > example looks like a TypeError. >> But Koos's example, even if it was possibly inadvertent, shows that >> I may be wrong about that. Maybe compose together with element-wise >> operators actually _is_ sufficient for something beyond toy >> examples. > > Of course it is! I didn't really think there was any doubt > about that. I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one. 
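For what it's worth, the wrapper idea is easy to sketch without any help from the core language or NumPy. The following is only an illustration of overloading @ for composition; the class name and helper functions are made up for the example and are not NumPy's API or anyone's concrete proposal:

    import math

    class composable:
        "Wrap a callable so that (f @ g)(x) == f(g(x))."
        def __init__(self, func):
            self.func = func
        def __call__(self, *args, **kwargs):
            return self.func(*args, **kwargs)
        def __matmul__(self, other):       # composable @ any callable
            return composable(lambda *a, **k: self.func(other(*a, **k)))
        def __rmatmul__(self, other):      # plain callable @ composable
            return composable(lambda *a, **k: other(self.func(*a, **k)))

    root = composable(math.sqrt)
    mean = composable(lambda xs: sum(xs) / len(xs))
    square = composable(lambda xs: [x * x for x in xs])

    rms = root @ mean @ square             # rms(xs) == root(mean(square(xs)))
    print(rms([1, 2, 3, 4]))               # 2.738...

A NumPy-flavoured version of the same pipeline is just np.sqrt(np.mean(np.square(data))): square is applied elementwise, mean reduces the array, and sqrt only ever sees a scalar.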
From tjreedy at udel.edu Sat May 9 17:20:53 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 09 May 2015 11:20:53 -0400 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote: > I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). > > I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one. I agree that @ is most likely to be useful in numpy's restricted context. A composition operator is usually defined by application: f@g(x) is defined as f(g(x)). (I'm sure there are also axiomatic treatments.) It is an optional syntactic abbreviation. It is most useful in a context where there is one set of data objects, such as the real numbers, or one set + arrays (vectors) defined on the one set; where all functions are univariate (or possibly multivariate, but that can be transformed to univariate on vectors); *and* where parameter names are dummies like 'x', 'y', 'z', or '_'. The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g does not lose any information as 'x' is basically a placeholder (so get rid of it). But parameter names are important in most practical contexts, both for understanding a composition and for using it. def npv(transfers, discount): '''Return the net present value of discounted transfers. transfers: finite iterable of amounts at constant intervals discount: fraction per interval ''' divisor = 1 + discount return sum(transfer/divisor**time for time, transfer in enumerate(transfers)) Even if one could replace the def statement with npv = with parameter names omitted, it would be harder to understand. Using it would require the ability to infer argument types and order from the composed expression. I intentionally added a statement to calculate the common subexpression prior to the return. I believe it would have to be put back in the return expression before converting. -- Terry Jan Reedy From ron3200 at gmail.com Sat May 9 17:38:38 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 09 May 2015 11:38:38 -0400 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >> >I suppose you could write (root @ mean @ (map square)) (xs), > Actually, you can't.
You could write (root @ mean @ partial(map, > square))(xs), but that's pretty clearly less readable than > root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's > been my main argument: Without a full suite of higher-level operators > and related syntax, compose alone doesn't do you any good except for toy > examples. How about an operator for partial? root @ mean @ map $ square(xs) Actually I'd rather reuse the binary operators. (I'd be happy if they were just methods on bytes objects BTW.) compose(root, mean, map(square, xs)) root ^ mean ^ map & square (xs) root ^ mean ^ map & square ^ xs () Read this as... compose root, of mean, of map with square, of xs Or... apply(map(square, xs), mean, root) map & square | mean | root (xs) xs | map & square | mean | root () Read this as... apply xs, to map with square, to mean, to root These are kind of cool, but does it make python code easier to read? That seems like it may be subjective depending on the amount of programming experience someone has. Cheers, Ron From apieum at gmail.com Sat May 9 18:08:12 2015 From: apieum at gmail.com (Gregory Salvan) Date: Sat, 9 May 2015 18:08:12 +0200 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Hi, I had to answer some of these questions when I wrote Lawvere: https://pypi.python.org/pypi/lawvere First, there are two kinds of composition, pipe and circle, so I think a single operator like @ is a bit restrictive. I like "->" and "<-" Then, for function names and function-to-string conversion I had to introduce a function signature (a tuple). It provides a good tool for decomposition, introspection and comparison with respect to the mathematical definition. Finally, for me composition makes sense when you have typed functions; otherwise it can easily become a mess, and this ties composition to multiple dispatch. I really hope composition will be introduced in Python, but I can't see how it can be done without rethinking a good part of function definition. 2015-05-09 17:38 GMT+02:00 Ron Adam : > > > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: > >> >I suppose you could write (root @ mean @ (map square)) (xs), >>> >> > Actually, you can't. You could write (root @ mean @ partial(map, >> square))(xs), but that's pretty clearly less readable than >> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's >> been my main argument: Without a full suite of higher-level operators >> and related syntax, compose alone doesn't do you any good except for toy >> examples. >> > > How about an operator for partial? > > root @ mean @ map $ square(xs) > > > Actually I'd rather reuse the binary operators. (I'd be happy if they > were just methods on bytes objects BTW.) > > compose(root, mean, map(square, xs)) > > root ^ mean ^ map & square (xs) > > root ^ mean ^ map & square ^ xs () > > Read this as... > > compose root, of mean, of map with square, of xs > > Or... > > apply(map(square, xs), mean, root) > > map & square | mean | root (xs) > > xs | map & square | mean | root () > > > Read this as... > > apply xs, to map with square, to mean, to root > > > These are kind of cool, but does it make python code easier to read? That > seems like it may be subjective depending on the amount of programming > experience someone has.
> > Cheers, > Ron > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat May 9 20:16:43 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 04:16:43 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150509181642.GB5663@ando.pearwood.info> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: > How about an operator for partial? > > root @ mean @ map $ square(xs) Apart from the little matter that Guido has said that $ will never be used as an operator in Python, what is the association between $ and partial? Most other operators have either been used for centuries e.g. + and - or at least decades e.g. * for multiplication because ASCII doesn't have the × symbol. The barrier to using a completely arbitrary symbol with no association to the function it plays should be considered very high. I would only support an operator for function composition if it was at least close to the standard operators used for function composition in other areas. @ at least suggests the ∘ used in mathematics, e.g. sin∘cos, but | is used in pipelining languages and shells and could be considered, e.g. ls | wc. My own preference would be to look at @ as the closest available ASCII symbol to ∘ and use it for left-to-right composition, and | for left-to-right function application. E.g. (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg))) (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg))) also known as compose() and rcompose(). We can read "@" as "of", "spam of eggs of cheese of arg", and | as a pipe, "spam(arg) piped to eggs piped to cheese". It's a pity we can't match the shell syntax and write: spam(args)|eggs|cheese but that would have a completely different meaning. David Beazley has a tutorial on using coroutines in pipelines: http://www.dabeaz.com/coroutines/ where he ends up writing this: f = open("access-log") follow(f, grep('python', printer())) Coroutines grep() and printer() make up the pipeline. I cannot help but feel that the | syntax would be especially powerful for this sort of data processing purpose: # could this work using some form of function composition? follow(f, grep('python')|printer) -- Steve From mertz at gnosis.cx Sat May 9 20:30:17 2015 From: mertz at gnosis.cx (David Mertz) Date: Sat, 9 May 2015 13:30:17 -0500 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <20150509181642.GB5663@ando.pearwood.info> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <20150509181642.GB5663@ando.pearwood.info> Message-ID: On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano wrote: > On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: > > > How about an operator for partial?
> > > > root @ mean @ map $ square(xs) > I have trouble seeing the advantage of a special function composition operator when it is easy to write a general 'compose()' function that can produce such things easily enough. E.g. in a white paper I just did for O'Reilly on _Functional Programming in Python_ I propose this little example implementation: def compose(*funcs): "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))" def inner(data, funcs=funcs): result = data for f in reversed(funcs): result = f(result) return result return inner Which we might use as: RMS = compose(root, mean, square) result = RMS(my_array) -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sat May 9 20:33:21 2015 From: donald at stufft.io (Donald Stufft) Date: Sat, 9 May 2015 14:33:21 -0400 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <20150509181642.GB5663@ando.pearwood.info> Message-ID: > On May 9, 2015, at 2:30 PM, David Mertz wrote: > > On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano > wrote: > On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: > > > How about an operator for partial? > > > > root @ mean @ map $ square(xs) > > I have trouble seeing the advantage of a special function composition operator when it is easy to write a general 'compose()' function that can produce such things easily enough. > > E.g. in a white paper I just did for O'Reilly on _Functional Programming in Python_ I propose this little example implementation: > > def compose(*funcs): > "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))" > def inner(data, funcs=funcs): > result = data > for f in reversed(funcs): > result = f(result) > return result > return inner > > Which we might use as: > > RMS = compose(root, mean, square) > result = RMS(my_array) Maybe functools.compose? --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From koos.zevenhoven at aalto.fi Sat May 9 21:15:21 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Sat, 9 May 2015 22:15:21 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> Message-ID: <554E5CC9.3010406@aalto.fi> On 2015-05-09 21:16, Steven D'Aprano wrote: > On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: > >> How about an operator for partial? 
>> >> root @ mean @ map $ square(xs) > Apart from the little matter that Guido has said that $ will never be > used as an operator in Python, what is the association between $ and > partial? > > Most other operators have either been used for centuries e.g. + and - or > at least decades e.g. * for multiplication because ASCII doesn't have > the ? symbol. The barrier to using a completely arbitrary symbol with no > association to the function it plays should be considered very high. > > I would only support an operator for function composition if it was at > least close to the standard operators used for function composition in > other areas. @ at least suggests the ? used in mathematics, e.g. > sin?cos, but | is used in pipelining languages and shells and could be > considered, e.g. ls | wc. > > My own preference would be to look at @ as the closest available ASCII > symbol to ? and use it for left-to-right composition, and | for > left-to-right function application. E.g. > > (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg))) > > (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg))) > > also known as compose() and rcompose(). > We can read "@" as "of", "spam of eggs of cheese of arg", and | as > a pipe, "spam(arg) piped to eggs piped to cheese". For me these are by far the most logical ones too, for exactly the same reasons (and because of the connection of @ with matrix multiplication and operators that operate from the left). > It's a pity we can't match the shell syntax and write: > > spam(args)|eggs|cheese > > but that would have a completely different meaning. > But it does not need to have a different meaning. You could in addition have: spam @ eggs @ cheese @ arg # equivalent to spam(eggs(cheese(arg))) arg | spam | eggs | cheese # equivalent to cheese(eggs(spam(arg))) Here, arg would thus be recognized as not a function. In this version, your example of spam(args)|eggs|cheese would do exactly the same operation as (spam | eggs | cheese)(args) :-). > David Beazley has a tutorial on using coroutines in pipelines: > > http://www.dabeaz.com/coroutines/ > > where he ends up writing this: > > f = open("access-log") > follow(f, > grep('python', > printer())) > > > Coroutines grep() and printer() make up the pipeline. I cannot help but > feel that the | syntax would be especially powerful for this sort of > data processing purpose: > > # could this work using some form of function composition? > follow(f, grep('python')|printer) > > > This seems promising! -- Koos From apieum at gmail.com Sat May 9 22:41:24 2015 From: apieum at gmail.com (Gregory Salvan) Date: Sat, 9 May 2015 22:41:24 +0200 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <554E5CC9.3010406@aalto.fi> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> Message-ID: pipeline operator may be confusing with bitwise operator. In this case : eggs = arg | spam | cheese Is eggs a composed function or string of bits ? 2015-05-09 21:15 GMT+02:00 Koos Zevenhoven : > > On 2015-05-09 21:16, Steven D'Aprano wrote: > >> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: >> >> How about an operator for partial? 
>>> >>> root @ mean @ map $ square(xs) >>> >> Apart from the little matter that Guido has said that $ will never be >> used as an operator in Python, what is the association between $ and >> partial? >> >> Most other operators have either been used for centuries e.g. + and - or >> at least decades e.g. * for multiplication because ASCII doesn't have >> the ? symbol. The barrier to using a completely arbitrary symbol with no >> association to the function it plays should be considered very high. >> >> I would only support an operator for function composition if it was at >> least close to the standard operators used for function composition in >> other areas. @ at least suggests the ? used in mathematics, e.g. >> sin?cos, but | is used in pipelining languages and shells and could be >> considered, e.g. ls | wc. >> >> My own preference would be to look at @ as the closest available ASCII >> symbol to ? and use it for left-to-right composition, and | for >> left-to-right function application. E.g. >> >> (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg))) >> >> (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg))) >> >> also known as compose() and rcompose(). >> We can read "@" as "of", "spam of eggs of cheese of arg", and | as >> a pipe, "spam(arg) piped to eggs piped to cheese". >> > > For me these are by far the most logical ones too, for exactly the same > reasons (and because of the connection of @ with matrix multiplication and > operators that operate from the left). > > It's a pity we can't match the shell syntax and write: >> >> spam(args)|eggs|cheese >> >> but that would have a completely different meaning. >> >> > > But it does not need to have a different meaning. You could in addition > have: > > spam @ eggs @ cheese @ arg # equivalent to spam(eggs(cheese(arg))) > > arg | spam | eggs | cheese # equivalent to cheese(eggs(spam(arg))) > > Here, arg would thus be recognized as not a function. > > In this version, your example of spam(args)|eggs|cheese would do exactly > the same operation as (spam | eggs | cheese)(args) :-). > > > David Beazley has a tutorial on using coroutines in pipelines: >> >> http://www.dabeaz.com/coroutines/ >> >> where he ends up writing this: >> >> f = open("access-log") >> follow(f, >> grep('python', >> printer())) >> >> >> Coroutines grep() and printer() make up the pipeline. I cannot help but >> feel that the | syntax would be especially powerful for this sort of >> data processing purpose: >> >> # could this work using some form of function composition? >> follow(f, grep('python')|printer) >> >> >> >> > This seems promising! > > > -- Koos > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apieum at gmail.com Sun May 10 00:03:24 2015 From: apieum at gmail.com (Gregory Salvan) Date: Sun, 10 May 2015 00:03:24 +0200 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> Message-ID: Nobody convinced by arrow operator ? 
like: arg -> spam -> eggs -> cheese or cheese <- eggs <- spam <- arg This also make sense with annotations: def func(x:type1, y:type2) -> type3: pass we expect func to return type3(func(x, y)) 2015-05-09 22:41 GMT+02:00 Gregory Salvan : > pipeline operator may be confusing with bitwise operator. > In this case : > eggs = arg | spam | cheese > > Is eggs a composed function or string of bits ? > > > 2015-05-09 21:15 GMT+02:00 Koos Zevenhoven : > >> >> On 2015-05-09 21:16, Steven D'Aprano wrote: >> >>> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: >>> >>> How about an operator for partial? >>>> >>>> root @ mean @ map $ square(xs) >>>> >>> Apart from the little matter that Guido has said that $ will never be >>> used as an operator in Python, what is the association between $ and >>> partial? >>> >>> Most other operators have either been used for centuries e.g. + and - or >>> at least decades e.g. * for multiplication because ASCII doesn't have >>> the ? symbol. The barrier to using a completely arbitrary symbol with no >>> association to the function it plays should be considered very high. >>> >>> I would only support an operator for function composition if it was at >>> least close to the standard operators used for function composition in >>> other areas. @ at least suggests the ? used in mathematics, e.g. >>> sin?cos, but | is used in pipelining languages and shells and could be >>> considered, e.g. ls | wc. >>> >>> My own preference would be to look at @ as the closest available ASCII >>> symbol to ? and use it for left-to-right composition, and | for >>> left-to-right function application. E.g. >>> >>> (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg))) >>> >>> (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg))) >>> >>> also known as compose() and rcompose(). >>> We can read "@" as "of", "spam of eggs of cheese of arg", and | as >>> a pipe, "spam(arg) piped to eggs piped to cheese". >>> >> >> For me these are by far the most logical ones too, for exactly the same >> reasons (and because of the connection of @ with matrix multiplication and >> operators that operate from the left). >> >> It's a pity we can't match the shell syntax and write: >>> >>> spam(args)|eggs|cheese >>> >>> but that would have a completely different meaning. >>> >>> >> >> But it does not need to have a different meaning. You could in addition >> have: >> >> spam @ eggs @ cheese @ arg # equivalent to spam(eggs(cheese(arg))) >> >> arg | spam | eggs | cheese # equivalent to cheese(eggs(spam(arg))) >> >> Here, arg would thus be recognized as not a function. >> >> In this version, your example of spam(args)|eggs|cheese would do exactly >> the same operation as (spam | eggs | cheese)(args) :-). >> >> >> David Beazley has a tutorial on using coroutines in pipelines: >>> >>> http://www.dabeaz.com/coroutines/ >>> >>> where he ends up writing this: >>> >>> f = open("access-log") >>> follow(f, >>> grep('python', >>> printer())) >>> >>> >>> Coroutines grep() and printer() make up the pipeline. I cannot help but >>> feel that the | syntax would be especially powerful for this sort of >>> data processing purpose: >>> >>> # could this work using some form of function composition? >>> follow(f, grep('python')|printer) >>> >>> >>> >>> >> This seems promising! 
>> >> >> -- Koos >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 10 00:45:26 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 15:45:26 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> On May 9, 2015, at 08:38, Ron Adam wrote: > > > > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >>> >I suppose you could write (root @ mean @ (map square)) (xs), > >> Actually, you can't. You could write (root @ mean @ partial(map, >> square))(xs), but that's pretty clearly less readable than >> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's >> been my main argument: Without a full suite of higher-level operators >> and related syntax, compose alone doesn't do you any good except for toy >> examples. > > How about an operator for partial? > > root @ mean @ map $ square(xs) I'm pretty sure that anyone who sees that and doesn't interpret it as meaningless nonsense is going to interpret it as a variation on Haskell and get the wrong intuition. But, more importantly, this doesn't work. Your square(xs) isn't going to evaluate to a function, but to whatever calling square on xs returns. (Which is presumably a TypeError, or you wouldn't be looking to map in the first place). And, even if that did work, you're not actually composing a function here anyway; your @ is just a call operator, which we already have in Python, spelled with parens. > Actually I'd rather reuse the binary operators. (I'd be happy if they were just methods on bytes objects BTW.) > > compose(root, mean, map(square, xs)) Now you're not calling square(xs), but you are calling map(square, xs), which is going to return an iterable of squares, not a function; again, you're not composing a function object at all. And think about how you'd actually write this correctly. You need to either use lambda (which defeats the entire purpose of compose), or partial (which works, but is clumsy and ugly enough without an operator or syntactic sugar that people rarely use it). > > root ^ mean ^ map & square (xs) > > root ^ mean ^ map & square ^ xs () > > Read this as... > > compose root, of mean, of map with square, of xs But that's not composing. The whole point of compose is that you can compose root of mean of mapping square over some argument to be passed in later, and the result is itself a function over some argument to be passed in later. What you're doing doesn't add any new abstraction, it just obfuscates normal function application. > Or... > > apply(map(square, xs), mean, root) > > map & square | mean | root (xs) > > xs | map & square | mean | root () > > > Read this as... > > apply xs, to map with square, to mean, to root > > > These are kind of cool, but does it make python code easier to read? That seems like it may be subjective depending on the amount of programming experience someone has.
> > Cheers, > Ron > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Sun May 10 00:49:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 15:49:22 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <20150509181642.GB5663@ando.pearwood.info> Message-ID: <3A90E8AC-ED40-4BD6-A895-F523246B173E@yahoo.com> On May 9, 2015, at 11:33, Donald Stufft wrote: > > >> On May 9, 2015, at 2:30 PM, David Mertz wrote: >> >>> On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano wrote: >>> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: >>> >>> > How about an operator for partial? >>> > >>> > root @ mean @ map $ square(xs) >> >> I have trouble seeing the advantage of a special function composition operator when it is easy to write a general 'compose()' function that can produce such things easily enough. >> >> E.g. in a white paper I just did for O'Reilly on _Functional Programming in Python_ I propose this little example implementation: >> >> def compose(*funcs): >> "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))" >> def inner(data, funcs=funcs): >> result = data >> for f in reversed(funcs): >> result = f(result) >> return result >> return inner >> >> Which we might use as: >> >> RMS = compose(root, mean, square) >> result = RMS(my_array) > > > Maybe functools.compose? But why? This is trivial to write. The nontrivial part is thinking through whether you want left or right compose, what you want to do about multiple arguments, etc. So, unless we can solve _that_ problem by showing that there is one and only one obvious answer, we don't gain anything by implementing one of the many trivial-to-implement possibilities in the stdlib. Maybe as a recipe in the docs, it would be worth showing two different compose functions to demonstrate how easy it is to write whichever one you want (and how important it is to figure out which one you want). -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Sun May 10 01:07:19 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Sun, 10 May 2015 02:07:19 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> Message-ID: <554E9327.9030706@aalto.fi> On 10.5.2015 1:03, Gregory Salvan wrote: > Nobody convinced by arrow operator ? > > like: arg -> spam -> eggs -> cheese > or cheese <- eggs <- spam <- arg > > I like | a lot because of the pipe analogy. However, having a new operator for this could solve some issues about operator precedence. Today, I sketched one possible version that would use a new .. operator. 
I'll explain what it would do (but with your -> instead of my ..) Here, the operator (.. or ->) would have a higher precedence than function calls () but a lower precedence than attribute access (obj.attr). First, with single-argument functions spam, eggs and cheese, and a non-function arg: arg->eggs->spam->cheese() # equivalent to cheese(spam(eggs(arg))) eggs->spam->cheese # equivalent to lambda arg: cheese(spam(eggs(arg))) Then if, spam and eggs both took two arguments; eggs(arg1, arg2), spam(arg1, arg2) arg->eggs # equivalent to partial(eggs, arg) eggs->spam(a, b, c) # equivalent to spam(eggs(a, b), c) arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c) So you could think of -> as an extended partial operator. And this would naturally generalize to functions with even more arguments. The arguments would always be fed in the same order as in the equivalent function call, which makes for a nice rule of thumb. However, I suppose one would usually avoid combinations that are difficult to understand. Some examples that this would enable: # Example 1 from numpy import square, mean, sqrt rms = square->mean->sqrt # I think this order is fine because it is not @ # Example 2 (both are equivalent) spam(args)->eggs->cheese() # the shell-syntax analogy that Steven mentioned. # Example 3 # Last but not least, we would finally have this :) some_sequence->len() some_object->isinstance(MyType) -- Koos -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Sun May 10 01:28:29 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sun, 10 May 2015 01:28:29 +0200 Subject: [Python-ideas] Function composition (was no subject) Message-ID: I was thinking about recent ideas discussed here. I also returned back to origins of my initial idea. The point is that it came from Numpy, I use Numpy arrays everyday, and typically I do exactly something like root(mean(square(data))). Now I am thinking: what is actually a matrix? It is something that takes a vector and returns a vector. But on the other hand the same actually do elementwise functions. It does not really matter, what we do with a vector: transform by a product of matrices or by composition of functions. In other words I agree with Andrew that "elementwise" is a good match with compose, and what we really need is to "pipe" things that take a vector (or just an iterable) and return a vector (iterable). So that probably a good place (in a potential future) for compose would be not functools but itertools. But indeed a good place to test this would be Numpy. An additional comment: it is indeed good to have both @ and | for compose and rcompose. Side note, one can actually overload __rmatmul__ on arrays as well so that you can write root @ mean @ square @ data Moreover, one can overload __or__ on arrays, so that one can write data | square | mean | root even with ordinary functions (not Numpy's ufuncs or composable) . These examples are actually "flat is better than nested" in the extreme form. Anyway, they (Numpy) are going to implement the @ operator for arrays, may be it would be a good idea to check that if something on the left from me (array) is not an array but a callable then apply it elementwise. Concerning the multi-argument functions, I don't like $ symbol, don't know why. It seems really unintuitive why it means partial application. One can autocurry composable functions and apply same rules that Numpy uses for ufuncs. 
More precisely, if I write add(data1, data2) with arrays it applies add pairwise. But if I write add(data1, 42) it is also fine, it simply adds 42 to every element. With autocurrying one could write root @ mean @ add(data) @ square @ data2 or root @ mean @ square @ add(42) @ data However, as I see it now it is not very readable, so that may be the best choise is to reserve @ and | for "piping" iterables through transformers that take one argument. In other words it should be left to user to make add(42) of an appropriate type. It is the same logic as for decorators, if I write @modify(arg) def func(x): return None I must care that modify(arg) evaluates to something that takes one callable and returns a callable. On May 9, 2015, at 01:36, Stephen J. Turnbull wrote: > > > > Andrew Barnert writes: > >>> On May 8, 2015, at 19:58, Stephen J. Turnbull > wrote: > >>> > >>> Koos Zevenhoven writes: > >>> > >>>> As a random example, (root @ mean @ square)(x) would produce the right > >>>> order for rms when using [2]. > >>> > >>> Hardly interesting. :-) The result is an exception, as root and square > >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar. > >> > >> Unless you're using an elementwise square and an array-to-scalar > >> mean, like the ones in NumPy, > > > > Erm, why would square be elementwise and root not? I would suppose > > that everything is element-wise in Numpy (not a user yet). > > Most functions in NumPy are elementwise when applied to arrays, but can > also be applied to scalars. So, square is elementwise because it's called > on an array, root is scalar because it's called on a scalar. (In fact, root > could also be elementwise--aggregating functions like mean can be applied > across just one axis of a 2D or higher array, reducing it by one dimension, > if you want.) > > Before you try it, this sounds like a complicated nightmare that can't > possibly work in practice. But play with it for just a few minutes and it's > completely natural. (Except for a few cases where you want some array-wide > but not element-wise operation, most famously matrix multiplication, which > is why we now have the @ operator to play with.) > > >> in which case it works perfectly well... > > > > But that's an aspect of my point (evidently, obscure). Conceptually, > > as taught in junior high school or so, root and square are scalar-to- > > scalar. If you are working in a context such as Numpy where it makes > > sense to assume they are element-wise and thus composable, the context > > should provide the compose operator(s). > > I was actually thinking on these lines: what if @ didn't work on > types.FunctionType, but did work on numpy.ufunc (the name for the > "universal function" type that knows how to broadcast across arrays but > also work on scalars)? That's something NumPy could implement without any > help from the core language. (Methods are a minor problem here, but it's > obvious how to solve them, so I won't get into it.) And if it turned out to > be useful all over the place in NumPy, that might turn up some great uses > for the idiomatic non-NumPy Python, or it might show that, like elementwise > addition, it's really more a part of NumPy than of Python. > > But of course that's more of a proposal for NumPy than for Python. > > > Without that context, Koos's > > example looks like a TypeError. > > >> But Koos's example, even if it was possibly inadvertent, shows that > >> I may be wrong about that. 
Maybe compose together with element-wise > >> operators actually _is_ sufficient for something beyond toy > >> examples. > > > > Of course it is! I didn't really think there was any doubt > > about that. > > I think there was, and still is. People keep coming up with abstract toy > examples, but as soon as someone tries to give a good real example, it only > makes sense with NumPy (Koos's) or with some syntax that Python doesn't > have (yours), because to write them with actual Python functions would > actually be ugly and verbose (my version of yours). > > I don't think that's a coincidence. You didn't write "map square" because > you don't know how to think in Python, but because using compose profitably > inherently implies not thinking in Python. (Except, maybe, in the case of > NumPy... which is a different idiom.) Maybe someone has a bunch of obvious > good use cases for compose that don't also require other functions, > operators, or syntax we don't have, but so far, nobody's mentioned one. > > ------------------------------ > > On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote: > > > I think there was, and still is. People keep coming up with abstract toy > examples, but as soon as someone tries to give a good real example, it only > makes sense with NumPy (Koos's) or with some syntax that Python doesn't > have (yours), because to write them with actual Python functions would > actually be ugly and verbose (my version of yours). > > > > I don't think that's a coincidence. You didn't write "map square" > because you don't know how to think in Python, but because using compose > profitably inherently implies not thinking in Python. (Except, maybe, in > the case of NumPy... which is a different idiom.) Maybe someone has a bunch > of obvious good use cases for compose that don't also require other > functions, operators, or syntax we don't have, but so far, nobody's > mentioned one. > > I agree that @ is most likely to be usefull in numpy's restricted context. > > A composition operator is usually defined by application: f at g(x) is > defined as f(g(x)). (I sure there are also axiomatic treatments.) It > is an optional syntactic abbreviation. It is most useful in a context > where there is one set of data objects, such as the real numbers, or one > set + arrays (vectors) defined on the one set; where all function are > univariate (or possible multivariate, but that can can be transformed to > univariate on vectors); *and* where parameter names are dummies like > 'x', 'y', 'z', or '_'. > > The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g > does not lose any information as 'x' is basically a placeholder (so get > rid of it). But parameter names are important in most practical > contexts, both for understanding a composition and for using it. > > dev npv(transfers, discount): > '''Return the net present value of discounted transfers. > > transfers: finite iterable of amounts at constant intervals > discount: fraction per interval > ''' > divisor = 1 + discount > return sum(tranfer/divisor**time > for time, transfer in enumerate(transfers)) > > Even if one could replace the def statement with > npv = > with parameter names omitted, it would be harder to understand. Using > it would require the ability to infer argument types and order from the > composed expression. > > I intentionally added a statement to calculate the common subexpression > prior to the return. I believe it would have to put back in the return > expression before converting. 
> > -- > Terry Jan Reedy > > > > ------------------------------ > > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: > >> >I suppose you could write (root @ mean @ (map square)) (xs), > > > Actually, you can't. You could write (root @ mean @ partial(map, > > square))(xs), but that's pretty clearly less readable than > > root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's > > been my main argument: Without a full suite of higher-level operators > > and related syntax, compose alone doesn't do you any good except for toy > > examples. > > How about an operator for partial? > > root @ mean @ map $ square(xs) > > > Actually I'd rather reuse the binary operators. (I'd be happy if they were > just methods on bytes objects BTW.) > > compose(root, mean, map(square, xs)) > > root ^ mean ^ map & square (xs) > > root ^ mean ^ map & square ^ xs () > > Read this as... > > compose root, of mean, of map with square, of xs > > Or... > > apply(map(square, xs), mean, root) > > map & square | mean | root (xs) > > xs | map & square | mean | root () > > > Read this as... > > apply xs, to map with square, to mean, to root > > > These are kind of cool, but does it make python code easier to read? That > seems like it may be subjective depending on the amount of programming > experience someone has. > > Cheers, > Ron > > > > ------------------------------ > > Hi, > I had to answer some of these questions when I wrote Lawvere: > https://pypi.python.org/pypi/lawvere > > First, there is two kind of composition: pipe and circle so I think a > single operator like @ is a bit restrictive. > I like "->" and "<-" > > Then, for function name and function to string I had to introduce function > signature (a tuple). > It provides a good tool for decomposition, introspection and comparison in > respect with mathematic definition. > > Finally, for me composition make sense when you have typed functions > otherwise it can easily become a mess and this make composition tied to > multiple dispatch. > > I really hope composition will be introduced in python but I can't see how > it be made without rethinking a good part of function definition. > > > > 2015-05-09 17:38 GMT+02:00 Ron Adam : > > > > > > > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: > > > >> >I suppose you could write (root @ mean @ (map square)) (xs), > >>> > >> > > Actually, you can't. You could write (root @ mean @ partial(map, > >> square))(xs), but that's pretty clearly less readable than > >> root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's > >> been my main argument: Without a full suite of higher-level operators > >> and related syntax, compose alone doesn't do you any good except for toy > >> examples. > >> > > > > How about an operator for partial? > > > > root @ mean @ map $ square(xs) > > > > > > Actually I'd rather reuse the binary operators. (I'd be happy if they > > were just methods on bytes objects BTW.) > > > > compose(root, mean, map(square, xs)) > > > > root ^ mean ^ map & square (xs) > > > > root ^ mean ^ map & square ^ xs () > > > > Read this as... > > > > compose root, of mean, of map with square, of xs > > > > Or... > > > > apply(map(square, xs), mean, root) > > > > map & square | mean | root (xs) > > > > xs | map & square | mean | root () > > > > > > Read this as... > > > > apply xs, to map with square, to mean, to root > > > > > > These are kind of cool, but does it make python code easier to read? 
> That > > seems like it may be subjective depending on the amount of programming > > experience someone has. > > > > Cheers, > > Ron > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 10 02:05:06 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 17:05:06 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: Message-ID: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com> On May 9, 2015, at 16:28, Ivan Levkivskyi wrote: > > I was thinking about recent ideas discussed here. I also returned back to origins of my initial idea. The point is that it came from Numpy, I use Numpy arrays everyday, and typically I do exactly something like root(mean(square(data))). > > Now I am thinking: what is actually a matrix? It is something that takes a vector and returns a vector. But on the other hand the same actually do elementwise functions. It does not really matter, what we do with a vector: transform by a product of matrices or by composition of functions. In other words I agree with Andrew that "elementwise" is a good match with compose, and what we really need is to "pipe" things that take a vector (or just an iterable) and return a vector (iterable). > > So that probably a good place (in a potential future) for compose would be not functools but itertools. But indeed a good place to test this would be Numpy. Itertools is an interesting idea. Anyway, assuming NumPy isn't going to add this in the near future (has anyone even brought it up on the NumPy list, or only here?), it wouldn't be that hard to write a (maybe inefficient but working) @composable wrapper and wrap all the relevant callables from NumPy or from itertools, upload it to PyPI, and let people start coming up with good examples. If it's later worth direct support in NumPy and/or Python (for simplicity or performance), the module will still be useful for backward compatibility. > An additional comment: it is indeed good to have both @ and | for compose and rcompose. > Side note, one can actually overload __rmatmul__ on arrays as well so that you can write > > root @ mean @ square @ data But this doesn't need to overload it on arrays, only on the utuncs, right? Unless you're suggesting that one of these operations could be a matrix as easily as a function, and NumPy users often won't have to care which it is? > > Moreover, one can overload __or__ on arrays, so that one can write > > data | square | mean | root > > even with ordinary functions (not Numpy's ufuncs or composable) . That's an interesting point. But I think this will be a bit confusing, because now it _does_ matter whether square is a matrix or a function--you'll get elementwise bitwise or instead of application. (And really, this is the whole reason for @ in the first place--we needed an operator that never means elementwise.) Also, this doesn't let you actually compose functions--if you want square | mean | root to be a function, square has to have a __or__ operator. > These examples are actually "flat is better than nested" in the extreme form. > > Anyway, they (Numpy) are going to implement the @ operator for arrays, may be it would be a good idea to check that if something on the left from me (array) is not an array but a callable then apply it elementwise. > > Concerning the multi-argument functions, I don't like $ symbol, don't know why. It seems really unintuitive why it means partial application. 
> One can autocurry composable functions and apply same rules that Numpy uses for ufuncs. > More precisely, if I write > > add(data1, data2) > > with arrays it applies add pairwise. But if I write > > add(data1, 42) > > it is also fine, it simply adds 42 to every element. With autocurrying one could write > > root @ mean @ add(data) @ square @ data2 > > or > > root @ mean @ square @ add(42) @ data > > However, as I see it now it is not very readable, so that may be the best choise is to reserve @ and | for "piping" iterables through transformers that take one argument. In other words it should be left to user to make add(42) of an appropriate type. It is the same logic as for decorators, if I write > > @modify(arg) > def func(x): > return None > > I must care that modify(arg) evaluates to something that takes one callable and returns a callable. > > >> On May 9, 2015, at 01:36, Stephen J. Turnbull wrote: >> > >> > Andrew Barnert writes: >> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull wrote: >> >>> >> >>> Koos Zevenhoven writes: >> >>> >> >>>> As a random example, (root @ mean @ square)(x) would produce the right >> >>>> order for rms when using [2]. >> >>> >> >>> Hardly interesting. :-) The result is an exception, as root and square >> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar. >> >> >> >> Unless you're using an elementwise square and an array-to-scalar >> >> mean, like the ones in NumPy, >> > >> > Erm, why would square be elementwise and root not? I would suppose >> > that everything is element-wise in Numpy (not a user yet). >> >> Most functions in NumPy are elementwise when applied to arrays, but can also be applied to scalars. So, square is elementwise because it's called on an array, root is scalar because it's called on a scalar. (In fact, root could also be elementwise--aggregating functions like mean can be applied across just one axis of a 2D or higher array, reducing it by one dimension, if you want.) >> >> Before you try it, this sounds like a complicated nightmare that can't possibly work in practice. But play with it for just a few minutes and it's completely natural. (Except for a few cases where you want some array-wide but not element-wise operation, most famously matrix multiplication, which is why we now have the @ operator to play with.) >> >> >> in which case it works perfectly well... >> > >> > But that's an aspect of my point (evidently, obscure). Conceptually, >> > as taught in junior high school or so, root and square are scalar-to- >> > scalar. If you are working in a context such as Numpy where it makes >> > sense to assume they are element-wise and thus composable, the context >> > should provide the compose operator(s). >> >> I was actually thinking on these lines: what if @ didn't work on types.FunctionType, but did work on numpy.ufunc (the name for the "universal function" type that knows how to broadcast across arrays but also work on scalars)? That's something NumPy could implement without any help from the core language. (Methods are a minor problem here, but it's obvious how to solve them, so I won't get into it.) And if it turned out to be useful all over the place in NumPy, that might turn up some great uses for the idiomatic non-NumPy Python, or it might show that, like elementwise addition, it's really more a part of NumPy than of Python. >> >> But of course that's more of a proposal for NumPy than for Python. >> >> > Without that context, Koos's >> > example looks like a TypeError. 
>> >> >> But Koos's example, even if it was possibly inadvertent, shows that >> >> I may be wrong about that. Maybe compose together with element-wise >> >> operators actually _is_ sufficient for something beyond toy >> >> examples. >> > >> > Of course it is! I didn't really think there was any doubt >> > about that. >> >> I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). >> >> I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one. >> >> ------------------------------ >> >> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote: >> >> > I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). >> > >> > I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one. >> >> I agree that @ is most likely to be usefull in numpy's restricted context. >> >> A composition operator is usually defined by application: f at g(x) is >> defined as f(g(x)). (I sure there are also axiomatic treatments.) It >> is an optional syntactic abbreviation. It is most useful in a context >> where there is one set of data objects, such as the real numbers, or one >> set + arrays (vectors) defined on the one set; where all function are >> univariate (or possible multivariate, but that can can be transformed to >> univariate on vectors); *and* where parameter names are dummies like >> 'x', 'y', 'z', or '_'. >> >> The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g >> does not lose any information as 'x' is basically a placeholder (so get >> rid of it). But parameter names are important in most practical >> contexts, both for understanding a composition and for using it. >> >> dev npv(transfers, discount): >> '''Return the net present value of discounted transfers. >> >> transfers: finite iterable of amounts at constant intervals >> discount: fraction per interval >> ''' >> divisor = 1 + discount >> return sum(tranfer/divisor**time >> for time, transfer in enumerate(transfers)) >> >> Even if one could replace the def statement with >> npv = >> with parameter names omitted, it would be harder to understand. Using >> it would require the ability to infer argument types and order from the >> composed expression. 
>> >> I intentionally added a statement to calculate the common subexpression >> prior to the return. I believe it would have to put back in the return >> expression before converting. >> >> -- >> Terry Jan Reedy >> >> >> >> ------------------------------ >> >> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >> >> >I suppose you could write (root @ mean @ (map square)) (xs), >> >> > Actually, you can't. You could write (root @ mean @ partial(map, >> > square))(xs), but that's pretty clearly less readable than >> > root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >> > been my main argument: Without a full suite of higher-level operators >> > and related syntax, compose alone doesn't do you any good except for toy >> > examples. >> >> How about an operator for partial? >> >> root @ mean @ map $ square(xs) >> >> >> Actually I'd rather reuse the binary operators. (I'd be happy if they were >> just methods on bytes objects BTW.) >> >> compose(root, mean, map(square, xs)) >> >> root ^ mean ^ map & square (xs) >> >> root ^ mean ^ map & square ^ xs () >> >> Read this as... >> >> compose root, of mean, of map with square, of xs >> >> Or... >> >> apply(map(square, xs), mean, root) >> >> map & square | mean | root (xs) >> >> xs | map & square | mean | root () >> >> >> Read this as... >> >> apply xs, to map with square, to mean, to root >> >> >> These are kind of cool, but does it make python code easier to read? That >> seems like it may be subjective depending on the amount of programming >> experience someone has. >> >> Cheers, >> Ron >> >> >> >> ------------------------------ >> >> Hi, >> I had to answer some of these questions when I wrote Lawvere: >> https://pypi.python.org/pypi/lawvere >> >> First, there is two kind of composition: pipe and circle so I think a >> single operator like @ is a bit restrictive. >> I like "->" and "<-" >> >> Then, for function name and function to string I had to introduce function >> signature (a tuple). >> It provides a good tool for decomposition, introspection and comparison in >> respect with mathematic definition. >> >> Finally, for me composition make sense when you have typed functions >> otherwise it can easily become a mess and this make composition tied to >> multiple dispatch. >> >> I really hope composition will be introduced in python but I can't see how >> it be made without rethinking a good part of function definition. >> >> >> >> 2015-05-09 17:38 GMT+02:00 Ron Adam : >> >> > >> > >> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >> > >> >> >I suppose you could write (root @ mean @ (map square)) (xs), >> >>> >> >> >> > Actually, you can't. You could write (root @ mean @ partial(map, >> >> square))(xs), but that's pretty clearly less readable than >> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >> >> been my main argument: Without a full suite of higher-level operators >> >> and related syntax, compose alone doesn't do you any good except for toy >> >> examples. >> >> >> > >> > How about an operator for partial? >> > >> > root @ mean @ map $ square(xs) >> > >> > >> > Actually I'd rather reuse the binary operators. (I'd be happy if they >> > were just methods on bytes objects BTW.) >> > >> > compose(root, mean, map(square, xs)) >> > >> > root ^ mean ^ map & square (xs) >> > >> > root ^ mean ^ map & square ^ xs () >> > >> > Read this as... >> > >> > compose root, of mean, of map with square, of xs >> > >> > Or... 
>> > >> > apply(map(square, xs), mean, root) >> > >> > map & square | mean | root (xs) >> > >> > xs | map & square | mean | root () >> > >> > >> > Read this as... >> > >> > apply xs, to map with square, to mean, to root >> > >> > >> > These are kind of cool, but does it make python code easier to read? That >> > seems like it may be subjective depending on the amount of programming >> > experience someone has. >> > >> > Cheers, >> > Ron >> > >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Sun May 10 02:51:38 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Sun, 10 May 2015 03:51:38 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> Message-ID: <554EAB9A.2090501@aalto.fi> On 10.5.2015 2:28, Ivan Levkivskyi wrote: > functions. In other words I agree with Andrew that "elementwise" is a > good match with compose, and what we really need is to "pipe" things > that take a vector (or just an iterable) and return a vector (iterable). > > So that probably a good place (in a potential future) for compose > would be not functools but itertools. But indeed a good place to test > this would be Numpy. > Another way to deal with elementwise operations on iterables would be to make a small, mostly backwards compatible change in map: When map is called with just one argument, for instance map(square), it would return a function that takes iterables and maps them element-wise. Now it would be easier to use map in pipelines, for example: rms = sqrt @ mean @ map(square) or values->map(square)->mean->sqrt() Or if the change in map is not popular, there could be something like functools.mapper(func) that does that. Or even something more crazy, like square.map(seq), so that square.map could be used in pipelines. -- Koos -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun May 10 04:56:30 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 12:56:30 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <554E5CC9.3010406@aalto.fi> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> Message-ID: <20150510025630.GC5663@ando.pearwood.info> On Sat, May 09, 2015 at 10:15:21PM +0300, Koos Zevenhoven wrote: > > On 2015-05-09 21:16, Steven D'Aprano wrote: [...] > >It's a pity we can't match the shell syntax and write: > > > >spam(args)|eggs|cheese > > > >but that would have a completely different meaning. > > But it does not need to have a different meaning. It *should* have a different meaning. I want it to have a different meaning. Python is not the shell and spam(args) could be a factory function which itself returns a callable, e.g. partial, or a decorator. 
We cannot match the shell syntax because Python can do so much more than the shell. > You could in addition have: > > spam @ eggs @ cheese @ arg # equivalent to spam(eggs(cheese(arg))) > > arg | spam | eggs | cheese # equivalent to cheese(eggs(spam(arg))) > > Here, arg would thus be recognized as not a function. No. I think it is absolutely vital to distinguish by syntax the difference between composition and function application, and not try to "do what I mean". DWIM software has a bad history of doing the wrong thing. Every other kind of callable uses obj(arg) to call it: types, functions, methods, partial objects, etc. We shouldn't make function composition try to be different. If I write sqrt at 100 I should get a runtime error, not 10. I don't mind if the error is delayed until I actually try to call the composed object, but at some point I should get a TypeError that 100 is not callable. -- Steve From steve at pearwood.info Sun May 10 05:01:46 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 13:01:46 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> Message-ID: <20150510030145.GD5663@ando.pearwood.info> On Sat, May 09, 2015 at 10:41:24PM +0200, Gregory Salvan wrote: > pipeline operator may be confusing with bitwise operator. > In this case : > eggs = arg | spam | cheese > > Is eggs a composed function or string of bits ? Or a set? I think it is okay to overload operators and give them different meanings: z = x + y Is z a number, a string, a list, a tuple? Something else? In practice, we rely on sensible names or context to understand overloaded operators, if you see foo = search | grep | log | process it = (foo(x) for x in data) run(it) it should be fairly obvious from context that foo is not a set or string of bits :-) -- Steve From ron3200 at gmail.com Sun May 10 05:08:32 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 09 May 2015 23:08:32 -0400 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> Message-ID: On 05/09/2015 06:45 PM, Andrew Barnert via Python-ideas wrote: > On May 9, 2015, at 08:38, Ron Adam wrote: >> > >> > >> > >> >On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >>>>> >>> >I suppose you could write (root @ mean @ (map square)) (xs), >> > >>> >>Actually, you can't. You could write (root @ mean @ partial(map, >>> >>square))(xs), but that's pretty clearly less readable than >>> >>root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >>> >>been my main argument: Without a full suite of higher-level operators >>> >>and related syntax, compose alone doesn't do you any good except for toy >>> >>examples. >> > >> >How about an operator for partial? >> > >> > root @ mean @ map $ square(xs) > I'm pretty sure that anyone who sees that and doesn't interpret it as > meaningless nonsense is going to interpret it as a variation on Haskell and > get the wrong intuition. 
Yes, I agree that is the problems with it. > But, more importantly, this doesn't work. Your square(xs) isn't going > to evaluate to a function, but to a whatever falling square on xs returns. > (Which is presumably a TypeError, or you wouldn't be looking to map in the > first place). And, even if that did work, you're not actually composing a > function here anyway; your @ is just a call operator, which we already have > in Python, spelled with parens. This is following the patterns being discussed in the thread. (or at least an attempt to do so.) The @ and $ above would bind more tightly than the (). Like the doc "." does for method calls. But the evaluation is from left to right at call time. The calling part does not need to be done at the same times the rest is done. Or at least that is what I got from the conversation. f = root @ mean @ map & square result = f(xs) The other examples would work the same. >> >Actually I'd rather reuse the binary operators. (I'd be happy if they were just methods on bytes objects BTW.) >> > >> > compose(root, mean, map(square, xs)) > Now you're not calling square(xs), but you are calling map(square, xs), > which is going to return an iterable of squares, not a function; again, > you're not composing a function object at all. Yes, this is what directly calling the functions to do the same thing would look like. Except without returning a composed function. > And think about how you'd actually write this correctly. You need to > either use lambda (which defeats the entire purpose of compose), or partial > (which works, but is clumsy and ugly enough without an operator or > syntactic sugar that people rarely use it). The advantage of the syntax is that it is a "potentially" (a matter of opinion) alternative to using lambda. And apparently there are a few here who think doing it with lambda's or other means is less than ideal. Personally I'm not convinced yet either. Cheers, Ron From steve at pearwood.info Sun May 10 05:14:56 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 13:14:56 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> Message-ID: <20150510031455.GE5663@ando.pearwood.info> On Sun, May 10, 2015 at 12:03:24AM +0200, Gregory Salvan wrote: > Nobody convinced by arrow operator ? > > like: arg -> spam -> eggs -> cheese > or cheese <- eggs <- spam <- arg Absolutely not! If we were designing a new language from scratch, I might consider arrow operators. I think that they are cute. But this proposal is going to be hard enough to get approval using *existing* operators, | __or__ and @ __mat_mul__ (if I remember the dunder methods correctly). To convince people that we should support function composition as a built-in feature, using NEW operators that will need the parser changed to recognise, and new dunder methods, well, that will be virtually impossible. numpy is one of the biggest and most important user bases for Python, and it took them something like ten years and multiple failed attempts to get enough support for adding the @ operator. You *might* just have a chance for a -> right arrow operator, just barely, but the left arrow <- operator is, I'm pretty sure, doomed to failure. 
The problem is that the parser would need to distinguish these two cases: f<-x # f left-arrow x f<-x # f less than minus x and I don't think that is possible with Python's parser. -- Steve From steve at pearwood.info Sun May 10 05:20:16 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 13:20:16 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <554EAB9A.2090501@aalto.fi> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi> Message-ID: <20150510032016.GF5663@ando.pearwood.info> On Sun, May 10, 2015 at 03:51:38AM +0300, Koos Zevenhoven wrote: > Another way to deal with elementwise operations on iterables would be to > make a small, mostly backwards compatible change in map: > > When map is called with just one argument, for instance map(square), it > would return a function that takes iterables and maps them element-wise. > > Now it would be easier to use map in pipelines, for example: > > rms = sqrt @ mean @ map(square) Or just use a tiny helper function: def vectorise(func): return partial(map, func) rms = sqrt @ mean @ vectorise(square) -- Steve From larocca at abiresearch.com Sun May 10 06:58:29 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Sun, 10 May 2015 04:58:29 +0000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <20150510032016.GF5663@ando.pearwood.info> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi>,<20150510032016.GF5663@ando.pearwood.info> Message-ID: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> (Newcomer here.) I use function composition pretty extensively. I've found it to be incredibly powerful, but can lead to bad practices. Certain other drawbacks are there as well, like unreadable tracebacks. But in many cases there are real benefits. And for data pipelines where you want to avoid state and mutation it works well. The fn and pymonad modules implement infix composition functions through overloading but I've found this to be unworkable. For me, the ideal infix operator would simply be a space, with the composition wrapped in parentheses. So e.g. >>> (list str sorted)(range(10)) [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ',', ',', ',', ',', ',', ',', ',', ',', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '[', ']'] I might be overlooking something, but it seems to me this would work with existing syntax and semantics and wouldn't conflict with anything else like operator overloading would. The only other place non-indentation level spaces are significant is with keywords which can't be re-assigned. So e.g. (yield from gen()) wouldn't be parsed as 3 functions, and (def func) would raise SyntaxError. Here's the composition function I'm working with, stripped of the little debugging helpers: ``` def compose(*fns): def compose_(*x): fn, *fns = fns value = fn(*x) if fns: return compose(*fns)(value) else: return value return compose_ O=compose ``` I haven't had any issues with the recursion. The `O` alias rubs me the wrong way but seemed to make sense at the time. The thought was that it should look like an operator because it acts like one. So the use looks like >>> O(fn1, fn2, fn3, ...)('string to be piped') The problem for composition is essentially argument passing and has to do with the convenience of *args, **kwargs. 
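To make that concrete, here is a minimal sketch of the problem (a hypothetical two-stage pipeline; it carries its own tiny left-to-right pipe helper, partly because the compose above, as pasted, also trips an UnboundLocalError on `fn, *fns = fns`, which makes `fns` local before it is read):

from functools import reduce

def pipe(*fns):
    # left-to-right composition: pipe(f, g)(x) == g(f(x))
    return lambda x: reduce(lambda value, fn: fn(value), fns, x)

def add(a, b):
    return a + b

# divmod hands back a tuple, but every later stage receives it as a single
# value, so the two-argument function fails:
try:
    pipe(lambda x: divmod(x, 3), add)(10)
except TypeError as exc:
    print(exc)   # add() missing 1 required positional argument: 'b'

The currying and `star` helpers described next are ways around exactly this.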
The way to make composition work predictably is to curry the functions yourself, wrapping the arguments you expect to get with nested closures, then repairing the __name__ etc with functools.wraps or update_wrapper in the usual way. This looks much nicer and almost natural when you write it with lambdas, e.g. >>> getitem = lambda item: lambda container: container[item] (Apologies for having named that lambda there...) The other way to manage passing values from one function to the next is to define a function like def star(x): return lambda fn: fn(*x) Then if you get a list at one point in the pipeline and your function takes *args, you can decorate the function and call it like >>> star(getattr)((getattr, '__name__')) 'getattr' I've run into problems using the @curried decorators from the fn and pymonad modules because they don't how to handle *args, i.e. when to stop collecting arguments and finally make the function call. If you want to have the composition order reversed you could decorate the definition with ``` def flip(f): def flip_(*x): f(*reversed(x)) return flip_ ``` Once we have composition we can write partials for `map`, `filter`, and `reduce`, but with a small twist: make them variadic in the first argument and pass the arguments to compose: def fmap(*fn): def fmap_(x): return list(map(compose(*fn),x)) return fmap_ def ffilter(fn): def ffilter_(xs): return list(filter(fn, xs)) return ffilter_ def freduce(fn): def _freduce(xs): return reduce(fn, xs) return _freduce def Fmap(*fns): def Fmap_(x): return list(map(lambda fn:fn(x), fns)) return Fmap_ The `Fmap` function seemed like some sort of "conjugate" to `fmap` so I tried to give it name suggesting this (again, at the expense of abusing naming conventions). Instead of mapping a function over a iterable like `fmap`, `Fmap` applies a each given function to a value. So >>> Fmap(add(1), sub(1))(1) [2, 0] I've called them `fmap`, `ffilter`, and `freduce` but don't much like these names as they imply they might be the same as Haskell's `fmap`, and they're not. And there's no way to make them anything like Haskell as far as I can tell and they shouldn't be. If these implement a "paradigm" it's not purely functional but tacit/concatenative. It made sense to compose the passed arguments because there's no reason to pass anything else to `fmap` in the first call. So sequential calls to (the return value of) `fmap` inside a pipeline, like >>> O(mul(10), ... fmap(add(1)), ... fmap(mul(2)) ... )([1]) [4, 4, 4, 4, 4, 4, 4, 4, 4, 4] can instead be written like >>> O(mul(10), ... fmap(add(1), ... mul(2)) ... )([1]) [4, 4, 4, 4, 4, 4, 4, 4, 4, 4] It also makes it easier to work at different levels inside nested structures. In these heavily nested cases the composition pipeline even begins to resemble the data structure passing through, which makes sense. As another example, following is part of a pipeline that takes strings of bullet-separated strings of "key:value" pairs and converts each one to a dictionary, then folds the result together: >>> d = [' foo00 : bar00 ? foo01 : bar01 ', ... ' foo10 : bar10 ? foo11 : bar11 ', ... ' foo20 : bar10 ? foo21 : bar21 ',] >>> dict_foldl = freduce(lambda d1, d2: dict(d1, **d2)) >>> strip = lambda x: lambda s: s.strip(x) >>> split = lambda x: lambda s: s.split(x) >>> f = O(fmap(strip(' '), ... split('?'), ... fmap(split(':'), ... strip(' '), ... tuple), ... tuple, ... dict), ... 
dict_foldl) >>> f(d) {'foo00': 'bar00', 'foo01': 'bar01', 'foo10': 'bar10', 'foo11': 'bar11', 'foo20': 'bar10', 'foo21': 'bar21'} The combination of `compose`, `fmap`, and `Fmap` can be amazingly powerful for doing lots of work in a neat way while keeping the focus on the pipeline itself and not the individual values passing through. The other thing is that this opens the door to a full "algebra" of maps which is kind of insane: def mapeach(*fns): def mapeach_(*xs): return list(map(lambda fn, *x: fn(*x), fns, *xs)) return mapeach_ def product_map(fns): return lambda xs: list(map(lambda x: map(lambda fn: fn(x), fns), xs)) def smap(*fns): "star map" return lambda xs: list(map(O(*fns),*xs)) def pmap(*fns): return lambda *xs: list(map(lambda *x:list(map(lambda fn:fn(*x),fns)),*xs)) def matrix_map(*_fns): def matrix_map_(*_xs): return list(map(lambda fns, xs: list(map(lambda fn, x: fmap(fn)(x), fns, xs)), _fns, _xs)) return matrix_map_ def mapcat(*fn): "clojure-inspired?" return compose(fmap(*fn), freduce(list.__add__)) def filtercat(*fn): return compose(ffilter(*fn), freduce(list.__add__)) I rarely use any of these of these. They grew out of an attempt to tease out some hidden structure behind the combination of `map` and star packing/unpacking. I do think there's something there but the names get in the way--it would be better to find a way to define a function that takes a specification of the structures of functions and values and knows what to do, e.g. something like >>> from types import FunctionType >>> fn = FunctionType >>> # then the desired/imaginary version of map... >>> _map(fn, [int])(add(1))(range(5)) # sort of like `fmap` [1,2,3,4,5] >>> _map([fn], [int])((add(x) for x in range(5)))(range(5)) # sort of like `mapeach` [0,2,4,6,8] >>> _map([[fn]], [[int]])(((add(x) for x in range(5))*10))((list(range(5)))*10) # sort of like `matrix_map` [[[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8], [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]] In most cases the first argument would just be `fn`, but it would be *really* nice to be able to do something like >>> map(fn, [[int], [[int],[[str],[str]]]]) where all you need to do is give the schema and indicate which values to apply the function to. Giving the type would be an added measure, but passing `type` in the schema for unknowns should work just as well. ________________________________________ From: Python-ideas on behalf of Steven D'Aprano Sent: Saturday, May 09, 2015 11:20 PM To: python-ideas at python.org Subject: Re: [Python-ideas] Function composition (was no subject) On Sun, May 10, 2015 at 03:51:38AM +0300, Koos Zevenhoven wrote: > Another way to deal with elementwise operations on iterables would be to > make a small, mostly backwards compatible change in map: > > When map is called with just one argument, for instance map(square), it > would return a function that takes iterables and maps them element-wise. 
> > Now it would be easier to use map in pipelines, for example: > > rms = sqrt @ mean @ map(square) Or just use a tiny helper function: def vectorise(func): return partial(map, func) rms = sqrt @ mean @ vectorise(square) -- Steve _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Sun May 10 07:24:00 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 May 2015 22:24:00 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> Message-ID: On May 9, 2015, at 20:08, Ron Adam wrote: > >> On 05/09/2015 06:45 PM, Andrew Barnert via Python-ideas wrote: >>> On May 9, 2015, at 08:38, Ron Adam wrote: >>> > >>> > >>> > >>> >On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >>>>>> >>> >I suppose you could write (root @ mean @ (map square)) (xs), >>> > >>>> >>Actually, you can't. You could write (root @ mean @ partial(map, >>>> >>square))(xs), but that's pretty clearly less readable than >>>> >>root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >>>> >>been my main argument: Without a full suite of higher-level operators >>>> >>and related syntax, compose alone doesn't do you any good except for toy >>>> >>examples. >>> > >>> >How about an operator for partial? >>> > >>> > root @ mean @ map $ square(xs) > >> I'm pretty sure that anyone who sees that and doesn't interpret it as >> meaningless nonsense is going to interpret it as a variation on Haskell and >> get the wrong intuition. > > Yes, I agree that is the problems with it. > >> But, more importantly, this doesn't work. Your square(xs) isn't going >> to evaluate to a function, but to a whatever falling square on xs returns. >> (Which is presumably a TypeError, or you wouldn't be looking to map in the >> first place). And, even if that did work, you're not actually composing a >> function here anyway; your @ is just a call operator, which we already have >> in Python, spelled with parens. > > This is following the patterns being discussed in the thread. (or at least an attempt to do so.) > > The @ and $ above would bind more tightly than the (). Like the doc "." does for method calls. @ can't bind more tightly than (). The operator already exists (that's the whole reason people are suggesting it for compose), and it has the same precedence as *. And even if you could change that, you wouldn't want to. Just as 2 * f(a) calls f on a and then multiplies by 2, b @ f(a) will call f on a and then matrix-multiply it by b; it would be very confusing if it matrix-multiplied b and f and then called the result on a. I think I know what you're going for here. Half the reason Haskell has an apply operator even though adjacency already means apply is so it can have different precedence from adjacency. And if you don't like that, you can define your own infix operator with a different string of symbols and a different precedence or even associativity but the same body. That allows you to play all kinds of neat tricks like what you're trying to, where you can write almost anything without parentheses and it means exactly what it looks like. 
Of course you can just as easily write something that means something completely different from what it looks like... But you have to actually work the operators through carefully, not just wave your hands and say "something like this"; when "this" actually doesn't mean what you want it to, you need to define a new operator that does. And, while allowing users to define enough operators to eliminate all the parens and all the lambdas works great for Haskell, I don't think it's a road that Python should follow. > But the evaluation is from left to right at call time. The calling part does not need to be done at the same times the rest is done. Or at least that is what I got from the conversation. > > f = root @ mean @ map & square > result = f(xs) But that means (root @ mean @ map) & square. Assuming you intended function.__and__ to mean partial, you have to write root @ mean @ (map & square), or create a new operator that has the precedence you want. > The other examples would work the same. Exactly: they don't work, either because you've got the precedence wrong, or because you've got an explicit function call rather than something that defines or references a function, and it doesn't make sense to compose that (well, except when the explicit call is to a higher-order function that returns a function, but that wasn't true of any of the examples). >>> >Actually I'd rather reuse the binary operators. (I'd be happy if they were just methods on bytes objects BTW.) >>> > >>> > compose(root, mean, map(square, xs)) > >> Now you're not calling square(xs), but you are calling map(square, xs), >> which is going to return an iterable of squares, not a function; again, >> you're not composing a function object at all. > > Yes, this is what directly calling the functions to do the same thing would look like. Except without returning a composed function. I don't understand what you mean. The same thing as what? Neither directly calling the functions, nor your proposed thing, returns a composed function (because, again, the last argument is not a function, it's an iterator returned by a function that you called directly). >> And think about how you'd actually write this correctly. You need to >> either use lambda (which defeats the entire purpose of compose), or partial >> (which works, but is clumsy and ugly enough without an operator or >> syntactic sugar that people rarely use it). > > The advantage of the syntax is that it is a "potentially" (a matter of opinion) alternative to using lambda. Not your syntax. All of your examples that do anything just call a function immediately, rather than defining a function to be called later, so they can't replace uses of lambda. For example, your compose(root, mean, map(square, xs)) doesn't define a new function anywhere, so no part of it can replace a lambda. The earlier examples actually do attempt to replace uses of lambda. Stephen's compose(root, mean, map square) returns a function. The problem with his suggestion is that map square isn't valid Python syntax--and if it were, that new syntax would be the thing that replaces a need for lambda, not the compose function. Which is obvious if you look at how you'd write that in valid Python syntax: compose(root, mean, lambda xs: map(square, xs)). I've used the compose(...) form instead of the @ operator form, but the result is exactly the same either way. > And apparently there are a few here who think doing it with lambda's or other means is less than ideal. 
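For anyone who wants to see that grouping rather than take it on faith, a throwaway tracer class (a made-up sketch, not anything proposed in the thread) makes the precedence visible:

class Op:
    # Each operator returns a new Op whose name records the grouping,
    # so printing the result shows how Python parsed the expression.
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __matmul__(self, other):
        return Op("({0} @ {1})".format(self, other))
    def __and__(self, other):
        return Op("({0} & {1})".format(self, other))
    def __call__(self, *args):
        return Op("{0}({1})".format(self, ", ".join(map(repr, args))))

# map_ avoids shadowing the builtin map used in __call__ above
root, mean, map_, square = map(Op, ["root", "mean", "map", "square"])

print(root @ mean @ map_ & square)   # (((root @ mean) @ map) & square)
print(root @ mean @ map_(square))    # ((root @ mean) @ map(square))

The & groups after both @ applications because & binds more loosely than @, and the call map_(square) is evaluated before either @ ever runs.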
I agree with them--but I don't think adding compose to Python, either as a stdlib function or as an operator--actually solves that problem. If we had auto-curried functions and adjacency as apply and a suite of HOFs like flip and custom infix operators and operator sectioning and so on, then the lack of compose would be a problem that forced people to write unnecessary lambda expressions (although still not a huge problem, since it's so trivial to write). But with none of those things, adding compose doesn't actually help you avoid lambdas, except in a few contrived cases. (And maybe in NumPy-like array processing.) > Personally I'm not convinced yet either. > > Cheers, > Ron > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Sun May 10 08:01:11 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 16:01:11 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> Message-ID: <20150510060111.GH5663@ando.pearwood.info> On Sun, May 10, 2015 at 04:58:29AM +0000, Douglas La Rocca wrote: > (Newcomer here.) > > I use function composition pretty extensively. I've found it to be > incredibly powerful, but can lead to bad practices. Certain other > drawbacks are there as well, like unreadable tracebacks. But in many > cases there are real benefits. And for data pipelines where you want > to avoid state and mutation it works well. Thanks for the well-thought out and very detailed post! The concrete experience you bring to this discussion is a welcome change from all the theoretical "wouldn't it be nice (or awful) if ..." from many of us, and I include myself. The fact that you have extensive experience with using function composition in practice, and can point out the benefits and disadvantages, is great. -- Steve From larocca at abiresearch.com Sun May 10 08:06:37 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Sun, 10 May 2015 06:06:37 +0000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <20150510060111.GH5663@ando.pearwood.info> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>, <20150510060111.GH5663@ando.pearwood.info> Message-ID: <0af83e9ead994e639aa41a2f5e678a61@swordfish.abiresearch.com> Thanks! Not sure what took me so long to get on the python lists, but I finally did and to my excitement you were talking about my favorite topic! 
--- For replacing the need to write `lambda x: x...` inside compositions *in a limited set of cases*, you could use a sort of "doppelganger" type/metaclass: class ThisType(type): def __getattr__(cls, attr): def attribute(*args, **kwargs): def method(this): this_attr = getattr(this, attr) if callable(this_attr): return this_attr(*args, **kwargs) else: return this_attr return method return attribute def __call__(cls, *args, **kwargs): def decorator(fn): return fn(*args, **kwargs) return decorator def __getitem__(cls, item): return lambda x: x[item] class this(metaclass=ThisType): pass Basically, it records whatever is done to it, then returns a function that takes a value and does those things to the value. So any call, __getattr__ with arguments, and __getitem__ you'd want to do with a value mid-pipe would be staged or set up by doing them to `this`. So rather than writing >>> compose(lambda s: s.strip('<>'), lambda s: s.lower())('') you can write >>> compose(this.strip('<>'), this.lower())('') 'html' or >>> compose(float, this.__str__)('1') '1.0' But there are two caveats: Property attributes would need to be *called*, which feels weird when you already know an API well, so e.g. >>> from lxml import html >>> html.fromstring('bold text').text 'bold text' >>> compose(html.fromstring, this.text())('bold text') 'bold text' It's also a bit weird because attributes that return functions/methods/callables *aren't* called (like above with `this.__str__`: `__str__` is a method of `float`). Second caveat is that nothing past the __getitem__ and __getattr__ will work, so e.g. >>> from pandas import DataFrame >>> df = DataFrame([1]*2, columns=['A','B']) A B 0 1 1 1 1 1 >>> compose(this.applymap(str), this['A'])(df) 0 1 1 1 Name: A, dtype: object >>> compose(this.applymap(str), this['A'], this.shape())(df) (2,) ...but... >>> compose(this.applymap(str), this['A'].shape)(df) AttributeError: 'function' object has no attribute 'shape' ________________________________________ From: Python-ideas on behalf of Steven D'Aprano Sent: Sunday, May 10, 2015 2:01 AM To: python-ideas at python.org Subject: Re: [Python-ideas] Function composition (was no subject) On Sun, May 10, 2015 at 04:58:29AM +0000, Douglas La Rocca wrote: > (Newcomer here.) > > I use function composition pretty extensively. I've found it to be > incredibly powerful, but can lead to bad practices. Certain other > drawbacks are there as well, like unreadable tracebacks. But in many > cases there are real benefits. And for data pipelines where you want > to avoid state and mutation it works well. Thanks for the well-thought out and very detailed post! The concrete experience you bring to this discussion is a welcome change from all the theoretical "wouldn't it be nice (or awful) if ..." from many of us, and I include myself. The fact that you have extensive experience with using function composition in practice, and can point out the benefits and disadvantages, is great. 
-- Steve _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ From levkivskyi at gmail.com Sun May 10 09:13:51 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sun, 10 May 2015 09:13:51 +0200 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com> References: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com> Message-ID: On 10 May 2015 at 02:05, Andrew Barnert wrote: > On May 9, 2015, at 16:28, Ivan Levkivskyi wrote: > > I was thinking about recent ideas discussed here. I also returned back to > origins of my initial idea. The point is that it came from Numpy, I use > Numpy arrays everyday, and typically I do exactly something like > root(mean(square(data))). > > Now I am thinking: what is actually a matrix? It is something that takes a > vector and returns a vector. But on the other hand the same actually do > elementwise functions. It does not really matter, what we do with a vector: > transform by a product of matrices or by composition of functions. In other > words I agree with Andrew that "elementwise" is a good match with compose, > and what we really need is to "pipe" things that take a vector (or just an > iterable) and return a vector (iterable). > > So that probably a good place (in a potential future) for compose would be > not functools but itertools. But indeed a good place to test this would be > Numpy. > > > Itertools is an interesting idea. > > Anyway, assuming NumPy isn't going to add this in the near future (has > anyone even brought it up on the NumPy list, or only here?), it wouldn't be > that hard to write a (maybe inefficient but working) @composable wrapper > and wrap all the relevant callables from NumPy or from itertools, upload it > to PyPI, and let people start coming up with good examples. If it's later > worth direct support in NumPy and/or Python (for simplicity or > performance), the module will still be useful for backward compatibility. > > This is a good step-by-step approach. This is what I would try. > An additional comment: it is indeed good to have both @ and | for compose > and rcompose. > Side note, one can actually overload __rmatmul__ on arrays as well so that > you can write > > root @ mean @ square @ data > > > But this doesn't need to overload it on arrays, only on the utuncs, right? > > Unless you're suggesting that one of these operations could be a matrix as > easily as a function, and NumPy users often won't have to care which it is? > > Exactly, this is what I want. Note that in such approach you have no parentheses at all. > > Moreover, one can overload __or__ on arrays, so that one can write > > data | square | mean | root > > even with ordinary functions (not Numpy's ufuncs or composable) . > > > That's an interesting point. But I think this will be a bit confusing, > because now it _does_ matter whether square is a matrix or a > function--you'll get elementwise bitwise or instead of application. (And > really, this is the whole reason for @ in the first place--we needed an > operator that never means elementwise.) > > Also, this doesn't let you actually compose functions--if you want square > | mean | root to be a function, square has to have a __or__ operator. > > This is true. The | is more limited because of its current semantics. 
The fact that | operator already has a widely used semantics is also why I would choose @ if I would need to choose only one: @ or | > These examples are actually "flat is better than nested" in the extreme > form. > > Anyway, they (Numpy) are going to implement the @ operator for arrays, may > be it would be a good idea to check that if something on the left from me > (array) is not an array but a callable then apply it elementwise. > > Concerning the multi-argument functions, I don't like $ symbol, don't know > why. It seems really unintuitive why it means partial application. > One can autocurry composable functions and apply same rules that Numpy > uses for ufuncs. > More precisely, if I write > > add(data1, data2) > > with arrays it applies add pairwise. But if I write > > add(data1, 42) > > it is also fine, it simply adds 42 to every element. With autocurrying one > could write > > root @ mean @ add(data) @ square @ data2 > > or > > root @ mean @ square @ add(42) @ data > > However, as I see it now it is not very readable, so that may be the best > choise is to reserve @ and | for "piping" iterables through transformers > that take one argument. In other words it should be left to user to make > add(42) of an appropriate type. It is the same logic as for decorators, if > I write > > @modify(arg) > def func(x): > return None > > I must care that modify(arg) evaluates to something that takes one > callable and returns a callable. > > > On May 9, 2015, at 01:36, Stephen J. Turnbull wrote: >> > >> > Andrew Barnert writes: >> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull >> wrote: >> >>> >> >>> Koos Zevenhoven writes: >> >>> >> >>>> As a random example, (root @ mean @ square)(x) would produce the >> right >> >>>> order for rms when using [2]. >> >>> >> >>> Hardly interesting. :-) The result is an exception, as root and >> square >> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar. >> >> >> >> Unless you're using an elementwise square and an array-to-scalar >> >> mean, like the ones in NumPy, >> > >> > Erm, why would square be elementwise and root not? I would suppose >> > that everything is element-wise in Numpy (not a user yet). >> >> Most functions in NumPy are elementwise when applied to arrays, but can >> also be applied to scalars. So, square is elementwise because it's called >> on an array, root is scalar because it's called on a scalar. (In fact, root >> could also be elementwise--aggregating functions like mean can be applied >> across just one axis of a 2D or higher array, reducing it by one dimension, >> if you want.) >> >> Before you try it, this sounds like a complicated nightmare that can't >> possibly work in practice. But play with it for just a few minutes and it's >> completely natural. (Except for a few cases where you want some array-wide >> but not element-wise operation, most famously matrix multiplication, which >> is why we now have the @ operator to play with.) >> >> >> in which case it works perfectly well... >> > >> > But that's an aspect of my point (evidently, obscure). Conceptually, >> > as taught in junior high school or so, root and square are scalar-to- >> > scalar. If you are working in a context such as Numpy where it makes >> > sense to assume they are element-wise and thus composable, the context >> > should provide the compose operator(s). 
>> >> I was actually thinking on these lines: what if @ didn't work on >> types.FunctionType, but did work on numpy.ufunc (the name for the >> "universal function" type that knows how to broadcast across arrays but >> also work on scalars)? That's something NumPy could implement without any >> help from the core language. (Methods are a minor problem here, but it's >> obvious how to solve them, so I won't get into it.) And if it turned out to >> be useful all over the place in NumPy, that might turn up some great uses >> for the idiomatic non-NumPy Python, or it might show that, like elementwise >> addition, it's really more a part of NumPy than of Python. >> >> But of course that's more of a proposal for NumPy than for Python. >> >> > Without that context, Koos's >> > example looks like a TypeError. >> >> >> But Koos's example, even if it was possibly inadvertent, shows that >> >> I may be wrong about that. Maybe compose together with element-wise >> >> operators actually _is_ sufficient for something beyond toy >> >> examples. >> > >> > Of course it is! I didn't really think there was any doubt >> > about that. >> >> I think there was, and still is. People keep coming up with abstract toy >> examples, but as soon as someone tries to give a good real example, it only >> makes sense with NumPy (Koos's) or with some syntax that Python doesn't >> have (yours), because to write them with actual Python functions would >> actually be ugly and verbose (my version of yours). >> >> I don't think that's a coincidence. You didn't write "map square" because >> you don't know how to think in Python, but because using compose profitably >> inherently implies not thinking in Python. (Except, maybe, in the case of >> NumPy... which is a different idiom.) Maybe someone has a bunch of obvious >> good use cases for compose that don't also require other functions, >> operators, or syntax we don't have, but so far, nobody's mentioned one. >> >> ------------------------------ >> >> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote: >> >> > I think there was, and still is. People keep coming up with abstract >> toy examples, but as soon as someone tries to give a good real example, it >> only makes sense with NumPy (Koos's) or with some syntax that Python >> doesn't have (yours), because to write them with actual Python functions >> would actually be ugly and verbose (my version of yours). >> > >> > I don't think that's a coincidence. You didn't write "map square" >> because you don't know how to think in Python, but because using compose >> profitably inherently implies not thinking in Python. (Except, maybe, in >> the case of NumPy... which is a different idiom.) Maybe someone has a bunch >> of obvious good use cases for compose that don't also require other >> functions, operators, or syntax we don't have, but so far, nobody's >> mentioned one. >> >> I agree that @ is most likely to be usefull in numpy's restricted context. >> >> A composition operator is usually defined by application: f at g(x) is >> defined as f(g(x)). (I sure there are also axiomatic treatments.) It >> is an optional syntactic abbreviation. It is most useful in a context >> where there is one set of data objects, such as the real numbers, or one >> set + arrays (vectors) defined on the one set; where all function are >> univariate (or possible multivariate, but that can can be transformed to >> univariate on vectors); *and* where parameter names are dummies like >> 'x', 'y', 'z', or '_'. >> >> The last point is important. 
Abbreviating h(x) = f(g(x)) with h = f @ g >> does not lose any information as 'x' is basically a placeholder (so get >> rid of it). But parameter names are important in most practical >> contexts, both for understanding a composition and for using it. >> >> dev npv(transfers, discount): >> '''Return the net present value of discounted transfers. >> >> transfers: finite iterable of amounts at constant intervals >> discount: fraction per interval >> ''' >> divisor = 1 + discount >> return sum(tranfer/divisor**time >> for time, transfer in enumerate(transfers)) >> >> Even if one could replace the def statement with >> npv = >> with parameter names omitted, it would be harder to understand. Using >> it would require the ability to infer argument types and order from the >> composed expression. >> >> I intentionally added a statement to calculate the common subexpression >> prior to the return. I believe it would have to put back in the return >> expression before converting. >> >> -- >> Terry Jan Reedy >> >> >> >> ------------------------------ >> >> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >> >> >I suppose you could write (root @ mean @ (map square)) (xs), >> >> > Actually, you can't. You could write (root @ mean @ partial(map, >> > square))(xs), but that's pretty clearly less readable than >> > root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >> > been my main argument: Without a full suite of higher-level operators >> > and related syntax, compose alone doesn't do you any good except for toy >> > examples. >> >> How about an operator for partial? >> >> root @ mean @ map $ square(xs) >> >> >> Actually I'd rather reuse the binary operators. (I'd be happy if they >> were >> just methods on bytes objects BTW.) >> >> compose(root, mean, map(square, xs)) >> >> root ^ mean ^ map & square (xs) >> >> root ^ mean ^ map & square ^ xs () >> >> Read this as... >> >> compose root, of mean, of map with square, of xs >> >> Or... >> >> apply(map(square, xs), mean, root) >> >> map & square | mean | root (xs) >> >> xs | map & square | mean | root () >> >> >> Read this as... >> >> apply xs, to map with square, to mean, to root >> >> >> These are kind of cool, but does it make python code easier to read? That >> seems like it may be subjective depending on the amount of programming >> experience someone has. >> >> Cheers, >> Ron >> >> >> >> ------------------------------ >> >> Hi, >> I had to answer some of these questions when I wrote Lawvere: >> https://pypi.python.org/pypi/lawvere >> >> First, there is two kind of composition: pipe and circle so I think a >> single operator like @ is a bit restrictive. >> I like "->" and "<-" >> >> Then, for function name and function to string I had to introduce function >> signature (a tuple). >> It provides a good tool for decomposition, introspection and comparison in >> respect with mathematic definition. >> >> Finally, for me composition make sense when you have typed functions >> otherwise it can easily become a mess and this make composition tied to >> multiple dispatch. >> >> I really hope composition will be introduced in python but I can't see how >> it be made without rethinking a good part of function definition. >> >> >> >> 2015-05-09 17:38 GMT+02:00 Ron Adam : >> >> > >> > >> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >> > >> >> >I suppose you could write (root @ mean @ (map square)) (xs), >> >>> >> >> >> > Actually, you can't. 
You could write (root @ mean @ partial(map, >> >> square))(xs), but that's pretty clearly less readable than >> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >> >> been my main argument: Without a full suite of higher-level operators >> >> and related syntax, compose alone doesn't do you any good except for >> toy >> >> examples. >> >> >> > >> > How about an operator for partial? >> > >> > root @ mean @ map $ square(xs) >> > >> > >> > Actually I'd rather reuse the binary operators. (I'd be happy if they >> > were just methods on bytes objects BTW.) >> > >> > compose(root, mean, map(square, xs)) >> > >> > root ^ mean ^ map & square (xs) >> > >> > root ^ mean ^ map & square ^ xs () >> > >> > Read this as... >> > >> > compose root, of mean, of map with square, of xs >> > >> > Or... >> > >> > apply(map(square, xs), mean, root) >> > >> > map & square | mean | root (xs) >> > >> > xs | map & square | mean | root () >> > >> > >> > Read this as... >> > >> > apply xs, to map with square, to mean, to root >> > >> > >> > These are kind of cool, but does it make python code easier to read? >> That >> > seems like it may be subjective depending on the amount of programming >> > experience someone has. >> > >> > Cheers, >> > Ron >> > >> > >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 10 10:18:02 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 10 May 2015 01:18:02 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com> Message-ID: <1DD4C041-7C97-4F4B-8240-9C31A88F55BD@yahoo.com> On May 10, 2015, at 00:13, Ivan Levkivskyi wrote: > >> On 10 May 2015 at 02:05, Andrew Barnert wrote: >>> On May 9, 2015, at 16:28, Ivan Levkivskyi wrote: >>> >>> I was thinking about recent ideas discussed here. I also returned back to origins of my initial idea. The point is that it came from Numpy, I use Numpy arrays everyday, and typically I do exactly something like root(mean(square(data))). >>> >>> Now I am thinking: what is actually a matrix? It is something that takes a vector and returns a vector. But on the other hand the same actually do elementwise functions. It does not really matter, what we do with a vector: transform by a product of matrices or by composition of functions. In other words I agree with Andrew that "elementwise" is a good match with compose, and what we really need is to "pipe" things that take a vector (or just an iterable) and return a vector (iterable). >>> >>> So that probably a good place (in a potential future) for compose would be not functools but itertools. But indeed a good place to test this would be Numpy. >> >> Itertools is an interesting idea. >> >> Anyway, assuming NumPy isn't going to add this in the near future (has anyone even brought it up on the NumPy list, or only here?), it wouldn't be that hard to write a (maybe inefficient but working) @composable wrapper and wrap all the relevant callables from NumPy or from itertools, upload it to PyPI, and let people start coming up with good examples. If it's later worth direct support in NumPy and/or Python (for simplicity or performance), the module will still be useful for backward compatibility. 
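To make that concrete, here is one minimal sketch of what such a wrapper could look like. The class name `composable` is made up for illustration (it is not an existing package), and a real version would need more care about introspection, error messages, and performance:

```
from functools import update_wrapper
import math

class composable:
    """Wrap a callable so that `f @ g` means "apply g, then f"."""
    def __init__(self, fn):
        self.fn = fn
        update_wrapper(self, fn)

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)

    def __matmul__(self, other):
        # composable @ any-callable
        return composable(lambda *a, **kw: self.fn(other(*a, **kw)))

    def __rmatmul__(self, other):
        # plain-callable @ composable
        return composable(lambda *a, **kw: other(self.fn(*a, **kw)))

# Example with ordinary callables rather than NumPy ufuncs:
@composable
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

square = composable(lambda xs: [x * x for x in xs])

rms = math.sqrt @ mean @ square
print(rms([3.0, 4.0]))   # sqrt(mean([9.0, 16.0])) == sqrt(12.5)
```

Wrapping the relevant NumPy or itertools callables the same way (e.g. `sqrt = composable(np.sqrt)`) should let the `root @ mean @ square` examples from earlier in the thread be written without any change to the core language.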
> > This is a good step-by-step approach. This is what I would try. > >>> An additional comment: it is indeed good to have both @ and | for compose and rcompose. >>> Side note, one can actually overload __rmatmul__ on arrays as well so that you can write >>> >>> root @ mean @ square @ data >> >> But this doesn't need to overload it on arrays, only on the utuncs, right? >> >> Unless you're suggesting that one of these operations could be a matrix as easily as a function, and NumPy users often won't have to care which it is? > > Exactly, this is what I want. Note that in such approach you have no parentheses at all. It's worth working up some practical examples here. Annoyingly, I actually had a perfect example a few years ago, but I can't find it. I'm sure you can imagine what it was. We built-in vector transforms implemented as functions, and a way for a user to input new transforms as matrices, and a way for the user to chain built-in and user-defined transforms. Under the covers, we had to wrap each user transform in a function just so they'd all be callables, which led to a couple of annoying debugging sessions and probably a performance hit. If we could compose them interchangeably, that might have avoided those problems. But if I can't find the code, it's hard to say for sure, so now I'm offering the same vague, untestable use cases that I was complaining about. :) > >>> >>> Moreover, one can overload __or__ on arrays, so that one can write >>> >>> data | square | mean | root >>> >>> even with ordinary functions (not Numpy's ufuncs or composable) . >> >> That's an interesting point. But I think this will be a bit confusing, because now it _does_ matter whether square is a matrix or a function--you'll get elementwise bitwise or instead of application. (And really, this is the whole reason for @ in the first place--we needed an operator that never means elementwise.) >> >> Also, this doesn't let you actually compose functions--if you want square | mean | root to be a function, square has to have a __or__ operator. > > This is true. The | is more limited because of its current semantics. The fact that | operator already has a widely used semantics is also why I would choose @ if I would need to choose only one: @ or | > >>> These examples are actually "flat is better than nested" in the extreme form. >>> >>> Anyway, they (Numpy) are going to implement the @ operator for arrays, may be it would be a good idea to check that if something on the left from me (array) is not an array but a callable then apply it elementwise. >>> >>> Concerning the multi-argument functions, I don't like $ symbol, don't know why. It seems really unintuitive why it means partial application. >>> One can autocurry composable functions and apply same rules that Numpy uses for ufuncs. >>> More precisely, if I write >>> >>> add(data1, data2) >>> >>> with arrays it applies add pairwise. But if I write >>> >>> add(data1, 42) >>> >>> it is also fine, it simply adds 42 to every element. With autocurrying one could write >>> >>> root @ mean @ add(data) @ square @ data2 >>> >>> or >>> >>> root @ mean @ square @ add(42) @ data >>> >>> However, as I see it now it is not very readable, so that may be the best choise is to reserve @ and | for "piping" iterables through transformers that take one argument. In other words it should be left to user to make add(42) of an appropriate type. 
It is the same logic as for decorators, if I write >>> >>> @modify(arg) >>> def func(x): >>> return None >>> >>> I must care that modify(arg) evaluates to something that takes one callable and returns a callable. >>> >>> >>>> On May 9, 2015, at 01:36, Stephen J. Turnbull wrote: >>>> > >>>> > Andrew Barnert writes: >>>> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull wrote: >>>> >>> >>>> >>> Koos Zevenhoven writes: >>>> >>> >>>> >>>> As a random example, (root @ mean @ square)(x) would produce the right >>>> >>>> order for rms when using [2]. >>>> >>> >>>> >>> Hardly interesting. :-) The result is an exception, as root and square >>>> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar. >>>> >> >>>> >> Unless you're using an elementwise square and an array-to-scalar >>>> >> mean, like the ones in NumPy, >>>> > >>>> > Erm, why would square be elementwise and root not? I would suppose >>>> > that everything is element-wise in Numpy (not a user yet). >>>> >>>> Most functions in NumPy are elementwise when applied to arrays, but can also be applied to scalars. So, square is elementwise because it's called on an array, root is scalar because it's called on a scalar. (In fact, root could also be elementwise--aggregating functions like mean can be applied across just one axis of a 2D or higher array, reducing it by one dimension, if you want.) >>>> >>>> Before you try it, this sounds like a complicated nightmare that can't possibly work in practice. But play with it for just a few minutes and it's completely natural. (Except for a few cases where you want some array-wide but not element-wise operation, most famously matrix multiplication, which is why we now have the @ operator to play with.) >>>> >>>> >> in which case it works perfectly well... >>>> > >>>> > But that's an aspect of my point (evidently, obscure). Conceptually, >>>> > as taught in junior high school or so, root and square are scalar-to- >>>> > scalar. If you are working in a context such as Numpy where it makes >>>> > sense to assume they are element-wise and thus composable, the context >>>> > should provide the compose operator(s). >>>> >>>> I was actually thinking on these lines: what if @ didn't work on types.FunctionType, but did work on numpy.ufunc (the name for the "universal function" type that knows how to broadcast across arrays but also work on scalars)? That's something NumPy could implement without any help from the core language. (Methods are a minor problem here, but it's obvious how to solve them, so I won't get into it.) And if it turned out to be useful all over the place in NumPy, that might turn up some great uses for the idiomatic non-NumPy Python, or it might show that, like elementwise addition, it's really more a part of NumPy than of Python. >>>> >>>> But of course that's more of a proposal for NumPy than for Python. >>>> >>>> > Without that context, Koos's >>>> > example looks like a TypeError. >>>> >>>> >> But Koos's example, even if it was possibly inadvertent, shows that >>>> >> I may be wrong about that. Maybe compose together with element-wise >>>> >> operators actually _is_ sufficient for something beyond toy >>>> >> examples. >>>> > >>>> > Of course it is! I didn't really think there was any doubt >>>> > about that. >>>> >>>> I think there was, and still is. 
People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). >>>> >>>> I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one. >>>> >>>> ------------------------------ >>>> >>>> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote: >>>> >>>> > I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). >>>> > >>>> > I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one. >>>> >>>> I agree that @ is most likely to be usefull in numpy's restricted context. >>>> >>>> A composition operator is usually defined by application: f at g(x) is >>>> defined as f(g(x)). (I sure there are also axiomatic treatments.) It >>>> is an optional syntactic abbreviation. It is most useful in a context >>>> where there is one set of data objects, such as the real numbers, or one >>>> set + arrays (vectors) defined on the one set; where all function are >>>> univariate (or possible multivariate, but that can can be transformed to >>>> univariate on vectors); *and* where parameter names are dummies like >>>> 'x', 'y', 'z', or '_'. >>>> >>>> The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g >>>> does not lose any information as 'x' is basically a placeholder (so get >>>> rid of it). But parameter names are important in most practical >>>> contexts, both for understanding a composition and for using it. >>>> >>>> dev npv(transfers, discount): >>>> '''Return the net present value of discounted transfers. >>>> >>>> transfers: finite iterable of amounts at constant intervals >>>> discount: fraction per interval >>>> ''' >>>> divisor = 1 + discount >>>> return sum(tranfer/divisor**time >>>> for time, transfer in enumerate(transfers)) >>>> >>>> Even if one could replace the def statement with >>>> npv = >>>> with parameter names omitted, it would be harder to understand. Using >>>> it would require the ability to infer argument types and order from the >>>> composed expression. >>>> >>>> I intentionally added a statement to calculate the common subexpression >>>> prior to the return. I believe it would have to put back in the return >>>> expression before converting. 
>>>> >>>> -- >>>> Terry Jan Reedy >>>> >>>> >>>> >>>> ------------------------------ >>>> >>>> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >>>> >> >I suppose you could write (root @ mean @ (map square)) (xs), >>>> >>>> > Actually, you can't. You could write (root @ mean @ partial(map, >>>> > square))(xs), but that's pretty clearly less readable than >>>> > root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >>>> > been my main argument: Without a full suite of higher-level operators >>>> > and related syntax, compose alone doesn't do you any good except for toy >>>> > examples. >>>> >>>> How about an operator for partial? >>>> >>>> root @ mean @ map $ square(xs) >>>> >>>> >>>> Actually I'd rather reuse the binary operators. (I'd be happy if they were >>>> just methods on bytes objects BTW.) >>>> >>>> compose(root, mean, map(square, xs)) >>>> >>>> root ^ mean ^ map & square (xs) >>>> >>>> root ^ mean ^ map & square ^ xs () >>>> >>>> Read this as... >>>> >>>> compose root, of mean, of map with square, of xs >>>> >>>> Or... >>>> >>>> apply(map(square, xs), mean, root) >>>> >>>> map & square | mean | root (xs) >>>> >>>> xs | map & square | mean | root () >>>> >>>> >>>> Read this as... >>>> >>>> apply xs, to map with square, to mean, to root >>>> >>>> >>>> These are kind of cool, but does it make python code easier to read? That >>>> seems like it may be subjective depending on the amount of programming >>>> experience someone has. >>>> >>>> Cheers, >>>> Ron >>>> >>>> >>>> >>>> ------------------------------ >>>> >>>> Hi, >>>> I had to answer some of these questions when I wrote Lawvere: >>>> https://pypi.python.org/pypi/lawvere >>>> >>>> First, there is two kind of composition: pipe and circle so I think a >>>> single operator like @ is a bit restrictive. >>>> I like "->" and "<-" >>>> >>>> Then, for function name and function to string I had to introduce function >>>> signature (a tuple). >>>> It provides a good tool for decomposition, introspection and comparison in >>>> respect with mathematic definition. >>>> >>>> Finally, for me composition make sense when you have typed functions >>>> otherwise it can easily become a mess and this make composition tied to >>>> multiple dispatch. >>>> >>>> I really hope composition will be introduced in python but I can't see how >>>> it be made without rethinking a good part of function definition. >>>> >>>> >>>> >>>> 2015-05-09 17:38 GMT+02:00 Ron Adam : >>>> >>>> > >>>> > >>>> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote: >>>> > >>>> >> >I suppose you could write (root @ mean @ (map square)) (xs), >>>> >>> >>>> >> >>>> > Actually, you can't. You could write (root @ mean @ partial(map, >>>> >> square))(xs), but that's pretty clearly less readable than >>>> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's >>>> >> been my main argument: Without a full suite of higher-level operators >>>> >> and related syntax, compose alone doesn't do you any good except for toy >>>> >> examples. >>>> >> >>>> > >>>> > How about an operator for partial? >>>> > >>>> > root @ mean @ map $ square(xs) >>>> > >>>> > >>>> > Actually I'd rather reuse the binary operators. (I'd be happy if they >>>> > were just methods on bytes objects BTW.) >>>> > >>>> > compose(root, mean, map(square, xs)) >>>> > >>>> > root ^ mean ^ map & square (xs) >>>> > >>>> > root ^ mean ^ map & square ^ xs () >>>> > >>>> > Read this as... >>>> > >>>> > compose root, of mean, of map with square, of xs >>>> > >>>> > Or... 
>>>> > >>>> > apply(map(square, xs), mean, root) >>>> > >>>> > map & square | mean | root (xs) >>>> > >>>> > xs | map & square | mean | root () >>>> > >>>> > >>>> > Read this as... >>>> > >>>> > apply xs, to map with square, to mean, to root >>>> > >>>> > >>>> > These are kind of cool, but does it make python code easier to read? That >>>> > seems like it may be subjective depending on the amount of programming >>>> > experience someone has. >>>> > >>>> > Cheers, >>>> > Ron >>>> > >>>> > >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 10 10:54:48 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 10 May 2015 01:54:48 -0700 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> Message-ID: On May 9, 2015, at 21:58, Douglas La Rocca wrote: > > (Newcomer here.) > > I use function composition pretty extensively. I've found it to be incredibly powerful, but can lead to bad practices. Certain other drawbacks are there as well, like unreadable tracebacks. But in many cases there are real benefits. And for data pipelines where you want to avoid state and mutation it works well. > > The fn and pymonad modules implement infix composition functions through overloading but I've found this to be unworkable. > > For me, the ideal infix operator would simply be a space, with the composition wrapped in parentheses. So e.g. > >>>> (list str sorted)(range(10)) > [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ',', ',', ',', ',', ',', ',', ',', ',', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '[', ']'] > > I might be overlooking something, but it seems to me this would work with existing syntax and semantics and wouldn't conflict with anything else like operator overloading would. The only other place non-indentation level spaces are significant is with keywords which can't be re-assigned. So e.g. (yield from gen()) wouldn't be parsed as 3 functions, and (def func) would raise SyntaxError. > > Here's the composition function I'm working with, stripped of the little debugging helpers: > > ``` > def compose(*fns): > def compose_(*x): > fn, *fns = fns > value = fn(*x) > if fns: > return compose(*fns)(value) > else: > return value > return compose_ > > O=compose > ``` > > I haven't had any issues with the recursion. The `O` alias rubs me the wrong way but seemed to make sense at the time. The thought was that it should look like an operator because it acts like one. > > So the use looks like > >>>> O(fn1, fn2, fn3, ...)('string to be piped') > > The problem for composition is essentially argument passing and has to do with the convenience of *args, **kwargs. > > The way to make composition work predictably is to curry the functions yourself, wrapping the arguments you expect to get with nested closures, then repairing the __name__ etc with functools.wraps or update_wrapper in the usual way. This looks much nicer and almost natural when you write it with lambdas, e.g. 
> >>>> getitem = lambda item: lambda container: container[item] > > (Apologies for having named that lambda there...) I understand why you named it; I don't understand why you didn't just use def if you were going to name it (and declare it in a statement instead of the middle of an expression). Anyway, this is already in operator, as itemgetter, and it's definitely useful to functional code, especially itertools-style generator-driven functional code. And it feels like the pattern ought to be generalizable... but other than attrgetter, it's hard to think of another example where you want the same thing. After all, Python only has a couple of syntactic forms that you'd want to wrap up as functions at all, so it only has a couple of syntactic forms that you'd want to wrap up as curried functions. > The other way to manage passing values from one function to the next is to define a function like > > def star(x): > return lambda fn: fn(*x) > > Then if you get a list at one point in the pipeline and your function takes *args, you can decorate the function and call it like > >>>> star(getattr)((getattr, '__name__')) > 'getattr' > > I've run into problems using the @curried decorators from the fn and pymonad modules because they don't how to handle *args, i.e. when to stop collecting arguments and finally make the function call. > > If you want to have the composition order reversed you could decorate the definition with > > ``` > def flip(f): > def flip_(*x): > f(*reversed(x)) > return flip_ > ``` > > Once we have composition we can write partials for `map`, `filter`, and `reduce`, but with a small twist: make them variadic in the first argument and pass the arguments to compose: > > def fmap(*fn): > def fmap_(x): > return list(map(compose(*fn),x)) > return fmap_ I don't understand why this is called fmap. I see below that you're not implying anything like Haskell's fmap (which confused me...), but then what _does_ the f mean? It seems like this is just a manually curried map, that returns a list instead of an iterator, and only takes one iterable instead of one or more. None of those things say "f" to me, but maybe I'm still hung up on expecting it to mean "functor" and I'll feel like an idiot once you clear it up. :) Also, why _is_ it calling list? Do your notions of composition and currying not play well with iterators? If so, that seems like a pretty major thing to give up. And why isn't it variadic in the iterables? You can trivially change that by just having the wrapped function take and pass *x, but I assume there's some reason you didn't? > def ffilter(fn): > def ffilter_(xs): > return list(filter(fn, xs)) > return ffilter_ > > def freduce(fn): > def _freduce(xs): > return reduce(fn, xs) > return _freduce These two aren't variadic in fn like fmap was. Is that just a typo, or is there a reason not to be? > def Fmap(*fns): > def Fmap_(x): > return list(map(lambda fn:fn(x), fns)) > return Fmap_ > > The `Fmap` function seemed like some sort of "conjugate" to `fmap` so I tried to give it name suggesting this (again, at the expense of abusing naming conventions). > > Instead of mapping a function over a iterable like `fmap`, `Fmap` applies a each given function to a value. So > >>>> Fmap(add(1), sub(1))(1) > [2, 0] > > I've called them `fmap`, `ffilter`, and `freduce` but don't much like these names as they imply they might be the same as Haskell's `fmap`, and they're not. And there's no way to make them anything like Haskell as far as I can tell and they shouldn't be. 
If these implement a "paradigm" it's not purely functional but tacit/concatenative. > > It made sense to compose the passed arguments because there's no reason to pass anything else to `fmap` in the first call. So sequential calls to (the return value of) `fmap` inside a pipeline, like > >>>> O(mul(10), > ... fmap(add(1)), > ... fmap(mul(2)) > ... )([1]) > [4, 4, 4, 4, 4, 4, 4, 4, 4, 4] > > can instead be written like > >>>> O(mul(10), > ... fmap(add(1), > ... mul(2)) > ... )([1]) > [4, 4, 4, 4, 4, 4, 4, 4, 4, 4] > > It also makes it easier to work at different levels inside nested structures. In these heavily nested cases the composition pipeline even begins to resemble the data structure passing through, which makes sense. > > As another example, following is part of a pipeline that takes strings of bullet-separated strings of "key:value" pairs and converts each one to a dictionary, then folds the result together: > >>>> d = [' foo00 : bar00 ? foo01 : bar01 ', > ... ' foo10 : bar10 ? foo11 : bar11 ', > ... ' foo20 : bar10 ? foo21 : bar21 ',] > >>>> dict_foldl = freduce(lambda d1, d2: dict(d1, **d2)) >>>> strip = lambda x: lambda s: s.strip(x) >>>> split = lambda x: lambda s: s.split(x) > >>>> f = O(fmap(strip(' '), > ... split('?'), > ... fmap(split(':'), > ... strip(' '), > ... tuple), > ... tuple, > ... dict), > ... dict_foldl) Now that we have a concrete example... This looks like a nifty translation of what you might write in Haskell, but it doesn't look at all like Python to me. And compare: def f(d): pairs = (pair.strip(' ').split(':') for pair in d.split('?')) strippedpairs = ((part.strip(' ') for part in pair) for pair in pairs) return dict(strippedpairs) Or, even better: def f(d): pairs = (pair.strip(' ').split(':') for pair in d.split('?')) return {k.strip(' '): v.strip(' ') for k, v in pairs} Of course I skipped a lot of steps--turning the inner iterables into tuples, then into dicts, then turning the outer iterable into a list, then merging all the dicts, and of course wrapping various subsets of the process up into functions and calling them--but that's because those steps are unnecessary. We have comprehensions, we have iterators, why try to write for Python 2.2? And notice that any chain of iterator transformations like this _could_ be written as a single expression. But the fact that it doesn't _have_ to be--that you can take any step you want and name the intermediate iterable without having to change anything (and with negligible performance cost), and you can make your code vertical and play into Python indentation instead of writing it horizontally and faking indentation with paren-continuation--is what makes generator expressions and map and filter so nice. Well, that, and the fact that in a comprehension I can just write an expression and it means that expression. I don't have to wrap the expression in a function, or try to come up with a higher-order expression that will effect that first-order expression when evaluated. >>>> f(d) > {'foo00': 'bar00', > 'foo01': 'bar01', > 'foo10': 'bar10', > 'foo11': 'bar11', > 'foo20': 'bar10', > 'foo21': 'bar21'} > > The combination of `compose`, `fmap`, and `Fmap` can be amazingly powerful for doing lots of work in a neat way while keeping the focus on the pipeline itself and not the individual values passing through. But often, the individual values have useful names that make it easier to keep track of them. Like calling the keys and values k and v instead of having them be elements 0 and 1 of an implicit *args. 
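For anyone who wants to run the comparison, here is the comprehension version above applied to Douglas's sample strings, plus the fold over the whole list. The function name `parse_pairs` and the final merge loop are additions for illustration:

```
def parse_pairs(s):
    pairs = (pair.strip(' ').split(':') for pair in s.split('?'))
    return {k.strip(' '): v.strip(' ') for k, v in pairs}

d = [' foo00 : bar00 ? foo01 : bar01 ',
     ' foo10 : bar10 ? foo11 : bar11 ',
     ' foo20 : bar10 ? foo21 : bar21 ']

merged = {}
for s in d:
    merged.update(parse_pairs(s))

print(merged)
# {'foo00': 'bar00', 'foo01': 'bar01', 'foo10': 'bar10',
#  'foo11': 'bar11', 'foo20': 'bar10', 'foo21': 'bar21'}
```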
> The other thing is that this opens the door to a full "algebra" of maps which is kind of insane: > > def mapeach(*fns): > def mapeach_(*xs): > return list(map(lambda fn, *x: fn(*x), fns, *xs)) > return mapeach_ > > def product_map(fns): > return lambda xs: list(map(lambda x: map(lambda fn: fn(x), fns), xs)) > > def smap(*fns): > "star map" > return lambda xs: list(map(O(*fns),*xs)) > > def pmap(*fns): > return lambda *xs: list(map(lambda *x:list(map(lambda fn:fn(*x),fns)),*xs)) > > def matrix_map(*_fns): > def matrix_map_(*_xs): > return list(map(lambda fns, xs: list(map(lambda fn, x: fmap(fn)(x), fns, xs)), _fns, _xs)) > return matrix_map_ > > def mapcat(*fn): > "clojure-inspired?" > return compose(fmap(*fn), freduce(list.__add__)) > > def filtercat(*fn): > return compose(ffilter(*fn), freduce(list.__add__)) > > I rarely use any of these of these. They grew out of an attempt to tease out some hidden structure behind the combination of `map` and star packing/unpacking. > > I do think there's something there but the names get in the way--it would be better to find a way to define a function that takes a specification of the structures of functions and values and knows what to do, e.g. something like > >>>> from types import FunctionType >>>> fn = FunctionType >>>> # then the desired/imaginary version of map... >>>> _map(fn, [int])(add(1))(range(5)) # sort of like `fmap` > [1,2,3,4,5] >>>> _map([fn], [int])((add(x) for x in range(5)))(range(5)) # sort of like `mapeach` > [0,2,4,6,8] >>>> _map([[fn]], [[int]])(((add(x) for x in range(5))*10))((list(range(5)))*10) # sort of like `matrix_map` > [[[0, 1, 2, 3, 4], > [1, 2, 3, 4, 5], > [2, 3, 4, 5, 6], > [3, 4, 5, 6, 7], > [4, 5, 6, 7, 8], > [0, 1, 2, 3, 4], > [1, 2, 3, 4, 5], > [2, 3, 4, 5, 6], > [3, 4, 5, 6, 7], > [4, 5, 6, 7, 8]]] > > In most cases the first argument would just be `fn`, but it would be *really* nice to be able to do something like > >>>> map(fn, [[int], [[int],[[str],[str]]]]) > > where all you need to do is give the schema and indicate which values to apply the function to. Giving the type would be an added measure, but passing `type` in the schema for unknowns should work just as well. > ________________________________________ > From: Python-ideas on behalf of Steven D'Aprano > Sent: Saturday, May 09, 2015 11:20 PM > To: python-ideas at python.org > Subject: Re: [Python-ideas] Function composition (was no subject) > >> On Sun, May 10, 2015 at 03:51:38AM +0300, Koos Zevenhoven wrote: >> >> Another way to deal with elementwise operations on iterables would be to >> make a small, mostly backwards compatible change in map: >> >> When map is called with just one argument, for instance map(square), it >> would return a function that takes iterables and maps them element-wise. 
>> >> Now it would be easier to use map in pipelines, for example: >> >> rms = sqrt @ mean @ map(square) > > Or just use a tiny helper function: > > def vectorise(func): > return partial(map, func) > > rms = sqrt @ mean @ vectorise(square) > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Sun May 10 11:04:21 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 19:04:21 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <1DD4C041-7C97-4F4B-8240-9C31A88F55BD@yahoo.com> References: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com> <1DD4C041-7C97-4F4B-8240-9C31A88F55BD@yahoo.com> Message-ID: <20150510090421.GI5663@ando.pearwood.info> On Sun, May 10, 2015 at 01:18:02AM -0700, Andrew Barnert via Python-ideas wrote: [...] Not picking on Andrew specifically, but could folks please trim their replies occasionally to keep the amount of quoted text manageable? Andrew's post is about 10 pages of mostly-quoted text (depending on how you count pages, mutt claims it's 14 but I think it means screenfuls, not pages), and I'm seeing up to nine levels of quoting: > >>>> >>>> As a random example, (root @ mean @ square)(x) would produce the right > >>>> >>>> order for rms when using [2]. Thanks in advance. -- Steve From rosuav at gmail.com Sun May 10 11:25:31 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 10 May 2015 19:25:31 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> Message-ID: On Sun, May 10, 2015 at 2:58 PM, Douglas La Rocca wrote: > (Newcomer here.) Welcome to Bikeshed Central! Here, we take a plausible idea and fiddle around with all the little detaily bits :) > For me, the ideal infix operator would simply be a space, with the composition wrapped in parentheses. So e.g. > >>>> (list str sorted)(range(10)) > [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ',', ',', ',', ',', ',', ',', ',', ',', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '[', ']'] > > I might be overlooking something, but it seems to me this would work with existing syntax and semantics and wouldn't conflict with anything else like operator overloading would. > One of the problems with using mere whitespace is that it's very easy to do accidentally. There's already places where this can happen, for instance: strings = [ "String one", "String two is a bit longer" "String three", "String four" ] How many strings are there in my list? Clearly the programmer's intention is to have four, but that's not what ends up happening. (Now imagine there are actually hundreds of strings, and one gets selected at random every time you do something. Have fun figuring out why, just occasionally, it prints out two messages instead of one. 
For bonus points, figure that out when there are two or three such bugs in the list, so it's not always the exact same pair that come out together.) At the moment, we can safely build up a list of functions like this: funcs = [ list, str, sorted, ] because omitting a comma will produce an instant SyntaxError. Python currently is pretty good at detecting problems in source code. (Not all languages are, as you'll know as soon as you run into one of those "oops I left out a semicolon and my JavaScript function does something slightly different" bugs.) Part of that comes from having a fairly simple set of rules governing syntax, such that any deviation results in a simple and quick error *at or very near to* the place where the error occurs. You won't, for instance, get an error at the bottom of a file saying "Unmatched '{' or missing '}'", leaving you to dig through your code to figure out exactly where the problem was. At worst, you get an error on the immediately-following line of code: def func1(): value = x * (y + z # oops, forgot the close parens print(value) # boom, SyntaxError on this line But if "function function" meant composition, this would actually be legal, and you'd get an error rather further down. If you're lucky, this is the end of this function, and the "def" keyword trips the error; but otherwise, this would be validly parsed as "compose z and print into a function, then call that with value", and we're still looking for a close parens. So I would strongly suggest having some sort of operator in between. Okay. Can I just say something crazy? (Hans: I love crazy!) How about using a comma? >>> (fn1, fn2, fn3, ...)('string to be piped') Currently, this produces a runtime TypeError: 'tuple' object is not callable, but I could easily define my own callable subclass of tuple. >>> class functuple(tuple): ... def __call__(self, arg): ... for func in self: arg = func(arg) ... return arg ... >>> f = functuple((fn1,fn2)) >>> f("this is a test") (Use whatever semantics you like for handling multiple arguments. I'm not getting into that part of the debate, as I have no idea how function composition ought to work in the face of *args and **kwargs.) The syntax is reasonably clean, and it actually doesn't require many changes - just making tuples callable in some logical fashion. No new syntax needed, and it's an already-known light-weight way to pack up a bunch of things into one object. Does it make sense to do this? ChrisA From steve at pearwood.info Sun May 10 11:31:25 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 May 2015 19:31:25 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <20150509181642.GB5663@ando.pearwood.info> Message-ID: <20150510093125.GJ5663@ando.pearwood.info> On Sat, May 09, 2015 at 01:30:17PM -0500, David Mertz wrote: > On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano wrote: > > > On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote: > > > > > How about an operator for partial? > > > > > > root @ mean @ map $ square(xs) > > > > I have trouble seeing the advantage of a special function composition > operator when it is easy to write a general 'compose()' function that can > produce such things easily enough. 
Do you have trouble seeing the advantage of a special value addition operator when it is easy enough to write a general "add()" function? *wink*

I think that, mentally, operators "feel" lightweight. If I write:

    getattr(obj, 'method')(arg)

it puts too much emphasis on the attribute access. But using an operator:

    obj.method(arg)

puts the emphasis on calling the method, not looking it up, which is just right. Even though both forms do about the same amount of work, mentally, the dot pseudo-operator feels much more lightweight.

The same with

    compose(grep, filter)(data)

versus

    (grep @ filter)(data)

The first sends my attention to the wrong place, the composition. The second does not.

I don't expect everyone to agree with me, but I think this explains why people keep suggesting syntax or an operator to do function composition instead of a function. Not everyone thinks this way, but for those who do, a compose() function is like eating a great big bowl of gruel that contains all the nutrients you need for the day, tastes of cardboard, and smells of wet dog. It might do everything that you want functionally, but it feels wrong and looks wrong, and it is not in the least bit pleasurable to use.
-- Steve From rosuav at gmail.com Sun May 10 12:17:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 10 May 2015 20:17:09 +1000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <20150510095729.GK5663@ando.pearwood.info> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> <20150510095729.GK5663@ando.pearwood.info> Message-ID: On Sun, May 10, 2015 at 7:57 PM, Steven D'Aprano wrote: >> So I would strongly suggest having some sort of operator in between. >> Okay. Can I just say something crazy? (Hans: I love crazy!) How about >> using a comma? >> >> >>> (fn1, fn2, fn3, ...)('string to be piped') >> >> Currently, this produces a runtime TypeError: 'tuple' object is not >> callable, but I could easily define my own callable subclass of tuple. > > There's lots of code that assumes that a tuple of functions is a > sequence: > > for f in (len, str, ord, chr, repr): > test(f) > > so we would need to keep that. But we don't want a composed function to > be a sequence, any more than we want a partial or a regular function to > be sequences. If I pass you a Composed object, and you try slicing it, > that should be an error. Well, I told you it was crazy :) But the significance here is that there would be no Composed object, just a tuple. You could slice it, iterate over it, etc; and if you call it, it calls each of its arguments. I'm not sure that it's a fundamental problem for a composed function to be sliceable, any more than it's a problem for any other available operation that you aren't using. Tuples already have several related uses (they can be used as "record" types, or as frozen lists for hashability, etc), and this would simply mean that a tuple of callables is callable. ChrisA From larocca at abiresearch.com Sun May 10 12:36:59 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Sun, 10 May 2015 10:36:59 +0000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> Message-ID: > I understand why you named it; I don't understand why you didn't just use > def if you were going to name it (and declare it in a statement instead of the > middle of an expression). Anyway, this is already in operator, as itemgetter, > and it's definitely useful to functional code, especially itertools-style > generator-driven functional code. And it feels like the pattern ought to be > generalizable... but other than attrgetter, it's hard to think of another > example where you want the same thing. After all, Python only has a couple > of syntactic forms that you'd want to wrap up as functions at all, so it only has > a couple of syntactic forms that you'd want to wrap up as curried functions. Sorry for the confusion here--I was trying to say that it's correct to use def in order to properly set __name__, give space for doc strings, etc. The downside is that the nesting can strain readability. 
I was only showing the "formal" equivalent in lambda-style to point out how currying arguments isn't really confusing at all, considering lambda x: lambda y: lambda z: begins to resemble syntactically def anon(x, y, z): (obviously semantically these are different). Regarding the `getitem` example, this wasn't intended as a use-case. It's true Python has few syntactic forms you'd want to wrap (isinstance, hasattr, etc.). I mostly had external module apis in mind here. > I don't understand why this is called fmap. I see below that you're not > implying anything like Haskell's fmap (which confused me...), but then what > _does_ the f mean? It seems like this is just a manually curried map, that > returns a list instead of an iterator, and only takes one iterable instead of one > or more. None of those things say "f" to me, but maybe I'm still hung up on > expecting it to mean "functor" and I'll feel like an idiot once you clear it up. :) > Also, why _is_ it calling list? Do your notions of composition and currying not > play well with iterators? If so, that seems like a pretty major thing to give up. > And why isn't it variadic in the iterables? You can trivially change that by just > having the wrapped function take and pass *x, but I assume there's some > reason you didn't? It was only called fmap to leave the builtin map in the namespace, the 'f' just meant 'function'. Taking a single iterable as the first item rather than varargs avoids the use of the `star` shim in the composition. I do use a wrapper `s` for this but find it ugly to use. It's basically a conventional decision that's forced by the difference between passing a single value to a "monadic" (in the APL not Haskell sense) function and a variadic function. In my own util library this also shows up as two versions of the identity function: def identity(x): return x def identity_star(*x): return x It will seem these are useless but purpose becomes felt when you're in the middle of a composition. For data structures where you want to map over lists of lists of lists etc., you can either define a higher map or do something like fmap(fmap(fmap(function_to_apply)))(iterable) which would incidentally be the same as the uglier compose(*(fmap,)*3)(function_to_apply)(iterable) though the latter makes it possible to parametrize the iteration depth. As for wrapping in `list`--in some cases (I can't immediately recall them all) the list actually needed to be built in order for the composition to work. A simple case would be compose(mul(10), fmap(len), len)([[1]*10]*10) which would return TypeError. I should look again to see if there's a better way to fix it. But I reverted the default back to 2.x because I made full use of generators before moving to 3.x and decided I didn't need map to be lazy. To be honest, the preference for everything to be lazy seems somewhat fashionable at the moment... you can get along just as well knowing where things shouldn't be fully loaded into memory (i.e. when to use a generator). > These two aren't variadic in fn like fmap was. Is that just a typo, or is there a > reason not to be? Yes just a typo! > Now that we have a concrete example... This looks like a nifty translation of > what you might write in Haskell, but it doesn't look at all like Python to me. 
> > And compare: > > def f(d): > pairs = (pair.strip(' ').split(':') for pair in d.split('?')) > strippedpairs = ((part.strip(' ') for part in pair) for pair in pairs) > return dict(strippedpairs) > > Or, even better: > > def f(d): > pairs = (pair.strip(' ').split(':') for pair in d.split('?')) > return {k.strip(' '): v.strip(' ') for k, v in pairs} > > Of course I skipped a lot of steps--turning the inner iterables into tuples, > then into dicts, then turning the outer iterable into a list, then merging all the > dicts, and of course wrapping various subsets of the process up into > functions and calling them--but that's because those steps are unnecessary. > We have comprehensions, we have iterators, why try to write for Python > 2.2? I agree these work just as well. > And notice that any chain of iterator transformations like this _could_ be > written as a single expression. But the fact that it doesn't _have_ to be--that > you can take any step you want and name the intermediate iterable without > having to change anything (and with negligible performance cost), and you > can make your code vertical and play into Python indentation instead of > writing it horizontally and faking indentation with paren-continuation--is > what makes generator expressions and map and filter so nice. > Well, that, and the fact that in a comprehension I can just write an expression > and it means that expression. I don't have to wrap the expression in a > function, or try to come up with a higher-order expression that will effect > that first-order expression when evaluated. > But often, the individual values have useful names that make it easier to > keep track of them. Like calling the keys and values k and v instead of having > them be elements 0 and 1 of an implicit *args. I agree for the most part, but there are cases where you're really deep into some structure, manipulating the values in a generic way, and the names *do* get in the way. The temptation for me in those cases is to use x, y, z, s, t, etc. At this point the readability really suffers. The alternative is to modularize more, breaking the functions apart, but this only helps so much... In a certain way I find `(pair.strip(' ').split(':') for pair in d.split('?'))` to be less readable than the first steps in the composition--with the generator I'm reading back and forth in order to find out what's happening whereas the composition + map outlines the steps in a tree-like structure. From ron3200 at gmail.com Sun May 10 16:45:49 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 10 May 2015 10:45:49 -0400 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> Message-ID: On 05/10/2015 01:24 AM, Andrew Barnert via Python-ideas wrote: > On May 9, 2015, at 20:08, Ron Adam wrote: >> >>> On 05/09/2015 06:45 PM, Andrew Barnert via Python-ideas wrote: >>>> On May 9, 2015, at 08:38, Ron Adam wrote: >>> But, more importantly, this doesn't work. Your square(xs) isn't going >>> to evaluate to a function, but to a whatever falling square on xs returns. >>> (Which is presumably a TypeError, or you wouldn't be looking to map in the >>> first place). 
And, even if that did work, you're not actually composing a >>> function here anyway; your @ is just a call operator, which we already have >>> in Python, spelled with parens. >> >> This is following the patterns being discussed in the thread. (or at least an attempt to do so.) >> >> The @ and $ above would bind more tightly than the (). Like the doc "." does for method calls. > > @ can't bind more tightly than (). The operator already exists (that's > the whole reason people are suggesting it for compose), and it has the same > precedence as *. Yes, and so it may need different symbols to work, but there are not many easy to type and read symbols left. So some double symbols of some sort may work. Picking what those should be is a topic all its own, and it's not even an issue until the concept works. I should not even given examples earlier. The point I was trying to make was an operator that indicates the next argument is not complete my be useful. And I think the initial (or another) example implementation does do that, but uses a tuple to package the function with the partial arguments instead. Cheers, Ron From koos.zevenhoven at aalto.fi Sun May 10 17:15:58 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Sun, 10 May 2015 18:15:58 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <26588_1431226604_554EC8EC_26588_715_1_20150510025630.GC5663@ando.pearwood.info> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <26588_1431226604_554EC8EC_26588_715_1_20150510025630.GC5663@ando.pearwood.info> Message-ID: <554F762E.3030009@aalto.fi> On 10.5.2015 5:56, Steven D'Aprano wrote: [...] >> You could in addition have: >> >> spam @ eggs @ cheese @ arg # equivalent to spam(eggs(cheese(arg))) >> >> arg | spam | eggs | cheese # equivalent to cheese(eggs(spam(arg))) >> >> Here, arg would thus be recognized as not a function. > No. I think it is absolutely vital to distinguish by syntax the > difference between composition and function application, and not try to > "do what I mean". DWIM software has a bad history of doing the wrong > thing. > > Every other kind of callable uses obj(arg) to call it: types, functions, > methods, partial objects, etc. We shouldn't make function composition > try to be different. If I write sqrt at 100 I should get a runtime error, > not 10. > > I don't mind if the error is delayed until I actually try to call the > composed object, but at some point I should get a TypeError that 100 is > not callable. > That is in fact a part of why I added a function call () to the sketch in my recent post (extended partial operator, there using ->). This way, the composition operator would never do the actual call by itself, but instead make a partial. But I admit that (sqrt at 100)() still would give 10, not the runtime error you want (which may indeed cause problems with callable arguments). It only solves half the problem. Another way to feed the left-to-right | composition from the left would of course be (feed(x) | spam | eggs | cheese)() # feed would be just def feed(x): return x But I'm not sure I like it. Luckily, (cheese @ eggs @ spam)(x) does not have this problem. 
However, if cheese, eggs and spam were matrix transformations, one would write cheese @ eggs @ spam @ x But perhaps numpy would want to bridge this gap with extended behavior (allow calling numpy functions with @ or "calling a matrix transformation" with () ). Or perhaps not :). -- Koos From koos.zevenhoven at aalto.fi Sun May 10 17:30:50 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Sun, 10 May 2015 18:30:50 +0300 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <26588_1431270974_554F763D_26588_7409_1_554F762E.3030009@aalto.fi> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <26588_1431226604_554EC8EC_26588_715_1_20150510025630.GC5663@ando.pearwood.info> <26588_1431270974_554F763D_26588_7409_1_554F762E.3030009@aalto.fi> Message-ID: <554F79AA.60100@aalto.fi> Just a small correction to my below email (the definition of "feed"): On 10.5.2015 18:15, Koos Zevenhoven wrote: > On 10.5.2015 5:56, Steven D'Aprano wrote: > > [...] >>> You could in addition have: >>> >>> spam @ eggs @ cheese @ arg # equivalent to spam(eggs(cheese(arg))) >>> >>> arg | spam | eggs | cheese # equivalent to cheese(eggs(spam(arg))) >>> >>> Here, arg would thus be recognized as not a function. >> No. I think it is absolutely vital to distinguish by syntax the >> difference between composition and function application, and not try to >> "do what I mean". DWIM software has a bad history of doing the wrong >> thing. >> >> Every other kind of callable uses obj(arg) to call it: types, functions, >> methods, partial objects, etc. We shouldn't make function composition >> try to be different. If I write sqrt at 100 I should get a runtime error, >> not 10. >> >> I don't mind if the error is delayed until I actually try to call the >> composed object, but at some point I should get a TypeError that 100 is >> not callable. >> > > That is in fact a part of why I added a function call () to the sketch > in my recent post (extended partial operator, there using ->). This > way, the composition operator would never do the actual call by > itself, but instead make a partial. But I admit that (sqrt at 100)() > still would give 10, not the runtime error you want (which may indeed > cause problems with callable arguments). It only solves half the problem. > > Another way to feed the left-to-right | composition from the left > would of course be > > (feed(x) | spam | eggs | cheese)() # feed would be just def > feed(x): return x > Sorry, I messed that up. "feed" would of course be: def feed(x): def feeder(): return x return feeder > But I'm not sure I like it. Luckily, (cheese @ eggs @ spam)(x) does > not have this problem. However, if cheese, eggs and spam were matrix > transformations, one would write > > cheese @ eggs @ spam @ x > > But perhaps numpy would want to bridge this gap with extended behavior > (allow calling numpy functions with @ or "calling a matrix > transformation" with () ). Or perhaps not :). > > -- Koos > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From stephen at xemacs.org Sun May 10 19:52:37 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 11 May 2015 02:52:37 +0900 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> Message-ID: <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp> Gregory Salvan writes: > Nobody convinced by arrow operator ? > > like: arg -> spam -> eggs -> cheese > or cheese <- eggs <- spam <- arg Yuck. There are living languages (R) that use an arrow as an assignment operator, and others (or perhaps you consider C a zombie language ) that uses one as a member operator. I would prefer the C++ pipe operators, ie, << and >>. But that's just bikeshedding a moot point; I doubt most people would be favorable to introducing more operator symbols for this purpose, and I personally would be opposed. If functools was more popular and its users were screaming for operators the way the numerical folk screamed for a matrix multiplication operator, I'd be more sympathetic. But they're not screaming that I can hear. To give an idea of how difficult it is to get an operator added, it took at least a decade to get the matrix multiplication operator added after it was first proposed, and two of the key steps were first the introduction of unary "@" for decorator application (another case that screamed for a new operator), and then the proponents dropping the "@@" operator from their proposal. From larocca at abiresearch.com Sun May 10 20:40:25 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Sun, 10 May 2015 18:40:25 +0000 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> , <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87F614BB-795F-4B77-9D87-B544E75786AF@abiresearch.com> I agree an operator is really unnecessary, especially because parens would be needed anyway for subexpressions. A LISP-like syntax would work better than something trying to imitate Haskell. make_breakfast = (make_spam make_eggs(2, 'overeasy') make_cheese) make_breakfast() It also has the sense of collecting and reifying a series of functions rather than declaring a tuple or list or some other data structure. Why bother with the left/right associative issues of infix operators? Suppose you wanted a simple quick composition of a few functions where the expression would be called right away. With infix it might look like (list @ sorted @ ','.join)('a string of chars to be sorted, then joined on commas') But you already have the parens so why not just (list sorted ','.join)('a string ...') On May 10, 2015, at 1:53 PM, Stephen J. Turnbull wrote: > > Gregory Salvan writes: > >> Nobody convinced by arrow operator ? >> >> like: arg -> spam -> eggs -> cheese >> or cheese <- eggs <- spam <- arg > > Yuck. There are living languages (R) that use an arrow as an > assignment operator, and others (or perhaps you consider C a zombie > language ) that uses one as a member operator. I would prefer > the C++ pipe operators, ie, << and >>. 
> > But that's just bikeshedding a moot point; I doubt most people would > be favorable to introducing more operator symbols for this purpose, > and I personally would be opposed. If functools was more popular and > its users were screaming for operators the way the numerical folk > screamed for a matrix multiplication operator, I'd be more sympathetic. > But they're not screaming that I can hear. > > To give an idea of how difficult it is to get an operator added, it > took at least a decade to get the matrix multiplication operator added > after it was first proposed, and two of the key steps were first the > introduction of unary "@" for decorator application (another case that > screamed for a new operator), and then the proponents dropping the > "@@" operator from their proposal. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ron3200 at gmail.com Sun May 10 21:55:03 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 10 May 2015 15:55:03 -0400 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <20150510095729.GK5663@ando.pearwood.info> References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com> <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info> <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com> <20150510095729.GK5663@ando.pearwood.info> Message-ID: On 05/10/2015 05:57 AM, Steven D'Aprano wrote: > There's lots of code that assumes that a tuple of functions is a > sequence: > > for f in (len, str, ord, chr, repr): > test(f) > > so we would need to keep that. But we don't want a composed function to > be a sequence, any more than we want a partial or a regular function to > be sequences. If I pass you a Composed object, and you try slicing it, > that should be an error. It seems to me a linked list of composed objects works (rather than a sequence). It's easier to understand what is going on in it. 
from functools import partial
from operator import *
from statistics import mean

def root(x):
    return x ** .5

def square(x):
    return x ** 2

class CF:
    def __init__(self, f, *rest):
        if isinstance(f, tuple):
            self.f = partial(*f)
        else:
            self.f = f
        if rest:
            self.child = CF(*rest)
        else:
            self.child = None

    def __call__(self, data):
        if self.child == None:
            return self.f(data)
        return self.f(self.child(data))

    def __repr__(self):
        if self.child != None:
            s = repr(self.child)
        else:
            s = "CS()"
        return s[:3] + ("%s, " % repr(self.f)) + s[3:]

CF(print, root, mean, (map, square)) ([4, 9, 16])

Prints:  10.847426730181986

From koos.zevenhoven at aalto.fi  Sun May 10 22:06:21 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sun, 10 May 2015 23:06:21 +0300
Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
Message-ID: <554FBA3D.30907@aalto.fi>

Reading the recent emails in the function composition thread started by Ivan, I realized that my below sketch for a composition operator would be better if it did not actually do function composition ;). Instead, -> would be quite powerful as 'just' a partial operator -- perhaps even more powerful, as I demonstrate below. However, this is not an argument against @ composition, which might in fact play together with this quite nicely.

This allows some nice things with multi-argument functions too.

I realize that it may be unlikely that a new operator would be added, but here it is anyway, as food for thought. (With an existing operator, I suspect it would be even less likely, because of precedence rules : )

So, -> would be an operator with a precedence similar to .attribute access (but lower than .attribute):

# The simple definition of what it does:
arg->func   # equivalent to functools.partial(func, arg)

This would allow for instance:

arg -> spam() -> cheese(kind = 'gouda') -> eggs()

which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))

Or even together together with the proposed @ composition:

rms = root @ mean @ square->map   # for an iterable non-numpy argument

And here's something I find quite interesting. Together with @singledispatch from 3.4 (or possibly an enhanced version using type annotations in the future?), one could add 'third-party methods' to classes in other libraries without monkey patching. A dummy example:

from numpy import array
my_list = [1,2,3]
my_array = array(my_list)
my_mean = my_array.mean()   # This currently works in numpy

from rmslib import rms
my_rms = my_array->rms()        # efficient rms for numpy arrays
my_other_rms = my_list->rms()   # rms that works on any iterable

One would be able to distinguish between calls to methods and 'third-party methods' based on whether . or -> is used for accessing them, which I think is a good thing. Also, third-party methods would be less likely to mutate the object, just like func(obj) is less likely to mutate obj than obj.method().

See more examples below.
I converted my examples from last night to this IMO better version, because at least some of them would still be relevant. On 10.5.2015 2:07, Koos Zevenhoven wrote: > On 10.5.2015 1:03, Gregory Salvan wrote: >> Nobody convinced by arrow operator ? >> >> like: arg -> spam -> eggs -> cheese >> or cheese <- eggs <- spam <- arg >> >> > > I like | a lot because of the pipe analogy. However, having a new > operator for this could solve some issues about operator precedence. > > Today, I sketched one possible version that would use a new .. > operator. I'll explain what it would do (but with your -> instead of > my ..) > > Here, the operator (.. or ->) would have a higher precedence than > function calls () but a lower precedence than attribute access (obj.attr). > > First, with single-argument functions spam, eggs and cheese, and a > non-function arg: > > arg->eggs->spam->cheese() # equivalent to cheese(spam(eggs(arg))) With -> as a partial operator, this would instead be: arg->eggs()->spam()->cheese() # equivalent to cheese(spam(eggs(arg))) > eggs->spam->cheese # equivalent to lambda arg: cheese(spam(eggs(arg))) > With -> as a partial operator this could be: lambda arg: arg->eggs()->spam()->cheese() > Then, if spam and eggs both took two arguments; eggs(arg1, arg2), > spam(arg1, arg2) > > arg->eggs # equivalent to partial(eggs, arg) > eggs->spam(a, b, c) # equivalent to spam(eggs(a, b), c) With -> as a partial operator, the first one would work, and the second would become: eggs(a,b)->spam(c) # equivalent to spam(eggs(a, b), c) > arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c) > This would become: arg->eggs(b)->spam(c) # equivalent to spam(eggs(arg, b), c) Note that this would be quite flexible in partial 'piping' of multi-argument functions. > So you could think of -> as an extended partial operator. And this > would naturally generalize to functions with even more arguments. The > arguments would always be fed in the same order as in the equivalent > function call, which makes for a nice rule of thumb. However, I > suppose one would usually avoid combinations that are difficult to > understand. > > Some examples that this would enable: > > # Example 1 > from numpy import square, mean, sqrt > rms = square->mean->sqrt # I think this order is fine because it is > not @ > This would become: def rms(arr): return arr->square()->mean()->sqrt() > # Example 2 (both are equivalent) > spam(args)->eggs->cheese() # the shell-syntax analogy that Steven > mentioned. > This would be: spam(args)->eggs()->cheese() Of course the shell piping analogy would be quite far, because it looks so different. > # Example 3 > # Last but not least, we would finally have this :) > some_sequence->len() > some_object->isinstance(MyType) > And: func->map(seq) func->reduce(seq) -- Koos From apieum at gmail.com Sun May 10 23:11:51 2015 From: apieum at gmail.com (Gregory Salvan) Date: Sun, 10 May 2015 23:11:51 +0200 Subject: [Python-ideas] Function composition (was no subject) In-Reply-To: <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull, ok, I was wong about community expectation, I thougth functools would be more popular with new symbols. 
Personnally, I made a wide use of functionnal paradigm but except when I need an heavy use of partial and reduce, the simple fact of importing functools and use the "partial" function has a higher cost than making it differently. That's also because python syntax is really convenient and lambda, decorators, iterators... allow a lot of things. -------------- next part -------------- An HTML attachment was scrubbed... URL: From apieum at gmail.com Sun May 10 23:23:40 2015 From: apieum at gmail.com (Gregory Salvan) Date: Sun, 10 May 2015 23:23:40 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: <554FBA3D.30907@aalto.fi> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> Message-ID: In my opinion, this syntax make problems when your arguments are functions/callables. And if you code in a functionnal paradigm it is quite common to inject functions in arguments otherwise how would you do polymorphism ? The only way I see to distinguish cases is to have tuples, but syntax is quite strange. instead of : arg->eggs(b)->spam(c) my_partial = (arg, b)->eggs->(c, )->spam Then how would you call my_partial ? For example, if you have: def eggs(a, b, c)... def spam(d, e)... my_partial(c, e) or my_partial(c)(e) ? 2015-05-10 22:06 GMT+02:00 Koos Zevenhoven : > Reading the recent emails in the function composition thread started by > Ivan, I realized that my below sketch for a composition operator would be > better if it did not actually do function composition ;). Instead, -> would > be quite powerful as 'just' a partial operator -- perhaps even more > powerful, as I demonstrate below. However, this is not an argument against > @ composition, which might in fact play together with this quite nicely. > > This allows some nice things with multi-argument functions too. > > I realize that it may be unlikely that a new operator would be added, but > here it is anyway, as food for thought. (With an existing operator, I > suspect it would be even less likely, because of precedence rules : ) > > So, -> would be an operator with a precedence similar to .attribute access > (but lower than .attribute): > > # The simple definition of what it does: > arg->func # equivalent to functools.partial(func, arg) > > This would allow for instance: > arg -> spam() -> cheese(kind = 'gouda') -> eggs() > > which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda')) > > Or even together together with the proposed @ composition: > rms = root @ mean @ square->map # for an iterable non-numpy argument > > And here's something I find quite interesting. Together with > @singledispatch from 3.4 (or possibly an enhanced version using type > annotations in the future?), one could add 'third-party methods' to classes > in other libraries without monkey patching. 
A dummy example: > > from numpy import array > my_list = [1,2,3] > my_array = array(my_list) > my_mean = my_array.mean() # This currently works in numpy > > from rmslib import rms > my_rms = my_array->rms() # efficient rms for numpy arrays > my_other_rms = my_list->rms() # rms that works on any iterable > > One would be able to distinguish between calls to methods and 'third-party > methods' based on whether . or -> is used for accessing them, which I think > is a good thing. Also, third-party methods would be less likely to mutate > the object, just like func(obj) is less likely to mutate obj than > obj.method(). > > See more examples below. I converted my examples from last night to this > IMO better version, because at least some of them would still be relevant. > > On 10.5.2015 2:07, Koos Zevenhoven wrote: > >> On 10.5.2015 1:03, Gregory Salvan wrote: >> >>> Nobody convinced by arrow operator ? >>> >>> like: arg -> spam -> eggs -> cheese >>> or cheese <- eggs <- spam <- arg >>> >>> >>> >> I like | a lot because of the pipe analogy. However, having a new >> operator for this could solve some issues about operator precedence. >> >> Today, I sketched one possible version that would use a new .. operator. >> I'll explain what it would do (but with your -> instead of my ..) >> >> Here, the operator (.. or ->) would have a higher precedence than >> function calls () but a lower precedence than attribute access (obj.attr). >> >> First, with single-argument functions spam, eggs and cheese, and a >> non-function arg: >> >> arg->eggs->spam->cheese() # equivalent to cheese(spam(eggs(arg))) >> > > With -> as a partial operator, this would instead be: > > arg->eggs()->spam()->cheese() # equivalent to cheese(spam(eggs(arg))) > > eggs->spam->cheese # equivalent to lambda arg: cheese(spam(eggs(arg))) >> >> > With -> as a partial operator this could be: > > lambda arg: arg->eggs()->spam()->cheese() > > > Then, if spam and eggs both took two arguments; eggs(arg1, arg2), >> spam(arg1, arg2) >> >> arg->eggs # equivalent to partial(eggs, arg) >> eggs->spam(a, b, c) # equivalent to spam(eggs(a, b), c) >> > > With -> as a partial operator, the first one would work, and the second > would become: > > eggs(a,b)->spam(c) # equivalent to spam(eggs(a, b), c) > > arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c) >> >> > This would become: > > arg->eggs(b)->spam(c) # equivalent to spam(eggs(arg, b), c) > > Note that this would be quite flexible in partial 'piping' of > multi-argument functions. > > So you could think of -> as an extended partial operator. And this would >> naturally generalize to functions with even more arguments. The arguments >> would always be fed in the same order as in the equivalent function call, >> which makes for a nice rule of thumb. However, I suppose one would usually >> avoid combinations that are difficult to understand. >> >> Some examples that this would enable: >> >> # Example 1 >> from numpy import square, mean, sqrt >> rms = square->mean->sqrt # I think this order is fine because it is not >> @ >> >> > This would become: > > def rms(arr): > return arr->square()->mean()->sqrt() > > # Example 2 (both are equivalent) >> spam(args)->eggs->cheese() # the shell-syntax analogy that Steven >> mentioned. >> >> > This would be: > > spam(args)->eggs()->cheese() > > Of course the shell piping analogy would be quite far, because it looks so > different. 
> > # Example 3 >> # Last but not least, we would finally have this :) >> some_sequence->len() >> some_object->isinstance(MyType) >> >> > And: > > func->map(seq) > func->reduce(seq) > > -- Koos > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Sun May 10 23:41:59 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Mon, 11 May 2015 00:41:59 +0300 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> Message-ID: <554FD0A7.8010606@aalto.fi> Hi Gregory, Did you look at the new version carefully? If I understand the problem you are describing (mentioned also by Steven), my previous version had that issue, but the new one does not. That is why I added examples with callable arguments :). -- Koos On 11.5.2015 0:23, Gregory Salvan wrote: > In my opinion, this syntax make problems when your arguments are > functions/callables. > And if you code in a functionnal paradigm it is quite common to inject > functions in arguments otherwise how would you do polymorphism ? > > The only way I see to distinguish cases is to have tuples, but syntax > is quite strange. > > instead of : arg->eggs(b)->spam(c) > my_partial = (arg, b)->eggs->(c, )->spam > > Then how would you call my_partial ? > For example, if you have: > def eggs(a, b, c)... > def spam(d, e)... > > my_partial(c, e) or my_partial(c)(e) ? > > > > 2015-05-10 22:06 GMT+02:00 Koos Zevenhoven >: > > Reading the recent emails in the function composition thread > started by Ivan, I realized that my below sketch for a composition > operator would be better if it did not actually do function > composition ;). Instead, -> would be quite powerful as 'just' a > partial operator -- perhaps even more powerful, as I demonstrate > below. However, this is not an argument against @ composition, > which might in fact play together with this quite nicely. > > This allows some nice things with multi-argument functions too. > > I realize that it may be unlikely that a new operator would be > added, but here it is anyway, as food for thought. (With an > existing operator, I suspect it would be even less likely, because > of precedence rules : ) > > So, -> would be an operator with a precedence similar to > .attribute access (but lower than .attribute): > > # The simple definition of what it does: > arg->func # equivalent to functools.partial(func, arg) > > This would allow for instance: > arg -> spam() -> cheese(kind = 'gouda') -> eggs() > > which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda')) > > Or even together together with the proposed @ composition: > rms = root @ mean @ square->map # for an iterable non-numpy > argument > > And here's something I find quite interesting. 
Together with > @singledispatch from 3.4 (or possibly an enhanced version using > type annotations in the future?), one could add 'third-party > methods' to classes in other libraries without monkey patching. A > dummy example: > > from numpy import array > my_list = [1,2,3] > my_array = array(my_list) > my_mean = my_array.mean() # This currently works in numpy > > from rmslib import rms > my_rms = my_array->rms() # efficient rms for numpy arrays > my_other_rms = my_list->rms() # rms that works on any iterable > > One would be able to distinguish between calls to methods and > 'third-party methods' based on whether . or -> is used for > accessing them, which I think is a good thing. Also, third-party > methods would be less likely to mutate the object, just like > func(obj) is less likely to mutate obj than obj.method(). > > See more examples below. I converted my examples from last night > to this IMO better version, because at least some of them would > still be relevant. > > On 10.5.2015 2:07, Koos Zevenhoven wrote: > > On 10.5.2015 1:03, Gregory Salvan wrote: > > Nobody convinced by arrow operator ? > > like: arg -> spam -> eggs -> cheese > or cheese <- eggs <- spam <- arg > > > > I like | a lot because of the pipe analogy. However, having a > new operator for this could solve some issues about operator > precedence. > > Today, I sketched one possible version that would use a new .. > operator. I'll explain what it would do (but with your -> > instead of my ..) > > Here, the operator (.. or ->) would have a higher precedence > than function calls () but a lower precedence than attribute > access (obj.attr). > > First, with single-argument functions spam, eggs and cheese, > and a non-function arg: > > arg->eggs->spam->cheese() # equivalent to > cheese(spam(eggs(arg))) > > > With -> as a partial operator, this would instead be: > > arg->eggs()->spam()->cheese() # equivalent to > cheese(spam(eggs(arg))) > > eggs->spam->cheese # equivalent to lambda arg: > cheese(spam(eggs(arg))) > > > With -> as a partial operator this could be: > > lambda arg: arg->eggs()->spam()->cheese() > > > Then, if spam and eggs both took two arguments; eggs(arg1, > arg2), spam(arg1, arg2) > > arg->eggs # equivalent to partial(eggs, arg) > eggs->spam(a, b, c) # equivalent to spam(eggs(a, b), c) > > > With -> as a partial operator, the first one would work, and the > second would become: > > eggs(a,b)->spam(c) # equivalent to spam(eggs(a, b), c) > > arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c) > > > This would become: > > arg->eggs(b)->spam(c) # equivalent to spam(eggs(arg, b), c) > > Note that this would be quite flexible in partial 'piping' of > multi-argument functions. > > So you could think of -> as an extended partial operator. And > this would naturally generalize to functions with even more > arguments. The arguments would always be fed in the same order > as in the equivalent function call, which makes for a nice > rule of thumb. However, I suppose one would usually avoid > combinations that are difficult to understand. > > Some examples that this would enable: > > # Example 1 > from numpy import square, mean, sqrt > rms = square->mean->sqrt # I think this order is fine > because it is not @ > > > This would become: > > def rms(arr): > return arr->square()->mean()->sqrt() > > # Example 2 (both are equivalent) > spam(args)->eggs->cheese() # the shell-syntax analogy that > Steven mentioned. 
> > > This would be: > > spam(args)->eggs()->cheese() > > Of course the shell piping analogy would be quite far, because it > looks so different. > > # Example 3 > # Last but not least, we would finally have this :) > some_sequence->len() > some_object->isinstance(MyType) > > > And: > > func->map(seq) > func->reduce(seq) > > -- Koos > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From koos.zevenhoven at aalto.fi Mon May 11 00:42:27 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Mon, 11 May 2015 01:42:27 +0300 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') Message-ID: <554FDED3.8030200@aalto.fi> Hi everyone! (Sorry about double posting, but I wanted to start a new thread, which I tried but apparently failed to do last time. Although inspired by and related to the function composition discussion, this is now something different and should not be cluttering the composition thread.) Reading the recent emails in the function composition thread started by Ivan, I realized that my sketch for a composition operator (from yesterday, quoted below) would be much better if it did not actually do function composition . Instead, -> would be quite powerful as 'just' a partial operator -- perhaps even more powerful, as I demonstrate below. However, this is not an argument against @ composition, which might in fact play together with this quite nicely. This allows some nice things with multi-argument functions too. I realize that it may be unlikely that a new operator would be added, but here it is anyway, as food for thought. (With an existing operator, I suspect it would be even less likely, because of precedence rules : ) So, -> would be an operator with a precedence similar to .attribute access (but lower than .attribute): # The simple definition of what it does: arg->func # equivalent to functools.partial(func, arg) This would allow for instance: arg -> spam() -> cheese(kind = 'gouda') -> eggs() which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda')) Or even together together with the proposed @ composition: rms = root @ mean @ square->map # for an iterable non-numpy argument And here's something I find quite interesting. Together with @singledispatch from 3.4 (or possibly an enhanced version using type annotations in the future?), one could add 'third-party methods' to classes in other libraries without monkey patching. A dummy example: from numpy import array my_list = [1,2,3] my_array = array(my_list) my_mean = my_array.mean() # This currently works in numpy from rmslib import rms my_rms = my_array->rms() # efficient rms for numpy arrays my_other_rms = my_list->rms() # rms that works on any iterable One would be able to distinguish between calls to methods and 'third-party methods' based on whether . or -> is used for accessing them, which I think is a good thing. Also, third-party methods would be less likely to mutate the object, just like func(obj) is less likely to mutate obj than obj.method(). See more examples below. I converted my examples from last night to this IMO better version, because at least some of them would still be relevant. On 10.5.2015 2:07, Koos Zevenhoven wrote: > On 10.5.2015 1:03, Gregory Salvan wrote: >> Nobody convinced by arrow operator ? 
>> >> like: arg -> spam -> eggs -> cheese >> or cheese <- eggs <- spam <- arg >> >> > > I like | a lot because of the pipe analogy. However, having a new > operator for this could solve some issues about operator precedence. > > Today, I sketched one possible version that would use a new .. > operator. I'll explain what it would do (but with your -> instead of > my ..) > > Here, the operator (.. or ->) would have a higher precedence than > function calls () but a lower precedence than attribute access > (obj.attr). > > First, with single-argument functions spam, eggs and cheese, and a > non-function arg: > > arg->eggs->spam->cheese() # equivalent to cheese(spam(eggs(arg))) With -> as a partial operator, this would instead be: arg->eggs()->spam()->cheese() # equivalent to cheese(spam(eggs(arg))) > eggs->spam->cheese # equivalent to lambda arg: cheese(spam(eggs(arg))) > With -> as a partial operator this could be: lambda arg: arg->eggs()->spam()->cheese() > Then, if spam and eggs both took two arguments; eggs(arg1, arg2), > spam(arg1, arg2) > > arg->eggs # equivalent to partial(eggs, arg) > eggs->spam(a, b, c) # equivalent to spam(eggs(a, b), c) With -> as a partial operator, the first one would work, and the second would become: eggs(a,b)->spam(c) # equivalent to spam(eggs(a, b), c) > arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c) > This would become: arg->eggs(b)->spam(c) # equivalent to spam(eggs(arg, b), c) Note that this would be quite flexible in partial 'piping' of multi-argument functions. > So you could think of -> as an extended partial operator. And this > would naturally generalize to functions with even more arguments. The > arguments would always be fed in the same order as in the equivalent > function call, which makes for a nice rule of thumb. However, I > suppose one would usually avoid combinations that are difficult to > understand. > > Some examples that this would enable: > > # Example 1 > from numpy import square, mean, sqrt > rms = square->mean->sqrt # I think this order is fine because it is > not @ > This would become: def rms(arr): return arr->square()->mean()->sqrt() > # Example 2 (both are equivalent) > spam(args)->eggs->cheese() # the shell-syntax analogy that Steven > mentioned. > This would be: spam(args)->eggs()->cheese() Of course the shell piping analogy would be quite far, because it looks so different. > # Example 3 > # Last but not least, we would finally have this > some_sequence->len() > some_object->isinstance(MyType) > And: func->map(seq) func->reduce(seq) -- Koos From apieum at gmail.com Mon May 11 01:40:23 2015 From: apieum at gmail.com (Gregory Salvan) Date: Mon, 11 May 2015 01:40:23 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: <554FD0A7.8010606@aalto.fi> References: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com> <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <554FD0A7.8010606@aalto.fi> Message-ID: Nope sorry I've misread your code, but it changes nothing. 
for example with spam(args)->eggs()->cheese() if instead you have: args=something spam = lambda: args spam()->eggs()->cheese() should be treaten as: cheese(eggs(spam())) or cheese(eggs(args)) or partial(cheese) circle partial(eggs) circle partial(spam) ? I don't find this syntax convenient, sorry. 2015-05-10 23:41 GMT+02:00 Koos Zevenhoven : > Hi Gregory, > > Did you look at the new version carefully? If I understand the problem you > are describing (mentioned also by Steven), my previous version had that > issue, but the new one does not. That is why I added examples with callable > arguments :). > > -- Koos > > > > On 11.5.2015 0:23, Gregory Salvan wrote: > > In my opinion, this syntax make problems when your arguments are > functions/callables. > And if you code in a functionnal paradigm it is quite common to inject > functions in arguments otherwise how would you do polymorphism ? > > The only way I see to distinguish cases is to have tuples, but syntax is > quite strange. > > instead of : arg->eggs(b)->spam(c) > my_partial = (arg, b)->eggs->(c, )->spam > > Then how would you call my_partial ? > For example, if you have: > def eggs(a, b, c)... > def spam(d, e)... > > my_partial(c, e) or my_partial(c)(e) ? > > > > 2015-05-10 22:06 GMT+02:00 Koos Zevenhoven : > >> Reading the recent emails in the function composition thread started by >> Ivan, I realized that my below sketch for a composition operator would be >> better if it did not actually do function composition ;). Instead, -> would >> be quite powerful as 'just' a partial operator -- perhaps even more >> powerful, as I demonstrate below. However, this is not an argument against >> @ composition, which might in fact play together with this quite nicely. >> >> This allows some nice things with multi-argument functions too. >> >> I realize that it may be unlikely that a new operator would be added, but >> here it is anyway, as food for thought. (With an existing operator, I >> suspect it would be even less likely, because of precedence rules : ) >> >> So, -> would be an operator with a precedence similar to .attribute >> access (but lower than .attribute): >> >> # The simple definition of what it does: >> arg->func # equivalent to functools.partial(func, arg) >> >> This would allow for instance: >> arg -> spam() -> cheese(kind = 'gouda') -> eggs() >> >> which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda')) >> >> Or even together together with the proposed @ composition: >> rms = root @ mean @ square->map # for an iterable non-numpy argument >> >> And here's something I find quite interesting. Together with >> @singledispatch from 3.4 (or possibly an enhanced version using type >> annotations in the future?), one could add 'third-party methods' to classes >> in other libraries without monkey patching. A dummy example: >> >> from numpy import array >> my_list = [1,2,3] >> my_array = array(my_list) >> my_mean = my_array.mean() # This currently works in numpy >> >> from rmslib import rms >> my_rms = my_array->rms() # efficient rms for numpy arrays >> my_other_rms = my_list->rms() # rms that works on any iterable >> >> One would be able to distinguish between calls to methods and >> 'third-party methods' based on whether . or -> is used for accessing them, >> which I think is a good thing. Also, third-party methods would be less >> likely to mutate the object, just like func(obj) is less likely to mutate >> obj than obj.method(). >> >> See more examples below. 
I converted my examples from last night to this >> IMO better version, because at least some of them would still be relevant. >> >> On 10.5.2015 2:07, Koos Zevenhoven wrote: >> >>> On 10.5.2015 1:03, Gregory Salvan wrote: >>> >>>> Nobody convinced by arrow operator ? >>>> >>>> like: arg -> spam -> eggs -> cheese >>>> or cheese <- eggs <- spam <- arg >>>> >>>> >>>> >>> I like | a lot because of the pipe analogy. However, having a new >>> operator for this could solve some issues about operator precedence. >>> >>> Today, I sketched one possible version that would use a new .. operator. >>> I'll explain what it would do (but with your -> instead of my ..) >>> >>> Here, the operator (.. or ->) would have a higher precedence than >>> function calls () but a lower precedence than attribute access (obj.attr). >>> >>> First, with single-argument functions spam, eggs and cheese, and a >>> non-function arg: >>> >>> arg->eggs->spam->cheese() # equivalent to cheese(spam(eggs(arg))) >>> >> >> With -> as a partial operator, this would instead be: >> >> arg->eggs()->spam()->cheese() # equivalent to cheese(spam(eggs(arg))) >> >> eggs->spam->cheese # equivalent to lambda arg: cheese(spam(eggs(arg))) >>> >>> >> With -> as a partial operator this could be: >> >> lambda arg: arg->eggs()->spam()->cheese() >> >> >> Then, if spam and eggs both took two arguments; eggs(arg1, arg2), >>> spam(arg1, arg2) >>> >>> arg->eggs # equivalent to partial(eggs, arg) >>> eggs->spam(a, b, c) # equivalent to spam(eggs(a, b), c) >>> >> >> With -> as a partial operator, the first one would work, and the second >> would become: >> >> eggs(a,b)->spam(c) # equivalent to spam(eggs(a, b), c) >> >> arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c) >>> >>> >> This would become: >> >> arg->eggs(b)->spam(c) # equivalent to spam(eggs(arg, b), c) >> >> Note that this would be quite flexible in partial 'piping' of >> multi-argument functions. >> >> So you could think of -> as an extended partial operator. And this would >>> naturally generalize to functions with even more arguments. The arguments >>> would always be fed in the same order as in the equivalent function call, >>> which makes for a nice rule of thumb. However, I suppose one would usually >>> avoid combinations that are difficult to understand. >>> >>> Some examples that this would enable: >>> >>> # Example 1 >>> from numpy import square, mean, sqrt >>> rms = square->mean->sqrt # I think this order is fine because it is >>> not @ >>> >>> >> This would become: >> >> def rms(arr): >> return arr->square()->mean()->sqrt() >> >> # Example 2 (both are equivalent) >>> spam(args)->eggs->cheese() # the shell-syntax analogy that Steven >>> mentioned. >>> >>> >> This would be: >> >> spam(args)->eggs()->cheese() >> >> Of course the shell piping analogy would be quite far, because it looks >> so different. >> >> # Example 3 >>> # Last but not least, we would finally have this :) >>> some_sequence->len() >>> some_object->isinstance(MyType) >>> >>> >> And: >> >> func->map(seq) >> func->reduce(seq) >> >> -- Koos >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Mon May 11 03:44:12 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 11 May 2015 11:44:12 +1000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: <554FBA3D.30907@aalto.fi> References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> Message-ID: <20150511014412.GL5663@ando.pearwood.info> On Sun, May 10, 2015 at 11:06:21PM +0300, Koos Zevenhoven wrote: > So, -> would be an operator with a precedence similar to .attribute > access (but lower than .attribute): Dot . is not an operator. If I remember correctly, the docs describe it as a delimiter. > # The simple definition of what it does: > arg->func # equivalent to functools.partial(func, arg) I believe you require that -> is applied before function application, so arg->func # returns partial(func, arg) arg->func(x) # returns partial(func, arg)(x) arg->(func(x)) # returns partial(func(x), arg) > This would allow for instance: > arg -> spam() -> cheese(kind = 'gouda') -> eggs() I am having a lot of difficulty seeing that as anything other than "call spam with no arguments, then apply arg to the result". But, teasing it apart with the precedence I established above: arg->spam() # returns partial(spam, arg)() == spam(arg) """ -> cheese # returns partial(cheese, spam(arg)) """ (kind='gouda') # returns partial(cheese, spam(arg))(kind='gouda') # == cheese(spam(arg), kind='gouda') """ -> eggs # returns partial(eggs, cheese(spam(arg), kind='gouda')) """ () # calls the previous partial, with no arguments, giving: # partial(eggs, cheese(spam(arg), kind='gouda'))() # == eggs(cheese(spam(arg), kind='gouda')) > which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda')) Amazingly, you are correct! :-) I think this demonstrates an abuse of partial and the sort of thing that gives functional idioms a bad name. To tease this apart and understand what it does was very difficult to me. And I don't understand the point of creating partial applications that you are then immediately going to call, that just adds an extra layer of indirection to slow the code down. If you write partial(len, 'foo')() instead of just len('foo'), something has gone drastically wrong. So instead of arg->spam()->cheese(kind='gouda')->eggs() which includes *three* partial objects which are immediately called, wouldn't it be easier to just call the functions in the first place? eggs(cheese(spam(arg), kind='gouda')) It will certainly be more efficient! Let's run through a simple chain with no parens: a -> b # partial(b, a) a -> b -> c # partial(c, partial(b, a)) a -> b -> c -> d # partial(d, partial(c, partial(b, a))) I'm not seeing why I would want to write something like that. Let's apply multiple arguments: a -> func # partial(func, a) b -> (a -> func) # partial(partial(func, a), b) c -> (b -> (a -> func)) # partial(partial(partial(func, a), b), c) Perhaps a sufficiently clever implementation of partial could optimize partial(partial(func, a), b) to just a single layer of indirection partial(func, a, b), so it's not *necessarily* as awful as it looks. (I would expect a function composition operator to do the same.) 
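As it happens, CPython's partial already collapses a partial wrapped directly around another partial, at least when only plain positional arguments are involved; a quick check (treat the collapsing itself as an implementation detail, not a documented guarantee):

from functools import partial

def add3(a, b, c):
    return a + b + c

inner = partial(add3, 1)
outer = partial(inner, 2)

print(outer(3))            # 6 either way
print(outer.func is add3)  # True on CPython: the nested partial is flattened
print(outer.args)          # (1, 2)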
Note that we have to write the second argument first, and bracket the second arrow clause. Writing it the "obvious" way is wrong: a -> b -> func # partial(func, partial(b, a)) I think this is imaginative but hard to read, hard to understand, hard to use correctly, inefficient, and even if used correctly, there are not very many times that you would need it. > Or even together together with the proposed @ composition: > rms = root @ mean @ square->map # for an iterable non-numpy argument I think that a single arrow may be reasonable as syntactic sugar for partial, but once you start chaining them, it all falls apart into a mess. That, in my mind, is a sign that the idea doesn't scale. We can chain dots with no problem: fe.fi.fo.fum and function calls in numerous ways: foo(bar(baz())) foo(bar)(baz) and although they can get hard to read just because of the sheer number of components, they are not conceptually difficult. But chaining arrows is conceptually difficult even with as few as two arrows. I think the problem here is that partial application is an N-ary operation. This is not Haskell where single-argument currying is enforced everywhere! You're trying to perform something which conceptually takes N arguments partial(func, 1, 2, 3, ..., N) using only a operator which can only take two arguments a->b. Things are going to get messy. > And here's something I find quite interesting. Together with > @singledispatch from 3.4 (or possibly an enhanced version using type > annotations in the future?), one could add 'third-party methods' to > classes in other libraries without monkey patching. A dummy example: > > from numpy import array > my_list = [1,2,3] > my_array = array(my_list) > my_mean = my_array.mean() # This currently works in numpy > > from rmslib import rms > my_rms = my_array->rms() # efficient rms for numpy arrays > my_other_rms = my_list->rms() # rms that works on any iterable That looks cute, but isn't very interesting. Effectively, you've invented a new (and less efficient) syntax for calling a function: spam->eggs(cheese) # eggs(spam, cheese) It's less efficient because it builds a partial object first, so instead of one call you end up with two, and a temporary object that gets thrown away immediately after it is used. Yes, you could keep the partial object around, but as your example shows, you don't. And because it is cute, people will write: a->func(), b->func(), c->func() and not realise that it creates three partial functions before calling them. Writing: func(a), func(b), func(c) will avoid that needless overhead. -- Steve From larocca at abiresearch.com Mon May 11 04:53:29 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Mon, 11 May 2015 02:53:29 +0000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: <20150511014412.GL5663@ando.pearwood.info> References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi>,<20150511014412.GL5663@ando.pearwood.info> Message-ID: <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> I agree here--I don't think a special operator for functools.partial is desirable. 
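The compose() helper called in the snippets below is defined elsewhere in the thread rather than quoted here; a minimal left-to-right sketch consistent with how it is used below (an assumption for illustration, not necessarily the actual implementation) would be:

from functools import reduce

def compose(*funcs):
    # Left-to-right: compose(f, g, h)(x) == h(g(f(x)))
    def composed(*args, **kwargs):
        first, *rest = funcs
        return reduce(lambda value, f: f(value), rest, first(*args, **kwargs))
    return composed

print(compose(str.strip, str.upper)('  spam  '))   # SPAM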
The proposal seems to suggest something between an ordinary lambda expression and Haskell's (>>=) bind. I expect the arrow to work as it does in Haskell and julia for anonymous functions x, *xs -> Or in (very-)pseudo notation (->) argspec expr Then bind (>>=) also comes to mind because it takes a value on the left and a function on the right. But doesn't have the nice things you get with monads. These become non-issues if functions either explicitly accept and bind one argument at a time (currying/incremental binding), or if a @curried decorator is used. Or Gregory's @arrow decorator (which I've just now discovered!). So arg -> spam() -> cheese(kind = 'gouda') -> eggs() would be (with composition) written as compose(spam, cheese(kind='gouda'), eggs)(arg) If you want to wrap `cheese` to avoid the awkwardness, you can do >>> cheese_kind = lambda kind: lambda *args, kind=kind, **kwargs: cheese(kind=kind)(*args, **kwargs) >>> compose(spam, cheese_kind('gouda'), eggs)(arg) Then if you don't like two explicit sequential function calls, i.e. f(x)(y), there are ways to sugar it up, like def single_value_pipeline(fn): def wrapper(x, **kwargs): return compose(lambda *_: x, *fn(**kwargs))() return wrapper Which would hide `compose` altogether (very anti-PEP8!) and allow binding keyword names across the pipeline: @single_value_pipeline def breakfast(cheese_kind='gouda'): return (spam, cheese(kind=cheese_kind), eggs) breakfast(arg, kind='something other than gouda') # what is gouda anyway?! (`single_value_pipeline` is perhaps a bad name though...) ________________________________________ From: Python-ideas on behalf of Steven D'Aprano Sent: Sunday, May 10, 2015 9:44 PM To: python-ideas at python.org Subject: Re: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] On Sun, May 10, 2015 at 11:06:21PM +0300, Koos Zevenhoven wrote: > So, -> would be an operator with a precedence similar to .attribute > access (but lower than .attribute): Dot . is not an operator. If I remember correctly, the docs describe it as a delimiter. > # The simple definition of what it does: > arg->func # equivalent to functools.partial(func, arg) I believe you require that -> is applied before function application, so arg->func # returns partial(func, arg) arg->func(x) # returns partial(func, arg)(x) arg->(func(x)) # returns partial(func(x), arg) > This would allow for instance: > arg -> spam() -> cheese(kind = 'gouda') -> eggs() I am having a lot of difficulty seeing that as anything other than "call spam with no arguments, then apply arg to the result". But, teasing it apart with the precedence I established above: arg->spam() # returns partial(spam, arg)() == spam(arg) """ -> cheese # returns partial(cheese, spam(arg)) """ (kind='gouda') # returns partial(cheese, spam(arg))(kind='gouda') # == cheese(spam(arg), kind='gouda') """ -> eggs # returns partial(eggs, cheese(spam(arg), kind='gouda')) """ () # calls the previous partial, with no arguments, giving: # partial(eggs, cheese(spam(arg), kind='gouda'))() # == eggs(cheese(spam(arg), kind='gouda')) > which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda')) Amazingly, you are correct! :-) I think this demonstrates an abuse of partial and the sort of thing that gives functional idioms a bad name. To tease this apart and understand what it does was very difficult to me. 
And I don't understand the point of creating partial applications that you are then immediately going to call, that just adds an extra layer of indirection to slow the code down. If you write partial(len, 'foo')() instead of just len('foo'), something has gone drastically wrong. So instead of arg->spam()->cheese(kind='gouda')->eggs() which includes *three* partial objects which are immediately called, wouldn't it be easier to just call the functions in the first place? eggs(cheese(spam(arg), kind='gouda')) It will certainly be more efficient! Let's run through a simple chain with no parens: a -> b # partial(b, a) a -> b -> c # partial(c, partial(b, a)) a -> b -> c -> d # partial(d, partial(c, partial(b, a))) I'm not seeing why I would want to write something like that. Let's apply multiple arguments: a -> func # partial(func, a) b -> (a -> func) # partial(partial(func, a), b) c -> (b -> (a -> func)) # partial(partial(partial(func, a), b), c) Perhaps a sufficiently clever implementation of partial could optimize partial(partial(func, a), b) to just a single layer of indirection partial(func, a, b), so it's not *necessarily* as awful as it looks. (I would expect a function composition operator to do the same.) Note that we have to write the second argument first, and bracket the second arrow clause. Writing it the "obvious" way is wrong: a -> b -> func # partial(func, partial(b, a)) I think this is imaginative but hard to read, hard to understand, hard to use correctly, inefficient, and even if used correctly, there are not very many times that you would need it. > Or even together together with the proposed @ composition: > rms = root @ mean @ square->map # for an iterable non-numpy argument I think that a single arrow may be reasonable as syntactic sugar for partial, but once you start chaining them, it all falls apart into a mess. That, in my mind, is a sign that the idea doesn't scale. We can chain dots with no problem: fe.fi.fo.fum and function calls in numerous ways: foo(bar(baz())) foo(bar)(baz) and although they can get hard to read just because of the sheer number of components, they are not conceptually difficult. But chaining arrows is conceptually difficult even with as few as two arrows. I think the problem here is that partial application is an N-ary operation. This is not Haskell where single-argument currying is enforced everywhere! You're trying to perform something which conceptually takes N arguments partial(func, 1, 2, 3, ..., N) using only a operator which can only take two arguments a->b. Things are going to get messy. > And here's something I find quite interesting. Together with > @singledispatch from 3.4 (or possibly an enhanced version using type > annotations in the future?), one could add 'third-party methods' to > classes in other libraries without monkey patching. A dummy example: > > from numpy import array > my_list = [1,2,3] > my_array = array(my_list) > my_mean = my_array.mean() # This currently works in numpy > > from rmslib import rms > my_rms = my_array->rms() # efficient rms for numpy arrays > my_other_rms = my_list->rms() # rms that works on any iterable That looks cute, but isn't very interesting. Effectively, you've invented a new (and less efficient) syntax for calling a function: spam->eggs(cheese) # eggs(spam, cheese) It's less efficient because it builds a partial object first, so instead of one call you end up with two, and a temporary object that gets thrown away immediately after it is used. 
Yes, you could keep the partial object around, but as your example shows,
you don't. And because it is cute, people will write:

a->func(), b->func(), c->func()

and not realise that it creates three partial functions before calling
them. Writing:

func(a), func(b), func(c)

will avoid that needless overhead.


-- 
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

From guettliml at thomas-guettler.de  Mon May 11 10:42:16 2015
From: guettliml at thomas-guettler.de (Thomas Güttler)
Date: Mon, 11 May 2015 10:42:16 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554A1F8C.1040005@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
Message-ID: <55506B68.90504@thomas-guettler.de>

Hi,

for this case, the sys.path modification was solved like this:

-sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
+sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path

See https://github.com/pypa/pip/issues/2759

On 06.05.2015 at 16:05, Thomas Güttler wrote:
> I am missing a policy for how sys.path should be altered.
>
> We run a custom subclass of list in sys.path. We set it in sitecustomize.py
>
> This instance gets replaced by a plain list in lines like this:
>
> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>
> The above line is from pip, and similar things happen in a lot of packages.
>
> Before trying to solve this with code, I think the Python community should agree on a policy for altering sys.path.
>
> What can I do to get this done?
>
> We use Python 2.7.
>
>
> Related: http://bugs.python.org/issue24135
>
> Regards,
> Thomas Güttler
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From guido at python.org  Mon May 11 16:41:01 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 11 May 2015 07:41:01 -0700
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
 <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
Message-ID:

As long as I'm "in charge" the chances of this (or anything like it) being
accepted into Python are zero. I get a headache when I try to understand
code that uses function composition, and I end up having to laboriously
rewrite it using more traditional call notation before I move on to
understanding what it actually does. Python is not Haskell, and perhaps
more importantly, Python users are not like Haskell users. Either way, what
may work out beautifully in Haskell will be like a fish out of water in
Python.

I understand that it's fun to try to solve this puzzle, but evolving Python
is more than solving puzzles.
Enjoy debating the puzzle, but in the end Python will survive without the
solution.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From apieum at gmail.com  Mon May 11 18:13:28 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Mon, 11 May 2015 18:13:28 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To:
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
 <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
Message-ID:

I don't want to insist and I respect your point of view, I just want to
give a simplified real life example to show that function composition can
be less painful than another syntax.

When validating a lot of data you may want to reuse parts of already
written validators. It can also be a mess to test complex data validation.
You can reduce this mess and reuse parts of your code by writing atomic
validators and composing them.

# sorry for using my own lib, but if I make no mistakes this code
functions, so...

import re
from lawvere import curry  # curry is an arrow without type checking, inherits composition, multiple dispatch

user_match = re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match
domain_match = re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match
strict_user_match = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match

@curry
def is_string(value):
    assert isinstance(value, str), '%s is not a string' % value
    return value

@curry
def apply_until_char(func, char, value):
    func(value[:value.index(char)])
    return value

@curry
def apply_from_char(func, char, value):
    func(value[value.index(char) + 1:])
    return value

@curry
def has_char(char, value):
    assert value.count(char) == 1
    return value

@curry
def assert_ends_with(text, value):
    assert value.endswith(text), '%s does not end with %s' % (value, text)
    return value

@curry
def assert_user(user):
    assert user_match(user) is not None, '%s is not a valid user name' % user
    return user

@curry
def assert_strict_user(user):
    assert strict_user_match(user) is not None, '%s is not a valid strict user' % user
    return user

@curry
def assert_domain(domain):
    assert domain_match(domain) is not None, '%s is not a valid domain name' % domain
    return domain

# currying (can be made with partial)
has_user = apply_until_char(assert_user, '@')
has_strict_user = apply_until_char(assert_strict_user, '@')
has_domain = apply_from_char(assert_domain, '@')

# composition:
is_email_address = is_string >> has_char('@') >> has_user >> has_domain
is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> has_domain

# we just want org addresses?
is_org_address = is_email_address >> assert_ends_with('.org')


I found a lot of interest in this syntax, mainly for testing purposes,
readability and maintainability of code.
No matter if I'm a fish out of Python waters. :)




2015-05-11 16:41 GMT+02:00 Guido van Rossum :
> As long as I'm "in charge" the chances of this (or anything like it) being
> accepted into Python are zero.
I get a headache when I try to understand > code that uses function composition, and I end up having to laboriously > rewrite it using more traditional call notation before I move on to > understanding what it actually does. Python is not Haskell, and perhaps > more importantly, Python users are not like Haskel users. Either way, what > may work out beautifully in Haskell will be like a fish out of water in > Python. > > I understand that it's fun to try to sole this puzzle, but evolving Python > is more than solving puzzles. Enjoy debating the puzzle, but in the end > Python will survive without the solution. > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larocca at abiresearch.com Mon May 11 19:08:54 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Mon, 11 May 2015 17:08:54 +0000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> , Message-ID: Operator overloading (>>) has intuitive readability but in my experience it's better have functions remain "ordinary" functions, not class instances so you know what to expect regarding the type and so on. The other downside is that with (>>) only the functions you wrap can play together. Leaving aside the readability concern, the really major problem is that your tracebacks are so badly mangled. And if your implementation of the composition function uses recursion it gets even worse. You also lose the benefits of reflection/inspection--for example, with the code below, what happens if I call help ?? in ipython on `is_email_address`? ________________________________ From: Python-ideas on behalf of Gregory Salvan Sent: Monday, May 11, 2015 12:13 PM To: Guido van Rossum Cc: python-ideas at python.org Subject: Re: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] I don't want to insist and I respect your point of view, I just want to give a simplified real life example to show that function composition can be less painful than another syntax. When validating a lot of data you may want to reuse parts of already writen validators. It can also be a mess to test complex data validation. You can reduce this mess and reuse parts of your code by writing atomic validators and compose them. # sorry for using my own lib, but if I make no mistakes this code functions, so... 
import re from lawvere import curry # curry is an arrow without type checking, inherits composition, mutiple dispatch user_match = re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match domain_match = re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match strict_user_match = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match @curry def is_string(value): assert isinstance(value, str), '%s is not a string' %value return value @curry def apply_until_char(func, char, value): func(value[:value.index(char)]) return value @curry def apply_from_char(func, char, value): func(value[value.index(char) + 1:]) return value @curry def has_char(char, value): assert value.count(char) == 1 return value @curry def assert_ends_with(text, value): assert value.endswith(text), '%s do not ends with %s' % (value, text) return value @curry def assert_user(user): assert user_match(user) is not None, '%s is not a valid user name' % value return user @curry def assert_strict_user(user): assert strict_user_match(user) is not None, '%s is not a valid strict user' % value return user @curry def assert_domain(domain): assert domain_match(domain) is not None, '%s is not a valid domain name' % value return domain # currying (be made with partial) has_user = apply_until_char(assert_user, '@') has_strict_user = apply_until_char(assert_strict_user, '@') has_domain = apply_from_char(assert_domain, '@') # composition: is_email_address = is_string >> has_char('@') >> has_user >> has_domain is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> has_domain # we just want org adresses ? is_org_addess = is_email_address >> assert_ends_with('.org') I found a lot of interest in this syntax, mainly for testing purpose, readability and maintenability of code. No matters if I'm a fish out of python waters. :) 2015-05-11 16:41 GMT+02:00 Guido van Rossum >: As long as I'm "in charge" the chances of this (or anything like it) being accepted into Python are zero. I get a headache when I try to understand code that uses function composition, and I end up having to laboriously rewrite it using more traditional call notation before I move on to understanding what it actually does. Python is not Haskell, and perhaps more importantly, Python users are not like Haskel users. Either way, what may work out beautifully in Haskell will be like a fish out of water in Python. I understand that it's fun to try to sole this puzzle, but evolving Python is more than solving puzzles. Enjoy debating the puzzle, but in the end Python will survive without the solution. -- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Mon May 11 19:45:35 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 11 May 2015 13:45:35 -0400 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: On 5/11/2015 10:41 AM, Guido van Rossum wrote: > As long as I'm "in charge" the chances of this (or anything like it) > being accepted into Python are zero. I have been waiting for this response (which I agree with). By 'this', I presume you mean either more new syntax other than '@', or official support of '@' other than for matrix or array multiplication. > I get a headache when I try to > understand code that uses function composition, Function composition is the *process* of using the output of one function (broadly speaking) as the input (or one of the inputs) of another function. All python code does this. The discussion is about adding a composition operator or function or notation (and accoutrements) as a duplicate *syntax* for expressing composition. As I posted before, mathematician's usually define the operator in terms of call syntax, which can also express composition. > and I end up having to > laboriously rewrite it using more traditional call notation before I > move on to understanding what it actually does. Mathematicians do rewrites also ;-). The proof of (f @ g) @ h = f @ (g @ h) (associativity) is that ((f @ g) @ h)(x) and (f @ (g @ h))(x) can both be rewritten as f(g(h(x))). > I understand that it's fun to try to sole this puzzle, but evolving > Python is more than solving puzzles. Leaving aside the problem of stack overflow, one can rewrite "for x in iterable: process x" to perform the same computational process with recursive syntax (using iter and next and catching StopIteration). But one would have to be really stuck on the recursive syntax, as opposed to the inductive process, to use it in practice. -- Terry Jan Reedy From apieum at gmail.com Mon May 11 19:46:00 2015 From: apieum at gmail.com (Gregory Salvan) Date: Mon, 11 May 2015 19:46:00 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: 'is_email_address' is a special tuple which contains functions. (is_email_address[0] returns is_string) For help, it's a feature I've not implemented but it's easy to return the help of each function, plus details as each function has an object representing it's signature. 
For traceback mangling, I don't see what is the problem. When you call is_email_address(something) it pretty like if you've called: def is_email_address(value): is_string(value) has_char('@', value) has_user(value) has_domain(value) return value 2015-05-11 19:08 GMT+02:00 Douglas La Rocca : > Operator overloading (>>) has intuitive readability but in my experience > it's better have functions remain "ordinary" functions, not class > instances so you know what to expect regarding the type and so on. The > other downside is that with (>>) only the functions you wrap can play > together. > > > Leaving aside the readability concern, the really major problem is that > your tracebacks are so badly mangled. And if your implementation of > the composition function uses recursion it gets even worse. > > > You also lose the benefits of reflection/inspection--for example, with > the code below, what happens if I call help ?? in ipython on ` > is_email_address`? > > > ------------------------------ > *From:* Python-ideas abiresearch.com at python.org> on behalf of Gregory Salvan > *Sent:* Monday, May 11, 2015 12:13 PM > *To:* Guido van Rossum > *Cc:* python-ideas at python.org > *Subject:* Re: [Python-ideas] Partial operator (and 'third-party methods' > and 'piping') [was Re: Function composition (was no subject)] > > I don't want to insist and I respect your point of view, I just want > to give a simplified real life example to show that function composition > can be less painful than another syntax. > > When validating a lot of data you may want to reuse parts of already > writen validators. It can also be a mess to test complex data validation. > You can reduce this mess and reuse parts of your code by writing atomic > validators and compose them. > > # sorry for using my own lib, but if I make no mistakes this code > functions, so... 
> > import re > from lawvere import curry # curry is an arrow without type checking, > inherits composition, mutiple dispatch > > user_match = > re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match > domain_match = > re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match > strict_user_match = > re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match > > @curry > def is_string(value): > assert isinstance(value, str), '%s is not a string' %value > return value > > @curry > def apply_until_char(func, char, value): > func(value[:value.index(char)]) > return value > > @curry > def apply_from_char(func, char, value): > func(value[value.index(char) + 1:]) > return value > > @curry > def has_char(char, value): > assert value.count(char) == 1 > return value > > @curry > def assert_ends_with(text, value): > assert value.endswith(text), '%s do not ends with %s' % (value, text) > return value > > @curry > def assert_user(user): > assert user_match(user) is not None, '%s is not a valid user name' % > value > return user > > @curry > def assert_strict_user(user): > assert strict_user_match(user) is not None, '%s is not a valid strict > user' % value > return user > > @curry > def assert_domain(domain): > assert domain_match(domain) is not None, '%s is not a valid domain > name' % value > return domain > > # currying (be made with partial) > has_user = apply_until_char(assert_user, '@') > has_strict_user = apply_until_char(assert_strict_user, '@') > has_domain = apply_from_char(assert_domain, '@') > > # composition: > is_email_address = is_string >> has_char('@') >> has_user >> has_domain > is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> > has_domain > > # we just want org adresses ? > is_org_addess = is_email_address >> assert_ends_with('.org') > > > I found a lot of interest in this syntax, mainly for testing purpose, > readability and maintenability of code. > No matters if I'm a fish out of python waters. :) > > > > > 2015-05-11 16:41 GMT+02:00 Guido van Rossum : > >> As long as I'm "in charge" the chances of this (or anything like it) >> being accepted into Python are zero. I get a headache when I try to >> understand code that uses function composition, and I end up having to >> laboriously rewrite it using more traditional call notation before I move >> on to understanding what it actually does. Python is not Haskell, and >> perhaps more importantly, Python users are not like Haskel users. Either >> way, what may work out beautifully in Haskell will be like a fish out of >> water in Python. >> >> I understand that it's fun to try to sole this puzzle, but evolving >> Python is more than solving puzzles. Enjoy debating the puzzle, but in the >> end Python will survive without the solution. >> >> -- >> --Guido van Rossum (python.org/~guido) >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Mon May 11 19:49:10 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 11 May 2015 10:49:10 -0700 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: On Mon, May 11, 2015 at 10:45 AM, Terry Reedy wrote: > On 5/11/2015 10:41 AM, Guido van Rossum wrote: > >> As long as I'm "in charge" the chances of this (or anything like it) >> being accepted into Python are zero. >> > > I have been waiting for this response (which I agree with). > By 'this', I presume you mean either more new syntax other than '@', or > official support of '@' other than for matrix or array multiplication. > Or even adding a compose() function (or similar) to the stdlib. I'm sorry, I don't have time to argue about this. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon May 11 19:54:50 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 11 May 2015 13:54:50 -0400 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: On 5/11/2015 12:13 PM, Gregory Salvan wrote: > I don't want to insist and I respect your point of view, I just want to > give a simplified real life example to show that function composition > can be less painful than another syntax. > > When validating a lot of data you may want to reuse parts of already > writen validators. It can also be a mess to test complex data validation. > You can reduce this mess and reuse parts of your code by writing atomic > validators and compose them. > > # sorry for using my own lib, but if I make no mistakes this code > functions, so... 
> > import re > from lawvere import curry # curry is an arrow without type checking, > inherits composition, mutiple dispatch > > user_match = > re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match > domain_match = > re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match > strict_user_match = > re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match > > @curry > def is_string(value): > assert isinstance(value, str), '%s is not a string' %value > return value > > @curry > def apply_until_char(func, char, value): > func(value[:value.index(char)]) > return value > > @curry > def apply_from_char(func, char, value): > func(value[value.index(char) + 1:]) > return value > > @curry > def has_char(char, value): > assert value.count(char) == 1 > return value > > @curry > def assert_ends_with(text, value): > assert value.endswith(text), '%s do not ends with %s' % (value, text) > return value > > @curry > def assert_user(user): > assert user_match(user) is not None, '%s is not a valid user name' > % value > return user > > @curry > def assert_strict_user(user): > assert strict_user_match(user) is not None, '%s is not a valid > strict user' % value > return user > > @curry > def assert_domain(domain): > assert domain_match(domain) is not None, '%s is not a valid domain > name' % value > return domain > > # currying (be made with partial) > has_user = apply_until_char(assert_user, '@') > has_strict_user = apply_until_char(assert_strict_user, '@') > has_domain = apply_from_char(assert_domain, '@') > > # composition: > is_email_address = is_string >> has_char('@') >> has_user >> has_domain > is_strict_email_address = is_string >> has_char('@') >> has_strict_user > >> has_domain > > # we just want org adresses ? > is_org_addess = is_email_address >> assert_ends_with('.org') > > > I found a lot of interest in this syntax, mainly for testing purpose, > readability and maintenability of code. > No matters if I'm a fish out of python waters. :) You could do much the same with standard syntax by writing an str subclass with multiple methods that return self, and then chain together the method calls. class VString: # verifiable string def has_char_once(self, char): assert self.count(char) == 1 return self ... def is_email_address(self): # or make standalone return self.has_char_once('@').has_user().has_domain() data = VString(input()) data.is_email() -- Terry Jan Reedy From tjreedy at udel.edu Mon May 11 20:21:28 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 11 May 2015 14:21:28 -0400 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: On 5/11/2015 1:49 PM, Guido van Rossum wrote: > On Mon, May 11, 2015 at 10:45 AM, Terry Reedy > > wrote: > > On 5/11/2015 10:41 AM, Guido van Rossum wrote: > > As long as I'm "in charge" the chances of this (or anything like it) > being accepted into Python are zero. 
> > I have been waiting for this response (which I agree with).
> By 'this', I presume you mean either more new syntax other than '@',
> or official support of '@' other than for matrix or array
> multiplication.
>
> Or even adding a compose() function (or similar) to the stdlib.
> I'm sorry, I don't have time to argue about this.

-- 
Terry Jan Reedy

From abarnert at yahoo.com  Mon May 11 20:25:24 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 11 May 2015 18:25:24 +0000 (UTC)
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To:
References:
Message-ID: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>

On Monday, May 11, 2015 9:15 AM, Gregory Salvan wrote:

>I don't want to insist and I respect your point of view, I just want to give a simplified real life example to show that function composition can be less painful than another syntax.

OK, let's compare your example to a Pythonic implementation of the same thing.

import re

ruser = re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$")
rdomain = re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
rstrict_user = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$")


def is_email_address(addr):
    user, domain = addr.split('@', 1)
    return ruser.match(user) and rdomain.match(domain)

def is_strict_email_address(addr):
    user, domain = addr.split('@', 1)
    return rstrict_user.match(user) and rdomain.match(domain)


def is_org_address(addr):
    return is_email_address(addr) and addr.endswith('.org')

(An even better solution, given that you're already using regexps, might be
to just use a single regexp with named groups for the user or strict-user,
full domain, and TLD, but I've left yours alone.)

Far from being more painful, the Pythonic version is easier to write,
easier to read, easier to debug, shorter, and understandable to even a
novice, without having to rewrite anything in your head. It also handles
invalid input by returning failure values and/or raising appropriate
exceptions rather than asserting and exiting. And it's almost certainly
going to be significantly more efficient. And it works with any string-like
type (that is, any type that has a .split method and works with re.match).
And if you have to debug something, you will have, e.g., values named user
and domain, rather than both being named value at different levels on the
call stack.

If you really want to come up with a convincing example for your idea, I'd
take an example out of Learn You a Haskell or another book or tutorial and
translate that to Python with your library. I suspect it would still have
some of the same problems, but this example wouldn't even really be good in
Haskell, so it's just making it harder to see why anyone would want
anything like it. And by offering this as the response to Guido's "You're
never going to convince me," well, if he _was_ still reading this thread
with an open mind, he probably isn't anymore (although, to be honest, he
probably wasn't reading it anyway).
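(For completeness, the single-regexp variant mentioned above could look
roughly like this; the group names, the helper name and the .org check are
only meant as an illustration:)

import re

remail = re.compile(
    r"^(?P<user>[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*)"
    r"@(?P<domain>(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
    r"(?P<tld>[a-z0-9](?:[a-z0-9-]*[a-z0-9])?))$")

def is_org_address2(addr):
    # same idea as is_org_address above, but one pattern, and the named
    # groups report exactly which part matched
    m = remail.match(addr)
    return m is not None and m.group('tld') == 'org'

print(is_org_address2('spam@example.org'))  # True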
>import re > >from lawvere import curry # curry is an arrow without type checking, inherits composition, mutiple dispatch > >user_match = re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match >domain_match = re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match >strict_user_match = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match > >@curry>def is_string(value): > assert isinstance(value, str), '%s is not a string' %value > return value > >@curry >def apply_until_char(func, char, value): > func(value[:value.index(char)]) > return value > >@curry >def apply_from_char(func, char, value): > func(value[value.index(char) + 1:]) > return value > >@curry > >def has_char(char, value): > assert value.count(char) == 1 > return value > >@curry >def assert_ends_with(text, value): > assert value.endswith(text), '%s do not ends with %s' % (value, text) > return value > >@curry >def assert_user(user): > assert user_match(user) is not None, '%s is not a valid user name' % value > return user > >@curry >def assert_strict_user(user): > assert strict_user_match(user) is not None, '%s is not a valid strict user' % value > return user > >@curry >def assert_domain(domain): > assert domain_match(domain) is not None, '%s is not a valid domain name' % value > return domain > ># currying (be made with partial) > >has_user = apply_until_char(assert_user, '@') > >has_strict_user = apply_until_char(assert_strict_user, '@') > >has_domain = apply_from_char(assert_domain, '@') > > ># composition: > >is_email_address = is_string >> has_char('@') >> has_user >> has_domain > >is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> has_domain > > ># we just want org adresses ? > >is_org_addess = is_email_address >> assert_ends_with('.org') > > > > >I found a lot of interest in this syntax, mainly for testing purpose, readability and maintenability of code. > >No matters if I'm a fish out of python waters. :) > > > > > > > > >2015-05-11 16:41 GMT+02:00 Guido van Rossum : > >As long as I'm "in charge" the chances of this (or anything like it) being accepted into Python are zero. I get a headache when I try to understand code that uses function composition, and I end up having to laboriously rewrite it using more traditional call notation before I move on to understanding what it actually does. Python is not Haskell, and perhaps more importantly, Python users are not like Haskel users. Either way, what may work out beautifully in Haskell will be like a fish out of water in Python. >> >>I understand that it's fun to try to sole this puzzle, but evolving Python is more than solving puzzles. Enjoy debating the puzzle, but in the end Python will survive without the solution. 
>> >> >> >>-- >> >>--Guido van Rossum (python.org/~guido) >>_______________________________________________ >>Python-ideas mailing list >>Python-ideas at python.org >>https://mail.python.org/mailman/listinfo/python-ideas >>Code of Conduct: http://python.org/psf/codeofconduct/ >> > > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ > > From abarnert at yahoo.com Mon May 11 20:43:17 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 11 May 2015 18:43:17 +0000 (UTC) Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: Message-ID: <1134540450.4344373.1431369797892.JavaMail.yahoo@mail.yahoo.com> On Monday, May 11, 2015 10:46 AM, Terry Reedy wrote: > On 5/11/2015 10:41 AM, Guido van Rossum wrote: >> As long as I'm "in charge" the chances of this (or anything > like it) >> being accepted into Python are zero. > > I have been waiting for this response (which I agree with). > By 'this', I presume you mean either more new syntax other than > '@', or > official support of '@' other than for matrix or array multiplication. I don't think it's worth trying to push for this directly in Python, even with the @ operator or a functools.compose function, even if someone thinks they've solved all the problems. If anyone really wants this feature, the obvious thing to do at this point is to prepare a NumPy-wrapper library that adds __matmul__ and __rmatmul__ to ufuncs, and some examples, convince the NumPy team to accept it, and then, once it becomes idiomatic in NumPy code, come back to python-ideas. Maybe there is nothing about function composition which inherently requires broadcast-style operations to make it useful, but the only decent examples anyone's come up with in this thread (root-mean-square) all do, which has to mean something. And the NumPy core devs haven't explicitly announced that they don't want to be convinced. >> I get a headache when I try to >> understand code that uses function composition, > > Function composition is the *process* of using the output of one > function (broadly speaking) as the input (or one of the inputs) of > another function. All python code does this. The discussion is about > adding a composition operator or function or notation (and > accoutrements) as a duplicate *syntax* for expressing composition. As I > posted before, mathematician's usually define the operator in terms of > call syntax, which can also express composition. > >> and I end up having to >> laboriously rewrite it using more traditional call notation before I >> move on to understanding what it actually does. > > Mathematicians do rewrites also ;-). > The proof of (f @ g) @ h = f @ (g @ h) (associativity) is that > ((f @ g) @ h)(x) and (f @ (g @ h))(x) can both be rewritten as > f(g(h(x))). > >> I understand that it's fun to try to sole this puzzle, but evolving >> Python is more than solving puzzles. > > Leaving aside the problem of stack overflow, one can rewrite "for x in > iterable: process x" to perform the same computational process with > recursive syntax (using iter and next and catching StopIteration). But > one would have to be really stuck on the recursive syntax, as opposed to > the inductive process, to use it in practice. 
> > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From levkivskyi at gmail.com Mon May 11 21:00:30 2015 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 11 May 2015 21:00:30 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] Message-ID: Dear Guido, 1. The longest program that I had written in Haskell was 4 lines. 2. You don't need to accept anything, everything is already accepted. Namely, @one @two @three def fun(x): ... already means fun = one(two(three(fun))) Also now we have @ operator. 3. My idea in its current state is to overload @ to allow piping of arbitrary transformers of iterables, not only multiplication of matrices. Semantics is the same: matrix is something that takes a vector and returns a vector and multiplication of matrices is exactly "piping" the corresponding transformations. I now think one does not need any partial applications or something similar. The rules should be the same as for decorators. If I write: @deco(arg) def fun(x): ... it is my duty to be sure that deco(arg) evaluates to something that takes one function and returns one function. Same should be for vector-transformers, each should be "one vector in - one out". 4. Since you don't want this in stdlib, let's move this discussion to Numpy lists. 5. I never thought that evolving Python is solving puzzles. My intention was helping people that might have same problems with me. If it is not the best place to do so, sorry for disturbing. Date: Mon, 11 May 2015 07:41:01 -0700 > From: Guido van Rossum > To: "python-ideas at python.org" > Subject: Re: [Python-ideas] Partial operator (and 'third-party > methods' and 'piping') [was Re: Function composition (was no > subject)] > Message-ID: > z-XyLCosTpp1g at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > As long as I'm "in charge" the chances of this (or anything like it) being > accepted into Python are zero. I get a headache when I try to understand > code that uses function composition, and I end up having to laboriously > rewrite it using more traditional call notation before I move on to > understanding what it actually does. Python is not Haskell, and perhaps > more importantly, Python users are not like Haskel users. Either way, what > may work out beautifully in Haskell will be like a fish out of water in > Python. > > I understand that it's fun to try to sole this puzzle, but evolving Python > is more than solving puzzles. Enjoy debating the puzzle, but in the end > Python will survive without the solution. > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apieum at gmail.com Mon May 11 21:44:19 2015 From: apieum at gmail.com (Gregory Salvan) Date: Mon, 11 May 2015 21:44:19 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com> References: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com> Message-ID: Andrew Barnet we disagree. In your example you have no information about if error comes from domain, user name or domain extension... Writing a big regexp with group... really ? 
it is easy to maintain, test and reuse ? and for a novice ? muliply this by thousands of validators and their respectives tests. I call that a mess and inside a project I lead, I will not accept it. Even in Haskell people rarelly use arrows, I don't criticize this choice as arrows comes from category theory and we are used to think inside ZF set theory. Somes prefer a syntax over another, there is not a good answer, but this also mean there is no irrelevant answer. In fact both exists and choosing within the case is never easy. Thinking the same way for each problem is also wrong, so I will never pretend to resolve every problem with a single lib. Now I understand this idea is not a priority, I've seen more and more threads about functional tools, I regret we can't find a solution but effectively this absence of solution now can't convince me to stop digging other paths. This is not irrespectuous. 2015-05-11 20:25 GMT+02:00 Andrew Barnert : > On Monday, May 11, 2015 9:15 AM, Gregory Salvan wrote: > > > >I don't want to insist and I respect your point of view, I just want to > give a simplified real life example to show that function composition can > be less painful than another syntax. > > OK, let's compare your example to a Pythonic implementation of the same > thing. > > import re > > ruser = > re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$") > rdomain = > re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$") > rstrict_user = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$") > > > def is_email_address(addr): > user, domain = addr.split('@', 1) > return ruser.match(user) and rdomain.match(domain) > > def is_strict_email_address(addr): > user, domain = addr.split('@', 1) > return rstrictuser.match(user) and rdomain.match(domain) > > > def is_org_address(addr): > return is_email_address(addr) and addr.ends_with('.org') > > (An even better solution, given that you're already using regexps, might > be to just use a single regexp with named groups for the user or > strict-user, full domain, and TLD? but I've left yours alone.) > > Far from being more painful, the Pythonic version is easier to write, > easier to read, easier to debug, shorter, and understandable to even a > novice, without having to rewrite anything in your head. It also handles > invalid input by returning failure values and/or raising appropriate > exceptions rather than asserting and exiting. And it's almost certainly > going to be significantly more efficient. And it works with any string-like > type (that is, any type that has a .split method and works with re.match). > And if you have to debug something, you will have, e.g., values named user > and domain, rather than both being named value at different levels on the > call stack. > > If you really want to come up with a convincing example for your idea, I'd > take an example out of Learn You a Haskell or another book or tutorial and > translate that to Python with your library. I suspect it would still have > some of the same problems, but this example wouldn't even really be good in > Haskell, so it's just making it harder to see why anyone would want > anything like it. And by offering this as the response to Guido's "You're > never going to convince me," well, if he _was_ still reading this thread > with an open mind, he probably isn't anymore (although, to be honest, he > probably wasn't reading it anyway). 
> > >import re > > > >from lawvere import curry # curry is an arrow without type checking, > inherits composition, mutiple dispatch > > > >user_match = > re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match > >domain_match = > re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match > >strict_user_match = > re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match > > > >@curry>def is_string(value): > > assert isinstance(value, str), '%s is not a string' %value > > return value > > > >@curry > >def apply_until_char(func, char, value): > > func(value[:value.index(char)]) > > return value > > > >@curry > >def apply_from_char(func, char, value): > > func(value[value.index(char) + 1:]) > > return value > > > >@curry > > > >def has_char(char, value): > > assert value.count(char) == 1 > > return value > > > >@curry > >def assert_ends_with(text, value): > > assert value.endswith(text), '%s do not ends with %s' % (value, text) > > return value > > > >@curry > >def assert_user(user): > > assert user_match(user) is not None, '%s is not a valid user name' % > value > > return user > > > >@curry > >def assert_strict_user(user): > > assert strict_user_match(user) is not None, '%s is not a valid strict > user' % value > > return user > > > >@curry > >def assert_domain(domain): > > assert domain_match(domain) is not None, '%s is not a valid domain > name' % value > > return domain > > > ># currying (be made with partial) > > > >has_user = apply_until_char(assert_user, '@') > > > >has_strict_user = apply_until_char(assert_strict_user, '@') > > > >has_domain = apply_from_char(assert_domain, '@') > > > > > ># composition: > > > >is_email_address = is_string >> has_char('@') >> has_user >> has_domain > > > >is_strict_email_address = is_string >> has_char('@') >> has_strict_user > >> has_domain > > > > > ># we just want org adresses ? > > > >is_org_addess = is_email_address >> assert_ends_with('.org') > > > > > > > > > >I found a lot of interest in this syntax, mainly for testing purpose, > readability and maintenability of code. > > > >No matters if I'm a fish out of python waters. :) > > > > > > > > > > > > > > > > > >2015-05-11 16:41 GMT+02:00 Guido van Rossum : > > > >As long as I'm "in charge" the chances of this (or anything like it) > being accepted into Python are zero. I get a headache when I try to > understand code that uses function composition, and I end up having to > laboriously rewrite it using more traditional call notation before I move > on to understanding what it actually does. Python is not Haskell, and > perhaps more importantly, Python users are not like Haskel users. Either > way, what may work out beautifully in Haskell will be like a fish out of > water in Python. > >> > >>I understand that it's fun to try to sole this puzzle, but evolving > Python is more than solving puzzles. Enjoy debating the puzzle, but in the > end Python will survive without the solution. 
> >> > >> > >> > >>-- > >> > >>--Guido van Rossum (python.org/~guido) > >>_______________________________________________ > >>Python-ideas mailing list > >>Python-ideas at python.org > >>https://mail.python.org/mailman/listinfo/python-ideas > >>Code of Conduct: http://python.org/psf/codeofconduct/ > >> > > > > > >_______________________________________________ > >Python-ideas mailing list > >Python-ideas at python.org > >https://mail.python.org/mailman/listinfo/python-ideas > >Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apieum at gmail.com Mon May 11 23:59:56 2015 From: apieum at gmail.com (Gregory Salvan) Date: Mon, 11 May 2015 23:59:56 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com> Message-ID: In case you've not seen how it divides the volume of code you'll need to write, here are tests of "is_email_address": # What's an email address ? def test_it_is_a_string(self): assert is_string in is_email_address def test_it_has_a_user_name(self): assert has_user in is_email_address def test_it_contains_at(self): assert has_char('@') in is_email_address def test_it_has_a_domain_name(self): assert has_domain in is_email_address # answer: an email address is a string with a user name, char '@' and a domain name. @Teddy Reedy with a class you'll have to write more tests and abuse of inheritance. 2015-05-11 21:44 GMT+02:00 Gregory Salvan : > Andrew Barnet we disagree. > In your example you have no information about if error comes from domain, > user name or domain extension... > Writing a big regexp with group... really ? it is easy to maintain, test > and reuse ? and for a novice ? muliply this by thousands of validators and > their respectives tests. > I call that a mess and inside a project I lead, I will not accept it. > > Even in Haskell people rarelly use arrows, I don't criticize this choice > as arrows comes from category theory and we are used to think inside ZF set > theory. > Somes prefer a syntax over another, there is not a good answer, but this > also mean there is no irrelevant answer. > In fact both exists and choosing within the case is never easy. Thinking > the same way for each problem is also wrong, so I will never pretend to > resolve every problem with a single lib. > > Now I understand this idea is not a priority, I've seen more and more > threads about functional tools, I regret we can't find a solution but > effectively this absence of solution now can't convince me to stop digging > other paths. This is not irrespectuous. > > > > 2015-05-11 20:25 GMT+02:00 Andrew Barnert : > >> On Monday, May 11, 2015 9:15 AM, Gregory Salvan wrote: >> >> >> >I don't want to insist and I respect your point of view, I just want to >> give a simplified real life example to show that function composition can >> be less painful than another syntax. >> >> OK, let's compare your example to a Pythonic implementation of the same >> thing. 
>> >> import re >> >> ruser = >> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$") >> rdomain = >> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$") >> rstrict_user = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$") >> >> >> def is_email_address(addr): >> user, domain = addr.split('@', 1) >> return ruser.match(user) and rdomain.match(domain) >> >> def is_strict_email_address(addr): >> user, domain = addr.split('@', 1) >> return rstrictuser.match(user) and rdomain.match(domain) >> >> >> def is_org_address(addr): >> return is_email_address(addr) and addr.ends_with('.org') >> >> (An even better solution, given that you're already using regexps, might >> be to just use a single regexp with named groups for the user or >> strict-user, full domain, and TLD? but I've left yours alone.) >> >> Far from being more painful, the Pythonic version is easier to write, >> easier to read, easier to debug, shorter, and understandable to even a >> novice, without having to rewrite anything in your head. It also handles >> invalid input by returning failure values and/or raising appropriate >> exceptions rather than asserting and exiting. And it's almost certainly >> going to be significantly more efficient. And it works with any string-like >> type (that is, any type that has a .split method and works with re.match). >> And if you have to debug something, you will have, e.g., values named user >> and domain, rather than both being named value at different levels on the >> call stack. >> >> If you really want to come up with a convincing example for your idea, >> I'd take an example out of Learn You a Haskell or another book or tutorial >> and translate that to Python with your library. I suspect it would still >> have some of the same problems, but this example wouldn't even really be >> good in Haskell, so it's just making it harder to see why anyone would want >> anything like it. And by offering this as the response to Guido's "You're >> never going to convince me," well, if he _was_ still reading this thread >> with an open mind, he probably isn't anymore (although, to be honest, he >> probably wasn't reading it anyway). 
>> >> >import re >> > >> >from lawvere import curry # curry is an arrow without type checking, >> inherits composition, mutiple dispatch >> > >> >user_match = >> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match >> >domain_match = >> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match >> >strict_user_match = >> re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match >> > >> >@curry>def is_string(value): >> > assert isinstance(value, str), '%s is not a string' %value >> > return value >> > >> >@curry >> >def apply_until_char(func, char, value): >> > func(value[:value.index(char)]) >> > return value >> > >> >@curry >> >def apply_from_char(func, char, value): >> > func(value[value.index(char) + 1:]) >> > return value >> > >> >@curry >> > >> >def has_char(char, value): >> > assert value.count(char) == 1 >> > return value >> > >> >@curry >> >def assert_ends_with(text, value): >> > assert value.endswith(text), '%s do not ends with %s' % (value, text) >> > return value >> > >> >@curry >> >def assert_user(user): >> > assert user_match(user) is not None, '%s is not a valid user name' % >> value >> > return user >> > >> >@curry >> >def assert_strict_user(user): >> > assert strict_user_match(user) is not None, '%s is not a valid >> strict user' % value >> > return user >> > >> >@curry >> >def assert_domain(domain): >> > assert domain_match(domain) is not None, '%s is not a valid domain >> name' % value >> > return domain >> > >> ># currying (be made with partial) >> > >> >has_user = apply_until_char(assert_user, '@') >> > >> >has_strict_user = apply_until_char(assert_strict_user, '@') >> > >> >has_domain = apply_from_char(assert_domain, '@') >> > >> > >> ># composition: >> > >> >is_email_address = is_string >> has_char('@') >> has_user >> has_domain >> > >> >is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> >> has_domain >> > >> > >> ># we just want org adresses ? >> > >> >is_org_addess = is_email_address >> assert_ends_with('.org') >> > >> > >> > >> > >> >I found a lot of interest in this syntax, mainly for testing purpose, >> readability and maintenability of code. >> > >> >No matters if I'm a fish out of python waters. :) >> > >> > >> > >> > >> > >> > >> > >> > >> >2015-05-11 16:41 GMT+02:00 Guido van Rossum : >> > >> >As long as I'm "in charge" the chances of this (or anything like it) >> being accepted into Python are zero. I get a headache when I try to >> understand code that uses function composition, and I end up having to >> laboriously rewrite it using more traditional call notation before I move >> on to understanding what it actually does. Python is not Haskell, and >> perhaps more importantly, Python users are not like Haskel users. Either >> way, what may work out beautifully in Haskell will be like a fish out of >> water in Python. >> >> >> >>I understand that it's fun to try to sole this puzzle, but evolving >> Python is more than solving puzzles. Enjoy debating the puzzle, but in the >> end Python will survive without the solution. 
>> >> >> >> >> >> >> >>-- >> >> >> >>--Guido van Rossum (python.org/~guido) >> >>_______________________________________________ >> >>Python-ideas mailing list >> >>Python-ideas at python.org >> >>https://mail.python.org/mailman/listinfo/python-ideas >> >>Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> > >> > >> >_______________________________________________ >> >Python-ideas mailing list >> >Python-ideas at python.org >> >https://mail.python.org/mailman/listinfo/python-ideas >> >Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apieum at gmail.com Tue May 12 00:12:21 2015 From: apieum at gmail.com (Gregory Salvan) Date: Tue, 12 May 2015 00:12:21 +0200 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com> Message-ID: Sorry the fun part: the more you write code the less you have to write tests. # what's a strict email address: def test_it_is_an_email_address_with_a_strict_user_name(self): assert is_email_address.replace(has_user, has_strict_user) == is_strict_email_address 2015-05-11 23:59 GMT+02:00 Gregory Salvan : > In case you've not seen how it divides the volume of code you'll need to > write, here are tests of "is_email_address": > > # What's an email address ? > def test_it_is_a_string(self): > assert is_string in is_email_address > > def test_it_has_a_user_name(self): > assert has_user in is_email_address > > def test_it_contains_at(self): > assert has_char('@') in is_email_address > > def test_it_has_a_domain_name(self): > assert has_domain in is_email_address > > # answer: an email address is a string with a user name, char '@' and a > domain name. > > @Teddy Reedy with a class you'll have to write more tests and abuse of > inheritance. > > > 2015-05-11 21:44 GMT+02:00 Gregory Salvan : > >> Andrew Barnet we disagree. >> In your example you have no information about if error comes from domain, >> user name or domain extension... >> Writing a big regexp with group... really ? it is easy to maintain, test >> and reuse ? and for a novice ? muliply this by thousands of validators and >> their respectives tests. >> I call that a mess and inside a project I lead, I will not accept it. >> >> Even in Haskell people rarelly use arrows, I don't criticize this choice >> as arrows comes from category theory and we are used to think inside ZF set >> theory. >> Somes prefer a syntax over another, there is not a good answer, but this >> also mean there is no irrelevant answer. >> In fact both exists and choosing within the case is never easy. Thinking >> the same way for each problem is also wrong, so I will never pretend to >> resolve every problem with a single lib. >> >> Now I understand this idea is not a priority, I've seen more and more >> threads about functional tools, I regret we can't find a solution but >> effectively this absence of solution now can't convince me to stop digging >> other paths. This is not irrespectuous. >> >> >> >> 2015-05-11 20:25 GMT+02:00 Andrew Barnert : >> >>> On Monday, May 11, 2015 9:15 AM, Gregory Salvan >>> wrote: >>> >>> >>> >I don't want to insist and I respect your point of view, I just want to >>> give a simplified real life example to show that function composition can >>> be less painful than another syntax. 
>>> >>> OK, let's compare your example to a Pythonic implementation of the same >>> thing. >>> >>> import re >>> >>> ruser = >>> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$") >>> rdomain = >>> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$") >>> rstrict_user = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$") >>> >>> >>> def is_email_address(addr): >>> user, domain = addr.split('@', 1) >>> return ruser.match(user) and rdomain.match(domain) >>> >>> def is_strict_email_address(addr): >>> user, domain = addr.split('@', 1) >>> return rstrict_user.match(user) and rdomain.match(domain) >>> >>> >>> def is_org_address(addr): >>> return is_email_address(addr) and addr.endswith('.org') >>> >>> (An even better solution, given that you're already using regexps, might >>> be to just use a single regexp with named groups for the user or >>> strict-user, full domain, and TLD? but I've left yours alone.) >>> >>> Far from being more painful, the Pythonic version is easier to write, >>> easier to read, easier to debug, shorter, and understandable to even a >>> novice, without having to rewrite anything in your head. It also handles >>> invalid input by returning failure values and/or raising appropriate >>> exceptions rather than asserting and exiting. And it's almost certainly >>> going to be significantly more efficient. And it works with any string-like >>> type (that is, any type that has a .split method and works with re.match). >>> And if you have to debug something, you will have, e.g., values named user >>> and domain, rather than both being named value at different levels on the >>> call stack. >>> >>> If you really want to come up with a convincing example for your idea, >>> I'd take an example out of Learn You a Haskell or another book or tutorial >>> and translate that to Python with your library. I suspect it would still >>> have some of the same problems, but this example wouldn't even really be >>> good in Haskell, so it's just making it harder to see why anyone would want >>> anything like it. And by offering this as the response to Guido's "You're >>> never going to convince me," well, if he _was_ still reading this thread >>> with an open mind, he probably isn't anymore (although, to be honest, he >>> probably wasn't reading it anyway).
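For concreteness, the kind of composition helper being debated can also be sketched in plain Python without any new operator. This is an illustrative sketch only; the names compose, mean and square_all are invented here and are not part of the lawvere library discussed above.

    from functools import reduce
    from math import sqrt

    def compose(*funcs):
        # Right-to-left composition: compose(f, g, h)(x) == f(g(h(x))).
        return reduce(lambda f, g: lambda *args, **kw: f(g(*args, **kw)), funcs)

    def square_all(xs):
        return [x * x for x in xs]

    def mean(xs):
        xs = list(xs)
        return sum(xs) / len(xs)

    # Root-mean-square as a pipeline of small, individually testable functions.
    rms = compose(sqrt, mean, square_all)
    assert rms([3, 4]) == sqrt(12.5)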
>>> >>> >import re >>> > >>> >from lawvere import curry # curry is an arrow without type checking, >>> inherits composition, mutiple dispatch >>> > >>> >user_match = >>> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match >>> >domain_match = >>> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match >>> >strict_user_match = >>> re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match >>> > >>> >@curry>def is_string(value): >>> > assert isinstance(value, str), '%s is not a string' %value >>> > return value >>> > >>> >@curry >>> >def apply_until_char(func, char, value): >>> > func(value[:value.index(char)]) >>> > return value >>> > >>> >@curry >>> >def apply_from_char(func, char, value): >>> > func(value[value.index(char) + 1:]) >>> > return value >>> > >>> >@curry >>> > >>> >def has_char(char, value): >>> > assert value.count(char) == 1 >>> > return value >>> > >>> >@curry >>> >def assert_ends_with(text, value): >>> > assert value.endswith(text), '%s do not ends with %s' % (value, >>> text) >>> > return value >>> > >>> >@curry >>> >def assert_user(user): >>> > assert user_match(user) is not None, '%s is not a valid user name' >>> % value >>> > return user >>> > >>> >@curry >>> >def assert_strict_user(user): >>> > assert strict_user_match(user) is not None, '%s is not a valid >>> strict user' % value >>> > return user >>> > >>> >@curry >>> >def assert_domain(domain): >>> > assert domain_match(domain) is not None, '%s is not a valid domain >>> name' % value >>> > return domain >>> > >>> ># currying (be made with partial) >>> > >>> >has_user = apply_until_char(assert_user, '@') >>> > >>> >has_strict_user = apply_until_char(assert_strict_user, '@') >>> > >>> >has_domain = apply_from_char(assert_domain, '@') >>> > >>> > >>> ># composition: >>> > >>> >is_email_address = is_string >> has_char('@') >> has_user >> has_domain >>> > >>> >is_strict_email_address = is_string >> has_char('@') >> has_strict_user >>> >> has_domain >>> > >>> > >>> ># we just want org adresses ? >>> > >>> >is_org_addess = is_email_address >> assert_ends_with('.org') >>> > >>> > >>> > >>> > >>> >I found a lot of interest in this syntax, mainly for testing purpose, >>> readability and maintenability of code. >>> > >>> >No matters if I'm a fish out of python waters. :) >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> >2015-05-11 16:41 GMT+02:00 Guido van Rossum : >>> > >>> >As long as I'm "in charge" the chances of this (or anything like it) >>> being accepted into Python are zero. I get a headache when I try to >>> understand code that uses function composition, and I end up having to >>> laboriously rewrite it using more traditional call notation before I move >>> on to understanding what it actually does. Python is not Haskell, and >>> perhaps more importantly, Python users are not like Haskel users. Either >>> way, what may work out beautifully in Haskell will be like a fish out of >>> water in Python. >>> >> >>> >>I understand that it's fun to try to sole this puzzle, but evolving >>> Python is more than solving puzzles. Enjoy debating the puzzle, but in the >>> end Python will survive without the solution. 
>>> >> >>> >> >>> >> >>> >>-- >>> >> >>> >>--Guido van Rossum (python.org/~guido) >>> >>_______________________________________________ >>> >>Python-ideas mailing list >>> >>Python-ideas at python.org >>> >>https://mail.python.org/mailman/listinfo/python-ideas >>> >>Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >>> > >>> > >>> >_______________________________________________ >>> >Python-ideas mailing list >>> >Python-ideas at python.org >>> >https://mail.python.org/mailman/listinfo/python-ideas >>> >Code of Conduct: http://python.org/psf/codeofconduct/ >>> > >>> > >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue May 12 04:15:44 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 May 2015 12:15:44 +1000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 12 May 2015 at 08:12, Gregory Salvan wrote: > Sorry the fun part: the more you write code the less you have to write > tests. I think this is the key for the folks hoping to make the case for increased support for function composition in the future (it's definitely too late in the cycle for 3.5): focus on the *pragmatic* benefits in testability, and argue that this makes up for the *loss* of readability. "It's easier to read" is *not* a true statement for anyone that hasn't already learned to think functionally, and "It is worth your while to learn to think functionally, even if it takes you years" is a very *different* statement. The human brain tends to think procedurally by default (presumably because our stream of consciousness is typically experienced as a linear series of events), while object oriented programming can benefit from analogies with physical objects (especially when taught via robotics or other embodied systems), and message passing based concurrent systems can benefit from analogies with human communications. By contrast, there aren't any easy "interaction with the physical world" analogies to draw on for functional programming, so it takes extensive training and practice to teach people to think in functional terms. Folks with a strong mathematical background (especially in formal mathematical proofs) often already have that training (even if they're only novice programmers), while the vast majority of software developers (even professional ones), don't. As a result, I think the more useful perspective to take is the one taken for the PEP 484 type hinting PEP: positioning function composition as an advanced tool for providing increased correctness guarantees for critical components by building them up from independently tested composable parts, rather than relying on ad hoc procedural logic that may itself be a source of bugs. Aside from more accurately reflecting the appropriate role of function composition in Pythonic development (i.e. as a high barrier to entry technique that is nevertheless sometimes worth the additional conceptual complexity, akin to deciding to use metaclasses to solve a problem), it's also likely to prove beneficial that Guido's recently been on the other side of this kind of argument when it comes to both type hinting in PEP 484 and async/await in PEP 492. 
I assume he'll still remain skeptical of the value of the trade-off when it comes to further improvements to Python's functional programming support, but at least he'll be familiar with the form of the argument :) On the "pragmatic benefits in testability" front, I believe one key tool to focus on is the Quick Check test case generator (https://wiki.haskell.org/Introduction_to_QuickCheck1) which lets the test generator take care of determining appropriate boundary conditions to check based on a specification of the desired externally visible behaviour of a function, rather than relying on the developer to manually specify those boundary conditions as particular test cases. I personally learned about that approach earlier this year through a talk that Fraser Tweedale gave at LCA in January: https://speakerdeck.com/frasertweedale/the-best-test-data-is-random-test-data & https://www.youtube.com/watch?v=p7oRMB5V2kE For Python, Fraser pointed out http://xion.io/pyqcy/ and Google tells me there's also https://pypi.python.org/pypi/pytest-quickcheck Gary Bernhardt's work is also worth exploring, including the "Functional Core, Imperative Shell" model discussed in his "Boundaries" presentation (https://www.youtube.com/watch?v=yTkzNHF6rMs) a few years back (an implementation of this approach is available for Python at https://pypi.python.org/pypi/nonobvious/). His closing keynote presentation at PyCon this year was also relevant (relating to the differences between the assurances that testing can provide vs those offered by powerful type systems like Idris), but unfortunately not available online. Andrew's recommendation to "approach via NumPy" is also a good one. Scientific programmers tend to be much better mathematicians than other programmers (and hence more likely to appreciate the value of development techniques based on function composition), and the rapid acceptance of the matrix multiplication PEP shows the scientific Python community have also become quite skilled at making the case to python-dev for new language level features of interest to them :) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nicholas.chammas at gmail.com Tue May 12 04:23:43 2015 From: nicholas.chammas at gmail.com (Nicholas Chammas) Date: Tue, 12 May 2015 02:23:43 +0000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com> Message-ID: > For Python, Fraser pointed out http://xion.io/pyqcy/ and Google tells me there's also https://pypi.python.org/pypi/pytest-quickcheck Don't forget about the latest and greatest Python library for property-based testing, Hypothesis ! On Mon, May 11, 2015 at 10:16 PM Nick Coghlan wrote: > On 12 May 2015 at 08:12, Gregory Salvan wrote: > > Sorry the fun part: the more you write code the less you have to write > > tests. > > I think this is the key for the folks hoping to make the case for > increased support for function composition in the future (it's > definitely too late in the cycle for 3.5): focus on the *pragmatic* > benefits in testability, and argue that this makes up for the *loss* > of readability. "It's easier to read" is *not* a true statement for > anyone that hasn't already learned to think functionally, and "It is > worth your while to learn to think functionally, even if it takes you > years" is a very *different* statement. 
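For readers who have not seen the property-based style discussed above, a minimal Hypothesis test might look like the sketch below. Only the given/strategies imports are the real Hypothesis API; the run-length encode/decode pair is a toy example invented for illustration.

    from hypothesis import given, strategies as st

    def run_length_encode(s):
        pairs = []
        for ch in s:
            if pairs and pairs[-1][0] == ch:
                pairs[-1][1] += 1
            else:
                pairs.append([ch, 1])
        return pairs

    def run_length_decode(pairs):
        return ''.join(ch * count for ch, count in pairs)

    @given(st.text())
    def test_round_trip(s):
        # The property: decoding any encoded string gives the original back.
        assert run_length_decode(run_length_encode(s)) == s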
> > The human brain tends to think procedurally by default (presumably > because our stream of consciousness is typically experienced as a > linear series of events), while object oriented programming can > benefit from analogies with physical objects (especially when taught > via robotics or other embodied systems), and message passing based > concurrent systems can benefit from analogies with human > communications. By contrast, there aren't any easy "interaction with > the physical world" analogies to draw on for functional programming, > so it takes extensive training and practice to teach people to think > in functional terms. Folks with a strong mathematical background > (especially in formal mathematical proofs) often already have that > training (even if they're only novice programmers), while the vast > majority of software developers (even professional ones), don't. > > As a result, I think the more useful perspective to take is the one > taken for the PEP 484 type hinting PEP: positioning function > composition as an advanced tool for providing increased correctness > guarantees for critical components by building them up from > independently tested composable parts, rather than relying on ad hoc > procedural logic that may itself be a source of bugs. Aside from more > accurately reflecting the appropriate role of function composition in > Pythonic development (i.e. as a high barrier to entry technique that > is nevertheless sometimes worth the additional conceptual complexity, > akin to deciding to use metaclasses to solve a problem), it's also > likely to prove beneficial that Guido's recently been on the other > side of this kind of argument when it comes to both type hinting in > PEP 484 and async/await in PEP 492. I assume he'll still remain > skeptical of the value of the trade-off when it comes to further > improvements to Python's functional programming support, but at least > he'll be familiar with the form of the argument :) > > On the "pragmatic benefits in testability" front, I believe one key > tool to focus on is the Quick Check test case generator > (https://wiki.haskell.org/Introduction_to_QuickCheck1) which lets the > test generator take care of determining appropriate boundary > conditions to check based on a specification of the desired externally > visible behaviour of a function, rather than relying on the developer > to manually specify those boundary conditions as particular test > cases. > > I personally learned about that approach earlier this year through a > talk that Fraser Tweedale gave at LCA in January: > > https://speakerdeck.com/frasertweedale/the-best-test-data-is-random-test-data > & > > https://www.youtube.com/watch?v=p7oRMB5V2kE > > For Python, Fraser pointed out http://xion.io/pyqcy/ and Google tells > me there's also https://pypi.python.org/pypi/pytest-quickcheck > > Gary Bernhardt's work is also worth exploring, including the > "Functional Core, Imperative Shell" model discussed in his > "Boundaries" presentation > (https://www.youtube.com/watch?v=yTkzNHF6rMs) a few years back (an > implementation of this approach is available for Python at > https://pypi.python.org/pypi/nonobvious/). His closing keynote > presentation at PyCon this year was also relevant (relating to the > differences between the assurances that testing can provide vs those > offered by powerful type systems like Idris), but unfortunately not > available online. > > Andrew's recommendation to "approach via NumPy" is also a good one. 
> Scientific programmers tend to be much better mathematicians than > other programmers (and hence more likely to appreciate the value of > development techniques based on function composition), and the rapid > acceptance of the matrix multiplication PEP shows the scientific > Python community have also become quite skilled at making the case to > python-dev for new language level features of interest to them :) > > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rustompmody at gmail.com Tue May 12 07:06:31 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Tue, 12 May 2015 10:36:31 +0530 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: On Mon, May 11, 2015 at 8:11 PM, Guido van Rossum wrote: > > > As long as I'm "in charge" the chances of this (or anything like it) being > accepted into Python are zero. I get a headache when I try to understand > code that uses function composition, > I find it piquant to see this comment from the creator of a language that traces its lineage to Lambert Meertens :-) [Was reading one of the classics just yesterday http://www.kestrel.edu/home/people/meertens/publications/papers/Algorithmics.pdf ] Personally, yeah I dont think python blindly morphing into haskell is a neat idea In the specific case of composition my position is... sqrt(mean(square(x))) is ugly in a lispy way (sqrt @ mean @ square)(x) is backward in one way (square @ mean @ sqrt)(x) is backward in another way sqrt @ mean @ square is neat for being point-free and reads easy like a Unix '|' but the '@' is more strikingly ugly sqrt o mean o square is a parsing nightmare square ? mean ? root Just right! [Assuming the unicode gods favor its transmission!] ...hopefully not too frivolous to say this but the ugliness of @ overrides the succinctness of the math for me -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Tue May 12 10:36:22 2015 From: flying-sheep at web.de (Philipp A.) 
Date: Tue, 12 May 2015 08:36:22 +0000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: ha, i love unicode operators (e.g. in scala), but i think guido said python will stay ASCII. i hope we one day gain the ability to *optionally* use unicode alternatives, even if that would put an end to our __matmul__ ? function combination aspirations: * ? ? @ ? ? (not ?) / ? ? ... ? ? lambda ? ? ? phil Rustom Mody schrieb am Di., 12. Mai 2015 um 07:07 Uhr: > On Mon, May 11, 2015 at 8:11 PM, Guido van Rossum > wrote: > >> >> >> As long as I'm "in charge" the chances of this (or anything like it) >> being accepted into Python are zero. I get a headache when I try to >> understand code that uses function composition, >> > > I find it piquant to see this comment from the creator of a language that > traces its lineage to Lambert Meertens :-) > [Was reading one of the classics just yesterday > > http://www.kestrel.edu/home/people/meertens/publications/papers/Algorithmics.pdf > ] > Personally, yeah I dont think python blindly morphing into haskell is a > neat idea > In the specific case of composition my position is... > > sqrt(mean(square(x))) > is ugly in a lispy way > > (sqrt @ mean @ square)(x) > is backward in one way > > (square @ mean @ sqrt)(x) > is backward in another way > > sqrt @ mean @ square > is neat for being point-free and reads easy like a Unix '|' but the '@' is > more strikingly ugly > > sqrt o mean o square > is a parsing nightmare > > square ? mean ? root > Just right! [Assuming the unicode gods favor its transmission!] > > ...hopefully not too frivolous to say this but the ugliness of @ overrides > the succinctness of the math for me > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmcs at jsantos.eu Tue May 12 11:01:01 2015 From: jmcs at jsantos.eu (=?UTF-8?B?Sm/Do28gU2FudG9z?=) Date: Tue, 12 May 2015 09:01:01 +0000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: Python already supports unicode operators (kind of). You just have to use a custom codec that translates the unicode characters to proper python. On Tue, 12 May 2015 at 10:42 Philipp A. 
wrote: > ha, i love unicode operators (e.g. in scala), but i think guido said > python will stay ASCII. > > i hope we one day gain the ability to *optionally* use unicode > alternatives, even if that would put an end to our __matmul__ ? function > combination aspirations: > > * ? ? > @ ? ? (not ?) > / ? ? > ... ? ? > lambda ? ? > > ? phil > > Rustom Mody schrieb am Di., 12. Mai 2015 um > 07:07 Uhr: > >> On Mon, May 11, 2015 at 8:11 PM, Guido van Rossum >> wrote: >> >>> >>> >>> As long as I'm "in charge" the chances of this (or anything like it) >>> being accepted into Python are zero. I get a headache when I try to >>> understand code that uses function composition, >>> >> >> I find it piquant to see this comment from the creator of a language that >> traces its lineage to Lambert Meertens :-) >> [Was reading one of the classics just yesterday >> >> http://www.kestrel.edu/home/people/meertens/publications/papers/Algorithmics.pdf >> ] >> Personally, yeah I dont think python blindly morphing into haskell is a >> neat idea >> In the specific case of composition my position is... >> >> sqrt(mean(square(x))) >> is ugly in a lispy way >> >> (sqrt @ mean @ square)(x) >> is backward in one way >> >> (square @ mean @ sqrt)(x) >> is backward in another way >> >> sqrt @ mean @ square >> is neat for being point-free and reads easy like a Unix '|' but the '@' >> is more strikingly ugly >> >> sqrt o mean o square >> is a parsing nightmare >> >> square ? mean ? root >> Just right! [Assuming the unicode gods favor its transmission!] >> >> ...hopefully not too frivolous to say this but the ugliness of @ >> overrides the succinctness of the math for me >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From spencerb21 at live.com Tue May 12 11:30:55 2015 From: spencerb21 at live.com (Spencer Brown) Date: Tue, 12 May 2015 19:30:55 +1000 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: It might be neat to be able to use the superscript and subscript number glyphs for exponentiation and indexing, so 'x?, x?, x?' == 'x[0], x[1], x[2]' and 'some_var????' == 'some_var ** 35.1'. (That probably shouldn't support anything other than numbers and '.' to keep things simple). There's also the comparison operators (?, ?, ?, ?, ?), '?' for in and perhaps even additional overloads for sets (?, ?, ?, ?, ?, ?, ?). Maybe the math module could have a math.? alias as well for people who wish to import it. 
- Spencer > On 12 May 2015, at 7:01 pm, Jo?o Santos wrote: > > Python already supports unicode operators (kind of). You just have to use a custom codec that translates the unicode characters to proper python. > >> On Tue, 12 May 2015 at 10:42 Philipp A. wrote: >> ha, i love unicode operators (e.g. in scala), but i think guido said python will stay ASCII. >> >> i hope we one day gain the ability to optionally use unicode alternatives, even if that would put an end to our __matmul__ ? function combination aspirations: >> >> * ? ? >> @ ? ? (not ?) >> / ? ? >> ... ? ? >> lambda ? ? >> >> ? phil -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ From rustompmody at gmail.com Tue May 12 13:27:41 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Tue, 12 May 2015 16:57:41 +0530 Subject: [Python-ideas] Partial operator (and 'third-party methods' and 'piping') [was Re: Function composition (was no subject)] In-Reply-To: References: <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp> <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info> <554E5CC9.3010406@aalto.fi> <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com> <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi> <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info> <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com> Message-ID: On Tue, May 12, 2015 at 2:06 PM, Philipp A. wrote: > ha, i love unicode operators (e.g. in scala), but i think guido said > python will stay ASCII. > Or Julia http://iaindunning.com/blog/julia-unicode.html Also Fortress, Agda and the classic APL Interestingly Haskell is one step ahead of Python in some areas and behind in others --------- GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude> let (x?, x?) = (1, 2) Prelude> (x?, x?) (1,2) Prelude> --------- However wrt getting ligatures right python is ahead: [Haskell] Prelude> let ?ag = True Prelude> flag :5:1: Not in scope: `flag' [Equivalent of NameError] ------------- [Python3] >>> ?ag = True >>> flag True -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed May 13 08:24:04 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 12 May 2015 23:24:04 -0700 (PDT) Subject: [Python-ideas] Add math.iszero() and math.isequal()? In-Reply-To: <4537a315-a08c-4838-8d55-1483ac9656bc@googlegroups.com> References: <4537a315-a08c-4838-8d55-1483ac9656bc@googlegroups.com> Message-ID: <85fe56bf-84a1-45b9-84bf-26b2ff389486@googlegroups.com> See PEP 485, which appears to be still a draft: https://www.python.org/dev/peps/pep-0485/ Best, Neil On Tuesday, May 12, 2015 at 3:18:47 AM UTC-4, Mark Summerfield wrote: > > From Python 3.2 it is easy to compare floats, e.g., > > iszero = lambda x: hash(x) == hash(0) > isequal = lambda a, b: hash(a) == hash(b) > > Clearly these are trivial functions (but perphaps math experts could > provide better implementations; I'm not proposing the implementations > shown, just the functions however they are implemented). 
> > It seems that not everyone is aware of the issues regarding comparing > floats for equality and so I still see code that compares floats using == > or !=. > > If these functions were in the math module it would be convenient (since I > find I need them in most non-trivial programs), but also provide a place to > document that they should be used rather than == or != for floats. (I guess > a similar argument might apply to the cmath module?) > > > From rob.cliffe at btinternet.com Wed May 13 11:53:32 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Wed, 13 May 2015 10:53:32 +0100 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <20150507153123.GT5663@ando.pearwood.info> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <20150507153123.GT5663@ando.pearwood.info> Message-ID: <55531F1C.2090509@btinternet.com> On 07/05/2015 16:31, Steven D'Aprano wrote: > >> Imagine if we were starting to design the 21st century from scratch, >> throwing away all the history? How would we go about it? > Well, for starters I would insist on re-introducing thorn þ and eth ð > back into English :-) > > I'd second that. :-) Seriously, thanks to everyone who took the trouble to reply to my rant, instead of just dismissing it as the ravings of an idiot. I found your replies quite enlightening. Rob Cliffe From random832 at fastmail.us Wed May 13 16:33:28 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 13 May 2015 10:33:28 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> On Thu, May 7, 2015, at 18:30, Stephen J. Turnbull wrote: > Chris Barker writes: > > > I've read many of the rants about UTF-16, but in fact, it's really > > not any worse than UTF-8 > > Yes, it is. It's not ASCII compatible. You can safely use the usual > libc string APIs on UTF-8 (except for any that might return only part > of a string), but not on UTF-16 (nulls). This is a pretty big > advantage for UTF-8 in practice. If you're using libc, why shouldn't you be using the native wide character types (whether that is UTF-16 or UCS-4) and using the wide string APIs?
From ncoghlan at gmail.com Wed May 13 18:22:41 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 May 2015 02:22:41 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: (Note: I've posted to the issue suggesting we defer further consideration to 3.6, as well as suggesting a new "string.internals" submodule as a possible home for them, but I'm following up here to capture my current thinking on the topic) On 7 May 2015 4:47 pm, "Nick Coghlan" wrote: > Regardless of which specific approach you take, handling surrogates > explicitly when a string is passed to you from an API that uses > permissive decoding lets you avoid both unexpected UnicodeEncodeError > exceptions (if the surrogates end up being encoded with an error > handler other than surrogatepass or surrogateescape) or propagating > mojibake (if the surrogates are encoded with a suitable error handler, > but an encoding that differs from the original). Considering this rationale further, the key purpose of the proposed new surrogate handling functions is to take an input string that may contain surrogate code points, and produce one that is guaranteed *not* to contain such surrogates (either because they've been removed or replaced, or because an exception will be thrown if there are any present in the input). They're designed to let a developer either make a program eagerly detect improperly decoded data, or else to convert the surrogates to an encodable form (potentially losing data in the process) Three potential expected sources of surrogates have been identified: * escaped surrogates smuggling arbitrary bytes passed through decoding by the "surrogateescape" error handler * surrogates passed through the decoding process by the "surrogatepass" error handler * decomposed surrogate pairs for astral characters The various reasonable "data scrubbing" techniques that have been proposed are: 1. compose surrogate pairs to the corresponding astral code point 2. throw an error for any surrogates found 3. delete any surrogates found 4. replace any surrogates found with the Unicode replacement character 5. replace any surrogates found with their corresponding backslash escaped sequence 6. as with the preceding, but only for surrogate escaped data, not arbitrary surrogates The first of those is handled by the suggested "compose_surrogate_pairs()", which will convert valid pairs to their corresponding astral code points. 2-5 are handled by rehandle_surrogatepass(), with the corresponding decoding error handler (strict, ignore, replace, backslashreplace) 6 is handled by rehandle_surrogateescape(), again with the corresponding error handlers A potential downside of this approach of exposing the error handlers directly as part of the data scrubbing API is that passing in "surrogateescape" or "surrogatepass" as the error handler may break the assurance that the output doesn't contain any surrogates (this could be avoided if those two error handlers don't support str->str conversions). Anyway, I think we can readily put this question aside for now, and revisit it again for 3.6 after folks have a chance to get more experience with some of the other bytes/text handling changes in 3.5. 
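To make the scrubbing idea concrete, the effect of the proposed rehandle_surrogateescape() can already be approximated with an encode/decode round trip. The helper below is an illustrative sketch only (its name is hypothetical, and it assumes the original data was meant to be UTF-8); the error handlers it uses are the standard ones.

    # Undecodable bytes are smuggled through decoding as lone surrogates.
    raw = b'caf\xe9'
    text = raw.decode('utf-8', 'surrogateescape')
    assert text == 'caf\udce9'

    # Strict encoding of such a string fails, which is the surprise case
    # described above.
    try:
        text.encode('utf-8')
    except UnicodeEncodeError:
        pass

    def scrub_surrogateescape(s, errors='replace'):
        # Push the escaped bytes back out, then re-decode with the new handler.
        return s.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

    assert scrub_surrogateescape(text) == 'caf\ufffd'
    assert scrub_surrogateescape(text, 'ignore') == 'caf'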
I created a tracking issue (http://bugs.python.org/issue22555) for those a while back, and just did a pass through them the other day to see if there were any I particularly wanted to see make it into 3.5 (all the still open ones ended up in the "wait for other developments before pursuing further" category). Cheers, Nick. From stephen at xemacs.org Wed May 13 19:45:15 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 14 May 2015 02:45:15 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> Message-ID: <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> random832 at fastmail.us writes: > If you're using libc, why shouldn't you be using the native wide > character types (whether that it UTF-16 or UCS-4) and using the wide > string APIs? Who says you are using libc? You might be writing an operating system or a shell script. And if you do use the native wide character type, you're guaranteed not to be portable because some systems have wide characters are actually variable width and others aren't, as you just pointed out. Or you might have an ancient byte-oriented program you want to use. I'm not saying that UTF-8 is a panacea; just that every problem that UTF-8 has, UTF-16 also has -- but UTF-16 does have problems that UTF-8 doesn't. Specifically, surrogates and ASCII incompatibility. From abarnert at yahoo.com Wed May 13 20:18:44 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 13 May 2015 11:18:44 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> Message-ID: On May 13, 2015, at 07:33, random832 at fastmail.us wrote: > >> On Thu, May 7, 2015, at 18:30, Stephen J. Turnbull wrote: >> Chris Barker writes: >> >>> I've read many of the rants about UTF-16, but in fact, it's really >>> not any worse than UTF-8 >> >> Yes, it is. It's not ASCII compatible. You can safely use the usual >> libc string APIs on UTF-8 (except for any that might return only part >> of a string), but not on UTF-16 (nulls). This is a pretty big >> advantage for UTF-8 in practice. > > If you're using libc, why shouldn't you be using the native wide > character types (whether that it UTF-16 or UCS-4) and using the wide > string APIs? That's exactly how you create the problems this thread is trying to solve. If you treat wchar_t as a "native wide char type" and call any of the wcs functions on UTF-16 strings, you will count astral characters as two characters, illegally split strings in the middle of surrogates, etc. And you'll count BOMs as two characters and split them. These are basically all the same problems you have using char with UTF-8, and more, and harder to notice in testing (not just because you may not think to test for astral characters, but because even if you do, you may not think to test both byte orders). 
And that's not even taking into account the fact that C explicitly allows wchar_t to be as small as 8 bits. The Unicode and C standards both explicitly say that you should never use wchar_t for Unicode characters in portable code, only use it for storing the native characters of any wider-than-char locale encodings that a specific compiler supports. Later versions of C and POSIX (as in later than what Python requires) provide explicit __CHAR16_TYPE__ and __CHAR_32_TYPE__, but they don't provide APIs for analogs of strlen, strchr, strtok, etc. for those types, so you have to be explicit about whether you're counting code points or characters (and, if characters, how you're dealing with endianness). From stephen at xemacs.org Thu May 14 06:52:32 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 14 May 2015 13:52:32 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r3qj2227.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Three potential expected sources of surrogates have been identified: [omitted] > * decomposed surrogate pairs for astral characters I wouldn't call that "expected", as it requires wilful malice on the part of a programmer (not users or other external sources of input), though. No standard codec should produce such in a PEP 393 Python. From storchaka at gmail.com Thu May 14 07:31:13 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 14 May 2015 08:31:13 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 13.05.15 19:22, Nick Coghlan wrote: > Three potential expected sources of surrogates have been identified: > > * escaped surrogates smuggling arbitrary bytes passed through decoding > by the "surrogateescape" error handler > * surrogates passed through the decoding process by the > "surrogatepass" error handler > * decomposed surrogate pairs for astral characters * json * pickle * email * nntplib * SimpleHTTPRequestHandler * wsgiref * cgi * tarfile * filesystem names (os.decode) and other os calls * platform and sysconfig * other serializers From ncoghlan at gmail.com Thu May 14 10:20:28 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 May 2015 18:20:28 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 May 2015 at 15:31, Serhiy Storchaka wrote: > On 13.05.15 19:22, Nick Coghlan wrote: >> >> Three potential expected sources of surrogates have been identified: >> >> * escaped surrogates smuggling arbitrary bytes passed through decoding >> by the "surrogateescape" error handler >> * surrogates passed through the decoding process by the >> "surrogatepass" error handler >> * decomposed surrogate pairs for astral characters > > > * json > * pickle > * email > * nntplib > * SimpleHTTPRequestHandler > * wsgiref > * cgi > * tarfile > * filesystem names (os.decode) and other os calls > * platform and sysconfig > * other serializers Right, those are the kinds of boundary APIs that drove the introduction of Python 3's arbitrary bytes smuggling capabilities in the first place. 
The key changes I realised it's potentially worth waiting and seeing the impact of are: * the restoration of printf-style formatting for binary data * the introduction of bytes.hex() * the rise of systemd as the preferred init system for Linux (while that doesn't solve the "bad locale settings" problem for *nix systems, it tackles a reasonable chunk of them) The first two should make it easier to just stay in the binary domain when working with arbitrary binary data, while the last will hopefully eliminate one of the common sources of declared-vs-actual encoding mismatches. I *expect* we'll still want these proposed APIs (or a comparable alternative) by the time 3.6 rolls around, but I also see value in continuing to be cautious about adding them (since we'll be stuck with them once we do, although I guess we could also go down the path of declaring "string.internals" to be a provisional API in PEP 411 terms). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Thu May 14 10:48:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 14 May 2015 08:48:42 +0000 (UTC) Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: Message-ID: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka wrote: > On 13.05.15 19:22, Nick Coghlan wrote: >> Three potential expected sources of surrogates have been identified: >> >> * escaped surrogates smuggling arbitrary bytes passed through decoding >> by the "surrogateescape" error handler >> * surrogates passed through the decoding process by the >> "surrogatepass" error handler >> * decomposed surrogate pairs for astral characters > > * json > * pickle > * email > * nntplib > * SimpleHTTPRequestHandler > * wsgiref > * cgi > * tarfile > * filesystem names (os.decode) and other os calls > * platform and sysconfig > * other serializers As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned. Beyond that, some of these modules may need to understand surrogates internally, but I can't see how they could get anywhere near the module boundaries. For example, to build and parse JSON's 12-character escape sequences, like "\uD834\uDD1E" for U+1D11E, you obviously need to be able to decompose and compose astrals internally, but that shouldn't even generate unicode strings with surrogate pairs in 3.3+, much less expose them to user code. 
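The behaviour described here is easy to check; a quick illustrative test (not taken from the thread):

    import json

    # A surrogate pair written as two \u escapes decodes to one astral
    # character, as the JSON 12-character escape form requires.
    s = json.loads('"\\ud834\\udd1e"')
    assert s == '\U0001d11e' and len(s) == 1

    # A lone surrogate escape does pass through unpaired, which is the
    # injection route raised in the follow-up below.
    assert json.loads('"\\ud834"') == '\ud834'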
From storchaka at gmail.com Thu May 14 12:15:18 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 14 May 2015 13:15:18 +0300 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 14.05.15 11:48, Andrew Barnert via Python-ideas wrote: > On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka wrote: >> On 13.05.15 19:22, Nick Coghlan wrote: >>> Three potential expected sources of surrogates have been identified: >>> >>> * escaped surrogates smuggling arbitrary bytes passed through decoding >>> by the "surrogateescape" error handler >>> * surrogates passed through the decoding process by the >>> "surrogatepass" error handler >>> * decomposed surrogate pairs for astral characters >> >> * json >> * pickle >> * email >> * nntplib >> * SimpleHTTPRequestHandler >> * wsgiref >> * cgi >> * tarfile >> * filesystem names (os.decode) and other os calls >> * platform and sysconfig >> * other serializers > > As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned. Not all. JSON allows to inject surrogates as \uXXXX. Pickle with protocol 0 uses the raw-unicode-escape encoding that allows surrogates. There is also the UTF-7 encoding that allows surrogates. And yet one source of surrogates -- Python sources. eval(), etc. Tkinter can produce surrogates. XML parser unfortunately can't (unfortunately - because it makes impossible to handle with Python some files generated by third-party programs). I'm not sure about sqlite3. Any extension module, any wrapper around third-party library could potentially produce surrogates. From koos.zevenhoven at aalto.fi Thu May 14 13:03:39 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Thu, 14 May 2015 14:03:39 +0300 Subject: [Python-ideas] Units in type hints Message-ID: <5554810B.7050409@aalto.fi> Hi all, How about extending the type annotations for int, float and complex to optionally include also a unit? For instance, def sleep(duration : Float['s']): ... Now the type checker could catch the error of trying to pass the sleep duration in milliseconds, Float['ms']. This would also be useful for documentation, avoiding the 'need' for having names like duration_s. At least the notation with square brackets would resemble the way units are often written in science. Another example: def calculate_travel_time(distance: Float['km']) -> Float['h']: speed = get_current_speed() # type: Float['km/h'] return distance / speed Now, if you try to pass the distance in miles, or Float['mi'], the type checker would catch the error. Note that the type checker would also understand that 'km' divided by 'km/h' becomes 'h'. Or should these be something like units.km / units.h? But if you do have your distance in miles, you do calculate_travel_time(units.convert(distance_mi, 'mi', 'km')) and the type checker and programmer get what they want. Anyone interested? 
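One rough way to make the proposed spelling importable today, so that Float['km'] is at least a valid, introspectable annotation, is sketched below. This is purely illustrative: no type checker understands the unit algebra, and the Float class and _UnitMeta metaclass are invented for this sketch.

    class _UnitMeta(type):
        def __getitem__(cls, unit):
            # Return a subclass that remembers the unit string.
            return type('{}[{!r}]'.format(cls.__name__, unit), (cls,), {'unit': unit})

    class Float(float, metaclass=_UnitMeta):
        unit = None

    def calculate_travel_time(distance: Float['km']) -> Float['h']:
        speed = 80.0  # assumed to be km/h for the example
        return distance / speed

    assert calculate_travel_time.__annotations__['distance'].unit == 'km'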
-- Koos From steve at pearwood.info Thu May 14 13:59:58 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 14 May 2015 21:59:58 +1000 Subject: [Python-ideas] Units in type hints In-Reply-To: <5554810B.7050409@aalto.fi> References: <5554810B.7050409@aalto.fi> Message-ID: <20150514115956.GW5663@ando.pearwood.info> On Thu, May 14, 2015 at 02:03:39PM +0300, Koos Zevenhoven wrote: > Hi all, > > How about extending the type annotations for int, float and complex to > optionally include also a unit? I really, really like the idea of having unit-aware calculations. But this is not the way to do it. See below: > For instance, > > def sleep(duration : Float['s']): > ... > > Now the type checker could catch the error of trying to pass the sleep > duration in milliseconds, Float['ms']. But that's not an error. Calling sleep(weight_in_kilograms) is an error. But calling sleep(milliseconds(1000)) should be the same as calling sleep(seconds(1)). If the user has to do the conversion themselves, that's a source of error: sleep(time_in_milliseconds / 1000) # convert to seconds If you think that's too obvious an error for anyone to make, (1) you're wrong, I've made that error, yes even that simple, and (2) you should try it with more complex sets of units. How many pound-foot per minute squared in a newton? Having the language support unit calculations is not just to catch the wrong dimensions (passing a weight where a time is needed), but to manage unit conversions automatically without the user being responsible for getting the conversion right. A type checker is the wrong tool for the job. If you want to see what a good unit-aware language should be capable of, check out: - Frink: https://futureboy.us/frinkdocs/ - the HP-28 and HP-48 series of calculators; - the Unix/Linux "units" utility. There are also some existing Python libraries which do unit calculations. You should look into them. -- Steve From abarnert at yahoo.com Thu May 14 14:21:10 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 14 May 2015 05:21:10 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> Message-ID: <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> On May 14, 2015, at 03:15, Serhiy Storchaka wrote: > >> On 14.05.15 11:48, Andrew Barnert via Python-ideas wrote: >>> On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka wrote: >>>> On 13.05.15 19:22, Nick Coghlan wrote: >>>> Three potential expected sources of surrogates have been identified: >>>> >>>> * escaped surrogates smuggling arbitrary bytes passed through decoding >>>> by the "surrogateescape" error handler >>>> * surrogates passed through the decoding process by the >>>> "surrogatepass" error handler >>>> * decomposed surrogate pairs for astral characters >>> >>> * json >>> * pickle >>> * email >>> * nntplib >>> * SimpleHTTPRequestHandler >>> * wsgiref >>> * cgi >>> * tarfile >>> * filesystem names (os.decode) and other os calls >>> * platform and sysconfig >>> * other serializers >> >> As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned. > > Not all. JSON allows to inject surrogates as \uXXXX. JSON specifically requires treating \uXXXX\uYYYY as a "12-character escape sequence" for a single character if XXXX and YYYY are a surrogate pair. If Python is handling that wrong, then it needs to be fixed (but I don't think it is; I'll test tomorrow). 
> Pickle with protocol 0 uses the raw-unicode-escape encoding that allows surrogates. Sure, if you pickle a unicode object in a narrow 2.x, it gets pickled as surrogates. But when you unpickle it in 3.4, surely those surrogates are converted to astrals? If not, then every time you, e.g., pickle a Windows filename for use with win32api with astrals in 2.x, and unpickle it in 3.4 and try to use it with win32api it wouldn't work. Unless we actually are breaking those filenames, but win32api (and everything else) is working around the problem? Even if that's true, it seems like the obvious answer would be to fix the problem rather than provide tools for workarounds to libraries that must already have those workarounds anyway. > There is also the UTF-7 encoding that allows surrogates. Encoding to UTF-7 requires first encoding to UTF-16 and then doing the modified-base-64 thing. And decoding from UTF-7 requires reversing both those steps. There's no way surrogates can escape into Unicode from that. I suppose you could, instead of decoding from UTF-7, just do the base 64 decode and then skip the UTF-16 decode and instead just widen the code units, but that's not a valid thing to do, and I can't see why anyone would do it. > And yet one source of surrogates -- Python sources. eval(), etc. If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going to get an illegal Unicode string made of 2 surrogate code points instead of either an error or the single-character string '\U0001D11E'? If so, again, I think that's a bug that needs to be fixed, not worked around. There's no legitimate reason for any source code to expect that to be an illegal length-2 string. > Tkinter can produce surrogates. XML parser unfortunately can't (unfortunately - because it makes impossible to handle with Python some files generated by third-party programs). I'm not sure about sqlite3. Any extension module, any wrapper around third-party library could potentially produce surrogates. What C API function are they calling to make a PyUnicode out of a UTF-16 char* or wchar_t* or whatever without decoding it as UTF-16? And why do we have such a function? -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu May 14 15:18:33 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 14 May 2015 23:18:33 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> Message-ID: <20150514131832.GX5663@ando.pearwood.info> On Thu, May 14, 2015 at 05:21:10AM -0700, Andrew Barnert via Python-ideas wrote: > On May 14, 2015, at 03:15, Serhiy Storchaka wrote: [...] > > There is also the UTF-7 encoding that allows surrogates. > > Encoding to UTF-7 requires first encoding to UTF-16 and then doing the > modified-base-64 thing. And decoding from UTF-7 requires reversing > both those steps. There's no way surrogates can escape into Unicode > from that. I suppose you could, instead of decoding from UTF-7, just > do the base 64 decode and then skip the UTF-16 decode and instead just > widen the code units, but that's not a valid thing to do, and I can't > see why anyone would do it. I don't see how UTF-7 could include surrogates. It's a 7-bit encoding, which means it can only include bytes \x00 through \x7F, i.e. ASCII compatible. 
http://unicode.org/glossary/#UTF_7 For example, this passes: for i in range(0x110000): c = chr(i) b = c.encode('utf-7') m = max(b) assert m <= 127 so where are the surrogates coming from? > > And yet one source of surrogates -- Python sources. eval(), etc. > > If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going to > get an illegal Unicode string made of 2 surrogate code points instead > of either an error or the single-character string '\U0001D11E'? I certainly hope so :-) I think that we should understand Unicode strings as sequences of code points from U+0000 to U+10FFFF inclusive. I don't think we should try to enforce a rule that all Python strings are surrogate-free. That would make it awfully inconvenient to process the whole Unicode character set at once, like I did above. I'd need to write: for i in list(range(0xD800)) + list(range(0xE000, 0x110000)): ... instead, or catch the exception in chr(i), or something equally annoying. The cost of that simplicity is that when you go to encode to bytes, you might get an exception. I think so long as we have tools for dealing with that (e.g. str->str transformations to remove or replace surrogates) that's a fair trade-off. Another possibility would be to introduce a separate type, strict_unicode, which does enforce the rule that there are no surrogates in [strict unicode] strings. But having two unicode string types might be overkill/confusing. I think it might be better to have a is_strict() or is_surrogate() method that reports if the string contains surrogates, and let the user remove or replace them as needed. > If so, again, I think that's a bug that needs to be fixed, not worked > around. There's no legitimate reason for any source code to expect > that to be an illegal length-2 string. Well, there's backwards compatibility. There's also testing: assert unicodedata.category('\uD800') == 'Cs' I'm sure there are others. -- Steve From skip.montanaro at gmail.com Thu May 14 16:05:07 2015 From: skip.montanaro at gmail.com (Skip Montanaro) Date: Thu, 14 May 2015 09:05:07 -0500 Subject: [Python-ideas] Units in type hints In-Reply-To: <5554810B.7050409@aalto.fi> References: <5554810B.7050409@aalto.fi> Message-ID: On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven wrote: > How about extending the type annotations for int, float and complex to > optionally include also a unit? Not sure that's going to fly, but you might want to check out the magnitude package: https://pypi.python.org/pypi/magnitude/0.9.1 I've used it in situations where I want to specify units scaled to a more natural (to me) size. For example, the gobject.timeout_add function takes a delay in milliseconds. Given that most of the time I want delays in seconds or minutes, it's much more natural for me to let magnitude do the work silently. Skip From stephen at xemacs.org Thu May 14 16:38:57 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 14 May 2015 23:38:57 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> Message-ID: <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > > And yet one source of surrogates -- Python sources. eval(), etc. 
Yep: $ python3.4 Python 3.4.3 (default, Mar 10 2015, 14:53:35) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> chr((16*13+8)*256) '\ud800' >>> '\ud800' '\ud800' >>> '\ud834\udd1e' '\ud834\udd1e' >>> > If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going > to get an illegal Unicode string made of 2 surrogate code points > instead of either an error or the single-character string > '\U0001D11E'? Yes. How else do you propose to test the surrogateescape error handler? Now, are you sitting down? If not, you should before looking at the next example. ;-) >>> '\U0000d834\U0000dd1e' '\ud834\udd1e' >>> Isn't that disgusting? But in Python, str is an array of code units. Literals and chr() can be used to produce str containing surrogates, as well as codec error handling. From random832 at fastmail.us Thu May 14 16:45:50 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 14 May 2015 10:45:50 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> On Wed, May 13, 2015, at 13:45, Stephen J. Turnbull wrote: > random832 at fastmail.us writes: > > > If you're using libc, why shouldn't you be using the native wide > > character types (whether that it UTF-16 or UCS-4) and using the wide > > string APIs? > > Who says you are using libc? If you're not using libc, then "You can safely use the usual libc string APIs" is not a benefit. > You might be writing an operating system > or a shell script. And if you do use the native wide character type, > you're guaranteed not to be portable because some systems have wide > characters are actually variable width and others aren't, as you just > pointed out. Or you might have an ancient byte-oriented program you > want to use. Using UTF-8 *without* ensuring that the native multibyte character set is UTF-8 [by setting the locale appropriately] and that it is supported end-to-end (by your program, by the curses library if applicable, by the terminal if applicable) just turns obvious problems into subtle ones - not exactly an improvement. > I'm not saying that UTF-8 is a panacea; just that every problem that > UTF-8 has, UTF-16 also has -- but UTF-16 does have problems that UTF-8 > doesn't. Specifically, surrogates and ASCII incompatibility. ASCII incompatibility is a feature, not a bug - it prevents you from doing stupid things that cause subtle bugs. On Wed, May 13, 2015, at 14:18, Andrew Barnert wrote: > That's exactly how you create the problems this thread is trying to > solve. The point I was getting at was more "you can't benefit from libc functions at all, therefore your argument for UTF-8 is bad" than "you should be using the native wchar_t type". Libc only has functions to deal with native char strings [but these do not generally count characters or respect character boundaries in multibyte character sets even if UTF-8 *is* the native multibyte character set] and native wchar_t strings, not any other kind of string. 
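To put the counting problem in concrete terms, here is a rough Python sketch of
the same arithmetic the C string functions end up doing (only an illustration;
the thread is about C, but the numbers are the same):

    s = '\U0001D11E'                   # MUSICAL SYMBOL G CLEF, outside the BMP
    len(s)                             # 1 code point in a Python 3 str
    len(s.encode('utf-16-le')) // 2    # 2 UTF-16 code units, what wcslen() sees with a 16-bit wchar_t
    len(s.encode('utf-8'))             # 4 bytes, what strlen() sees with char

Counting code units is cheap in both encodings; counting characters (let alone
grapheme clusters) takes real decoding logic in both.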
> > If you treat wchar_t as a "native wide char type" and call any of the wcs > functions on UTF-16 strings, you will count astral characters as two > characters, illegally split strings in the middle of surrogates, etc. No worse than UTF-8. If you can solve these problems for UTF-8 you can solve them for UTF-16. > And > you'll count BOMs as two characters and split them. Wait, what? The BOM is a single code unit in UTF-16. There is *no* encoding in which a BOM is two code units (it's three in UTF-8). Anyway, BOM shouldn't be used for in-memory strings, only text files. > These are basically > all the same problems you have using char with UTF-8, and more, and > harder to notice in testing (not just because you may not think to test > for astral characters, but because even if you do, you may not think to > test both byte orders). Byte orders are not an issue for anything other than file I/O, and I'm not proposing using any type other than UTF-8 for *text files*, anyway, only in-memory strings. > Later versions of C and POSIX (as in later than what Python requires) > provide explicit __CHAR16_TYPE__ and __CHAR_32_TYPE__, but they don't > provide APIs for analogs of strlen, strchr, strtok, etc. for those types, > so you have to be explicit about whether you're counting code points or > characters (and, if characters, how you're dealing with endianness). There are no analogs of these for UTF-8 either. And endianness is not an issue for in-memory strings stored using any of these types. From random832 at fastmail.us Thu May 14 16:49:07 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 14 May 2015 10:49:07 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> Message-ID: <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com> On Thu, May 14, 2015, at 04:48, Andrew Barnert via Python-ideas wrote: > As far as I can tell, all of your extra cases are just examples of the > surrogateescape error handler, which Nick already mentioned. Technically filesystem names (and other similar boundary APIs like environ, anything ctypes, etc) on Windows can contain arbitrary surrogates and have nothing to do with surrogateescape. From alexander at tutorfair.com Thu May 14 16:52:55 2015 From: alexander at tutorfair.com (Alexander Atkins) Date: Thu, 14 May 2015 15:52:55 +0100 Subject: [Python-ideas] lazy list Message-ID: Hi, I'm new to this mailing list. I needed a lazy list implementation for something, so I created one. I was a little bit surprised to find that there wasn't one in the *itertools* module and it seemed like quite a basic thing to me, as someone who has used Haskell before, so I thought probably I should share it. I'm wondering whether something like this should be part of the standard library? A fuller explanation is in the README, which is here: https://github.com/jadatkins/python-lazylist The gist of it is that it allows you to index into a generator. Previously evaluated elements are remembered, so foo[5] returns the same thing each time, and you can later call foo[4] and get the previous element. There are many uses for such a thing, but if you're not expecting it in the language, you might not necessarily think of them. Warning: it may contain bugs, especially the stuff to do with slicing, which is not really what it's for. 
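For anyone who doesn't want to click through, the core idea is roughly this (a
minimal sketch of the behaviour described above, not the code from the
repository; it ignores slicing and iterator exhaustion entirely):

    class LazyList:
        def __init__(self, iterable):
            self._it = iter(iterable)
            self._cache = []            # previously evaluated elements
        def __getitem__(self, index):
            while len(self._cache) <= index:
                self._cache.append(next(self._it))
            return self._cache[index]

    def naturals():
        n = 0
        while True:
            yield n
            n += 1

    foo = LazyList(naturals())
    foo[5]    # evaluates six elements of the generator, returns 5
    foo[4]    # already cached, returns 4 without advancing the generator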
-- *J Alexander D Atkins* -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu May 14 17:01:43 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 15 May 2015 01:01:43 +1000 Subject: [Python-ideas] lazy list In-Reply-To: References: Message-ID: On Fri, May 15, 2015 at 12:52 AM, Alexander Atkins wrote: > I needed a lazy list implementation for something, so I created one. I was > a little bit surprised to find that there wasn't one in the itertools module > and it seemed like quite a basic thing to me, as someone who has used > Haskell before, so I thought probably I should share it. I'm wondering > whether something like this should be part of the standard library? > It may well already exist on PyPI. There are a few things with "lazy" in their names; you'd have to poke around and see if one of them is of use to you. Another thing you might want to search for is "indexable map()", which is a related concept (imagine calling map() with a function and a list; the result is theoretically subscriptable, but not with Py3's basic map() implementation) that I'm fairly sure I've seen around at times. https://pypi.python.org/pypi Have fun searching. There's a huge lot out there, most of which isn't what you want... but you never know what you'll find! ChrisA From steve at pearwood.info Thu May 14 17:24:33 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 15 May 2015 01:24:33 +1000 Subject: [Python-ideas] lazy list In-Reply-To: References: Message-ID: <20150514152433.GY5663@ando.pearwood.info> On Thu, May 14, 2015 at 03:52:55PM +0100, Alexander Atkins wrote: > Hi, I'm new to this mailing list. > > I needed a lazy list implementation for something, so I created one. I was > a little bit surprised to find that there wasn't one in the *itertools* > module Why? It's not really an iterator tool. The things in itertools are tools for processing streams, not containers. > and it seemed like quite a basic thing to me, as someone who has > used Haskell before, so I thought probably I should share it. I'm > wondering whether something like this should be part of the standard > library? > > A fuller explanation is in the README, which is here: > https://github.com/jadatkins/python-lazylist > The gist of it is that it allows you to index into a generator. Previously > evaluated elements are remembered, so foo[5] returns the same thing each > time, and you can later call foo[4] and get the previous element. There > are many uses for such a thing, but if you're not expecting it in the > language, you might not necessarily think of them. What sort of uses? Can you give some examples? I'm having trouble thinking of a situation where I might use something like that. If I want random access, I'd use a list, or a computed sequence like (x)range. I don't think I would want something which acts like a generator but quietly holds onto all the items it has seen before, whether I need them or not. > Warning: it may contain bugs, especially the stuff to do with slicing, > which is not really what it's for. A slice is just a subsequence of indexed values. If you can index it, you should be able to slice it. 
assert spam[start:end:step] == [spam[i] for i in range(start, end, step)] -- Steve From alexander at tutorfair.com Thu May 14 17:44:19 2015 From: alexander at tutorfair.com (Alexander Atkins) Date: Thu, 14 May 2015 16:44:19 +0100 Subject: [Python-ideas] lazy list In-Reply-To: <20150514152433.GY5663@ando.pearwood.info> References: <20150514152433.GY5663@ando.pearwood.info> Message-ID: On 14 May 2015 at 16:24, Steven D'Aprano wrote: > What sort of uses? Can you give some examples? > > I'm having trouble thinking of a situation where I might use something > like that. If I want random access, I'd use a list, or a computed > sequence like (x)range. I don't think I would want something which acts > like a generator but quietly holds onto all the items it has seen > before, whether I need them or not. > Yes: you might want random access into an infinite sequence, where you can't be sure how many values you'll need at the start but you will want to reuse or refer back to earlier values later, and you can't be sure which ones you'll need. If you know you only want each value once, then you should stick with a generator. For example, you might need to read from a network stream or stdin, and the point where you stop reading might depend on content, but you might need to refer back to earlier items read, where the index of the item you need is determined at run-time. In my particular case, I was writing a program that reads from a large, but finite sequence, where I usually only need the first few items, but the maximum index that I need is determined at runtime,* and it was taking too long to process the whole list before starting. So I wrote a generator for the sequence instead, and used my LazyList wrapper to get random access on it. * Actually, it's not determined at runtime, but it is determined by another part of the program outside of the function I was writing. > A slice is just a subsequence of indexed values. If you can index it, > you should be able to slice it. > > assert spam[start:end:step] == [spam[i] for i in range(start, end, step)] > What I was trying to do was to create a slice without losing the laziness. For example, in my implementation you can take a slice like foo[start:] from an infinite sequence without causing problems. I haven't quite done it right, because I've returned an iterator instead of another LazyList object, but I could fix it up. I discuss this a bit more in the example program given in the repository. Python draws a lot from Haskell, especially the itertools module. Almost everything that's cool about lazy evaluation in Haskell is in Python somewhere. But Haskell has neither the list type that Python has, nor the generator type: it only has lazy linked lists, which have to serve for both. So it seemed to me to be an obvious omission for those few cases where that's really what you want. But I can totally believe that if nobody's thought of this so far then it's probably not commonly useful. -- *J Alexander D Atkins* Personal: ? 07963 237265 Work: ? 020 3322 4748 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander at tutorfair.com Thu May 14 17:55:07 2015 From: alexander at tutorfair.com (Alexander Atkins) Date: Thu, 14 May 2015 16:55:07 +0100 Subject: [Python-ideas] lazy list In-Reply-To: References: Message-ID: Whoops, at some point I hit 'Reply' instead of 'Reply All', so some of these messages didn't end up in the public group. 
On 14 May 2015 at 16:22, Alexander Atkins wrote: > > On 14 May 2015 at 16:01, Chris Angelico wrote: > > > > It may well already exist on PyPI. There are a few things with "lazy" > > in their names; you'd have to poke around and see if one of them is of > > use to you. > > Ah, yes. The package zc.lazylist looks quite similar. In some ways it's better than mine, in some ways it's not so ambitious. It's quite difficult to work out who the author is, though. It just says "Copyright Zope 2006", which isn't very helpful. > > I should perhaps reiterate that I have already written a lazy-list implementation, and therefore I don't need another one. What I was wondering is whether I should make any effort to share it with the community. I'm quite happy to shut up and go back to my paid work if that's not useful to anyone. (I'm leaving out Chris' intervening message in case he didn't intend it to be public ? not that there's anything saucy in there.) On 14 May 2015 at 16:46, Alexander Atkins wrote: > > On 14 May 2015 at 16:29, Chris Angelico wrote: > > > > Fair enough. What I'd recommend is putting it up on PyPI yourself; if > > something like this ever does make it into the standard library, it'll > > most likely be by incorporation of a PyPI package. > > Yes, that seems to be the thing to do. Some day when I've got a minute, I'll incorporate the improvements from Zope's implementation (the ideas, I mean, not the code) and fix up my slice implementation, then put it on PyPi. -- J Alexander D Atkins -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu May 14 18:51:40 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 14 May 2015 12:51:40 -0400 Subject: [Python-ideas] lazy list In-Reply-To: References: Message-ID: On 5/14/2015 11:01 AM, Chris Angelico wrote: > On Fri, May 15, 2015 at 12:52 AM, Alexander Atkins > wrote: >> I needed a lazy list implementation for something, so I created one. I was >> a little bit surprised to find that there wasn't one in the itertools module >> and it seemed like quite a basic thing to me, as someone who has used >> Haskell before, so I thought probably I should share it. I'm wondering >> whether something like this should be part of the standard library? This is a memoizer using a list rather than a dict. This is appropriate for f(count) = g(count-1). > It may well already exist on PyPI. There are a few things with "lazy" > in their names; you'd have to poke around and see if one of them is of > use to you. I would also try 'memo' and 'memoize'. > Another thing you might want to search for is "indexable map()", which > is a related concept (imagine calling map() with a function and a > list; the result is theoretically subscriptable, but not with Py3's > basic map() implementation) that I'm fairly sure I've seen around at > times. > > https://pypi.python.org/pypi > > Have fun searching. There's a huge lot out there, most of which isn't > what you want... but you never know what you'll find! The problem with putting any one thing in stdlib is that there are so many little variations. 
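To make the memoizer comparison above concrete, a rough sketch of the
list-as-memo idea, with Fibonacci standing in for the OP's generator (the names
are only illustrative):

    _cache = [0, 1]
    def fib(n):
        # list-based memo: fine because the argument is a small non-negative
        # integer and fib(n) only needs earlier entries
        while len(_cache) <= n:
            _cache.append(_cache[-1] + _cache[-2])
        return _cache[n]

    # dict-based memo: the more general tool when keys aren't dense integers
    from functools import lru_cache
    @lru_cache(maxsize=None)
    def fib2(n):
        return n if n < 2 else fib2(n - 1) + fib2(n - 2)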
-- Terry Jan Reedy From abarnert at yahoo.com Thu May 14 21:38:19 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 14 May 2015 12:38:19 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 14, 2015, at 07:38, Stephen J. Turnbull wrote: > > Andrew Barnert via Python-ideas writes: > >>> And yet one source of surrogates -- Python sources. eval(), etc. > > Yep: > > $ python3.4 > Python 3.4.3 (default, Mar 10 2015, 14:53:35) > [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> chr((16*13+8)*256) > '\ud800' >>>> '\ud800' > '\ud800' >>>> '\ud834\udd1e' > '\ud834\udd1e' > >> If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going >> to get an illegal Unicode string made of 2 surrogate code points >> instead of either an error or the single-character string >> '\U0001D11E'? > > Yes. How else do you propose to test the surrogateescape error > handler? Now, are you sitting down? If not, you should before > looking at the next example. ;-) > >>>> '\U0000d834\U0000dd1e' > '\ud834\udd1e' > > Isn't that disgusting? No; if the former gave you surrogates, the latter pretty much has to. Otherwise, that would essentially mean you can create illegal strings by accident, but it's hard to create them in the obvious explicitly intentional way. (The other way around might be reasonable, however.) At any rate, I can see that allowing people to go out of their way to create invalid strings is potentially useful (for testing invalid string handling, if nothing else) and possibly a "consenting adults" issue even if it weren't. So maybe that's the one case from the list that isn't just an example of Nick's three general cases. But meanwhile: if you're intentionally writing literals for invalid strings to test for invalid string handling, is that an argument for this proposal? For example, I might want to test that some fast JSON library does the same thing as the stdlib one in all cases; if there's a text-to-text codec in front of it, that makes the test a lot harder to write. So I think it still comes down to what Nick said: if you've got surrogates in your unicode, either you have a bug at your boundaries, you're dealing with surrogate escapes, or I forget the third... Or you're doing it intentionally and don't want to fix it. (And, although you didn't re-raise it, Serhiy mentioned eval, so let me just say that something like "I called eval on some arbitrary string that happened to be JSON not Python" sounds like a bug at the boundaries case, not a separate problem.) From abarnert at yahoo.com Thu May 14 21:48:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 14 May 2015 12:48:16 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com> Message-ID: On May 14, 2015, at 07:49, random832 at fastmail.us wrote: > >> On Thu, May 14, 2015, at 04:48, Andrew Barnert via Python-ideas wrote: >> As far as I can tell, all of your extra cases are just examples of the >> surrogateescape error handler, which Nick already mentioned. 
> > Technically filesystem names (and other similar boundary APIs like > environ, anything ctypes, etc) on Windows can contain arbitrary > surrogates Are you sure? I thought that, unless you're using Win95 or NT 3.1 or something, Win32 *W APIs are explicitly for Unicode characters (not code units), minus nulls and any relevant reserved characters (e.g.. no slashes in filenames, no control characters in filenames except for substream names, etc.). That's what the Naming Files doc seems to imply. (Then again, there are other areas that seem confusing or misleading--e.g., where it tells you not to worry about normalization because once the string gets through Win32 and to the filesystem it's just a string of WCHARs, which sounds to me like that's exactly why you _should_ worry about normalization...) > and have nothing to do with surrogateescape. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Thu May 14 22:17:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 14 May 2015 13:17:15 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> Message-ID: <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> On May 14, 2015, at 07:45, random832 at fastmail.us wrote: [snipping reply to Stephen J. Turnbull] >> On Wed, May 13, 2015, at 14:18, Andrew Barnert wrote: >> That's exactly how you create the problems this thread is trying to >> solve. > > The point I was getting at was more "you can't benefit from libc > functions at all, therefore your argument for UTF-8 is bad" than "you > should be using the native wchar_t type". I'm not sure is this was Stephen's point, but _my_ point is not that it's easier to use UTF-16 incorrectly, but rather that it's just as easy to do, and much more likely to get through unit testing and lead to a later debugging nightmare when you do. The only bug that's easier to catch with UTF-16 is the incredibly obvious "why am I only seeing the first character of my filename" bug. > Libc only has functions to > deal with native char strings [but these do not generally count > characters or respect character boundaries in multibyte character sets > even if UTF-8 *is* the native multibyte character set] and native > wchar_t strings, not any other kind of string. > >> >> If you treat wchar_t as a "native wide char type" and call any of the wcs >> functions on UTF-16 strings, you will count astral characters as two >> characters, illegally split strings in the middle of surrogates, etc. > > No worse than UTF-8. If you can solve these problems for UTF-8 you can > solve them for UTF-16. > >> And >> you'll count BOMs as two characters and split them. > > Wait, what? The BOM is a single code unit in UTF-16. Sorry, that "two" was a stupid typo (or braino) for "one", which then changes the meaning of the rest of the paragraph badly. 
The point is that you can miscount lengths by counting the BOM, and you
can split a BOM stream into a BOM stream and an "I hope it's in native
order or we're screwed" stream.

> There is *no*
> encoding in which a BOM is two code units (it's three in UTF-8). Anyway,
> BOM shouldn't be used for in-memory strings, only text files.

In a language with StringIO and socket.makefile and FTP and HTTP requests
as transparent file-like objects and a slew of libraries that can take an
open binary or text file or a bytes or str, that last point doesn't work
as well. For example, if I pass a binary file to your library's spam.parse
function, I can expect that to be the same as reading the binary file and
passing it to your spam.fromstring function. So, I may expect to be able
to, say, re.split the document into smaller documents and pass them to
spam.fromstring as well. Which is wrong, but it works when I test it,
because most UTF-16 files are little-endian, and so is my machine. And
then someone runs my app on a big-endian machine and they get a
hard-to-debug exception (or, if we're really unlucky, silent mojibake,
but that's pretty rare).

>> These are basically
>> all the same problems you have using char with UTF-8, and more, and
>> harder to notice in testing (not just because you may not think to test
>> for astral characters, but because even if you do, you may not think to
>> test both byte orders).
>
> Byte orders are not an issue for anything other than file I/O, and I'm
> not proposing using any type other than UTF-8 for *text files*, anyway,
> only in-memory strings.

Why do you want to use UTF-16 for in-memory strings? If you need to avoid
the problems of UTF-8 (and can't use a higher-level Unicode API like
Python's str type), you can use UTF-32, which solves all of the problems,
or you can use UTF-16, which solves almost none of them, but makes them
less likely to be caught in testing. There's a reason very few new
frameworks force you to use UTF-16 APIs and string types -- only the ones
that were originally written for UCS2 and it's too late to change (Win32,
Cocoa, Java, and a couple others).

>> Later versions of C and POSIX (as in later than what Python requires)
>> provide explicit __CHAR16_TYPE__ and __CHAR_32_TYPE__, but they don't
>> provide APIs for analogs of strlen, strchr, strtok, etc. for those types,
>> so you have to be explicit about whether you're counting code points or
>> characters (and, if characters, how you're dealing with endianness).
>
> There are no analogs of these for UTF-8 either. And endianness is not an
> issue for in-memory strings stored using any of these types.

Sure, if you've, say, explicitly encoded text to UTF-16-LE and want to
treat it as UTF-16-LE, you don't need to worry about endianness; a WCHAR
or char16_t is a WCHAR or char16_t. But why would you do that in the
first place? Usually, when you have WCHARs, it's because you opened a
file and wread from it, or received UTF-16 over the network or from a
Windows FooW API, in which case you have the same endianness issues as
any other binary I/O on non-char-sized types. And yes, of course the
right answer is to decode at input, but if you're doing that, why
wouldn't you just decode to Unicode instead of byte-swapping the WCHARs?
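(For the record, the BOM and byte-order behaviour is easy to see from Python
itself; a rough sketch:

    data = 'abc'.encode('utf-16')    # BOM plus native-order code units
    data[:2]                         # b'\xff\xfe' on a little-endian machine
    data.decode('utf-16')            # 'abc' -- the codec consumes the BOM
    tail = data[2:]                  # once you split, the order is a guess
    tail.decode('utf-16-le')         # 'abc' only if you guessed right

Decoding once at the boundary makes the whole question disappear.)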
From abarnert at yahoo.com Thu May 14 22:29:58 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 14 May 2015 13:29:58 -0700 Subject: [Python-ideas] lazy list In-Reply-To: References: <20150514152433.GY5663@ando.pearwood.info> Message-ID: <38002E1A-14E0-4251-A89C-28C70C186EF8@yahoo.com> On May 14, 2015, at 08:44, Alexander Atkins wrote: > >> A slice is just a subsequence of indexed values. If you can index it, >> you should be able to slice it. >> >> assert spam[start:end:step] == [spam[i] for i in range(start, end, step)] > > What I was trying to do was to create a slice without losing the laziness. For example, in my implementation you can take a slice like foo[start:] from an infinite sequence without causing problems. I haven't quite done it right, because I've returned an iterator instead of another LazyList object, but I could fix it up. I discuss this a bit more in the example program given in the repository. Having gone through this whole idea before (and then never finding a good use for it...), that's the only hard part--and the easiest way to solve that hard part is to create a generic sequence view library, which turns out to be more useful than the lazy list library anyway. (Plus, once you build the slice view type of the sequence view abstract type, it's pretty easy to build a deque- or rope-like sequence of discontiguous, or even different-source, slices, at which point tail-sharing becomes trivial, which makes lazy lists a lot more useful.) One more thing: a lot of the problems you (at least if you're thinking the same way I was) think you want lazy lists for, you only need tee--or you only need tee with its cache exposed so you can explicitly access it. Being able to directly index or slice or even delete from it as a sequence is a neat problem to solve, but it's hard to find a case where explicitly working on the cache is significantly less readable, and it's a lot simpler. But anyway, if you think it could be useful to someone else, you don't need to ask python-ideas whether to upload it to PyPI; just do it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu May 14 22:37:00 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 14 May 2015 13:37:00 -0700 Subject: [Python-ideas] Fwd: Add math.iszero() and math.isequal()? In-Reply-To: References: <4537a315-a08c-4838-8d55-1483ac9656bc@googlegroups.com> <85fe56bf-84a1-45b9-84bf-26b2ff389486@googlegroups.com> Message-ID: something went weird with the google groups mirror of this list -- sorry if this lands twice. -Chris ---------- Forwarded message ---------- From: Chris Barker Date: Thu, May 14, 2015 at 1:34 PM Subject: Re: [Python-ideas] Add math.iszero() and math.isequal()? To: Neil Girdhar Cc: "python-ideas at googlegroups.com" On Tue, May 12, 2015 at 11:24 PM, Neil Girdhar wrote: > See PEP 485, which appears to be still a draft: > https://www.python.org/dev/peps/pep-0485/ > It's been approved, and it's "just" waiting for me to implement the code and get it reviewed, etc. I've been much sidetracked, but hoping to get to in the next couple days.... iszero = lambda x: hash(x) == hash(0) >> isequal = lambda a, b: hash(a) == hash(b) >> >> Clearly these are trivial functions (but perphaps math experts could >> provide better implementations; I'm not proposing the implementations >> shown, just the functions however they are implemented). 
>>
> I'm not familiar with how hashing works for floats, but I can't imagine this
would even work -- == and != work for floats, they just don't test what
people most often want :-)

Anyway, see the PEP, and the quite long and drawn out discussion on this
list a couple months back.

-CHB

>
>>
>> It seems that not everyone is aware of the issues regarding comparing
>> floats for equality and so I still see code that compares floats using ==
>> or !=.
>>
>> If these functions were in the math module it would be convenient (since
>> I find I need them in most non-trivial programs), but also provide a place
>> to document that they should be used rather than == or != for floats. (I
>> guess a similar argument might apply to the cmath module?)
>>
>>
>> > _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ram at rachum.com Thu May 14 22:17:58 2015
From: ram at rachum.com (Ram Rachum)
Date: Thu, 14 May 2015 23:17:58 +0300
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: 
References: 
Message-ID: 

I'd like to move `Executor.filter` forward, if that's possible. Can we get
more people on the list to express their opinion about whether
`Executor.filter` should be added to the stdlib? (See my implementation in
a previous message on this thread.)

On Thu, May 7, 2015 at 6:56 AM, Nick Coghlan wrote:

> On 2 May 2015 at 19:25, Ram Rachum wrote:
> > Okay, I implemented it. Might be getting something wrong because I've never
> > worked with the internals of this module before.
>
> I think this is sufficiently tricky to get right that it's worth
> adding filter() as a parallel to the existing map() API.
>
> However, it did raise a separate question for me: is it currently
> possible to use Executor.map() and the as_completed() module level
> function together? Unless I'm missing something, it doesn't look like
> it, as map() hides the futures from the caller, so you only have
> something to pass to as_completed() if you invoke submit() directly.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> -------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org Thu May 14 23:03:02 2015
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 May 2015 14:03:02 -0700
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: 
References: 
Message-ID: 

If there's a working patch and you can get a core developer as a reviewer
I'm fine with that. (No PEP needed.)

On Thu, May 14, 2015 at 1:17 PM, Ram Rachum wrote:

> I'd like to move `Executor.filter` forward, if that's possible. Can we get
> more people on the list to express their opinion about whether
> `Executor.filter` should be added to the stdlib? (See my implementation in
> a previous message on this thread.)
> > On Thu, May 7, 2015 at 6:56 AM, Nick Coghlan wrote: > >> On 2 May 2015 at 19:25, Ram Rachum wrote: >> > Okay, I implemented it. Might be getting something wrong because I've >> never >> > worked with the internals of this module before. >> >> I think this is sufficiently tricky to get right that it's worth >> adding filter() as a parallel to the existing map() API. >> >> However, it did raise a separate question for me: is it currently >> possible to use Executor.map() and the as_completed() module level >> function together? Unless I'm missing something, it doesn't look like >> it, as map() hides the futures from the caller, so you only have >> something to pass to as_completed() if you invoke submit() directly. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu May 14 23:10:25 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 May 2015 14:10:25 -0700 Subject: [Python-ideas] Add `Executor.filter` In-Reply-To: References: Message-ID: <55550F41.2090805@stoneleaf.us> On 05/14/2015 01:17 PM, Ram Rachum wrote: > I'd like to move `Executor.filter` forward, if that's possible. Can we > get more people on the list to express their opinion about whether > `Executor.filter` should be added to the stdlib? (See my implementation > in a previous message on this thread.) Open up an issue on the tracker and attach your patch. -- ~Ethan~ From solipsis at pitrou.net Thu May 14 23:24:41 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 14 May 2015 23:24:41 +0200 Subject: [Python-ideas] Add `Executor.filter` References: Message-ID: <20150514232441.2aa648cf@fsol> On Thu, 14 May 2015 23:17:58 +0300 Ram Rachum wrote: > I'd like to move `Executor.filter` forward, if that's possible. Can we get > more people on the list to express their opinion about whether > `Executor.filter` should be added to the stdlib? (See my implementation in > a previous message on this thread.) I don't think there's a common use case for Executor.filter(). Builtin filter() and map() are a bad analogy, because they are meant to be easily composable in order to define more complex processing chains. But I don't see a reason to compose Executor operations. Regards Antoine. From mertz at gnosis.cx Fri May 15 02:11:28 2015 From: mertz at gnosis.cx (David Mertz) Date: Thu, 14 May 2015 17:11:28 -0700 Subject: [Python-ideas] lazy list In-Reply-To: References: Message-ID: I actually taught almost exactly this two days ago as an example of a class in the context of laziness, and included it in a white paper I wrote for O'Reilly on _Functional Programming in Python_ that will be given out starting at OSCon. I'm sure I'm also not the first or the 50th person to think of it. My basic implementation--made to exhibit a concept not to be complete--was rather short: from collections.abc import Sequence class ExpandingSequence(Sequence): def __init__(self, it): self.it = it self._cache = [] def __getitem__(self, index): while len(self._cache) <= index: self._cache.append(next(self.it)) return self._cache[index] def __len__(self): return len(self._cache) I think it's kinda cute. 
Especially when passed in something like an infinite iterator of all the primes or all the Fibonacci numbers. But I can't really recommend it (nor the fleshed out version the OP wrote) for the standard library. There's no size limit to the object, and so we don't *really* save space over just appending more elements to a list. I can certainly see that it could be OK for particular people with particular use cases, but it doesn't feel general enough for stdlib. On Thu, May 14, 2015 at 7:52 AM, Alexander Atkins wrote: > Hi, I'm new to this mailing list. > > I needed a lazy list implementation for something, so I created one. I > was a little bit surprised to find that there wasn't one in the > *itertools* module and it seemed like quite a basic thing to me, as > someone who has used Haskell before, so I thought probably I should share > it. I'm wondering whether something like this should be part of the > standard library? > > A fuller explanation is in the README, which is here: > https://github.com/jadatkins/python-lazylist > The gist of it is that it allows you to index into a generator. > Previously evaluated elements are remembered, so foo[5] returns the same > thing each time, and you can later call foo[4] and get the previous > element. There are many uses for such a thing, but if you're not expecting > it in the language, you might not necessarily think of them. > > Warning: it may contain bugs, especially the stuff to do with slicing, > which is not really what it's for. > > -- > > *J Alexander D Atkins* > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri May 15 03:02:26 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 15 May 2015 10:02:26 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > >>>> '\U0000d834\U0000dd1e' > > '\ud834\udd1e' > > > > Isn't that disgusting? > > No; if the former gave you surrogates, the latter pretty much has to. That, of course. What I was referring to as "disgusting" was using 32-bit syntax for Unicode literals to create surrogates. > But meanwhile: if you're intentionally writing literals for invalid > strings to test for invalid string handling, is that an argument > for this proposal? No. I see three cases: (1) Problem: You created a Python string which is invalid Unicode using literals or chr(). Solution: You know why you did that, we don't. You deal with it. (aka, "consenting adults") (2) Problem: You used surrogateescape or surrogatepass because you want the invalid Unicode to get to the other side some times. Solution: That's not a problem, that's a solution. Advice: Handle with care, like radioactives. Use strict error handling everywhere except the "out" door for invalid Unicode. 
If you can't afford a UnicodeError if such a string inadvertently gets
mixed with other stuff, use "try". (aka, "consenting adults")

(3) Problem: Code you can't or won't fix buggily passes you Unicode
that might have surrogates in it.
Solution: text-to-text codecs (but I don't see why they can't be
written as encode-decode chains).

As I've written before, I think text-to-text codecs are an attractive
nuisance. The temptation to use them in most cases should be refused,
because it's a better solution to deal with the problem at the
incoming boundary or the outgoing boundary (using str<->bytes codecs).
Dealing with them elsewhere and reintroducing the corrupted str into
the data flow is likely to cause issues with correctness (if altered
data is actually OK, why didn't you use a replace error handler in the
first place?) And most likely, unless you do a complete analysis of
all the ways str can get into or out of your module, you've just
started a game of whack-a-mole.

I could very easily be wrong about my assessment of where the majority
of these Unicode handling defects get injected: it's possible the
great majority comes from assorted legacy modules, and whack-a-mole is
the most cost-effective way to deal with them for most programs. I
hope not, though. :-/

From p.f.moore at gmail.com Fri May 15 14:21:29 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 15 May 2015 13:21:29 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
<115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
<87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
<87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On 15 May 2015 at 02:02, Stephen J. Turnbull wrote:
> (3) Problem: Code you can't or won't fix buggily passes you Unicode
> that might have surrogates in it.
> Solution: text-to-text codecs (but I don't see why they can't be
> written as encode-decode chains).
>
> As I've written before, I think text-to-text codecs are an attractive
> nuisance. The temptation to use them in most cases should be refused,
> because it's a better solution to deal with the problem at the
> incoming boundary or the outgoing boundary (using str<->bytes codecs).

One case I'd found a need for text->text handling (although not
related to surrogates) was taking arbitrary Unicode and applying an
error handler to it before writing it to a stream with "strict"
encoding. (So something like
"arbitrary text".encode('latin1', errors='backslashreplace').decode('latin1')).
The encode/decode pair seemed ugly, although it was the only way I
could find. I could easily imagine using a "rehandle" type of function
for this (although I wouldn't use the actual proposed functions here,
as the use of "surrogate" and "astral" in the names would lead me to
assume they were inappropriate).
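Spelled out, the pattern I mean is roughly this (the "rehandle" name and
signature are just placeholders; only the encode/decode pair is real):

    def rehandle(text, errors='backslashreplace'):
        # latin-1 round-trips every code point below U+0100 unchanged, so the
        # only effect is that the error handler rewrites everything above it
        return text.encode('latin-1', errors=errors).decode('latin-1')

    rehandle('snowman: \u2603')    # -> 'snowman: \\u2603'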
Whether that's an argument for or against the idea that they are an attractive nuisance, I'm not sure :-) Paul From koos.zevenhoven at aalto.fi Fri May 15 16:00:26 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Fri, 15 May 2015 17:00:26 +0300 Subject: [Python-ideas] Units in type hints In-Reply-To: <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info> References: <5554810B.7050409@aalto.fi> <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info> Message-ID: <5555FBFA.8090805@aalto.fi> On 14.5.2015 14:59, Steven D'Aprano wrote: > On Thu, May 14, 2015 at 02:03:39PM +0300, Koos Zevenhoven wrote: >> Hi all, >> >> How about extending the type annotations for int, float and complex to >> optionally include also a unit? > I really, really like the idea of having unit-aware calculations. > > But this is not the way to do it. See below: > > Getting something even better would of course be great. Needless to say, I would not be in favor of adding my first rough sketch to Python. I do believe that, whatever the solution, it would need to be some kind of a standard for it to really work. See comments below. >> For instance, >> >> def sleep(duration : Float['s']): >> ... >> >> Now the type checker could catch the error of trying to pass the sleep >> duration in milliseconds, Float['ms']. > But that's not an error. Calling sleep(weight_in_kilograms) is an error. In the example I gave, it is clearly an error. And it would be an error with time.sleep. But you are obviously right, sleeping for kilograms is also an error, although a very bizarre one. > But calling sleep(milliseconds(1000)) should be the same as calling > sleep(seconds(1)). Yes, something like that would be nice. What would sleep(1) do? > If the user has to do the conversion themselves, > that's a source of error: > > sleep(time_in_milliseconds / 1000) # convert to seconds > > If you think that's too obvious an error for anyone to make, You lost me now. There does not seem to be an error in the line of code you provided, especially not when using Python 3, which has true division by default. However, in what I proposed, the type checker would complain because you made a manual conversion without changing the unit hint (which is also potential source of error, and you seem to agree). According to my preliminary sketch, the correct way (which you did not quote) would be sleep(convert(time_in_milliseconds, 'ms', 's')) I do think this might be unnecessarily verbose. Anyway, I was not proposing the 'user' should do the actual conversion calculation by hand. > (1) you're > wrong, I've made that error, yes even that simple, and (2) you should > try it with more complex sets of units. How many pound-foot per minute > squared in a newton? There's no error so we will never find out whether I would have been wrong :(. But I can assure you, I have made errors in unit conversions too. Anyway, you did not quote the part of my email which addresses conversions and derived units (km/h). Regarding your example, it might work like this (not that I think this is optimal, though): convert(value, 'lb * ft / min**2', 'N') > Having the language support unit calculations is not just to catch the > wrong dimensions (passing a weight where a time is needed), but to > manage unit conversions automatically without the user being responsible > for getting the conversion right. That would be ideal, I agree. 
Would that not be a really hard thing to introduce into the language, taking into account backwards compatibility and all? I intentionally proposed something less than that. > A type checker is the wrong tool for > the job. At least not ideal. I do think the error of using the wrong unit is conceptually similar to many cases of accidentally passing something with the wrong type. Also, the type hints have other uses besides type checkers. Of course, having everything just work, without the user/programmer having to care, would be even better. > If you want to see what a good unit-aware language should be capable of, > check out: > > - Frink:https://futureboy.us/frinkdocs/ > > - the HP-28 and HP-48 series of calculators; > > - the Unix/Linux "units" utility. > > There are also some existing Python libraries which do unit > calculations. You should look into them. > There was also a talk at PyCon about existing libraries, but I can't seem to find it now. I assume some of you have seen it. -- Koos From koos.zevenhoven at aalto.fi Fri May 15 16:21:58 2015 From: koos.zevenhoven at aalto.fi (Koos Zevenhoven) Date: Fri, 15 May 2015 17:21:58 +0300 Subject: [Python-ideas] Units in type hints In-Reply-To: References: <5554810B.7050409@aalto.fi> Message-ID: <55560106.4080909@aalto.fi> Thanks for the email and tip! For my own code, I tend to always use SI units or those derived from them. If I want 3 milliseconds, I do 3e-3. Although seconds are pretty universal, not everyone has the privilege of being born and raised in SI units :P. Well, I guess m/s is rarely the everyday unit for speed, anywhere. For me, the problems arise when there are third-party non-SI functions or things like functions that take a duration in terms of samples of discretized signals (potentially Int['samples'] or in some cases Float['samples']). -- Koos On 2015-05-14 17:05, Skip Montanaro wrote: > On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven > wrote: >> How about extending the type annotations for int, float and complex to >> optionally include also a unit? > Not sure that's going to fly, but you might want to check out the > magnitude package: > > https://pypi.python.org/pypi/magnitude/0.9.1 > > I've used it in situations where I want to specify units scaled to a > more natural (to me) size. For example, the gobject.timeout_add > function takes a delay in milliseconds. Given that most of the time I > want delays in seconds or minutes, it's much more natural for me to > let magnitude do the work silently. > > Skip From rosuav at gmail.com Fri May 15 17:28:36 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 16 May 2015 01:28:36 +1000 Subject: [Python-ideas] Units in type hints In-Reply-To: <5555FBFA.8090805@aalto.fi> References: <5554810B.7050409@aalto.fi> <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info> <5555FBFA.8090805@aalto.fi> Message-ID: On Sat, May 16, 2015 at 12:00 AM, Koos Zevenhoven wrote: > On 14.5.2015 14:59, Steven D'Aprano wrote: >> But that's not an error. Calling sleep(weight_in_kilograms) is an error. > > In the example I gave, it is clearly an error. And it would be an error with > time.sleep. But you are obviously right, sleeping for kilograms is also an > error, although a very bizarre one. I dunno, maybe you're a heavy sleeper? :) >> If the user has to do the conversion themselves, >> that's a source of error: >> >> sleep(time_in_milliseconds / 1000) # convert to seconds >> >> If you think that's too obvious an error for anyone to make, > > You lost me now. 
There does not seem to be an error in the line of code you
> provided, especially not when using Python 3, which has true division by
> default. However, in what I proposed, the type checker would complain
> because you made a manual conversion without changing the unit hint (which
> is also potential source of error, and you seem to agree).

Dividing a unit-aware value by a scalar shouldn't be an error. "I
have an A4 sheet of paper. If I fold it in half seven times, how big
will it be?" => 210mm*297mm/(2**7) == 487.265625 mm^2. The unit would
simply stay the same after the division; what you'd have is the
thousandth part of the time, still in milliseconds.

If you have a typing system that's unit-aware, this would still be an
error, but it would be an error because you're still giving
milliseconds to a function that wants seconds.

It'd possibly be best to have actual real types for your unit-aware
values. Something like:

class UnitAware:
    def __init__(self, value: float, unit: str):
        self.value = value
        self.unit = unit
    def __mul__(self, other):
        if isinstance(other, UnitAware):
            ...  # perform compatibility/conversion checks
        else:
            return UnitAware(self.value * other, self.unit)
    # etc
    def to(self, unit):
        ...  # attempt to convert this value into the other unit

Then you could have hinting types that stipulate specific units:

class Unit(str):
    def __instancecheck__(self, val):
        return isinstance(val, UnitAware) and val.unit == self

ms = Unit("ms")
sec = Unit("sec")
m = Unit("m")

This would allow you to go a lot further than just type hints. But
maybe this would defeat the purpose, in that it'd have to have every
caller and callee aware that they're looking for a unit-aware value
rather than a raw number - so it wouldn't be easy to deploy
backward-compatibly.

ChrisA

From skip.montanaro at gmail.com Fri May 15 18:07:58 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Fri, 15 May 2015 11:07:58 -0500
Subject: [Python-ideas] Units in type hints
In-Reply-To: 
References: <5554810B.7050409@aalto.fi>
<16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info>
<5555FBFA.8090805@aalto.fi>
Message-ID: 

On Fri, May 15, 2015 at 10:28 AM, Chris Angelico wrote:
> Dividing a unit-aware value by a scalar shouldn't be an error. "I
> have an A4 sheet of paper. If I fold it in half seven times, how big
> will it be?" => 210mm*297mm/(2**7) == 487.265625 mm^2.

Here's this example using the magnitude module:

>>> from magnitude import mg
>>> a4 = mg(210, "mm") * mg(297, "mm")
>>> a4

>>> # Invalid - units are actually mm^2
>>> a4.ounit("mm")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/lib/python2.7/site-packages/magnitude.py", line 397, in ounit
    (self.out_factor.unit, self.unit))
MagnitudeError: Inconsistent Magnitude units: [1, 0, 0, 0, 0, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a4.ounit("mm2")

>>> a4.ounit("mm2").toval()
62370.0
>>> a4.toval()
62370.0
>>> 210 * 297
62370
>>> # Not sure why dimensionless / isn't supported
>>> folded7x = a4 * (1/2**7)
>>> folded7x

>>> folded7x.ounit("mm2").toval()
487.265625

Skip

From apieum at gmail.com Fri May 15 18:13:26 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Fri, 15 May 2015 18:13:26 +0200
Subject: [Python-ideas] Units in type hints
In-Reply-To: <55560106.4080909@aalto.fi>
References: <5554810B.7050409@aalto.fi> <55560106.4080909@aalto.fi>
Message-ID: 

Hi, why don't you try a tuple of types ?
(Float, Samples) Period: (int, Time) and if you want to force seconds eventually: (int, Time[seconds]) 2015-05-15 16:21 GMT+02:00 Koos Zevenhoven : > Thanks for the email and tip! > > For my own code, I tend to always use SI units or those derived from them. > If I want 3 milliseconds, I do 3e-3. Although seconds are pretty universal, > not everyone has the privilege of being born and raised in SI units :P. > Well, I guess m/s is rarely the everyday unit for speed, anywhere. > > For me, the problems arise when there are third-party non-SI functions or > things like functions that take a duration in terms of samples of > discretized signals (potentially Int['samples'] or in some cases > Float['samples']). > > -- Koos > > > > On 2015-05-14 17:05, Skip Montanaro wrote: > >> On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven >> wrote: >> >>> How about extending the type annotations for int, float and complex to >>> optionally include also a unit? >>> >> Not sure that's going to fly, but you might want to check out the >> magnitude package: >> >> https://pypi.python.org/pypi/magnitude/0.9.1 >> >> I've used it in situations where I want to specify units scaled to a >> more natural (to me) size. For example, the gobject.timeout_add >> function takes a delay in milliseconds. Given that most of the time I >> want delays in seconds or minutes, it's much more natural for me to >> let magnitude do the work silently. >> >> Skip >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri May 15 19:14:07 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 16 May 2015 03:14:07 +1000 Subject: [Python-ideas] Units in type hints In-Reply-To: <5555FBFA.8090805@aalto.fi> References: <5554810B.7050409@aalto.fi> <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info> <5555FBFA.8090805@aalto.fi> Message-ID: <20150515171407.GZ5663@ando.pearwood.info> On Fri, May 15, 2015 at 05:00:26PM +0300, Koos Zevenhoven wrote: > On 14.5.2015 14:59, Steven D'Aprano wrote: [...] > >>For instance, > >> > >> def sleep(duration : Float['s']): > >> ... > >> > >>Now the type checker could catch the error of trying to pass the sleep > >>duration in milliseconds, Float['ms']. > > > >But that's not an error. Calling sleep(weight_in_kilograms) is an error. > > In the example I gave, it is clearly an error. And it would be an error > with time.sleep. But you are obviously right, sleeping for kilograms is > also an error, although a very bizarre one. Calling sleep(x) where x is a millisecond unit should not be an error, because millisecond is just a constant times second. To be precise, milliseconds and seconds both have the same dimension, T (time) and so differ only by a fixed conversion constant. Hence: sleep( 1000 millisecond ) sleep( 1 second ) sleep( 0.016666667 minute ) sleep( 8.2671958e-07 fortnight ) sleep( 0.91134442 feet/kph ) etc. should all have exactly the same result, namely, to sleep for one second. (Obviously "1 second" is not valid Python syntax. I'm just using it as shorthand for whatever syntax is used, possibly a function call.) > >But calling sleep(milliseconds(1000)) should be the same as calling > >sleep(seconds(1)). > > Yes, something like that would be nice. What would sleep(1) do? 
That depends on the sleep function. If we're talking about the actual time.sleep function that exists today, it will sleep for one second. But that's because it's not aware of units. A unit-aware function could: - assume you know what you are doing and assign a default unit to scalar quantities, e.g. treat 1 as "1 second"; - treat 1 as a dimensionless quantity and raise an exception ("no dimension" is not compatible with "time dimension"). Of the two, the Zen suggests the second is the right thing to do. ("In the face of ambiguity, refuse the temptation to guess.") But perhaps backwards-compatibility requires the first. You could, I suppose, use a static type checker to get a really poor unit checker: def sleep(t: Second): time.sleep(t) class Second(float): pass sleep(1.0) # type checker flags this as wrong sleep(Second(1.0)) # type checker allows this But it's a really poor one, because it utter fails to enforce conversion factors, not even a little bit: class Minute(float): pass one_hour = Minute(60.0) sleep(one_hour) # flagged as wrong sleep(Second(one_hour)) # allowed but of course that will sleep for 60 seconds, not one hour. The problem here is that we've satisfied the type checker with meaningless types that don't do any conversions, and left all the conversions up to the user. > >If the user has to do the conversion themselves, > >that's a source of error: > > > >sleep(time_in_milliseconds / 1000) # convert to seconds > > > >If you think that's too obvious an error for anyone to make, > > You lost me now. There does not seem to be an error in the line of code > you provided, especially not when using Python 3, which has true > division by default. D'oh! Well, I demonstrated my point that unit conversions are prone to human error, only not the way I intended to. I *intended* to write the conversion the wrong way around, except I got it wrong myself. I *wrongly* convinced myself that the conversion factor was milliseconds * 1000 -> seconds, hence /1000 would get it wrong. Only it isn't. Believe me, I didn't intend to make my point in such a convoluted way. This was a genuine screw-up on my part. > However, in what I proposed, the type checker would > complain because you made a manual conversion without changing the unit > hint (which is also potential source of error, and you seem to agree). > According to my preliminary sketch, the correct way (which you did not > quote) would be > > sleep(convert(time_in_milliseconds, 'ms', 's')) That can't work, because how does the static type checker know that convert(time_in_milliseconds, 'ms', 's') returns seconds rather than milliseconds or minutes or days? What sort of annotations can you give convert() that will be known at compile time? Maybe you can see something I haven't thought of, but I cannot think of any possible static declaration which would allow a type checker to correctly reason that convert(x, 'ms', 's') returns Second or Float['s'], and so does this: new_unit = get_unit_from_config() convert(y, 'minute', new_unit) but not this: convert(z, 's', 'hour') Let alone more complex cases involving units multiplied, divided, and raised to powers. 
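To make that concrete: the only check that can work for convert() has to happen at run time, once the actual unit tag is in hand. A minimal sketch -- Quantity and expect_unit are names I've just invented for illustration, not an existing library:

class Quantity:
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

def expect_unit(q, unit):
    # Only the running program knows which unit a value ended up in.
    if not isinstance(q, Quantity) or q.unit != unit:
        raise TypeError("expected a value in %r, got %r" % (unit, q))
    return q.value

def sleep(duration):
    seconds = expect_unit(duration, "s")
    ...  # time.sleep(seconds), say

sleep(Quantity(1.5, "s"))                # fine
unit_from_config = "ms"                  # imagine this came from a config file
sleep(Quantity(1500, unit_from_config))  # fails here, at run time

A static checker never gets to see which unit string that second call carries; only the running program does.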
-- Steve From random832 at fastmail.us Fri May 15 20:14:52 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 15 May 2015 14:14:52 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com> Message-ID: <1431713692.3280482.269840281.21784468@webmail.messagingengine.com> On Thu, May 14, 2015, at 15:48, Andrew Barnert wrote: > > Technically filesystem names (and other similar boundary APIs like > > environ, anything ctypes, etc) on Windows can contain arbitrary > > surrogates > > Are you sure? I thought that, unless you're using Win95 or NT 3.1 or > something, Win32 *W APIs are explicitly for Unicode characters (not code > units), Windows documentation often uses "unicode" to mean UTF-16 and "character" to mean WCHAR. The real point is that the APIs perform no validation, and existing filenames on the disk, user input into edit controls, etc, can contain invalid surrogates. There's basically nothing at any point to reject invalid surrogates. I can create a file now whose filename consists of a single surrogate code unit. I can copy that filename to the clipboard, paste it anywhere, create more files with it in the filename or contents, etc. (Notepad, incidentally, will save a UTF-16 file containing an invalid surrogate, but saving it as UTF-8 will replace it with U+FFFD, the one and only place I could find where invalid surrogates are rejected by Windows). > minus nulls and any relevant reserved characters (e.g.. no > slashes in filenames, no control characters in filenames except for > substream names, etc.). That's what the Naming Files doc seems to imply. > (Then again, there are other areas that seem confusing or > misleading--e.g., where it tells you not to worry about normalization > because once the string gets through Win32 and to the filesystem it's > just a string of WCHARs, which sounds to me like that's exactly why you > _should_ worry about normalization...)' Well, it depends on why you're worried about it. No normalization is great for being able to expect that your filename you just saved will come back unchanged in a directory listing. From random832 at fastmail.us Fri May 15 20:19:35 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 15 May 2015 14:19:35 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> Message-ID: <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com> On Thu, May 14, 2015, at 16:17, Andrew Barnert wrote: > The point is that you can miscount lengths by counting the BOM, and you > can split a BOM stream into a BOM steam and an "I hope it's in native > order or we're screwed" stream. Python provides no operations for splitting streams. You mention re.split further on, but that only works on in-memory strings, which should have already had the BOM stripped and been put in native order. 
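Concretely, the BOM and the byte order are dealt with at the decode step, so the str you end up holding carries neither:

>>> raw = 'caf\xe9'.encode('utf-16')    # encoder writes a BOM and uses this machine's byte order
>>> raw[:2] in (b'\xff\xfe', b'\xfe\xff')
True
>>> raw.decode('utf-16') == 'caf\xe9'   # decoder consumes the BOM and byte-swaps if necessary
True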
In-memory wide strings should _never_ be in an endianness other than the machine's native one and should _never_ have a BOM. That should be taken care of when reading it off the disk/wire. If you haven't done that, you still have a byte array, which it's not so easy to accidentally assume you'll be able to split up and pass to your fromstring function. > Which is wrong, but it works when I test it, > because most UTF-16 files are little-endian, and so is my machine. And > then someone runs my app on a big-endian machine and they get a > hard-to-debug exception (or, if we're really unlucky, silent mojibake, > but that's pretty rare). The proper equivalent of a UTF-16 file with a byte-order-mark would be a _binary_ StringIO on a _byte_ array containing a BOM and UTF-16. You can layer a TextIOWrapper on top of either of them. And it never makes sense to expect to be able to arbitrarily split up encoded byte arrays, whether those are in UTF-16 or not. > Usually, when you have WCHARs, it's because you opened a file and wread > from it, or received UTF-16 over the network of from a Windows FooW API, > in which case you have the same endianness issues as any other binary I/O > on non-char-sized types. And yes, of course the right answer is to decode > at input, but if you're doing that, why wouldn't you just decide to > Unicode instead of byte-swapping the WCHARs? You shouldn't have WCHARS (of any kind) in the first place until you've decoded. If you're receiving UTF-16 of unknown endianness over the network you should be receiving it as bytes. If you're directly calling a FooW API, you are obviously on a win32 system and you've already got native WCHARs in native endianness. But, once again, that wasn't really my point. My point that there are no native libc functions for working with utf-8 strings - even if you're willing to presume that the native multibyte character set is UTF-8, there are very few standard functions for working with multibyte characters. "ascii compatibility" means you're going to write something using strchr or strtok that works for ascii characters and does something terrible when given non-ascii multibyte characters to search for. The benefits of using libc only work if you play by libc's rules, which we've established are inadequate. If you're _not_ going to use libc string functions, then there's no reason not to prefer UTF-32 (when you're not using the FSR, which is essentially a fancy immutable container for UTF-32 code points) over UTF-8. 
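For what it's worth, the code point behaviour I'm leaning on is easy to check on 3.3+: one len() per character, however the text is stored internally or encoded afterwards.

>>> s = '\U0001D11E'        # a single astral character, MUSICAL SYMBOL G CLEF
>>> len(s)
1
>>> len(s.encode('utf-8')), len(s.encode('utf-16-le')) // 2, len(s.encode('utf-32-le')) // 4
(4, 2, 1)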
From abarnert at yahoo.com Fri May 15 21:37:57 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 15 May 2015 12:37:57 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com> Message-ID: <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com> On May 15, 2015, at 11:19, random832 at fastmail.us wrote: > >> On Thu, May 14, 2015, at 16:17, Andrew Barnert wrote: >> The point is that you can miscount lengths by counting the BOM, and you >> can split a BOM stream into a BOM steam and an "I hope it's in native >> order or we're screwed" stream. > > Python provides no operations for splitting streams. You mention > re.split further on, but that only works on in-memory strings, which > should have already had the BOM stripped and been put in native order. If you're decoding to text, you don't have UTF-16 anymore (or, if you do under the covers, you neither know nor care that you do), you have Unicode text. Conversely, if you have UTF-16--even in native order and with the BOM stripped--you don't have text, you still have bytes (or WCHARs, if you prefer, but not in Python). Why would you want to transcode from one encoding to another in memory just to still have to work on encoded bytes? There's no more reason for you to be passing byteswapped, BOM-stripped UTF-16 to re.split than there is for you to be passing any other encoded bytes to re.split. > In-memory wide strings should _never_ be in an endianness other than the > machine's native one and should _never_ have a BOM. That should be taken > care of when reading it off the disk/wire. If you haven't done that, you > still have a byte array, which it's not so easy to accidentally assume > you'll be able to split up and pass to your fromstring function. I explicitly mentioned opening the file in binary mode, reading it in, and passing it to some fromstring function that takes bytes, so yes, of course you have a byte array. And again, if you have UTF-16, even in native endianness and without a BOM, that's still a byte array, so how is that any different? And of course you can have in-memory byte arrays with a BOM, or in non-native endianness; that's what the UTF-16 and UTF-16-BE (or -LE) codecs produce and consume. And it _is_ easy to use those byte arrays, exactly as easy as to use UTF-8 byte arrays or native-endian BOM-less UTF-16 byte arrays or anything else. All you need is a library that's willing to do the decoding for you in its loads/fromstring/etc. function, which includes most libraries on PyPI (because otherwise they wouldn't work with str in 2.x). See simplejson, for an example. >> Which is wrong, but it works when I test it, >> because most UTF-16 files are little-endian, and so is my machine. And >> then someone runs my app on a big-endian machine and they get a >> hard-to-debug exception (or, if we're really unlucky, silent mojibake, >> but that's pretty rare). 
> > The proper equivalent of a UTF-16 file with a byte-order-mark would be a > _binary_ StringIO on a _byte_ array containing a BOM and UTF-16. I mentioned BytesIO; that's what a binary StringIO is called. > You can > layer a TextIOWrapper on top of either of them. And it never makes sense > to expect to be able to arbitrarily split up encoded byte arrays, > whether those are in UTF-16 or not. There are countless protocols and file formats that _require_ being able to split byte arrays before decoding them. That's how you split the header and body of an RFC822 message like an email or an HTTP response, and how you parse OLE substreams out of a binary-format Office file. >> Usually, when you have WCHARs, it's because you opened a file and wread >> from it, or received UTF-16 over the network of from a Windows FooW API, >> in which case you have the same endianness issues as any other binary I/O >> on non-char-sized types. And yes, of course the right answer is to decode >> at input, but if you're doing that, why wouldn't you just decide to >> Unicode instead of byte-swapping the WCHARs? > > You shouldn't have WCHARS (of any kind) in the first place until you've > decoded. And yet Microsoft's APIs, both Win32 and MSVCRT, are full of wread and similar functions. But anyway, I'll grant that you usually shouldn't have WCHARs before you've decoded. But you definitely should not have WCHARs _after_ you've decoded. In fact, you _can't_ have them after you've decoded, because a WCHAR isn't big enough to hold a Unicode code point. If you have WCHARs, either you're still encoded (or just transcoded to UTF-16), or your code will break as soon as you get a Chinese user with a moderately uncommon last name. So, you should never have WCHARs. Which was my point in the first place. If you need to deal with UTF-16 streams, treat them as streams of bytes and decode them the same way you would UTF-8 or Big5 or anything else, don't treat them as streams of WCHARs that are often but not always complete Unicode characters. > If you're receiving UTF-16 of unknown endianness over the > network you should be receiving it as bytes. If you're directly calling > a FooW API, you are obviously on a win32 system and you've already got > native WCHARs in native endianness. Only if you got those characters from another win32 FooW API, as opposed to, say, from user input from a cross-platform GUI framework that may have different rules from Windows. > But, once again, that wasn't really > my point. > > My point that there are no native libc functions for working with utf-8 > strings - even if you're willing to presume that the native multibyte > character set is UTF-8, there are very few standard functions for > working with multibyte characters. "ascii compatibility" means you're > going to write something using strchr or strtok that works for ascii > characters and does something terrible when given non-ascii multibyte > characters to search for. But many specific static patterns _do_ work with ASCII compatible encodings. Again, think of HTTP responses. Even though the headers and body are both text, they're defined as being separated by b"\r\n\r\n". If this were never useful--or if it often seemed useful but was really just an attractive nuisance--Python 3 wouldn't have bytes.split and bytes.find and be adding bytes.__mod__. Or do you think that proposal is a mistake? > The benefits of using libc only work if you play by libc's rules, which > we've established are inadequate. 
If you're _not_ going to use libc > string functions, then there's no reason not to prefer UTF-32 (when > you're not using the FSR, which is essentially a fancy immutable > container for UTF-32 code points) over UTF-8. Preferring UTF-32 over UTF-8 makes perfect sense. But that's not what you started out arguing. Nick mentioned off-hand that UTF-16 has the worst of both worlds of UTF-8 and UTF-32, Stephen explained that further to someone else, and you challenged his explanation, arguing that UTF-16 doesn't introduce any problems over UTF-8. But it does. It introduces all the same problems as UTF-32, but without any of the benefits. From wes.turner at gmail.com Fri May 15 23:23:10 2015 From: wes.turner at gmail.com (Wes Turner) Date: Fri, 15 May 2015 16:23:10 -0500 Subject: [Python-ideas] Units in type hints In-Reply-To: <5554810B.7050409@aalto.fi> References: <5554810B.7050409@aalto.fi> Message-ID: * https://pint.readthedocs.org/en/latest/ (supports NumPy) * QUDT maintains SI units, non-SI units, conversion factors, labels, etc. as RDF classes and instances with properties: * https://wrdrd.com/docs/consulting/knowledge-engineering#qudt On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven wrote: > Hi all, > > How about extending the type annotations for int, float and complex to > optionally include also a unit? > > For instance, > > def sleep(duration : Float['s']): > ... > > Now the type checker could catch the error of trying to pass the sleep > duration in milliseconds, Float['ms']. This would also be useful for > documentation, avoiding the 'need' for having names like duration_s. At > least the notation with square brackets would resemble the way units are > often written in science. > > Another example: > > def calculate_travel_time(distance: Float['km']) -> Float['h']: > speed = get_current_speed() # type: Float['km/h'] > return distance / speed > > Now, if you try to pass the distance in miles, or Float['mi'], the type > checker would catch the error. Note that the type checker would also > understand that 'km' divided by 'km/h' becomes 'h'. Or should these be > something like units.km / units.h? > > But if you do have your distance in miles, you do > > calculate_travel_time(units.convert(distance_mi, 'mi', 'km')) > > and the type checker and programmer get what they want. > > Anyone interested? > > > -- Koos > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.us Fri May 15 23:52:18 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 15 May 2015 17:52:18 -0400 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com> <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com> Message-ID: <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com> On Fri, May 15, 2015, at 15:37, Andrew Barnert wrote: > Conversely, if you have UTF-16--even in native order and with the BOM > stripped--you don't have text, you still have bytes (or WCHARs, if you > prefer, but not in Python). This line of discussion began with someone asserting the [dubious] merits of using the native libc functions, which on windows does mean UTF-16 WCHARs as well as (ASCII, but certainly not properly-handled UTF-8) bytes. > I explicitly mentioned opening the file in binary mode, reading it in, > and passing it to some fromstring function that takes bytes, so yes, of > course you have a byte array. Why would a fromstring function take bytes? How would you use re.split on it? > > You shouldn't have WCHARS (of any kind) in the first place until you've > > decoded. > > And yet Microsoft's APIs, both Win32 and MSVCRT, are full of wread and > similar functions. No such thing as "wread". And given the appropriate flags to _open, _read can perform decoding. > But anyway, I'll grant that you usually shouldn't have WCHARs before > you've decoded. > > But you definitely should not have WCHARs _after_ you've decoded. In > fact, you _can't_ have them after you've decoded, because a WCHAR isn't > big enough to hold a Unicode code point. You're nitpicking on word choice. Going from bytes to UTF-16 words [whether as WCHAR or unsigned short] is a form of decoding. Or don't you think python narrow builds' decode function was properly named? > But many specific static patterns _do_ work with ASCII compatible > encodings. Again, think of HTTP responses. Even though the headers and > body are both text, they're defined as being separated by b"\r\n\r\n". Right, but those aren't UTF-8. Working with ASCII is fine, but don't pretend you've actually found a way to work with UTF-8. > Preferring UTF-32 over UTF-8 makes perfect sense. But that's not what you > started out arguing. Nick mentioned off-hand that UTF-16 has the worst of > both worlds of UTF-8 and UTF-32, Stephen explained that further to > someone else, and you challenged his explanation, arguing that UTF-16 > doesn't introduce any problems over UTF-8. > But it does. It introduces all > the same problems as UTF-32, but without any of the benefits. No, because UTF-32 has the additional problem, shared with UTF-8, that (Windows) libc doesn't support it. My point was that if you want the benefits of using libc you have to pay the costs of using libc, and that means using libc's native encodings. Which, on Windows, are UTF-16 and (e.g.) Codepage 1252. If you don't want the benefits of using libc, then there's no benefit to using UTF-8. 
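To spell out the distinction we keep circling: searching UTF-8 bytes for a fixed ASCII delimiter is safe, but character-blind byte offsets are not.

>>> data = 'na\xefve: caf\xe9'.encode('utf-8')
>>> data.split(b':')          # an ASCII delimiter can never occur inside a multibyte sequence
[b'na\xc3\xafve', b' caf\xc3\xa9']
>>> data[:3].decode('utf-8')  # but an arbitrary byte offset can split a character
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: unexpected end of data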
From abarnert at yahoo.com Sat May 16 01:44:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 15 May 2015 16:44:23 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com> <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com> <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com> Message-ID: <5F666AAB-E680-4EF5-973F-AC33A03F64F2@yahoo.com> On May 15, 2015, at 14:52, random832 at fastmail.us wrote: > >> On Fri, May 15, 2015, at 15:37, Andrew Barnert wrote: >> I explicitly mentioned opening the file in binary mode, reading it in, >> and passing it to some fromstring function that takes bytes, so yes, of >> course you have a byte array. > > Why would a fromstring function take bytes? I just gave you a specific example of this (simplejson.loads), and explained why they do it (because the same code is how they work with str in 2.x), in the very next paragraph, which you snipped out. And I'd already explained it in the previous email. I'm not sure how many other ways there are to explain it. I'd bet that the vast majority of modules on PyPI that have a fromstring/loads/parsetxt/readcsv/etc.-style function can take bytes; how is this surprising to you? > How would you use re.split > on it? On a bytes? This is explained in the second line of the re docs: re works with byte patterns and strings just as it works with Unicode patterns and strings. >> But anyway, I'll grant that you usually shouldn't have WCHARs before >> you've decoded. >> >> But you definitely should not have WCHARs _after_ you've decoded. In >> fact, you _can't_ have them after you've decoded, because a WCHAR isn't >> big enough to hold a Unicode code point. > > You're nitpicking on word choice. No, I'm not. Pretending 16-bit wide chars are "Unicode" is not just a trivial matter of bad word choice, it's wrong, and it's exactly how the world created the problems that this thread is thing to help solve. Win32, Cocoa, and Java have the good excuse that they were created back when Unicode only had 64K code points and, as far as anyone believed, always would. So they were based on UCS2, and later going from there to UTF-16 broke less code than going from there to UCS4 would have. But that isn't a good reason for any new framework, library, or app to use UTF-16. > Going from bytes to UTF-16 words > [whether as WCHAR or unsigned short] is a form of decoding. Only in the same sense that going from Shift-JIS to UTF-8 is a form of decoding. Or, for that matter, going from UTF-16 to baudot 6-bit units, if that's what your code wants to work on. If your code treats UTF-8 or UTF-16 or Shift-JIS strings as sequences of unicode characters, it makes sense to call that decoding. If your code treats them as sequences of bytes or words, then your strings are still encoded bytes or words, not strings, and it's misleading to call that decoding. > Or don't you > think python narrow builds' decode function was properly named? 
The real problem was that Python narrow builds shouldn't exist in the first place. Which was fixed in 3.3, so I don't think I need to argue that it should be fixed. >> But many specific static patterns _do_ work with ASCII compatible >> encodings. Again, think of HTTP responses. Even though the headers and >> body are both text, they're defined as being separated by b"\r\n\r\n". > > Right, but those aren't UTF-8. Working with ASCII is fine, but don't > pretend you've actually found a way to work with UTF-8. But the same functions _do_ work for UTF-8. That's one of the whole points of UTF-8: every byte is unambiguously either a single character, a leading byte, or a continuation byte. This means you can search any UTF-8 encoded string for any UTF-8-encoded substring (or any regex pattern) and it will never have false positives (or negatives), whether that substring or pattern is b'\r\n\r\n' or '?'.encode('utf-8'). And that's the only reason that searching UTF-16 works: every word is unambiguously either a single character, a leading surrogate, or a continuation surrogate. So UTF-16 is exactly the same as UTF-8 here, for exactly the same reason; it's not better. >> Preferring UTF-32 over UTF-8 makes perfect sense. But that's not what you >> started out arguing. Nick mentioned off-hand that UTF-16 has the worst of >> both worlds of UTF-8 and UTF-32, Stephen explained that further to >> someone else, and you challenged his explanation, arguing that UTF-16 >> doesn't introduce any problems over UTF-8. >> But it does. It introduces all >> the same problems as UTF-32, but without any of the benefits. > > No, because UTF-32 has the additional problem, shared with UTF-8, that > (Windows) libc doesn't support it. But Windows libc doesn't support UTF-16. When you call wcslen on "?", that emoji counts as 2 characters, not 1. It returns "the count of characters" in "wide (two-byte) characters", which aren't actually characters. > My point was that if you want the benefits of using libc you have to pay > the costs of using libc, and that means using libc's native encodings. > Which, on Windows, are UTF-16 and (e.g.) Codepage 1252. If you don't > want the benefits of using libc, then there's no benefit to using UTF-8. The traditional libc functions like strlen and strstr don't care what your native encoding or actual encoding are. Some of them will produce the right result with UTF-8 even if it isn't your encoding (strstr), some will produce the wrong wrong even if it is (strlen). There are also some newer functions that do care (mbslen), which are only right if UTF-8 is your locale encoding (which it probably isn't, and you're probably not going to set LC_CTYPE yourself). The ones that are always right with UTF-8 have corresponding wide functions that are right with UTF-16, the ones the are always wrong with UTF-8 have corresponding wide functions that are always wrong with UTF-16, the ones that are locale-dependent don't have corresponding wide functions at all, forcing you to use functions that are always wrong. Microsoft's libc documentation is seriously misleading, and refers to functions like wcslen as returning "the count in characters", but is generally not misleading for strlen. Catching UTF-8 strlen-style bugs before release requires testing some non-English text; catching UTF-16 wcslen-style bugs requires testing very specific kinds of text (you pretty much have to know what an astral is to even guess what kind of text you need--although emoji are making the problem more noticeable). 
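To put numbers on that (any text with an astral character in it shows the same thing):

>>> s = 'n\xe4\U0001F40D'              # 'n', an accented letter, and one astral emoji
>>> len(s)                             # characters (code points)
3
>>> len(s.encode('utf-8'))             # what a strlen-style count sees
7
>>> len(s.encode('utf-16-le')) // 2    # what a wcslen-style count sees
4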
Which part of that is as advantage for UTF-16 with libc? In every case, it's either the same as UTF-8 (strstr) or worse (both mbslen and strlen, for different reasons). From stephen at xemacs.org Sat May 16 05:56:07 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 May 2015 12:56:07 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > One case I'd found a need for text->text handling (although not > related to surrogates) was taking arbitrary Unicode and applying an > error handler to it before writing it to a stream with "strict" > encoding. (So something like "arbitrary text".encode('latin1', > 'errors='backslashescape').decode('latin1')). That's not the use case envisioned for these functions, though. You want to change the textual content of the stream (by restricting the repertoire), not change the representation of non-textual content. > The encode/decode pair seemed ugly, although it was the only way I > could find. I find the fact that there's an output stream with an inappropriate error handler far uglier! Note that the encode/decode pair is quite efficient, although the "rehandle" function could be about twice as fast. Still, if you're output-bound by the speed of a disk or the like, encode/decode will have no trouble keeping up. > I could easily imagine using a "rehandle" type of function for this > (although I wouldn't use the actual proposed functions here, as the > use of "surrogate" and "astral" in the names would lead me to > assume they were inappropriate). AFAICT, you'd be right -- they don't (as proposed) handle your use case of restricting to a Unicode subset. Your kind of use case is why I think general repertoire filtering functions in unicodedata (or a new unicodetools package) would be a much better home for this functionality. > Whether that's an argument for or against the idea that they are an > attractive nuisance, I'm not sure :-) I think your use case is quite independent of that issue. From stephen at xemacs.org Sat May 16 06:26:19 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 May 2015 13:26:19 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com> References: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp> <554AC2CE.5040705@btinternet.com> <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com> <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp> <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com> <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp> <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com> <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com> <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com> <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com> <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com> Message-ID: <87a8x5172s.fsf@uwakimon.sk.tsukuba.ac.jp> random832 at fastmail.us writes: > My point was that if you want the benefits of using libc you have > to pay the costs of using libc, and that means using libc's native > encodings. Of course it doesn't mean any such thing. 
My point was that there are many utility functions in libc and out that don't care at all that the array of bytes is encoded text, only that its content not contain NULs, and that it be NUL-terminated. Sure, nowadays there are better alternatives for handling text as text (for example, Python 3 str! -- whose design *nobody* is proposing to change here, although in the past some have asked that it be turned into something Unicode compatible), but at least on POSIX systems the traditional utilities still assume those classic characteristics, which UTF-8 satisfies and UTF-16 does not. Incompatibility with those utilities is an issue for UTF-16, but not for UTF-8. That's all. From ncoghlan at gmail.com Sat May 16 09:50:41 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 16 May 2015 17:50:41 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 15 May 2015 at 22:21, Paul Moore wrote: > On 15 May 2015 at 02:02, Stephen J. Turnbull wrote: >> (3) Problem: Code you can't or won't fix buggily passes you Unicode >> that might have surrogates in it. >> Solution: text-to-text codecs (but I don't see why they can't be >> written as encode-decode chains). >> >> As I've written before, I think text-to-text codecs are an attractive >> nuisance. The temptation to use them in most cases should be refused, >> because it's a better solution to deal with the problem at the >> incoming boundary or the outgoing boundary (using str<->bytes codecs). > > One case I'd found a need for text->text handling (although not > related to surrogates) was taking arbitrary Unicode and applying an > error handler to it before writing it to a stream with "strict" > encoding. (So something like "arbitrary text".encode('latin1', > 'errors='backslashescape').decode('latin1')). > > The encode/decode pair seemed ugly, although it was the only way I > could find. I could easily imagine using a "rehandle" type of function > for this (although I wouldn't use the actual proposed functions here, > as the use of "surrogate" and "astral" in the names would lead me to > assume they were inappropriate). That's a different case, as you need to know the encoding of the target stream in order to know which code points that codec can't handle. Even when you do know the target encoding, Python itself has no idea which code points a given text encoding can and can't handle, so the only way to find out is to try it and see what happens. 
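Trying it is cheap enough, mind you; a throwaway helper along these lines (encodable is a made-up name) is all "try it and see" amounts to:

def encodable(text, encoding):
    """Report whether *text* survives *encoding*, by simply attempting it."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

encodable('\u20ac', 'latin-1')   # False: Latin-1 has no euro sign
encodable('\u20ac', 'cp1252')    # True
encodable('\ud834', 'utf-8')     # False: a lone surrogate is refused even by a universal codec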
The unique thing about the surrogate case is that *no* codec is supposed to encode them, not even the universal ones: >>> '\ud834\udd1e'.encode("utf-8") Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\ud834' in position 0: surrogates not allowed >>> '\ud834\udd1e'.encode("utf-16-le") Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-16-le' codec can't encode character '\ud834' in position 0: surrogates not allowed >>> '\ud834\udd1e'.encode("utf-32") Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-32' codec can't encode character '\ud834' in position 0: surrogates not allowed The fact that it's purely a code point level manipulation of the entire surrogate range (rehandle_surrogatepass), or a particular usage pattern of that range (rehandle_surrogateescape) is the difference that makes it possible to define text->text APIs for surrogate manipulation without caring about the eventual text encoding used (if any). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Sat May 16 11:47:02 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 16 May 2015 02:47:02 -0700 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com> On May 16, 2015, at 00:50, Nick Coghlan wrote: > >> On 15 May 2015 at 22:21, Paul Moore wrote: >>> On 15 May 2015 at 02:02, Stephen J. Turnbull wrote: >>> (3) Problem: Code you can't or won't fix buggily passes you Unicode >>> that might have surrogates in it. >>> Solution: text-to-text codecs (but I don't see why they can't be >>> written as encode-decode chains). >>> >>> As I've written before, I think text-to-text codecs are an attractive >>> nuisance. The temptation to use them in most cases should be refused, >>> because it's a better solution to deal with the problem at the >>> incoming boundary or the outgoing boundary (using str<->bytes codecs). >> >> One case I'd found a need for text->text handling (although not >> related to surrogates) was taking arbitrary Unicode and applying an >> error handler to it before writing it to a stream with "strict" >> encoding. (So something like "arbitrary text".encode('latin1', >> 'errors='backslashescape').decode('latin1')). >> >> The encode/decode pair seemed ugly, although it was the only way I >> could find. I could easily imagine using a "rehandle" type of function >> for this (although I wouldn't use the actual proposed functions here, >> as the use of "surrogate" and "astral" in the names would lead me to >> assume they were inappropriate). > > That's a different case, as you need to know the encoding of the > target stream in order to know which code points that codec can't > handle. Even when you do know the target encoding, Python itself has > no idea which code points a given text encoding can and can't handle, > so the only way to find out is to try it and see what happens. > > The unique thing about the surrogate case is that *no* codec is > supposed to encode them, not even the universal ones: Python doesn't have a CESU-8 codec (or "JNI UTF-8" or any of the other near-equivalent abominations), right? 
Because IIRC, CESU-8 says that (in Python terms) '\U00010400' and '\uD801\uDC00' should both encode to b'\xED\xA0\x81\xED\xB0\x80', instead of the former encoding to b'\xF0\x90\x90\x80' and the latter not being encodable because it's not a string. Anyway, I don't know if that counts as a Unicode encoding, since it's only described in a TR, not the standard itself. And Python is probably right to ignore it (assuming I'm remembering right and Python does ignore it...), even if that makes problems for Jython or Oracle DB-API libs or whatever. From steve at pearwood.info Sat May 16 12:02:41 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 16 May 2015 20:02:41 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com> Message-ID: <20150516100240.GE5663@ando.pearwood.info> On Sat, May 16, 2015 at 02:47:02AM -0700, Andrew Barnert via Python-ideas wrote: > > The unique thing about the surrogate case is that *no* codec is > > supposed to encode them, not even the universal ones: > > Python doesn't have a CESU-8 codec (or "JNI UTF-8" or any of the other > near-equivalent abominations), right? *shrug* Even if it doesn't, it's just a codec, not new syntax. Anyone can create their own codecs. There probably are people who need CESU-8 for compatibility with other apps, and if the std lib can include UTF-8-sig, it can probably include CESU-8. Or it can be left for those who need it to implement it themselves. > Because IIRC, CESU-8 says that > (in Python terms) '\U00010400' and '\uD801\uDC00' should both encode > to b'\xED\xA0\x81\xED\xB0\x80', instead of the former encoding to > b'\xF0\x90\x90\x80' and the latter not being encodable because it's > not a string. Sounds about right as far as the first half goes: http://unicode.org/reports/tr26/ As far as the second half goes, the TR doesn't say anything about processing surrogate pairs in the source Unicode string. Since (strict) Unicode strings cannot contain surrogates, I think that CESU-8 should treat it as an error just like UTF-8. The TR does say: CESU-8 defines an encoding scheme for Unicode identical to UTF-8 except for its representation of supplementary characters. That seems pretty clear to me: if '\uDC00'.encode('utf-8') raises an error, then so should '\uDC00'.encode('cesu-8'). -- Steve From p.f.moore at gmail.com Sat May 16 12:19:26 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 16 May 2015 11:19:26 +0100 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 May 2015 at 04:56, Stephen J. Turnbull wrote: > That's not the use case envisioned for these functions, though. You > want to change the textual content of the stream (by restricting the > repertoire), not change the representation of non-textual content. Thanks. I see the difference now. (Plus Nick's point about needing to know the encoding in my use case). 
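Spelled out, the encode/decode pair I keep referring to is roughly this (and I should note the error handler is actually spelled 'backslashreplace'; 'backslashescape' earlier in the thread was a typo):

import sys

text = 'temperature: 25\u00b0, status: \u2713'   # arbitrary Unicode from somewhere
enc = sys.stdout.encoding or 'ascii'
# Round-trip through the stream's own encoding so that print() cannot fail:
safe = text.encode(enc, errors='backslashreplace').decode(enc)
print(safe)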
> > The encode/decode pair seemed ugly, although it was the only way I > > could find. > > I find the fact that there's an output stream with an inappropriate > error handler far uglier! The stream in this case was sys.stdout, which you can't blame me for, though :-) The use case in question was specifically wanting to avoid encoding errors when printing arbitrary text. (On Windows, where sys.stdout.encoding is not UTF-8). This is a pretty common issue that I see raised a lot, and it is frustrating to have to deal with it in application code. I don't know enough about the issues to make a good case that errors='strict' is the wrong error handling policy for sys.stdout, though. And you can't change the policy on an existing stream, so the application is stuck with strict unless it wants to re-wrap sys.stdout.buffer (which I'm always a little reluctant to do, as it seems like it may cause other issues, although I don't know why I think that :-)). > Note that the encode/decode pair is quite efficient, although the > "rehandle" function could be about twice as fast. Still, if you're > output-bound by the speed of a disk or the like, encode/decode will > have no trouble keeping up. Yeah, it's not a performance issue, just a mild feeling of "this looks clumsy". Paul From stephen at xemacs.org Sat May 16 15:50:49 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 May 2015 22:50:49 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com> References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com> Message-ID: <87617s1vie.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > Python doesn't have a CESU-8 codec (or "JNI UTF-8" or any of the > other near-equivalent abominations), right? Because IIRC, CESU-8 > says that (in Python terms) '\U00010400' and '\uD801\uDC00' should > both encode to b'\xED\xA0\x81\xED\xB0\x80', instead of the former > encoding to b'\xF0\x90\x90\x80' and the latter not being encodable > because it's not a string. It's ambiguous what the TR intends. It does say it encodes code points, which would argue that '\uD801\uDC00' is encodable. However, it also defines itself as a representation of UTF-16, and the definition of the encoding itself states "Prior to transforming data into CESU-8, supplementary characters must first be converted to their surrogate pair UTF-16 representation." UTF-16's normative definition defines it a Unicode transformation format, and therefore a UTF-16 stream cannot contain surrogates representing themselves, and there's nothing in the document that refers to the possible interpretation of surrogate code points as themselves. So I agree with Steven that a str-to-bytes CESU-8 encoder should error on any surrogates, and the decoder should error on surrogates not encountered as a valid surrogate pair. Possibly you'd want special error handlers that allow handling of the UTF-8 encoding of surrogates. > Anyway, I don't know if that counts as a Unicode encoding, since > it's only described in a TR, not the standard itself. The TR specifically excludes it from the standard. > And Python is probably right to ignore it (assuming I'm remembering > right and Python does ignore it...), even if that makes problems > for Jython or Oracle DB-API libs or whatever. 
Why would it cause trouble for them? They're not going to use byte-oriented functions to manipulate Unicode after going to all that trouble to implement UTF-16 handling internally. We're getting kinda far afield here, aren't we? From stephen at xemacs.org Sat May 16 16:15:10 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 May 2015 23:15:10 +0900 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874mnc1udt.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > The stream in this case was sys.stdout, which you can't blame me > for, though :-) Yeah, I think there's an issue or two on that. > I don't know enough about the issues to make a good case that > errors='strict' is the wrong error handling policy for sys.stdout, > though. No, errors='strict' is always the right default policy, especially for UTF-encoded output, but for other encodings as well. > And you can't change the policy on an existing stream, Hm. I would not want the job of rewriting the codec machinery to guarantee that users would get what they deserve from changing encodings on a stream -- I suspect that would be hard, or even impossible for a stateful encoding (eg, a 7-bit ISO-2022 encoding). But I can't really see where the harm would be in allowing changes of the error handler. (Of course that goes in the categories of "for consenting adults" and "you can keep any bullets that lodge in your foot".) I'll have to think hard about it. > so the application is stuck with strict unless it wants to re-wrap > sys.stdout.buffer (which I'm always a little reluctant to do, as it > seems like it may cause other issues, although I don't know why I > think that :-)). In your case, I don't see why it would cause a problem unless there's other output potentially incompatible with the sys.stdout encoding that *you* *do* want errors on. I can imagine there exist cases where you have something like log output where you *know* that the logger produces 30 columns of ASCII and then up to 45 columns copied from its input, and only the first 30 "really need" to be accurate and valid in the output encoding. (I don't actually have such a case to hand, though -- I've never seen a logger that randomly inserted Japanese in timestamps or something like that.) From ncoghlan at gmail.com Sat May 16 16:44:52 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 17 May 2015 00:44:52 +1000 Subject: [Python-ideas] Processing surrogates in In-Reply-To: References: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com> <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp> <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp> <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 May 2015 at 20:19, Paul Moore wrote: > On 16 May 2015 at 04:56, Stephen J. Turnbull wrote: >> That's not the use case envisioned for these functions, though. You >> want to change the textual content of the stream (by restricting the >> repertoire), not change the representation of non-textual content. > > Thanks. I see the difference now. (Plus Nick's point about needing to > know the encoding in my use case). > >> > The encode/decode pair seemed ugly, although it was the only way I >> > could find. 
>> >> I find the fact that there's an output stream with an inappropriate >> error handler far uglier! > > The stream in this case was sys.stdout, which you can't blame me for, though :-) > > The use case in question was specifically wanting to avoid encoding > errors when printing arbitrary text. (On Windows, where > sys.stdout.encoding is not UTF-8). This is a pretty common issue that > I see raised a lot, and it is frustrating to have to deal with it in > application code. I don't know enough about the issues to make a good > case that errors='strict' is the wrong error handling policy for > sys.stdout, though. And you can't change the policy on an existing > stream, so the application is stuck with strict unless it wants to > re-wrap sys.stdout.buffer (which I'm always a little reluctant to do, > as it seems like it may cause other issues, although I don't know why > I think that :-)). It has the potential to cause problems if anything still has a reference to the old stream (such as, say, sys.__stdout__, or an eagerly bound reference in a default argument value). If you call detach(), the old references will be entirely broken, if you don't then you have two different text wrappers sharing the same underlying buffered stream. Creating a completely new IO stream that only shares the operating system level file descriptor has similar data interleaving problems to the latter approach. There's an open issue to support changing the encoding and error handling of an existing stream in place, which I'd suggested deferring to 3.6 based on the fact we're switching the *nix streams to use surrogateescape if the system claims the locale encoding is ASCII: http://bugs.python.org/issue15216#msg242942 However, it the lack of that capability is causing problems on Windows as well, then it may be worth updating Nikolaus Rath's patch and applying it for 3.5 and dealing with the consequences. The main reason I've personally been wary of the change is because I expect there to be various edge cases encountered with different codecs, so I suspect that adding this feature will be setting the stage for an "interesting" collection of future bug reports. On the other hand, there's certain kinds of programs (like an iconv equivalent) that could most readily be implemented by being able to change the encoding of the standard streams based on application level configuration settings, which means having a way to override the default settings chosen by the interpreter. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From charleshixsn at earthlink.net Sun May 17 20:07:42 2015 From: charleshixsn at earthlink.net (Charles Hixson) Date: Sun, 17 May 2015 11:07:42 -0700 Subject: [Python-ideas] an unless statement would occasionally be useful Message-ID: <5558D8EE.8010105@earthlink.net> I'm envisioning "unless" as a synonym for "if not(...):" currently I use if .... : pass else: ... which works. N.B.: This isn't extremely important as there are already two ways to accomplish the same purpose, but it would be useful, seems easy to implement, and is already used by many other languages. The advantage is that when the condition is long it simplifies understanding. 
From mertz at gnosis.cx Sun May 17 21:02:00 2015 From: mertz at gnosis.cx (David Mertz) Date: Sun, 17 May 2015 12:02:00 -0700 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <5558D8EE.8010105@earthlink.net> References: <5558D8EE.8010105@earthlink.net> Message-ID: This exists and is spelled 'not' in Python :-) On May 17, 2015 11:16 AM, "Charles Hixson" wrote: > I'm envisioning "unless" as a synonym for "if not(...):" currently I use > > if .... : > pass > else: > ... > > which works. > > N.B.: This isn't extremely important as there are already two ways to > accomplish the same purpose, but it would be useful, seems easy to > implement, and is already used by many other languages. The advantage is > that when the condition is long it simplifies understanding. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From cory at lukasa.co.uk Sun May 17 21:02:34 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Sun, 17 May 2015 20:02:34 +0100 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <5558D8EE.8010105@earthlink.net> References: <5558D8EE.8010105@earthlink.net> Message-ID: > On 17 May 2015, at 19:07, Charles Hixson wrote: > > I'm envisioning "unless" as a synonym for "if not(...):" currently I use > > if .... : > pass > else: > ... That's interesting. Personally, I think I'd invert that conditional (or, if the rest of the body is long, do an early return). Playing the role of opposition for a moment, I'd argue that we don't need "unless" because we already have a spelling for that: "if not". Is it not said: "There should be one-- and preferably only one --obvious way to do it."?
From tjreedy at udel.edu Sun May 17 21:04:51 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 17 May 2015 15:04:51 -0400 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <5558D8EE.8010105@earthlink.net> References: <5558D8EE.8010105@earthlink.net> Message-ID: On 5/17/2015 2:07 PM, Charles Hixson wrote: > I'm envisioning "unless" as a synonym for "if not(...):" currently I use > > if .... : > pass > else: > ... > > which works. > > N.B.: This isn't extremely important as there are already two ways to > accomplish the same purpose, but it would be useful, seems easy to > implement, and is already used by many other languages. The advantage > is that when the condition is long it simplifies understanding. We try not to bloat Python with minor synonyms. They make it harder to learn and remember the language and choose which synonym to use. -- Terry Jan Reedy
From breamoreboy at yahoo.co.uk Sun May 17 22:28:01 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 17 May 2015 21:28:01 +0100 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <5558D8EE.8010105@earthlink.net> References: <5558D8EE.8010105@earthlink.net> Message-ID: On 17/05/2015 19:07, Charles Hixson wrote: > I'm envisioning "unless" as a synonym for "if not(...):" currently I use > > if .... : > pass > else: > ... > > which works. > > N.B.: This isn't extremely important as there are already two ways to > accomplish the same purpose, but it would be useful, seems easy to > implement, and is already used by many other languages.
The advantage > is that when the condition is long it simplifies understanding. IMHO if a statement is only "occasionally useful" is should not be in the Python language. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From larocca at abiresearch.com Sun May 17 22:31:29 2015 From: larocca at abiresearch.com (Douglas La Rocca) Date: Sun, 17 May 2015 20:31:29 +0000 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: References: <5558D8EE.8010105@earthlink.net>, Message-ID: <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com> it could also be confused as a synonym for while not condition: ... > On May 17, 2015, at 4:28 PM, Mark Lawrence wrote: > >> On 17/05/2015 19:07, Charles Hixson wrote: >> I'm envisioning "unless" as a synonym for "if not(...):" currently I use >> >> if .... : >> pass >> else: >> ... >> >> which works. >> >> N.B.: This isn't extremely important as there are already two ways to >> accomplish the same purpose, but it would be useful, seems easy to >> implement, and is already used by many other languages. The advantage >> is that when the condition is long it simplifies understanding. > > IMHO if a statement is only "occasionally useful" is should not be in the Python language. > > -- > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. > > Mark Lawrence > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rymg19 at gmail.com Sun May 17 22:57:11 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 17 May 2015 15:57:11 -0500 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com> References: <5558D8EE.8010105@earthlink.net>, <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com> Message-ID: <174563E1-4BA9-47C7-9B47-6FD7B3070E5C@gmail.com> Has anyone done that????? I mean, I like Python without `unless`, but I've never seen it used to mean that. Usually, `until` is used. On May 17, 2015 3:31:29 PM CDT, Douglas La Rocca wrote: >it could also be confused as a synonym for > > while not condition: > ... > > > > >> On May 17, 2015, at 4:28 PM, Mark Lawrence >wrote: >> >>> On 17/05/2015 19:07, Charles Hixson wrote: >>> I'm envisioning "unless" as a synonym for "if not(...):" currently >I use >>> >>> if .... : >>> pass >>> else: >>> ... >>> >>> which works. >>> >>> N.B.: This isn't extremely important as there are already two ways >to >>> accomplish the same purpose, but it would be useful, seems easy to >>> implement, and is already used by many other languages. The >advantage >>> is that when the condition is long it simplifies understanding. >> >> IMHO if a statement is only "occasionally useful" is should not be in >the Python language. >> >> -- >> My fellow Pythonistas, ask not what our language can do for you, ask >> what you can do for our language. 
>> >> Mark Lawrence >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun May 17 23:07:02 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 17 May 2015 22:07:02 +0100 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com> References: <5558D8EE.8010105@earthlink.net>, <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com> Message-ID: <555902F6.60300@mrabarnett.plus.com> On 2015-05-17 21:31, Douglas La Rocca wrote: > it could also be confused as a synonym for > > while not condition: > ... > No, that would be: until condition: ... which we don't want either. :-) >> On May 17, 2015, at 4:28 PM, Mark Lawrence wrote: >> >>> On 17/05/2015 19:07, Charles Hixson wrote: >>> I'm envisioning "unless" as a synonym for "if not(...):" currently I use >>> >>> if .... : >>> pass >>> else: >>> ... >>> >>> which works. >>> >>> N.B.: This isn't extremely important as there are already two ways to >>> accomplish the same purpose, but it would be useful, seems easy to >>> implement, and is already used by many other languages. The advantage >>> is that when the condition is long it simplifies understanding. >> >> IMHO if a statement is only "occasionally useful" is should not be in the Python language. >> From abarnert at yahoo.com Sun May 17 23:35:01 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 17 May 2015 14:35:01 -0700 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <5558D8EE.8010105@earthlink.net> References: <5558D8EE.8010105@earthlink.net> Message-ID: <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com> On May 17, 2015, at 11:07, Charles Hixson wrote: > > I'm envisioning "unless" as a synonym for "if not(...):" currently I use > > if .... : > pass > else: > ... > > which works. > > N.B.: This isn't extremely important as there are already two ways to accomplish the same purpose, but it would be useful, seems easy to implement, and is already used by many other languages. The advantage is that when the condition is long it simplifies understanding. But if you just use not instead of else, it simplifies understanding just as much--and without making the language larger (which makes it harder to learn/remember when switching languages, makes the parser bigger, etc.): if not ...: ... It seems like every year someone proposes either "unless" or "until" or the whole suite of Perl variants (inherently-negated keywords, postfix, do...while-type syntax), but nobody ever asks for anything clever. Think of what you could do with a "lest" statement, which will speculatively execute the body and then test the condition before deciding whether to actually have executed the body. Or a "without" that closes a context before the body instead of after. Or a "butfor" that iterates over every extant object that isn't contained in the Iterable. 
Or a "because" that raises instead of skipping the body if the condition isn't truthy. Or a "before" that remembers the body for later and executes it a synchronously when the condition becomes true. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From stephen at xemacs.org Mon May 18 04:53:21 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 18 May 2015 11:53:21 +0900 Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com> References: <5558D8EE.8010105@earthlink.net> <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com> Message-ID: <87r3qezjdq.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > Or a "before" that remembers the body for later and executes it a > synchronously when the condition becomes true. That's not Perl, that's Make. From ram at rachum.com Mon May 18 10:14:31 2015 From: ram at rachum.com (Ram Rachum) Date: Mon, 18 May 2015 11:14:31 +0300 Subject: [Python-ideas] Making it easy to prepare for PEP479 Message-ID: Hi everybody, I just heard about PEP479, and I want to prepare my open-source projects for it. I have no problem changing the code so it won't depend on StopIteration to stop generators, but I'd also like to test it in my test suite. In Python 3.5 I could use `from __future__ import generator_stop` so the test would be real (i.e. would fail wherever I rely on StopIteration to stop a generator). But I can't really put this snippet in my code because then it would fail on all Python versions below 3.5. This makes me think of two ideas: 1. Maybe we should allow `from __future__ import whatever` in code, even if `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't help now but it could prevent these problems in the future. 2. Maybe introduce a way to do `from __future__ import generator_stop` without including it in code? Maybe a flag to the `python` command? (If something like this exists please let me know.) Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon May 18 10:38:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 18 May 2015 18:38:32 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: Message-ID: On Mon, May 18, 2015 at 6:14 PM, Ram Rachum wrote: > Hi everybody, > > I just heard about PEP479, and I want to prepare my open-source projects for > it. > > I have no problem changing the code so it won't depend on StopIteration to > stop generators, but I'd also like to test it in my test suite. In Python > 3.5 I could use `from __future__ import generator_stop` so the test would be > real (i.e. would fail wherever I rely on StopIteration to stop a generator). > But I can't really put this snippet in my code because then it would fail on > all Python versions below 3.5. > > This makes me think of two ideas: > > 1. Maybe we should allow `from __future__ import whatever` in code, even if > `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't > help now but it could prevent these problems in the future. Downside: A typo would silently stop a future directive from working. 
If "from __future__ import generator_stop" doesn't cause an error in <3.5, then "from __future__ import genarator_stop" would cause no error in any version, and that's a problem. > 2. Maybe introduce a way to do `from __future__ import generator_stop` > without including it in code? Maybe a flag to the `python` command? (If > something like this exists please let me know.) The problem is that it's hard to try-except special directives like this. You can try-except a regular import, catch the run-time error, and do something else; but short of exec'ing your code, you can't catch SyntaxError. I'm not sure how best to deal with this. However, it ought to be possible to simply run your tests with generator_stop active, even if that means using exec instead of regular imports. Something like this: # utils.py # In the presence of generator_stop, this will bomb def f(): raise StopIteration def g(): yield f() # test_utils.py # Instead of: # import utils # Try this: with open("utils.py") as f: code = "from __future__ import generator_stop\n" + f.read() import sys # Any module at all utils = type(sys)("utils") exec(code,vars(utils)) # At this point, you can write regular tests involving the # 'utils' module, which has been executed in the presence # of the generator_stop directive. list(utils.g()) It's ugly, and it depends on the module being in the current directory (though you could probably use importlib to deal with that part), but it's just for your tests. I don't know of any way to simplify this out, but it may well be possible (using some mechanism similar to what the interactive interpreter does); in any case, all the ugliness should be in a single block up the top of your test runner - and you could turn it into a function, as you'll probably want to do this for lots of modules. Experts of python-ideas, is there a way to use an import hook to do this? ChrisA From steve at pearwood.info Mon May 18 14:00:55 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 18 May 2015 22:00:55 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: Message-ID: <20150518120054.GK5663@ando.pearwood.info> On Mon, May 18, 2015 at 11:14:31AM +0300, Ram Rachum wrote: > Hi everybody, > > I just heard about PEP479, and I want to prepare my open-source projects > for it. > > I have no problem changing the code so it won't depend on StopIteration to > stop generators, but I'd also like to test it in my test suite. In Python > 3.5 I could use `from __future__ import generator_stop` so the test would > be real (i.e. would fail wherever I rely on StopIteration to stop a > generator). But I can't really put this snippet in my code because then it > would fail on all Python versions below 3.5. Sometimes you have to do things the old fashioned way: if sys.version_info[:2] < (3, 5): # write test one way else: # write test another way At least it's not a change of syntax :-) You can also move tests into a separate file that is version specific. That's a bit of a nuisance with small projects where you would a single test file, but for larger projects there's nothing wrong with splitting tests across multiple files. > This makes me think of two ideas: > > 1. Maybe we should allow `from __future__ import whatever` in code, even if > `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't > help now but it could prevent these problems in the future. from __future__ import spelling_mistaek # code that depends on spelling_mistake feature will now behave weirdly > 2. 
Maybe introduce a way to do `from __future__ import generator_stop` > without including it in code? Maybe a flag to the `python` command? (If > something like this exists please let me know.) I don't think that is important enough to require either an environment variable or a command line switch. -- Steve From greg.ewing at canterbury.ac.nz Mon May 18 14:07:55 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 19 May 2015 00:07:55 +1200 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: Message-ID: <5559D61B.3060302@canterbury.ac.nz> Chris Angelico wrote: > However, it ought to be possible to simply run your tests with > generator_stop active, even if that means using exec instead of > regular imports. Would it be possible for site.py to monkey-patch something into the __future__ module, to make importing it a no-op? -- Greg From rosuav at gmail.com Mon May 18 14:51:02 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 18 May 2015 22:51:02 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <5559D61B.3060302@canterbury.ac.nz> References: <5559D61B.3060302@canterbury.ac.nz> Message-ID: On Mon, May 18, 2015 at 10:07 PM, Greg Ewing wrote: > Chris Angelico wrote: >> >> However, it ought to be possible to simply run your tests with >> generator_stop active, even if that means using exec instead of >> regular imports. > > > Would it be possible for site.py to monkey-patch > something into the __future__ module, to make > importing it a no-op? I doubt it; __future__ imports are special compiler magic. >>> import __future__ >>> __future__.all_feature_names.append("asdf") >>> __future__.asdf = __future__.with_statement >>> from __future__ import asdf File "", line 1 SyntaxError: future feature asdf is not defined ChrisA From python at mrabarnett.plus.com Mon May 18 15:16:20 2015 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 18 May 2015 14:16:20 +0100 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150518120054.GK5663@ando.pearwood.info> References: <20150518120054.GK5663@ando.pearwood.info> Message-ID: <5559E624.7030708@mrabarnett.plus.com> On 2015-05-18 13:00, Steven D'Aprano wrote: > On Mon, May 18, 2015 at 11:14:31AM +0300, Ram Rachum wrote: >> Hi everybody, >> >> I just heard about PEP479, and I want to prepare my open-source projects >> for it. >> >> I have no problem changing the code so it won't depend on StopIteration to >> stop generators, but I'd also like to test it in my test suite. In Python >> 3.5 I could use `from __future__ import generator_stop` so the test would >> be real (i.e. would fail wherever I rely on StopIteration to stop a >> generator). But I can't really put this snippet in my code because then it >> would fail on all Python versions below 3.5. > > Sometimes you have to do things the old fashioned way: > > if sys.version_info[:2] < (3, 5): > # write test one way > else: > # write test another way > > At least it's not a change of syntax :-) > > You can also move tests into a separate file that is version specific. > That's a bit of a nuisance with small projects where you would a single > test file, but for larger projects there's nothing wrong with splitting > tests across multiple files. > > >> This makes me think of two ideas: >> >> 1. Maybe we should allow `from __future__ import whatever` in code, even if >> `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't >> help now but it could prevent these problems in the future. 
> > from __future__ import spelling_mistaek > # code that depends on spelling_mistake feature will now behave weirdly > Suppose I used: from __future__ import unicode_literals in Python 2.5 and it didn't complain. I'd then be puzzled why my plain string literals weren't Unicode. > >> 2. Maybe introduce a way to do `from __future__ import generator_stop` >> without including it in code? Maybe a flag to the `python` command? (If >> something like this exists please let me know.) > > I don't think that is important enough to require either an environment > variable or a command line switch. > From jsbueno at python.org.br Mon May 18 15:32:35 2015 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 18 May 2015 10:32:35 -0300 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <5559E624.7030708@mrabarnett.plus.com> References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> Message-ID: Indeed - importing as NOP would surely be broken - The nice fix would be to be able to do from __future__ import jaberwock and have a plain "ImportError" that could be catched. But, as Chris Angelico put it, it might be complicated. Manually testing sys.version seens to be the way to go Because, even if making __future__ imports raise ImportError, taht would also only be available from Py 3.5/3.6 onwards. (Otherwise from __future__ import from__future__import_ImportError seens fun enough to actually be created) On 18 May 2015 at 10:16, MRAB wrote: > On 2015-05-18 13:00, Steven D'Aprano wrote: >> >> On Mon, May 18, 2015 at 11:14:31AM +0300, Ram Rachum wrote: >>> >>> Hi everybody, >>> >>> I just heard about PEP479, and I want to prepare my open-source projects >>> for it. >>> >>> I have no problem changing the code so it won't depend on StopIteration >>> to >>> stop generators, but I'd also like to test it in my test suite. In Python >>> 3.5 I could use `from __future__ import generator_stop` so the test would >>> be real (i.e. would fail wherever I rely on StopIteration to stop a >>> generator). But I can't really put this snippet in my code because then >>> it >>> would fail on all Python versions below 3.5. >> >> >> Sometimes you have to do things the old fashioned way: >> >> if sys.version_info[:2] < (3, 5): >> # write test one way >> else: >> # write test another way >> >> At least it's not a change of syntax :-) >> >> You can also move tests into a separate file that is version specific. >> That's a bit of a nuisance with small projects where you would a single >> test file, but for larger projects there's nothing wrong with splitting >> tests across multiple files. >> >> >>> This makes me think of two ideas: >>> >>> 1. Maybe we should allow `from __future__ import whatever` in code, even >>> if >>> `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't >>> help now but it could prevent these problems in the future. >> >> >> from __future__ import spelling_mistaek >> # code that depends on spelling_mistake feature will now behave weirdly >> > Suppose I used: > > from __future__ import unicode_literals > > in Python 2.5 and it didn't complain. > > I'd then be puzzled why my plain string literals weren't Unicode. > >> >>> 2. Maybe introduce a way to do `from __future__ import generator_stop` >>> without including it in code? Maybe a flag to the `python` command? (If >>> something like this exists please let me know.) 
>> >> >> I don't think that is important enough to require either an environment >> variable or a command line switch. >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Mon May 18 16:13:21 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 19 May 2015 00:13:21 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> Message-ID: On Mon, May 18, 2015 at 11:32 PM, Joao S. O. Bueno wrote: > Indeed - importing as NOP would surely be broken - > > The nice fix would be to be able to do > from __future__ import jaberwock > > and have a plain "ImportError" that could be catched. Indeed. Though I'm not sure what a correctly-spelled "from __future__ import jabberwock" would do; exceptions already "burble" up the call stack until they meet "the clause that catch[es]" them. :) > But, as Chris Angelico put it, it might be complicated. > Manually testing sys.version seens to be the way to go > Because, even if making __future__ imports raise > ImportError, taht would also only be available from > Py 3.5/3.6 onwards. > > (Otherwise > from __future__ import from__future__import_ImportError > seens fun enough to actually be created) Heh. Though there's no particular reason to guard this with a future directive; if the behaviour were to be changed, it could just be done immediately - you wouldn't need a couple of minor versions' notice that something's going to stop raising errors. The way to make this work would be two-fold. Firstly, an incorrect __future__ directive would have to no longer be a SyntaxError; and secondly, __future__ directives would have to be permitted after a try statement (currently, they're not allowed to follow anything, so the 'try' would have to be special-cased to be allowed in). With those two changes, though, the failing of a __future__ directive would now become a failure at the (usually-ignored) run-time import - the regular action of "from module import name" would fail when it tries to import something that isn't present in the module. As a side effect, some specific directives would become legal no-ops: from __future__ import CO_FUTURE_PRINT_FUNCTION from __future__ import __builtins__ # etc I don't see this as a problem, given that the point of the SyntaxError is to catch either outright spelling errors or version issues (eg trying to use "from __future__ import print_function" in Python 2.5), both of which will still raise ImportError. The question is, how often is it actually useful to import a module and ignore a __future__ directive? Going through all_feature_names: nested_scopes: No idea; I think code is legal with or without it. generators: Using "yield" as a keyword will fail division: Yes, this one would work absolute_import: This would work with_statement: Any actual use of 'with' will bomb out print_function: Might work if you restrict yourself unicode_literals: Possibly would work, but ow, big confusion barry_as_FLUFL: No idea, give it a try! generator_stop: Yes, would work. 
So three of them would definitely work (in the sense that code is syntactically correct in both forms), and you could cope in some way with an except block; print_function would work as long as you build your code with that in mind (but if you're doing that anyway, just drop the future directive); and unicode_literals *might* work, maybe. The rest? If you're using the future directive, it's because you want the new keyword, which means you're going to be using it. If the future directive isn't recognized, you're getting syntax errors elsewhere, so there's no opportunity to try/except the problem away. What will the future of Python future directives be like? Most likely a similarly mixed bag, so this is a feature that could potentially have very little value. Is it worth downgrading an instant SyntaxError to a run-time ImportError to allow a narrow use-case? ChrisA From tjreedy at udel.edu Mon May 18 16:17:06 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 18 May 2015 10:17:06 -0400 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: Message-ID: On 5/18/2015 4:14 AM, Ram Rachum wrote: > I just heard about PEP479, and I want to prepare my open-source projects > for it. > > I have no problem changing the code so it won't depend on StopIteration > to stop generators, but I'd also like to test it in my test suite. In > Python 3.5 I could use `from __future__ import generator_stop` so the > test would be real (i.e. would fail wherever I rely on StopIteration to > stop a generator). But I can't really put this snippet in my code > because then it would fail on all Python versions below 3.5. The purpose of future imports is to allow one to use a future feature, at the cost of either not supporting older Python versions, or of branching your code and making separate releases. This future is an anomaly in that it import a future disablement of a current feature. So you just want to make sure your one, no-branch code base is ready for that feature removal by not using it now. You do not want to have a separate branch and release for 3.5 with the future imports. Try the following: add the future statement to the top of modules with generators, compile with 3.5, and when successful, comment-out the statement. For continued testing, especially with multiple authors, write functions to un-comment and re-comment a file. In the test file: if <3.5>: uncomment('xyz') # triggers re-compile on import import xyz if <3.5>: recomment('xyz') # ditto, If this works, put pep479_helper on pypi. -- Terry Jan Reedy From steve at pearwood.info Mon May 18 16:52:25 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 00:52:25 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: Message-ID: <20150518145225.GM5663@ando.pearwood.info> On Mon, May 18, 2015 at 10:17:06AM -0400, Terry Reedy wrote: > Try the following: add the future statement to the top of modules with > generators, compile with 3.5, and when successful, comment-out the > statement. For continued testing, especially with multiple authors, > write functions to un-comment and re-comment a file. In the test file: > > if <3.5>: uncomment('xyz') # triggers re-compile on import > import xyz > if <3.5>: recomment('xyz') # ditto, > > If this works, put pep479_helper on pypi. o_O I'm not entirely sure what you are trying to do, but I *think* what you are trying is to have the byte code in the .pyc file be different from what the source code in the .py file says. 
Fortunately Python does not make that easy to do. You would have to change the datestamp on the files so that the .pyc file appears newer than the source code. I once worked on a system where it was easy to get the source and byte code out of sync. The original programmer was a frustrated C developer, and so he had built this intricate system where you edited the source code in one place, then ran the Unix "make" utility which compiled it, and moved the byte code to a completely different place on the PYTHONPATH. Oh, and you couldn't just run the Python modules as scripts, you had to run bash wrapper scripts which set up a bunch of environment variables. And of course there was no documentation other than rants about how stupid Python was but at least it was better than Perl. Believe me, debugging code where the byte code being imported is different from the source code you are reading is fun, if your idea of fun is horrible pain. -- Steve From tjreedy at udel.edu Mon May 18 17:24:06 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 18 May 2015 11:24:06 -0400 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150518145225.GM5663@ando.pearwood.info> References: <20150518145225.GM5663@ando.pearwood.info> Message-ID: On 5/18/2015 10:52 AM, Steven D'Aprano wrote: > On Mon, May 18, 2015 at 10:17:06AM -0400, Terry Reedy wrote: > >> Try the following: add the future statement to the top of modules with >> generators, compile with 3.5, and when successful, comment-out the >> statement. For continued testing, especially with multiple authors, >> write functions to un-comment and re-comment a file. In the test file: >> >> if <3.5>: uncomment('xyz') # triggers re-compile on import >> import xyz >> if <3.5>: recomment('xyz') # ditto, >> >> If this works, put pep479_helper on pypi. > I'm not entirely sure what you are trying to do, Solve the OP's problem. What are *you* trying to do? If you do not think that the offered solution will work, please explain why, instead of diverting attention to some insane projection of yours. > but I *think* what you > are trying is to have the byte code in the .pyc file be different from > what the source code in the .py file says. Do you really think I meant the opposite of what I said? Standard Python behavior that you are completely familiar with: edit x.py and import it; Python assumes that x.pyc is obsolete, recompiles x.py and rewrites x.pyc. -- Terry Jan Reedy From steve at pearwood.info Mon May 18 18:18:27 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 02:18:27 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> Message-ID: <20150518161827.GN5663@ando.pearwood.info> On Tue, May 19, 2015 at 12:13:21AM +1000, Chris Angelico wrote: > On Mon, May 18, 2015 at 11:32 PM, Joao S. O. Bueno > wrote: > > Indeed - importing as NOP would surely be broken - > > > > The nice fix would be to be able to do > > from __future__ import jaberwock > > > > and have a plain "ImportError" that could be catched. > > Indeed. Though I'm not sure what a correctly-spelled "from __future__ > import jabberwock" would do; exceptions already "burble" up the call > stack until they meet "the clause that catch[es]" them. :) You cannot catch errors in "from __future__ import" lines, because they are compile-time errors, not runtime errors. Any __future__ lines must be the first lines of executable code in the module. 
Only comments, blank lines, the module docstring, and other __future__
lines can precede them, so this cannot work:

try:
    from __future__ import feature
except:
    pass

for the same reason that this cannot work:

try:
    thing = }{
except SyntaxError:
    thing = {}

It is best to think of the __future__ imports as directives to the
compiler. They tell the compiler to produce different code, change
syntax, or similar. Except in the interactive interpreter, you cannot
change the compiler settings part way through compiling the module.

There is a real __future__ module, but it exists only for introspection
purposes.

[...]

> The way to make this work would be two-fold. Firstly, an incorrect
> __future__ directive would have to no longer be a SyntaxError; and
> secondly, __future__ directives would have to be permitted after a try
> statement (currently, they're not allowed to follow anything, so the
> 'try' would have to be special-cased to be allowed in).

It's not enough to merely change the wording of the error from
SyntaxError to something else. You have to change when it occurs: it can
no longer be raised at compile time, but has to happen at run time. That
means that __future__ imports have to compile to something which
runs at run time, instead of just being a directive to the compiler.

As for the changes necessary to the compiler, I have no idea how
extensive they would be, but my guess is "extremely".

Also, consider that once you are allowing __future__ directives to occur
after a try statement, expect there to be a lot more pressure to allow
it after any arbitrary code. After all, I might want to write:

if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']:
    from __future__ import jabberwocky

so you're opening the doors to a LOT more complexity.

Which, as far as I am concerned, is a good thing, because it makes the
chances of this actually happening to be somewhere between Buckley's and
none *wink*

-- 
Steve

From rosuav at gmail.com  Mon May 18 18:45:19 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 May 2015 02:45:19 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518161827.GN5663@ando.pearwood.info>
References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> <20150518161827.GN5663@ando.pearwood.info>
Message-ID: 

On Tue, May 19, 2015 at 2:18 AM, Steven D'Aprano wrote:
>> The way to make this work would be two-fold. Firstly, an incorrect
>> __future__ directive would have to no longer be a SyntaxError; and
>> secondly, __future__ directives would have to be permitted after a try
>> statement (currently, they're not allowed to follow anything, so the
>> 'try' would have to be special-cased to be allowed in).
>
> It's not enough to merely change the wording of the error from
> SyntaxError to something else. You have to change when it occurs: it can
> no longer be raised at compile time, but has to happen at run time. That
> means that __future__ imports have to compile to something which
> runs at run time, instead of just being a directive to the compiler.

Precisely. I'm not saying that the incorrect future directive would be
some other sort of error instead of SyntaxError - for this to be
possible, it would have to *not be an error at all* at compile time,
leaving the faulty directive unannounced until it gets to the second
step (actual run-time importing of the __future__ module) to catch
errors.
(Hence the side effect that "from __future__ import all_feature_names" would actually not be an error; to the compiler, it's an unknown future directive and thus ignored, and to the run-time, it's a perfectly valid way to grab the list of features.) > As for the changes necessary to the compiler, I have no idea how > extensive they would be, but my guess is "extremely". Actually, not much. Since it's just the nerfing of one error, it can be done fairly easily - as proof of concept, I just commented out lines 50 through 54 of future.c (the "else" block that raises an error) and compiled: rosuav at sikorsky:~/cpython$ cat futuredemo.py from __future__ import generator_stop from __future__ import all_feature_names from __future__ import oops rosuav at sikorsky:~/cpython$ ./python futuredemo.py Traceback (most recent call last): File "futuredemo.py", line 3, in from __future__ import oops ImportError: cannot import name 'oops' > Also, consider that once you are allowing __future__ directives to occur > after a try statement, expect there to be a lot more pressure to allow > it after any arbitrary code. After all, I might want to write: > > if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']: > from __future__ import jabberwocky > > so you're opening the doors to a LOT more complexity. Yes, now that is a much bigger concern. I did say that the "try:" part of a try block would have to be deemed not-code, as a special case. Simply nerfing that error (in compile.c and future.c) does make for a viable proof-of-concept, though, so it's still nothing that requires extensive changes to the compiler. However... > Which, as far as I am concerned, is a good thing, because it makes the > chances of this actually happening to be somewhere between Buckley's and > none *wink* ... this I agree with. I don't think the feature is all that useful, and while it might well not be all that hard to implement, it would complicate things somewhat, and that's not good. (It also may end up being quite hard, and more so depending on the complexity of the definition of what's allowed prior to a __future__ directive.) I can imagine, for instance, a special case given to this precise structure: try: from __future__ import feature except: pass which would then be an "optional future import"; but again, how often is it even useful, much less necessary? ChrisA From steve at pearwood.info Mon May 18 19:13:20 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 03:13:20 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518145225.GM5663@ando.pearwood.info> Message-ID: <20150518171319.GO5663@ando.pearwood.info> On Mon, May 18, 2015 at 11:24:06AM -0400, Terry Reedy wrote: > On 5/18/2015 10:52 AM, Steven D'Aprano wrote: > >On Mon, May 18, 2015 at 10:17:06AM -0400, Terry Reedy wrote: > > > >>Try the following: add the future statement to the top of modules with > >>generators, compile with 3.5, and when successful, comment-out the > >>statement. For continued testing, especially with multiple authors, > >>write functions to un-comment and re-comment a file. In the test file: > >> > >>if <3.5>: uncomment('xyz') # triggers re-compile on import > >>import xyz > >>if <3.5>: recomment('xyz') # ditto, > >> > >>If this works, put pep479_helper on pypi. > > >I'm not entirely sure what you are trying to do, > > Solve the OP's problem. What are *you* trying to do? 
If you do not > think that the offered solution will work, please explain why, instead > of diverting attention to some insane projection of yours. You call my comments an "insane projection", but it's your code snippet which does exactly what I warned against: first you modify the source code, compile and import using the new, modified source, then change the source back to the way it was before the import so that what's inside the byte code no longer matches what's in the source. Here it is again: In the test file: if <3.5>: uncomment('xyz') # triggers re-compile on import import xyz if <3.5>: recomment('xyz') # ditto, In other words: edit source, compile, revert source, use compiled version. See the problem now? You might say, "But it's only a single line that is different." I say, *any* difference is too much. I've been burnt too badly by people using "clever hacks" that lead to the .pyc file being imported and the .py source being out of sync to trust even a single line difference. If I interpret your words as you wrote them, the solution seems to risk becoming as convoluted and messy as the code I had to work with in real life. If I try to interpret your words more sensibly ("surely Terry cannot possibly mean what he said...?") the suggestion is *still* convoluted. If Ram is permitted multiple test files, then the simplest solution is to split off the code that relies on the future directive into its own file: try: import xyz # requires the future directive except ImportError: xyz = None if xyz: # tests with directive else: # tests without Instead of going through your process of editing the source code, compiling, importing, re-editing, just have two source files. -- Steve From rosuav at gmail.com Mon May 18 19:36:12 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 19 May 2015 03:36:12 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150518171319.GO5663@ando.pearwood.info> References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> Message-ID: On Tue, May 19, 2015 at 3:13 AM, Steven D'Aprano wrote: > If I interpret your words as you wrote them, the solution seems to risk > becoming as convoluted and messy as the code I had to work with in real > life. If I try to interpret your words more sensibly ("surely Terry > cannot possibly mean what he said...?") the suggestion is *still* > convoluted. If Ram is permitted multiple test files, then the simplest > solution is to split off the code that relies on the future directive > into its own file: > > try: > import xyz # requires the future directive > except ImportError: > xyz = None > > if xyz: > # tests with directive > else: > # tests without > > > Instead of going through your process of editing the source code, > compiling, importing, re-editing, just have two source files. My understanding of the OP's problem is this: # Utility file def some_generator(): yield stuff # Tests file import utils assert next(some_generator()) == stuff Now, PEP 479 says that his code should never raise StopIteration in a generator, or in anything called by a generator. He has no problem with this, philosophically, but to prove that the change has indeed happened, it would be good to run the test suite with generator_stop active - equivalent to running Python 3.7 on the test suite. 
However, simply adding a future directive to the tests file will have no effect (obviously), and adding a future directive to the utility module itself will break it on Python <3.5, even though it would work just fine. So the options are: 1) Omit the directive, and trust that it's all working - no benefit from PEP 479 until Python 3.7. 2) Include the directive, and require Python 3.5+ for no reason other than this check. 3) Hack something so that the tests are run with the directive active, but normal running doesn't use it. 4) Hack something so Python 3.5 and 3.6 use the directive, and others don't. The first two are easy, but have nasty consequences. The third is what I provided a hack to accomplish (exec the code with a line prepended), and which Terry suggested the "adorn, import, unadorn" scheme, which probably also counts as a hack. The fourth is the notion of try/except around future directives, which I think won't fly. Terry's proposal doesn't actually require that the .pyc bytecode file differ from the source code; it will simply mean that the in-memory-being-executed bytecode will differ from the source. In the case of future directives like unicode_literals, yes, that would be a nightmare to debug; but for generator_stop, I doubt it'll cause problems. The trouble here is that it's not so much "some code needs the future directive, some doesn't" as "some use-cases want strict checking, but we still want compatibility". Ideally, it should be possible to prove that your test suite now passes in a post-PEP-479 world, without breaking anything on 3.4 or 2.7. The only question is, how much hackery are we prepared to accept in order to do this? Maybe the simplest hackery of all is just to build a tweaked Python that just always uses generator_stop. It's not hard to do - either hard-code the bitflag into the default value for ff_features (future.c:135), or remove part of the condition that actually does the work (genobject.c:137) - and then you have a 3.7-like Python that assumes generator_stop semantics. Run your tests with that, and don't use the future directive at all. ChrisA From abarnert at yahoo.com Mon May 18 20:45:30 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 18 May 2015 11:45:30 -0700 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150518171319.GO5663@ando.pearwood.info> References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> Message-ID: <0D0794EF-60D4-4103-BA13-434F0A1F65D5@yahoo.com> On May 18, 2015, at 10:13, Steven D'Aprano wrote: > > If Ram is permitted multiple test files, then the simplest > solution is to split off the code that relies on the future directive > into its own file: > > try: > import xyz # requires the future directive > except ImportError: > xyz = None > > if xyz: > # tests with directive > else: > # tests without Would this break unittest's automated discovery, setuptools' automatic test command, etc., unless you moved xyz out of the tests directory and added a sys.path.import before trying to import it? If so, that might be something worth explaining in a howto or a section of the packaging developer guide. 
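To make the split-file approach discussed above concrete, here is a minimal
sketch. The file names, the some_generator example and the skip message are
invented for illustration; note also that on interpreters older than 3.5 the
unknown future feature surfaces as a SyntaxError from the import, not an
ImportError:

# gen_utils_479.py -- a copy of the real module with one extra first line;
# it only compiles on Python 3.5+, where generator_stop is a known feature.
from __future__ import generator_stop

def some_generator():
    yield "stuff"


# test_strict_generators.py
import unittest

try:
    import gen_utils_479 as strict_utils
except SyntaxError:
    # Pre-3.5: "future feature generator_stop is not defined"
    strict_utils = None

@unittest.skipIf(strict_utils is None,
                 "generator_stop future import needs Python 3.5+")
class StrictGeneratorTests(unittest.TestCase):
    def test_exhausts_cleanly(self):
        # Under generator_stop, a StopIteration escaping the generator body
        # becomes RuntimeError, so this test fails loudly instead of passing
        # silently if the generator still relies on the old behaviour.
        self.assertEqual(list(strict_utils.some_generator()), ["stuff"])

if __name__ == "__main__":
    unittest.main()

Running "python -m unittest test_strict_generators" then behaves the same on
every supported version, with the strict tests simply skipped wherever the
future import is unavailable.
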
From greg.ewing at canterbury.ac.nz Tue May 19 01:06:03 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 19 May 2015 11:06:03 +1200 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150518161827.GN5663@ando.pearwood.info> References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> <20150518161827.GN5663@ando.pearwood.info> Message-ID: <555A705B.4080100@canterbury.ac.nz> Steven D'Aprano wrote: > After all, I might want to write: > > if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']: > from __future__ import jabberwocky You might want to, but I would have no qualms about firmly telling you that you can't. Putting try: in front of a future import still doesn't introduce any executable code before it, whereas the above does. -- Greg From rosuav at gmail.com Tue May 19 02:22:07 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 19 May 2015 10:22:07 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <555A705B.4080100@canterbury.ac.nz> References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> <20150518161827.GN5663@ando.pearwood.info> <555A705B.4080100@canterbury.ac.nz> Message-ID: On Tue, May 19, 2015 at 9:06 AM, Greg Ewing wrote: > Steven D'Aprano wrote: >> >> After all, I might want to write: >> >> if sys.version != '3.7' and >> read_config('config.ini')['allow_jabberwocky']: >> from __future__ import jabberwocky > > > You might want to, but I would have no qualms about > firmly telling you that you can't. Putting try: > in front of a future import still doesn't introduce > any executable code before it, whereas the above does. Yes, but imagine what happens if you want to have _two_ future imports guarded by try/except. Either something gets completely special-cased ("try: from __future__ import foo except: pass", and no other except/finally permitted), or you're allowed a maximum of one guarded future import (though it might have more than one keyword in it), or there's arbitrary code permitted in the "except" clause prior to a future import, which would be a major problem. ChrisA From python at mrabarnett.plus.com Tue May 19 02:32:33 2015 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 19 May 2015 01:32:33 +0100 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> <20150518161827.GN5663@ando.pearwood.info> <555A705B.4080100@canterbury.ac.nz> Message-ID: <555A84A1.60608@mrabarnett.plus.com> On 2015-05-19 01:22, Chris Angelico wrote: > On Tue, May 19, 2015 at 9:06 AM, Greg Ewing wrote: >> Steven D'Aprano wrote: >>> >>> After all, I might want to write: >>> >>> if sys.version != '3.7' and >>> read_config('config.ini')['allow_jabberwocky']: >>> from __future__ import jabberwocky >> >> >> You might want to, but I would have no qualms about >> firmly telling you that you can't. Putting try: >> in front of a future import still doesn't introduce >> any executable code before it, whereas the above does. > > Yes, but imagine what happens if you want to have _two_ future imports > guarded by try/except. 
Either something gets completely special-cased > ("try: from __future__ import foo except: pass", and no other > except/finally permitted), or you're allowed a maximum of one guarded > future import (though it might have more than one keyword in it), or > there's arbitrary code permitted in the "except" clause prior to a > future import, which would be a major problem. > I think that part of the problem is that it looks like an import statement, but it's really a compiler directive in disguise... From greg.ewing at canterbury.ac.nz Tue May 19 00:48:41 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 19 May 2015 10:48:41 +1200 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> Message-ID: <555A6C49.50505@canterbury.ac.nz> Joao S. O. Bueno wrote: > (Otherwise > from __future__ import from__future__import_ImportError > seens fun enough to actually be created) I don't think it would even be all that hard to implement. As I understand things, a __future__ import already results in a run-time import in addition to its magical effects. So all the compiler needs to do is ignore undefined future features, and an ImportError will result at run time. (The rules would need to be relaxed slightly to allow a try-except around future imports, but that doesn't seem like a big problem.) A benefit of this arrangement is that it would permit monkey-patching of __future__ at run time to get no-ops. -- Greg From steve at pearwood.info Tue May 19 03:17:36 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 11:17:36 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <555A705B.4080100@canterbury.ac.nz> References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> <20150518161827.GN5663@ando.pearwood.info> <555A705B.4080100@canterbury.ac.nz> Message-ID: <20150519011736.GP5663@ando.pearwood.info> On Tue, May 19, 2015 at 11:06:03AM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >After all, I might want to write: > > > >if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']: > > from __future__ import jabberwocky > > You might want to, but I would have no qualms about > firmly telling you that you can't. Putting try: > in front of a future import still doesn't introduce > any executable code before it, whereas the above does. "Set up a try...except block" is not executable? Then how does it, um, you know, set up the try...except block? :-) "try" compiles to executable code. If you don't believe me: def a(): spam def b(): try: spam except: pass from dis import dis dis(a) dis(b) and take note of the SETUP_EXCEPT byte-code. In any case, I think that neither of us wants to change the rules about what can precede a __future__ import, so hopefully the point is moot. 
-- Steve From steve at pearwood.info Tue May 19 03:20:38 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 11:20:38 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <555A84A1.60608@mrabarnett.plus.com> References: <20150518120054.GK5663@ando.pearwood.info> <5559E624.7030708@mrabarnett.plus.com> <20150518161827.GN5663@ando.pearwood.info> <555A705B.4080100@canterbury.ac.nz> <555A84A1.60608@mrabarnett.plus.com> Message-ID: <20150519012038.GQ5663@ando.pearwood.info> On Tue, May 19, 2015 at 01:32:33AM +0100, MRAB wrote: > I think that part of the problem is that it looks like an import > statement, but it's really a compiler directive in disguise... +1 -- Steve From steve at pearwood.info Tue May 19 05:15:46 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 13:15:46 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> Message-ID: <20150519031545.GR5663@ando.pearwood.info> On Tue, May 19, 2015 at 03:36:12AM +1000, Chris Angelico wrote: > My understanding of the OP's problem is this: > > # Utility file > def some_generator(): > yield stuff > > # Tests file > import utils > assert next(some_generator()) == stuff That doesn't test that some_generator never raises StopIteration directly. All it does is test that it yields correctly. See below for a meaningful test. > Now, PEP 479 says that his code should never raise StopIteration in a > generator, or in anything called by a generator. He has no problem > with this, philosophically, but to prove that the change has indeed > happened, it would be good to run the test suite with generator_stop > active - equivalent to running Python 3.7 on the test suite. It's not clear to me whether you're talking about Ram testing that PEP 479 is working as claimed ("to prove that the change has indeed happened"), or testing *his own generators* to check that he doesn't accidentally call "raise StopIteration" inside them, regardless of version. If Ram is merely testing PEP 479, then he needs tests like this: def gen(): raise StopIteration assertRaises(RuntimeError, gen) These tests are only meaningful for 3.5 or better, since in 3.4 the PEP isn't implemented and his tests will fail. They belong in the Python 3.5 test suite, not Ram's library test suite, but if he insists on having them, he can stick them in a separate file as already discussed. More likely, Ram is testing his own generators, not the interpreter. He wants to ensure that none of his generators raise StopIteration but always use return instead. Whatever test he writes, he has to run it on a generator which is passed in, not on a test generator he writes specifically for the test. It's hard to test arbitrary generators for compliance with the rule "don't raise StopIterator directly", since you cannot distinguish a return from a raise from the outside unless PEP 479 is in effect. Ideally, the test should still fail even if PEP 479 is not implemented. Otherwise it's a useless test under 3.4 and older, and you might as well not even bother running it. Before PEP 479 is in effect, I can't think of any practical way to distinguish the cases: (1) generator exits by raising (fail); (2) generator exits by returning (pass); since both cases end up raising StopIteration. Perhaps Ram is cleverer than me and can come up with a definite test, but I'd like to see it before commenting. 
The only solution I can come up with is to use the inspect module to fetch the generator's source code and scan it for "raise StopIteration". Parsing the AST will also work, or even the byte-code at a pinch, but the source code is easiest: assertFalse("raise StopIteration" in source) That will fail if the generator raises StopIteration directly regardless of version. It doesn't catch *all possible* violations, e.g.: exc = eval(codecs.encode('FgbcVgrengvba', 'rot-13')) raise exc but I assume that Ram trusts himself not to be actively trying to subvert his own tests. (If not, then he has bigger problems.) So, I believe that the whole __future__ directive is a red herring, and doesn't actually help Ram do what he wants, which is to write tests which will fail if his generators call raise StopIteration regardless of what version of Python he runs the test under. -- Steve From rosuav at gmail.com Tue May 19 07:46:49 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 19 May 2015 15:46:49 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150519031545.GR5663@ando.pearwood.info> References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> <20150519031545.GR5663@ando.pearwood.info> Message-ID: On Tue, May 19, 2015 at 1:15 PM, Steven D'Aprano wrote: > On Tue, May 19, 2015 at 03:36:12AM +1000, Chris Angelico wrote: > >> My understanding of the OP's problem is this: >> >> # Utility file >> def some_generator(): >> yield stuff >> >> # Tests file >> import utils >> assert next(some_generator()) == stuff > > That doesn't test that some_generator never raises StopIteration > directly. All it does is test that it yields correctly. See below for a > meaningful test. Right; this is an existing codebase which (presumably) already has tests. These tests will continue to pass post-479, but they are inadequate as proof that the transformation to "never raise StopIteration" has been completed. > It's not clear to me whether you're talking about Ram testing that PEP > 479 is working as claimed ("to prove that the change has indeed > happened"), or testing *his own generators* to check that he doesn't > accidentally call "raise StopIteration" inside them, regardless of > version. > > More likely, Ram is testing his own generators, not the interpreter. He > wants to ensure that none of his generators raise StopIteration but > always use return instead. Whatever test he writes, he has to run it on > a generator which is passed in, not on a test generator he writes > specifically for the test. Correct. > It's hard to test arbitrary generators for compliance with the rule > "don't raise StopIterator directly", since you cannot distinguish a > return from a raise from the outside unless PEP 479 is in effect. > Ideally, the test should still fail even if PEP 479 is not implemented. > Otherwise it's a useless test under 3.4 and older, and you might as well > not even bother running it. Indeed. You have summed up the problem. > Before PEP 479 is in effect, I can't think of any practical way to > distinguish the cases: > > (1) generator exits by raising (fail); > (2) generator exits by returning (pass); > > since both cases end up raising StopIteration. Perhaps Ram is cleverer > than me and can come up with a definite test, but I'd like to see it > before commenting. > > The only solution I can come up with is to use the inspect module to > fetch the generator's source code and scan it for "raise StopIteration". 
> Parsing the AST will also work, or even the byte-code at a pinch, but > the source code is easiest: > > assertFalse("raise StopIteration" in source) > > That will fail if the generator raises StopIteration directly regardless > of version. It doesn't catch *all possible* violations, e.g.: More significant example: It doesn't catch a codebase that has some functions which are used in generators and others which are used in class-based iterators. > So, I believe that the whole __future__ directive is a red herring, and > doesn't actually help Ram do what he wants, which is to write tests > which will fail if his generators call raise StopIteration regardless of > what version of Python he runs the test under. Okay. So how do you ensure that Python 3.7 and Python 3.4 can both run your code? ChrisA From greg.ewing at canterbury.ac.nz Tue May 19 08:16:17 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 19 May 2015 18:16:17 +1200 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> <20150519031545.GR5663@ando.pearwood.info> Message-ID: <555AD531.9060102@canterbury.ac.nz> Maybe what's needed is a command-line switch that turns on a future feature for all code? Then you can run the Python 3 tests with it, and the Python 2 tests without it, and not have to modify any code. -- Greg From steve at pearwood.info Tue May 19 11:25:47 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 May 2015 19:25:47 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> <20150519031545.GR5663@ando.pearwood.info> Message-ID: <20150519092547.GA28058@ando.pearwood.info> On Tue, May 19, 2015 at 03:46:49PM +1000, Chris Angelico wrote: > > The only solution I can come up with is to use the inspect module to > > fetch the generator's source code and scan it for "raise StopIteration". > > Parsing the AST will also work, or even the byte-code at a pinch, but > > the source code is easiest: > > > > assertFalse("raise StopIteration" in source) > > > > That will fail if the generator raises StopIteration directly regardless > > of version. It doesn't catch *all possible* violations, e.g.: > > More significant example: It doesn't catch a codebase that has some > functions which are used in generators and others which are used in > class-based iterators. Obviously I only sketched a solution. The person writing the tests has to distinguish between functions or methods which must not call "raise StopIteration", and test them, while avoiding testing those which may use raise. They may want to test more than just the generator function themselves, e.g. any functions they call. In principle, if you're reading the code or the AST, you can do a static analysis to automatically detect what functions it calls, and scan them as well, but that's a lot of effort for mere unit tests, and the chances are that your test code will be buggier than your non-test code. Easier to just add the called functions to a list of functions to be checked. The person writing the tests must decide how much he cares about this. "Do the simplest thing that can possibly work" applies to tests as well as code (tests *are* code). (In my opinion, just by *reading* this thread, Ram has already exceeded the amount of time and energy that these tests are worth.) 
> > So, I believe that the whole __future__ directive is a red herring, and > > doesn't actually help Ram do what he wants, which is to write tests > > which will fail if his generators call raise StopIteration regardless of > > what version of Python he runs the test under. > > Okay. So how do you ensure that Python 3.7 and Python 3.4 can both run > your code? If I am right that the future directive is irrelevant, then you simply *don't include the future directive*. Or you split the code into parts that don't require the directive, and parts that do, and put them in different files, then conditionally import the second set, either in a try...except or if version... block. Or you write one file: test.py, and run your tests with a wrapper script which duplicates that file and inserts the future directive: # untested cp test.py test479.py sed -i '1i from __future__ import feature' test479.py python -m unittest test.py python -m unittest test479.py Combine and adjust as needed. -- Steve From ncoghlan at gmail.com Tue May 19 13:22:36 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 19 May 2015 21:22:36 +1000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: <20150519031545.GR5663@ando.pearwood.info> References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> <20150519031545.GR5663@ando.pearwood.info> Message-ID: On 19 May 2015 at 13:15, Steven D'Aprano wrote: > So, I believe that the whole __future__ directive is a red herring, and > doesn't actually help Ram do what he wants, which is to write tests > which will fail if his generators call raise StopIteration regardless of > what version of Python he runs the test under. The essential impossibility of writing such tests is one of the underlying reasons *why* PEP 479 was accepted - you can't sensibly test for inadvertently escaping StopIteration values. However, I interpreted Ram's request slightly differently: if I'm understanding the request correctly, he'd like a way to write single-source modules such that *on Python 3.5+* they effectively run with "from __future__ import generator_stop", while on older Python versions, they run unmodified. That way, running the test suite under Python 3.5 will show that at least the regression tests aren't relying on "escaping StopIteration" in order to pass. The intended answer to Ram's request is "configure the warnings module to turn the otherwise silent deprecation warning into an error". From https://www.python.org/dev/peps/pep-0479/#transition-plan: * Python 3.5: Enable new semantics under __future__ import; silent deprecation warning if StopIteration bubbles out of a generator not under __future__ import. However, we missed the second half of that in the initial PEP implementation, so it doesn't currently emit the deprecation warning at all, which means there's no way to turn it into an error instead: http://bugs.python.org/issue24237 Once that issue has been fixed, then "-Wall" will cause any tests relying on the deprecated behaviour to fail, *without* needing to modify the code under test to use the future import. Cheers, Nick. 
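P.S. Once the warning is actually emitted, opting in from a test run is a
one-liner -- either "python -W error" on the command line, or in the test
suite itself:

import warnings
warnings.simplefilter("error")  # any warning, including this one, now raises

(That turns *all* warnings into errors, which is usually what you want when
running a test suite anyway.)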
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Tue May 19 17:42:48 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 19 May 2015 08:42:48 -0700 (PDT) Subject: [Python-ideas] an unless statement would occasionally be useful In-Reply-To: <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com> References: <5558D8EE.8010105@earthlink.net> <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com> Message-ID: This is hilarious. Although to be fair, test might be useful if for example, you test types in one thread and run code optimized for that type in another? On Sunday, May 17, 2015 at 5:40:59 PM UTC-4, Andrew Barnert via Python-ideas wrote: > > On May 17, 2015, at 11:07, Charles Hixson > wrote: > > > > I'm envisioning "unless" as a synonym for "if not(...):" currently I > use > > > > if .... : > > pass > > else: > > ... > > > > which works. > > > > N.B.: This isn't extremely important as there are already two ways to > accomplish the same purpose, but it would be useful, seems easy to > implement, and is already used by many other languages. The advantage is > that when the condition is long it simplifies understanding. > > But if you just use not instead of else, it simplifies understanding just > as much--and without making the language larger (which makes it harder to > learn/remember when switching languages, makes the parser bigger, etc.): > > if not ...: > ... > > It seems like every year someone proposes either "unless" or "until" or > the whole suite of Perl variants (inherently-negated keywords, postfix, > do...while-type syntax), but nobody ever asks for anything clever. Think of > what you could do with a "lest" statement, which will speculatively execute > the body and then test the condition before deciding whether to actually > have executed the body. Or a "without" that closes a context before the > body instead of after. Or a "butfor" that iterates over every extant object > that isn't contained in the Iterable. Or a "because" that raises instead of > skipping the body if the condition isn't truthy. Or a "before" that > remembers the body for later and executes it a synchronously when the > condition becomes true. > > > > > _______________________________________________ > > Python-ideas mailing list > > Python... at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed May 20 17:57:42 2015 From: brett at python.org (Brett Cannon) Date: Wed, 20 May 2015 15:57:42 +0000 Subject: [Python-ideas] Making it easy to prepare for PEP479 In-Reply-To: References: <20150518145225.GM5663@ando.pearwood.info> <20150518171319.GO5663@ando.pearwood.info> <20150519031545.GR5663@ando.pearwood.info> Message-ID: On Tue, May 19, 2015 at 7:30 AM Nick Coghlan wrote: > On 19 May 2015 at 13:15, Steven D'Aprano wrote: > > So, I believe that the whole __future__ directive is a red herring, and > > doesn't actually help Ram do what he wants, which is to write tests > > which will fail if his generators call raise StopIteration regardless of > > what version of Python he runs the test under. 
> 
> The essential impossibility of writing such tests is one of the
> underlying reasons *why* PEP 479 was accepted - you can't sensibly
> test for inadvertently escaping StopIteration values.
> 
> However, I interpreted Ram's request slightly differently: if I'm
> understanding the request correctly, he'd like a way to write
> single-source modules such that *on Python 3.5+* they effectively run
> with "from __future__ import generator_stop", while on older Python
> versions, they run unmodified. That way, running the test suite under
> Python 3.5 will show that at least the regression tests aren't relying
> on "escaping StopIteration" in order to pass.
> 
> The intended answer to Ram's request is "configure the warnings module
> to turn the otherwise silent deprecation warning into an error". From
> https://www.python.org/dev/peps/pep-0479/#transition-plan:
> 
> * Python 3.5: Enable new semantics under __future__ import; silent
> deprecation warning if StopIteration bubbles out of a generator not
> under __future__ import.
> 
> However, we missed the second half of that in the initial PEP
> implementation, so it doesn't currently emit the deprecation warning
> at all, which means there's no way to turn it into an error instead:
> http://bugs.python.org/issue24237
> 
> Once that issue has been fixed, then "-Wall" will cause any tests
> relying on the deprecated behaviour to fail, *without* needing to
> modify the code under test to use the future import.
>

Another option is to use a custom import loader which sets the __future__
flag passed to compile() depending under what version of Python you were
running your code under; overriding
https://docs.python.org/3.5/library/importlib.html#importlib.abc.InspectLoader.source_to_code
is all that would be needed to make that happen.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From demianbrecht at gmail.com  Thu May 21 07:29:15 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 20 May 2015 22:29:15 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
Message-ID: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>

Disclaimer: I'm not the author of jsonschema
(https://github.com/Julian/jsonschema), but as a user I think that users of
the standard library (and potentially areas of the standard library itself)
could benefit from its addition into the standard library.

I've been using jsonschema for the better part of a couple years now and
have found it not only invaluable, but flexible around the variety of
applications it has. Personally, I generally use it for HTTP response
validation when dealing with RESTful APIs and system configuration input
validation. For those not familiar with the package:

RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
Home: http://json-schema.org/
Proposed addition implementation: https://github.com/Julian/jsonschema

Coles notes stats:

Has been publicly available for over a year: v0.1 released Jan 1, 2012,
currently at 2.4.0 (released Sept 22, 2014)
Heavily used by the community: Currently sees ~585k downloads per month
according to PyPI

I've reached out to the author to express my interest in authoring a PEP
to have the module included to gauge his interest in assisting with
maintenance as needed during the integration period (or following). I'd
also be personally interested in supporting it as part of the stdlib as
well.
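For anyone who hasn't used it, a minimal example of the API looks roughly
like this (the schema and data are made up purely for illustration):

from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "port": {"type": "integer", "minimum": 1, "maximum": 65535},
    },
    "required": ["name"],
}

validate({"name": "web", "port": 8080}, schema)  # passes silently

try:
    validate({"port": "8080"}, schema)  # missing "name", "port" is a string
except ValidationError as e:
    print(e.message)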
My question is: Is there any reason up front anyone can see that this addition wouldn?t fly, or are others interested in the addition as well? Thanks, Demian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From gmludo at gmail.com Thu May 21 07:46:27 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Thu, 21 May 2015 07:46:27 +0200 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> Message-ID: As a end-dev that uses your library for a small time, it's an useful tool. We're migrating quicker an Erlang application to Python with your library because the legacy application uses JSON schema. >From my point of view, validating I/O data is a common problem of most developers, however, it means that you have a lot of developers that have a strong opinion how to validate data ;-) At least to me, it's a good idea to include this library in Python, even if you have plenty of libraries to do that with several approachs, for now, I didn't find a simpler approach that via JSON schemas. The bonus with that is that you can reuse your JSON schemas for migrations and also in your javascript source code. It isn't a silver bullet to resolve all validation corner cases, however enough powerful to resolve the most boring use cases. Ludovic Gasc (GMLudo) http://www.gmludo.eu/ On 21 May 2015 07:29, "Demian Brecht" wrote: > Disclaimer: I?m not the author of jsonschema ( > https://github.com/Julian/jsonschema), but as a user think that users of > the standard library (and potentially areas of the standard library itself) > could benefit from its addition into the standard library. > > I?ve been using jsonschema for the better part of a couple years now and > have found it not only invaluable, but flexible around the variety of > applications it has. Personally, I generally use it for HTTP response > validation when dealing with RESTful APIs and system configuration input > validation. For those not familiar with the package: > > RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04 > Home: http://json-schema.org/ > Proposed addition implementation: https://github.com/Julian/jsonschema > > Coles notes stats: > > Has been publicly available for over a year: v0.1 released Jan 1, 2012, > currently at 2.4.0 (released Sept 22, 2014) > Heavily used by the community: Currently sees ~585k downloads per month > according to PyPI > > I?ve reached out to the author to express my interest in authoring a PEP > to have the module included to gauge his interest in assisting with > maintenance as needed during the integration period (or following). I?d > also be personally interested in supporting it as part of the stdlib as > well. > > My question is: Is there any reason up front anyone can see that this > addition wouldn?t fly, or are others interested in the addition as well? > > Thanks, > Demian > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From demianbrecht at gmail.com  Thu May 21 07:53:33 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 20 May 2015 22:53:33 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: 
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: 

> On May 20, 2015, at 10:46 PM, Ludovic Gasc  wrote:
> As a end-dev that uses your library for a small time, it's an useful tool.

> Disclaimer: I'm not the author of jsonschema

Emphasis on /not/. I'm just another user of the library like you :) But
cheers for the feedback!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From yselivanov.ml at gmail.com  Thu May 21 07:59:43 2015
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 21 May 2015 01:59:43 -0400
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: <555D744F.4000307@gmail.com>

On 2015-05-21 1:29 AM, Demian Brecht wrote:
[..]
> My question is: Is there any reason up front anyone can see that this addition wouldn't fly, or are others interested in the addition as well?
>
I think we should wait at least until json-schema.org releases a final
version of the spec.

Thanks,
Yury

From demianbrecht at gmail.com  Thu May 21 08:18:08 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 20 May 2015 23:18:08 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <555D744F.4000307@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
	<555D744F.4000307@gmail.com>
Message-ID: <9E094141-A7EF-44B1-B713-301F9D9524E9@gmail.com>

> On May 20, 2015, at 10:59 PM, Yury Selivanov  wrote:
> I think we should wait at least until json-schema.org releases a final version of the spec.

I'd thought about that as well, but here were the arguments that I could
think of that led me to proposing this in the first place:

The latest draft of the RFC expired Jan 31, 2013. I'd have to try to reach
out to the author(s) to confirm, but I'd venture to say there likely isn't
much more effort being put into it.
The library is in heavy use and is useful in practice in its current state.
I think that in situations like this practicality of a module should come
first and finalized spec second. There are numerous places in the library
that deviate from specs in the name of practical use. I'm not advocating
that shouldn't be an exception as opposed to the rule, I'm just saying that
there are multiple things to consider prior to simply squashing an
inclusion because of RFC draft state.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From stephen at xemacs.org  Thu May 21 09:39:56 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 21 May 2015 16:39:56 +0900
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: <87y4kixttf.fsf@uwakimon.sk.tsukuba.ac.jp>

Demian Brecht writes:

 > RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04

I note that this draft, apparently written in Nov.
2011, expired almost two years ago with no update. OTOH, 4 other RFCs related to JSON (6901, 6902, 7386, 7396) have been published recently. (This kind of thing is common with RFCs; people get fed up with the process and just go off and do something that's "good enough" for them. But it does show they've given up on the process of getting a global standard at least for now.) Then in Oct 2012, Andy Newton wrote[1]: Schemas. There is no one standardized schema language for JSON, although several are presently in the works (including one by this author). The need for a JSON schema language is controversial?JSON is regarded by most as simple enough on its own. Indeed, there is no shortage of JSON-based interchange specification making due without schema formalism. and his independent proposal[2] (confusingly called "content rules") is current, expiring on June 5. (Note that there is no proposal currently being discussed by the IETF APPSAWG. Newton's proposal is independent, pending formation of a new charter for a JSON schema WG.) > My question is: Is there any reason up front anyone can see that > this addition wouldn?t fly? I would say that the evident controversy over which schema language will be standardized is a barrier, unless you can say that Newton's proposals have no support from the community or something like that. It's not a terribly high barrier in one sense (Python doesn't demand that modules be perfect in all ways), but you do have to address the perception of controversy, I think (at least to deny there really is any). A more substantive issue is that Appendix A of Newton's I-D certainly makes json-schema look "over the top" in verbosity of notation -- XML would be proud. If that assessment is correct, the module could be considered un-Pythonic (see Zen #4, and although JSON content rules are not themselves JSON while JSON schema is valid JSON, see Zen #9). N.B. I'm not against this proposal, just answering your question. I did see that somebody named James Newton-King (aka newtonsoft.com) has an implementation of json-schema for .NET, and json-schema.org seems to be in active development, which are arguments in favor of your proposal. Footnotes: [1] http://www.internetsociety.org/articles/using-json-ietf-protocols [2] https://tools.ietf.org/html/draft-newton-json-content-rules-04 From stephen at xemacs.org Thu May 21 09:52:07 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 21 May 2015 16:52:07 +0900 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <555D744F.4000307@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <555D744F.4000307@gmail.com> Message-ID: <87wq02xt94.fsf@uwakimon.sk.tsukuba.ac.jp> Yury Selivanov writes: > I think we should wait at least until json-schema.org releases a > final version of the spec. If you mean an RFC, there are all kinds of reasons, some important, some just tedious, why a perfectly good spec never gets released as an RFC. I agree that the fact that none of the IETF, W3C, or ECMA has released a formal spec yet needs discussion. 
From p.f.moore at gmail.com Thu May 21 09:57:27 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 21 May 2015 08:57:27 +0100 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> Message-ID: On 21 May 2015 at 06:29, Demian Brecht wrote: > Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014) > Heavily used by the community: Currently sees ~585k downloads per month according to PyPI One key question that should be addressed as part of any proposal for inclusion into the stdlib. Would switching to having feature releases only when a new major Python version is released (with bugfixes at minor releases) be acceptable to the project? From the figures you quote, it sounds like there has been some rapid development, although things seem to have slowed down now, so maybe things are stable enough. Paul From stephen at xemacs.org Thu May 21 10:04:56 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 21 May 2015 17:04:56 +0900 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <9E094141-A7EF-44B1-B713-301F9D9524E9@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <555D744F.4000307@gmail.com> <9E094141-A7EF-44B1-B713-301F9D9524E9@gmail.com> Message-ID: <87vbfmxsnr.fsf@uwakimon.sk.tsukuba.ac.jp> Demian Brecht writes: > The latest draft of the RFC expired Jan 31, 2013. Actually, expiration is more than half a year fresher: August 4, 2013. But AFAICT none of the schema proposals were RFC track at all, let alone normative. They're just in support of various other JSON-related IETF work. Steve From ncoghlan at gmail.com Thu May 21 11:15:20 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 21 May 2015 19:15:20 +1000 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> Message-ID: On 21 May 2015 at 17:57, Paul Moore wrote: > On 21 May 2015 at 06:29, Demian Brecht wrote: >> Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014) >> Heavily used by the community: Currently sees ~585k downloads per month according to PyPI > > One key question that should be addressed as part of any proposal for > inclusion into the stdlib. Would switching to having feature releases > only when a new major Python version is released (with bugfixes at > minor releases) be acceptable to the project? From the figures you > quote, it sounds like there has been some rapid development, although > things seem to have slowed down now, so maybe things are stable > enough. The other question to be answered these days is the value bundling offers over "pip install jsonschema" (or a platform specific equivalent). While it's still possible to meet that condition, it's harder now that we offer pip as a standard feature, especially since getting added to the standard library almost universally makes life more difficult for module maintainers if they're not already core developers. 
I'm not necessarily opposed to including JSON schema validation in general or jsonschema in particular (I've used it myself in the past and think it's a decent option if you want a bit more rigor in your data validation), but I'm also not sure how large an overlap there will be between "could benefit from using jsonschema", "has a spectacularly onerous package review process", and "can't already get jsonschema from an approved source". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From julian at grayvines.com Thu May 21 23:10:42 2015 From: julian at grayvines.com (Julian Berman) Date: Thu, 21 May 2015 14:10:42 -0700 Subject: [Python-ideas] Adding jsonschema to the standard library Message-ID: Hey, author here, thanks a lot Demian for even suggesting such a thing :). I'm really glad that people have found jsonschema useful. I actually tend these days to think similarly to what Nick mentioned, that the standard library really has decreased in importance as pip has shaped up and now been bundled -- so overall my personal opinion is that I wouldn't personally be pushing to get jsonschema in -- but! If you felt strongly, just some brief answers -- I think jsonschema would be able to cope with more restricted release cycles. And there are a few areas that I don't like about jsonschema (some APIs) which eventually I'd like to fix (RefResolver in particular), but for the most part I think it has stabilized more or less. I can provide some more details if there's any interest. Thanks again for even proposing such a thing :) -Julian On Thu, May 21, 2015 at 2:15 AM, wrote: > > ------------------------------ > > Message: 7 > Date: Thu, 21 May 2015 19:15:20 +1000 > From: Nick Coghlan > To: Paul Moore > Cc: Demian Brecht , Python-Ideas > > Subject: Re: [Python-ideas] Adding jsonschema to the standard library > Message-ID: > khvnsQ at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On 21 May 2015 at 17:57, Paul Moore wrote: > > On 21 May 2015 at 06:29, Demian Brecht wrote: > >> Has been publicly available for over a year: v0.1 released Jan 1, 2012, > currently at 2.4.0 (released Sept 22, 2014) > >> Heavily used by the community: Currently sees ~585k downloads per month > according to PyPI > > > > One key question that should be addressed as part of any proposal for > > inclusion into the stdlib. Would switching to having feature releases > > only when a new major Python version is released (with bugfixes at > > minor releases) be acceptable to the project? From the figures you > > quote, it sounds like there has been some rapid development, although > > things seem to have slowed down now, so maybe things are stable > > enough. > > The other question to be answered these days is the value bundling > offers over "pip install jsonschema" (or a platform specific > equivalent). While it's still possible to meet that condition, it's > harder now that we offer pip as a standard feature, especially since > getting added to the standard library almost universally makes life > more difficult for module maintainers if they're not already core > developers. 
> > I'm not necessarily opposed to including JSON schema validation in > general or jsonschema in particular (I've used it myself in the past > and think it's a decent option if you want a bit more rigor in your > data validation), but I'm also not sure how large an overlap there > will be between "could benefit from using jsonschema", "has a > spectacularly onerous package review process", and "can't already get > jsonschema from an approved source". > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri May 22 00:37:37 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 21 May 2015 18:37:37 -0400 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: References: Message-ID: On 5/21/2015 5:10 PM, Julian Berman wrote: > Hey, author here, thanks a lot Demian for even suggesting such a thing :). Welcome to python-ideas. > I'm really glad that people have found jsonschema useful. In response to Demian, the module initially strikes me, a non-json user, as too specialized for the stdlib, even if extremely useful to people within the specialty. The high pypi download rate could be interpreted as meaning that the module does not need to be in the stdlib to be discovered and used. > I actually tend these days to think similarly to what Nick mentioned, > that the standard library really has decreased in importance as pip has > shaped up and now been bundled -- so overall my personal opinion is that > I wouldn't personally be pushing to get jsonschema in -- but! If you > felt strongly, just some brief answers -- I think jsonschema would be > able to cope with more restricted release cycles. As a core developer, I can see a downside for you, so I would advise you to decline the invitation unless you see a stronger upside than is immediately obvious. > And there are a few areas that I don't like about jsonschema (some APIs) > which eventually I'd like to fix (RefResolver in particular), but for > the most part I think it has stabilized more or less. -- Terry Jan Reedy From benhoyt at gmail.com Fri May 22 03:18:24 2015 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 21 May 2015 21:18:24 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code Message-ID: Hi Python Ideas folks, (I previously posted a similar message on Python-Dev, but it's a better fit for this list. See that thread here: https://mail.python.org/pipermail/python-dev/2015-May/140063.html) Enabling access to the AST for compiled code would make some cool things possible (C# LINQ-style ORMs, for example), and not knowing too much about this part of Python internals, I'm wondering how possible and practical this would be. Context: PonyORM (http://ponyorm.com/) allows you to write regular Python generator expressions like this: select(c for c in Customer if sum(c.orders.price) > 1000) which compile into and run SQL like this: SELECT "c"."id" FROM "Customer" "c" LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer" GROUP BY "c"."id" HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000 I think the Pythonic syntax here is beautiful. But the tricks PonyORM has to go to get it are ... not quite so beautiful. Because the AST is not available, PonyORM decompiles Python bytecode into an AST first, and then converts that to SQL. 
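To make the problem concrete: at runtime a select()-style function only
receives a generator object, and the only thing attached to it is compiled
bytecode. An illustrative sketch (not PonyORM's actual code, and the data
is made up):

import dis

customers = []  # placeholder data, just so the expression below is valid

def select(gen):
    code = gen.gi_code  # the generator expression's code object
    dis.dis(code)       # bytecode is all that's available -- no AST attached

select(c for c in customers if c.country == "NZ")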
(More details on all that from author's EuroPython talk at http://pyvideo.org/video/2968) PonyORM needs the AST just for generator expressions and lambda functions, but obviously if this kind of AST access feature were in Python it'd probably be more general. I believe C#'s LINQ provides something similar, where if you're developing a LINQ converter library (say LINQ to SQL), you essentially get the AST of the code ("expression tree") and the library can do what it wants with that. (I know that there's the "ast" module and ast.parse(), which can give you an AST given a *source string*, but that's not very convenient here.) What would it take to enable this kind of AST access in Python? Is it possible? Is it a good idea? -Ben From njs at pobox.com Fri May 22 03:40:25 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 21 May 2015 18:40:25 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: On Thu, May 21, 2015 at 6:18 PM, Ben Hoyt wrote: > Hi Python Ideas folks, > > (I previously posted a similar message on Python-Dev, but it's a > better fit for this list. See that thread here: > https://mail.python.org/pipermail/python-dev/2015-May/140063.html) > > Enabling access to the AST for compiled code would make some cool > things possible (C# LINQ-style ORMs, for example), and not knowing too > much about this part of Python internals, I'm wondering how possible > and practical this would be. What concretely are you imagining? I can imagine lots of possibilities with pretty different properties... e.g., one could have an '.ast' attribute attached to every code object, which always tracks the source that the code was compiled from. Or one could add a new (quasi)quoting syntax, like 'select(! c for c in Customer if sum(c.orders.price) > 1000)' where ! is a low-priority operator that simply returns the AST of whatever is written to the right of it. Or... lots of things, probably. -n -- Nathaniel J. Smith -- http://vorpus.org From abarnert at yahoo.com Fri May 22 03:51:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 21 May 2015 18:51:34 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> On May 21, 2015, at 18:18, Ben Hoyt wrote: > > (I know that there's the "ast" module and ast.parse(), which can give > you an AST given a *source string*, but that's not very convenient > here.) Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way? For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST. From benhoyt at gmail.com Fri May 22 03:57:50 2015 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 21 May 2015 21:57:50 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: Not knowing too much about interpreter internals, I guess I was fishing somewhat for the range of possibilities. :-) But I was definitely thinking more along the lines of a "co_ast" attribute on code objects. 
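Roughly along these lines (the co_ast spelling is hypothetical, of course --
today the only way to get an AST is to go back to the source text, when it
exists):

import ast
import inspect

def f():
    return [c for c in range(10) if c % 2]

# What you can do today: re-parse the source text...
tree = ast.parse(inspect.getsource(f))
print(ast.dump(tree))

# ...versus what a co_ast attribute could give you directly (hypothetical):
# tree = f.__code__.co_ast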
The new syntax approach might be fun, but I'd think it's a lot more challenging and problematic to add new syntax. -Ben On Thu, May 21, 2015 at 9:40 PM, Nathaniel Smith wrote: > On Thu, May 21, 2015 at 6:18 PM, Ben Hoyt wrote: >> Hi Python Ideas folks, >> >> (I previously posted a similar message on Python-Dev, but it's a >> better fit for this list. See that thread here: >> https://mail.python.org/pipermail/python-dev/2015-May/140063.html) >> >> Enabling access to the AST for compiled code would make some cool >> things possible (C# LINQ-style ORMs, for example), and not knowing too >> much about this part of Python internals, I'm wondering how possible >> and practical this would be. > > What concretely are you imagining? I can imagine lots of possibilities > with pretty different properties... e.g., one could have an '.ast' > attribute attached to every code object, which always tracks the > source that the code was compiled from. Or one could add a new > (quasi)quoting syntax, like 'select(! c for c in Customer if > sum(c.orders.price) > 1000)' where ! is a low-priority operator that > simply returns the AST of whatever is written to the right of it. > Or... lots of things, probably. > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org From greg.ewing at canterbury.ac.nz Fri May 22 04:08:45 2015 From: greg.ewing at canterbury.ac.nz (Greg) Date: Fri, 22 May 2015 14:08:45 +1200 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: <555E8FAD.1060100@canterbury.ac.nz> On 22/05/2015 1:51 p.m., Andrew Barnert via Python-ideas wrote: > Or just use MacroPy, which > wraps up all the hard stuff (especially 2.x compatibility) and > provides a huge framework of useful tools. What do you want to do > that can't be done that way? You might not want to drag in a huge framework just to do one thing. -- Greg From benhoyt at gmail.com Fri May 22 04:10:15 2015 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 21 May 2015 22:10:15 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: Huh, interesting idea. I've never used import hooks. Looks like the relevant macropy source code is here: https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py So basically you would do the following: 1) intercept the import 2) find the source code file yourself and read it 3) call ast.parse() on the source string 4) do anything you want to the AST, for example turn the "select(c for c in Customer if sum(c.orders.price) > 1000" into whatever SQL or other function calls 5) pass the massaged AST to compile(), execute it and return the module Hmmm, yeah, I think you're basically suggesting macro-like processing of the AST. Pretty cool, but not quite what I was thinking of ... I was thinking select() would get an AST object at runtime and do stuff with it. -Ben On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert wrote: > On May 21, 2015, at 18:18, Ben Hoyt wrote: >> >> (I know that there's the "ast" module and ast.parse(), which can give >> you an AST given a *source string*, but that's not very convenient >> here.) > > Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. 
Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way? > > For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST. From greg.ewing at canterbury.ac.nz Fri May 22 04:13:12 2015 From: greg.ewing at canterbury.ac.nz (Greg) Date: Fri, 22 May 2015 14:13:12 +1200 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: <555E90B8.7070600@canterbury.ac.nz> On 22/05/2015 1:57 p.m., Ben Hoyt wrote: > But I was definitely thinking more along the lines of a "co_ast" > attribute on code objects. The new syntax approach might be fun, but > I'd think it's a lot more challenging and problematic to add new > syntax. Advantages of new syntax: * More flexible: Any expression can be made into an AST, not just lambdas or genexps. * More efficient: No need to carry an AST around with every code object, the vast majority of which will never be used. Disadvantages of new syntax: * All the disadvantages of new syntax. -- Greg From yselivanov.ml at gmail.com Fri May 22 04:13:37 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 21 May 2015 22:13:37 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: <555E90D1.7060404@gmail.com> Hi Ben, On 2015-05-21 10:10 PM, Ben Hoyt wrote: > Hmmm, yeah, I think you're basically suggesting macro-like processing > of the AST. Pretty cool, but not quite what I was thinking of ... I > was thinking select() would get an AST object at runtime and do stuff > with it. Unfortunately, it's not that easy. Storing AST would require a lot of extra memory in runtime. You have to somehow mark the places where you need it syntactically. I like how it's done in Rust: select!( ... ) Yury From benhoyt at gmail.com Fri May 22 04:15:23 2015 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 21 May 2015 22:15:23 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: Oh wait, macropy already has this exact thing. They call it PINQ (kinda Python LINQ), and they're macro-compiling it to SQLAlchemy calls. https://github.com/lihaoyi/macropy#pinq-to-sqlalchemy Wow. -Ben On Thu, May 21, 2015 at 10:10 PM, Ben Hoyt wrote: > Huh, interesting idea. I've never used import hooks. Looks like the > relevant macropy source code is here: > > https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py > > So basically you would do the following: > > 1) intercept the import > 2) find the source code file yourself and read it > 3) call ast.parse() on the source string > 4) do anything you want to the AST, for example turn the "select(c for > c in Customer if sum(c.orders.price) > 1000" into whatever SQL or > other function calls > 5) pass the massaged AST to compile(), execute it and return the module > > Hmmm, yeah, I think you're basically suggesting macro-like processing > of the AST. Pretty cool, but not quite what I was thinking of ... I > was thinking select() would get an AST object at runtime and do stuff > with it. 
> > -Ben > > On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert wrote: >> On May 21, 2015, at 18:18, Ben Hoyt wrote: >>> >>> (I know that there's the "ast" module and ast.parse(), which can give >>> you an AST given a *source string*, but that's not very convenient >>> here.) >> >> Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way? >> >> For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST. From ethan at stoneleaf.us Fri May 22 04:22:46 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 21 May 2015 19:22:46 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: <555E92F6.80803@stoneleaf.us> redirecting py-dev thread here On 05/21/2015 07:06 PM, Greg wrote: > On 22/05/2015 1:33 p.m., Ethan Furman wrote: >> Going back to the OP: >> >>> select(c for c in Customer if sum(c.orders.price) > 1000) >>> >>> which compile into and run SQL like this: >>> >>> SELECT "c"."id" >>> FROM "Customer" "c" >>> LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer" >>> GROUP BY "c"."id" >>> HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000 >> >> That last code is /not/ Python. ;) > > More importantly, it's not Python *semantics*. You can't view > it as simply a translation of the Python expression into a > different language. Ah, I think I see -- that 'select' isn't really doing anything is it? The 'if' clause is acting as the 'select' in the gen-exp. But then `sum(c.orders.price)` isn't really Python semantics either, is it... although it could be if it was souped up -- `c.orders` would have to return a customer-based object that was smart enough to return a list of whatever attribute was asked for. That'd be cool. -- ~Ethan~ From abarnert at yahoo.com Fri May 22 04:22:25 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 21 May 2015 19:22:25 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: <555E8FAD.1060100@canterbury.ac.nz> References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E8FAD.1060100@canterbury.ac.nz> Message-ID: <042AA2E2-6FC8-480A-8C2E-A42AE941C5BA@yahoo.com> > On May 21, 2015, at 19:08, Greg wrote: > >> On 22/05/2015 1:51 p.m., Andrew Barnert via Python-ideas wrote: >> Or just use MacroPy, which >> wraps up all the hard stuff (especially 2.x compatibility) and >> provides a huge framework of useful tools. What do you want to do >> that can't be done that way? > > You might not want to drag in a huge framework just to > do one thing. But "all kinds of LINQ-style things, like ORMs" isn't just one thing. If you're going to build a huge framework, why not build it on top of another framework that does the hard part of the work for you? 
From abarnert at yahoo.com Fri May 22 04:37:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 21 May 2015 19:37:29 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: <94EFFB86-0672-4F80-944B-0B73C5107ED3@yahoo.com> On May 21, 2015, at 19:15, Ben Hoyt wrote: > > Oh wait, macropy already has this exact thing. They call it PINQ > (kinda Python LINQ), and they're macro-compiling it to SQLAlchemy > calls. I didn't even realize he'd included this when suggesting MacroPy. :) Anyway, most of his macros are pretty easy to read as sample code, so even if what he's done isn't exactly what you wanted, it should be a good foundation. > https://github.com/lihaoyi/macropy#pinq-to-sqlalchemy > > Wow. > > -Ben > >> On Thu, May 21, 2015 at 10:10 PM, Ben Hoyt wrote: >> Huh, interesting idea. I've never used import hooks. Looks like the >> relevant macropy source code is here: >> >> https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py If you wanted to do this yourself, and only need to support 3.4+, it's a lot easier than the way MacroPy does it. But of course it's even easier to just use MacroPy. >> So basically you would do the following: >> >> 1) intercept the import >> 2) find the source code file yourself and read it >> 3) call ast.parse() on the source string >> 4) do anything you want to the AST, for example turn the "select(c for >> c in Customer if sum(c.orders.price) > 1000" into whatever SQL or >> other function calls >> 5) pass the massaged AST to compile(), execute it and return the module >> >> Hmmm, yeah, I think you're basically suggesting macro-like processing >> of the AST. Pretty cool, but not quite what I was thinking of ... I >> was thinking select() would get an AST object at runtime and do stuff >> with it. If you really want to, you can build a trivial import hook that just attaches the ASTs (to everything, or only to specific code) and then ignore the code and process the AST at runtime. If you actually need to use runtime information in the processing, that might be worth it, but otherwise it seems like you're just wasting time transforming and compiling the AST on every request. Of course you could build in a cache if the information isn't really dynamic, but in that case, using the code object and .pyc as a cache is a lot simpler and probably more efficient. >> >> -Ben >> >>> On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert wrote: >>>> On May 21, 2015, at 18:18, Ben Hoyt wrote: >>>> >>>> (I know that there's the "ast" module and ast.parse(), which can give >>>> you an AST given a *source string*, but that's not very convenient >>>> here.) >>> >>> Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way? >>> >>> For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST. 
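For what it's worth, a bare-bones 3.4+ version of that "record the AST at
import time" idea might look something like this (untested sketch; the
MODULE_ASTS mapping and the class name are invented for illustration):

import ast
import importlib.machinery
import sys

MODULE_ASTS = {}  # source path -> ast.Module, for whoever wants it at runtime

class ASTRecordingLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *args, **kwargs):
        tree = ast.parse(data, filename=path)
        MODULE_ASTS[path] = tree            # keep the AST around
        return compile(tree, path, 'exec')  # compile as usual

loader_details = (ASTRecordingLoader, importlib.machinery.SOURCE_SUFFIXES)
sys.path_hooks.insert(0, importlib.machinery.FileFinder.path_hook(loader_details))
sys.path_importer_cache.clear()  # make sure the new hook is actually consulted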
From steve at pearwood.info  Fri May 22 04:44:37 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 22 May 2015 12:44:47 +1000
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
Message-ID: <20150522024437.GY5663@ando.pearwood.info>

On Thu, May 21, 2015 at 06:51:34PM -0700, Andrew Barnert via Python-ideas wrote:
> On May 21, 2015, at 18:18, Ben Hoyt  wrote:
> >
> > (I know that there's the "ast" module and ast.parse(), which can give
> > you an AST given a *source string*, but that's not very convenient
> > here.)
>
> Why not? Python modules are distributed as source.

*Some* Python modules are distributed as source. Don't forget that
byte-code only modules are officially supported.

Functions may also be constructed dynamically, at runtime. Closures may
have source code available for them, but functions and methods
constructed with exec (such as those in namedtuples) do not.

Also, the interactive interpreter is a very powerful tool, but it
doesn't record the source code of functions you type into it.

So there are at least three examples where the source is not available
at all.

Ben also talks about *convenience*: `func.ast` will always be
more convenient than:

import ast
import inspect
ast.parse(inspect.getsource(func))

not to mention the wastefulness of parsing something which has already
been parsed before. On the other hand, keeping the ast around even when
it's not used wastes memory, so this is a classic time/space trade off.

> You can pretty
> easily write an import hook to intercept module loading at the AST
> level and transform it however you want.

Let's have a look at yours then, that ought to only take a minute or
three :-)

(That's my definition of "pretty easily".)

I think that the majority of Python programmers have no idea that you
can even write an import hook at all, let alone how to do it.

-- 
Steve

From abarnert at yahoo.com  Fri May 22 05:00:15 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 21 May 2015 20:00:15 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <20150522024437.GY5663@ando.pearwood.info>
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
	<20150522024437.GY5663@ando.pearwood.info>
Message-ID: <509DA1BB-AA08-4945-91FE-17E9546D3FDB@yahoo.com>

On May 21, 2015, at 19:44, Steven D'Aprano  wrote:
>
>> On Thu, May 21, 2015 at 06:51:34PM -0700, Andrew Barnert via Python-ideas wrote:
>>> On May 21, 2015, at 18:18, Ben Hoyt  wrote:
>>>
>>> (I know that there's the "ast" module and ast.parse(), which can give
>>> you an AST given a *source string*, but that's not very convenient
>>> here.)
>>
>> Why not? Python modules are distributed as source.
>
> *Some* Python modules are distributed as source. Don't forget that
> byte-code only modules are officially supported.
>
> Functions may also be constructed dynamically, at runtime. Closures may
> have source code available for them, but functions and methods
> constructed with exec (such as those in namedtuples) do not.
>
> Also, the interactive interpreter is a very powerful tool, but it
> doesn't record the source code of functions you type into it.
>
> So there are at least three examples where the source is not available
> at all.
By comparison, code objects that don't carry around their AST including everything running in any version of Python except maybe a future version that'll be out in a year and a half, if this idea gets accepted, and probably only in CPython. Plus, I'm pretty sure people would demand the ability to not waste memory and disk space on ASTs when they don't need them, so they still wouldn't be always available. > Ben also talks about *convenience*: `func.ast` will always be > more convenient than: > > import ast > import parse > ast.parse(inspect.getsource(func)) > > not to mention the wastefulness of parsing something which has already > been parsed before. > On the other hand, keeping the ast around even when > it's not used wastes memory, so this is a classic time/space trade off. > > >> You can pretty >> easily write an import hook to intercept module loading at the AST >> level and transform it however you want. > > Let's have a look at yours then, that ought to only take a minute or > three :-) Does "import macropy" count? That only took me a second or three. :) Certainly a _lot_ easier than hacking the CPython source, even for something as trivial as adding a new member to the code object and finding all the places to attach the AST. > (That's my definition of "pretty easily".) > > I think that the majority of Python programmers have no idea that you > can even write an import hook at all, let alone how to do it. Sure, because they have no need to do so. But it's very easy to learn. Especially after the changes in 3.3 and again in 3.4. During the discussion on Unicode operators that turned into a discussion on a Unicode empty set literal, I suggested an import hook, someone (possibly you?) challenged me to write one if it was so easy, and it took me under half an hour to learn the 3.4 system and implement one. (I'm sure it would be a lot faster this time. But probably not on my phone...) All that work to improve the import system really did pay off. By comparison, hacking in new syntax to CPython to play with operator sectioning yesterday took me about four hours. And of course anyone can download and use my import hook to get the empty set literal in any standard Python 3.4 or later, but anyone who wants to use my operator sectioning hacks has to clone my fork and build and install a new interpreter. From dw+python-ideas at hmmz.org Fri May 22 05:02:10 2015 From: dw+python-ideas at hmmz.org (David Wilson) Date: Fri, 22 May 2015 03:02:10 +0000 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: <20150522030210.GD515@k3> This sounds like a cool feature, though I'm not sure if exposing the AST directly on the code object is the best approach.. Attaching the AST to the code object implies serializing (and deserializing into nicely sparse heap allocations) it via .pyc files, since code objects are marshalled there. What about improving the parser so that exact start/end positions are recorded for function bodies? This might be represented as 2 cheap integers in RAM, allowing for a helper function in the compiler or inspect modules (inspect.ast()?) to handle the grunt work. Implementations like Micropython could just stub out those fields with -1 or whatever else if desired. One upside to direct attachment would be that a function returned by e.g. 
eval() with no underlying source file would still have its AST attached, without the caller having to keep hold of the unparsed string, but the downside of RAM/disk/potentially hefty deserialization performance seems to outweigh that. I also wish there was a nicer way of introducing an expression that was to be represented as an AST, but I think that would involve adding another language keyword, and simply overloading the meaning of generators slightly seems preferable to that. :) David On Thu, May 21, 2015 at 09:18:24PM -0400, Ben Hoyt wrote: > Hi Python Ideas folks, > > (I previously posted a similar message on Python-Dev, but it's a > better fit for this list. See that thread here: > https://mail.python.org/pipermail/python-dev/2015-May/140063.html) > > Enabling access to the AST for compiled code would make some cool > things possible (C# LINQ-style ORMs, for example), and not knowing too > much about this part of Python internals, I'm wondering how possible > and practical this would be. > > Context: PonyORM (http://ponyorm.com/) allows you to write regular > Python generator expressions like this: > > select(c for c in Customer if sum(c.orders.price) > 1000) > > which compile into and run SQL like this: > > SELECT "c"."id" > FROM "Customer" "c" > LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer" > GROUP BY "c"."id" > HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000 > > I think the Pythonic syntax here is beautiful. But the tricks PonyORM > has to go to get it are ... not quite so beautiful. Because the AST is > not available, PonyORM decompiles Python bytecode into an AST first, > and then converts that to SQL. (More details on all that from author's > EuroPython talk at http://pyvideo.org/video/2968) > > PonyORM needs the AST just for generator expressions and > lambda functions, but obviously if this kind of AST access feature > were in Python it'd probably be more general. > > I believe C#'s LINQ provides something similar, where if you're > developing a LINQ converter library (say LINQ to SQL), you essentially > get the AST of the code ("expression tree") and the library can do > what it wants with that. > > (I know that there's the "ast" module and ast.parse(), which can give > you an AST given a *source string*, but that's not very convenient > here.) > > What would it take to enable this kind of AST access in Python? Is it > possible? Is it a good idea? > > -Ben > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From techtonik at gmail.com Fri May 22 11:59:30 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 May 2015 12:59:30 +0300 Subject: [Python-ideas] Timer that starts as soon as it is imported Message-ID: Is the idea to have timer that starts on import is good? From phd at phdru.name Fri May 22 12:58:47 2015 From: phd at phdru.name (Oleg Broytman) Date: Fri, 22 May 2015 12:58:47 +0200 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: Message-ID: <20150522105847.GA9624@phdru.name> On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik wrote: > Is the idea to have timer that starts on import is good? No, because: -- it could be imported at the wrong time; -- it couldn't be "reimported"; what is the usage of one-time timer? -- if it could be reset and restarted at need -- why not start it manually in the first place? Oleg. 
-- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From brett at python.org Fri May 22 16:40:04 2015 From: brett at python.org (Brett Cannon) Date: Fri, 22 May 2015 14:40:04 +0000 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: On Thu, May 21, 2015 at 10:10 PM Ben Hoyt wrote: > Huh, interesting idea. I've never used import hooks. Looks like the > relevant macropy source code is here: > > https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py > > So basically you would do the following: > > 1) intercept the import > 2) find the source code file yourself and read it > 3) call ast.parse() on the source string > 4) do anything you want to the AST, for example turn the "select(c for > c in Customer if sum(c.orders.price) > 1000" into whatever SQL or > other function calls > 5) pass the massaged AST to compile(), execute it and return the module > > Hmmm, yeah, I think you're basically suggesting macro-like processing > of the AST. Pretty cool, but not quite what I was thinking of ... I > was thinking select() would get an AST object at runtime and do stuff > with it. > Depending on what version of Python you are targeting, it's actually simpler than that even to get it into the import system: 1. Subclass importlib.machinery.SourceFileLoader and override source_to_code() to do your AST transformation and return your changed code object (basically your steps 3-5 above) 2. Set a path hook that uses an instance of importlib.machinery.FileFinder which utilizes your custom loader 3. There is no step 3 I know this isn't what you're after, but I just wanted to let you know importlib has made this sort of thing fairly trivial to implement. -Brett > > -Ben > > On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert > wrote: > > On May 21, 2015, at 18:18, Ben Hoyt wrote: > >> > >> (I know that there's the "ast" module and ast.parse(), which can give > >> you an AST given a *source string*, but that's not very convenient > >> here.) > > > > Why not? Python modules are distributed as source. You can pretty easily > write an import hook to intercept module loading at the AST level and > transform it however you want. Or just use MacroPy, which wraps up all the > hard stuff (especially 2.x compatibility) and provides a huge framework of > useful tools. What do you want to do that can't be done that way? > > > > For many uses, you don't even have to go that far--code objects remember > their source file and line number, which you can usually use to retrieve > the text and regenerate the AST. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Fri May 22 16:52:45 2015 From: benhoyt at gmail.com (Ben Hoyt) Date: Fri, 22 May 2015 10:52:45 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> Message-ID: Good to know -- thanks! -Ben On Fri, May 22, 2015 at 10:40 AM, Brett Cannon wrote: > > > On Thu, May 21, 2015 at 10:10 PM Ben Hoyt wrote: >> >> Huh, interesting idea. I've never used import hooks. 
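Brett's two-step recipe above takes surprisingly little code. Here is a rough sketch (the class and function names are mine, and the AST rewrite itself is left as a no-op) of what it might look like on Python 3.4 or later:

    import ast
    import sys
    import importlib.machinery

    class ASTRewritingLoader(importlib.machinery.SourceFileLoader):
        """A SourceFileLoader whose source_to_code() goes through the AST."""

        def source_to_code(self, data, path, *, _optimize=-1):
            tree = ast.parse(data, filename=path)
            # ... rewrite `tree` here; this sketch leaves it untouched ...
            return compile(tree, path, "exec", optimize=_optimize)

    def install():
        # A FileFinder path hook that hands ordinary .py files to the loader.
        details = (ASTRewritingLoader, importlib.machinery.SOURCE_SUFFIXES)
        sys.path_hooks.insert(0, importlib.machinery.FileFinder.path_hook(details))
        # Finders are cached per directory, so clear the cache to make the
        # new hook take effect for path entries that were already visited.
        sys.path_importer_cache.clear()

Calling install() early enough (for example from a sitecustomize module) should make later imports of plain source files on sys.path go through the rewriting loader.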
Looks like the >> relevant macropy source code is here: >> >> >> https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py >> >> So basically you would do the following: >> >> 1) intercept the import >> 2) find the source code file yourself and read it >> 3) call ast.parse() on the source string >> 4) do anything you want to the AST, for example turn the "select(c for >> c in Customer if sum(c.orders.price) > 1000" into whatever SQL or >> other function calls >> 5) pass the massaged AST to compile(), execute it and return the module >> >> Hmmm, yeah, I think you're basically suggesting macro-like processing >> of the AST. Pretty cool, but not quite what I was thinking of ... I >> was thinking select() would get an AST object at runtime and do stuff >> with it. > > > Depending on what version of Python you are targeting, it's actually simpler > than that even to get it into the import system: > > Subclass importlib.machinery.SourceFileLoader and override source_to_code() > to do your AST transformation and return your changed code object (basically > your steps 3-5 above) > Set a path hook that uses an instance of importlib.machinery.FileFinder > which utilizes your custom loader > There is no step 3 > > I know this isn't what you're after, but I just wanted to let you know > importlib has made this sort of thing fairly trivial to implement. > > -Brett > >> >> >> -Ben >> >> On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert >> wrote: >> > On May 21, 2015, at 18:18, Ben Hoyt wrote: >> >> >> >> (I know that there's the "ast" module and ast.parse(), which can give >> >> you an AST given a *source string*, but that's not very convenient >> >> here.) >> > >> > Why not? Python modules are distributed as source. You can pretty easily >> > write an import hook to intercept module loading at the AST level and >> > transform it however you want. Or just use MacroPy, which wraps up all the >> > hard stuff (especially 2.x compatibility) and provides a huge framework of >> > useful tools. What do you want to do that can't be done that way? >> > >> > For many uses, you don't even have to go that far--code objects remember >> > their source file and line number, which you can usually use to retrieve the >> > text and regenerate the AST. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ From demianbrecht at gmail.com Fri May 22 18:39:34 2015 From: demianbrecht at gmail.com (Demian Brecht) Date: Fri, 22 May 2015 09:39:34 -0700 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> Message-ID: <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day. > On May 21, 2015, at 2:15 AM, Nick Coghlan wrote: > > The other question to be answered these days is the value bundling > offers over "pip install jsonschema" (or a platform specific > equivalent). While it's still possible to meet that condition, it's > harder now that we offer pip as a standard feature, especially since > getting added to the standard library almost universally makes life > more difficult for module maintainers if they're not already core > developers. This is an interesting problem and a question that I?ve had at the back of my mind as well. 
With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to ?this is the suggested de facto standard of solving problem [X] using Python?. I see two problems with relying on pip and PyPI as an alternative to bundling: 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice. 2. You generally won't know about packages that don?t solve problems you?ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn?t even know were a thing. Likewise with jsonschema, I wouldn?t have known it was a thing had a co-worker not introduced me to it a couple years ago. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From graffatcolmingov at gmail.com Fri May 22 21:08:47 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Fri, 22 May 2015 14:08:47 -0500 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> Message-ID: On Fri, May 22, 2015 at 11:39 AM, Demian Brecht wrote: > First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day. > >> On May 21, 2015, at 2:15 AM, Nick Coghlan wrote: >> >> The other question to be answered these days is the value bundling >> offers over "pip install jsonschema" (or a platform specific >> equivalent). While it's still possible to meet that condition, it's >> harder now that we offer pip as a standard feature, especially since >> getting added to the standard library almost universally makes life >> more difficult for module maintainers if they're not already core >> developers. > > This is an interesting problem and a question that I?ve had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to ?this is the suggested de facto standard of solving problem [X] using Python?. I see two problems with relying on pip and PyPI as an alternative to bundling: Counter-point: What library is the de facto standard of doing HTTP in Python? Requests is, of course. Discussion of its inclusion has happened several times and each time the decision is to not include it. The most recent such discussion was at the Language Summit at PyCon 2015 in Montreal. If you want to go by download count, then Requests should still be in the standard library but it just will not happen. > 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice. That's not exactly true in every case. The only library that parses and emits YAML is PyYAML. It's both unmaintained, incomplete, and full of bugs. That said, it's the de facto standard and it's the only onw of its kind that I know of on PyPI. 
I would vehemently argue against its inclusion were it ever purposed. > 2. You generally won't know about packages that don?t solve problems you?ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn?t even know were a thing. Likewise with jsonschema, I wouldn?t have known it was a thing had a co-worker not introduced me to it a couple years ago. Counter-point, once you know you want to use JSON Schema looking for implementations in python yields Julian's implementation first. You said (paraphrasing) in your first email that jsonschema should only be excluded from the stdlib if people could bring up reasons against it. The standard library has grown in the past few releases but that doesn't mean it needs to grow every time. It also means it doesn't need to grow to include an implementation of every possible /thing/ that exists. Further, leaving it up to others to prove why it shouldn't be included isn't sufficient. You have to prove to the community why it MUST be included. Saying "Ah let's throw this thing in there anyway because why not" isn't valid. By that logic, I could nominate several libraries that I find useful in day-to-day work and the barrier to entry would be exactly as much energy as people who care about the standard library are willing to expend to keep the less than sultry candidates out. In this case, that /thing/ is JSON Schema. Last I checked, JSON Schema was a IETF Draft that was never accepted and a specification which expired. That means in a couple years, ostensibly after this was added to the stdlib, it could be made completely irrelevant and the time to fix it would be incredible. That would be far less of an issue if jsonschema were not included at all. Overall, I'm strongly against its inclusion. Not because the library isn't excellent. It is. I use it. I'm strongly against it for the reasons listed above. From donald at stufft.io Fri May 22 21:23:14 2015 From: donald at stufft.io (Donald Stufft) Date: Fri, 22 May 2015 15:23:14 -0400 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> Message-ID: > On May 22, 2015, at 3:08 PM, Ian Cordasco wrote: > >> >> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice. > > That's not exactly true in every case. The only library that parses > and emits YAML is PyYAML. It's both unmaintained, incomplete, and full > of bugs. That said, it's the de facto standard and it's the only onw > of its kind that I know of on PyPI. I would vehemently argue against > its inclusion were it ever purposed. > >> 2. You generally won't know about packages that don?t solve problems you?ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn?t even know were a thing. Likewise with jsonschema, I wouldn?t have known it was a thing had a co-worker not introduced me to it a couple years ago. > > Counter-point, once you know you want to use JSON Schema looking for > implementations in python yields Julian's implementation first. 
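For anyone following the thread who hasn't used the library being debated, here is a minimal sketch of the kind of validation jsonschema provides (it assumes the package has been installed from PyPI, and the schema is invented for illustration):

    import jsonschema

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number", "minimum": 0},
        },
        "required": ["name", "price"],
    }

    jsonschema.validate({"name": "widget", "price": 9.99}, schema)  # passes quietly

    try:
        jsonschema.validate({"name": "widget", "price": -1}, schema)
    except jsonschema.ValidationError as exc:
        print(exc.message)  # explains that -1 violates the minimum of 0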
I think a future area of work is going to be on improving the ability for people who don't know what they want to find out that they want something and which thing they want on PyPI. I'm not entirely sure what this is going to look like but I think it's an important problem. It's being solved for very specific cases by starting to have the standard documentation explicitly call out these defacto standards of the Python ecosystem where it makes sense. This of course does not scale to every single problem domain or module on PyPI so we still need a more general solution. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From abarnert at yahoo.com Fri May 22 21:24:26 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 22 May 2015 12:24:26 -0700 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> Message-ID: <16BABB0C-6CB3-447D-A6B2-223A8D985674@yahoo.com> On May 22, 2015, at 09:39, Demian Brecht wrote: > In my mind, the value of bundling anything nowadays really boils down to ?this is the suggested de facto standard of solving problem [X] using Python?. The other way of saying that is to say it explicitly in the stdlib docs, usage docs, and/or tutorial and link to the package. While that used to be pretty rare, that's changed recently. Off the top of my head, there are links to setuptools, requests, nose, py.test, Pillow, PyObjC, py2app, PyWin32, WConio, Console, UniCurses, Urwid, the major alternative GUI frameworks, Twisted, and pexpect. So, if you wrote something to put in the json module docs, the input/output section of the tutorial, or a howto explaining that if you want structured and validated JSON the usual standard is JSON Schema and the jsonschema library can do it for you in Python, that would get most of the same benefits as adding jsonschema to the stdlib without most of the costs. > I see two problems with relying on pip and PyPI as an alternative to bundling: In general, there's a potentially much bigger reason: some projects can't use arbitrary third-party projects without a costly vetting process, or need to work on machines that don't have Internet access or don't have a way to install user site-packages or virtualenvs, etc. Fortunately, those kinds of problems aren't likely to come up for the kinds of projects that need JSON Schema (e.g., Internet servers, client frameworks that are themselves installed via pip, client apps that are distributed by bundling with cx_Freeze/py2app/etc.). > 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice. Usually this is a strength, not a weakness. Until one project really is good enough to become the de facto standard, you wouldn't want to limit the competition, right? The problem traditionally has been that once something _does_ reach that point, there's no way to make that clear--but now that the stdlib docs link to outside projects, there's a solution. > 2. You generally won't know about packages that don?t solve problems you?ve solved or are solving. 
Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn?t even know were a thing. Likewise with jsonschema, I wouldn?t have known it was a thing had a co-worker not introduced me to it a couple years ago. From p.andrefreitas at gmail.com Sat May 23 01:08:40 2015 From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=) Date: Fri, 22 May 2015 23:08:40 +0000 Subject: [Python-ideas] Cmake as build system Message-ID: Hi, What you think about using Cmake build system? I see advantages such as: - Cross-plataform; - Supported in Clion IDE (amazing C/C++ IDE, breakpoints, etc); - Simple and easy to use (Zen of Python :) https://www.python.org/dev/peps/pep-0020/ ); I was actually seeing a discussion in python-commiters about Windows 7 buildbots failing. Found that someone already had the same idea but don't know if it was shared here: http://www.vtk.org/Wiki/BuildingPythonWithCMake Please share your thoughts. Regards, Andr? Freitas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sat May 23 01:48:52 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 22 May 2015 18:48:52 -0500 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: HAHAHA!! Good luck! I've raised this issue before. Twice. Autotools sucks. And makes cross-compiling a pain in the neck. Bottom line was: - C++ is a big dependency - The autotools build system has been tested already on lots and lots and lots of platforms - Nobody has even implemented an alternative build system for Python 3 yet (python-cmake is only for Python 2) - No one can agree on a best build system (for instance, I hate CMake!) On Fri, May 22, 2015 at 6:08 PM, Andr? Freitas wrote: > Hi, > What you think about using Cmake build system? > > I see advantages such as: > - Cross-plataform; > - Supported in Clion IDE (amazing C/C++ IDE, breakpoints, etc); > - Simple and easy to use (Zen of Python :) > https://www.python.org/dev/peps/pep-0020/ ); > > I was actually seeing a discussion in python-commiters about Windows 7 > buildbots failing. Found that someone already had the same idea but don't > know if it was shared here: > http://www.vtk.org/Wiki/BuildingPythonWithCMake > > Please share your thoughts. > > Regards, > Andr? Freitas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.andrefreitas at gmail.com Sat May 23 02:08:55 2015 From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=) Date: Sat, 23 May 2015 01:08:55 +0100 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: Hi, Thanks for sharing Ryan Gonzalez :) It just could be another alternative and not a replacement of autotools. Not only about the cross-platform feature of Cmake but the integration with modern IDEs. I really see an improvement in productivity using the IDE debugger (e.g Clion) instead of using prints everywhere ( http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers ). 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez : > HAHAHA!! 
> > Good luck! I've raised this issue before. Twice. Autotools sucks. And > makes cross-compiling a pain in the neck. Bottom line was: > > - C++ is a big dependency > - The autotools build system has been tested already on lots and lots and > lots of platforms > - Nobody has even implemented an alternative build system for Python 3 yet > (python-cmake is only for Python 2) > - No one can agree on a best build system (for instance, I hate CMake!) > > > On Fri, May 22, 2015 at 6:08 PM, Andr? Freitas > wrote: > >> Hi, >> What you think about using Cmake build system? >> >> I see advantages such as: >> - Cross-plataform; >> - Supported in Clion IDE (amazing C/C++ IDE, breakpoints, etc); >> - Simple and easy to use (Zen of Python :) >> https://www.python.org/dev/peps/pep-0020/ ); >> >> I was actually seeing a discussion in python-commiters about Windows 7 >> buildbots failing. Found that someone already had the same idea but don't >> know if it was shared here: >> http://www.vtk.org/Wiki/BuildingPythonWithCMake >> >> Please share your thoughts. >> >> Regards, >> Andr? Freitas >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > Ryan > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > > -- Andr? Freitas p.andrefreitas at gmail.com "Imagination is more important than knowledge" - Albert Einstein *google+* Andr?Freitas92 *linkedin* pandrefreitas *github* andrefreitas *website* www.andrefreitas.pt Esta mensagem pode conter informa??o confidencial ou privilegiada, sendo seu sigilo protegido por lei. Se voc? n?o for o destinat?rio ou a pessoa autorizada a receber esta mensagem, n?o pode usar, copiar ou divulgar as informa??es nela contidas ou tomar qualquer a??o baseada nessas informa??es. Se voc? recebeu esta mensagem por engano, por favor, avise imediatamente ao remetente, respondendo o e-mail e em seguida apague-a. Agradecemos a sua coopera??o. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat May 23 03:45:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 22 May 2015 18:45:22 -0700 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: <9A12460B-8A5E-44CC-BF8B-D1EBF23EF9B4@yahoo.com> On May 22, 2015, at 17:08, Andr? Freitas wrote: > > Hi, > Thanks for sharing Ryan Gonzalez :) > > It just could be another alternative and not a replacement of autotools. Not only about the cross-platform feature of Cmake but the integration with modern IDEs. I really see an improvement in productivity using the IDE debugger (e.g Clion) instead of using prints everywhere (http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers). What's stopping you from using an IDE debugger? I've run CPython itself or other similarly complex projects under Xcode, Eclipse, Visual Studio, WinDebug, ggdb, and other graphical debuggers without them having to understand how the code got built. If Clion can't do the same, that sounds like a problem with Clion. (Although personally, I usually find it easier to debug interpreters or other complex CLI programs just running gdb/lldb/whatever on the terminal.) > 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez : >> HAHAHA!! >> >> Good luck! I've raised this issue before. Twice. Autotools sucks. 
And makes cross-compiling a pain in the neck. Bottom line was: >> >> - C++ is a big dependency >> - The autotools build system has been tested already on lots and lots and lots of platforms >> - Nobody has even implemented an alternative build system for Python 3 yet (python-cmake is only for Python 2) >> - No one can agree on a best build system (for instance, I hate CMake!) >> >> >>> On Fri, May 22, 2015 at 6:08 PM, Andr? Freitas wrote: >>> Hi, >>> What you think about using Cmake build system? >>> >>> I see advantages such as: >>> - Cross-plataform; >>> - Supported in Clion IDE (amazing C/C++ IDE, breakpoints, etc); >>> - Simple and easy to use (Zen of Python :) https://www.python.org/dev/peps/pep-0020/ ); >>> >>> I was actually seeing a discussion in python-commiters about Windows 7 buildbots failing. Found that someone already had the same idea but don't know if it was shared here: http://www.vtk.org/Wiki/BuildingPythonWithCMake >>> >>> Please share your thoughts. >>> >>> Regards, >>> Andr? Freitas >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> Ryan >> [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. >> http://kirbyfan64.github.io/ > > > > -- > Andr? Freitas > p.andrefreitas at gmail.com > "Imagination is more important than knowledge" - Albert Einstein > google+ Andr?Freitas92 > linkedin pandrefreitas > github andrefreitas > website www.andrefreitas.pt > Esta mensagem pode conter informa??o confidencial ou privilegiada, sendo seu sigilo protegido por lei. Se voc? n?o for o destinat?rio ou a pessoa autorizada a receber esta mensagem, n?o pode usar, copiar ou divulgar as informa??es nela contidas ou tomar qualquer a??o baseada nessas informa??es. Se voc? recebeu esta mensagem por engano, por favor, avise imediatamente ao remetente, respondendo o e-mail e em seguida apague-a. Agradecemos a sua coopera??o. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat May 23 04:08:31 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 22 May 2015 19:08:31 -0700 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> Sorry, meant to include this in my previous reply, but I accidentally cut and didn't paste... Sent from my iPhone > On May 22, 2015, at 17:08, Andr? Freitas wrote: > > Hi, > Thanks for sharing Ryan Gonzalez :) > > It just could be another alternative and not a replacement of autotools. If the problem is that the autotools build system is a nightmare to maintain, how is having two completely different complex build systems that have to be kept perfectly in sync not going to be an even bigger nightmare? > Not only about the cross-platform feature of Cmake but the integration with modern IDEs. I really see an improvement in productivity using the IDE debugger (e.g Clion) instead of using prints everywhere (http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers). 
Why did you link to a question that was migrated and then closed as not constructive, and that was written to argue that debuggers are useless, and whose contrary answers only talk about command-line debugging rather than whether a GUI wrapper can help debugging? That seems to argue against your case, not for it... > 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez : >> HAHAHA!! >> >> Good luck! I've raised this issue before. Twice. Autotools sucks. And makes cross-compiling a pain in the neck. Bottom line was: >> >> - C++ is a big dependency >> - The autotools build system has been tested already on lots and lots and lots of platforms >> - Nobody has even implemented an alternative build system for Python 3 yet (python-cmake is only for Python 2) >> - No one can agree on a best build system (for instance, I hate CMake!) >> >> >>> On Fri, May 22, 2015 at 6:08 PM, Andr? Freitas wrote: >>> Hi, >>> What you think about using Cmake build system? >>> >>> I see advantages such as: >>> - Cross-plataform; >>> - Supported in Clion IDE (amazing C/C++ IDE, breakpoints, etc); >>> - Simple and easy to use (Zen of Python :) https://www.python.org/dev/peps/pep-0020/ ); >>> >>> I was actually seeing a discussion in python-commiters about Windows 7 buildbots failing. Found that someone already had the same idea but don't know if it was shared here: http://www.vtk.org/Wiki/BuildingPythonWithCMake >>> >>> Please share your thoughts. >>> >>> Regards, >>> Andr? Freitas >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> Ryan >> [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. >> http://kirbyfan64.github.io/ > > > > -- > Andr? Freitas > p.andrefreitas at gmail.com > "Imagination is more important than knowledge" - Albert Einstein > google+ Andr?Freitas92 > linkedin pandrefreitas > github andrefreitas > website www.andrefreitas.pt > Esta mensagem pode conter informa??o confidencial ou privilegiada, sendo seu sigilo protegido por lei. Se voc? n?o for o destinat?rio ou a pessoa autorizada a receber esta mensagem, n?o pode usar, copiar ou divulgar as informa??es nela contidas ou tomar qualquer a??o baseada nessas informa??es. Se voc? recebeu esta mensagem por engano, por favor, avise imediatamente ao remetente, respondendo o e-mail e em seguida apague-a. Agradecemos a sua coopera??o. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat May 23 04:59:21 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 23 May 2015 11:59:21 +0900 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> Message-ID: <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> Donald Stufft writes: > I think a future area of work is going to be on improving the > ability for people who don't know what they want to find out that > they want something and which thing they want on PyPI. I'm not > entirely sure what this is going to look like +1 > but I think it's an important problem. 
+1 > It's being solved for very specific cases by starting to have the > standard documentation explicitly call out these defacto standards > of the Python ecosystem where it makes sense. Because that's necessarily centralized, it's a solution to a different problem. We need a decentralized approach to deal with the "people who use package X often would benefit from Y too, but don't know where to find Y or which implementation to use." IOW, there needs to be a way for X to recommend implementation Z (or implementations Z1 or Z2) of Y. > This of course does not scale to every single problem domain or > module on PyPI so we still need a more general solution. The only way we know to scale a web is to embed the solution in the nodes. Currently many packages know what they use internally (the install_requires field), but as far as I can see there's no way for a package X to recommend "related" packages Z to implement function Y in applications using X. Eg, the plethora of ORMs available, some of which work better with particular packages than others do. We could also recommend that package maintainers document such recommendations, preferably in a fairly standard place, in their package documentation. Even something like "I've successfully used Z to do Y in combination with this package" would often help a lot. If a maintainer (obvious extension: 3rd party recommendations and voting) wants to recommend other packages that work and play well with her package but aren't essential to its function, how about a dictionary mapping Trove classifiers to lists of recommended packages for that implmenentation? From abarnert at yahoo.com Sat May 23 07:07:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 22 May 2015 22:07:34 -0700 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8DF1D599-2DF9-4F36-8235-BAF23B3E0076@yahoo.com> On May 22, 2015, at 19:59, Stephen J. Turnbull wrote: > > Donald Stufft writes: > >> I think a future area of work is going to be on improving the >> ability for people who don't know what they want to find out that >> they want something and which thing they want on PyPI. I'm not >> entirely sure what this is going to look like > > +1 > >> but I think it's an important problem. > > +1 > >> It's being solved for very specific cases by starting to have the >> standard documentation explicitly call out these defacto standards >> of the Python ecosystem where it makes sense. > > Because that's necessarily centralized, it's a solution to a different > problem. We need a decentralized approach to deal with the "people > who use package X often would benefit from Y too, but don't know where > to find Y or which implementation to use." IOW, there needs to be a > way for X to recommend implementation Z (or implementations Z1 or Z2) > of Y. > >> This of course does not scale to every single problem domain or >> module on PyPI so we still need a more general solution. > > The only way we know to scale a web is to embed the solution in the > nodes. Currently many packages know what they use internally (the > install_requires field), but as far as I can see there's no way for a > package X to recommend "related" packages Z to implement function Y in > applications using X. 
Eg, the plethora of ORMs available, some of > which work better with particular packages than others do. > > We could also recommend that package maintainers document such > recommendations, preferably in a fairly standard place, in their > package documentation. Even something like "I've successfully used Z > to do Y in combination with this package" would often help a lot. > > If a maintainer (obvious extension: 3rd party recommendations and > voting) wants to recommend other packages that work and play well with > her package but aren't essential to its function, how about a > dictionary mapping Trove classifiers to lists of recommended packages > for that implmenentation? This is a really cool idea, but it would help to have some specific examples. For example, BeautifulSoup can only use html5lib or lxml as optional HTML parsers, and lxml as an optional XML parser; nothing else will do any good. But it works well with any HTTP request engine, so any "global" recommendation is a good idea, so it should get the same list (say, requests, urllib3, grequests, pycurl) as any other project that wants to suggest an HTTP request engine. And as for scraper frameworks, that should look at the global recommendations, but restricted to the ones that use, or can use, BeautifulSoup. I'm not sure how to reasonably represent all three of those things in a node. Of course it's quite possible that I jumped right to a particularly hard example with unique problems that don't need to be solved in general, and really only the first one is necessary, in which case this is a much simpler problem... From stephen at xemacs.org Sat May 23 08:55:16 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 23 May 2015 15:55:16 +0900 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <8DF1D599-2DF9-4F36-8235-BAF23B3E0076@yahoo.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <8DF1D599-2DF9-4F36-8235-BAF23B3E0076@yahoo.com> Message-ID: <87fv6nhjfv.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > > If a maintainer (obvious extension: 3rd party recommendations and > > voting) wants to recommend other packages that work and play well with > > her package but aren't essential to its function, how about a > > dictionary mapping Trove classifiers to lists of recommended packages > > for that implmenentation? > > This is a really cool idea, but it would help to have some specific examples. > > For example, BeautifulSoup can only use html5lib or lxml as > optional HTML parsers, and lxml as an optional XML parser; nothing > else will do any good. But it works well with any HTTP request > engine, so any "global" recommendation is a good idea, so it should > get the same list (say, requests, urllib3, grequests, pycurl) as > any other project that wants to suggest an HTTP request engine. And > as for scraper frameworks, that should look at the global > recommendations, but restricted to the ones that use, or can use, > BeautifulSoup. I'm not sure how to reasonably represent all three > of those things in a node. Well, #2 is easy. You just have a special "global" node that has the same kind of classifier->package map, and link to that. I don't think #3 can be handled so easily, and probably it's not really worth it complexifying things that far at first -- I think you probably need most of SQL to express such constraints. 
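To make the shape of that metadata concrete, here is a purely hypothetical sketch (none of these fields exist in any current packaging metadata; only the Trove classifier strings are real) of what a package node and the shared "global" node might carry:

    # Hypothetical recommendation metadata for a package like BeautifulSoup.
    RECOMMENDS = {
        # The only parsers it can actually use, so it lists them itself.
        "Topic :: Text Processing :: Markup :: HTML": ["lxml", "html5lib"],
        # Any HTTP library will do, so defer to the shared node below.
        "Topic :: Internet :: WWW/HTTP": "GLOBAL",
    }

    # The special shared "global" node for case #2: the same kind of
    # classifier -> recommended-packages map, maintained in one place.
    GLOBAL_RECOMMENDS = {
        "Topic :: Internet :: WWW/HTTP": ["requests", "urllib3", "grequests", "pycurl"],
    }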
I suspect that I would handle #3 with a special sort of "group" package, that just requires certain classifiers and then recommends implementations of them that work well together. It would be easy for the database to automatically update a group's recommended implementations to point to the group (which would be yet another new attribute for the package). I'll take a look at the whole shebang and see if I can come up with something a bit more elegant than the crockery of adhoc-ery above, but it will be at least next week before I have anything to say. Steve From p.andrefreitas at gmail.com Sat May 23 13:00:22 2015 From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=) Date: Sat, 23 May 2015 11:00:22 +0000 Subject: [Python-ideas] Cmake as build system In-Reply-To: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> References: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> Message-ID: Andrew, Thanks for sharing your thoughts. I am trying to write a Cmake file for cpython for those that want to contribute using Clion IDE and I will put it on a public repository. If it can be useful for some developers is worth sharing. Best regards, Andr? Freitas Em s?b, 23 de mai de 2015 ?s 03:08, Andrew Barnert escreveu: > Sorry, meant to include this in my previous reply, but I accidentally cut > and didn't paste... > > Sent from my iPhone > > On May 22, 2015, at 17:08, Andr? Freitas wrote: > > Hi, > > Thanks for sharing Ryan Gonzalez :) > > It just could be another alternative and not a replacement of autotools. > > > If the problem is that the autotools build system is a nightmare to > maintain, how is having two completely different complex build systems that > have to be kept perfectly in sync not going to be an even bigger nightmare? > > Not only about the cross-platform feature of Cmake but the integration > with modern IDEs. I really see an improvement in productivity using the IDE > debugger (e.g Clion) instead of using prints everywhere ( > http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers > ). > > > Why did you link to a question that was migrated and then closed as not > constructive, and that was written to argue that debuggers are useless, and > whose contrary answers only talk about command-line debugging rather than > whether a GUI wrapper can help debugging? That seems to argue against your > case, not for it... > > 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez : > >> HAHAHA!! >> >> Good luck! I've raised this issue before. Twice. Autotools sucks. And >> makes cross-compiling a pain in the neck. Bottom line was: >> >> - C++ is a big dependency >> - The autotools build system has been tested already on lots and lots and >> lots of platforms >> - Nobody has even implemented an alternative build system for Python 3 >> yet (python-cmake is only for Python 2) >> - No one can agree on a best build system (for instance, I hate CMake!) >> >> >> On Fri, May 22, 2015 at 6:08 PM, Andr? Freitas >> wrote: >> >>> Hi, >>> What you think about using Cmake build system? >>> >>> I see advantages such as: >>> - Cross-plataform; >>> - Supported in Clion IDE (amazing C/C++ IDE, breakpoints, etc); >>> - Simple and easy to use (Zen of Python :) >>> https://www.python.org/dev/peps/pep-0020/ ); >>> >>> I was actually seeing a discussion in python-commiters about Windows 7 >>> buildbots failing. Found that someone already had the same idea but don't >>> know if it was shared here: >>> http://www.vtk.org/Wiki/BuildingPythonWithCMake >>> >>> Please share your thoughts. 
>>> >>> Regards, >>> Andr? Freitas >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> >> -- >> Ryan >> [ERROR]: Your autotools build scripts are 200 lines longer than your >> program. Something?s wrong. >> http://kirbyfan64.github.io/ >> >> > > > > -- > Andr? Freitas > p.andrefreitas at gmail.com > "Imagination is more important than knowledge" - Albert Einstein > *google+* Andr?Freitas92 > *linkedin* pandrefreitas > *github* andrefreitas > *website* www.andrefreitas.pt > Esta mensagem pode conter informa??o confidencial ou privilegiada, sendo > seu sigilo protegido por lei. Se voc? n?o for o destinat?rio ou a pessoa > autorizada a receber esta mensagem, n?o pode usar, copiar ou divulgar as > informa??es nela contidas ou tomar qualquer a??o baseada nessas > informa??es. Se voc? recebeu esta mensagem por engano, por favor, avise > imediatamente ao remetente, respondendo o e-mail e em seguida apague-a. > Agradecemos a sua coopera??o. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat May 23 16:21:48 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 24 May 2015 00:21:48 +1000 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 23 May 2015 at 12:59, Stephen J. Turnbull wrote: > Donald Stufft writes: > > It's being solved for very specific cases by starting to have the > > standard documentation explicitly call out these defacto standards > > of the Python ecosystem where it makes sense. > > Because that's necessarily centralized, it's a solution to a different > problem. We need a decentralized approach to deal with the "people > who use package X often would benefit from Y too, but don't know where > to find Y or which implementation to use." IOW, there needs to be a > way for X to recommend implementation Z (or implementations Z1 or Z2) > of Y. https://www.djangopackages.com/ covers this well for the Django ecosystem (I actually consider it to be one of Django's killer features, and I'm pretty sure I'm not alone in that - like ReadTheDocs, it was a product of DjangoDash 2010). There was an effort a few years back to set up an instance of that for PyPI in general, as well as similar comparison sites for Pyramid and Plone, but none of them ever hit the same kind of critical mass of useful input as the Django one. The situation has changed substantially since then, though, as we've been more actively promoting pip, PyPI and third party libraries as part of the recommended Python developer experience, and the main standard library documentation now delegates to packaging.python.org for the details after very brief introductions to installing and publishing packages. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat May 23 16:41:08 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 24 May 2015 00:41:08 +1000 Subject: [Python-ideas] Cmake as build system In-Reply-To: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> References: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> Message-ID: On 23 May 2015 at 12:08, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > Sorry, meant to include this in my previous reply, but I accidentally cut > and didn't paste... > > Sent from my iPhone > > On May 22, 2015, at 17:08, Andr? Freitas wrote: > > Hi, > Thanks for sharing Ryan Gonzalez :) > > It just could be another alternative and not a replacement of autotools. > > > If the problem is that the autotools build system is a nightmare to > maintain, how is having two completely different complex build systems that > have to be kept perfectly in sync not going to be an even bigger nightmare? > Three - we already have to keep autotools and the MSVS solution in sync (except where they're deliberately different, such as always bundling OpenSSL on Windows). I don't think there's actually any active *opposition* to replacing autotools, there just aren't currently any sufficiently compelling alternatives out there to motivate someone to do all the work involved in proposing a change, working through all the build requirements across all the different redistributor channels (including the nascent iOS and Android support being pursued on mobile-sig), and figuring out how to get from point A to point B without breaking the world at any point in the process. That said, I'll admit that to someone interested in the alternatives, listing some of the problems that autotools is currently solving for us may *sound* like opposition, rather than accurately scoping out the problem requirements and the transition to be managed :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.andrefreitas at gmail.com Sun May 24 01:28:58 2015 From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=) Date: Sat, 23 May 2015 23:28:58 +0000 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> Message-ID: Hi Nick, I agree with you. You are completely right :) I am new to python mailing list and contributors. Hope to suggest more effective ideas in the future. Best regards, Andr? Freitas A s?b, 23/05/2015, 3:41 da tarde, Nick Coghlan escreveu: > On 23 May 2015 at 12:08, Andrew Barnert via Python-ideas < > python-ideas at python.org> wrote: > >> Sorry, meant to include this in my previous reply, but I accidentally cut >> and didn't paste... >> >> Sent from my iPhone >> >> On May 22, 2015, at 17:08, Andr? Freitas >> wrote: >> >> Hi, >> Thanks for sharing Ryan Gonzalez :) >> >> It just could be another alternative and not a replacement of autotools. >> >> >> If the problem is that the autotools build system is a nightmare to >> maintain, how is having two completely different complex build systems that >> have to be kept perfectly in sync not going to be an even bigger nightmare? >> > > Three - we already have to keep autotools and the MSVS solution in sync > (except where they're deliberately different, such as always bundling > OpenSSL on Windows). 
> > I don't think there's actually any active *opposition* to replacing > autotools, there just aren't currently any sufficiently compelling > alternatives out there to motivate someone to do all the work involved in > proposing a change, working through all the build requirements across all > the different redistributor channels (including the nascent iOS and Android > support being pursued on mobile-sig), and figuring out how to get from > point A to point B without breaking the world at any point in the process. > > That said, I'll admit that to someone interested in the alternatives, > listing some of the problems that autotools is currently solving for us may > *sound* like opposition, rather than accurately scoping out the problem > requirements and the transition to be managed :) > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sun May 24 01:44:31 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 23 May 2015 18:44:31 -0500 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> Message-ID: <56E6005F-B559-4181-8512-42AA303F9C8F@gmail.com> No worries. Most of my ideas still get vetoed in 10 minutes. :) On May 23, 2015 6:28:58 PM CDT, "Andr? Freitas" wrote: >Hi Nick, >I agree with you. You are completely right :) > >I am new to python mailing list and contributors. Hope to suggest more >effective ideas in the future. > >Best regards, >Andr? Freitas > >A s?b, 23/05/2015, 3:41 da tarde, Nick Coghlan >escreveu: > >> On 23 May 2015 at 12:08, Andrew Barnert via Python-ideas < >> python-ideas at python.org> wrote: >> >>> Sorry, meant to include this in my previous reply, but I >accidentally cut >>> and didn't paste... >>> >>> Sent from my iPhone >>> >>> On May 22, 2015, at 17:08, Andr? Freitas >>> wrote: >>> >>> Hi, >>> Thanks for sharing Ryan Gonzalez :) >>> >>> It just could be another alternative and not a replacement of >autotools. >>> >>> >>> If the problem is that the autotools build system is a nightmare to >>> maintain, how is having two completely different complex build >systems that >>> have to be kept perfectly in sync not going to be an even bigger >nightmare? >>> >> >> Three - we already have to keep autotools and the MSVS solution in >sync >> (except where they're deliberately different, such as always bundling >> OpenSSL on Windows). >> >> I don't think there's actually any active *opposition* to replacing >> autotools, there just aren't currently any sufficiently compelling >> alternatives out there to motivate someone to do all the work >involved in >> proposing a change, working through all the build requirements across >all >> the different redistributor channels (including the nascent iOS and >Android >> support being pursued on mobile-sig), and figuring out how to get >from >> point A to point B without breaking the world at any point in the >process. >> >> That said, I'll admit that to someone interested in the alternatives, >> listing some of the problems that autotools is currently solving for >us may >> *sound* like opposition, rather than accurately scoping out the >problem >> requirements and the transition to be managed :) >> >> Cheers, >> Nick. 
>> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun May 24 02:32:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 24 May 2015 10:32:18 +1000 Subject: [Python-ideas] Cmake as build system In-Reply-To: <56E6005F-B559-4181-8512-42AA303F9C8F@gmail.com> References: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com> <56E6005F-B559-4181-8512-42AA303F9C8F@gmail.com> Message-ID: On 24 May 2015 09:44, "Ryan Gonzalez" wrote: > > No worries. Most of my ideas still get vetoed in 10 minutes. :) Having a place to publish those is a big part of the reason this list exists, though. Even when we ultimately decide an idea *isn't* worth pursuing, the pay-off is having both a pool of contributors that appreciate the problems with the idea, as well as a permanent public record of the related discussion. And that's before we even get to the fact that the first step in having good ideas is simply having lots of ideas to consider for refinement. Folks that would prefer to focus their limited time on the at-least-potentially-plausible suggestions have the option of just following python-dev and skipping python-ideas entirely. (Respecting that is why we try to be fairly strict in redirecting more speculative discussions back here rather than letting them continue indefinitely on python-dev) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmludo at gmail.com Sun May 24 13:56:43 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Sun, 24 May 2015 13:56:43 +0200 Subject: [Python-ideas] Adding jsonschema to the standard library In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Hi all, After to read all responses, I've changed my mind: At the first look, the advantage to push jsonschema into Python lib is to standardize and promote an actual good practice. But yes, you're right, it's too early to include that because the standard should be changed and/or abandonned by a new good practice, like SOAP and REST. It's more future proof to promote PyPI and pip to Python developers. Regards. -- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ 2015-05-23 16:21 GMT+02:00 Nick Coghlan : > On 23 May 2015 at 12:59, Stephen J. Turnbull wrote: > > Donald Stufft writes: > > > It's being solved for very specific cases by starting to have the > > > standard documentation explicitly call out these defacto standards > > > of the Python ecosystem where it makes sense. > > > > Because that's necessarily centralized, it's a solution to a different > > problem. We need a decentralized approach to deal with the "people > > who use package X often would benefit from Y too, but don't know where > > to find Y or which implementation to use." IOW, there needs to be a > > way for X to recommend implementation Z (or implementations Z1 or Z2) > > of Y. 
> > https://www.djangopackages.com/ covers this well for the Django > ecosystem (I actually consider it to be one of Django's killer > features, and I'm pretty sure I'm not alone in that - like > ReadTheDocs, it was a product of DjangoDash 2010). > > There was an effort a few years back to set up an instance of that for > PyPI in general, as well as similar comparison sites for Pyramid and > Plone, but none of them ever hit the same kind of critical mass of > useful input as the Django one. > > The situation has changed substantially since then, though, as we've > been more actively promoting pip, PyPI and third party libraries as > part of the recommended Python developer experience, and the main > standard library documentation now delegates to packaging.python.org > for the details after very brief introductions to installing and > publishing packages. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmludo at gmail.com Mon May 25 00:26:33 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Mon, 25 May 2015 00:26:33 +0200 Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: References: Message-ID: Hi Python-Ideas ML, To resume quickly the idea: I wish to add "extra" attribute to LogMessage, to facilitate structured logs generation. For more details with use case and example, you can read message below. Before to push the patch on bugs.python.org, I'm interested in by your opinions: the patch seems to be too simple to be honest. Regards. -- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ ---------- Forwarded message ---------- From: Guido van Rossum Date: 2015-05-24 23:44 GMT+02:00 Subject: Re: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support To: Ludovic Gasc Ehh, python-ideas? On Sun, May 24, 2015 at 10:22 AM, Ludovic Gasc wrote: > Hi, > > 1. The problem > > For now, when you want to write a log message, you concatenate the data > from your context to generate a string: In fact, you convert your > structured data to a string. > When a sysadmin needs to debug your logs when something is wrong, he must > write regular expressions to extract interesting data. > > Often, he must find the beginning of the interesting log and follow the > path. Sometimes, you can have several requests in the same time in the log, > it's harder to find interesting log. > In fact, with regular expressions, the sysadmin tries to convert the log > lines strings to structured data. > > 2. A possible solution > > You should provide a set of regular expressions to your sysadmins to help > them to find the right logs, however, another approach is possible: > structured logs. > Instead of to break your data structure to push in the log message, the > idea is to keep the data structure, to attach that as metadata of the log > message. > For now, I know at least Logstash and Journald that can handle structured > logs and provide a query tool to extract easily logs. > > 3. A concrete example with structured logs > > As most Web developers, we build HTTP daemons used by several different > human clients in the same time. 
> In the Python source code, to support structured logs, you don't have a > big change, you can use "extra" parameter for that, example: > > [handle HTTP request] > LOG.debug('Receive a create_or_update request', extra={'request_id': > request.request_id, > > 'account_id': account_id, > > 'aiohttp_request': request, > > 'payload': str(payload)}) > [create data in database] > LOG.debug('Callflow created', extra={'account_id': account_id, > 'request_id': > request.request_id, > 'aiopg_cursor': cur, > 'results': row}) > > Now, if you want, you can enhance the structured log with a custom logging > Handler, because the standard journald handler doesn't know how to handle > aiohttp_request or aiopg_cursor. > My example is based on journald, but you can write an equivalent version > with python-logstash: > #### > from systemdream.journal.handler import JournalHandler > > class Handler(JournalHandler): > # Tip: on a system without journald, use socat to test: > # socat UNIX-RECV:/run/systemd/journal/socket STDIN > def emit(self, record): > if record.extra: > # import ipdb; ipdb.set_trace() > if 'aiohttp_request' in record.extra: > record.extra['http_method'] = > record.extra['aiohttp_request'].method > record.extra['http_path'] = > record.extra['aiohttp_request'].path > record.extra['http_headers'] = > str(record.extra['aiohttp_request'].headers) > del(record.extra['aiohttp_request']) > if 'aiopg_cursor' in record.extra: > record.extra['pg_query'] = > record.extra['aiopg_cursor'].query.decode('utf-8') > record.extra['pg_status_message'] = > record.extra['aiopg_cursor'].statusmessage > record.extra['pg_rows_count'] = > record.extra['aiopg_cursor'].rowcount > del(record.extra['aiopg_cursor']) > super().emit(record) > #### > > And you can enable this custom handler in your logging config file like > this: > [handler_journald] > class=XXXXXXXXXX.utils.logs.Handler > args=() > formatter=detailed > > And now, with journalctl, you can easily extract logs, some examples: > Logs messages from 'lg' account: > journalctl ACCOUNT_ID=lg > All HTTP requests that modify the 'lg' account (PUT, POST and DELETE): > journalctl ACCOUNT_ID=lg HTTP_METHOD=PUT > HTTP_METHOD=POST HTTP_METHOD=DELETE > Retrieve all logs from one specific HTTP request: > journalctl REQUEST_ID=130b8fa0-6576-43b6-a624-4a4265a2fbdd > All HTTP requests with a specific path: > journalctl HTTP_PATH=/v1/accounts/lg/callflows > All logs of "create" function in the file "example.py" > journalctl CODE_FUNC=create CODE_FILE=/path/example.py > > If you already do a troubleshooting on a production system, you should > understand the interest of this: > In fact, it's like to have SQL queries capabilities, but it's logging > oriented. > We use that since a small time on one of our critical daemon that handles > a lot of requests across several servers, it's already adopted from our > support team. > > 4. The yocto issue with the Python logging module > > I don't explain here a small part of my professional life for my pleasure, > but to help you to understand the context and the usages, because my patch > for logging is very small. > If you're an expert of Python logging, you already know that my Handler > class example I provided above can't run on a classical Python logging, > because LogRecord doesn't have an extra attribute. 
> > extra parameter exists in the Logger, but, in the LogRecord, it's merged > as attributes of LogRecord: > https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L1386 > > It means, that when the LogRecord is sent to the Handler, you can't > retrieve the dict from the extra parameter of logger. > The only way to do that without to patch Python logging, is to rebuild by > yourself the dict with a list of official attributes of LogRecord, as is > done in python-logstash: > > https://github.com/vklochan/python-logstash/blob/master/logstash/formatter.py#L23 > At least to me, it's a little bit dirty. > > My quick'n'dirty patch I use for now on our CPython on production: > > diff --git a/Lib/logging/__init__.py b/Lib/logging/__init__.py > index 104b0be..30fa6ef 100644 > --- a/Lib/logging/__init__.py > +++ b/Lib/logging/__init__.py > @@ -1382,6 +1382,7 @@ class Logger(Filterer): > """ > rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, > func, > sinfo) > + rv.extra = extra > if extra is not None: > for key in extra: > if (key in ["message", "asctime"]) or (key in > rv.__dict__): > > At least to me, it should be cleaner to add "extra" as parameter > of _logRecordFactory, but I've no idea of side effects, I understand that > logging module is critical, because it's used everywhere. > However, except with python-logstash, to my knowledge, extra parameter > isn't massively used. > The only backward incompatibility I see with a new extra attribute of > LogRecord, is that if you have a log like this: > LOG.debug('message', extra={'extra': 'example'}) > It will raise a KeyError("Attempt to overwrite 'extra' in LogRecord") > exception, but, at least to me, the probability of this use case is near to > 0. > > Instead of to "maintain" this yocto patch, even it's very small, I should > prefer to have a clean solution in Python directly. > > Thanks for your remarks. > > Regards. > -- > Ludovic Gasc (GMLudo) > http://www.gmludo.eu/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon May 25 04:19:07 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 25 May 2015 12:19:07 +1000 Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: References: Message-ID: <20150525021907.GD5663@ando.pearwood.info> On Mon, May 25, 2015 at 12:26:33AM +0200, Ludovic Gasc wrote: > Hi Python-Ideas ML, > > To resume quickly the idea: I wish to add "extra" attribute to LogMessage, > to facilitate structured logs generation. The documentation for the logging module already includes a recipe for simple structured logging: https://docs.python.org/2/howto/logging-cookbook.html#implementing-structured-logging At the other extreme, there is the structlog module: https://structlog.readthedocs.org/en/stable/ How does your change compare to those? -- Steve From rustompmody at gmail.com Mon May 25 07:06:00 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Mon, 25 May 2015 10:36:00 +0530 Subject: [Python-ideas] Framework for Python for CS101 Message-ID: Context: A bunch of my students will be working with me (if all goes according to plan!!)to hack on/in CPython sources. 
One of the things we would like to try is a framework for CS101 [Intro to programming] So for example beginners get knocked out by None 'disappearing' from the prompt Correctable by >>> import sys >>> sys.displayhook = print Now of course one can say: "If you want that behavior, set it as you choose" However at the stage that beginners are knocked down by such, setting up a pythonstartup file is a little premature. So the idea (inspired by Scheme's racket) is to have a sequence of 'teachpacks'. They are like concentric rings, the innermost one being the noob ring, the outermost one being standard python. Now note that while the larger changes would in general be restrictions, ie subsetting standard python, they may not be easily settable in PYTHONSTARTUP. eg sorted function and sort method confusion extend/append/etc mutable methods vs immutable '+' Now different teachers may like to navigate the world of python differently. So for example I prefer to start with the immutable (functional) subset and go on to the stateful/imperative. The point (here) is not so much which is preferable so much as this that a given teacher should have the freedom to chart out a course through python in which (s)he can cross out certain features at certain points for students. So a teacher preferring to emphasise OO/imperative over functional may prefer the opposite choice. [Aside: ACM curriculum 2013 juxtaposes OO and FP as absolute basic in core CS https://www.acm.org/education/CS2013-final-report.pdf pgs 157,158 ] So the idea is to make a framework for teachers to easily configure and select teachpacks to their taste. How does that sound? Rusi -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon May 25 10:01:10 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 25 May 2015 01:01:10 -0700 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On May 24, 2015, at 22:06, Rustom Mody wrote: > > Context: A bunch of my students will be working with me (if all goes according to plan!!)to hack on/in CPython sources. > > One of the things we would like to try is a framework for CS101 [Intro to programming] > > So for example beginners get knocked out by None 'disappearing' from the prompt > Correctable by > > >>> import sys > >>> sys.displayhook = print > > Now of course one can say: "If you want that behavior, set it as you choose" > However at the stage that beginners are knocked down by such, setting up a pythonstartup file is a little premature. > > So the idea (inspired by Scheme's racket) is to have a sequence of 'teachpacks'. > They are like concentric rings, the innermost one being the noob ring, the outermost one being standard python. How exactly does this work? Is it basically just a custom pythonstartup file that teachers can give to their students? Maybe with some menu- or wizard-based configuration to help create the file? Or is this some different mechanism? If so, what does setting it up, and distributing it to students, look like? I realize that below you talk about doing things that are currently not easy to do in a pythonstartup, like hiding all mutating sequence methods, but presumably the patches to the interpreter core would be something like adding hide_mutating_sequence_methods() and similar functions that teachers could then choose to include in the pythonstartup file or whatever they give out. 
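For concreteness, the whole teacher-facing artifact could then be as small as a startup file along these lines -- sys.displayhook is real today, while the teachpack module and its helpers are hypothetical names used only to show the shape of the idea:

    # teachpack_level1.py -- students point PYTHONSTARTUP at this file
    import sys

    # Make the REPL echo None instead of silently swallowing it.
    sys.displayhook = print

    # Hypothetical helpers such a module might grow; commented out
    # because nothing like this exists yet:
    # import teachpack
    # teachpack.hide_mutating_sequence_methods()

Handing that file out and telling students to set PYTHONSTARTUP to it is about as lightweight as distribution gets.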
> Now note that while the larger changes would in general be restrictions, ie subsetting standard python, they may not be easily settable in PYTHONSTARTUP. > eg sorted function and sort method confusion > extend/append/etc mutable methods vs immutable '+' > > Now different teachers may like to navigate the world of python differently. > So for example I prefer to start with the immutable (functional) subset and go on to the stateful/imperative. The point (here) is not so much which is preferable so much as this that a given teacher should have the freedom to chart out a course through python in which (s)he can cross out certain features at certain points for students. So a teacher preferring to emphasise OO/imperative over functional may prefer the opposite choice. > > [Aside: ACM curriculum 2013 juxtaposes OO and FP as absolute basic in core CS > https://www.acm.org/education/CS2013-final-report.pdf > pgs 157,158 > ] > > So the idea is to make a framework for teachers to easily configure and select teachpacks to their taste. > > How does that sound? > > Rusi > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmludo at gmail.com Mon May 25 15:56:40 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Mon, 25 May 2015 15:56:40 +0200 Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: <20150525021907.GD5663@ando.pearwood.info> References: <20150525021907.GD5663@ando.pearwood.info> Message-ID: 2015-05-25 4:19 GMT+02:00 Steven D'Aprano : > On Mon, May 25, 2015 at 12:26:33AM +0200, Ludovic Gasc wrote: > > Hi Python-Ideas ML, > > > > To resume quickly the idea: I wish to add "extra" attribute to > LogMessage, > > to facilitate structured logs generation. > > The documentation for the logging module already includes a recipe for > simple structured logging: > > > https://docs.python.org/2/howto/logging-cookbook.html#implementing-structured-logging If I understand correctly this recipe, it's "only" to standardize log message content => not really sysadmin friendly to be read, but the most important, you must continue to parse and construct a database of structured logs to query inside. When you have more than 400+ log messages each second on only one server, rebuild the data structure isn't a negligible cost, contrary to push directly a structured data directly on the wire, directly understandable by your structured log daemon. > > > At the other extreme, there is the structlog module: > > https://structlog.readthedocs.org/en/stable/ Thank you for the link, it's an interesting project, it's like "logging" module but on steroids, some good logging ideas inside. However, in fact, if I understand correctly, it's the same approach that the previous recipe: Generate a log file with JSON content, use logstash-forwarder to reparse the JSON content, to finally send the structure to logstash, for the query part: https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-configuration > How does your change compare to those? > In the use case of structlog, drop the logstash-forwarder step to interconnect directly Python daemon with structured log daemon. 
Even if logstash-forwarder should be efficient, why to have an additional step to rebuild a structure you have at the beginning ? It's certainly possible to monkey patch or override the logging module to have this behaviour, nevertheless, it should be cleaner to be directly integrated in Python. Moreover, in fact, with the "extra" parameter addition, 99% of the work is already done in Python, my addition is only to keep explicit the list of metadata in the LogMessage. The nice to have, at least to me, is that extra dict should be also usable to format string message, to avoid to pass two times the same information. If I don't raise blocking remarks in this discussion, I'll send a patch on bugs.python.org. Regards. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon May 25 18:43:36 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 25 May 2015 09:43:36 -0700 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: Just a note here, that (as an intro to python teacher), I think this is a pedagogically bad idea. At least if the goal is to teach Python -- while you don't need to introduce all the complexity up front, hiding it just sends students down the wrong track. On the other hand, if you want a kind-of-like-python-but-simpler language to teach particular computer science concepts, this kind of hacking may be of value. But I don't think it would be a good idea to build that capability inot Python itself. And I think you can hack in in with monkey patching anyway -- so that's probably the way to go. for example: """So for example I prefer to start with the immutable (functional) subset""" you can certainly do that by simply using tuples and the functional tools. (OK, maybe not -- after all most (all?) of the functional stuff returns lists, not tuples, and that may be beyond monkey-patchable) But that's going to be a lot of hacking to change. Is it so bad to have them work with lists in a purely functional way? -Chris On Mon, May 25, 2015 at 1:01 AM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > On May 24, 2015, at 22:06, Rustom Mody wrote: > > Context: A bunch of my students will be working with me (if all goes > according to plan!!)to hack on/in CPython sources. > > One of the things we would like to try is a framework for CS101 [Intro to > programming] > > So for example beginners get knocked out by None 'disappearing' from the > prompt > Correctable by > > >>> import sys > >>> sys.displayhook = print > > Now of course one can say: "If you want that behavior, set it as you > choose" > However at the stage that beginners are knocked down by such, setting up a > pythonstartup file is a little premature. > > So the idea (inspired by Scheme's racket) is to have a sequence of > 'teachpacks'. > They are like concentric rings, the innermost one being the noob ring, the > outermost one being standard python. > > > How exactly does this work? Is it basically just a custom pythonstartup > file that teachers can give to their students? Maybe with some menu- or > wizard-based configuration to help create the file? Or is this some > different mechanism? If so, what does setting it up, and distributing it to > students, look like? 
> > I realize that below you talk about doing things that are currently not > easy to do in a pythonstartup, like hiding all mutating sequence methods, > but presumably the patches to the interpreter core would be something like > adding hide_mutating_sequence_methods() and similar functions that teachers > could then choose to include in the pythonstartup file or whatever they > give out. > > Now note that while the larger changes would in general be restrictions, > ie subsetting standard python, they may not be easily settable in > PYTHONSTARTUP. > eg sorted function and sort method confusion > extend/append/etc mutable methods vs immutable '+' > > Now different teachers may like to navigate the world of python > differently. > So for example I prefer to start with the immutable (functional) subset > and go on to the stateful/imperative. The point (here) is not so much > which is preferable so much as this that a given teacher should have the > freedom to chart out a course through python in which (s)he can cross out > certain features at certain points for students. So a teacher preferring > to emphasise OO/imperative over functional may prefer the opposite choice. > > [Aside: ACM curriculum 2013 juxtaposes OO and FP as absolute basic in > core CS > https://www.acm.org/education/CS2013-final-report.pdf > pgs 157,158 > ] > > So the idea is to make a framework for teachers to easily configure and > select teachpacks to their taste. > > How does that sound? > > Rusi > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rustompmody at gmail.com Mon May 25 14:11:19 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Mon, 25 May 2015 05:11:19 -0700 (PDT) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via Python-ideas wrote: > > On May 24, 2015, at 22:06, Rustom Mody > > wrote: > > Context: A bunch of my students will be working with me (if all goes > according to plan!!)to hack on/in CPython sources. > > One of the things we would like to try is a framework for CS101 [Intro to > programming] > > So for example beginners get knocked out by None 'disappearing' from the > prompt > Correctable by > > >>> import sys > >>> sys.displayhook = print > > Now of course one can say: "If you want that behavior, set it as you > choose" > However at the stage that beginners are knocked down by such, setting up a > pythonstartup file is a little premature. > > So the idea (inspired by Scheme's racket) is to have a sequence of > 'teachpacks'. > They are like concentric rings, the innermost one being the noob ring, the > outermost one being standard python. > > > How exactly does this work? 
Is it basically just a custom pythonstartup > file that teachers can give to their students? Maybe with some menu- or > wizard-based configuration to help create the file? Or is this some > different mechanism? If so, what does setting it up, and distributing it to > students, look like? > Frankly Ive not thought through these details in detail(!) > I realize that below you talk about doing things that are currently not > easy to do in a pythonstartup, like hiding all mutating sequence methods, > but presumably the patches to the interpreter core would be something like > adding hide_mutating_sequence_methods() and similar functions that teachers > could then choose to include in the pythonstartup file or whatever they > give out. > > I personally would wish for other minor surgeries eg a different keyword from 'def' for generators. >From the pov of an experienced programmer the mental load of one keyword for two disparate purposes is easy enough to handle and the language clutter from an extra keyword is probably just not worth it. However from having taught python for 10+ years I can say this 'overloading' causes endless grief and slowdown of beginners. Then there is even more wishful thinking changes -- distinguishing procedure from function. After 30 years of Lisp and ML and ... and Haskell and square-peg-into-round-holing these into python, Ive come to the conclusion that Pascal got this distinction more right than all these. However I expect this surgery to be more invasive and pervasive than I can handle with my (current) resources. etc etc In short I am talking of a language that is morally equivalent to python but cosmetically different and is designed to be conducive to learning programming -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon May 25 20:17:32 2015 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 25 May 2015 13:17:32 -0500 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On Mon, May 25, 2015 at 12:06 AM, Rustom Mody wrote: > Context: A bunch of my students will be working with me (if all goes > according to plan!!)to hack on/in CPython sources. > > One of the things we would like to try is a framework for CS101 [Intro to > programming] > You said framework, and I thought 'web framework' and 'testing': * Bottle is great; simple; single file; and WSGI-compatible * TDD! from the start! https://westurner.org/wiki/awesome-python-testing#web-frameworks > > So for example beginners get knocked out by None 'disappearing' from the > prompt > Correctable by > > >>> import sys > >>> sys.displayhook = print > > Now of course one can say: "If you want that behavior, set it as you > choose" > However at the stage that beginners are knocked down by such, setting up a > pythonstartup file is a little premature. > > So the idea (inspired by Scheme's racket) is to have a sequence of > 'teachpacks'. > They are like concentric rings, the innermost one being the noob ring, the > outermost one being standard python. > In terms of a curricula graph, are they flat or nested dependencies? > > Now note that while the larger changes would in general be restrictions, > ie subsetting standard python, they may not be easily settable in > PYTHONSTARTUP. 
> eg sorted function and sort method confusion > extend/append/etc mutable methods vs immutable '+' > I add tab-completion in ~/.pythonrc (and little more; so that my scripts work without trying to remember imports etc) (see: gh:westurner/dotfiles/etc/.pythonrc; symlinked in by gh:westurner/dotfiles/scripts/bootstrap_dotfiles.sh). dotfiles.venv.ipython_config.py (and conda) make navigating VIRTUAL_ENVs (that are/can be isolated from concurrent changes in system packages) easier for me. Depending on your setup, managing that many envs is probably easier with conda and/or Docker. * https://github.com/ipython/ipython/wiki/Install:-Docker > Now different teachers may like to navigate the world of python > differently. > So for example I prefer to start with the immutable (functional) subset > and go on to the stateful/imperative. The point (here) is not so much > which is preferable so much as this that a given teacher should have the > freedom to chart out a course through python in which (s)he can cross out > certain features at certain points for students. So a teacher preferring > to emphasise OO/imperative over functional may prefer the opposite choice. > * https://www.reddit.com/r/learnpython/wiki * https://www.reddit.com/r/learnpython/wiki/books * https://github.com/scipy-lectures/scipy-lecture-notes (Sphinx) * https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks * https://github.com/jrjohansson/scientific-python-lectures/ (IPython) There was also talk of generating EdX courses from IPython notebooks over on ipython-dev: http://mail.scipy.org/pipermail/ipython-dev/2015-February/015911.html > [Aside: ACM curriculum 2013 juxtaposes OO and FP as absolute basic in > core CS > https://www.acm.org/education/CS2013-final-report.pdf > pgs 157,158 > ] > > So the idea is to make a framework for teachers to easily configure and > select teachpacks to their taste. > > How does that sound? > > Rusi > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon May 25 20:33:41 2015 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 25 May 2015 13:33:41 -0500 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> References: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> Message-ID: On Mon, May 25, 2015 at 7:11 AM, Rustom Mody wrote: > > On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via > Python-ideas wrote: >> >> On May 24, 2015, at 22:06, Rustom Mody wrote: >> >> Context: A bunch of my students will be working with me (if all goes >> according to plan!!)to hack on/in CPython sources. >> >> One of the things we would like to try is a framework for CS101 [Intro to >> programming] >> >> So for example beginners get knocked out by None 'disappearing' from the >> prompt >> Correctable by >> >> >>> import sys >> >>> sys.displayhook = print >> >> Now of course one can say: "If you want that behavior, set it as you >> choose" >> However at the stage that beginners are knocked down by such, setting up >> a pythonstartup file is a little premature. >> >> So the idea (inspired by Scheme's racket) is to have a sequence of >> 'teachpacks'. 
>> They are like concentric rings, the innermost one being the noob ring, >> the outermost one being standard python. >> >> >> How exactly does this work? Is it basically just a custom pythonstartup >> file that teachers can give to their students? Maybe with some menu- or >> wizard-based configuration to help create the file? Or is this some >> different mechanism? If so, what does setting it up, and distributing it to >> students, look like? >> > > Frankly Ive not thought through these details in detail(!) > > >> I realize that below you talk about doing things that are currently not >> easy to do in a pythonstartup, like hiding all mutating sequence methods, >> but presumably the patches to the interpreter core would be something like >> adding hide_mutating_sequence_methods() and similar functions that teachers >> could then choose to include in the pythonstartup file or whatever they >> give out. >> >> > I personally would wish for other minor surgeries eg a different keyword > from 'def' for generators. > From the pov of an experienced programmer the mental load of one keyword > for two disparate purposes is easy enough to handle and the language > clutter from an extra keyword is probably just not worth it. > However from having taught python for 10+ years I can say this > 'overloading' causes endless grief and slowdown of beginners. > * https://docs.python.org/2/library/tokenize.html * https://hg.python.org/cpython/file/2.7/Lib/tokenize.py * https://hg.python.org/cpython/file/tip/Grammar/Grammar * https://www.youtube.com/watch?v=R31NRWgoIWM&index=9&list=PLt_DvKGJ_QLZd6Gpug-6x4eYoHPy4q_kb * https://docs.python.org/devguide/compiler.html * https://docs.python.org/2/library/compiler.html I identify functions that are generators by the 'yield' (and 'yield from') tokens. I document functions that yield: def generating_function(n): """Generate a sequence # numpy style :returns: (1,2,..,n) :rtype: generator (int) # google-style Yields: int: (1,2,n) """ returns (x for x in range(n)) > Then there is even more wishful thinking changes -- distinguishing > procedure from function. > After 30 years of Lisp and ML and ... and Haskell and > square-peg-into-round-holing these into python, Ive come to the conclusion > that Pascal got this distinction more right than all these. However I > expect this surgery to be more invasive and pervasive than I can handle > with my (current) resources. > @staticmethod, @classmethod, @property decorators > etc > etc > In short I am talking of a language that is morally equivalent to python > but cosmetically different and is designed to be conducive to learning > programming > I suppose you could fork to teach; but [...]. You might check out http://pythontutor.com/ and/or http://www.brython.info/ (Python, JS, and compilation). > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gokoproject at gmail.com Mon May 25 21:48:18 2015 From: gokoproject at gmail.com (John Wong) Date: Mon, 25 May 2015 15:48:18 -0400 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> Message-ID: The title is very catchy :-) I am genuinely interested in improving CS education, both from the perspective of a recent grad and a learner in general (who isn't?). But I think we all do appreciate every effort this community put together as a whole. > One of the things we would like to try is a framework for CS101 [Intro to programming] I am still unsure what kind of framework we are proposing. What is the exact goal? Is it just teaching more FP? Benefit? What are the user stories? Maybe this is not something people would integrate into CPython, and probably something you'd fork from CPython. >From my experience, the hard thing about learning a new programming language is not always about undefined behaviors, under-documented APIs or confusing syntax, but sometimes the lack of documentation demonstrating how to effectively use the language and leverage community tools frustrate people. Python community is extremely supportive and there is so many info out there that a search would yield some useful answer right away. For some other languages that is not that case. But Python can do better, and I feel this is the issue with learning a language more effective. Teaching syntax, or understand how to model your problem before you crack a solution are secondary IMO. I as a learner is far more interested in "how do I accomplish XYZ" and then I will get curious about how one come to a particular solution. Similarly, when I see beautiful Vim setup I would go look for "how to setup my Vim like this." I say all the above because, IMO, OO vs FP, Scheme vs Python vs Java vs C is really not an interesting debate. I am biased because the first language I was taught officially in my undergraduate career was Python (but I knew basic programming well before that). I appreciate the expressive of FP, and how FP "supposedly" help reason your solution, much closer how you would write a proof. But with all due respect, I don't think FP vs OO is really the problem in CS 101, or just about any one learning a new language. Teachers (including TAs) would have to spend time to get a Python setup working on each student's workstation, or troubleshoot environmental issue (to be fair, this is part of learning dealing with different platform, different toolsets). Hence I love projects which aim to reduce administrative tasks in classroom (ipython notebook, online compiler/interpreter etc). Or maybe more verbose warning for beginners? For example, there was a PEP (???) to add a warning when someone type (print "hello world") in Python 3 instead of showing invalid syntax. That can be extremely helpful for beginners and this is something worth thinking. For example, certain errors / deprecations could show "refer to this awesome PEP, or refer to this interesting discussion, or refer to this really well-written blogpost - something python core-dev agrees with." Thanks. 
John On Mon, May 25, 2015 at 2:33 PM, Wes Turner wrote: > > > On Mon, May 25, 2015 at 7:11 AM, Rustom Mody > wrote: > >> >> On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via >> Python-ideas wrote: >>> >>> On May 24, 2015, at 22:06, Rustom Mody wrote: >>> >>> Context: A bunch of my students will be working with me (if all goes >>> according to plan!!)to hack on/in CPython sources. >>> >>> One of the things we would like to try is a framework for CS101 [Intro >>> to programming] >>> >>> So for example beginners get knocked out by None 'disappearing' from the >>> prompt >>> Correctable by >>> >>> >>> import sys >>> >>> sys.displayhook = print >>> >>> Now of course one can say: "If you want that behavior, set it as you >>> choose" >>> However at the stage that beginners are knocked down by such, setting up >>> a pythonstartup file is a little premature. >>> >>> So the idea (inspired by Scheme's racket) is to have a sequence of >>> 'teachpacks'. >>> They are like concentric rings, the innermost one being the noob ring, >>> the outermost one being standard python. >>> >>> >>> How exactly does this work? Is it basically just a custom pythonstartup >>> file that teachers can give to their students? Maybe with some menu- or >>> wizard-based configuration to help create the file? Or is this some >>> different mechanism? If so, what does setting it up, and distributing it to >>> students, look like? >>> >> >> Frankly Ive not thought through these details in detail(!) >> >> >>> I realize that below you talk about doing things that are currently not >>> easy to do in a pythonstartup, like hiding all mutating sequence methods, >>> but presumably the patches to the interpreter core would be something like >>> adding hide_mutating_sequence_methods() and similar functions that teachers >>> could then choose to include in the pythonstartup file or whatever they >>> give out. >>> >>> >> I personally would wish for other minor surgeries eg a different keyword >> from 'def' for generators. >> From the pov of an experienced programmer the mental load of one keyword >> for two disparate purposes is easy enough to handle and the language >> clutter from an extra keyword is probably just not worth it. >> However from having taught python for 10+ years I can say this >> 'overloading' causes endless grief and slowdown of beginners. >> > > * https://docs.python.org/2/library/tokenize.html > * https://hg.python.org/cpython/file/2.7/Lib/tokenize.py > * https://hg.python.org/cpython/file/tip/Grammar/Grammar > * > https://www.youtube.com/watch?v=R31NRWgoIWM&index=9&list=PLt_DvKGJ_QLZd6Gpug-6x4eYoHPy4q_kb > > * https://docs.python.org/devguide/compiler.html > * https://docs.python.org/2/library/compiler.html > > I identify functions that are generators by the 'yield' (and 'yield from') > tokens. > > I document functions that yield: > > def generating_function(n): > """Generate a sequence > # numpy style > :returns: (1,2,..,n) > :rtype: generator (int) > > # google-style > Yields: > int: (1,2,n) > > """ > returns (x for x in range(n)) > > > > >> Then there is even more wishful thinking changes -- distinguishing >> procedure from function. >> After 30 years of Lisp and ML and ... and Haskell and >> square-peg-into-round-holing these into python, Ive come to the conclusion >> that Pascal got this distinction more right than all these. However I >> expect this surgery to be more invasive and pervasive than I can handle >> with my (current) resources. 
>> > > @staticmethod, @classmethod, @property decorators > > >> etc >> etc >> In short I am talking of a language that is morally equivalent to python >> but cosmetically different and is designed to be conducive to learning >> programming >> > > I suppose you could fork to teach; but [...]. > > You might check out http://pythontutor.com/ and/or > http://www.brython.info/ (Python, JS, and compilation). > > >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon May 25 22:08:46 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 25 May 2015 20:08:46 +0000 (UTC) Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: References: Message-ID: <1110534133.1837705.1432584526401.JavaMail.yahoo@mail.yahoo.com> On Monday, May 25, 2015 6:57 AM, Ludovic Gasc wrote: >2015-05-25 4:19 GMT+02:00 Steven D'Aprano : >>At the other extreme, there is the structlog module: >> >>https://structlog.readthedocs.org/en/stable/ > >Thank you for the link, it's an interesting project, it's like "logging" module but on steroids, some good logging ideas inside. >However, in fact, if I understand correctly, it's the same approach that the previous recipe: Generate a log file with JSON content, use logstash-forwarder to reparse the JSON content, to finally send the structure to logstash, for the query part: https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-configuration >>How does your change compare to those? >> > > >In the use case of structlog, drop the logstash-forwarder step to interconnect directly Python daemon with structured log daemon. >Even if logstash-forwarder should be efficient, why to have an additional step to rebuild a structure you have at the beginning ? You can't send a Python dictionary over the wire, or store a Python dictionary in a database. You need to encode it to some transmission and/or storage format; there's no way around that. And what's wrong with using JSON as that format? More importantly, when you drop logstash-forwarder, how are you intending to get the messages to the upstream server? You don't want to make your log calls synchronously wait for acknowledgement before returning. So you need some kind of buffering. And just buffering in memory doesn't work: if your service shuts down unexpectedly, you've lost the last batch of log messages which would tell you why it went down (plus, if the network goes down temporarily, your memory use becomes unbounded). You can of course buffer to disk, but then you've just reintroduced the same need for some kind of intermediate storage format you were trying to eliminate?and it doesn't really solve the problem, because if your service shuts down, the last messages won't get sent until it starts up again. So you could write a separate simple store-and-forward daemon that either reads those file buffers or listens on localhost UDP? but then you've just recreated logstash-forwarder. 
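(For what it's worth, the sending half of that pattern is only a handful of stdlib lines -- roughly the sketch below, where the port number and the field names are arbitrary choices for illustration, not anything journald or logstash prescribes:

    import json
    import logging
    import socket

    class JsonDatagramHandler(logging.Handler):
        # Ship each record as one JSON datagram to whatever local
        # store-and-forward process is listening on localhost.
        def __init__(self, host='127.0.0.1', port=5170):
            super().__init__()
            self.address = (host, port)
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        def emit(self, record):
            try:
                payload = {
                    'message': record.getMessage(),
                    'level': record.levelname,
                    'logger': record.name,
                    # getattr keeps this working on an unpatched LogRecord
                    'extra': getattr(record, 'extra', None),
                }
                data = json.dumps(payload, default=str).encode('utf-8')
                self.sock.sendto(data, self.address)
            except Exception:
                self.handleError(record)

The hard part is everything the local daemon has to do after that, which is exactly what logstash-forwarder already does.)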
And even if you wanted to do all that, I don't see why you couldn't do it all with structlog. They recommend using an already-working workflow instead of designing a different one from scratch, but it's just a recommendation. From abarnert at yahoo.com Mon May 25 23:01:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 25 May 2015 21:01:23 +0000 (UTC) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> References: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> Message-ID: <556623157.1837196.1432587683128.JavaMail.yahoo@mail.yahoo.com> On Monday, May 25, 2015 5:11 AM, Rustom Mody wrote: >On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via Python-ideas wrote: >On May 24, 2015, at 22:06, Rustom Mody wrote: >> >> >>Context: A bunch of my students will be working with me (if all goes according to plan!!)to hack on/in CPython sources. >>> >>>One of the things we would like to try is a framework for CS101 [Intro to programming] >>> >>>So for example beginners get knocked out by None 'disappearing' from the prompt >>>Correctable by >>> >>>>>> import sys >>>>>> sys.displayhook = print >>> >>>Now of course one can say: "If you want that behavior, set it as you choose" >>>However at the stage that beginners are knocked down by such, setting up a pythonstartup file is a little premature. >>> >>>So the idea (inspired by Scheme's racket) is to have a sequence of 'teachpacks'. >>>They are like concentric rings, the innermost one being the noob ring, the outermost one being standard python. >>> >> >>How exactly does this work? Is it basically just a custom pythonstartup file that teachers can give to their students? Maybe with some menu- or wizard-based configuration to help create the file? Or is this some different mechanism? If so, what does setting it up, and distributing it to students, look like? > >Frankly Ive not thought through these details in detail(!) OK, but have you thought through them at all? Or, if not, are you willing to? Without some idea of what the intended interface for teachers and students is, it's going to be very hard to think through how anything else works. For example, if you have a set of special functions (maybe in a "teachpack" stdlib module) that disable and enable different things in the current Python session, then building a teachpack is just a matter of writing (or GUI-generating) a pythonstartup file with a few function calls, and distributing it to students is just a matter of telling them how to download it and set up the PYTHONSTARTUP environment variable, which seems reasonable. But of course that limits what you can do in these teachpacks to the kinds of things you could change dynamically at runtime. If, on the other hand, each "surgery" is a patch to CPython, there's no limit to what you can change, but assembling a teachpack is a matter of assembling and applying patches (and hoping they don't conflict), and building CPython for every platform any of the students might use, and then distributing it requires telling them to download an installer and explaining how to make sure they never accidentally run the system Python instead of your build, which doesn't seem reasonable. 
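To be fair, a surprising number of the "turn this feature off" changes really are reachable at runtime. For instance, a session-level "no print" rule needs nothing beyond the stdlib -- a throwaway demo, not a recommendation:

    import builtins
    del builtins.print    # from here on, print('hi') raises NameError in this session

So the interesting question is which of the surgeries you have in mind genuinely require the second, patch-the-interpreter route.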
>>I realize that below you talk about doing things that are currently not easy to do in a pythonstartup, like hiding all mutating sequence methods, but presumably the patches to the interpreter core would be something like adding hide_mutating_sequence_ methods() and similar functions that teachers could then choose to include in the pythonstartup file or whatever they give out. > >I personally would wish for other minor surgeries eg a different keyword from 'def' for generators. >From the pov of an experienced programmer the mental load of one keyword for two disparate purposes is easy enough to handle and the language clutter from an extra keyword is probably just not worth it. >However from having taught python for 10+ years I can say this 'overloading' causes endless grief and slowdown of beginners. >Then there is even more wishful thinking changes -- distinguishing procedure from function. >After 30 years of Lisp and ML and ... and Haskell and square-peg-into-round-holing these into python, Ive come to the conclusion that Pascal got this distinction more right than all these. However I expect this surgery to be more invasive and pervasive than I can handle with my (current) resources. That depends. If you want them to actually be different under the covers, maybe. But if you only care what they look like at the language level, that should be almost as easy as the previous idea. A "defproc" introduces a function definition with an extra flag set; it's an error to use a return statement with a value in any definition being compiled with that flag; there's your definition side. If you want a separate procedure call statement, it's compiled to, in essence, a check that the procedure flag is set on the function's code object, a normal function call, and popping the useless None off the stack, while the function call expression just needs to add a check that the procedure flag is clear. >etc >etc >In short I am talking of a language that is morally equivalent to python but cosmetically different and is designed to be conducive to learning programming I'm not sure a language that doesn't have any mutating methods, distinguishes procedures from functions, etc. is actually morally equivalent to Python. And this is also straying very far from the original idea of a restricted subset of Python. Adding new syntax and semantics to define explicit generators, or to define and call procedures, is not a subset. And it's also very different from your original analogy to Racket teachpacks. The idea behind Racket is that Scheme is a language whose implementation is dead-simple, and designed from the start to be extensible at every level, from the syntax up, so almost anything can be done in a library. So, you can start with almost no core stdlib, and then a teachpack is just a handful of stdlib-style functions to do the things the students haven't yet learned how to do (or, sometimes, to standardize what the students did in a previous exercise). If you really wanted to do this, I think the first step would have to be transforming Python into an extensible language (or transforming CPython or another implementation into a more general implementation) in the same sense as Scheme. Maybe Python plus macros and read-macros would be sufficient for that, but I'm not sure it would be, and, even if it were, it sounds like a much bigger project than you're envisioning. And honestly, I think it would be less work to design a new language that's effectively Python-esque m-expressions on top of a Scheme core. 
Since it's "only a toy" language for teaching", you don't need to worry about all kinds of issues that a real language like Python needs to deal with, like making iteration over a zillion items efficient, or having a nice C API, or exposing even the most advanced functionality in an easy-to-use (and easy-to-hook) way. From abarnert at yahoo.com Mon May 25 23:13:53 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 25 May 2015 21:13:53 +0000 (UTC) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: <974479162.1867210.1432588433071.JavaMail.yahoo@mail.yahoo.com> On Monday, May 25, 2015 10:50 AM, Rustom Mody wrote: >About programming pedagogy: > >| Rob Hagan at Monash had shown that you could teach students more COBOL with one semester of Scheme and one semester of COBOL than you >| could with three semesters of COBOL OK, fine. But what can you take away from that? It may just be that COBOL is hard to teach. Is the same thing true of Python? If not, this is irrelevant. Or it may be that teaching two very different languages is a useful thing to do. In that case, this is relevant, but it doesn't seem likely that two similar dialects of the same language would be sufficient to get the same benefit. Maybe with a language that can be radically reconfigured like Oz (which you can switch from having Python-style variables to C-style variables to Prolog-style variables) it would work, but even that's little more than a guess. >from https://groups.google.com/d/msg/erlang-programming/5X1irAmLMD8/qCQJ11Y5jEAJ > >No this is not about 'pro-scheme' but about 'pro-learning-curve' >I dont believe we should be teaching python (or C++ or Java or Haskell or...) but programming. >[I started my last programming paradigms with python course with the koan: >You cannot do programming without syntax >Syntax is irrelevant to programming >So what is relevant? >] I don't think syntax _is_ irrelevant to programming. I think that's a large part of the reason for using Python: it makes the flow of the program visually graspable, it has constructs that read like English, it avoids many ambiguities or near-ambiguities that you'd otherwise have to stop and think through, it has strongly-reinforced idioms for complex patterns that people can recognize at a glance, etc. And, in a very different way, syntax (at a slightly higher level than the actual s-expression syntax) is also a large part of the reason for using Lisp, in a very different way: half of writing an application in Lisp is in essence programming the language constructs to make your application easier to write. Besides, if syntax were irrelevant, why would you care about the same keyword for defining regular functions and generator functions, the same expressions for calling functions and procedures, etc.? That's just syntax. From tjreedy at udel.edu Mon May 25 23:52:45 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 25 May 2015 17:52:45 -0400 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> References: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com> Message-ID: On 5/25/2015 8:11 AM, Rustom Mody wrote: > I personally would wish for other minor surgeries eg a different keyword > from 'def' for generators. 'Def' is for generator *functions*. Guido notwithstanding, overloading 'generator' to mean both a subcategory of function and the non-function iterators they produce leads to confusion. 
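The two kinds of object are easy to tell apart at the prompt (plain stdlib, nothing specific to any proposal):

    >>> import inspect
    >>> def g():
    ...     yield 1
    ...
    >>> inspect.isgeneratorfunction(g)    # the *function*
    True
    >>> inspect.isgenerator(g())          # the iterator it returns
    True
    >>> inspect.isgeneratorfunction(g()), inspect.isgenerator(g)
    (False, False)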
The only structural difference between a normal function and generator function is a flag bit in the associated code object. In 3.5, non-buggy generator functions must exit with explicit or implicit 'return', just as with other normal functions other than .__next__ methods. Allowing generator functions to exit with StopIteration slightly confused them with iterator .__next__ methods. > From the pov of an experienced programmer the mental load of one > keyword for two disparate purposes The single purpose is to define a function object, with a defined set of attributes, that one may call. is easy enough to handle and the > language clutter from an extra keyword is probably just not worth it. An extra keyword 'async' is being added for coroutine functions, resulting I believe in 'async def'. But I also believe this is not the only usage of 'async', while it would be the only usage of a 'gen' prefix. > However from having taught python for 10+ years I can say this > 'overloading' causes endless grief and slowdown of beginners. I think part of the grief is overloading 'generator' to mean both a non-iterable function and an iterable non-function. To really understand generators and generator functions, I think one needs to understand what an iterator class looks like, with a combination of boilerplate and custom code in .__init__, .__iter__, and .__next__. The generator as iterator has the boilerplate code, while the generator function has the needed custom code in the .__init__ and .__next__ methods combined in one function body. For this purpose, assignments between local and self attribute namespaces, which one might call 'custom boilerplate' are not needed and disappear. One may think of a generator function as defining a subclass of the generator class. -- Terry Jan Reedy From shoyer at gmail.com Tue May 26 01:38:20 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 25 May 2015 16:38:20 -0700 Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining Message-ID: In the PyData community, we really like method chaining for data analysis pipelines: (iris.query('SepalLength > 5') .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, PetalRatio = lambda x: x.PetalWidth / x.PetalLength) .plot(kind='scatter', x='SepalRatio', y='PetalRatio')) Unfortunately, method chaining isn't very extensible -- short of monkey patching, every method we want to use has exist on the original object. If a user wants to supply their own plotting function, they can't use method chaining anymore. You may recall that we brought this up a few months ago on python-ideas as an example of why we would like macros. To get around this issue, we are contemplating adding a pipe method to pandas DataFrames. It looks like this: def pipe(self, func, *args, **kwargs): pipe_func = getattr(func, '__pipe_func__', func) return pipe_func(self, *args, **kwargs) We would encourage third party libraries with objects on which method chaining is useful to define a pipe method in the same way. The main idea here is to create an easy way for users to do method chaining with their own functions and with functions from third party libraries. The business with __pipe_func__ is more magical, and frankly we aren't sure it's worth the complexity. The idea is to create a "pipe protocol" that allows functions to decide how they are called when piped. This is useful in some cases, because it doesn't always make sense for functions that act on piped data to accept that data as their first argument. 
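To make the intended use concrete, a user-supplied plotting function would slot into the chain roughly like this (my_scatter is a made-up user function, not part of any library):

    def my_scatter(df, x, y):
        # ordinary user code that takes the DataFrame as its first argument
        return df.plot(kind='scatter', x=x, y=y)

    (iris.query('SepalLength > 5')
         .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
                 PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
         .pipe(my_scatter, x='SepalRatio', y='PetalRatio'))

A function that wants the data somewhere other than the first position would opt in with something like this (regress is equally made up):

    def regress(formula, data):
        ...

    regress.__pipe_func__ = lambda data, formula: regress(formula, data)

so that df.pipe(regress, 'y ~ x') ends up calling regress('y ~ x', df).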
For more motivation and examples, please read the opening post in this GitHub issue: https://github.com/pydata/pandas/issues/10129 Obviously, this sort of protocol would not be an official part of the Python language. But because we are considering creating a de-facto standard, we would love to get feedback from other Python communities that use method chaining: 1. Have you encountered or addressed the problem of extensible method chaining? 2. Would this pipe protocol be useful to you? 3. Is it worth allowing piped functions to override how they are called by defining something like __pipe_func__? Note that I'm not particularly interested in feedback about how we shouldn't be defining double underscore methods. There are other ways we could spell __pipe_func__, but double underscores seems to be pretty standard for ad-hoc protocols. Thanks for your attention. Best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue May 26 04:21:43 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 25 May 2015 19:21:43 -0700 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On Mon, May 25, 2015 at 10:50 AM, Rustom Mody wrote: > About programming pedagogy: > > | Rob Hagan at Monash had shown that you could teach students more COBOL > with one semester of Scheme and one semester of COBOL than you > | could with three semesters of COBOL > I've seen similar claims with Java and Python in place of COBOL and Scheme. My thoughts on that are that Python already has little of the cruft that isn't really about programming. But it sounds to me like you aren't so much simplifying the language as hiding parts of it, which I'm not sure buys you much. > You cannot do programming without syntax > Syntax is irrelevant to programming > So what is relevant? > :-) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue May 26 04:54:55 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 26 May 2015 12:54:55 +1000 Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining In-Reply-To: References: Message-ID: <20150526025455.GG5663@ando.pearwood.info> On Mon, May 25, 2015 at 04:38:20PM -0700, Stephan Hoyer wrote: > In the PyData community, we really like method chaining for data analysis > pipelines: > > (iris.query('SepalLength > 5') > .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, > PetalRatio = lambda x: x.PetalWidth / x.PetalLength) > .plot(kind='scatter', x='SepalRatio', y='PetalRatio')) > > > Unfortunately, method chaining isn't very extensible -- short of monkey > patching, every method we want to use has exist on the original object. If > a user wants to supply their own plotting function, they can't use method > chaining anymore. It's not really *method* chaining any more if they do that :-) > You may recall that we brought this up a few months ago on python-ideas as > an example of why we would like macros. > > To get around this issue, we are contemplating adding a pipe method to > pandas DataFrames. 
It looks like this: > > def pipe(self, func, *args, **kwargs): > pipe_func = getattr(func, '__pipe_func__', func) > return pipe_func(self, *args, **kwargs) Are you sure this actually works in practice? Since pipe() returns the result of calling the passed in function, not the dataframe, it seems to me that you can't actually chain this unless it's the last call in the chain. This should work: (iris.query('SepalLength > 5') .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, PetalRatio = lambda x: x.PetalWidth / x.PetalLength) .pipe(myplot, kind='scatter', x='SepalRatio', y='PetalRatio') ) but I don't think this will work: (iris.query('SepalLength > 5') .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, PetalRatio = lambda x: x.PetalWidth / x.PetalLength) .pipe(myexport, spam=True, eggs=False) .plot(kind='scatter', x='SepalRatio', y='PetalRatio') ) That makes it somewhat less of a general purpose pipelining method and more of a special case "replace the plotter with a different plotter" helper method. And for that special case, I'd prefer to give the plot method an extra argument, which if not None, is a function to delegate to: .plot(kind='scatter', x='SepalRatio', y='PetalRatio', plotter=myplot) What's the point of the redirection to __pipe_func__? Under what circumstances would somebody use __pipe_func__ instead of just passing a callable (a function or other object with __call__ method)? If you don't have a good use case for it, then "You Ain't Gonna Need It" applies. I think that is completely unnecessary. (It also abuses a reserved namespace, but you've already said you don't care about that.) Instead of passing: .pipe(myobject, args) # myobject has a __pipe_func__ method just make it explicit and write: .pipe(myobject.some_method, args) And for what it's worth, apart from the dunder issue, I think it's silly to have a *method* called "*_func__". > The business with __pipe_func__ is more magical, and frankly we aren't sure > it's worth the complexity. The idea is to create a "pipe protocol" that > allows functions to decide how they are called when piped. This is useful > in some cases, because it doesn't always make sense for functions that act > on piped data to accept that data as their first argument. Just use a wrapper function that reorders the arguments. If the reordering is simple enough, you can do it in place with a lambda: .pipe(lambda *args, **kwargs: myplot(args[1], args[0], *args[2:])) > Obviously, this sort of protocol would not be an official part of the > Python language. But because we are considering creating a de-facto > standard, we would love to get feedback from other Python communities that > use method chaining: Because you are considering creating a de-facto standard, I think it is especially rude to trespass on the reserved dunder namespace. (Unless, of course, the core developers decide that they don't mind.) > 1. Have you encountered or addressed the problem of extensible method > chaining? Yes. I love chaining in, say, bash, and it works well in Ruby, but it's less useful in Python. My attempt to help bring chaining to Python is here http://code.activestate.com/recipes/578770-method-chaining/ but it relies on methods operating by side-effect, not returning a new result. But generally speaking, I don't like methods that operate by side-effect, so I don't use chaining much in practice. I'm always on the look-out for opportunities where it makes sense though. > 2. Would this pipe protocol be useful to you? I don't think so. > 3. 
Is it worth allowing piped functions to override how they are called by > defining something like __pipe_func__? No, I think it is completely unnecessary. -- Steve From rustompmody at gmail.com Tue May 26 04:56:36 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Mon, 25 May 2015 19:56:36 -0700 (PDT) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On Tuesday, May 26, 2015 at 7:58:31 AM UTC+5:30, Chris Barker wrote: > > > But it sounds to me like you aren't so much simplifying the language as > hiding parts of it, which I'm not sure buys you much. > Learners learn sequentially -- probably a more fundamental law than 'computer compute sequentially'. If there are significant tracts of the language that are out of one's (current) understanding and not arising to confuse the noob, thats ok. But when they arise and the learner does not have the intellectual equipment to deal with that it just slows down learning. Take the print statment/function. It would be rather ridiculous to remove the print from a realistic language. However if you've taught enough beginners you'd know how hard it is to get beginners to write ... return as against ... print () And so in an early teachpack, I'd disable the print statement. This of course means that at that level the student is bound to trying out python at the interactive interpreter. Some people think that that renders the language ridiculously impotent. My experience suggests that if this guide-rail were available, beginners would get key beginner-stuff eg - writing structured code - defining, passing, using, visualizing suitable data structures much faster. So... FOR THE BEGINNER: "cutting out" == "simplifying" -------------- next part -------------- An HTML attachment was scrubbed... URL: From liik.joonas at gmail.com Tue May 26 05:22:49 2015 From: liik.joonas at gmail.com (Joonas Liik) Date: Tue, 26 May 2015 06:22:49 +0300 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: I'm not sure how good the analogy is.. but i've just taken a certain course with such a "simplified" language. (speaking about the SQL here, of course you can get around that if you do things in VBA.. IMO thats not really an improvement tho) MS Access felt really impotent, and since you often stumble on SQLServer docs MS Access often feels broken when you try to use some of those and it doesn't work. ..and that happened like lots of times (double digits..) If you omit basic features that people will come to expect based on readily available documentation you will only breed resentment. I'm afraid that all you will achieve with your good intentions is scare newcomers away from python :( -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue May 26 05:50:48 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 26 May 2015 13:50:48 +1000 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: <20150526035048.GJ5663@ando.pearwood.info> On Mon, May 25, 2015 at 10:36:00AM +0530, Rustom Mody wrote: > Context: A bunch of my students will be working with me (if all goes > according to plan!!)to hack on/in CPython sources. I'm sorry, I see a serious disconnect between what you are trying to do (hack on CPython sources) and your students (beginners so early in the learning process that they are confused by the fact that None doesn't print in the interactive interpreter). 
How on earth do you expect that students who cannot even cope with "disappearing None" will deal with hacking the CPython source code? I assume you don't mean the C code of the interpreter itself, but the standard library. Even so, the standard library has code written in a multitude of styles, some of it is 20 years old, some of it is more or less a direct port of Java code, much of it involves the use of advanced concepts. I don't see this as being even remotely viable. [...] > Now different teachers may like to navigate the world of python differently. > So for example I prefer to start with the immutable (functional) subset and > go on to the stateful/imperative. The point (here) is not so much which is > preferable so much as this that a given teacher should have the freedom to > chart out a course through python in which (s)he can cross out certain > features at certain points for students. So a teacher preferring to > emphasise OO/imperative over functional may prefer the opposite choice. And of course you can do so. But you cannot expect to chart out a *pure* OO or *pure* functional course, since Python is not purely either. As a deliberate design choice, Python uses both functional and OO concepts all the way through the builtins and standard library. If you insist on a pure approach, Python is the wrong language for you. Python uses a hybrid paradigm of functional and procedural and OO and imperative approaches. Why not make that a teaching feature rather than a problem? You can compare the different approaches: functional sorted() versus OO .sort(), for example. Or have the students write them own OO version of map(). -- Steve From rustompmody at gmail.com Tue May 26 06:01:58 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Mon, 25 May 2015 21:01:58 -0700 (PDT) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <20150526035048.GJ5663@ando.pearwood.info> References: <20150526035048.GJ5663@ando.pearwood.info> Message-ID: <787080c9-7320-443f-a63e-09b975c8cd88@googlegroups.com> On Tuesday, May 26, 2015 at 9:21:43 AM UTC+5:30, Steven D'Aprano wrote: > > On Mon, May 25, 2015 at 10:36:00AM +0530, Rustom Mody wrote: > > Context: A bunch of my students will be working with me (if all goes > > according to plan!!)to hack on/in CPython sources. > > I'm sorry, I see a serious disconnect between what you are trying to do > (hack on CPython sources) and your students (beginners so early in the > learning process that they are confused by the fact that None doesn't > print in the interactive interpreter). > > How on earth do you expect that students who cannot even cope with > "disappearing None" will deal with hacking the CPython source code? > Heh! They are not the same students!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue May 26 06:13:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 26 May 2015 14:13:09 +1000 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <20150526035048.GJ5663@ando.pearwood.info> References: <20150526035048.GJ5663@ando.pearwood.info> Message-ID: On 26 May 2015 at 13:50, Steven D'Aprano wrote: > If you insist on a pure approach, Python is the wrong language for you. > Python uses a hybrid paradigm of functional and procedural and OO and > imperative approaches. 
Not only that, but Python *deliberately* makes stateful procedural code the default, as that's the only style that comes to humans intuitively enough for it to be the standard way of *giving instructions to other humans*. It's the way checklists are written, it's the way cookbooks are written, it's the way work instructions and procedure manuals are written. If you allow for the use of illustrations in place of words, it's even the way IKEA and LEGO assembly instructions are written. More advanced conceptual modelling techniques like functional programming and object-oriented programming are then *optional* aspects of the language to help people cope with the fact that imperative programming doesn't scale very well when it comes to handling more complex problems. Regards, Nick. P.S. Gary Bernhardt coined a nice phrase for the functional programming focused variant of this: Imperative Shell, Functional Core. The notion works similarly well for an object-oriented core. The key though is that you can't skip over teaching the side effect laden procedural layer, or you're going to inadvertently persuade vast swathes of people that they can't program at all, when there's actually a lot of software development tasks that are well within their reach. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rustompmody at gmail.com Tue May 26 07:36:47 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Mon, 25 May 2015 22:36:47 -0700 (PDT) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <20150526035048.GJ5663@ando.pearwood.info> Message-ID: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> On Tuesday, May 26, 2015 at 9:50:09 AM UTC+5:30, Nick Coghlan wrote: > Gary Bernhardt coined a nice phrase for the functional > programming focused variant of this: Imperative Shell, Functional > Core. The notion works similarly well for an object-oriented core. The > key though is that you can't skip over teaching the side effect laden > procedural layer, or you're going to inadvertently persuade vast > swathes of people that they can't program at all, when there's > actually a lot of software development tasks that are well within > their reach. > Why does the question arise of not teaching side-effects/procedural programming? Its only (if at all) a question of sequencing not of 'not teaching'. In fact that is why Python is preferable to say haskell. Take the example of arithmetic and algebra. Lets say we agree that both need to be taught/learnt. Can that be done simultaneously? >From my pov: arithmetic ? functional algebra ? imperative You want to see it the other way? That's ok and one can make a case for that viewpoint also. You want to say that algebra is more basic than arithmetic? Thats also ok [I guess many professional mathematicians would tend to that view: a group is more basic than a ring is more basic than a field. School arithmetic is one very specific and not too interesting field] The viewpoint that will not stand up to scrutiny is to say that arithmetic/algebra are the same and can be approached simultaneously. Easiest seen in the most simple and basic building block of imperative programming: the assignment statement. 
When you have:

x = y + 1

one understands the "y+1" functionally, whereas we understand the "x =" imperatively. If you think the separation of these two worlds is unnecessary then you have the mess of C's 'expressions' like ++, and you will have students puzzling over the wonders of nature like i = i++ whereas the most useful answer would be "Syntax Error".

> More advanced conceptual modelling techniques like functional
> programming and object-oriented programming are then *optional*
> aspects of the language to help people cope with the fact that
> imperative programming doesn't scale very well when it comes to
> handling more complex problems.

That's certainly true historically. However, as I tried to say above, I don't believe it's true logically. And pedagogically the case remains very much open.

ACM's most recent curriculum [1] juxtaposes FP and OOP (pg 157, 158) and says that 3 hours FP + 4 hours OOP is an absolute basic requirement for a CS major. I regard this as an epochal shift in our perception of what programming is about. The fact that this has happened 50 years after Lisp should indicate the actual speed with which our field adapts. Dijkstra said that it takes 100 years for an idea to go from inception to general acceptance. Think of when Cantor invented set theory and when modern math entered primary schools. Other inversions of historical | logical | pedagogical order here [2].

And finally all this is rather OT. I am talking of a framework for a teacher to chart a course through python, not any changes per se to python itself. A teacher wanting to chart a different course through python should be free (and encouraged) to do that as well.

[1] https://www.acm.org/education/CS2013-final-report.pdf
[2] http://blog.languager.org/2011/02/cs-education-is-fat-and-weak-1.html and its sequel http://blog.languager.org/2011/02/cs-education-is-fat-and-weak-2.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu Tue May 26 07:54:36 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 26 May 2015 01:54:36 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: 
References: 
Message-ID: 

On 5/25/2015 10:56 PM, Rustom Mody wrote:
> It would be rather ridiculous to remove the print from a realistic language.
> However if you've taught enough beginners you'd know how hard it is to
> get beginners to write
> ... return
> as against
> ... print ()

Programming is composition, the connections of outputs to inputs (as with circuit design). 'Print' is the enemy of composition. We agree so far.

If submitting code with 'print' instead of 'return' gets a grade of 0 or 'fail', don't people learn fairly quickly? If assignments are partially test-defined and automatically test-graded, 'print' rather than 'return' will fail. Example: 'write a function that returns a tuple of the number of positive and negative values in a finite iterable of signed numbers', followed by examples with the caveat that the grading test will have other inputs and expected outputs.

> And so in an early teachpack, I'd disable the print statement.

Print is essential for debugging. You should only want to disallow print in the final submission of function code, as suggested above.
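A minimal sketch of what such a test-graded exercise could look like (the function name count_signs and the sample values here are made up for illustration):

def count_signs(numbers):
    # Return a tuple (positives, negatives) for a finite iterable of numbers.
    pos = neg = 0
    for n in numbers:
        if n > 0:
            pos += 1
        elif n < 0:
            neg += 1
    return pos, neg

# The grading test only looks at the return value, never at printed output:
assert count_signs([3, -1, 0, 2, -5]) == (2, 2)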
-- Terry Jan Reedy From rosuav at gmail.com Tue May 26 08:07:40 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 26 May 2015 16:07:40 +1000 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On Tue, May 26, 2015 at 3:54 PM, Terry Reedy wrote: > On 5/25/2015 10:56 PM, Rustom Mody wrote: > >> It would be rather ridiculous to remove the print from a realistic >> language. >> However if you've taught enough beginners you'd know how hard it is to >> get beginners to write >> ... return >> as against >> ... print () > > > Programming is composition, the connections of outputs to inputs (as with > circuit design). 'Print' is the enemy of composition. We agree so far. That's fine as long as it's okay to produce no results whatsoever until all processing is complete. In a pure sense, yes, a program's goal is to produce output, and it doesn't make a lot of difference how that output is produced. You can build a web framework in which the only way to send a result is to return it from a function. But there are innumerable times when it's more useful to produce intermediate output; whether that output goes to a file, a socket, the console, or something else, it's as much a part of real-world programming as returned values are. Doesn't the Zen of Python say something about practicality and purity? Hmmm. ChrisA From stephen at xemacs.org Tue May 26 08:31:20 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 26 May 2015 15:31:20 +0900 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> Message-ID: <87y4kbg893.fsf@uwakimon.sk.tsukuba.ac.jp> Rustom Mody writes: > And finally all this is rather OT. I am talking of a framework for > a teacher to chart a course through python, not any changes per se > to python itself. Then why is this conversation, interesting as it is, on python-ideas instead of python-list? From abarnert at yahoo.com Tue May 26 08:56:38 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 25 May 2015 23:56:38 -0700 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> Message-ID: <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> On May 25, 2015, at 22:36, Rustom Mody wrote: > > I am talking of a framework for a teacher to chart a course through python, not any changes per se to python itself. How exactly can you allow a teacher to "chart a course through python" that includes separate function and generator function definition statements, procedures as distinct from functions, etc. without changing Python? Python doesn't have the configurability to switch those features on and off, and also doesn't have the features to switch on in the first place. > A teacher wanting to chart a different course through python should be free (and encouraged) to do that as well. I would like a framework for a teacher to chart a course through driving the Nissan 370Z that would allow me to start off teaching hoverpads instead of wheels, but a teacher wanting to chart a different course should be free to start with sails instead. And I want to do this without changing anything about the 370Z. 
From ncoghlan at gmail.com Tue May 26 09:52:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 26 May 2015 17:52:14 +1000 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <87y4kbg893.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <87y4kbg893.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 26 May 2015 16:31, "Stephen J. Turnbull" wrote: > > Rustom Mody writes: > > > And finally all this is rather OT. I am talking of a framework for > > a teacher to chart a course through python, not any changes per se > > to python itself. > > Then why is this conversation, interesting as it is, on python-ideas > instead of python-list? Or edu-sig: https://mail.python.org/mailman/listinfo/edu-sig Cheers, Nick. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Tue May 26 11:46:36 2015 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 26 May 2015 04:46:36 -0500 Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining In-Reply-To: References: Message-ID: On May 25, 2015 6:45 PM, "Stephan Hoyer" wrote: > > In the PyData community, we really like method chaining for data analysis pipelines: > > (iris.query('SepalLength > 5') > .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, > PetalRatio = lambda x: x.PetalWidth / x.PetalLength) > .plot(kind='scatter', x='SepalRatio', y='PetalRatio')) > > > Unfortunately, method chaining isn't very extensible -- short of monkey patching, every method we want to use has exist on the original object. If a user wants to supply their own plotting function, they can't use method chaining anymore. > > You may recall that we brought this up a few months ago on python-ideas as an example of why we would like macros. > > To get around this issue, we are contemplating adding a pipe method to pandas DataFrames. It looks like this: > > def pipe(self, func, *args, **kwargs): > pipe_func = getattr(func, '__pipe_func__', func) > return pipe_func(self, *args, **kwargs) > > > We would encourage third party libraries with objects on which method chaining is useful to define a pipe method in the same way. > > The main idea here is to create an easy way for users to do method chaining with their own functions and with functions from third party libraries. > > The business with __pipe_func__ is more magical, and frankly we aren't sure it's worth the complexity. The idea is to create a "pipe protocol" that allows functions to decide how they are called when piped. This is useful in some cases, because it doesn't always make sense for functions that act on piped data to accept that data as their first argument. > > For more motivation and examples, please read the opening post in this GitHub issue: https://github.com/pydata/pandas/issues/10129 > > Obviously, this sort of protocol would not be an official part of the Python language. But because we are considering creating a de-facto standard, we would love to get feedback from other Python communities that use method chaining: > 1. Have you encountered or addressed the problem of extensible method chaining? * https://pythonhosted.org/pyquery/api.html * SQLAlchemy > 2. 
Would this pipe protocol be useful to you? What are the advantages over just returning 'self'? (Which use cases are not possible with current syntax?) In terms of documenting functional composition, I find it easier to test and add comment strings to multiple statements. Months ago, when I looked at creating pandasrdf (pandas #3402), there is need for a (...).meta.columns w/ columnar URIs, units, (metadata: who, what, when, how). Said metadata is not storable with e.g. CSV; but is with JSON-LD, RDF, RDFa, CSVW. It would be neat to be able to track provenance metadata through [chained] transformations. > 3. Is it worth allowing piped functions to override how they are called by defining something like __pipe_func__? "There should be one-- and preferably only one --obvious way to do it." > Note that I'm not particularly interested in feedback about how we shouldn't be defining double underscore methods. There are other ways we could spell __pipe_func__, but double underscores seems to be pretty standard for ad-hoc protocols. > Thanks for your attention. > Best, > Stephan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rustompmody at gmail.com Mon May 25 19:50:43 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Mon, 25 May 2015 10:50:43 -0700 (PDT) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On Monday, May 25, 2015 at 10:14:50 PM UTC+5:30, Chris Barker wrote: > > Just a note here, that (as an intro to python teacher), I think this is a > pedagogically bad idea. > > At least if the goal is to teach Python -- while you don't need to > introduce all the complexity up front, hiding it just sends students down > the wrong track. > > On the other hand, if you want a kind-of-like-python-but-simpler language > to teach particular computer science concepts, this kind of hacking may be > of value. > > But I don't think it would be a good idea to build that capability inot > Python itself. And I think you can hack in in with monkey patching anyway > -- so that's probably the way to go. > > for example: > > """So for example I prefer to start with the immutable (functional) > subset""" > > you can certainly do that by simply using tuples and the functional tools. > > (OK, maybe not -- after all most (all?) of the functional stuff returns > lists, not tuples, and that may be beyond monkey-patchable) > > But that's going to be a lot of hacking to change. > > Is it so bad to have them work with lists in a purely functional way? > > -Chris > > I guess there are 2 questions here one about teaching, one about python-ideas, both having somewhat OT answers... Anyways here goes. About ideas for python: This is really about some kids and I mucking around inside python sources. 
That it will become something used by other teachers -- that's far away. That it will be suitable for patches to python -- even further.

About programming pedagogy:

| Rob Hagan at Monash had shown that you could teach students more COBOL with one semester of Scheme and one semester of COBOL than you
| could with three semesters of COBOL

from https://groups.google.com/d/msg/erlang-programming/5X1irAmLMD8/qCQJ11Y5jEAJ

No, this is not about 'pro-scheme' but about 'pro-learning-curve'. I don't believe we should be teaching python (or C++ or Java or Haskell or...) but programming. [I started my last programming paradigms with python course with the koan:

You cannot do programming without syntax
Syntax is irrelevant to programming
So what is relevant?
]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jsbueno at python.org.br Tue May 26 15:43:58 2015
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 26 May 2015 10:43:58 -0300
Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining
In-Reply-To: 
References: 
Message-ID: 

> Unfortunately, method chaining isn't very extensible -- short of monkey patching
> every method we want to use has to exist on the original object.

(Link for the repo on which the examples here are implemented: https://github.com/jsbueno/chillicurry )

Actually, the last time this subject showed up (and it was not that long ago) I could think of something "short of monkey patching everything" -- it is possible to fashion a special object with a custom `__getattr__` - say that you call it "curry" - that then proceeds to retrieve references to functions (and methods) with the same names as the attributes you try to get from it, and wrap those function calls in order to create your pipeline. Say:

>>> curry.len.list.range(5,10)
5

The trick is to pick the names "len", "list" and "range" from the calling stack frame.

You can then evolve this idea, and pass a special sentinel parameter to calls on the chain, so that the function call gets delayed and the sentinel is replaced by the piped object when it is actually executed - say:

>>> curry.mul(DELAY, 2).mul(DELAY, 3).complex.int(5)
(30+0j)

So I did put this together - but lacking a concrete use case myself, it is somewhat "amorphous" - lacking specifications on what it should do - it can, for example, retrieve names from the piped object's attributes instead of the calling namespace:

>>> curry.split.upper.str("good morning Vietnam")
['GOOD', 'MORNING', 'VIETNAM']

And the "|" operator is overridden as well so that, with some parentheses, lambdas and other things can be added to the chain - just throwing in what could give you more ideas for the approach you have in mind.

This one works by applying the calls on the right side first and traversing towards the object on the left - but it should be easy to do the opposite - starting with a call with the "seed" object on the left, and chaining calls on the right.

If you find the idea interesting enough to be of use, I'd be happy to evolve what is already in place there so it could be useful.
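If you just want the flavor of the trick without reading the repo, a stripped-down toy of the name-lookup part could look like this (my own simplification for this mail, not the actual chillicurry code; Python 3 only):

import builtins
import sys

class Curry:
    def __init__(self, names=()):
        self._names = names

    def __getattr__(self, name):
        # each attribute access just remembers one more name to apply
        return Curry(self._names + (name,))

    def __call__(self, *args, **kwargs):
        caller = sys._getframe(1)
        def resolve(name):
            # look the name up in the caller's namespace, falling back to builtins
            for ns in (caller.f_locals, caller.f_globals, vars(builtins)):
                if name in ns:
                    return ns[name]
            raise NameError(name)
        *outer, innermost = self._names
        result = resolve(innermost)(*args, **kwargs)
        for name in reversed(outer):   # apply right-to-left, as in the examples above
            result = resolve(name)(result)
        return result

curry = Curry()
print(curry.len.list.range(5, 10))   # prints 5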
regards, js -><- On 26 May 2015 at 06:46, Wes Turner wrote: > > On May 25, 2015 6:45 PM, "Stephan Hoyer" wrote: >> >> In the PyData community, we really like method chaining for data analysis >> pipelines: >> >> (iris.query('SepalLength > 5') >> .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, >> PetalRatio = lambda x: x.PetalWidth / x.PetalLength) >> .plot(kind='scatter', x='SepalRatio', y='PetalRatio')) >> >> >> Unfortunately, method chaining isn't very extensible -- short of monkey >> patching, every method we want to use has exist on the original object. If a >> user wants to supply their own plotting function, they can't use method >> chaining anymore. > >> >> You may recall that we brought this up a few months ago on python-ideas as >> an example of why we would like macros. >> >> To get around this issue, we are contemplating adding a pipe method to >> pandas DataFrames. It looks like this: >> >> def pipe(self, func, *args, **kwargs): >> pipe_func = getattr(func, '__pipe_func__', func) >> return pipe_func(self, *args, **kwargs) >> >> >> We would encourage third party libraries with objects on which method >> chaining is useful to define a pipe method in the same way. >> >> The main idea here is to create an easy way for users to do method >> chaining with their own functions and with functions from third party >> libraries. >> >> The business with __pipe_func__ is more magical, and frankly we aren't >> sure it's worth the complexity. The idea is to create a "pipe protocol" that >> allows functions to decide how they are called when piped. This is useful in >> some cases, because it doesn't always make sense for functions that act on >> piped data to accept that data as their first argument. >> >> For more motivation and examples, please read the opening post in this >> GitHub issue: https://github.com/pydata/pandas/issues/10129 >> >> Obviously, this sort of protocol would not be an official part of the >> Python language. But because we are considering creating a de-facto >> standard, we would love to get feedback from other Python communities that >> use method chaining: >> 1. Have you encountered or addressed the problem of extensible method >> chaining? > > * https://pythonhosted.org/pyquery/api.html > * SQLAlchemy > >> 2. Would this pipe protocol be useful to you? > > What are the advantages over just returning 'self'? (Which use cases are not > possible with current syntax?) > > In terms of documenting functional composition, I find it easier to test and > add comment strings to multiple statements. > > Months ago, when I looked at creating pandasrdf (pandas #3402), there is > need for a (...).meta.columns w/ columnar URIs, units, (metadata: who, what, > when, how). Said metadata is not storable with e.g. CSV; but is with > JSON-LD, RDF, RDFa, CSVW. > > It would be neat to be able to track provenance metadata through [chained] > transformations. > >> 3. Is it worth allowing piped functions to override how they are called by >> defining something like __pipe_func__? > > "There should be one-- and preferably only one --obvious way to do it." > >> Note that I'm not particularly interested in feedback about how we >> shouldn't be defining double underscore methods. There are other ways we >> could spell __pipe_func__, but double underscores seems to be pretty >> standard for ad-hoc protocols. >> Thanks for your attention. 
>> Best,
>> Stephan
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From tjreedy at udel.edu Tue May 26 15:45:32 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 26 May 2015 09:45:32 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: 
References: 
Message-ID: 

On 5/26/2015 2:07 AM, Chris Angelico wrote:
> On Tue, May 26, 2015 at 3:54 PM, Terry Reedy wrote:
>> Programming is composition, the connections of outputs to inputs (as with
>> circuit design). 'Print' is the enemy of composition. We agree so far.
>
> That's fine as long as it's okay to produce no results whatsoever
> until all processing is complete. In a pure sense, yes, a program's
> goal is to produce output, and it doesn't make a lot of difference how
> that output is produced. You can build a web framework in which the
> only way to send a result is to return it from a function. But there
> are innumerable times when it's more useful to produce intermediate
> output; whether that output goes to a file, a socket, the console, or
> something else, it's as much a part of real-world programming as
> returned values are.

The context is a beginning programming course where the goal is to teach people to write

def f(a, b, c): return a*b + c
print(f(2, 3, 4))

instead of

def f(a, b, c): print(a*b + c)
f(2, 3, 4)

In other words, to teach beginners to relegate output to top level code, separate from the calculation code. (Or perhaps output functions, but that is a more advanced topic.) The first function is easily testable, the second is not.

For printing intermediate results, yield lines to top-level code that can do whatever with them, including printing.

def text_generator(args):
    ...
    yield line

for line in text_generator(args): print(line)

is top-level code that prints intermediate results produced by a testable generator.

People want to see results, which is half of why I said not to delete print. But proper assignments and grading can enforce separation of calculations from use of results. The idea of separation of concerns did not start with OOP.

-- Terry Jan Reedy

From julien at palard.fr Tue May 26 16:33:19 2015
From: julien at palard.fr (Julien Palard)
Date: Tue, 26 May 2015 16:33:19 +0200
Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining
In-Reply-To: 
References: 
Message-ID: <5564842F.5000501@palard.fr>

o/

On 05/26/2015 01:38 AM, Stephan Hoyer wrote:
> In the PyData community, we really like method chaining for data analysis
> pipelines:

A few months ago, I created a very similar thread here: https://mail.python.org/pipermail//python-ideas/2014-October/029839.html about a package of mine, https://pypi.python.org/pypi/pipe, that I'm not that proud of.

As the answers in my thread state, using pipelines is not the Pythonic way to do things: it looks like it makes the code more readable, but it's not true; there's [always] a Pythonic way to write the same code just as readably -- by assigning intermediate computations to variables, for example. Side gain: variables are named, so self-documenting.
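A deliberately tiny example of what I mean (nothing pandas-specific here, just builtins):

text = "b a c a b"

# one chained expression: what flows between the steps stays anonymous
result = sorted(set(text.split()))

# the same computation with named intermediate steps
words = text.split()
unique = set(words)
result = sorted(unique)    # ['a', 'b', 'c']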
My point of view about pipes is that it's hard to modify as, for the reader, what is passed between each action is opaque. My point of view about my library is that i should not expose an operator overloading, as it may confuse people actually needing `|` to apply a `binary or` to a result of a chained thing. -- Julien Palard From rustompmody at gmail.com Tue May 26 16:55:36 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Tue, 26 May 2015 07:55:36 -0700 (PDT) Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On Tuesday, May 26, 2015 at 7:16:18 PM UTC+5:30, Terry Reedy wrote: > > On 5/26/2015 2:07 AM, Chris Angelico wrote: > > On Tue, May 26, 2015 at 3:54 PM, Terry Reedy > wrote: > > >> Programming is composition, the connections of outputs to inputs (as > with > >> circuit design). 'Print' is the enemy of composition. We agree so far. > > > > That's fine as long as it's okay to produce no results whatsoever > > until all processing is complete. > > In a pure sense, yes, a program's > > goal is to produce output, and it doesn't make a lot of difference how > > that output is produced. You can build a web framework in which the > > only way to send a result is to return it from a function. But there > > are innumerable times when it's more useful to produce intermediate > > output; whether that output goes to a file, a socket, the console, or > > something else, it's as much a part of real-world programming as > > returned values are. > > The context is a beginning programming course where the goal is to teach > people to write > > def f(a, b, c): return a*b + c > print(f(2, 3, 4)) > > instead > > def f(a, b, d): print(a*b + c) > f(2, 3, 4) > > In other words, to teach beginners to relegate output to top level code, > separate from the calculation code. (Or perhaps output functions, but > that is a more advanced topic.) The first function is easy testable, > the second is not. > Thanks Terry for the elucidation > > For printing intermediate results, yield lines to top-level code that > can do whatever with them, including printing. > > def text_generator(args): > ... > yield line > > for line in text_generator: print(line) > > is top-level code that prints intermediate results produced by a > testable generator. > > And thanks-squared for that. Generators are a really wonderful feature of python and not enough showcased. Think of lazy lists in haskell and how much fanfaring and trumpeting goes on around these. And by contrast how little of that for generators in the python world. Are the two all that different? You just have to think of all the data-structure/AI/etc books explaining depth-first-search and more arcane algorithms with a 'print' in the innards of it. And how far generators as a fundamental tool would go towards clarifying/modularizing these explanations So yes generators are an important component towards the goal of 'print-less' programming > People want to see results, which is half of why I said not to delete > print. But proper assignments and grading can enforce separation of > calculations from use of results. The idea of separation of concerns > did not start with OOP. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From chris.barker at noaa.gov Tue May 26 19:13:08 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 26 May 2015 10:13:08 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
Message-ID: 

On Mon, May 25, 2015 at 10:36 PM, Rustom Mody wrote:
> If you think the separation of these two worlds is unnecessary then you
> have the mess of C's 'expressions' like ++
> And you will have students puzzling over the wonders of nature like i =
> i++ whereas the most useful answer would be "Syntax Error"

A good reason NOT to teach C as a first language ;-)

> And finally all this is rather OT. I am talking of a framework for a
> teacher to chart a course through python, not any changes per se to python
> itself.

I would argue that you are actually not talking about teaching Python, per se -- but using a (subset) of python to teach programming in the more general sense. If you want to teach Python, then I think it is a mistake to teach a truncated version first -- it will just lead to confusion later.

But having a "functional" version of Python for teaching functional programming concepts makes some sense. Though I think one could create that as a monkey-patched version of python without hacking into the core python implementation: i.e. replace map(), etc., with versions that return tuples rather than lists, that kind of thing. Though maybe replacing list comprehensions with tuple comprehensions would be a bit tricky...

Though I'm still not sure you'd need to -- sure you CAN mutate a list, but if you use functional approaches, lists won't get mutated -- so where is the source of the confusion?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From techtonik at gmail.com Tue May 26 20:05:06 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 26 May 2015 21:05:06 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <20150522105847.GA9624@phdru.name>
References: <20150522105847.GA9624@phdru.name>
Message-ID: 

On Fri, May 22, 2015 at 1:58 PM, Oleg Broytman wrote:
> On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik wrote:
>> Is the idea to have timer that starts on import is good?
>
> No, because:
>
> -- it could be imported at the wrong time;

Any time is right.

> -- it couldn't be "reimported"; what is the usage of one-time timer?

The idea is to have a convenient default timer to measure script run-time.

> -- if it could be reset and restarted at need -- why not start it
> manually in the first place?

Current ways of measuring script run-time are not cross-platform or not memorizable. I have had to reinvent timer code a couple of times, and that's not convenient for code that is only relevant while debugging.
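Something as small as this is all I mean (the module name runtimer is made up; timeit.default_timer already picks the best clock for each platform):

# runtimer.py - importing it starts the clock
import atexit
from timeit import default_timer

_start = default_timer()

def elapsed():
    return default_timer() - _start

@atexit.register
def _report():
    print("total run time: %.3f s" % elapsed())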
From wes.turner at gmail.com Tue May 26 20:19:59 2015 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 26 May 2015 13:19:59 -0500 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> Message-ID: Ways to teach Python from first principles: * Restrict the syntactical token list ("switch features on and off") * Fork Python * RPython -- https://rpython.readthedocs.org/en/latest/ * https://pypi.python.org/pypi/RestrictedPython * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub); virtualization) * Add a preprocessor with a cost function to limit valid tokens for a given code submission (see the links to the Python grammar, tokenizer, compiler linked above) * Modify nbgrader to evaluate submissions with such a cost function: https://github.com/jupyter/nbgrader * Receive feedback about code syntax and tests from a CI system with repository commit (web)hooks * BuildBot, Jenkins, Travis CI, xUnit XML https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > On May 25, 2015, at 22:36, Rustom Mody wrote: > > > > I am talking of a framework for a teacher to chart a course through > python, not any changes per se to python itself. > > How exactly can you allow a teacher to "chart a course through python" > that includes separate function and generator function definition > statements, procedures as distinct from functions, etc. without changing > Python? Python doesn't have the configurability to switch those features on > and off, and also doesn't have the features to switch on in the first place. > > > A teacher wanting to chart a different course through python should be > free (and encouraged) to do that as well. > > > I would like a framework for a teacher to chart a course through driving > the Nissan 370Z that would allow me to start off teaching hoverpads instead > of wheels, but a teacher wanting to chart a different course should be free > to start with sails instead. And I want to do this without changing > anything about the 370Z. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue May 26 20:21:01 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 27 May 2015 04:21:01 +1000 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik wrote: >> -- if it could be reset and restarted at need -- why not start it >> manually in the first place? > > Current ways of measuring script run-time are not cross-platform or > not memorizable. I have to reinvent timer code a couple of times, and > that's not convenient for the code that is only relevant while debugging. Sounds to me like something that doesn't belong in the stdlib, but makes a great utility module for private use. 
ChrisA From wes.turner at gmail.com Tue May 26 20:25:27 2015 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 26 May 2015 13:25:27 -0500 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> Message-ID: On Tue, May 26, 2015 at 1:19 PM, Wes Turner wrote: > Ways to teach Python from first principles: > > * Restrict the syntactical token list ("switch features on and off") > * Fork Python > * RPython -- https://rpython.readthedocs.org/en/latest/ > RPython -> PyPy: https://bitbucket.org/pypy/pypy PyPy is both an implementation of the Python programming language, and an > extensive compiler framework for dynamic language implementations. You can > build self-contained Python implementations which execute independently > from CPython. > * https://pypi.python.org/pypi/RestrictedPython > * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox > * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub); > virtualization) > > * Add a preprocessor with a cost function to limit valid tokens for a > given code submission > (see the links to the Python grammar, tokenizer, compiler linked above) > > * Modify nbgrader to evaluate submissions with such a cost function: > https://github.com/jupyter/nbgrader > > * Receive feedback about code syntax and tests from a CI system with > repository commit (web)hooks > * BuildBot, Jenkins, Travis CI, xUnit XML > > https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd > > > > On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas < > python-ideas at python.org> wrote: > >> On May 25, 2015, at 22:36, Rustom Mody wrote: >> > >> > I am talking of a framework for a teacher to chart a course through >> python, not any changes per se to python itself. >> >> How exactly can you allow a teacher to "chart a course through python" >> that includes separate function and generator function definition >> statements, procedures as distinct from functions, etc. without changing >> Python? Python doesn't have the configurability to switch those features on >> and off, and also doesn't have the features to switch on in the first place. >> >> > A teacher wanting to chart a different course through python should be >> free (and encouraged) to do that as well. >> >> >> I would like a framework for a teacher to chart a course through driving >> the Nissan 370Z that would allow me to start off teaching hoverpads instead >> of wheels, but a teacher wanting to chart a different course should be free >> to start with sails instead. And I want to do this without changing >> anything about the 370Z. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue May 26 20:28:39 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 27 May 2015 04:28:39 +1000 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Wed, May 27, 2015 at 4:24 AM, anatoly techtonik wrote: > There are a lot of helpers like this that might be useful. 
Installing them > separately is a lot of hassle - it is easy to forget some. Package 'em all up into a single repository and clone that repo on every system you use. For me, that's called "shed", and I keep it on github: https://github.com/Rosuav/shed But whether it's public or private, git or hg, pure Python or a mix of languages, it's an easy way to pick up all those convenient little scripts. You'll never "forget some", because they're all in one place. ChrisA From techtonik at gmail.com Tue May 26 20:24:37 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 26 May 2015 21:24:37 +0300 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Tue, May 26, 2015 at 9:21 PM, Chris Angelico wrote: > On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik wrote: >>> -- if it could be reset and restarted at need -- why not start it >>> manually in the first place? >> >> Current ways of measuring script run-time are not cross-platform or >> not memorizable. I have to reinvent timer code a couple of times, and >> that's not convenient for the code that is only relevant while debugging. > > Sounds to me like something that doesn't belong in the stdlib, but > makes a great utility module for private use. There are a lot of helpers like this that might be useful. Installing them separately is a lot of hassle - it is easy to forget some. From techtonik at gmail.com Tue May 26 20:30:54 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 26 May 2015 21:30:54 +0300 Subject: [Python-ideas] Lossless bulletproof conversion to unicode (backslashing) Message-ID: https://docs.python.org/2.7/library/functions.html?highlight=unicode#unicode There is no lossless way to encode the information to unicode. The argument that you know the encoding the data is coming from is a fallacy. The argument that data is always correct is a fallacy as well. So: 1. external data encoding is unknown or varies 2. external data has binary chunks that are invalid for conversion to unicode In real world you have to deal with broken and invalid output and UnicodeDecode crashes is not an option. The unicode() constructor proposes two options to deal with invalid output: 1. ignore - meaning skip and corrupt the data 2. replace - just corrupt the data The solution is to have filter preprocess the binary string to escape all non-unicode symbols so that the following lossless transformation becomes possible: binary -> escaped utf-8 string -> unicode -> binary How to accomplish that with Python 2.x? This stuff is critical to port SCons to Python 3.x and I expect for other such tools too. -- anatoly t. From ethan at stoneleaf.us Tue May 26 20:47:47 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 26 May 2015 11:47:47 -0700 Subject: [Python-ideas] Lossless bulletproof conversion to unicode (backslashing) In-Reply-To: References: Message-ID: <5564BFD3.7000101@stoneleaf.us> On 05/26/2015 11:30 AM, anatoly techtonik wrote: [...] > How to accomplish that with Python 2.x? This should be on Python List, not on Ideas. -- ~Ethan~ From phd at phdru.name Tue May 26 21:06:46 2015 From: phd at phdru.name (Oleg Broytman) Date: Tue, 26 May 2015 21:06:46 +0200 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: <20150526190646.GA12204@phdru.name> Hi! 
On Tue, May 26, 2015 at 09:05:06PM +0300, anatoly techtonik wrote: > On Fri, May 22, 2015 at 1:58 PM, Oleg Broytman wrote: > > On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik wrote: > >> Is the idea to have timer that starts on import is good? > > > > No, because: > > > > -- it could be imported at the wrong time; > > Any time is right. Very much application-dependent. What if you wanna measure import time? > > -- it couldn't be "reimported"; what is the usage of one-time timer? > > The idea is to have convenient default timer to measure > script run-time. Good idea for a small separate project. Bad for the stdlib. Not every small simple useful module must be in the stdlib. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Tue May 26 21:08:01 2015 From: phd at phdru.name (Oleg Broytman) Date: Tue, 26 May 2015 21:08:01 +0200 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: <20150526190801.GB12204@phdru.name> On Tue, May 26, 2015 at 09:24:37PM +0300, anatoly techtonik wrote: > On Tue, May 26, 2015 at 9:21 PM, Chris Angelico wrote: > > On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik wrote: > >>> -- if it could be reset and restarted at need -- why not start it > >>> manually in the first place? > >> > >> Current ways of measuring script run-time are not cross-platform or > >> not memorizable. I have to reinvent timer code a couple of times, and > >> that's not convenient for the code that is only relevant while debugging. > > > > Sounds to me like something that doesn't belong in the stdlib, but > > makes a great utility module for private use. > > There are a lot of helpers like this that might be useful. Installing them > separately is a lot of hassle - it is easy to forget some. Incorporate them into your main repository as submodules. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From tjreedy at udel.edu Tue May 26 21:20:47 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 26 May 2015 15:20:47 -0400 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: On 5/26/2015 10:55 AM, Rustom Mody wrote: > On Tuesday, May 26, 2015 at 7:16:18 PM UTC+5:30, Terry Reedy wrote: > The context is a beginning programming course where the goal is to > teach > people to write > > def f(a, b, c): return a*b + c > print(f(2, 3, 4)) > > instead > > def f(a, b, d): print(a*b + c) > f(2, 3, 4) > > In other words, to teach beginners to relegate output to top level > code, > separate from the calculation code. (Or perhaps output functions, but > that is a more advanced topic.) The first function is easy testable, > the second is not. > Thanks Terry for the elucidation > For printing intermediate results, yield lines to top-level code that > can do whatever with them, including printing. > > def text_generator(args): > ... > yield line > > for line in text_generator: print(line) > > is top-level code that prints intermediate results produced by a > testable generator. > And thanks-squared for that. > Generators are a really wonderful feature of python and not enough > showcased. > Think of lazy lists in haskell and how much fanfaring and trumpeting > goes on around these. > And by contrast how little of that for generators in the python world. > Are the two all that different? 
> > You just have to think of all the data-structure/AI/etc books explaining > depth-first-search and more arcane algorithms with a 'print' in the > innards of it. > And how far generators as a fundamental tool would go towards > clarifying/modularizing these explanations > > So yes generators are an important component towards the goal of > 'print-less' programming I will just note that the stdlib is not immune from overly embedded prints. Until 3.4, one could print a disassembly to stdout with dis.dis. Period. Hard for us to test; hard for others to use. In 3.4, a file arg was added to dis, and the Bytecode and Instruction classes added, so one could a) iterate over unformatted named tuples, b) get the output as a string, or c) redirect the output to any 'file' (with a write method). The result: easy for us to test; easy for others to use the data. -- Terry Jan Reedy From abarnert at yahoo.com Tue May 26 22:32:08 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 26 May 2015 13:32:08 -0700 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> Message-ID: On May 26, 2015, at 11:19, Wes Turner wrote: > > Ways to teach Python from first principles: What you're suggesting may be a reasonable way to restrict Python for teaching (although, as others have argued, I don't think it's necessary)--but it isn't a reasonable way to get what Rustom Mody says he wants. While his first paragraph started out talking about restricting Python to a subset, only one of the four examples I've seen actually is a restriction. He wants procedures and functions to be fundamentally distinct things, defined differently and called differently. You can't do that by restricting the token list, or by using RPython instead of Python, or by executing code inside a container. And the one that actually _is_ a restriction isn't at the grammar level, it's just hiding a bunch of methods (list.append, presumably list.__setitem__, etc.). Of course you _could_ do everything he wants by forking one of the Python installations and heavily modifying it (I even suggested how that particular change could be implemented in a CPython fork), or by writing a new Python-like language with a compiler that compiles to Python (which, at its simplest, might be reducible to a set of MacroPy macros or a source preprocessor), because you can do _anything_ that way. But then you're not really talking about Python in the first place, you're talking about designing and implementing a new teaching language that just borrows a lot of ideas from Python and is implemented with Python's help. And none of the rest of your suggestions are relevant once that's what you're doing. > * Restrict the syntactical token list ("switch features on and off") > * Fork Python > * RPython -- https://rpython.readthedocs.org/en/latest/ > * https://pypi.python.org/pypi/RestrictedPython > * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox > * OR: execute code in container (e.g. 
LXC, LXD, Docker (JupyterHub); virtualization) > > * Add a preprocessor with a cost function to limit valid tokens for a given code submission > (see the links to the Python grammar, tokenizer, compiler linked above) > > * Modify nbgrader to evaluate submissions with such a cost function: > https://github.com/jupyter/nbgrader > > * Receive feedback about code syntax and tests from a CI system with repository commit (web)hooks > * BuildBot, Jenkins, Travis CI, xUnit XML > https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd > > > >> On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas wrote: >> On May 25, 2015, at 22:36, Rustom Mody wrote: >> > >> > I am talking of a framework for a teacher to chart a course through python, not any changes per se to python itself. >> >> How exactly can you allow a teacher to "chart a course through python" that includes separate function and generator function definition statements, procedures as distinct from functions, etc. without changing Python? Python doesn't have the configurability to switch those features on and off, and also doesn't have the features to switch on in the first place. >> >> > A teacher wanting to chart a different course through python should be free (and encouraged) to do that as well. >> >> >> I would like a framework for a teacher to chart a course through driving the Nissan 370Z that would allow me to start off teaching hoverpads instead of wheels, but a teacher wanting to chart a different course should be free to start with sails instead. And I want to do this without changing anything about the 370Z. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue May 26 23:00:18 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 26 May 2015 14:00:18 -0700 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> Message-ID: On Tue, May 26, 2015 at 1:32 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: And the one that actually _is_ a restriction isn't at the grammar level, > it's just hiding a bunch of methods (list.append, presumably > list.__setitem__, etc.). > which is odd, because Python already has an immutable sequence -- it's call a tuple. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wes.turner at gmail.com Tue May 26 23:33:38 2015 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 26 May 2015 16:33:38 -0500 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> Message-ID: On Tue, May 26, 2015 at 3:32 PM, Andrew Barnert wrote: > On May 26, 2015, at 11:19, Wes Turner wrote: > > Ways to teach Python from first principles: > > > What you're suggesting may be a reasonable way to restrict Python for > teaching (although, as others have argued, I don't think it's > necessary)--but it isn't a reasonable way to get what Rustom Mody says he > wants. > > While his first paragraph started out talking about restricting Python to > a subset, only one of the four examples I've seen actually is a > restriction. He wants procedures and functions to be fundamentally distinct > things, defined differently and called differently. You can't do that by > restricting the token list, or by using RPython instead of Python, or by > executing code inside a container. And the one that actually _is_ a > restriction isn't at the grammar level, it's just hiding a bunch of methods > (list.append, presumably list.__setitem__, etc.). > > Of course you _could_ do everything he wants by forking one of the Python > installations and heavily modifying it (I even suggested how that > particular change could be implemented in a CPython fork), or by writing a > new Python-like language with a compiler that compiles to Python (which, at > its simplest, might be reducible to a set of MacroPy macros or a source > preprocessor), because you can do _anything_ that way. But then you're not > really talking about Python in the first place, you're talking about > designing and implementing a new teaching language that just borrows a lot > of ideas from Python and is implemented with Python's help. And none of the > rest of your suggestions are relevant once that's what you're doing. > I must have misunderstood the objectives. All of these suggestions are relevant to teaching [core] python in an academic environment. > > * Restrict the syntactical token list ("switch features on and off") > * Fork Python > * RPython -- https://rpython.readthedocs.org/en/latest/ > * https://pypi.python.org/pypi/RestrictedPython > * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox > * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub); > virtualization) > > * Add a preprocessor with a cost function to limit valid tokens for a > given code submission > (see the links to the Python grammar, tokenizer, compiler linked above) > > * Modify nbgrader to evaluate submissions with such a cost function: > https://github.com/jupyter/nbgrader > > * Receive feedback about code syntax and tests from a CI system with > repository commit (web)hooks > * BuildBot, Jenkins, Travis CI, xUnit XML > > https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd > > > > On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas < > python-ideas at python.org> wrote: > >> On May 25, 2015, at 22:36, Rustom Mody wrote: >> > >> > I am talking of a framework for a teacher to chart a course through >> python, not any changes per se to python itself. 
>> >> How exactly can you allow a teacher to "chart a course through python" >> that includes separate function and generator function definition >> statements, procedures as distinct from functions, etc. without changing >> Python? Python doesn't have the configurability to switch those features on >> and off, and also doesn't have the features to switch on in the first place. >> >> > A teacher wanting to chart a different course through python should be >> free (and encouraged) to do that as well. >> >> >> I would like a framework for a teacher to chart a course through driving >> the Nissan 370Z that would allow me to start off teaching hoverpads instead >> of wheels, but a teacher wanting to chart a different course should be free >> to start with sails instead. And I want to do this without changing >> anything about the 370Z. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed May 27 00:41:46 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 27 May 2015 08:41:46 +1000 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> Message-ID: <20150526224146.GB932@ando.pearwood.info> On Tue, May 26, 2015 at 01:19:59PM -0500, Wes Turner wrote: > Ways to teach Python from first principles: Most of these methods fail to teach *Python*. They teach something similar, but different to, Python: almost-Python. If Rustom wishes to fork Python to create his own version of almost- Python, he doesn't need to discuss it here. I'd rather he didn't discuss it here -- this is PYTHON-ideas, not Cobra-ideas, or Lua-ideas, or Rustom's-purely-functional-almost-python-ideas. There is, or at least was, a strong tradition of creating specialist teaching languages, starting with Pascal which developed as a more restricted and more pure form of Algol. But this is not the place to discuss it. > * Restrict the syntactical token list ("switch features on and off") > * Fork Python > * RPython -- https://rpython.readthedocs.org/en/latest/ I'm pretty sure that RPython is not designed as a teaching language. The PyPy guys are fairly insistent that RPython is not a general purpose language, but exists for one reason and one reason only: building compilers. > * https://pypi.python.org/pypi/RestrictedPython > * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox > * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub); > virtualization) Sandboxing Python and restricting the functionality of almost-Python are unrelated issues. Purely functional almost-Python would want to replace things like dict.update which modifies the dict in place with a built-in function which returns a new, updated, dict. Running regular Python in a container doesn't make it almost-Python, it is still regular Python. 
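For illustration only, a pure-function counterpart to dict.update along those lines (the name "updated" is made up here, it is not an existing built-in) might look like:

    def updated(d, other):
        # Return a new dict combining d and other; neither argument is mutated.
        new = dict(d)
        new.update(other)
        return new

    updated({'a': 1}, {'b': 2})   # -> {'a': 1, 'b': 2}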
-- Steve From wes.turner at gmail.com Wed May 27 00:58:57 2015 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 26 May 2015 17:58:57 -0500 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: <20150526224146.GB932@ando.pearwood.info> References: <20150526035048.GJ5663@ando.pearwood.info> <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com> <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com> <20150526224146.GB932@ando.pearwood.info> Message-ID: On Tue, May 26, 2015 at 5:41 PM, Steven D'Aprano wrote: > On Tue, May 26, 2015 at 01:19:59PM -0500, Wes Turner wrote: > > > Ways to teach Python from first principles: > > Most of these methods fail to teach *Python*. They teach something > similar, but different to, Python: almost-Python. > > If Rustom wishes to fork Python to create his own version of almost- > Python, he doesn't need to discuss it here. I'd rather he didn't discuss > it here -- this is PYTHON-ideas, not Cobra-ideas, or Lua-ideas, or > Rustom's-purely-functional-almost-python-ideas. > I agree. * Language syntax propositions -> python-ideas at python.org * Or, if not feasible for the general community, RPython and Sandboxing research do identify methods for (more than) syntactical restriction * Teaching -> edu-sig at python.org * IPython Notebook, JupyterHub * A custom interpreter with RPython and a custom Jupyter kernel may be of use. > > There is, or at least was, a strong tradition of creating specialist > teaching languages, starting with Pascal which developed as a more > restricted and more pure form of Algol. But this is not the place to > discuss it. > https://en.wikipedia.org/wiki/History_of_Python > > > > * Restrict the syntactical token list ("switch features on and off") > > * Fork Python > > * RPython -- https://rpython.readthedocs.org/en/latest/ > > I'm pretty sure that RPython is not designed as a teaching language. The > PyPy guys are fairly insistent that RPython is not a general purpose > language, but exists for one reason and one reason only: building > compilers. > Rather than forking, writing an interpeter may be more maintainable (and relatively consistent with a widely-deployed language with versioned semantics): https://rpython.readthedocs.org/en/latest/#writing-your-own-interpreter-in-rpython > > > > * https://pypi.python.org/pypi/RestrictedPython > > * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox > > * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub); > > virtualization) > > Sandboxing Python and restricting the functionality of almost-Python are > unrelated issues. Purely functional almost-Python would want to replace > things like dict.update which modifies the dict in place with a built-in > function which returns a new, updated, dict. Running regular Python in a > container doesn't make it almost-Python, it is still regular Python. > If hosting (or trying to maintain n shells), sandboxing and containers are directly relevant. * IPython notebooks can be converted to edX courses (link above) * There are reproducible Dockerfiles for development and education * A custom interpreter with RPython and a custom Jupyter kernel may be of use. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shoyer at gmail.com Wed May 27 07:56:14 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 26 May 2015 22:56:14 -0700 Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining In-Reply-To: <20150526025455.GG5663@ando.pearwood.info> References: <20150526025455.GG5663@ando.pearwood.info> Message-ID: Hi Steve, On Mon, May 25, 2015 at 7:54 PM, Steven D'Aprano wrote: > Are you sure this actually works in practice? > > Since pipe() returns the result of calling the passed in function, not > the dataframe, it seems to me that you can't actually chain this unless > it's the last call in the chain. This is a good point. We're pretty sure it will work in practice, because many functions that take dataframes return other dataframes -- or other objects that will implement a .pipe() method. The prototypical use case is actually closer to: df.pipe(reformat_my_data) Plotting and saving data with method chaining is convenient, but usually as the terminal step in a data analysis flow. None of the existing pandas methods for plotting or exporting return a dataframe, and it doesn't seem to be much of an impediment to method chaining. That said, we've also thought about adding a .tee() method for exactly this use case -- it's like pipe, but returns the original object instead of modifying it. What's the point of the redirection to __pipe_func__? Under what > circumstances would somebody use __pipe_func__ instead of just passing a > callable (a function or other object with __call__ method)? If you don't > have a good use case for it, then "You Ain't Gonna Need It" applies. > Our main use case was for APIs that can't accept a DataFrame as their first argument, but that naturally can be understood as modifying dataframes. Here's an example based on the Seaborn plotting library: def scatterplot(x, y, data=None): # make a 2D plot of x vs y If `x` or `y` are strings, Seaborn looks them up as columns in the provided dataframe `data`. But `x` and `y` can also be directly provided as columns. This API is in unfortunate conflict with passing in `data` as the first, required argument. > I think that is completely unnecessary. (It also abuses a reserved > namespace, but you've already said you don't care about that.) Instead > of passing: > > .pipe(myobject, args) # myobject has a __pipe_func__ method > > just make it explicit and write: > > .pipe(myobject.some_method, args) > This is a fair point. Writing something like: .pipe(seaborn.scatterplot.df, 'x', 'y') is not so much worst than omitting the .df. > Yes. I love chaining in, say, bash, and it works well in Ruby, but it's > less useful in Python. My attempt to help bring chaining to Python is > here > > http://code.activestate.com/recipes/578770-method-chaining/ > > but it relies on methods operating by side-effect, not returning a new > result. But generally speaking, I don't like methods that operate by > side-effect, so I don't use chaining much in practice. I'm always on the > look-out for opportunities where it makes sense though. > I think this is where we have an advantage in the PyData world. We tend to work less with built-in data structures and prefer to make our methods pure functions, which together make chaining much more feasible. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed May 27 10:45:39 2015 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 27 May 2015 10:45:39 +0200 Subject: [Python-ideas] The pipe protocol, a convention for extensible method chaining In-Reply-To: References: Message-ID: <55658433.7090009@egenix.com> On 26.05.2015 01:38, Stephan Hoyer wrote: > In the PyData community, we really like method chaining for data analysis > pipelines: > > (iris.query('SepalLength > 5') > .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength, > PetalRatio = lambda x: x.PetalWidth / x.PetalLength) > .plot(kind='scatter', x='SepalRatio', y='PetalRatio')) FWIW: I don't think this is a programming style we should encourage in Python in general, so I'm -1 on this. It doesn't read well, you cannot easily tell what the intermediate objects are on which you run the methods, debugging the above becomes hard, it only gives you a minor typing advantage over using variables and calling methods on those and it gives you no performance advantage. If you need a pipe pattern, it would be better to make that explicit through some special helper function or perhaps a piping object on which you register the various steps to run. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 27 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From rosuav at gmail.com Wed May 27 11:00:11 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 27 May 2015 19:00:11 +1000 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik wrote: > How do you make these importable? Do you git clone it from site-packages? > Like: > > cd site-packages/ > git clone .../shed . > > ??? What if you have two shed repositories with different tools? I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And there won't be two sheds, because there is only one me. ChrisA From abarnert at yahoo.com Wed May 27 11:43:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 27 May 2015 02:43:55 -0700 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On May 27, 2015, at 02:00, Chris Angelico wrote: > >> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik wrote: >> How do you make these importable? Do you git clone it from site-packages? >> Like: >> >> cd site-packages/ >> git clone .../shed . Or just build a trivial distribution out of it and then you can just "pip install git+https://github.com/you/repo". >> ??? What if you have two shed repositories with different tools? ... which solves that problem. > I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And > there won't be two sheds, because there is only one me. ... even if someone figures out how to fork and clone Chris or Anatoly. 
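Returning briefly to the .pipe() convention discussed above, a minimal sketch of the protocol as it is described in this thread (an illustration only, not pandas' actual implementation) might look like:

    class DataFrame:
        def pipe(self, func, *args, **kwargs):
            # If func opts in via __pipe_func__, call that with the frame;
            # otherwise treat func as a plain callable taking the frame
            # as its first argument.
            pipe_func = getattr(func, '__pipe_func__', None)
            if pipe_func is not None:
                return pipe_func(self, *args, **kwargs)
            return func(self, *args, **kwargs)

With that shape, df.pipe(reformat_my_data) is simply reformat_my_data(df), while a function such as the seaborn-style scatterplot(x, y, data=None) could expose a __pipe_func__ that forwards the piped frame as its data keyword.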
Anyway, it works for me a lot better than the floppy I used to carry around with abutils.py, .emacs, and half a dozen other files I couldn't live without on a new/borrowed computer. From abarnert at yahoo.com Wed May 27 14:11:02 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 27 May 2015 05:11:02 -0700 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: <31856E7F-437A-4241-8961-3B196EFA380A@yahoo.com> On May 27, 2015, at 04:30, anatoly techtonik wrote: > > On Wed, May 27, 2015 at 12:43 PM, Andrew Barnert via Python-ideas > wrote: >> On May 27, 2015, at 02:00, Chris Angelico wrote: >>> >>>> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik wrote: >>>> How do you make these importable? Do you git clone it from site-packages? >>>> Like: >>>> >>>> cd site-packages/ >>>> git clone .../shed . >> >> Or just build a trivial distribution out of it and then you can just "pip install git+https://github.com/you/repo". > > But that would make it nested under the "repo" package namespace, no? That depends on how you write your setup.py. It can install a module, a package, three separate packages, whatever you want. I just install one flat module full of helpers (plus a whole bunch of dependencies). > If not, then how pip detects conflicts when the same file is provided > by different sheds? You can't do that. But I don't have a bunch of different sheds in the same environment, and I don't see why you'd want to either. I can imagine having different sheds for different environments (different stuff for base Mac, Linux, and Windows systems, or for venvs targeted to Gtk+ vs. PyObjC vs. Flask web services, or whatever), but I can't imagine wanting to install 7 other people's sheds all at once or something. If someone else's shed were useful enough to me, I'd either merge it into mine, or suggest that they clean it up and put it on PyPI as a real distribution instead of a personal shed (or fork it and do it myself, if they didn't want to maintain it). It's really not much different from using .emacs files (except for the added bonus of being able to pull in dependencies from PyPI and GitHub automatically). I used to look around dotfiles for ideas to borrow from other people's configs, but I never wanted to install 3 .emacs files at the same time. > I don't want it to just overwrite my scripts when > somebody updates their repository. > >>>> ??? What if you have two shed repositories with different tools? >> >> ... which solves that problem. >> >>> I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And >>> there won't be two sheds, because there is only one me. >> >> ... even if someone figures out how to fork and clone Chris or Anatoly. >> >> Anyway, it works for me a lot better than the floppy I used to carry around with abutils.py, .emacs, and half a dozen other files I couldn't live without on a new/borrowed computer. > > There was no Python in my floppy universe. It probably appeared 10 years later > when internet became more accessible and Google said they are hiring. =) > > I am now more inclined that there needs to be a shed convention to gather > statistical data on custom root level importable that may be handy for some > kind of "altlib" distribution. I doubt you'd find much of use. There are specialized communities that have a broad set of things usable to most of the community, but those communities already have distributions like Python(x,y) that take care of that. 
I can't imagine too many things that would be useful to almost everyone. Even obvious things like lxml (and even if you could solve release schedule and similar problems), there are plenty of people with absolutely no need for it, and it has external dependencies like libxml2 that you wouldn't want to force on everyone. Also, I think this thread has shown that, even though the basic shed idea is pretty common among experienced Python devs, different people prefer different variations--whether to pip install or just clone into your venv site-packages, how extensively to make use of git features like submodules or branches, etc. But maybe promoting the idea as a suggestion somewhere in the Python or PyPA docs would get everyone closer to a convention that would make it easier to track. I'm not sure where you'd put it or what you'd say; any ideas? From steve at pearwood.info Wed May 27 14:18:32 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 27 May 2015 22:18:32 +1000 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: <20150527121832.GD932@ando.pearwood.info> On Wed, May 27, 2015 at 04:21:01AM +1000, Chris Angelico wrote: > On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik wrote: > >> -- if it could be reset and restarted at need -- why not start it > >> manually in the first place? > > > > Current ways of measuring script run-time are not cross-platform or > > not memorizable. I have to reinvent timer code a couple of times, and > > that's not convenient for the code that is only relevant while debugging. > > Sounds to me like something that doesn't belong in the stdlib, but > makes a great utility module for private use. I disagree. I don't think it makes a good utility. I think it is a terrible design, for a number of reasons. (1) Module top level code runs only the first time you import it, after that the module is loaded from cache and the code doesn't run again. So import timer # starts a timer will only start the time the first time you import it. To make it work the second time, you have to do: del sys.modules['timer'] del timer import timer (2) Suppose you find some hack that fixes that problem. Now you have another problem: it's too hard to control when the timer starts. You only have one choice: immediately after the import. So we *have* to write our code like this: import a, b, c # do our regular imports setup = x + y + z # setup everything in advance import timer main() If you move the import timer where the other imports are, as PEP 8 suggests, you'll time too much: all the setup code as well. (3) You can only have one timer at a time. You can't run the timer in two different threads. (At least not with the simplistic UI of "import starts the timer, timer.stop() stops the timer". Contrast that to how timeit works: timeit is an ordinary module that requires no magic to work. Importing it is not the same as running it. You can import it at the top of your code, follow it by setup code, and run the timeit.Timer whenever you like. You can have as many, or as few, timers as you want. The only downside to timeit is that you normally have to provide the timed code as a string. I have a timer context manager which is designed for timing long-running code. You write the code in a "with" block: with Stopwatch(): do_this() do_that() The context manager starts the timer when you enter, and stops it when you leave. 
By default it prints the time used, but you can easily suppress printing and capture the result instead. I've been using this for a few years now, and it works well. The only downside is that it works too well, so I'm tempted to use it for micro code snippets, so I have it print a warning if the time taken is too small: py> with Stopwatch(): ... n = len("spam") ... elapsed time is very small; consider using timeit.Timer for micro-timings of small code snippets time taken: 0.000010 seconds -- Steve From abarnert at yahoo.com Wed May 27 14:29:37 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 27 May 2015 05:29:37 -0700 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: <20150527121832.GD932@ando.pearwood.info> References: <20150522105847.GA9624@phdru.name> <20150527121832.GD932@ando.pearwood.info> Message-ID: <6ED56DD5-2260-4207-B242-E29E71D0B3A4@yahoo.com> On May 27, 2015, at 05:18, Steven D'Aprano wrote: > >> On Wed, May 27, 2015 at 04:21:01AM +1000, Chris Angelico wrote: >> On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik wrote: >>>> -- if it could be reset and restarted at need -- why not start it >>>> manually in the first place? >>> >>> Current ways of measuring script run-time are not cross-platform or >>> not memorizable. I have to reinvent timer code a couple of times, and >>> that's not convenient for the code that is only relevant while debugging. >> >> Sounds to me like something that doesn't belong in the stdlib, but >> makes a great utility module for private use. > > I disagree. I don't think it makes a good utility. I think it is a > terrible design, for a number of reasons. > > (1) Module top level code runs only the first time you import it, after > that the module is loaded from cache and the code doesn't run again. So > > import timer # starts a timer > > will only start the time the first time you import it. To make it work > the second time, you have to do: > > del sys.modules['timer'] > del timer > import timer > > (2) Suppose you find some hack that fixes that problem. Now you have > another problem: it's too hard to control when the timer starts. You > only have one choice: immediately after the import. So we *have* to > write our code like this: > > import a, b, c # do our regular imports > setup = x + y + z # setup everything in advance > import timer > main() > > If you move the import timer where the other imports are, as PEP 8 > suggests, you'll time too much: all the setup code as well. > > (3) You can only have one timer at a time. You can't run the timer in > two different threads. (At least not with the simplistic UI of "import > starts the timer, timer.stop() stops the timer". Presumably you could add a "timer.restart()" to the UI. (But in that case, how much does it really cost to use that at the start of the module instead of the magic import anyway? It's like your system uptime; it's hard to find any use for that besides actually reporting system uptime...) > Contrast that to how timeit works: timeit is an ordinary module that > requires no magic to work. Importing it is not the same as running it. > You can import it at the top of your code, follow it by setup code, and > run the timeit.Timer whenever you like. You can have as many, or as few, > timers as you want. The only downside to timeit is that you normally > have to provide the timed code as a string. For many uses, providing it as a function call works just fine, in which case there are no downsides at all. 
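For example, assuming the standard timeit module and a Python 3 interpreter, the timed code can be an ordinary function rather than a source string:

    import timeit

    def work():
        # The code being timed lives in a normal, testable function.
        return sum(range(1000))

    print(timeit.timeit(work, number=10000))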
> I have a timer context manager which is designed for timing long-running > code. You write the code in a "with" block: > > with Stopwatch(): > do_this() > do_that() > > > The context manager starts the timer when you enter, and stops it when > you leave. By default it prints the time used, but you can easily > suppress printing and capture the result instead. I've been using this > for a few years now, and it works well. The only downside is that it > works too well, so I'm tempted to use it for micro code snippets, so I > have it print a warning if the time taken is too small: > > py> with Stopwatch(): > ... n = len("spam") > ... > elapsed time is very small; consider using timeit.Timer for > micro-timings of small code snippets > time taken: 0.000010 seconds That's a clever idea. I have something very similar, and I sometimes find myself abusing it that way... From jeanpierreda at gmail.com Wed May 27 14:52:17 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 27 May 2015 05:52:17 -0700 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: <6ED56DD5-2260-4207-B242-E29E71D0B3A4@yahoo.com> References: <20150522105847.GA9624@phdru.name> <20150527121832.GD932@ando.pearwood.info> <6ED56DD5-2260-4207-B242-E29E71D0B3A4@yahoo.com> Message-ID: On Wed, May 27, 2015 at 5:29 AM, Andrew Barnert via Python-ideas wrote: > On May 27, 2015, at 05:18, Steven D'Aprano wrote: >> >> I have a timer context manager which is designed for timing long-running >> code. You write the code in a "with" block: >> >> with Stopwatch(): >> do_this() >> do_that() >> >> >> The context manager starts the timer when you enter, and stops it when >> you leave. By default it prints the time used, but you can easily >> suppress printing and capture the result instead. I've been using this >> for a few years now, and it works well. The only downside is that it >> works too well, so I'm tempted to use it for micro code snippets, so I >> have it print a warning if the time taken is too small: >> >> py> with Stopwatch(): >> ... n = len("spam") >> ... >> elapsed time is very small; consider using timeit.Timer for >> micro-timings of small code snippets >> time taken: 0.000010 seconds > > That's a clever idea. I have something very similar, and I sometimes find myself abusing it that way... Why not use an iterable stopwatch that measures time between calls to __next__/next? for _ in Stopwatch(): .... -- Devin From techtonik at gmail.com Wed May 27 10:21:07 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 27 May 2015 11:21:07 +0300 Subject: [Python-ideas] Lossless bulletproof conversion to unicode (backslashing) In-Reply-To: <5564BFD3.7000101@stoneleaf.us> References: <5564BFD3.7000101@stoneleaf.us> Message-ID: On Tue, May 26, 2015 at 9:47 PM, Ethan Furman wrote: > On 05/26/2015 11:30 AM, anatoly techtonik wrote: > > [...] > >> How to accomplish that with Python 2.x? > > > This should be on Python List, not on Ideas. The way to do this, probably, the idea to make it into unicode() function belongs here. So, if you're replying to this thread and read the letter, are against or for the idea? -- anatoly t. 
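Going back to the Stopwatch idea a few messages up, a minimal sketch of such a context-manager timer (an illustration only, not Steven's actual code; it assumes Python 3's time.perf_counter) could be:

    import time

    class Stopwatch:
        def __enter__(self):
            self.start = time.perf_counter()
            return self

        def __exit__(self, *exc_info):
            # Record and report the elapsed wall-clock time on exit.
            self.elapsed = time.perf_counter() - self.start
            print("time taken: %f seconds" % self.elapsed)
            return False

    with Stopwatch():
        total = sum(i * i for i in range(10**6))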
From techtonik at gmail.com Wed May 27 10:47:29 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 27 May 2015 11:47:29 +0300 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: <20150526190646.GA12204@phdru.name> References: <20150522105847.GA9624@phdru.name> <20150526190646.GA12204@phdru.name> Message-ID: On Tue, May 26, 2015 at 10:06 PM, Oleg Broytman wrote: > On Tue, May 26, 2015 at 09:05:06PM +0300, anatoly techtonik wrote: >> On Fri, May 22, 2015 at 1:58 PM, Oleg Broytman wrote: >> > On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik wrote: >> >> Is the idea to have timer that starts on import is good? >> > >> > No, because: >> > >> > -- it could be imported at the wrong time; >> >> Any time is right. > > Very much application-dependent. What if you wanna measure import > time? The design principle is that default behaviour is designed: 1. most simple/intuitive thought 2. most often needed operation Every "what if" means you need to do non-default customization, such as care to place starttimer into your bootstrap script. If you want to trace, when exactly the module is imported, it can record the caller full name sys.path:module.class.method (provided that Python supports this), and lines executed from the Python start. >> > -- it couldn't be "reimported"; what is the usage of one-time timer? >> >> The idea is to have convenient default timer to measure >> script run-time. > > Good idea for a small separate project. Bad for the stdlib. Not every > small simple useful module must be in the stdlib. Yes. That's not a criteria. The criteria that modules that save time during development should come with bundled. Or another idea - the stdlib should provide a standard layout that people can replicate in their "shed" repositories on Github. Then by crawling these repositories, the names and contents could be aggregated into stats to see what are the most popular imports. That way it will be quickly to identify useful stuff that people coming from other languages find missing in Python. Also, it will allow people to document the behavior differences for modules named the same. -- anatoly t. From techtonik at gmail.com Wed May 27 10:50:09 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 27 May 2015 11:50:09 +0300 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Tue, May 26, 2015 at 9:28 PM, Chris Angelico wrote: > On Wed, May 27, 2015 at 4:24 AM, anatoly techtonik wrote: >> There are a lot of helpers like this that might be useful. Installing them >> separately is a lot of hassle - it is easy to forget some. > > Package 'em all up into a single repository and clone that repo on > every system you use. For me, that's called "shed", and I keep it on > github: > > https://github.com/Rosuav/shed > > But whether it's public or private, git or hg, pure Python or a mix of > languages, it's an easy way to pick up all those convenient little > scripts. You'll never "forget some", because they're all in one place. How do you make these importable? Do you git clone it from site-packages? Like: cd site-packages/ git clone .../shed . ??? What if you have two shed repositories with different tools? 
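One concrete way to avoid hand-cloning into site-packages at all is the trivial distribution approach mentioned earlier in the thread: put a minimal setup.py at the top of the shed repository and let pip do the installing. A sketch, where every name is made up for illustration:

    # setup.py at the root of the shed repository
    from setuptools import setup

    setup(
        name='myshed',                          # hypothetical distribution name
        version='0.1',
        py_modules=['timerutil', 'textutil'],   # the helper modules kept in the repo
    )

After that, pip install git+https://github.com/you/repo (or pip install -e . in a local checkout) puts the helpers on the import path like any other distribution.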
From techtonik at gmail.com Wed May 27 11:20:10 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 27 May 2015 12:20:10 +0300 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: On Sat, May 23, 2015 at 2:48 AM, Ryan Gonzalez wrote: > HAHAHA!! > > Good luck! I've raised this issue before. Twice. Autotools sucks. Yes. If even people from Linux world say that autotools suxx ( http://esr.ibiblio.org/?p=1877 ). Imagine frustration of someone with Windows background. Perl, macros, what the hell is all this about? > And makes > cross-compiling a pain in the neck. Bottom line was: > > - C++ is a big dependency > - The autotools build system has been tested already on lots and lots and > lots of platforms Orly? The ticket to build Python with MinGW on Windows was filled many years ago, and I am not sure if it works. Maybe it was tested, but platforms evolve and break things, so those Autotools probably contain more kludges that it is realistic for maintenance. > - Nobody has even implemented an alternative build system for Python 3 yet > (python-cmake is only for Python 2) Because Python development is concentrated around patching Python itself, there is no practice in eating your own dogfood when making decisions. Take a SCons, for example, and try to port that to Python 3. You will see the key points that need to be solved (see the bulletproof unicode thread in this list). If Python developers had those toys at hand, the Python 3 would be more practical language, but looks like it is a task for a university or a full time paid job, because it is not fun for anybody here to do actual development *in Python*, and discussing Python usage issues in development lists is discouraged even though the issues raised there are important for language usability. > - No one can agree on a best build system (for instance, I hate CMake!) There is no best build system, because there is no build book with a reference of "best" or even "good enough" criteria. Even Google failed to give good rationale while releasing their Bazel. It sounded like "that worked for us better". Also, most build packages are about a fairly complex subject of tracking dependencies, caching and traversing graphs, and their documentation often doesn't have any graphics at all! Knowing how long it takes for a free time coder to draw a picture, only the mighty company can allow that, but even they don't allow their "valuable resources" to spend time on that. So, the problem is not to use fancy build system, but to use one that most people with Python background can use and enhance. CMake needs C++ skills and a separate install, SCons can be stuffed into repository. I am sure there are plenty of other Python build systems to choose from that work like this. Also, if you look at why SCons codebase, you'll notice a huge wrapping layers over subprocess management and other things that should came shipped with the Python itself just to make it a good platform for system tools. I believe that the sole reason why Python loses to Go in systems programming is that the complexity of those wrappings that you need to do over Python to make core cross-platforms concepts right. -- anatoly t. 
From techtonik at gmail.com Wed May 27 11:32:00 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 27 May 2015 12:32:00 +0300 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Wed, May 27, 2015 at 12:00 PM, Chris Angelico wrote: > On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik wrote: >> How do you make these importable? Do you git clone it from site-packages? >> Like: >> >> cd site-packages/ >> git clone .../shed . >> >> ??? What if you have two shed repositories with different tools? > > I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And > there won't be two sheds, because there is only one me. symlink every module? From techtonik at gmail.com Wed May 27 13:30:53 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 27 May 2015 14:30:53 +0300 Subject: [Python-ideas] Timer that starts as soon as it is imported In-Reply-To: References: <20150522105847.GA9624@phdru.name> Message-ID: On Wed, May 27, 2015 at 12:43 PM, Andrew Barnert via Python-ideas wrote: > On May 27, 2015, at 02:00, Chris Angelico wrote: >> >>> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik wrote: >>> How do you make these importable? Do you git clone it from site-packages? >>> Like: >>> >>> cd site-packages/ >>> git clone .../shed . > > Or just build a trivial distribution out of it and then you can just "pip install git+https://github.com/you/repo". But that would make it nested under the "repo" package namespace, no? If not, then how pip detects conflicts when the same file is provided by different sheds? I don't want it to just overwrite my scripts when somebody updates their repository. >>> ??? What if you have two shed repositories with different tools? > > ... which solves that problem. > >> I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And >> there won't be two sheds, because there is only one me. > > ... even if someone figures out how to fork and clone Chris or Anatoly. > > Anyway, it works for me a lot better than the floppy I used to carry around with abutils.py, .emacs, and half a dozen other files I couldn't live without on a new/borrowed computer. There was no Python in my floppy universe. It probably appeared 10 years later when internet became more accessible and Google said they are hiring. =) I am now more inclined that there needs to be a shed convention to gather statistical data on custom root level importable that may be handy for some kind of "altlib" distribution. -- anatoly t. From p.f.moore at gmail.com Wed May 27 17:28:30 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 27 May 2015 16:28:30 +0100 Subject: [Python-ideas] Lossless bulletproof conversion to unicode (backslashing) In-Reply-To: References: Message-ID: On 26 May 2015 at 19:30, anatoly techtonik wrote: > In real world you have to deal with broken and invalid > output and UnicodeDecode crashes is not an option. > The unicode() constructor proposes two options to > deal with invalid output: > > 1. ignore - meaning skip and corrupt the data > 2. replace - just corrupt the data There are other error handlers, specifically surrogateescape is designed for this use. Only in Python 3.x admittedly, but this list is about future versions of Python, so that's what matters here. 
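A small Python 3 illustration of the lossless round trip that handler gives:

    raw = b'valid utf-8 \xe2\x82\xac plus junk \xff\xfe'

    # Undecodable bytes are smuggled through as lone surrogates...
    text = raw.decode('utf-8', errors='surrogateescape')

    # ...and come back out unchanged when encoding with the same handler.
    assert text.encode('utf-8', errors='surrogateescape') == raw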
> The solution is to have filter preprocess the binary > string to escape all non-unicode symbols so that the > following lossless transformation becomes possible: > > binary -> escaped utf-8 string -> unicode -> binary > > How to accomplish that with Python 2.x? That question is for python-list. Language changes will only be made to 3.x - python-ideas isn't appropriate for questions about how to achieve something in 2.x. Paul From rymg19 at gmail.com Wed May 27 18:47:09 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 27 May 2015 11:47:09 -0500 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: <409899E1-14F7-46E9-B8A3-0D152C2C44C6@gmail.com> On May 27, 2015 4:20:10 AM CDT, anatoly techtonik wrote: >On Sat, May 23, 2015 at 2:48 AM, Ryan Gonzalez >wrote: >> HAHAHA!! >> >> Good luck! I've raised this issue before. Twice. Autotools sucks. > >Yes. If even people from Linux world say that autotools suxx >( http://esr.ibiblio.org/?p=1877 ). Imagine frustration of someone with >Windows >background. Perl, macros, what the hell is all this about? > >> And makes >> cross-compiling a pain in the neck. Bottom line was: >> >> - C++ is a big dependency >> - The autotools build system has been tested already on lots and lots >and >> lots of platforms > >Orly? The ticket to build Python with MinGW on Windows was filled many >years ago, and I am not sure if it works. Maybe it was tested, but >platforms >evolve and break things, so those Autotools probably contain more >kludges >that it is realistic for maintenance. > >> - Nobody has even implemented an alternative build system for Python >3 yet >> (python-cmake is only for Python 2) > >Because Python development is concentrated around patching Python >itself, >there is no practice in eating your own dogfood when making decisions. >Take >a SCons, for example, and try to port that to Python 3. You will see >the key >points that need to be solved (see the bulletproof unicode thread in >this list). >If Python developers had those toys at hand, the Python 3 would be more >practical language, but looks like it is a task for a university or a >full time paid job, because it is not fun for anybody here to do actual >development *in Python*, and discussing Python usage issues in >development >lists is discouraged even though the issues raised there are important >for >language usability. > >> - No one can agree on a best build system (for instance, I hate >CMake!) > >There is no best build system, because there is no build book with a >reference of "best" or even "good enough" criteria. Even Google failed >to give >good rationale while releasing their Bazel. It sounded like "that >worked for us >better". Also, most build packages are about a fairly complex subject >of >tracking dependencies, caching and traversing graphs, and their >documentation >often doesn't have any graphics at all! Knowing how long it takes for >a free time >coder to draw a picture, only the mighty company can allow that, but >even they >don't allow their "valuable resources" to spend time on that. > >So, the problem is not to use fancy build system, but to use one that >most >people with Python background can use and enhance. CMake needs C++ >skills >and a separate install, SCons can be stuffed into repository. I am sure >there >are plenty of other Python build systems to choose from that work like >this. 
>Also, if you look at why SCons codebase, you'll notice a huge wrapping >layers >over subprocess management and other things that should came shipped >with >the Python itself just to make it a good platform for system tools. I >believe that >the sole reason why Python loses to Go in systems programming is that >the >complexity of those wrappings that you need to do over Python to make >core >cross-platforms concepts right. The main thing is that no API is perfectly cross-platform, and no API is bulletproof. Go can get away with that because Go is very opinionated. Python, on the other hand, has a huge user base that they don't want to tick off. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From demianbrecht at gmail.com Wed May 27 20:28:49 2015 From: demianbrecht at gmail.com (Demian Brecht) Date: Wed, 27 May 2015 11:28:49 -0700 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> > On May 23, 2015, at 7:21 AM, Nick Coghlan wrote: > > https://www.djangopackages.com/ covers this well for the Django > ecosystem (I actually consider it to be one of Django's killer > features, and I'm pretty sure I'm not alone in that - like > ReadTheDocs, it was a product of DjangoDash 2010). Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I've had in the back of my mind for a while: With the integration of pip and the focus on non-standard library packages, how do we increase discoverability? If the standard library isn't going to be a mechanism for that (and I'm not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial.
Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From p.f.moore at gmail.com Wed May 27 20:46:18 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 27 May 2015 19:46:18 +0100 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: On 27 May 2015 at 19:28, Demian Brecht wrote: > This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? It has been discussed on a number of occasions. The major issue with the idea is that a lot of people use Python in closed corporate environments, where access to the internet from tools such as pip can be restricted. Also, many companies have legal approval processes for software - getting approval for "Python" includes the standard library, but each external package required would need a separate, probably lengthy and possibly prohibitive, approval process before it could be used. So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users. Paul From graffatcolmingov at gmail.com Wed May 27 20:55:35 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Wed, 27 May 2015 13:55:35 -0500 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: On Wed, May 27, 2015 at 1:28 PM, Demian Brecht wrote: > >> On May 23, 2015, at 7:21 AM, Nick Coghlan wrote: >> >> https://www.djangopackages.com/ covers this well for the Django >> ecosystem (I actually consider it to be one of Django's killer >> features, and I'm pretty sure I'm not alone in that - like >> ReadTheDocs, it was a product of DjangoDash 2010). > > Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I?ve had in the back of my mind for a while: > > With with integration of pip and the focus on non-standard library packages, how do we increase discoverability? If the standard library isn?t going to be a mechanism for that (and I?m not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. 
Perhaps on top of what Django Packages already has, there could be "recommended packages". Recommended packages could go through nearly just as much of a rigorous review process as standard library adoption before being flagged, although there would be a number of barriers reduced. > > "Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary." (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library) > > This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there was a "recommended packages" framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set. The mirror of this would be asking if Django should rip out its base classes for models, views, etc. I think Python 4 could move towards perhaps deprecating any duplicated modules, but I see no point to rip the entire standard library out... except maybe for httplib/urllib/etc. (for various reasons beyond my obvious conflict of interest). > Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From demianbrecht at gmail.com Wed May 27 20:57:18 2015 From: demianbrecht at gmail.com (Demian Brecht) Date: Wed, 27 May 2015 11:57:18 -0700 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> > On May 27, 2015, at 11:46 AM, Paul Moore wrote: > > So it's unlikely to ever happen, because it would cripple Python for a > non-trivial group of its users. I'm just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all "recommended" packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a "recommended" package was that it's released under PSFL, I'm assuming there wouldn't be any legal issues with going down such a path?
That way, you still get the ability to decentralize the library, but don't alienate the user base that can't rely on pip? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Wed May 27 21:03:52 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 27 May 2015 15:03:52 -0400 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> Message-ID: On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote: > > > On May 27, 2015, at 11:46 AM, Paul Moore wrote: > > > > So it's unlikely to ever happen, because it would cripple Python for a > > non-trivial group of its users. > > I'm just throwing ideas at the wall here, but would it not be possible to release two versions, > one for those who choose to use decentralized packages with out-of-band releases and > one with all "recommended" packages bundled (obvious potential for version conflicts > and such aside)? If one of the prerequisites of a "recommended" package was that it's > released under PSFL, I'm assuming there wouldn't be any legal issues with going down > such a path? That way, you still get the ability to decentralize the library, but don't > alienate the user base that can't rely on pip? I'm of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call "FooLang Core" or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that "FooLang Platform" or something. This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime which will function similar to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of that preset package versions and install a newer version of one of the bundled software. Of course they can install non-bundled software as well. As far as Python is concerned, while I think the above model is better in the general sense, I think that it's probably too late to switch to that, the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We're also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It's also the case (though we're working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach.
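To make the "Core" plus "Platform" split concrete, one minimal sketch of such a curated bundle is a plain metapackage whose only job is to pin a recommended set of third-party packages; every project name and version pin below is an illustrative assumption, not an actual proposal or an existing package:

    # setup.py for a hypothetical "python-platform" metapackage
    from setuptools import setup

    setup(
        name="python-platform",   # hypothetical name
        version="2015.1",         # calendar-style release, independent of CPython releases
        description="Curated, pinned bundle of recommended third-party packages",
        install_requires=[
            # the pinned set plays the role of a stdlib release
            "requests==2.7.0",
            "jsonschema==2.5.1",
            "six==1.9.0",
        ],
    )

Installing the metapackage would give the "Platform" experience in one step, while a later pip install --upgrade of a single project lets a user step outside the pinned set for just that component, which is the out-of-band upgrade property being discussed here.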
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From demianbrecht at gmail.com Wed May 27 21:13:09 2015 From: demianbrecht at gmail.com (Demian Brecht) Date: Wed, 27 May 2015 12:13:09 -0700 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com> > On May 27, 2015, at 11:55 AM, Ian Cordasco wrote: > > The mirror of this would be asking if Django should rip out it's base > classes for models, views, etc. I think Python 4 could move towards > perhaps deprecating any duplicated modules, but I see no point to rip > the entire standard library out... except maybe for > httplib/urllib/etc. (for various reasons beyond my obvious conflict of > interest). I can somewhat see the comparison, but not entirely because Django itself is a package and not the core interpreter and set of builtins. There are also other frameworks that split out modules from the core (I?m not overly familiar with either, but I believe both zope and wheezy follow such models). The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as ?recommended? without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I?m somewhat cutting my legs out from under me, but I?m starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)). I?m also aware of the politics of such a change. What does it mean then for core devs who concentrate on the current standard library and don?t contribute to the interpreter core or builtins? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From demianbrecht at gmail.com Wed May 27 21:16:37 2015 From: demianbrecht at gmail.com (Demian Brecht) Date: Wed, 27 May 2015 12:16:37 -0700 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com> Message-ID: <123C893A-CDA1-42E1-ACE6-C70A29A103C0@gmail.com> > On May 27, 2015, at 12:13 PM, Demian Brecht wrote: > > without worrying too much about state of development I should have elaborated on this more: What I mean is more around feature development, such as introducing HTTP/2.0 to requests. The core feature set would still have to be well proven and have minimal to no changes. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From wes.turner at gmail.com Wed May 27 21:23:23 2015 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 27 May 2015 14:23:23 -0500 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: On Wed, May 27, 2015 at 1:28 PM, Demian Brecht wrote: > > > On May 23, 2015, at 7:21 AM, Nick Coghlan wrote: > > > > https://www.djangopackages.com/ covers this well for the Django > > ecosystem (I actually consider it to be one of Django's killer > > features, and I'm pretty sure I'm not alone in that - like > > ReadTheDocs, it was a product of DjangoDash 2010). > > Thanks again all for the great discussion here. It seems to have taken > quite a turn to a couple other points that I?ve had in the back of my mind > for a while: > > With with integration of pip and the focus on non-standard library > packages, how do we increase discoverability? If the standard library isn?t > going to be a mechanism for that (and I?m not putting forward the argument > that it should), adopting something like Django Packages might be > tremendously beneficial. Perhaps on top of what Django Packages already > has, there could be ?recommended packages?. Recommended packages could go > through nearly just as much of a rigorous review process as standard > library adoption before being flagged, although there would be a number of > barriers reduced. > So there is a schema.org/SoftwareApplication (or doap:Project, or seon:) Resource, which has * a unique URI (e.g. http://python.org/pypi/readme) * JSON metadata extracted from setup.py into pydist.json (setuptools, wheel) - [ ] create JSON-LD @context - [ ] create mappings to standard schema * [ ] http://schema.org/SoftwareApplication * [ ] http://schema.org/SoftwareSourceCode In terms of schema.org, a Django Packages resource has: * [ ] a unique URI * [ ] typed features (predicates with ranges) * [ ] http://schema.org/review * [ ] http://schema.org/VoteAction * [ ] http://schema.org/LikeAction > > "Essentially, the standard library is where a library goes to die. It is > appropriate for a module to be included when active development is no > longer necessary.? ( > https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library > ) > > This is probably a silly idea, but given the above quote and the new(er) > focus on pip and distributed packages, has there been any discussion around > perhaps deprecating (and entirely removing from a Python 4 release) > non-builtin packages and modules? I would think that if there was a system > similar to Django Packages that made discoverability/importing of packages > as easy as using those in the standard library, having a distributed > package model where bug fixes and releases could be done out of band with > CPython releases would likely more beneficial to the end users. If there > was a ?recommended packages? framework, perhaps there could also be > buildbots put to testing interoperability of the recommended package set. 
> > Tox is great for this (in conjunction with whichever build system: BuildBot, TravisCI) > > > Also, to put the original question in this thread to rest, while I > personally think that the addition of jsonschema to the standard library, > whether as a top level package or perhaps splitting the json module into a > package and introducing it there would be beneficial, I think that solving > the distributed package discoverability is a much more interesting problem > and would serve many more packages and users. Aside from that, solving that > problem would have the same intended effect as integrating jsonschema into > the standard library. > jsonschema // JSON-LD (RDF) -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Wed May 27 22:28:07 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 27 May 2015 21:28:07 +0100 Subject: [Python-ideas] Increasing public package discoverability In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> Message-ID: On 27/05/2015 20:03, Donald Stufft wrote: > > > On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote: >> >>> On May 27, 2015, at 11:46 AM, Paul Moore wrote: >>> >>> So it's unlikely to ever happen, because it would cripple Python for a >>> non-trivial group of its users. >> >> I?m just throwing ideas at the wall here, but would it not be possible to release two versions, >> one for those who choose to use decentralized packages with out-of-band releases and >> one with all ?recommended? packages bundled (obvious potential for version conflicts >> and such aside)? If one of the prerequisites of a ?recommended? package was that it?s >> released under PSFL, I?m assuming there wouldn?t be any legal issues with going down >> such a path? That way, you still get the ability to decentralize the library, but don?t >> alienate the user base that can?t rely on pip? > > > I?m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call ?FooLang Core? or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that ?FooLang Platform? or something. > > This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime which will function similar to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of that preset package versions and install a newer version of one of the bundled software. Of course they can install non-bundled software as well. > > As far as Python is concerned, while I think the above model is better in the general sense, I think that it?s probably too late to switch to that, the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We?re also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. 
It?s also the case (though we?re working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach. > > --- > Donald Stufft > PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > Could Python 4 tear out the stdlib completely and go to pypi, to what I believe Nick Coghlan called stdlib+, or would this be A PEP Too Far, given the one or two minor issues over the move from Python 2 to Python 3? Yes this is my very dry sense of humour working, but at the same time if it gets somebody thinking, which in turn gets somebody else thinking, then hopefully ideas come up which are practical and everybody benefits. Just my ?0.02p worth. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From abarnert at yahoo.com Wed May 27 23:50:52 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 27 May 2015 14:50:52 -0700 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> Message-ID: <59DB2A78-F6A7-45C0-A7EF-4152EE73504C@yahoo.com> On May 27, 2015, at 12:03, Donald Stufft wrote: > >> On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote: >> >>> On May 27, 2015, at 11:46 AM, Paul Moore wrote: >>> >>> So it's unlikely to ever happen, because it would cripple Python for a >>> non-trivial group of its users. >> >> I?m just throwing ideas at the wall here, but would it not be possible to release two versions, >> one for those who choose to use decentralized packages with out-of-band releases and >> one with all ?recommended? packages bundled (obvious potential for version conflicts >> and such aside)? If one of the prerequisites of a ?recommended? package was that it?s >> released under PSFL, I?m assuming there wouldn?t be any legal issues with going down >> such a path? That way, you still get the ability to decentralize the library, but don?t >> alienate the user base that can?t rely on pip? > > > I?m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call ?FooLang Core? or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that ?FooLang Platform? or something. Dependencies are always going to be a problem. The best way to parse XML is lxml (and the best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform requires libxml2? The best way to do numerical computing is with NumPy, and the best way to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform GUIs with desktop integration is PySide; does that mean the Python Platform requires Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; Qt would be much worse.) You could look at it as something like the core plus distributions model used in OS's. 
FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX system plus enough to build ports, nothing else), and the practicality-vs.-purity decisions for how to apply that to real-life problems isn't that hard. But Linux took a different approach: it's just a kernel, and everything else--libc, the ports system, etc.--can be swapped out. There is no official distribution; at any given time in history, there are 3-6 competing "major distributions", dozens of others based on them, and some "special-case" distros like ucLinux or Android. And that means different distros can make different decisions on what dependencies are acceptable--include packages that only run on x86, or accept some corporate quasi-open-source license or closed-source blob. Python seems to have fallen into a place halfway between the two. The stdlib is closer to FreeBSD core than to Linux. On the other hand, while many people start with the official stdlib and use pip to expand on it, there are third-party distributions competing to provide more useful or better-organized batteries than the official version, plus custom distributions that come with some OS distros (e.g., Apple includes PyObjC with theirs), and special things like Kivy. That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year plan for how the stdlib, core distribution, and third-party ecosystem should be better, how much different would Python be today? From donald at stufft.io Wed May 27 23:54:19 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 27 May 2015 17:54:19 -0400 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <59DB2A78-F6A7-45C0-A7EF-4152EE73504C@yahoo.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> <59DB2A78-F6A7-45C0-A7EF-4152EE73504C@yahoo.com> Message-ID: On May 27, 2015 at 5:50:55 PM, Andrew Barnert (abarnert at yahoo.com) wrote: > On May 27, 2015, at 12:03, Donald Stufft wrote: > > > >> On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote: > >> > >>> On May 27, 2015, at 11:46 AM, Paul Moore wrote: > >>> > >>> So it's unlikely to ever happen, because it would cripple Python for a > >>> non-trivial group of its users. > >> > >> I?m just throwing ideas at the wall here, but would it not be possible to release two > versions, > >> one for those who choose to use decentralized packages with out-of-band releases > and > >> one with all ?recommended? packages bundled (obvious potential for version conflicts > >> and such aside)? If one of the prerequisites of a ?recommended? package was that it?s > >> released under PSFL, I?m assuming there wouldn?t be any legal issues with going down > >> such a path? That way, you still get the ability to decentralize the library, but don?t > >> alienate the user base that can?t rely on pip? > > > > > > I?m of the opinion that, given a brand new language, it makes more sense to have really > good packaging tools built in, but not to have a standard library. This you call ?FooLang > Core? or something of the sort. 
Then you take the most popular or the best examples or whatever > criteria you want from the ecosystem around that and you bundle them all together so that > the third party packages essentially get preinstalled and you call that ?FooLang Platform? > or something. > > Dependencies are always going to be a problem. The best way to parse XML is lxml (and the > best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform > requires libxml2? The best way to do numerical computing is with NumPy, and the best way > to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean > the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform > GUIs with desktop integration is PySide; does that mean the Python Platform requires > Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; > Qt would be much worse.) > > You could look at it as something like the core plus distributions model used in OS's. > FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX > system plus enough to build ports, nothing else), and the practicality-vs.-purity > decisions for how to apply that to real-life problems isn't that hard. But Linux took > a different approach: it's just a kernel, and everything else--libc, the ports system, > etc.--can be swapped out. There is no official distribution; at any given time in history, > there are 3-6 competing "major distributions", dozens of others based on them, and some > "special-case" distros like ucLinux or Android. And that means different distros can > make different decisions on what dependencies are acceptable--include packages that > only run on x86, or accept some corporate quasi-open-source license or closed-source > blob. > > Python seems to have fallen into a place halfway between the two. The stdlib is closer > to FreeBSD core than to Linux. On the other hand, while many people start with the official > stdlib and use pip to expand on it, there are third-party distributions competing to > provide more useful or better-organized batteries than the official version, plus > custom distributions that come with some OS distros (e.g., Apple includes PyObjC with > theirs), and special things like Kivy. > > That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python > may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues > to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year > plan for how the stdlib, core distribution, and third-party ecosystem should be better, > how much different would Python be today? > > It certainly doesn?t require you to add something to the ?Platform? for every topic either. You can still be conservative in what you include in the ?Platform? based on how many people are likely to need/want it and what sort of dependency or building impact it has on actually building out the full Platform. 
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From abarnert at yahoo.com Thu May 28 00:05:51 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 27 May 2015 15:05:51 -0700 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com> Message-ID: <40169569-4709-463B-A4BB-687B1A7BE1ED@yahoo.com> On May 27, 2015, at 12:13, Demian Brecht wrote: > > The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as ?recommended? without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I?m somewhat cutting my legs out from under me, but I?m starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)). One way to do that might be to focus the stdlib on picking the abstract interfaces (whether in the actual code, like dbm allows bsddb to plug in, or just in documentation, like DB-API 2) and providing a bare-bones implementation or none at all. It would be nice if things like lxml.etree didn't take so much work and it weren't so hard to quantify how perfect of a replacement it is. Or if we had a SortedMapping ABC so the half-dozen popular implementations could share a consistent API, so they could compete more cleanly on things that matter like performance or the need for a C extension. But the example of requests shows how hard, and possibly undesirable, that is. Most people use requests not because of the advanced features it has that urllib doesn't, but because the intermediate-level features that both include have a nicer interface in requests. And, while people have talked about how nice it would be to restructure urllib so that it matches requests' interface wherever possible (while still retaining the existing interface for backward compat), it doesn't seem that likely anyone will actually ever do it. And, even if someone did, and requests became a drop-in replacement for urllib' new-style API and urllib was eventually deprecated, what are the odds competitors like PyCurl would be reworked into a "URL-API 2.0" module? From scott+python-ideas at scottdial.com Thu May 28 00:39:29 2015 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Wed, 27 May 2015 18:39:29 -0400 Subject: [Python-ideas] Framework for Python for CS101 In-Reply-To: References: Message-ID: <556647A1.8010703@scottdial.com> On 2015-05-25 1:50 PM, Rustom Mody wrote: > from > https://groups.google.com/d/msg/erlang-programming/5X1irAmLMD8/qCQJ11Y5jEAJ >From the same post: """ One problem is that Computer Science departments simply do not have the time to teach everything they need to teach. Students want to leave in 3 years with qualifications an employer will like, and employers want 'practical' languages in CVs. 
I have a colleague who cannot spell because he was taught to read using the Initial Teaching Alphabet, so I'm less convinced about the educational benefits of neat languages than I used to be. """ Would that not be the same problem with a Python-like teaching language? -- Scott Dial scott at scottdial.com From ncoghlan at gmail.com Thu May 28 01:16:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 28 May 2015 09:16:04 +1000 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: On 28 May 2015 04:46, "Paul Moore" wrote: > > On 27 May 2015 at 19:28, Demian Brecht wrote: > > This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? > > It has been discussed on a number of occasions. The major issue with > the idea is that a lot of people use Python in closed corporate > environments, where access to the internet from tools such as pip can > be restricted. Also, many companies have legal approval processes for > software - getting approval for "Python" includes the standard > library, but each external package required would need a separate, > probably lengthy and possibly prohibitive, approval process before it > could be used. > > So it's unlikely to ever happen, because it would cripple Python for a > non-trivial group of its users. I expect splitting the standard library into a minimal core and a suite of default independently updatable add-ons will happen eventually, we just need to help fix the broken way a lot of organisations currently work as we go: http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastructure/ Organisations that don't suitably adapt to the rise of open collaborative models for infrastructure development are going to have a very rough time of it in the coming years. Cheers, Nick. P.S. For a less verbally dense presentation of some of the concepts in that article: http://www.redhat.com/en/explore/infrastructure/na P.P.S. And for a book length exposition of these kinds of concepts: http://www.redhat.com/en/explore/the-open-organization-book > > Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu May 28 02:29:03 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 May 2015 17:29:03 -0700 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try Message-ID: Hi all, I'm tired of getting bug reports like this one: https://github.com/numpy/numpy/issues/5919 where the issue is just that the user didn't see deprecation warnings, so I just filed a bug report requesting that the interactive Python REPL start printing DeprecationWarnings when users use deprecated functionality: https://bugs.python.org/issue24294 In the bug report it was pointed out that this was discussed on python-ideas a few months ago, and the discussion petered out without any consensus: http://thread.gmane.org/gmane.comp.python.ideas/32191 As far as I can tell, though, there were only two real objections raised in that previous thread, and IMO neither is really convincing. 
So let me pre-empt those now: Objection 1: This will cause the display of lots of unrelated warnings. Response: You misunderstand the proposal. I'm not suggesting that we display *all* DeprecationWarnings whenever the interactive interpreter is running; I'm only suggesting that we display the deprecation warnings that are warning about *code that was actually typed at the interpreter*.

    # not this
    warnings.filterwarnings("default", category=DeprecationWarning)

    # this
    warnings.filterwarnings("default", category=DeprecationWarning,
                            module="__main__")

So for example, if we have

    # module1.py
    def deprecated_function():
        warnings.warn("stop it!", DeprecationWarning, stacklevel=2)

    # module2.py
    import module1
    def foo():
        module1.deprecated_function()

    >>> import module1, module2
    # This doesn't print a warning, because 'foo' is not deprecated
    # it merely uses deprecated functionality, which is not my problem,
    # because I am merely a user of module1, not the author.
    >>> module2.foo()
    # This *does* print a warning, because now I am using the
    # deprecated functionality directly.
    >>> module1.deprecated_function()
    __main__:1: DeprecationWarning: stop it!

Objection 2: There are lots of places that code is run interactively besides the standard REPL -- there's IDLE and IPython and etc. Response: Well, this isn't really an objection :-). Basically I'm looking for consensus from the CPython team that this is what should happen in the interactive interpreters that they distribute. Other interfaces can then follow that lead or not. (For some value of "follow". By the time you read this IPython may have already made the change: https://github.com/ipython/ipython/pull/8480 ;-).) So, totally awesome idea, let's do it, yes/yes? -n -- Nathaniel J. Smith -- http://vorpus.org From stephen at xemacs.org Thu May 28 03:31:13 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 28 May 2015 10:31:13 +0900 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> Message-ID: <87mw0pfpy6.fsf@uwakimon.sk.tsukuba.ac.jp> Demian Brecht writes: > This is probably a silly idea, but given the above quote and the > new(er) focus on pip and distributed packages, has there been any > discussion around perhaps deprecating (and entirely removing from a > Python 4 release) non-builtin packages and modules? Of course there has, including in parallel to your post. It's a dead obvious idea. I'd point to threads, but none of the ones I remember would be of great use; the same ideas and suggestions that were advanced before have been reproduced here. The problems are that the devil is in the details which are rarely specified, and it would have a huge impact on relationships in the community. For example, in the context of a relatively short timed release cycle, I do recall the debates mentioned by Nick over corporate environments where "Python" (the CPython distribution) is approved as a single package, so stdlib facilities are automatically available to "Python" users, but other packages would need to be approved on a package-by-package basis. There's significant overhead to each such application, so it is efficiency-increasing to have a big stdlib in those environments.
OK, you say, so we automatically bundle the separate stdlib current at a given point in time with the less frequently released Python core distribution. Now, in the Department of Devilsh Details, do those "same core + new stdlib" bundles get the core version number, the stdlib version number (which now must be different!) or a separate bundle version number? In the Bureau of Relationship Impacts, if I were a fascist QA/security person, I would surely view that bundle as a new release requiring a new iteration of the security vetting process (relationship impact). Maybe the departments doing such vetting are not as fascist as I would be, but we'd have to find out, wouldn't we? If we just went ahead with this process and discovered later that 80% of the people who were depending on the "Python" package now cannot benefit from the bundling because the tarball labelled "Python-X.Y" no longer is eternal, that would be sad. And although that is the drag on a core/stdlib release cycle split most often cited, I'm sure there are plenty of others. Is it worth the effort to try to discover and address all/most/some of those? Which ones to address (and we don't know what problems might exist yet!)? > I would think that if there was a system similar to Django Packages > that made discoverability/importing of packages as easy as using > those in the standard library, having a distributed package model > where bug fixes and releases could be done out of band with CPython > releases would likely more beneficial to the end users. If there > was a ?recommended packages? framework, perhaps there could also be > buildbots put to testing interoperability of the recommended > package set. I don't think either "recommended packages" or buildbots scales much beyond Django (and I wonder whether buildbots would even scale to the Django packages ecosystem). But the Python ecosystem includes all of Django already, plus NumPy, SciPy, Pandas, Twisted, Egenix's mx* stuff, a dozen more or less popular ORMs, a similar number of web frameworks more or less directly competing with Django itself, and all the rest of the cast of thousands on PyPI. At the present time, I think we need to accept that integration of a system, even one that implements a single application, has a shallow learning curve. It takes quite a bit of time to become aware of needs (my initial reaction was "json-schema in the stdlib? YAGNI!!"), and some time and a bit of Google-foo to translate needs to search keywords. After that, the Googling goes rapidly -- that's a solved problem, thank you very much DEC AltaVista. Then you hit the multiple implementations wall, and after recovering consciousness, you start moving forward again slowly, evaluating alternatives and choosing one. And that doesn't mean you're done, because those integration decisions will not be set in stone. Eg, for Mailman's 3.0 release, Barry decided to swap out two mission-critical modules, the ORM and the REST generator -- after the first beta was released! Granted, Mailman 3.0 has had an extremely long release process, but the example remains relevant -- such reevaluations occur in .2 or .9 releases all the time.) Except for Googling, none of these tasks are solved problems: the system integrator has to go through the process over again each time with a new system, or in an existing system when the relative strengths of the chosen modules vs. alternatives change dramatically. 
In this last case, it's true that choosing keywords is probably trivial, and the alternative pruning goes faster, but retrofitting the whole system to the new! improved! alternative!! module may be pretty painful -- and there's not necessarily a guarantee it will succeed. IMO, fiddling with the Python release and distribution is unlikely to solve any of the above problems, and is likely to be a step backward for some users. Of course at some point we decide the benefits to other users, the developers, and the release engineers outweigh the costs to the users who don't like the change, but it's never a no-brainer. From graffatcolmingov at gmail.com Thu May 28 04:39:09 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Wed, 27 May 2015 21:39:09 -0500 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: Message-ID: On Wed, May 27, 2015 at 7:29 PM, Nathaniel Smith wrote: > Hi all, > > I'm tired of getting bug reports like this one: > > https://github.com/numpy/numpy/issues/5919 > > where the issue is just that the user didn't see deprecation warnings, > so I just filed a bug report requesting that the interactive Python > REPL start printing DeprecationWarnings when users use deprecated > functionality: > > https://bugs.python.org/issue24294 > > In the bug report it was pointed out that this was discussed on > python-ideas a few months ago, and the discussion petered out without > any consensus: > > http://thread.gmane.org/gmane.comp.python.ideas/32191 > > As far as I can tell, though, there were only two real objections > raised in that previous thread, and IMO neither is really convincing. > So let me pre-empt those now: > > Objection 1: This will cause the display of lots of unrelated warnings. > > Response: You misunderstand the proposal. I'm not suggesting that we > display *all* DeprecationWarnings whenever the interactive interpreter > is running; I'm only suggesting that we display the deprecation > warnings that are warning about *code that was actually typed at the > interpreter*. > > # not this > warnings.filterwarnings("default", category=DeprecationWarning) > > # this > warnings.filterwarnings("default", category=DeprecationWarning, > module="__main__") > > So for example, if we have > > # module1.py > def deprecated_function(): > warnings.warn("stop it!", DeprecationWarning, stacklevel=2) > > # module2.py > import module1 > def foo(): > module1.deprecated_function() > >>> import module1, module2 > # This doesn't print a warning, because 'foo' is not deprecated > # it merely uses deprecated functionality, which is not my problem, > # because I am merely a user of module1, not the author. >>> module2.foo() > # This *does* print a warning, because now I am using the > # deprecated functionality directly. >>> module1.deprecated_function() > __main__:1: DeprecationWarning: stop it! > > > Objection 2: There are lots of places that code is run interactively > besides the standard REPL -- there's IDLE and IPython and etc. > > Response: Well, this isn't really an objection :-). Basically I'm > looking for consensus from the CPython team that this is what should > happen in the interactive interpreters that they distribute. Other > interfaces can then follow that lead or not. (For some value of > "follow". By the time you read this IPython may have already made the > change: https://github.com/ipython/ipython/pull/8480 ;-).) > > So, totally awesome idea, let's do it, yes/yes? > > -n > > -- > Nathaniel J. 
Smith -- http://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ I'm in favor of this. It's especially convincing to me that IPython is considering a similar change. From ncoghlan at gmail.com Thu May 28 04:53:23 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 28 May 2015 12:53:23 +1000 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: Message-ID: On 28 May 2015 at 10:29, Nathaniel Smith wrote: > Hi all, > > I'm tired of getting bug reports like this one: > > https://github.com/numpy/numpy/issues/5919 > > where the issue is just that the user didn't see deprecation warnings, > so I just filed a bug report requesting that the interactive Python > REPL start printing DeprecationWarnings when users use deprecated > functionality: > > https://bugs.python.org/issue24294 +1 from me. For folks that aren't aware of the history, prior to Python 2.7, the situation was like this (DW = DeprecationWarning, PDW = PendingDeprecationWarning):

    Test frameworks: DW visible by default, PDW hidden by default
    Interactive REPL: DW visible by default, PDW hidden by default
    Non-interactive execution: DW visible by default, PDW hidden by default

In Python 2.7, this behaviour was changed to be as follows:

    Test frameworks: both visible by default
    Interactive REPL: both hidden by default
    Non-interactive execution: both hidden by default

This eliminated deprecation warnings from the experience of end users running scripts and applications that merely happened to be written in Python, but also eliminated any real behavioural difference between DW and PDW, making it very unclear as to whether or not retaining PDW still had any practical purpose beyond backwards compatibility. In addition to better alerting end users to genuinely imminent deprecations that they should adapt to ASAP, splitting them again in the interactive REPL case would restore a meaningful behavioural difference that can help pragmatically guide decisions as to which is more appropriate to use for a given deprecation:

    Test frameworks: both visible by default
    Interactive REPL: DW visible by default, PDW hidden by default
    Non-interactive execution: both hidden by default

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From berker.peksag at gmail.com Thu May 28 04:59:04 2015 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Thu, 28 May 2015 05:59:04 +0300 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: Message-ID: On Thu, May 28, 2015 at 5:53 AM, Nick Coghlan wrote: > On 28 May 2015 at 10:29, Nathaniel Smith wrote: >> Hi all, >> >> I'm tired of getting bug reports like this one: >> >> https://github.com/numpy/numpy/issues/5919 >> >> where the issue is just that the user didn't see deprecation warnings, >> so I just filed a bug report requesting that the interactive Python >> REPL start printing DeprecationWarnings when users use deprecated >> functionality: >> >> https://bugs.python.org/issue24294 > > +1 from me. +1 from me, too.
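For reference, a minimal sketch of how the REPL-only default described above could be expressed with the existing warnings machinery; the filterwarnings() call mirrors Nathaniel's earlier snippet, while the isatty() and sys.warnoptions guards are illustrative assumptions about where such a default might be applied, not an agreed mechanism:

    import sys
    import warnings

    # Hypothetical startup hook for an interactive session (e.g. a REPL's own
    # initialisation code); not the final CPython implementation.
    if sys.stdin.isatty() and not sys.warnoptions:
        # Show DeprecationWarning for code typed at the prompt (__main__) only;
        # PendingDeprecationWarning keeps its hidden default, matching the
        # proposed split between DW and PDW.
        warnings.filterwarnings("default", category=DeprecationWarning,
                                module="__main__")

The same filter can be requested from the command line with python -W default::DeprecationWarning:__main__ for anyone who wants to try the behaviour against a current release.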
--Berker From ben+python at benfinney.id.au Thu May 28 07:04:54 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 28 May 2015 15:04:54 +1000 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try References: Message-ID: <85a8wpwavd.fsf@benfinney.id.au> Nick Coghlan writes: > In addition to better alerting end users to genuinely imminent > deprecations that they should adapt to ASAP, splitting them again in > the interactive REPL case would restore a meaningful behavioural > difference that can help pragmatically guide decisions as to which is > more appropriate to use for a given deprecation: > > Test frameworks: both visible by default > Interactive REPL: DW visible by default, PDW hidden by default > Non-interactive execution: both hidden by default Is there already a clear API for a ?test framework? or ?interactive REPL? to declare itself as such? Do all the test frameworks and interactive REPL implementations already follow that API? I ask this to know whether your proposal entails that each implementation of a test framework or REPL will likely behave differently from other implementations in how it fits into the above categories. -- \ ?We have clumsy, sputtering, inefficient brains?. It is a | `\ *struggle* to be rational and objective, and failures are not | _o__) evidence for an alternative reality.? ?Paul Z. Myers, 2010-10-14 | Ben Finney From tjreedy at udel.edu Thu May 28 08:22:43 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 28 May 2015 02:22:43 -0400 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: Message-ID: On 5/27/2015 8:29 PM, Nathaniel Smith wrote: > https://bugs.python.org/issue24294 I had already planned to add this to Idle, at least as an option. I had not seen the issue yet. I am pretty ignorant about the warnings system so I posted some questions there. I was thinking to make this an option, but I do not know how to convey options set in the Idle process to the user execution process, as the rpc protocol seems undocumented. I might just turn DeprecationWarnings on the way they used to be and will be in the console interpreter, but I am slightly worried about warnings being intermixed with user output. This is not a problem with tracebacks as they end user output. -- Terry Jan Reedy From ncoghlan at gmail.com Thu May 28 08:46:12 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 28 May 2015 16:46:12 +1000 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: <85a8wpwavd.fsf@benfinney.id.au> References: <85a8wpwavd.fsf@benfinney.id.au> Message-ID: On 28 May 2015 at 15:04, Ben Finney wrote: > Nick Coghlan writes: > >> In addition to better alerting end users to genuinely imminent >> deprecations that they should adapt to ASAP, splitting them again in >> the interactive REPL case would restore a meaningful behavioural >> difference that can help pragmatically guide decisions as to which is >> more appropriate to use for a given deprecation: >> >> Test frameworks: both visible by default >> Interactive REPL: DW visible by default, PDW hidden by default >> Non-interactive execution: both hidden by default > > Is there already a clear API for a ?test framework? or ?interactive > REPL? to declare itself as such? > > Do all the test frameworks and interactive REPL implementations already > follow that API? It's a convention. 
unittest sets the convention for test frameworks (and, as far as I am aware, other popular test runners like nose and py.test abide by it), while the default REPL, the code module, IDLE and IPython will set the convention for REPLs (assuming we change it away from matching the non-interactive default behaviour) > I ask this to know whether your proposal entails that each > implementation of a test framework or REPL will likely behave > differently from other implementations in how it fits into the above > categories. Test frameworks and REPLs that don't adjust the warning filters on startup will continue to default to the non-interactive behaviour. Nobody is proposing to change that. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Thu May 28 10:26:04 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 28 May 2015 10:26:04 +0200 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: Message-ID: <5566D11C.6050002@egenix.com> On 28.05.2015 02:29, Nathaniel Smith wrote: > Hi all, > > I'm tired of getting bug reports like this one: > > https://github.com/numpy/numpy/issues/5919 Well, in that particular case, I think numpy should raise a TypeError instead of a DeprecationWarning :-) > where the issue is just that the user didn't see deprecation warnings, > so I just filed a bug report requesting that the interactive Python > REPL start printing DeprecationWarnings when users use deprecated > functionality: > > https://bugs.python.org/issue24294 +1 on the general idea, but I think this needs some more thought on the topic of how you detect an interactive session that's being used by a user. You wouldn't want these warning to show up when piping in commands to a Python interpreter. In eGenix PyRun we use sys.stdin.isatty() to check whether we want an interactive prompt or not. I guess the same could be done here. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 28 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jonathan at slenders.be Thu May 28 10:53:13 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Thu, 28 May 2015 10:53:13 +0200 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: <5566D11C.6050002@egenix.com> References: <5566D11C.6050002@egenix.com> Message-ID: +1 on this too. I'm author of the "ptpython" REPL. Nathaniel Smith: could you tell me what I should do? Is it enough when I make sure that all code runs in __main__ and running this command at the start? warnings.filterwarnings("default", category=DeprecationWarning, module="__main__") Jonathan 2015-05-28 10:26 GMT+02:00 M.-A. 
Lemburg : > On 28.05.2015 02:29, Nathaniel Smith wrote: > > Hi all, > > > > I'm tired of getting bug reports like this one: > > > > https://github.com/numpy/numpy/issues/5919 > > Well, in that particular case, I think numpy should raise a TypeError > instead of a DeprecationWarning :-) > > > where the issue is just that the user didn't see deprecation warnings, > > so I just filed a bug report requesting that the interactive Python > > REPL start printing DeprecationWarnings when users use deprecated > > functionality: > > > > https://bugs.python.org/issue24294 > > +1 on the general idea, but I think this needs some more thought > on the topic of how you detect an interactive session that's being > used by a user. > > You wouldn't want these warning to show up when piping in commands > to a Python interpreter. > > In eGenix PyRun we use sys.stdin.isatty() to check whether we > want an interactive prompt or not. I guess the same could be done > here. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, May 28 2015) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu May 28 11:04:05 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 May 2015 02:04:05 -0700 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: <5566D11C.6050002@egenix.com> Message-ID: On Thu, May 28, 2015 at 1:53 AM, Jonathan Slenders wrote: > +1 on this too. > > I'm author of the "ptpython" REPL. > > Nathaniel Smith: could you tell me what I should do? > > Is it enough when I make sure that all code runs in __main__ and running > this command at the start? > warnings.filterwarnings("default", category=DeprecationWarning, > module="__main__") That should do it, yes. -- Nathaniel J. Smith -- http://vorpus.org From greg at krypto.org Thu May 28 16:27:38 2015 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 28 May 2015 14:27:38 +0000 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: <5566D11C.6050002@egenix.com> References: <5566D11C.6050002@egenix.com> Message-ID: On Thu, May 28, 2015, 1:26 AM M.-A. 
Lemburg wrote: > On 28.05.2015 02:29, Nathaniel Smith wrote: > > Hi all, > > > > I'm tired of getting bug reports like this one: > > > > https://github.com/numpy/numpy/issues/5919 > > Well, in that particular case, I think numpy should raise a TypeError > instead of a DeprecationWarning :-) > > > where the issue is just that the user didn't see deprecation warnings, > > so I just filed a bug report requesting that the interactive Python > > REPL start printing DeprecationWarnings when users use deprecated > > functionality: > > > > https://bugs.python.org/issue24294 > > +1 on the general idea, but I think this needs some more thought > on the topic of how you detect an interactive session that's being > used by a user. > > You wouldn't want these warning to show up when piping in commands > to a Python interpreter. > > In eGenix PyRun we use sys.stdin.isatty() to check whether we > want an interactive prompt or not. I guess the same could be done > here. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, May 28 2015) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skip.montanaro at gmail.com Thu May 28 16:34:23 2015 From: skip.montanaro at gmail.com (Skip Montanaro) Date: Thu, 28 May 2015 09:34:23 -0500 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> Message-ID: On Wed, May 27, 2015 at 2:03 PM, Donald Stufft wrote: > I?m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. While perhaps nice in theory, the process of getting a package into the standard library provides a number of filters (hurdles, if you will) through which a package much pass (or surmount) before it is deemed suitable for broad availability by default to users, and for support by the core development team. Today, that includes documentation, unit tests, broad acceptance by the user community (in many cases), and a commitment by the core development team to maintain the package for the foreseeable future. To the best of my knowledge, none of those filters apply to PyPI-cataloged packages. That is not to say that the current process doesn't have its problems. Some really useful stuff is surely not available in the core. 
If the core development team was stacked with people who program numeric applications for a living, perhaps numpy or something similar would be in the core today. The other end of the spectrum is Perl. It has been more than a decade since I did any Perl programming, and even then, not much, but I still remember how confused I was trying to choose a package to manipulate dates and times from CPAN with no guidance. I know PyPI has a weight field. I just went back and reread the footnote describing it, but I really have no idea how it operates. I'm sure someone nefarious could game that system so their security compromising package drifts toward the top of the list. Try searching for "xml." 2208 packages are return, with weights ranging from 1 to 9. 107 packages have weights of 8 or 9. If the standard library is to dwindle down to next-to-nothing, a better scheme for package selection/recommendation will have to be developed. Skip From brett at python.org Thu May 28 16:45:01 2015 From: brett at python.org (Brett Cannon) Date: Thu, 28 May 2015 14:45:01 +0000 Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive interpreter, second try In-Reply-To: References: <5566D11C.6050002@egenix.com> Message-ID: On Thu, May 28, 2015 at 10:28 AM Gregory P. Smith wrote: > > > On Thu, May 28, 2015, 1:26 AM M.-A. Lemburg wrote: > >> On 28.05.2015 02:29, Nathaniel Smith wrote: >> > Hi all, >> > >> > I'm tired of getting bug reports like this one: >> > >> > https://github.com/numpy/numpy/issues/5919 >> >> Well, in that particular case, I think numpy should raise a TypeError >> instead of a DeprecationWarning :-) >> >> > where the issue is just that the user didn't see deprecation warnings, >> > so I just filed a bug report requesting that the interactive Python >> > REPL start printing DeprecationWarnings when users use deprecated >> > functionality: >> > >> > https://bugs.python.org/issue24294 >> >> +1 on the general idea, but I think this needs some more thought >> on the topic of how you detect an interactive session that's being >> used by a user. >> >> You wouldn't want these warning to show up when piping in commands >> to a Python interpreter. >> >> In eGenix PyRun we use sys.stdin.isatty() to check whether we >> want an interactive prompt or not. I guess the same could be done >> here. >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, May 28 2015) >> >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >> >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >> >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >> ________________________________________________________________________ >> >> ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > +1 > +1 -------------- next part -------------- An HTML attachment was scrubbed... 
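For anyone who wants this behaviour today, a minimal sketch combining the two suggestions in this thread -- the isatty() check and the filter Nathaniel gave Jonathan -- could be dropped into a PYTHONSTARTUP file; this is only an illustration, not the proposed patch:

import sys
import warnings

# Only show DeprecationWarnings for code typed at an interactive prompt,
# not for scripts piped into the interpreter.
if sys.stdin.isatty():
    # Restricting the filter to "__main__" keeps deprecated calls made
    # inside libraries quiet; only the user's own REPL input warns.
    warnings.filterwarnings("default", category=DeprecationWarning,
                            module="__main__")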
URL: From chris.barker at noaa.gov Thu May 28 18:06:44 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 28 May 2015 09:06:44 -0700 Subject: [Python-ideas] Cmake as build system In-Reply-To: References: Message-ID: There was a big thread about this recently l-- and many more before that, I'm sure. Please read them before posting more... But: cPython is an open source project -- while it would be NICE to get core developer's support before going out and doing something new, if anyone is convinced that they can set up a better build system for Python -- go ahead and do it -- if it turns out all skeptic are wrong, and the issues they raise can be overcome easily enough -- then you will have proved that. But going on and on on this list about how other people should do something different isn't going to get you anywhere. One small note: > Take > a SCons, for example, and try to port that to Python 3. You will see the > key > points that need to be solved (see the bulletproof unicode thread in this > list). > uhm, in that thread, you ask for a Python2 solution (so apparently nothing to do with porting to py3) -- whereas in Python3, there is surrogateescape support. So while yes, python3's consistent, robust approach to Unicode has made processing ill-defined text harder than python2, this particular problem HAS been addressed, and in fact, is easier to to do in py2 than py3. discussing Python usage issues in development > lists is discouraged even though the issues raised there are important for > language usability. > you can only put so much on one list -- if you want to discuss how to do something with the existing implementation of Python (2 or 3...) then an "ideas" list or "devel" list isn't the right place. What is the problem with that? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Thu May 28 19:07:13 2015 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 28 May 2015 12:07:13 -0500 Subject: [Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library) In-Reply-To: References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com> <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com> <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp> <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com> <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com> Message-ID: On Thu, May 28, 2015 at 9:34 AM, Skip Montanaro wrote: > On Wed, May 27, 2015 at 2:03 PM, Donald Stufft wrote: > > I?m of the opinion that, given a brand new language, it makes more sense > to have really good packaging tools built in, but not to have a standard > library. > > While perhaps nice in theory, the process of getting a package into > the standard library provides a number of filters (hurdles, if you > will) through which a package much pass (or surmount) before it is > deemed suitable for broad availability by default to users, and for > support by the core development team. Today, that includes > documentation, unit tests, broad acceptance by the user community (in > many cases), and a commitment by the core development team to maintain > the package for the foreseeable future. To the best of my knowledge, > none of those filters apply to PyPI-cataloged packages. 
That is not to > say that the current process doesn't have its problems. Some really > useful stuff is surely not available in the core. If the core > development team was stacked with people who program numeric > applications for a living, perhaps numpy or something similar would be > in the core today. > > The other end of the spectrum is Perl. It has been more than a decade > since I did any Perl programming, and even then, not much, but I still > remember how confused I was trying to choose a package to manipulate > dates and times from CPAN with no guidance. I know PyPI has a weight > field. I just went back and reread the footnote describing it, but I > really have no idea how it operates. I'm sure someone nefarious could > game that system so their security compromising package drifts toward > the top of the list. Try searching for "xml." 2208 packages are > return, with weights ranging from 1 to 9. 107 packages have weights of > 8 or 9. If the standard library is to dwindle down to next-to-nothing, > a better scheme for package selection/recommendation will have to be > developed. > A workflow for building CI-able, vendorable packages with coverage and fuzzing? * xUnit XML test results * http://schema.org/AssessAction * Quality 1 (Use Cases n, m) * Quality 2 (Use cases x, y) * SecurityAssessAction * http://schema.org/ChooseAction * Why am I downloading duplicate functionality? * http://schema.org/LikeAction * Community feedback is always helpful. Or, a workflow for maintaining a *distribution of* **versions of** (C and) Python packages? > Skip > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Fri May 29 10:10:53 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 29 May 2015 11:10:53 +0300 Subject: [Python-ideas] Lossless bulletproof conversion to unicode (backslashing) In-Reply-To: References: Message-ID: On Wed, May 27, 2015 at 6:28 PM, Paul Moore wrote: > On 26 May 2015 at 19:30, anatoly techtonik wrote: >> In real world you have to deal with broken and invalid >> output and UnicodeDecode crashes is not an option. >> The unicode() constructor proposes two options to >> deal with invalid output: >> >> 1. ignore - meaning skip and corrupt the data >> 2. replace - just corrupt the data > > There are other error handlers, specifically surrogateescape is > designed for this use. Only in Python 3.x admittedly, but this list is > about future versions of Python, so that's what matters here. Forwarded message to python-list and now I have a thread schizophrenia. I read it like python-list is also about Python 3 and got really mad about that. I was a click away from sending me into the ban list again. =) Ok. Closing thread in python-idea. This needs to be reopened when the thread is about Python 4 (which should be all about improving user experience and assessment of the results). -- anatoly t. 
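For the record, the surrogateescape round-trip Paul points to looks roughly like this in Python 3 -- a minimal sketch, with made-up byte values standing in for a file name in an unknown legacy encoding:

raw = b"\xd0\x9f\xf2\xe5st"   # arbitrary bytes, encoding unknown

# Undecodable bytes are smuggled through as lone surrogates instead of
# raising UnicodeDecodeError, so nothing is lost or corrupted:
text = raw.decode("utf-8", errors="surrogateescape")
assert text.encode("utf-8", errors="surrogateescape") == raw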
From techtonik at gmail.com Fri May 29 10:56:44 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 29 May 2015 11:56:44 +0300 Subject: [Python-ideas] Why decode()/encode() name is harmful Message-ID: First, let me start with The Curse of Knowledge https://en.wikipedia.org/wiki/Curse_of_knowledge which can be summarized as: "Once you get something, it becomes hard to think how it was to be without it". I assume that all of you know difference between decode() and encode(), so you're cursed and therefore think that getting that right it is just a matter of reading documentation, experience and time. But quite a lot of had passed and Python 2 is still there, and Python 3, which is all unicode at the core (and which is great for people who finally get it) is not as popular. So, remember that you are biased towards (or against) decode/unicode perception. Now imaging a person who has a text file. The person need to process that with Python. That person is probably a journalist and doesn't know anything that "any developer should know about unicode". In Python 2 he just copy pastes regular expressions to match the letter and is happy. In Python 3 he needs to *convert* that text to unicode. Then he tries to read the documentation, it already starts to bring conflict to his mind. It says to him to "decode" the text. I don't know about you, but when I'm being told to decode the text, I assume that it is crypted, because I watched a few spy movies including ones with Sherlock Holmes and Stierlitz. But the text looks legit to me, I can clearly see and read it and now you say that I need to decode it. You're basically ruining my world right here. No wonder that I will resist. I probably stressed, has a lot of stuff to do, and you are trying to load me with all those abstract concepts that conflict with what I know. No way! Unless I have a really strong motivation (or scientific background) there is no chance to get this stuff for me right on this day. I will probably repeat the exercise and after a few tries will get the output right, but there is no chance I will remember this thing on that day. Because rewiring neural paths in my brain is much harder that paving them from scratch. -- anatoly t. From rosuav at gmail.com Fri May 29 17:57:53 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 30 May 2015 01:57:53 +1000 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: References: Message-ID: On Fri, May 29, 2015 at 6:56 PM, anatoly techtonik wrote: > Then he tries to read the documentation, it > already starts to bring conflict to his mind. It says > to him to "decode" the text. I don't know about you, > but when I'm being told to decode the text, I > assume that it is crypted, because I watched a > few spy movies including ones with Sherlock > Holmes and Stierlitz. But the text looks legit to me, > I can clearly see and read it and now you say that > I need to decode it. This is because you fundamentally do not understand the difference between bytes and text. Consequently, you are trying to shoehorn new knowledge into your preconceived idea that the file *already contains text*, which is not true. Go read: http://www.joelonsoftware.com/articles/Unicode.html http://nedbatchelder.com/text/unipain.html Also, why is this on python-ideas? Talk about this sort of thing on python-list. 
ChrisA From random832 at fastmail.us Fri May 29 21:32:04 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 29 May 2015 15:32:04 -0400 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: References: Message-ID: <1432927924.2536251.281727161.2C137F01@webmail.messagingengine.com> On Fri, May 29, 2015, at 04:56, anatoly techtonik wrote: > First, let me start with The Curse of Knowledge > https://en.wikipedia.org/wiki/Curse_of_knowledge > which can be summarized as: > > "Once you get something, it becomes hard > to think how it was to be without it". Let's think about how it is to be without _the idea that text is a byte stream in the first place_ - which some people here learned from Python 2, some learned from C, some may have learned from some other language. It was the way things always were, after all, before Unicode came along. The language I was using the most immediately before I started using Python was C#. And C# uses Unicode (well, UTF-16, but the important thing is that it's not an ASCII-compatible sequence of bytes) for strings. One could argue that this paradigm - and the attendant "encode" and "decode" concepts, and stream wrappers that take care of it in the common cases, are _the future_, and that one day nobody will learn that text's natural form is as a sequence of ASCII-compatible bytes... even if text files continue to be encoded that way on the disk. > Now imaging a person who has a text file. The > person need to process that with Python. That > person is probably a journalist and doesn't know > anything that "any developer should know about > unicode". In Python 2 he just copy pastes regular > expressions to match the letter and is happy. In > Python 3 he needs to *convert* that text to unicode. You don't have to do so explicitly, if the text file's encoding matches your locale. You can just open the file and read it, and it will open as a text-mode stream that takes care of this for you and returns unicode strings. It's a text file, so you open it in text mode. Even if it doesn't match your locale, the proper way is to pass an "encoding" argument to the open function; not to go so deep as to open it in binary mode and decode the bytes yourself. From graffatcolmingov at gmail.com Fri May 29 21:47:25 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Fri, 29 May 2015 14:47:25 -0500 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: References: Message-ID: On Fri, May 29, 2015 at 3:56 AM, anatoly techtonik wrote: > First, let me start with The Curse of Knowledge > https://en.wikipedia.org/wiki/Curse_of_knowledge > which can be summarized as: > > "Once you get something, it becomes hard > to think how it was to be without it". > > I assume that all of you know difference between > decode() and encode(), so you're cursed and > therefore think that getting that right it is just a > matter of reading documentation, experience and > time. But quite a lot of had passed and Python 2 > is still there, and Python 3, which is all unicode > at the core (and which is great for people who > finally get it) is not as popular. So, remember that > you are biased towards (or against) > decode/unicode perception. > > > Now imaging a person who has a text file. The > person need to process that with Python. That > person is probably a journalist and doesn't know > anything that "any developer should know about > unicode". In Python 2 he just copy pastes regular > expressions to match the letter and is happy. 
In > Python 3 he needs to *convert* that text to unicode. > > Then he tries to read the documentation, it > already starts to bring conflict to his mind. It says > to him to "decode" the text. I don't know about you, > but when I'm being told to decode the text, I > assume that it is crypted, because I watched a > few spy movies including ones with Sherlock > Holmes and Stierlitz. But the text looks legit to me, > I can clearly see and read it and now you say that > I need to decode it. You're basically ruining my > world right here. No wonder that I will resist. I > probably stressed, has a lot of stuff to do, and you > are trying to load me with all those abstract > concepts that conflict with what I know. No way! > Unless I have a really strong motivation (or > scientific background) there is no chance to get > this stuff for me right on this day. I will probably > repeat the exercise and after a few tries will get > the output right, but there is no chance I will > remember this thing on that day. Because > rewiring neural paths in my brain is much harder > that paving them from scratch. > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ So, ignoring your lack of suggestions for different names, would you also argue that the codecs module (which is how people should be handling this when dealing with files on disk) should also be renamed? codecs is a portmanteau of coder-decoder and deals with converting the code-points to bytes and back. codecs, "Encoding", and "Decoding" are also used for non-text formats too (e.g., files containing video or audio). They in all of the related contexts they have the same meaning. I'm failing to understand your problem with the terminology. From abarnert at yahoo.com Fri May 29 21:57:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 29 May 2015 12:57:16 -0700 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: References: Message-ID: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com> On May 29, 2015, at 01:56, anatoly techtonik wrote: > > First, let me start with The Curse of Knowledge > https://en.wikipedia.org/wiki/Curse_of_knowledge > which can be summarized as: > > "Once you get something, it becomes hard > to think how it was to be without it". > > I assume that all of you know difference between > decode() and encode(), so you're cursed and > therefore think that getting that right it is just a > matter of reading documentation, experience and > time. But quite a lot of had passed and Python 2 > is still there, and Python 3, which is all unicode > at the core (and which is great for people who > finally get it) is not as popular. So, remember that > you are biased towards (or against) > decode/unicode perception. > > > Now imaging a person who has a text file. The > person need to process that with Python. That > person is probably a journalist and doesn't know > anything that "any developer should know about > unicode". In Python 2 he just copy pastes regular > expressions to match the letter and is happy. In > Python 3 he needs to *convert* that text to unicode. No he doesn't. In Python 3, unless he goes out of his way to open the file in binary mode, or use binary string literals for his regexps, that text is unicode from the moment his code sees it. So he doesn't have to read the docs. 
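Concretely, the journalist's script needs no explicit conversion at all -- a minimal sketch (the file name and pattern are invented):

import re

with open("article.txt", encoding="utf-8") as f:
    text = f.read()              # already str; open() handled the decoding

words = re.findall(r"\w+", text)  # in Python 3, \w matches Cyrillic letters too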
Python 3 was deliberately designed to make it easier to never have to use bytes internally, so 80% of the users never even have to think about bytes (even at the cost of sometimes making things harder for the more advanced coders who need to write the low-level stuff like network protocol handlers and can't avoid bytes). Now, all those things _are_ still problems for people who use Python 2. But the only way to fix that is to get those people--and, even more importantly, new people--using Python 3. Which means not introducing any new radical inconsistencies in between Python 2 and 3 (or 4) for no good reason--or, of course, between Python 3.5 and 3.6 (or 4.0). > Then he tries to read the documentation, it > already starts to bring conflict to his mind. It says > to him to "decode" the text. Where in the documentation does it ever tell you to decode text? If you're inventing fictitious documentation that would confuse people if it existed but doesn't because it doesn't, you can just as well claim that the int method is confusing because it tells him he needs to truncate his integers even though integers are already truncated. Yes, that would be confusing--which is why the docs don't say that. > I don't know about you, > but when I'm being told to decode the text, I > assume that it is crypted, because I watched a > few spy movies including ones with Sherlock > Holmes and Stierlitz. If you open Shift-JIS text as if it were Latin-1 and see a mess of mojibake, it doesn't seem that surprising to be told that you need to decode it properly. If you open UTF-8 text as if it were UTF-8, and Python has already decoded it for you under the covers, you never have to think about it, so there's no opportunity to be surprised. > But the text looks legit to me, > I can clearly see and read it and now you say that > I need to decode it. You're basically ruining my > world right here. No wonder that I will resist. I > probably stressed, has a lot of stuff to do, and you > are trying to load me with all those abstract > concepts that conflict with what I know. No way! > Unless I have a really strong motivation (or > scientific background) there is no chance to get > this stuff for me right on this day. I will probably > repeat the exercise and after a few tries will get > the output right, but there is no chance I will > remember this thing on that day. That's a good point. That's exactly why you see people add random calls to str, unicode, encode, and decode to their Python 2 code until it seems to do the right thing on their one test input, and then freak out when it doesn't work on their second test input and go post a confused mess on StackOverflow or Python-list asking someone to solve it for them. What's the solution? Make it as unlikely as possible that you'll run into the problem in the first place by nearly forcing you to deal in Unicode all the way through your script, and, when you do need to deal with manual encoding and decoding, make the almost-certainly-wrong nonsensical code impossible to write by not having bytes.encode or str.decode or automatic conversions between the two types. Of course that's a backward-incompatible change, and maybe a radical-enough one that it'll take half a decade for the ecosystem to catch up to the point where most users can benefit from it. Which makes it a good thing that Python started that process half a decade ago. 
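To spell out what "impossible to write" means in practice, a small illustration:

b"caf\xc3\xa9".decode("utf-8")   # fine: bytes -> str
"caf\xe9".encode("utf-8")        # fine: str -> bytes

# The Python 2 nonsense is simply gone:
# b"caf\xc3\xa9".encode("utf-8")  -> AttributeError: 'bytes' object has no attribute 'encode'
# "caf\xe9".decode("utf-8")       -> AttributeError: 'str' object has no attribute 'decode'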
So now, to anyone who runs into that confusion, there's an answer: just upgrade from 2.7 to 3.4, undo all the changes you introduced trying to solve this problem incorrectly, and your original code just works. Even if you had a better solution than Python 3's (which I doubt, but let's assume you do), what good would that do? That would make the answer: wait 18 months for Python 3.6, then another 12 months for the last of the packages you depend on to finally adjust to the breaking incompatibility that 3.6 introduced, then undo all the changes you introduced trying to solve this problem incorrectly, then make different, more sensible, changes. That's clearly not a better answer. So, unless you have a better solution than Python 3's and also have a time machine to go back to 2007, what could you possibly have to propose? > Because > rewiring neural paths in my brain is much harder > that paving them from scratch. > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Sat May 30 02:18:12 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 30 May 2015 10:18:12 +1000 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com> References: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com> Message-ID: <20150530001811.GS932@ando.pearwood.info> On Fri, May 29, 2015 at 12:57:16PM -0700, Andrew Barnert via Python-ideas wrote: Before anyone else engages too deeply in this off-topic discussion, some background: Anatoly wrote to python-list asking for help dealing with a problem where he has a bunch of bytes (file names) which probably represent Russian text but in an unknown legacy encoding, and he wants to round-trip it from bytes to Unicode and back again losslessly. (Russian is a particularly nasty example, because there are multiple mutually-incompatible Russian encodings in widespread use.) As far as I can see, he has been given the solution, or at least a potential solution, on python-list, but as far as I can tell he either hasn't read it, or doesn't like the solutions offerred and so is ignoring them. So there's a real problem hidden here, buried beneath the dramatic presentation of imaginary journalists processing text, but I don't think it's a problem that needs discussing *here* (at least not unless somebody comes up with a concrete proposal or idea to be discussed). A couple more comments follow: > On May 29, 2015, at 01:56, anatoly techtonik wrote: > > In Python 2 he just copy pastes regular > > expressions to match the letter and is happy. In > > Python 3 he needs to *convert* that text to unicode. > > No he doesn't. In Python 3, unless he goes out of his way to open the > file in binary mode, or use binary string literals for his regexps, > that text is unicode from the moment his code sees it. So he doesn't > have to read the docs. This is not the case when you have to deal with unknown encodings. And from the perspective of people who only have ASCII (or at worst, Latin-1) text, or who don't care about moji-bake, Python 2 appears easier to work with. To quote Chris Smith: "I find it amusing when novice programmers believe their main job is preventing programs from crashing. 
More experienced programmers realize that correct code is great, code that crashes could use improvement, but incorrect code that doesn?t crash is a horrible nightmare." Python 2's string handling is designed to minimize the chance of getting an exception when dealing with text in an unknown encoding, but the consequence is that it also minimizes the chance of it doing the right thing except by accident. In Python 2, you can give me a bunch of arbitrary bytes as a string, and I can read them as text, in a sort of ASCII-ish pseudo-encoding, regardless of how inappropriate it is or how much moji-bake it generates. But it won't raise an exception, which for some people is all that matters. Moving to Unicode (in Python 2 or 3) can come as a shock to users who have never had to think about this before. Moji-bake is ubiquitous on the Internet, so there is a real problem to be solved. Python 2's string model is not the way to solve it. I don't think there is any "no-brainer" solution which doesn't involve thinking about bytes and encodings, but if Anatoly or anyone else wants to suggest one, we can discuss it. [...] > Now, all those things _are_ still problems for people who use Python > 2. But the only way to fix that is to get those people--and, even more > importantly, new people--using Python 3. Which means not introducing > any new radical inconsistencies in between Python 2 and 3 (or 4) for > no good reason--or, of course, between Python 3.5 and 3.6 (or 4.0). These same issues occur in Python 2 if you exclusively use unicode strings u"" instead of the default string type. [...] > So, unless you have a better solution than Python 3's and also have a > time machine to go back to 2007, what could you possibly have to > propose? Surely you would have to go back to 1953 when the ASCII encoding first started, so we can skip over the whole mess of dozens of mutually incompatible "extended ASCII" code pages? -- Steve From tjreedy at udel.edu Sat May 30 02:38:52 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 29 May 2015 20:38:52 -0400 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: References: Message-ID: On 5/29/2015 4:56 AM, anatoly techtonik wrote: This essay, which is mostly about the clash between python2 thinking and python3 thinking, is off topic for this list. Please use python-list, which is open to any python-related topic. -- Terry Jan Reedy From wes.turner at gmail.com Sat May 30 14:54:35 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 30 May 2015 07:54:35 -0500 Subject: [Python-ideas] import features; if "print_function" in features.data Message-ID: Would it be useful to have one Python source file with an OrderedDict of (API_feat_lbl, [(start, None)]) mappings and a lookup? * [ ] feat/version segments/rays map * [ ] .lookup("print[_function]") Syntax ideas: * has("print[_function]") Advantages * More pythonic to check for features than capabilities * Forward maintainability Disadvantages: * Alternatives: * six, nine, future * try/import ENOENT -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sat May 30 15:15:31 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 30 May 2015 08:15:31 -0500 Subject: [Python-ideas] a segment tree of available features for the current/a given Python interpreter Message-ID: To reframe the problem (set the subject line), a segment tree of available features for the current/a given Python interpreter would be useful. * [ ] this could be e.g. 
'features.py' and * [ ] requested of (new) implementations (with historical data) * [ ] very simple Python package (python.features ?) On May 30, 2015 7:54 AM, "Wes Turner" wrote: > Would it be useful to have one Python source file with an OrderedDict of > (API_feat_lbl, [(start, None)]) mappings > and a lookup? > > * [ ] feat/version segments/rays map > * [ ] .lookup("print[_function]") > > Syntax ideas: > > * has("print[_function]") > > Advantages > > * More pythonic to check for features than capabilities > * Forward maintainability > > Disadvantages: > > * > > Alternatives: > > * six, nine, future > * try/import ENOENT > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat May 30 15:25:45 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 May 2015 23:25:45 +1000 Subject: [Python-ideas] import features; if "print_function" in features.data In-Reply-To: References: Message-ID: On 30 May 2015 at 22:54, Wes Turner wrote: > Would it be useful to have one Python source file with an OrderedDict of > (API_feat_lbl, [(start, None)]) mappings > and a lookup? Your choice of example means I'm not sure what additional capabilities you're seeking. The __future__ module already aims to cover this for compiler directives: >>> import __future__ >>> __future__.all_feature_names ['nested_scopes', 'generators', 'division', 'absolute_import', 'with_statement', 'print_function', 'unicode_literals'] >>> __future__.print_function _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536) If you're looking for particular builtins, importing builtins (Python 3) or __builtin__ (Python 2) and checking attributes lets you see what is available via hasattr(). hasattr() will also cover most feature check needs for other modules (file descriptor support in the os module is an exception, hence the related dedicated query APIs for that). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat May 30 15:39:16 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 May 2015 23:39:16 +1000 Subject: [Python-ideas] a segment tree of available features for the current/a given Python interpreter In-Reply-To: References: Message-ID: On 30 May 2015 at 23:15, Wes Turner wrote: > To reframe the problem (set the subject line), a segment tree of available > features for the current/a given Python interpreter would be useful. > > * [ ] this could be e.g. 'features.py' and > * [ ] requested of (new) implementations (with historical data) > * [ ] very simple Python package (python.features ?) Now I'm even more convinced I'm not following you properly :) Is it perhaps a request for a programmatically queryable version of Ned Batchelder's "What's in which Python?" articles? http://nedbatchelder.com/blog/201109/whats_in_which_python.html http://nedbatchelder.com/blog/201310/whats_in_which_python_3.html If yes, that seems like a reasonable idea, but would likely work better as a community maintained PyPI module, rather than as a standard library module. 
My rationale for that: * older versions would need support for new feature checks to avoid failing on the feature checker * other implementations could contribute as needed to adjust feature checks that were overly specific to CPython * the community could collectively determine what "features" were sufficiently interesting to be worth tracking through the relevant projects issue tracker, rather than the core development team needing to decide a priori which new features in each release end users are going to want to conditionally adopt If that interpretation of the question is incorrect, then you're going to need to expand more on the problem you're hoping to solve with this suggestion. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ron3200 at gmail.com Sat May 30 17:45:54 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 30 May 2015 11:45:54 -0400 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import Message-ID: While trying to debug a problem and thinking that it may be an issue with circular imports, I come up with an interesting idea that might be of value. It wasn't a circular import problem in this case, but I may have found the actual bug sooner if I didn't need to be concerned about that possibility. I have had some difficulty splitting larger modules into smaller modules in the past where if I split the code by functionality, it doesn't correspond with how the code is organized by dependency. The result is an imported module needs to import the module it's imported into. Which just doesn't feel right to me. The solution I found was to call a function to explicitly set the shared items in the imported module. (The example is from a language I'm experimenting with written in python. So don't be concerned about the shared object names in this case.) In the main module... import parse parse.set_main(List=List, Keyword=Keyword, Name=Name, String=String, Express=Express, keywords=keywords, raise_with=raise_with, nil=nil) And in parse... # Sets shared objects from main module. from collections import namedtuple def set_main(**d): global main main = namedtuple(__name__, d.keys()) for k, v in d.items(): setattr(main, k, v) After this, the sub module access's the parent modules objects with... main.Keyword Just the same as if the parent module was imported as main, but it only shares what is intended to be shared within this specific imported module. I think that is better than using "import from" in the sub module. And an improvement over importing the whole module which can possibly expose too much. The benifits: * The shared items are explicitly set by the parent module. * If an item is missing, it results in a nice error message * Accessing objects works the same as if import was used. * It avoids (most) circular import problems. * It's easier to think about once you understand what it does. The problem is the submodule needs a function to make it work. I think it would be nice if it could be made a builtin but doing that may be tricky. Where I've used "main", it could set the name of the shared parent module(s) automatically. The name of the function probably should be "shared" or "sharing". (Or some other thing that makes sense.) I would like to hear what other here think, and of course if there are any obvious improvements that can be made. Would this be a good candidate for a new builtin? 
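For comparison, here is roughly the same idea written with types.SimpleNamespace instead of a namedtuple -- just a sketch, and "shared" is only a placeholder name, not a proposed spelling:

# In the sub module (parse.py):
from types import SimpleNamespace

main = None

def shared(**objects):
    # Called once by the parent module with exactly the objects it
    # intends this module to see.
    global main
    main = SimpleNamespace(**objects)

# In the parent module:
#     import parse
#     parse.shared(Keyword=Keyword, Name=Name, keywords=keywords)
# after which parse uses main.Keyword etc. exactly as before.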
Cheers, Ron From storchaka at gmail.com Sat May 30 18:13:44 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 30 May 2015 19:13:44 +0300 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import In-Reply-To: References: Message-ID: On 30.05.15 18:45, Ron Adam wrote: > > While trying to debug a problem and thinking that it may be an issue > with circular imports, I come up with an interesting idea that might be > of value. It wasn't a circular import problem in this case, but I may > have found the actual bug sooner if I didn't need to be concerned about > that possibility. > > I have had some difficulty splitting larger modules into smaller modules > in the past where if I split the code by functionality, it doesn't > correspond with how the code is organized by dependency. The result is > an imported module needs to import the module it's imported into. Which > just doesn't feel right to me. > > The solution I found was to call a function to explicitly set the shared > items in the imported module. Why not move all shared objects in common module? Then in both main and parse module you can write from common import * From ron3200 at gmail.com Sat May 30 18:51:59 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 30 May 2015 12:51:59 -0400 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import In-Reply-To: References: Message-ID: On 05/30/2015 12:13 PM, Serhiy Storchaka wrote: > On 30.05.15 18:45, Ron Adam wrote: >> >> While trying to debug a problem and thinking that it may be an issue >> with circular imports, I come up with an interesting idea that might be >> of value. It wasn't a circular import problem in this case, but I may >> have found the actual bug sooner if I didn't need to be concerned about >> that possibility. >> >> I have had some difficulty splitting larger modules into smaller modules >> in the past where if I split the code by functionality, it doesn't >> correspond with how the code is organized by dependency. The result is >> an imported module needs to import the module it's imported into. Which >> just doesn't feel right to me. >> >> The solution I found was to call a function to explicitly set the shared >> items in the imported module. > > Why not move all shared objects in common module? Then in both main and > parse module you can write > > from common import * As I said, sometimes I prefer to organise things by function rather than dependency. The point is this fits a somewhat different pattern than when you have independent common objects. These can be inter-dependent shared objects that would require a circular imports. So common may need an "import __main__ as main" in order for the items that are imported with "import *" to work. One argument might be the organisation of the code is wrong if that is needed, or the may be a better way to organise it. While that is a valid point, it may not be the only factor involved in deciding how to organise the code. I also like to avoid "import *" except when importing very general and common utility functions. ie.. "from math import *". 
Cheers, Ron From steve at pearwood.info Sun May 31 01:32:26 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 31 May 2015 09:32:26 +1000 Subject: [Python-ideas] import features; if "print_function" in features.data In-Reply-To: References: Message-ID: <20150530233220.GU932@ando.pearwood.info> On Sat, May 30, 2015 at 07:54:35AM -0500, Wes Turner wrote: > Would it be useful to have one Python source file with an OrderedDict of > (API_feat_lbl, [(start, None)]) mappings > and a lookup? Why an OrderedDict? This already exists for __future__ features: py> import __future__ py> __future__.all_feature_names ['nested_scopes', 'generators', 'division', 'absolute_import', 'with_statement', 'print_function', 'unicode_literals', 'barry_as_FLUFL'] > * [ ] feat/version segments/rays map > * [ ] .lookup("print[_function]") I don't know what this means. > Syntax ideas: > > * has("print[_function]") Why does it need a new function instead of just this? "print" in featureset > Advantages > > * More pythonic to check for features than capabilities I think that is wrong. I think that Look Before You Leap is generally considered *less* Pythonic. > * Forward maintainability Not when it comes to syntax changes. You can't write: if has("print"): print "Hello world" else: print("Hello world") because *it won't compile* if print_function is in effect. For non-syntax changes, it's not backwards compatible: if not has("enumerate takes a start argument"): def enumerate(values, start): for i, x in builtins.enumerate(values): yield i+start, x doesn't work for anything older than 3.6 (at the earliest). It's better to check for the feature directly, which always work: try: enumerate([], 1) except TypeError: ... > Disadvantages: > > * * It's ugly, especially for small changes to features, such as when a function started to accept an optional argument. * It requires more work: you have to duplicate the information about every feature in at least three places, not just two (the code itself, the documentation, plus the "features" database). * It's hard to use. * Bootstrapping problem: how do you check for the "has" feature itself? if has("has"): ... # obviously cannot work * Doesn't help with writing hybrid 2+3 code, as it doesn't exist in 2. > Alternatives: > > * six, nine, future > * try/import ENOENT I don't understand this. -- Steve From aquavitae69 at gmail.com Sun May 31 09:16:57 2015 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 31 May 2015 09:16:57 +0200 Subject: [Python-ideas] npm-style venv-aware launcher Message-ID: Pip and venv have done a lot to improve the accessibility and ease of installing python packages, but I believe there is still a lot of room for improvement. I only realised how cumbersome I find working with python packages when I recently spent a lot of time on a javascript project using npm. A bit of googling and I found several articles discussing pip, venv and npm, and all of them seemed to say the same thing, i.e. pip/venv could learn a lot from npm. My proposal revolves around two issues: 1. Setting up and working with virtual environments can be onerous. Creating one is easy enough, but using them means remembering to run `source activate` every time, which also means remembering which venv is used for which project. Not a major issue, but still and annoyance. 2. Managing lists of required packages is not nearly as easy as in npm since these is no equivalent to `npm install --save ...`. The best that pip offers is `pip freeze`. 
Howevere, using that is a) an extra step to remember and b) includes all implied dependencies which is not ideal. My proposal is to use a similar model to npm, where each project has a `venvrc` file which lets python-related tools know which environment to use. In order to showcase the sort of funcionality I'm proposing, I've created a basic example on github (https://github.com/aquavitae/pyle). This is currently py3.4 on linux only and very pre-alpha. Once I've added a few more features that I have in mind (e.g. multiple venvs) I'll add it to pypi and if there is sufficient interest I'd be happy to write up a PEP for getting it into the stdlib. Does this seem like the sort of tool that would be useful in the stdlib? Regards David -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 31 09:35:56 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 31 May 2015 00:35:56 -0700 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: Message-ID: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> On May 31, 2015, at 00:16, David Townshend wrote: > > Pip and venv have done a lot to improve the accessibility and ease of installing python packages, but I believe there is still a lot of room for improvement. I only realised how cumbersome I find working with python packages when I recently spent a lot of time on a javascript project using npm. A bit of googling and I found several articles discussing pip, venv and npm, and all of them seemed to say the same thing, i.e. pip/venv could learn a lot from npm. > > My proposal revolves around two issues: > Setting up and working with virtual environments can be onerous. Creating one is easy enough, but using them means remembering to run `source activate` every time, which also means remembering which venv is used for which project. Not a major issue, but still and annoyance. If you're not using virtualenvwrapper. You do have to get used to using workon instead of cd to switch between environments--although if you want to, there's a hook you can alias cd to (virtualenvwrapperhelper). And I haven't tried either the native Windows cmd or PowerShell ports or the PowerShell port (it works great with MSYS bash, but I realize not everyone on Windows wants to pretend they're not on Windows). And managing multiple environments with different Python versions (at least different versions of 2.x or different versions of 3.x) could be nicer. But I think it does 90% of what you're looking for, and I think it might be easier to add the other 10% to virtualenvwrapper than to start from scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead of venv, of course), on most platforms. with multiple environments, with tab completion (at least in bash and zsh), etc. > Managing lists of required packages is not nearly as easy as in npm since these is no equivalent to `npm install --save ...`. The best that pip offers is `pip freeze`. Howevere, using that is a) an extra step to remember and b) includes all implied dependencies which is not ideal. > My proposal is to use a similar model to npm, where each project has a `venvrc` file which lets python-related tools know which environment to use. In order to showcase the sort of funcionality I'm proposing, I've created a basic example on github (https://github.com/aquavitae/pyle). This is currently py3.4 on linux only and very pre-alpha. Once I've added a few more features that I have in mind (e.g. 
multiple venvs) I'll add it to pypi and if there is sufficient interest I'd be happy to write up a PEP for getting it into the stdlib. > > Does this seem like the sort of tool that would be useful in the stdlib? > > Regards > > David > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aquavitae69 at gmail.com Sun May 31 10:01:21 2015 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 31 May 2015 10:01:21 +0200 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> Message-ID: On Sun, May 31, 2015 at 9:35 AM, Andrew Barnert wrote: > On May 31, 2015, at 00:16, David Townshend wrote: > > Pip and venv have done a lot to improve the accessibility and ease of > installing python packages, but I believe there is still a lot of room for > improvement. I only realised how cumbersome I find working with python > packages when I recently spent a lot of time on a javascript project using > npm. A bit of googling and I found several articles discussing pip, venv > and npm, and all of them seemed to say the same thing, i.e. pip/venv could > learn a lot from npm. > > My proposal revolves around two issues: > > 1. Setting up and working with virtual environments can be onerous. > Creating one is easy enough, but using them means remembering to run > `source activate` every time, which also means remembering which venv is > used for which project. Not a major issue, but still and annoyance. > > If you're not using virtualenvwrapper. > > You do have to get used to using workon instead of cd to switch between > environments--although if you want to, there's a hook you can alias cd to > (virtualenvwrapperhelper). And I haven't tried either the native Windows > cmd or PowerShell ports or the PowerShell port (it works great with MSYS > bash, but I realize not everyone on Windows wants to pretend they're not on > Windows). And managing multiple environments with different Python versions > (at least different versions of 2.x or different versions of 3.x) could be > nicer. > > But I think it does 90% of what you're looking for, and I think it might > be easier to add the other 10% to virtualenvwrapper than to start from > scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead > of venv, of course), on most platforms. with multiple environments, with > tab completion (at least in bash and zsh), etc. > Virtualenvwrapper does help a bit, but nowhere near 90%. It doesn't touch any of the issues with pip, it still requires configuration and manually ensuring that the venv is activated. But the biggest issue with extending it is that it has a totally different workflow philosophy in that it enforces a separation between the venv and the project, whereas my proposal involves more integration of the two. I have used virtualenvwrapper quite a bit in the past, but in the end I've always found it easier to just work with venv because of the lack of flexibiltiy in where and how I store the venvs. > > 1. Managing lists of required packages is not nearly as easy as in npm > since these is no equivalent to `npm install --save ...`. The best that > pip offers is `pip freeze`. 
Howevere, using that is a) an extra step to > remember and b) includes all implied dependencies which is not ideal. > > My proposal is to use a similar model to npm, where each project has a > `venvrc` file which lets python-related tools know which environment to > use. In order to showcase the sort of funcionality I'm proposing, I've > created a basic example on github (https://github.com/aquavitae/pyle). > This is currently py3.4 on linux only and very pre-alpha. Once I've added > a few more features that I have in mind (e.g. multiple venvs) I'll add it > to pypi and if there is sufficient interest I'd be happy to write up a PEP > for getting it into the stdlib. > > Does this seem like the sort of tool that would be useful in the stdlib? > > Regards > > David > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 31 10:41:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 31 May 2015 01:41:42 -0700 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> Message-ID: <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> On May 31, 2015, at 01:01, David Townshend wrote: > > >> On Sun, May 31, 2015 at 9:35 AM, Andrew Barnert wrote: >>> On May 31, 2015, at 00:16, David Townshend wrote: >>> >>> Pip and venv have done a lot to improve the accessibility and ease of installing python packages, but I believe there is still a lot of room for improvement. I only realised how cumbersome I find working with python packages when I recently spent a lot of time on a javascript project using npm. A bit of googling and I found several articles discussing pip, venv and npm, and all of them seemed to say the same thing, i.e. pip/venv could learn a lot from npm. >>> >>> My proposal revolves around two issues: >>> Setting up and working with virtual environments can be onerous. Creating one is easy enough, but using them means remembering to run `source activate` every time, which also means remembering which venv is used for which project. Not a major issue, but still and annoyance. >> >> If you're not using virtualenvwrapper. >> >> You do have to get used to using workon instead of cd to switch between environments--although if you want to, there's a hook you can alias cd to (virtualenvwrapperhelper). And I haven't tried either the native Windows cmd or PowerShell ports or the PowerShell port (it works great with MSYS bash, but I realize not everyone on Windows wants to pretend they're not on Windows). And managing multiple environments with different Python versions (at least different versions of 2.x or different versions of 3.x) could be nicer. >> >> But I think it does 90% of what you're looking for, and I think it might be easier to add the other 10% to virtualenvwrapper than to start from scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead of venv, of course), on most platforms. with multiple environments, with tab completion (at least in bash and zsh), etc. > > Virtualenvwrapper does help a bit, but nowhere near 90%. It doesn't touch any of the issues with pip, it still requires configuration and manually ensuring that the venv is activated. 
As I already mentioned, if you use virtualenvwrapperhelper or autoenv, you don't need to manually ensure that the venv is activated. I personally use it by having workon cd into the directory for me instead of vice-versa, but if you like vice-versa, you can do it that way, so every time you cd into a directory with a venv in, it activates. > But the biggest issue with extending it is that it has a totally different workflow philosophy in that it enforces a separation between the venv and the project, I don't understand what you mean. I have a one-to-one mapping between venvs and projects (although you _can_ have multiple projects using the same venv, that isn't the simplest way to use it), and I have everything checked into git together, and I didn't have to do anything complicated to get there. > whereas my proposal involves more integration of the two. I have used virtualenvwrapper quite a bit in the past, but in the end I've always found it easier to just work with venv because of the lack of flexibiltiy in where and how I store the venvs. The default for npm is that your package dir is attached directly to the project. You can get more flexibility by setting an environment variable or creating a symlink, but normally you don't. It has about the same flexibility as virtualenvwrapper, with about the same amount of effort. So if virtualenvwrapper isn't flexible enough for you, my guess is that your take on npm won't be flexible enough either, it'll just come preconfigured for your own idiosyncratic use and everyone else will have to adjust... >>> Managing lists of required packages is not nearly as easy as in npm since these is no equivalent to `npm install --save ...`. The best that pip offers is `pip freeze`. Howevere, using that is a) an extra step to remember and b) includes all implied dependencies which is not ideal. >>> My proposal is to use a similar model to npm, where each project has a `venvrc` file which lets python-related tools know which environment to use. In order to showcase the sort of funcionality I'm proposing, I've created a basic example on github (https://github.com/aquavitae/pyle). This is currently py3.4 on linux only and very pre-alpha. Once I've added a few more features that I have in mind (e.g. multiple venvs) I'll add it to pypi and if there is sufficient interest I'd be happy to write up a PEP for getting it into the stdlib. >>> >>> Does this seem like the sort of tool that would be useful in the stdlib? >>> >>> Regards >>> >>> David >>> >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.andrefreitas at gmail.com Sun May 31 12:36:33 2015 From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=) Date: Sun, 31 May 2015 10:36:33 +0000 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: +1 for this idea David. I am using requirements.txt for managing dependencies but the NPM approach is simpler than doing pip freeze, inspecting what are the requirements we really use and setting up a virtualenv. If you need help with the PEP writing I can help you. 
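To make the idea concrete, a rough sketch of what such a launcher could do -- the file name "venvrc", its one-line format and the paths below are all invented for illustration, not an existing convention:

import os
import sys

def find_venv(start="."):
    # Walk up from the current directory looking for a "venvrc" file
    # whose single line names the venv directory, e.g. ".venv".
    path = os.path.abspath(start)
    while True:
        rc = os.path.join(path, "venvrc")
        if os.path.isfile(rc):
            with open(rc) as f:
                return os.path.join(path, f.read().strip())
        parent = os.path.dirname(path)
        if parent == path:
            return None
        path = parent

venv = find_venv()
if venv:
    python = os.path.join(venv, "bin", "python")   # Linux layout, as in the prototype
    os.execv(python, [python] + sys.argv[1:])

Everything else (multiple venvs, the pip install --save analogue) would sit on top of something like that.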
Em dom, 31 de mai de 2015 ?s 09:45, Andrew Barnert via Python-ideas < python-ideas at python.org> escreveu: > On May 31, 2015, at 01:01, David Townshend wrote: > > > On Sun, May 31, 2015 at 9:35 AM, Andrew Barnert > wrote: > >> On May 31, 2015, at 00:16, David Townshend wrote: >> >> Pip and venv have done a lot to improve the accessibility and ease of >> installing python packages, but I believe there is still a lot of room for >> improvement. I only realised how cumbersome I find working with python >> packages when I recently spent a lot of time on a javascript project using >> npm. A bit of googling and I found several articles discussing pip, venv >> and npm, and all of them seemed to say the same thing, i.e. pip/venv could >> learn a lot from npm. >> >> My proposal revolves around two issues: >> >> 1. Setting up and working with virtual environments can be onerous. >> Creating one is easy enough, but using them means remembering to run >> `source activate` every time, which also means remembering which venv is >> used for which project. Not a major issue, but still and annoyance. >> >> If you're not using virtualenvwrapper. >> >> You do have to get used to using workon instead of cd to switch between >> environments--although if you want to, there's a hook you can alias cd to >> (virtualenvwrapperhelper). And I haven't tried either the native Windows >> cmd or PowerShell ports or the PowerShell port (it works great with MSYS >> bash, but I realize not everyone on Windows wants to pretend they're not on >> Windows). And managing multiple environments with different Python versions >> (at least different versions of 2.x or different versions of 3.x) could be >> nicer. >> >> But I think it does 90% of what you're looking for, and I think it might >> be easier to add the other 10% to virtualenvwrapper than to start from >> scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead >> of venv, of course), on most platforms. with multiple environments, with >> tab completion (at least in bash and zsh), etc. >> > > Virtualenvwrapper does help a bit, but nowhere near 90%. It doesn't touch > any of the issues with pip, it still requires configuration and manually > ensuring that the venv is activated. > > > As I already mentioned, if you use virtualenvwrapperhelper or autoenv, you > don't need to manually ensure that the venv is activated. I personally use > it by having workon cd into the directory for me instead of vice-versa, but > if you like vice-versa, you can do it that way, so every time you cd into a > directory with a venv in, it activates. > > But the biggest issue with extending it is that it has a totally different > workflow philosophy in that it enforces a separation between the venv and > the project, > > > I don't understand what you mean. I have a one-to-one mapping between > venvs and projects (although you _can_ have multiple projects using the > same venv, that isn't the simplest way to use it), and I have everything > checked into git together, and I didn't have to do anything complicated to > get there. > > whereas my proposal involves more integration of the two. I have used > virtualenvwrapper quite a bit in the past, but in the end I've always found > it easier to just work with venv because of the lack of flexibiltiy in > where and how I store the venvs. > > > The default for npm is that your package dir is attached directly to the > project. 
You can get more flexibility by setting an environment variable or > creating a symlink, but normally you don't. It has about the same > flexibility as virtualenvwrapper, with about the same amount of effort. So > if virtualenvwrapper isn't flexible enough for you, my guess is that your > take on npm won't be flexible enough either, it'll just come preconfigured > for your own idiosyncratic use and everyone else will have to adjust... > > >> 1. Managing lists of required packages is not nearly as easy as in >> npm since these is no equivalent to `npm install --save ...`. The best >> that pip offers is `pip freeze`. Howevere, using that is a) an extra step >> to remember and b) includes all implied dependencies which is not ideal. >> >> My proposal is to use a similar model to npm, where each project has a >> `venvrc` file which lets python-related tools know which environment to >> use. In order to showcase the sort of funcionality I'm proposing, I've >> created a basic example on github (https://github.com/aquavitae/pyle). >> This is currently py3.4 on linux only and very pre-alpha. Once I've added >> a few more features that I have in mind (e.g. multiple venvs) I'll add it >> to pypi and if there is sufficient interest I'd be happy to write up a PEP >> for getting it into the stdlib. >> >> Does this seem like the sort of tool that would be useful in the stdlib? >> >> Regards >> >> David >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun May 31 13:32:25 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 31 May 2015 21:32:25 +1000 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: Message-ID: <20150531113225.GW932@ando.pearwood.info> On Sun, May 31, 2015 at 09:16:57AM +0200, David Townshend wrote: > Pip and venv have done a lot to improve the accessibility and ease of > installing python packages, but I believe there is still a lot of room for > improvement. I only realised how cumbersome I find working with python > packages when I recently spent a lot of time on a javascript project using > npm. A bit of googling and I found several articles discussing pip, venv > and npm, and all of them seemed to say the same thing, i.e. pip/venv could > learn a lot from npm. > > My proposal revolves around two issues: [...] I don't think this is the right place to discuss either of those ideas. pip is not part of either the Python language or the standard library (apart from the very narrow sense that the most recent versions of Python include a tool to bootstrap pip). I think you should submit them on whatever forum pip uses to discuss feature suggestions. -- Steve From tritium-list at sdamon.com Sun May 31 13:34:14 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Sun, 31 May 2015 07:34:14 -0400 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: Message-ID: <556AF1B6.6000808@sdamon.com> You might want to shoot this over to the distutils-sig mailing list. 
On 5/31/2015 03:16, David Townshend wrote: > Pip and venv have done a lot to improve the accessibility and ease of > installing python packages, but I believe there is still a lot of room > for improvement. I only realised how cumbersome I find working with > python packages when I recently spent a lot of time on a javascript > project using npm. A bit of googling and I found several articles > discussing pip, venv and npm, and all of them seemed to say the same > thing, i.e. pip/venv could learn a lot from npm. > > My proposal revolves around two issues: > > 1. Setting up and working with virtual environments can be onerous. > Creating one is easy enough, but using them means remembering to > run `source activate` every time, which also means remembering > which venv is used for which project. Not a major issue, but > still and annoyance. > 2. Managing lists of required packages is not nearly as easy as in > npm since these is no equivalent to `npm install --save ...`. The > best that pip offers is `pip freeze`. Howevere, using that is a) > an extra step to remember and b) includes all implied dependencies > which is not ideal. > > My proposal is to use a similar model to npm, where each project has a > `venvrc` file which lets python-related tools know which environment > to use. In order to showcase the sort of funcionality I'm proposing, > I've created a basic example on github > (https://github.com/aquavitae/pyle). This is currently py3.4 on linux > only and very pre-alpha. Once I've added a few more features that I > have in mind (e.g. multiple venvs) I'll add it to pypi and if there is > sufficient interest I'd be happy to write up a PEP for getting it into > the stdlib. > > Does this seem like the sort of tool that would be useful in the stdlib? > > Regards > > David > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun May 31 15:10:54 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 31 May 2015 22:10:54 +0900 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <20150531113225.GW932@ando.pearwood.info> References: <20150531113225.GW932@ando.pearwood.info> Message-ID: <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I don't think this is the right place to discuss either of those ideas. I think you're missing the point -- this is part of the larger discussion on packaging, as Alexander recognized ("shoot this over to distutils-sig", he said). While technically it may belong elsewhere (distutils, for example), the amount of attention it's attracting from core committers right now suggests that it's a real pain point, and should get discussion from the wider community while requirements are still unclear. While I'm not one for suggesting that TOOWTDI is obvious in advance (and not even if you're Dutch), surely it's worth narrowing down the field by looking at a lot of ideas. From ncoghlan at gmail.com Sun May 31 17:04:07 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 1 Jun 2015 01:04:07 +1000 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150531113225.GW932@ando.pearwood.info> <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 31 May 2015 at 23:10, Stephen J. 
Turnbull wrote: > Steven D'Aprano writes: > > > I don't think this is the right place to discuss either of those ideas. > > I think you're missing the point -- this is part of the larger > discussion on packaging, as Alexander recognized ("shoot this over to > distutils-sig", he said). While technically it may belong elsewhere > (distutils, for example), the amount of attention it's attracting from > core committers right now suggests that it's a real pain point, and > should get discussion from the wider community while requirements are > still unclear. > > While I'm not one for suggesting that TOOWTDI is obvious in advance > (and not even if you're Dutch), surely it's worth narrowing down the > field by looking at a lot of ideas. There are a plethora of environment management options out there, and https://github.com/pypa/python-packaging-user-guide/issues/118 discusses some of them (focusing specifically on the ad hoc environment management side of things rather than VCS linked environment management, though). The npm model in particular unfortunately gets a lot of its "simplicity" by isolating all the dependencies from each other during component development (including freely permitting duplicates and even different versions of the same component), so you get the excitement of live integration at runtime instead of rationalising your dependency set as part of your design and development process (see https://speakerdeck.com/nzpug/francois-marier-external-dependencies-in-web-apps-system-libs-are-not-that-scary?slide=9 ). As developers, we can make our lives *very* easy if we're happy to discount the interests of other folks that are actually tasked with deploying and maintaining our code (either an operations team if we have one, or at the very least future maintainers if we don't). So while there are still useful user experience lessons to be learned from npm, they require careful filtering to ensure they actually *are* a simplification of the overall user experience, rather than cases where the designers of the system have made things easier for developers working on the project itself at the expense of making them harder for operators and end users that just want to install it (potentially as part of a larger integrated system). Cheers, Nick. P.S. I've unfortunately never found the time to write up my own packaging system research properly, but https://bitbucket.org/ncoghlan/misc/src/default/talks/2013-07-pyconau/packaging/brispy-talk.md has some rough notes from a couple of years ago, while https://fedoraproject.org/wiki/Env_and_Stacks/Projects/UserLevelPackageManagement looks at the general problem space from an operating system developer experience design perspective. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Sun May 31 17:17:46 2015 From: donald at stufft.io (Donald Stufft) Date: Sun, 31 May 2015 11:17:46 -0400 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <20150531113225.GW932@ando.pearwood.info> <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 31, 2015 at 11:05:24 AM, Nick Coghlan (ncoghlan at gmail.com) wrote: > On 31 May 2015 at 23:10, Stephen J. Turnbull wrote: > > Steven D'Aprano writes: > > > > > I don't think this is the right place to discuss either of those ideas. > > > > I think you're missing the point -- this is part of the larger > > discussion on packaging, as Alexander recognized ("shoot this over to > > distutils-sig", he said). 
While technically it may belong elsewhere > > (distutils, for example), the amount of attention it's attracting from > > core committers right now suggests that it's a real pain point, and > > should get discussion from the wider community while requirements are > > still unclear. > > > > While I'm not one for suggesting that TOOWTDI is obvious in advance > > (and not even if you're Dutch), surely it's worth narrowing down the > > field by looking at a lot of ideas. > > There are a plethora of environment management options out there, and > https://github.com/pypa/python-packaging-user-guide/issues/118 > discusses some of them (focusing specifically on the ad hoc > environment management side of things rather than VCS linked > environment management, though). > > The npm model in particular unfortunately gets a lot of its > "simplicity" by isolating all the dependencies from each other during > component development (including freely permitting duplicates and even > different versions of the same component), so you get the excitement > of live integration at runtime instead of rationalising your > dependency set as part of your design and development process (see > https://speakerdeck.com/nzpug/francois-marier-external-dependencies-in-web-apps-system-libs-are-not-that-scary?slide=9 > ). As developers, we can make our lives *very* easy if we're happy to > discount the interests of other folks that are actually tasked with > deploying and maintaining our code (either an operations team if we > have one, or at the very least future maintainers if we don't). > > So while there are still useful user experience lessons to be learned > from npm, they require careful filtering to ensure they actually *are* > a simplification of the overall user experience, rather than cases > where the designers of the system have made things easier for > developers working on the project itself at the expense of making them > harder for operators and end users that just want to install it > (potentially as part of a larger integrated system). > > Cheers, > Nick. > > P.S. I've unfortunately never found the time to write up my own > packaging system research properly, but > https://bitbucket.org/ncoghlan/misc/src/default/talks/2013-07-pyconau/packaging/brispy-talk.md > has some rough notes from a couple of years ago, while > https://fedoraproject.org/wiki/Env_and_Stacks/Projects/UserLevelPackageManagement > looks at the general problem space from an operating system developer > experience design perspective. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > One of the things that make NPM a lot simpler is that their "virtualenv" is implicit and the default, and you have to go out of your way to get a "global" install. It would be possible to add this to Python by doing something like ``sys.path.append('./.python-modules/')`` (but it also needs to recurse upwards) to the Python startup (and possibly some file you can put in that folder so that it doesn't add the typical site-packages or user-packages to the sys.path). This makes it easier to have isolation being the default, however it comes with its own problems.
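To make that mechanism concrete before the problems are spelled out below, a rough sketch of the upward search being described might look like this; the directory name `.python-modules`, the decision to stop at the filesystem root, and the choice to prepend rather than append are all assumptions drawn from the npm analogy rather than a settled design:

    # Hypothetical sketch of an implicit, npm-style local package directory.
    # Nothing here is an agreed design; the name ".python-modules" and the
    # search-every-parent behaviour simply mirror how npm finds node_modules.
    import os
    import sys
    from pathlib import Path


    def find_local_modules(start=None):
        """Return the nearest `.python-modules` directory at or above `start`."""
        directory = Path(start or os.getcwd()).resolve()
        for candidate in [directory] + list(directory.parents):
            modules = candidate / ".python-modules"
            if modules.is_dir():
                return modules
        return None


    local_modules = find_local_modules()
    if local_modules is not None:
        # Prepend so project-local packages shadow global site-packages,
        # giving "isolation by default" as described above.
        sys.path.insert(0, str(local_modules))

The cost pointed out next is visible in the loop: every invocation has to walk up the directory tree before the interpreter even knows which packages it will see.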
It becomes a lot harder to determine what's going to happen when you type ``python`` since you have to inspect the entire directory hierarchy above you looking for a .python_modules file. There's also the problem that binary scripts tend to get installed into something like .python-modules/bin/ or so in that layout, but that's rarely what people want. The npm community "solved" this by having the actual CLI command be installable on its own that will call into the main program that you have installed per project. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From aquavitae69 at gmail.com Sun May 31 18:19:09 2015 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 31 May 2015 18:19:09 +0200 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: > > The default for npm is that your package dir is attached directly to the > project. You can get more flexibility by setting an environment variable or > creating a symlink, but normally you don't. It has about the same > flexibility as virtualenvwrapper, with about the same amount of effort. So > if virtualenvwrapper isn't flexible enough for you, my guess is that your > take on npm won't be flexible enough either, it'll just come preconfigured > for your own idiosyncratic use and everyone else will have to adjust... > You have a point. Maybe lack of flexibility is not actually the issue - it's too much flexibility. The problem that I have with virtualenv is that it requires quite a bit of configuration and a great deal of awareness by the user of what is going on and how things are configured. As stated on its home page While there is nothing specifically wrong with this, I usually just want a way to do something in a venv without thinking too much about where it is or when or how to activate it. If you've had a look at the details of the sort of tool I'm proposing, it is completely transparent. Perhaps the preconfiguration is just to my own idiosyncrasies, but if it serves its use 90% of the time then maybe that is good enough. Some of what I'm proposing could be incorporated into pip (i.e. better requirements) and some could possibly be incorporated into virtualenvwrapper (although I still think that my proposal for handling venvs is just too different from that of virtualenvwrapper to be worth pursuing that course), but one of the main aims is to merge it all into one tool that manages both the venv and the requirements. I'm quite sure that this proposal is not going to be accepted without a trial period on pypi, so maybe that will be the test of whether this is useful. Is this the right place for this, or would distutils-sig be better? -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sun May 31 19:07:41 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 31 May 2015 12:07:41 -0500 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: On May 31, 2015 11:20 AM, "David Townshend" wrote: > > >> >> The default for npm is that your package dir is attached directly to the project. You can get more flexibility by setting an environment variable or creating a symlink, but normally you don't.
I set variables in $VIRTUAL_ENV/bin/postactivate (for Python, Go, NPM, ...) [Virtualenvwrapper]. > It has about the same flexibility as virtualenvwrapper, with about the same amount of effort. So if virtualenvwrapper isn't flexible enough for you, my guess is that your take on npm won't be flexible enough either, it'll just come preconfigured for your own idiosyncratic use and everyone else will have to adjust... > > > You have a point. Maybe lack of flexibility is not actually the issue - it's too much flexibility. The problem that I have with virtualenv is that it requires quite a bit of configuration and a great deal of awareness by the user of what is going on and how things are configured. You must set WORKON_HOME and PROJECT_HOME. > As stated on it's home page While there is nothing specifically wrong with this, I usually just want a way to do something in a venv without thinking too much about where it is or when or how to activate it. If you've had a look at the details of the sort of tool I'm proposing, it is completely transparent. Perhaps the preconfiguration is just to my own idiosyncrasies, but if it serves its use 90% of the time then maybe that is good enough. > > Some of what I'm proposing could be incorporated in to pip (i.e. better requirements) and some could possibly be incorporated into virtualenvwrapper (although I still think that my proposal for handling venvs is just too different from that of virtualenvwrapper to be worth pursuing that course), but one of the main aims is to merge it all into one tool that manages both the venv and the requirements. * you can install an initial set of packages with just virtualenv (a minimal covering / only explicitly installed packages would be useful (for pruning deprecated dependencies)) * conda-env manages requirements for conda envs (conda env export) * http://conda.pydata.org/docs/test-drive.html#managing-environments * http://conda.pydata.org/docs/env-commands.html * I've a similar script for working with virtualenv (now venv) and/or conda envs in gh:westurner/dotfiles/dotfiles/venv/ipython_config.py that sets FSH paths and more commands and aliases (like cdv for cdvirtualenv) . IDK whether this would be useful for these use cases. So: * [ ] ENH: pip freeze --minimum-covering * [ ] ENH: pip freeze --explicit-only * [ ] DOC: virtualenv for NPM'ers > > I'm quite sure that this proposal is not going to accepted without a trial period on pypi, so maybe that will be the test of whether this is useful. > > Is this the right place for this, or would distutils-sig be better? PyPA: https://github.com/mitsuhiko/pipsi/issues/44#issuecomment-105961957 > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun May 31 21:00:57 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 31 May 2015 12:00:57 -0700 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: On May 31, 2015, at 09:19, David Townshend wrote: >> >> The default for npm is that your package dir is attached directly to the project. You can get more flexibility by setting an environment variable or creating a symlink, but normally you don't. 
It has about the same flexibility as virtualenvwrapper, with about the same amount of effort. So if virtualenvwrapper isn't flexible enough for you, my guess is that your take on npm won't be flexible enough either, it'll just come preconfigured for your own idiosyncratic use and everyone else will have to adjust... > > You have a point. Maybe lack of flexibility is not actually the issue - it's too much flexibility. I think Python needs that kind of flexibility, because it's used in a much wider range of use cases, from binary end-user applications to OS components to "just run this script against your system environment" to conda packages, not just web apps managed by a deployment team and other things that fall into the same model. And it needs to be backward compatible with the different ways people have come up with for handling all those models. While it's possible to rebuild all of those models around the npm model, and the node community is gradually coming up with ways of doing so (although notice that much of the node community is instead relying on docker or VMs...), you'd have to be able to transparently replace all of the current Python use cases today if you wanted to change Python today. Also, as Nick pointed out, making things easier for the developer comes at the cost of making things harder for the user--which is acceptable when the user is the developer himself or a deployment team that sits at the next set of cubicles, but may not be acceptable when the user is someone who just wants to run a script he found online. Again, the Node community is coming to terms with this, but they haven't got to the same level as the Python community, and, even if they had, it still wouldn't work as a drop-in replacement without a lot of work. What someone _could_ do is make it easier to set up a dev-friendly environment based on virtualenvwrapper and virtualenvwrapperhelper. Currently, you have to know what you're looking for and find a blog page somewhere that tells you how to install and configure all the tools and follow three or four steps. That's obvious less than ideal. It would be nice if there were a single "pip install envstuff" that got you ready out of the box (including working for Windows cmd and PowerShell), and if links to that were included in the basic Python docs. It would also be nice if there were a way to transfer your own custom setup to a new machine. But I don't see why that can't all be built as improvements on the existing tools (and a new package that just included requirements and configuration and no new tools). > The problem that I have with virtualenv is that it requires quite a bit of configuration and a great deal of awareness by the user of what is going on and how things are configured. As stated on it's home page While there is nothing specifically wrong with this, I usually just want a way to do something in a venv without thinking too much about where it is or when or how to activate it. But again, if that's what you want, that's what you have with virtualenvwrapper or autoenv. You just cd into the directory (whether a new one you just created with the wrapper or an old one you just pulled from git) and it's set up for you. And setting up a new environment or cloning an existing one is just a single command, too. Sure, you can make your configuration more complicated than that, but if you don't want to, you don't have to. > If you've had a look at the details of the sort of tool I'm proposing, it is completely transparent. 
Perhaps the preconfiguration is just to my own idiosyncrasies, but if it serves its use 90% of the time then maybe that is good enough. > > Some of what I'm proposing could be incorporated in to pip (i.e. better requirements) and some could possibly be incorporated into virtualenvwrapper (although I still think that my proposal for handling venvs is just too different from that of virtualenvwrapper to be worth pursuing that course), but one of the main aims is to merge it all into one tool that manages both the venv and the requirements. There are major advantages in not splitting the Python community between two different sets of tools. We've only recently gotten past easy_install vs. pip and distribute vs. setuptools, which has finally enabled a clean story for everyone who wants to distribute packages to get it right, which has finally started to happen (although there are people still finding and following blog posts that tell them to install distribute or not to use virtualenv because it doesn't play nice with py2app or whatever). > I'm quite sure that this proposal is not going to accepted without a trial period on pypi, so maybe that will be the test of whether this is useful. > > Is this the right place for this, or would distutils-sig be better? Other people have made the case for both sides of that earlier in the thread and I'm not sure which one is more compelling... Also, the pure pip enhancement of coming up with something better than freeze/-r may belong on distutils-sig while the environment-aware launcher and/or environment-managing tools may belong here. (Notice that Python includes venv and the py launcher, but doesn't include setuptools or pip...) -------------- next part -------------- An HTML attachment was scrubbed... URL: From aquavitae69 at gmail.com Sun May 31 21:50:36 2015 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 31 May 2015 21:50:36 +0200 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: On Sun, May 31, 2015 at 9:00 PM, Andrew Barnert wrote: > On May 31, 2015, at 09:19, David Townshend wrote: > > >> The default for npm is that your package dir is attached directly to the >> project. You can get more flexibility by setting an environment variable or >> creating a symlink, but normally you don't. It has about the same >> flexibility as virtualenvwrapper, with about the same amount of effort. So >> if virtualenvwrapper isn't flexible enough for you, my guess is that your >> take on npm won't be flexible enough either, it'll just come preconfigured >> for your own idiosyncratic use and everyone else will have to adjust... >> > > You have a point. Maybe lack of flexibility is not actually the issue - > it's too much flexibility. > > > I think Python needs that kind of flexibility, because it's used in a much > wider range of use cases, from binary end-user applications to OS > components to "just run this script against your system environment" to > conda packages, not just web apps managed by a deployment team and other > things that fall into the same model. And it needs to be backward > compatible with the different ways people have come up with for handling > all those models. 
> > While it's possible to rebuild all of those models around the npm model, > and the node community is gradually coming up with ways of doing so > (although notice that much of the node community is instead relying on > docker or VMs...), you'd have to be able to transparently replace all of > the current Python use cases today if you wanted to change Python today. > > Also, as Nick pointed out, making things easier for the developer comes at > the cost of making things harder for the user--which is acceptable when the > user is the developer himself or a deployment team that sits at the next > set of cubicles, but may not be acceptable when the user is someone who > just wants to run a script he found online. Again, the Node community is > coming to terms with this, but they haven't got to the same level as the > Python community, and, even if they had, it still wouldn't work as a > drop-in replacement without a lot of work. > > What someone _could_ do is make it easier to set up a dev-friendly > environment based on virtualenvwrapper and virtualenvwrapperhelper. > Currently, you have to know what you're looking for and find a blog page > somewhere that tells you how to install and configure all the tools and > follow three or four steps. That's obvious less than ideal. It would be > nice if there were a single "pip install envstuff" that got you ready out > of the box (including working for Windows cmd and PowerShell), and if links > to that were included in the basic Python docs. It would also be nice if > there were a way to transfer your own custom setup to a new machine. But I > don't see why that can't all be built as improvements on the existing tools > (and a new package that just included requirements and configuration and no > new tools). > > The problem that I have with virtualenv is that it requires quite a bit of > configuration and a great deal of awareness by the user of what is going on > and how things are configured. As stated on it's home page While there is > nothing specifically wrong with this, I usually just want a way to do > something in a venv without thinking too much about where it is or when or > how to activate it. > > > But again, if that's what you want, that's what you have with > virtualenvwrapper or autoenv. You just cd into the directory (whether a new > one you just created with the wrapper or an old one you just pulled from > git) and it's set up for you. And setting up a new environment or cloning > an existing one is just a single command, too. Sure, you can make your > configuration more complicated than that, but if you don't want to, you > don't have to. > > If you've had a look at the details of the sort of tool I'm proposing, it > is completely transparent. Perhaps the preconfiguration is just to my own > idiosyncrasies, but if it serves its use 90% of the time then maybe that is > good enough. > > > Some of what I'm proposing could be incorporated in to pip (i.e. better > requirements) and some could possibly be incorporated into > virtualenvwrapper (although I still think that my proposal for handling > venvs is just too different from that of virtualenvwrapper to be worth > pursuing that course), but one of the main aims is to merge it all into one > tool that manages both the venv and the requirements. > > > There are major advantages in not splitting the Python community between > two different sets of tools. We've only recently gotten past easy_install > vs. pip and distribute vs. 
> setuptools, which has finally enabled a clean > story for everyone who wants to distribute packages to get it right, which > has finally started to happen (although there are people still finding and > following blog posts that tell them to install distribute or not to use > virtualenv because it doesn't play nice with py2app or whatever). > > I'm quite sure that this proposal is not going to accepted without a trial > period on pypi, so maybe that will be the test of whether this is useful. > > Is this the right place for this, or would distutils-sig be better? > > > Other people have made the case for both sides of that earlier in the > thread and I'm not sure which one is more compelling... > > Also, the pure pip enhancement of coming up with something better than > freeze/-r may belong on distutils-sig while the environment-aware launcher > and/or environment-managing tools may belong here. (Notice that Python > includes venv and the py launcher, but doesn't include setuptools or pip...) > Just to be clear, I'm not suggesting changing the python executable itself, or any of the other tools already in existence. My proposal is a separate wrapper around existing python, pip and venv which would not change anything about the way it works currently. A dev environment set up using it could still be deployed in the same way it would be now, and there would still be the option of using virtualenvwrapper, or something else for those that want to. It is obviously way too early to try to get it included in the next python release (apart from anything else, pip would need to be added first), so really this proposal is meant more to gauge interest in the concept so that if it is popular I can carry on developing it and preparing it for inclusion in the stdlib, or at least a serious discussion about including it, once it is mature. That said, Andrew's arguments have convinced me that much could be done to improve existing tools before creating a new one, although I still don't believe virtualenvwrapper can be squashed into the shape I'm aiming for without fundamental changes. Also, from the other responses so far it seems that the general feeling is that handling of requirements could definitely be improved, but that anything too prescriptive with venvs would be problematic. Unfortunately for my proposal, if something like what I'm suggesting were officially supported via inclusion in the stdlib it would quickly become, at best, the "strongly recommended" way of working and at worst the One Obvious Way. With all this in mind, I'll withdraw my proposal, but continue development on my version and see if it goes anywhere. I'll also see how much of its functionality I can put into other tools (specifically pip's requirements handling) instead. -------------- next part -------------- An HTML attachment was scrubbed... URL:
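As a postscript on the requirements-handling side of the thread, a minimal sketch of an `npm install --save` style workflow on top of today's tools could look like the following. Only the `pip install` and `pip freeze` commands invoked here are real; the wrapper function, its name, and the decision to pin the exact installed version are hypothetical illustrations, not an existing pip feature:

    # Hypothetical "install and save" helper; not part of pip itself.
    # It records only the package that was explicitly requested, unlike
    # `pip freeze > requirements.txt`, which also captures every dependency.
    import subprocess
    import sys


    def install_and_save(package, requirements="requirements.txt"):
        """Install `package` with pip, then append its pinned version."""
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        frozen = subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"], universal_newlines=True
        )
        # Find the line pip freeze reports for the requested package
        # (assumes `package` is a bare name, not a full requirement specifier).
        wanted = package.lower().replace("_", "-") + "=="
        pinned = next(
            (line for line in frozen.splitlines()
             if line.lower().startswith(wanted)),
            package,  # fall back to the unpinned name if no match is found
        )
        with open(requirements, "a") as f:
            f.write(pinned + "\n")


    if __name__ == "__main__":
        install_and_save(sys.argv[1])

That distinction between explicitly requested packages and the full frozen set is exactly the gap discussed above, and something like the suggested `pip freeze --explicit-only` would make a helper of this kind unnecessary.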