From ethan at stoneleaf.us Thu Aug 1 00:02:20 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 31 Jul 2013 15:02:20 -0700 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <51F89E99.2060509@pearwood.info> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <51F89E99.2060509@pearwood.info> Message-ID: <51F9896C.1050809@stoneleaf.us> On 07/30/2013 10:20 PM, Steven D'Aprano wrote: > On 31/07/13 11:41, Terry Reedy wrote: >> On 7/30/2013 11:59 AM, Ronald Oussoren wrote: >> >>> "Never" is a long time. AFAIK the main reason why Python doesn't have >>> multi-line lambda's is that nobody has proposed a suitable syntax yet >>> (and not for lack of trying, the archives of this list and python-dev >>> contain a lot of proposals that were found lacking). >> >> There is also the fact that a generic .__name__ attribute of '' is inferior to a possibly unique and >> meaningful name. This is not just in tracebacks. Consider >> [, ] >> versus >> [ at 0x0000000003470B70>, at 0x0000000003470BF8>] > > > True, but if we're going to hypothesize nice syntax for multi-line lambdas, it's not much harder to imagine that there's > also nice syntax to give them a name and a doc string at the same time :-) We already have nice syntax to assign a name and doc string at the same time -- it's called `def`. 
;) -- ~Ethan~ From abarnert at yahoo.com Thu Aug 1 01:17:47 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 31 Jul 2013 16:17:47 -0700 (PDT) Subject: [Python-ideas] Enhance definition of functions In-Reply-To: References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> Message-ID: <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: Paul Moore Sent: Wednesday, July 31, 2013 2:15 AM >On 31 July 2013 07:47, Andrew Barnert wrote: > >>> It might be lack of imagination on my part, but I have a lot of nested functions named "function" or "callback" that are too complex to be a lambda, but too simple or specialized to bother making them proper functions. The key function for sort is one of the use cases. >>> >>> I'd love to have anonymous functions for that, but haven't seen a proposal for those yet that would fit the language. >> >>Would it really help anything? If you're worried about keystrokes you can always call them "f" instead of "function". And I don't think anonymous functions would be as nice in tracebacks as even generically-named ones. >> >>I think having to define them out of line is usually a more serious problem than having to name them, and if you solve that problem you may get the other one for free (although admittedly you may not, as the @in proposal shows...). >The only real reason I ever use lambdas (and would sometimes like a multiline version or similar) is for readability, where I want to pass a callback to a function and naming it and placing it before the call over-emphasises its importance. It's hard to make this objective, but to my eyes
>
>    def k(obj):
>        return obj['x'] / obj['y']
>    s = list(sorted(l, key=k))
>
>reads marginally worse than
>
>    s = list(sorted(l, key=k)) where:
>        def k(obj):
>            return obj['x'] / obj['y']
>
>simply because the focus of the block of code (building a sorted list) is at the start in the latter.
This is "the @in proposal" I referenced earlier; specifically, PEP 403 (http://www.python.org/dev/peps/pep-0403/). With PEP 403, your code would actually look like this:

    @in s = list(sorted(l, key=k))
    def k(obj):
        return obj['x'] / obj['y']

Your syntax is closer to that of PEP 3150, but if you look toward the end of PEP 403, it specifically describes using PEP 3150-like syntax for the (simpler) PEP 403 semantics, and the result is exactly your suggestion except with the keyword "given" instead of "where". >But because the difference is so subtle, it's very hard to get a syntax that improves things sufficiently to justify new syntax. And it's also not at all obvious to me that any improvement in readability that can be gained in simple example code that you can post in an email, will actually still be present in "real world" code (which, in my experience, is always far messier than constructed examples :-)) The motivating examples in PEP 403 are, like yours, marginally better, for pretty much the same reason: putting the focus of the code at the start. And when PEP 403 pops up in relation to some different proposal, it's "if we had PEP 403, there would be less reason to want this new idea, but still not zero". And so on. It feels like there should be more benefit to the idea than this, but nobody's found it yet. And that's why it's stalled and deferred. If you can come up with a better motivating example, even if it's too hard to put into an email, that could definitely be helpful. But anyway, I think you're mostly agreeing with me. When neither lambda nor def feels right, it's usually not because you really want a multi-line expression, or a multi-line anonymous function, but because you want to get the petty details of the function "out of the way" of the important code, right?
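For comparison, the two spellings available in today's Python for this example look like this (a sketch; the names `l` and `k` follow the thread's example, and the sample data is invented):

```python
# Sample data, invented for illustration.
l = [{'x': 4, 'y': 2}, {'x': 1, 'y': 1}, {'x': 9, 'y': 3}]

# Spelling 1: a named def, placed before the call it serves -- the
# "over-emphasised" form Paul describes.
def k(obj):
    return obj['x'] / obj['y']

s = sorted(l, key=k)  # sorted() already returns a list; list() is redundant

# Spelling 2: an inline lambda, which works here only because the key
# fits in a single expression.
s2 = sorted(l, key=lambda obj: obj['x'] / obj['y'])

print(s == s2)  # True: both sort by the x/y ratio
```

Neither spelling lets the sorted() call come first, which is precisely the gap PEP 403 and PEP 3150 try to fill.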
From rymg19 at gmail.com Thu Aug 1 02:25:27 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 31 Jul 2013 19:25:27 -0500 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: I just realized I misexpressed myself...again. I meant ASCII or binary, not text or binary. Kind of like the old FTP programs. The implementation would determine if it was ASCII or binary. And, the '/nothingness/is/eternal' is a quote from Xemnas in Kingdom Hearts. I was hoping someone would pick it up. Terry Reedy wrote: >On 7/31/2013 3:03 PM, Ryan wrote: >> 1.The link I provided wasn't how I wanted it to be. > >And there is no 'one way' that will satisfy everyone, or even most >people, as they will have different use cases for 'istext'. > >> I was using it as an example to show it wasn't impossible. > >It is obviously possible to apply any arbitrary predicate to any object > >within its input domain. No one has claimed otherwise that I know of. > >> 2.You yourself stated it doesn't work on UTF-8 files. If you wanted >one >> that worked on all text files, it wouldn't work right. > >The problem is that the problem is ill-defined. Every file is (or can >be >viewed as) a sequence of binary bytes. Every file can be interpreted as > >a text file encoded with any of the encodings (like at least some >latin-1 encodings, and the IBM PC Graphics encoding) that give a >character meaning to every byte. So, to be strict, every file is both >binary and text. Python allows us to open any file as either binary or >text (with some encoding, with latin-1 one of the possible choices). > >The pragmatic question is 'Is this file 'likely' *intended* to be >interpreted as text, given that the creator is a member of our *local >culture*. For the function you referenced, the 'local culture' is >'closed Western European'. For 'closed American', the threshold of >allowed non-ascii text and control chars should be more like 0 or 1%. >For many cultures, the referenced function is nonsensical.
> >For an open global context, istext would have to try all standard text >encodings and for those that worked, apply the grammar rules of the >languages that normally are encoded with that encoding. > >-- >Terry Jan Reedy > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.sweetser at gmail.com Thu Aug 1 02:39:57 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Wed, 31 Jul 2013 20:39:57 -0400 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: On Jul 31, 2013 8:26 PM, "Ryan" wrote: > > I just realized I misexpressed myself...again. I meant ASCII or binary, not text or binary. Kind of like the old FTP programs. The implementation would determine if it was ASCII or binary. Even so, that raises the question, "Why ASCII? why not Unicode, or any of the other hundreds of text formats out there?" If this is something to be included into the standard library, a collection used by people from all around the world, some forethought into the backgrounds of it's users should be taken into consideration. > > And, the '/nothingness/is/eternal' is a quote from Xemnas in Kingdom Hearts. I was hoping someone would pick it up. > > > Terry Reedy wrote: >> >> On 7/31/2013 3:03 PM, Ryan wrote: >>> >>> 1.The link I provided wasn't how I wanted it to be. >> >> >> And there is no 'one way' that will satisfy everyone, or every most >> people, as they will have different use cases for 'istext'. >> >>> I was using it as an example to show it wasn't impossible. >> >> >> It is obviously possible to apply any arbitrary predicate to any object >> within its input domain. No one has claimed otherwise that I know of. >> >>> 2.You yourself stated it doesn't work on UTF-8 files. 
>>> If you wanted one >>> that worked on all text files, it wouldn't work right. >> >> >> The problem is that the problem is ill-defined. Every file is (or can be >> viewed as) a sequence of binary bytes. Every file can be interpreted as >> a text file encoded with any of the encodings (like at least some >> latin-1 encodings, and the IBM PC Graphics encoding) that give a >> character meaning to every byte. So, to be strict, every file is both >> binary and text. Python allows us to open any file as either binary or >> text (with some encoding, with latin-1 one of the possible choices). >> >> The pragmatic question is 'Is this file 'likely' *intended* to be >> interpreted as text, given that the creator is a member of our *local >> culture*. For the function you referenced, the 'local culture' is >> 'closed Western European'. For 'closed American', the threshold of >> allowed non-ascii text and control chars should be more like 0 or 1%. >> For many cultures, the referenced function is nonsensical. >> >> For an open global context, istext would have to try all standard text >> encodings and for those that worked, apply the grammar rules of the >> languages that normally are encoded with that encoding. > > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Aug 1 02:42:24 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 1 Aug 2013 01:42:24 +0100 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: On Wed, Jul 31, 2013 at 4:40 PM, Ryan wrote: > Here's something more interesting than my shlex idea. > > os.path is, pretty much, the Python FS toolbox, along with shutil. But, > there's one feature missing: check if a file is binary. 
It isn't hard, see > http://code.activestate.com/recipes/173220/. But, writing 50 lines of code > for a more common task isn't really Python-ish. > > So... > > What if os.path had a binary checker that works just like isfile: > os.path.isbinary('/nothingness/is/eternal') # Returns boolean Going right back to the beginning here. Suppose this were deemed useful. Why should it be in os.path? Nothing else there, as far as I know, looks at the *contents* of a file. Everything's looking at directory entries, sometimes not even that (eg os.path.basename is pure string manipulation). I should be able to getctime() on a file even without permission to read it. I can't see whether it's binary or text without read permission. This sounds more like a job for a file-like object, maybe a subclass of file that reads (and buffers) the first 512 bytes, guesses whether it's text or binary, and then watches everything that goes through after that and revises its guess later on. And then the question becomes: How useful would that be? But mainly, I think it's only going to cause problems to have a potentially expensive operation stuck away with the very cheap operations in os.path. ChrisA From steve at pearwood.info Thu Aug 1 03:59:48 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 01 Aug 2013 11:59:48 +1000 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: <51F9C114.4050902@pearwood.info> On 01/08/13 05:03, Ryan wrote: > 1.The link I provided wasn't how I wanted it to be. I was using it as an example to show it wasn't impossible. But it *is* impossible even in principle to tell the difference between "text" and "binary", since both text and binary files are made up of the same bytes. Whether something is text or binary depends in part on the intention of the reader. E.g. 
a text file containing the ASCII string "Greetings and salutations Ryan\r\n" is bit-for-bit identical with a binary file containing four C doubles:

1.6937577544703708e+190
2.6890193974129695e+161
9.083672029092351e+223
2.9908963169274674e-260

So any such "is binary" function cannot determine whether a file actually is binary or not. The best it can do is "might be text". That perhaps leads to a less bad (although maybe not actually good) idea, a function which takes an encoding and tries to determine whether or not the contents of the file could be text in that encoding. But really, file type guessing is too complex to be a simple function like "isbinary" or even "maybetext". > 3.Did no one get the 'nothingness/is/eternal' joke? Not me. -- Steven From steve at pearwood.info Thu Aug 1 04:11:19 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 01 Aug 2013 12:11:19 +1000 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: <51F9C3C7.4050009@pearwood.info> On 01/08/13 10:25, Ryan wrote: > I just realized I misexpressed myself...again. I meant ASCII or binary, not text or binary. Kind of like the old FTP programs. The implementation would determine if it was ASCII or binary. Still can't be done reliably, but even if it could, what's so special about ASCII? Should we have dozens of such functions?

isascii
isbig5
iskoi8u
iskoi8r

and so on? The concept of "isbinary" is fundamentally flawed. The concept of "try to guess what sort of data a file might plausibly contain" is not flawed, but is a much, much bigger problem than is suitable for a simple os.path function.
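The bit-for-bit equivalence above is easy to check with the struct module (a sketch; the unpacked values depend on the byte order chosen, so none are asserted here):

```python
import struct

# The 32-byte ASCII string from the example: exactly four 8-byte C doubles.
data = b"Greetings and salutations Ryan\r\n"
assert len(data) == 32

# The same bytes, viewed as text...
print(data.decode('ascii'))

# ...and viewed as binary: four IEEE 754 doubles (little-endian here).
doubles = struct.unpack('<4d', data)
print(doubles)

# Nothing in the bytes themselves says which reading is the "right" one.
```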
-- Steven From rymg19 at gmail.com Thu Aug 1 04:45:33 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 31 Jul 2013 21:45:33 -0500 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <51F9C3C7.4050009@pearwood.info> References: <51F9C3C7.4050009@pearwood.info> Message-ID: What about something like this: https://github.com/ahupp/python-magic And, I explained the joke in my last post. Steven D'Aprano wrote: >On 01/08/13 10:25, Ryan wrote: >> I just realized I misexpressed myself...again. I meant ASCII or >binary, not text or binary. Kind of like the old FTP programs. The >implementation would determine if it was ASCII or binary. > >Still can't be done reliably, but even if it could, what's so special >about ASCII? Should we have dozens of such functions? > >isascii >isbig5 >iskoi8u >iskoi8r > >and so on? > > >The concept of "isbinary" is fundamentally flawed. The concept of "try >to guess what sort of data a file might plausibly contain" is not >flawed, but is a much, much bigger problem than is suitable for a >simple os.path function. > > >-- >Steven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Aug 1 04:57:58 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 31 Jul 2013 22:57:58 -0400 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <51F9C3C7.4050009@pearwood.info> References: <51F9C3C7.4050009@pearwood.info> Message-ID: On Wed, Jul 31, 2013 at 10:11 PM, Steven D'Aprano wrote: > Still can't be done reliably, but even if it could, what's so special > about ASCII? Lots of things are special about ASCII. It is a 7-bit subset of pretty much every modern encoding scheme. 
Being 7-bit, it can be fairly reliably distinguished from most binary formats. Same is true about UTF-8. It is very unlikely that a binary dump of a double array makes valid UTF-8 text and vice versa - UTF-8 text interpreted as a list of doubles is unlikely to produce numbers that are in a reasonable range. I would not mind seeing an "istext()" function somewhere in the stdlib that would only recognize ASCII and UTF-8 as text. From grosser.meister.morti at gmx.net Thu Aug 1 05:19:50 2013 From: grosser.meister.morti at gmx.net (Mathias Panzenböck) Date: Thu, 01 Aug 2013 05:19:50 +0200 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: <51F97F8E.30507@gmx.net> Message-ID: <51F9D3D6.80101@gmx.net> On 07/31/2013 11:29 PM, Chris Kaynor wrote: > Besides the high chance of false positives, what makes this method (and the problem it tries to solve) so difficult is that binary files may contain what is considered to be large amounts of text, and text files may contain pieces of binary data. > For example, consider a Windows executable file - Much of the data in such a file is considered binary data, but there are defined sections where strings and text resources are stored. Any heuristic algorithm like the one mentioned will be insufficient in such cases. > Although I can't think of a situation off hand where the opposite may be true (binary data embedded in what is considered to be a text file) I'm pretty sure such a situation exists. > > > One could consider PDF to be such a format (text with embedded binary data). > > > RTF is another example. > > Doesn't RTF use base64 or hex encoding? PDF uses *binary*, not base64.
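For one concrete case of the earlier point about double arrays and UTF-8, here is a small sketch (these particular doubles happen to produce invalid UTF-8 bytes; that is typical, though not guaranteed for every array):

```python
import struct

# Pack a few ordinary doubles into their raw little-endian byte form.
# 1.5 alone yields the byte 0xF8, which can never begin a valid UTF-8
# sequence, so the decode below fails.
blob = struct.pack('<3d', 1.5, -2.25, 3.141592653589793)

try:
    blob.decode('utf-8')
    print("decoded as UTF-8")
except UnicodeDecodeError as e:
    print("not valid UTF-8:", e.reason)
```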
From goktug.kayaalp at gmail.com Thu Aug 1 05:24:21 2013 From: goktug.kayaalp at gmail.com (Göktuğ Kayaalp) Date: Thu, 1 Aug 2013 06:24:21 +0300 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> Message-ID: I believe that anything that cannot be expressed as a Python anonymous function must be a def. It is possible to express conditionals and loops within a lambda statement, if that is what you are looking for:

    >>> hidden = ([file for file in os.listdir(givenpath) if file.startswith(".")]
    ...           if isdir(givenpath)
    ...           else [givenpath.startswith(".")])

BTW, if a multi-statement anonymous function syntax was to be considered seriously, I'd recommend a lambda statement with the colon replaced with a brace-delimited block, which would rarely cause code written for an interpreter lacking it to be refused:

    server = nodeishServer(
        lambda req, res {
            res.writeHead(200, ContentType="text/html");
            res.end("Hello");
        }
    )

In fact, grammar for every statement which introduces a new block (if, def, for, with, lambda) can be altered such that if the statement ends with a `:' (colon), following lines are parsed as in usual Python syntax, or, if the statement ends with a `{' (left brace), following lines are parsed with non-indentation-defined, C-ish syntax. So:

    def fibonacci(n):
        x, y, z = 1, 1, 0
        for i in range(1, n):
            z = x
            x += y
            y = z
        return x

could be also written as

    def fibonacci(n) {
        x, y, z = 1, 1, 0;
        for i in range(1, n) {
            z = x;
            x += y;
            y = z;
        }
        return x;
    }

which is a) subject of a different thread, and b) ridiculous. -gk On 30 July 2013 18:19, Musical Notation wrote: > Yes, I know that multiline lambda will never be implemented in Python, but > in many languages it is possible to write an anonymous function without > using lambda at all.
> In JavaScript:
> Instead of "function (){code}" you can write "var name; name=function(){code}"
> Python (proposed):
> def func(a,b):
>     print(a+b)
>     return a+b
>
> becomes
>
> func=function a,b:
>     print(a+b)
>     return a+b
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
From abarnert at yahoo.com Thu Aug 1 06:07:51 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 31 Jul 2013 21:07:51 -0700 (PDT) Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: <51F9C3C7.4050009@pearwood.info> Message-ID: <1375330071.91143.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Ryan Sent: Wednesday, July 31, 2013 7:45 PM >What about something like this: > >https://github.com/ahupp/python-magic That particular wrapper is just a ctypes wrapper around libmagic.so/.dylib, and not a very portable one (e.g., it's got the path /usr/local/Cellar/libmagic/5.10/ hardcoded into it, which will only work for Mac Homebrew users who are 4 versions/19 months out of date). Also, note that libmagic already comes with very similar Python ctypes-based bindings. However, there are a half-dozen other wrappers around libmagic on PyPI, and it's pretty trivial to create a new one. The tricky bit is where to get the libmagic code and data files from. If you want to make it usable on most platforms, you'd need to add the file source distribution to the Python source, build it with Python, and statically link it into a module, a la zlib or sqlite. And, unlike those modules, you'd also need to include a data file (magic.mgc) with the binary distribution. I slapped together a quick&dirty wrapper to see what the costs are.
It adds 640KB to the 14MB source distribution, 300KB to the 91MB binary (64-bit Mac framework build), and under 10 seconds to the build process. There'd be a bit of an extra maintenance burden in tracking updates (the most recent two updates were 21 Mar 2013 and 22 Feb 2013). The code and data are BSD-licensed, which shouldn't be a problem. The library is very portable: "./configure --enable-static; make" worked even on Windows. So, is it worth adding to Python? I don't know. But it seems at least feasible. From abarnert at yahoo.com Thu Aug 1 06:35:02 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 31 Jul 2013 21:35:02 -0700 (PDT) Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: <51F9C3C7.4050009@pearwood.info> Message-ID: <1375331702.12723.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: Alexander Belopolsky Sent: Wednesday, July 31, 2013 7:57 PM >On Wed, Jul 31, 2013 at 10:11 PM, Steven D'Aprano wrote: > >>Still can't be done reliably, but even if it could, what's so special about ASCII? >Lots of things are special about ASCII. It is a 7-bit subset of pretty much every modern encoding scheme. Being 7-bit, it can be fairly reliably distinguished from most binary formats. Same is true about UTF-8. It is very unlikely that a binary dump of a double array makes valid UTF-8 text and vice versa - UTF-8 text interpreted as a list of doubles is unlikely to produce numbers that are in a reasonable range. > >I would not mind seeing an "istext()" function somewhere in the stdlib that would only recognize ASCII and UTF-8 as text. Plenty of files in popular charsets are actually perfectly valid UTF-8, but garbage when read that way. This and its converse are probably the most common cause of mojibake problems people have today. (I don't know if you can search Stack Overflow for problems with "?"
in the description, but if you can, it'll be illuminating.) Do you really want a function that sorts half your Latin-1 files into "UTF-8 text files" that are unreadable garbage and the other half into "binary files"? Also, while ASCII is much simpler and more robust to detect, it's not nearly as useful as it used to be. We don't have to deal with 7-bit data channels very often nowadays, and when you do, do you really want to treat pickle format 0 or base-64 or RTF as "text"? Meanwhile, text-processing code that only handles ASCII is generally considered broken. Anyway, if you want that "istext()" function, it's trivial to write it yourself:

    def istext(b):
        try:
            b.decode('utf-8')
        except UnicodeDecodeError:
            return False
        else:
            return True

(There's no reason to try 'ascii', because any ASCII-decodable text is also UTF-8-decodable.) And really, since you're usually going to do something like this:

    if istext(b):
        dotextstuff(b)
    else:
        dobinarystuff(b)

you're probably better off following EAFP and just doing this:

    try:
        dotextstuff(b)
    except UnicodeDecodeError:
        dobinstuff(b)

From steve at pearwood.info Thu Aug 1 07:49:39 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 01 Aug 2013 15:49:39 +1000 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> Message-ID: <51F9F6F3.7070401@pearwood.info> On 01/08/13 13:24, Göktuğ Kayaalp wrote: > BTW, if a multi-statement anonymous function syntax was to be considered > seriously, I'd recommend a lambda statement with colon replaced with a > brace-delimited block You can get an idea of how brace-delimited blocks are treated by using a __future__ directive. At the interactive interpreter:

    from __future__ import braces

and then take it from there.
-- Steven From stephen at xemacs.org Thu Aug 1 08:45:32 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 01 Aug 2013 15:45:32 +0900 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <1375331702.12723.YahooMailNeo@web184703.mail.ne1.yahoo.com> References: <51F9C3C7.4050009@pearwood.info> <1375331702.12723.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: <8761vpoqnn.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > Plenty of files in popular charsets are actually perfectly valid > UTF-8, FVO "popular charset" in {ASCII} or "plenty of files" in "len(file) < 1KB", yes. Otherwise, see below. > but garbage when read that way. This and its converse are The converse *is* a problem, because the ISO 8859 family (and even more so the Windows 125x family) basically use up all the bytes. > probably the most common cause of mojibake problems people have > today. Actually the most common cause in my experience is Apache or MUA configuration of a default charset and/or fallback to Latin-1 for files actually written in UTF-8, combined with conformant browsers and MUAs that respect transport-level defaults or protocol defaults rather than try to detect the charset. Viz: > (I don't know if you can search Stack Overflow for problems with > "?" in the description, but if you can, it'll be illuminating.) But: > you're probably better off following EAFP and just doing this: >
>     try:
>         dotextstuff(b)
>     except UnicodeDecodeError:
>         dobinstuff(b)
Yes, indeedy! Just because those algorithms exist doesn't mean it's a good idea to use them (outside of some interactive applications like text editors where the user can look at the mojibake and tell the editor either the right encoding or to try another guess).
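The "try a list of encodings, take the first that works" approach discussed above can be sketched minimally as follows (the function name and encoding list are invented for illustration; note how latin-1, which never fails, plays exactly the role of the mojibake-producing fallback the thread warns about):

```python
def guess_decode(b, encodings=('utf-8', 'latin-1')):
    """Return (text, encoding) for the first encoding that decodes b.

    A toy illustration only: because latin-1 assigns a character to
    every byte, it always "succeeds", so any non-UTF-8 bytes silently
    fall through to it, right or wrong.
    """
    for enc in encodings:
        try:
            return b.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return None, None

print(guess_decode(b'caf\xc3\xa9'))  # ('café', 'utf-8')
print(guess_decode(b'caf\xe9'))      # ('café', 'latin-1'): UTF-8 failed first
```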
From steve at pearwood.info Thu Aug 1 15:58:32 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 01 Aug 2013 23:58:32 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F39A87.5030209@pearwood.info> References: <51F39A87.5030209@pearwood.info> Message-ID: <51FA6988.6050307@pearwood.info> I have raised an issue on the tracker for this: http://bugs.python.org/issue18614 -- Steven From mertz at gnosis.cx Thu Aug 1 20:29:07 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 1 Aug 2013 11:29:07 -0700 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: On Wed, Jul 31, 2013 at 5:42 PM, Chris Angelico wrote: > This sounds more like a job for a file-like object, maybe a subclass > of file that reads (and buffers) the first 512 bytes, guesses whether > it's text or binary, and then watches everything that goes through > after that and revises its guess later on. Something like:

    if fh.read(512).isprintable():
        do_the_ascii_stuff(fh)
    else:
        do_the_bin_stuff(fh)

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From rymg19 at gmail.com Thu Aug 1 20:45:16 2013 From: rymg19 at gmail.com (Ryan) Date: Thu, 01 Aug 2013 13:45:16 -0500 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> That's a pretty good idea. Or it could be like this: if fh.printable(): It would have an optional argument: the number of bytes to read in. Default is 512.
So, if we wanted 1024 bytes instead of 512: if fh.printable(1024): David Mertz wrote: >On Wed, Jul 31, 2013 at 5:42 PM, Chris Angelico >wrote: > >> This sounds more like a job for a file-like object, maybe a subclass >> of file that reads (and buffers) the first 512 bytes, guesses >whether >> it's text or binary, and then watches everything that goes through >> after that and revises its guess later on. > > >Something like: > > if fh.read(512).isprintable(): > do_the_ascii_stuff(fh) > else: > do_the_bin_stuff(fh) > > > >-- >Keeping medicines from the bloodstreams of the sick; food >from the bellies of the hungry; books from the hands of the >uneducated; technology from the underdeveloped; and putting >advocates of freedom in prisons. Intellectual property is >to the 21st century what the slave trade was to the 16th. > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Aug 1 20:47:22 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 01 Aug 2013 19:47:22 +0100 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: Message-ID: <51FAAD3A.2070903@mrabarnett.plus.com> On 01/08/2013 19:29, David Mertz wrote: > On Wed, Jul 31, 2013 at 5:42 PM, Chris Angelico > wrote: > > This sounds more like a job for a file-like object, maybe a subclass > of file that reads (and buffers) the first 512 bytes, guesses whether > it's text or binary, and then watches everything that goes through > after that and revises its guess later on. 
> > > Something like: > > if fh.read(512).isprintable(): > do_the_ascii_stuff(fh) > else: > do_the_bin_stuff(fh) > Except that: >>> "\n".isprintable() False From bruce at leapyear.org Thu Aug 1 19:14:11 2013 From: bruce at leapyear.org (Bruce Leban) Date: Thu, 1 Aug 2013 10:14:11 -0700 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51FA6988.6050307@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: I wonder if this should also support the special labels for characters without names: control-NNNN reserved-NNNN noncharacter-NNNN private-use-NNNN surrogate-NNNN see p. 138 of http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf I would think that unicodedata.name should not return these, but perhaps unicodedata.lookup should accept them. Note that the doc says that these are frequently displayed enclosed in <>, so perhaps unicodedata.lookup('U+0001') == unicodedata.lookup('control-0001') == unicodedata.lookup('') == '\x01' --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Thu Aug 1 22:05:49 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 1 Aug 2013 16:05:49 -0400 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <51FAAD3A.2070903@mrabarnett.plus.com> References: <51FAAD3A.2070903@mrabarnett.plus.com> Message-ID: On Thu, Aug 1, 2013 at 2:47 PM, MRAB wrote: >> Something like: >> >> if fh.read(512).isprintable(): >> do_the_ascii_stuff(fh) >> else: >> do_the_bin_stuff(fh) >> > Except that: > >>>> "\n".isprintable() > False Doesn't that seem like a bug: ----- Help on method_descriptor: isprintable(...) S.isprintable() -> bool Return True if all characters in S are considered printable in repr() or S is empty, False otherwise. 
----- In what sense is "\n" "not printable in repr()"?! -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From random832 at fastmail.us Thu Aug 1 22:18:52 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 01 Aug 2013 16:18:52 -0400 Subject: [Python-ideas] Remove tty module In-Reply-To: References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> Message-ID: <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> On Thu, Jul 25, 2013, at 11:58, Andrew Barnert wrote: > Faking termios on Windows (and presumably faking attributes for the > cmd.exe console window) would probably be almost as much work as faking > curses, and a lot less useful, so I'm not sure that would be worth doing. What about faking conio on Unix? There's no rule that says cross-platform python APIs have to be inspired by POSIX C APIs. It's a very simple API and is sufficient to build a pure-python windowing library on top of. From random832 at fastmail.us Thu Aug 1 22:31:13 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 01 Aug 2013 16:31:13 -0400 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: <51FAAD3A.2070903@mrabarnett.plus.com> Message-ID: <1375389073.31723.4778411.4C6F5543@webmail.messagingengine.com> On Thu, Aug 1, 2013, at 16:05, David Mertz wrote: > In what sense is "\n" "not printable in repr()"?! Because it prints as \n instead of as itself, doesn't it? Or is that only in Python 2? 
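(In Python 3 the answer is no: repr() still escapes "\n", and str.isprintable() excludes it, which is why a bare fh.read(512).isprintable() sniff misclassifies ordinary text files. A minimal sketch of a sniff that tolerates common whitespace follows; the helper name, the byte set, and the 512-byte threshold are illustrative assumptions from this thread, not an existing API:)

```python
# "\n" is escaped by repr() and rejected by str.isprintable() in Python 3 too
assert not "\n".isprintable()
assert repr("\n") == "'\\n'"  # repr shows the escape, not a literal newline

def looks_like_text(chunk):
    # Accept printable ASCII plus the usual whitespace control bytes
    # (tab, LF, VT, FF, CR); anything else counts as "binary".
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0B, 0x0C, 0x0D}
    return all(b in allowed for b in chunk)

assert looks_like_text(b"hello\nworld\n")        # plain text passes
assert not looks_like_text(b"\x00\x01\x02data")  # NUL bytes fail
```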
From tjreedy at udel.edu Fri Aug 2 00:09:09 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 01 Aug 2013 18:09:09 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: On 8/1/2013 1:14 PM, Bruce Leban wrote: > I wonder if this should also support the special labels for characters > without names: > > control-NNNN > reserved-NNNN > noncharacter-NNNN > private-use-NNNN > surrogate-NNNN > > see p. 138 of http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf > > I would think that unicodedata.name should not > return these, but perhaps unicodedata.lookup should accept them. Note > that the doc says that these are frequently displayed enclosed in <>, so > perhaps > > unicodedata.lookup('U+0001') > == unicodedata.lookup('control-0001') > == unicodedata.lookup('<control-0001>') > == '\x01' That is a lot of added complication of both doc and code for what seems like little gain. Why would someone write 'control-' instead of 'U+'? -- Terry Jan Reedy From alexander.belopolsky at gmail.com Fri Aug 2 00:58:49 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 1 Aug 2013 18:58:49 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: On Thu, Aug 1, 2013 at 6:09 PM, Terry Reedy wrote: > Why would someone write 'control-' instead of 'U+'? Because this is the recommended way to form the code-point labels: "For each code point type without character names, code point labels are constructed by using a lowercase prefix derived from the code point type, followed by a hyphen-minus and then a 4- to 6-digit hexadecimal representation of the code point." "To avoid any possible confusion with actual, non-null Name property values, constructed Unicode code point labels are often displayed between angle brackets: <control-0009>, <noncharacter-FFFF>, and so on.
This convention is used consistently in the data files for the Unicode Character Database." "A constructed code point label is distinguished from the designation of the code point itself (for example, “U+0009” or “U+FFFF”), which is also a unique identifier, as described in Appendix A, Notational Conventions." < http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf> I would rather see unicodedata.lookup() to be extended to accept code-point labels rather than "the designation of the code point itself." The same applies to \N escape: I would rather see \N{control-NNNN} or \N{surrogate-NNNN} in string literals than some mysterious \N{U+NNNN}. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Aug 2 01:20:46 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 2 Aug 2013 09:20:46 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: On 2 Aug 2013 09:00, "Alexander Belopolsky" wrote: > > > On Thu, Aug 1, 2013 at 6:09 PM, Terry Reedy wrote: >> >> Why would someone write 'control-' instead of 'U+'? > > > Because this is the recommended way to form the code-point labels: > > "For each code point type without character names, code point labels are constructed by using a lowercase prefix derived from the code point type, followed by a hyphen-minus and then a 4- to 6-digit hexadecimal representation of the code point." > > "To avoid any possible confusion with actual, non-null Name property values, constructed Unicode code point labels are often displayed between angle brackets: <control-0009>, <noncharacter-FFFF>, and so on. This convention is used consistently in the data files for the Unicode Character Database." > > "A constructed code point label is distinguished from the designation of the code point itself (for example, “U+0009”
or “U+FFFF”), which is also a unique identifier, as described in Appendix A, Notational Conventions." < http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf> > > I would rather see unicodedata.lookup() to be extended to accept code-point labels rather than "the designation of the code point itself." The same applies to \N escape: I would rather see \N{control-NNNN} or \N{surrogate-NNNN} in string literals than some mysterious \N{U+NNNN}. -1. I'd never even heard of code point labels before this thread, while the "U+" notation is incredibly common. Cheers, Nick. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Fri Aug 2 01:53:56 2013 From: rymg19 at gmail.com (Ryan) Date: Thu, 01 Aug 2013 18:53:56 -0500 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <1375389073.31723.4778411.4C6F5543@webmail.messagingengine.com> References: <51FAAD3A.2070903@mrabarnett.plus.com> <1375389073.31723.4778411.4C6F5543@webmail.messagingengine.com> Message-ID: I see... This works: import re with open('test.xml', 'r') as f: print re.sub(r'(^(\'|\")|(\'|\")$)', '', repr(f.read()).replace('\\n', '\n')) But, repr can still print binary characters. Opening libexpat.so shows all sorts of crazy characters like \x00. random832 at fastmail.us wrote: >On Thu, Aug 1, 2013, at 16:05, David Mertz wrote: >> In what sense is "\n" "not printable in repr()"?! > >Because it prints as \n instead of as itself, doesn't it? Or is that >only in Python 2? >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From alexander.belopolsky at gmail.com Fri Aug 2 01:55:47 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 1 Aug 2013 19:55:47 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: On Thu, Aug 1, 2013 at 7:20 PM, Nick Coghlan wrote: > I'd never even heard of code point labels before this thread, while the > "U+" notation is incredibly common. Nick, Did you see this part: "A constructed code point label is distinguished from the designation of the code point itself (for example, “U+0009” or “U+FFFF”), which is also a unique identifier"? The purpose of unicode.lookup() is to look up the unicode code point by name and "U+NNNN" is not a name - it is "the designation of the code point itself." There is no need to look up anything if you want to process an occasional s = "U+FFFF" string: chr(int(s[2:], 16)) will do the job. The original proposal was to allow \U+NNNN escape as a shortcut for \U0000NNNN. This is a clear readability improvement while \N{U+001B}, for example, is not an improvement over \N{ESCAPE}. However, for more obscure control characters, \N{control-NNNN} may be clearer than any currently available spelling. For example, \N{control-001E} is easier to understand than \036, \x1e, \u001E, \N{RS} or even the most verbose \N{INFORMATION SEPARATOR TWO}. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bruce at leapyear.org Fri Aug 2 02:04:56 2013 From: bruce at leapyear.org (Bruce Leban) Date: Thu, 1 Aug 2013 17:04:56 -0700 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: On Thu, Aug 1, 2013 at 4:55 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > On Thu, Aug 1, 2013 at 7:20 PM, Nick Coghlan wrote: > >> I'd never even heard of code point labels before this thread, while the >> "U+" notation is incredibly common. > > > > > The original proposal was to allow \U+NNNN escape as a shortcut for > \U0000NNNN. This is a clear readability improvement while \N{U+001B}, for > example, is not an improvement over \N{ESCAPE}. However, for more obscure > control characters, \N{control-NNNN} may be clearer than any currently > available spelling. For example, \N{control-001E} is easier to > understand than \036, \x1e, \u001E, \N{RS} or even the most verbose > \N{INFORMATION SEPARATOR TWO}. > My reason to suggest including it is that it's in the standard as the label for these characters so it's reasonable to expect lookup to know about these labels just as it knows about 'EXCLAMATION MARK'. If someone has created data using the standard and passes it to unicode.lookup, it should work. I'm +/-0 on having 'control-' and 'reserved-' etc. simply being different spellings of 'U+' so that '\N{control-0021}' == '\N{U+0021}' == '\x21' == '!' even though that isn't a control character. That is, if the data doesn't conform to the standard, it wouldn't necessarily be terrible if it did something reasonable rather than raising an exception. And, I'm only suggesting this be supported on the reading side. 
--- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Aug 2 02:17:14 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 1 Aug 2013 20:17:14 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: On Thu, Aug 1, 2013 at 8:04 PM, Bruce Leban wrote: > I'm +/-0 on having 'control-' and 'reserved-' etc. simply being different > spellings of 'U+' so that '\N{control-0021}' == '\N{U+0021}' == '\x21' == > '!' even though that isn't a control character. This misses the point of adding the code point type prefix. If you fat-finger \N{control-0021} instead of intended \N{control-0012} you would want a quick syntax error rather than an obscure bug. Similarly, when you are reading someone else's code, you don't want to consult the code table every time you see \N{control-NNNN} to assure that this is really a control character rather than a surrogate- or private-use- one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Aug 2 03:08:58 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 1 Aug 2013 21:08:58 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F39A87.5030209@pearwood.info> References: <51F39A87.5030209@pearwood.info> Message-ID: On Sat, Jul 27, 2013 at 6:01 AM, Steven D'Aprano wrote: > Why do we need yet another way of writing escape sequences? > ------------------------------**----------------------------- > > We don't need another one, we need a better one. U+xxxx is the standard > Unicode notation, while existing Python escapes have various problems. 
> The current situation with \u and \U escapes can hardly qualify as an obvious way to do it. There is nothing obvious about either \u limitation to four digits or \U requirement to have eight. (I remember discovering that after first trying something like \u1FFFF, then \U1FFFF and then checking the reference manual to discover \U0001FFFF. I don't think my experience was unique.) I have a counter-proposal that may improve the situation: allow 4, 5, 6 or 8 hex digits after \U optionally surrounded by braces. When used without braces, maximal munch rule applies: the escape sequence ends at the first non-hex-digit. I would allow only upper-case A-F in 4-6 digits escapes to minimize the need for braces. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri Aug 2 03:15:42 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Aug 2013 10:15:42 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: <87mwp0nb9d.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > On Thu, Aug 1, 2013 at 8:04 PM, Bruce Leban wrote: >> I'm +/-0 on having 'control-' and 'reserved-' etc. simply being >> different spellings of 'U+' so that '\N{control-0021}' == '\N{U+0021}' >> == '\x21' == '!' even though that isn't a control character. > This misses the point of adding the code point type prefix. Not really. That would just pass the responsibility for enforcing consistency to linters, instead of the translator. You can't just make this a syntax error because a code point may be reserved one Python version and a letter in another, depending on which versions of the Unicode tables are being used by those versions of Python. That would conflict with Unicode itself, which says that unknown code points must be treated as characters. This is way too fragile to be allowed to cause syntax errors.
> If you fat-finger \N{control-0021} instead of intended \N{control-0012} > you would want a quick syntax error rather than an obscure bug. > Similarly, when you are reading someone else's code, you don't want > to consult the code table every time you see \N{control-NNNN} to > assure that this is really a control character rather than a > surrogate- or private-use- one. +0 on Bruce's idea, -1 on syntax errors It might on rare occasions be useful to be strict about fixed-for-all-time types like surrogate and private use. (But even those weren't fixed for all time in the past!) Really, this is an editor or linter function. From python at mrabarnett.plus.com Fri Aug 2 03:46:40 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 02 Aug 2013 02:46:40 +0100 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> Message-ID: <51FB0F80.1040001@mrabarnett.plus.com> On 02/08/2013 02:08, Alexander Belopolsky wrote: > > On Sat, Jul 27, 2013 at 6:01 AM, Steven D'Aprano > wrote: > > Why do we need yet another way of writing escape sequences? > ------------------------------__----------------------------- > > We don't need another one, we need a better one. U+xxxx is the > standard Unicode notation, while existing Python escapes have > various problems. > > > The current situation with \u and \U escapes can hardly qualify as an > obvious way to do it. There is nothing obvious about either \u > limitation to four digits nor \U requirement to have eight. (I remember > discovering that after first trying something like \u1FFFF, then > \U1FFFF and then checking the reference manual to discover \U0001FFFF. > I don't think my experience was unique.) > > I have a counter-proposal that may improve the situation: allow 4, 5, 6 > or 8 hex digits after \U optionally surrounded by braces. When used > without braces, maximal munch rule applies: the escape sequence ends at > the first non-hex-digit.
I would allow only upper-case A-F in 4-6 > digits escapes to minimize the need for braces. > Perl has \x{...}. Ruby has \u{...}. Python would have \U{...}. We could follow Perl or Ruby, or both of them, or even allow braces with any of the hex escapes. From alexander.belopolsky at gmail.com Fri Aug 2 04:14:31 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 1 Aug 2013 22:14:31 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <87mwp0nb9d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> <87mwp0nb9d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Aug 1, 2013 at 9:15 PM, Stephen J. Turnbull wrote: > > Alexander Belopolsky writes: > > On Thu, Aug 1, 2013 at 8:04 PM, Bruce Leban wrote: > .. > > This misses the point of adding the code point type prefix. > > Not really. That would just pass the responsibility for enforcing > consistency to linters, instead of the translator. I have not seen a linter yet that would suggest that "\x41" should be written as "A". The choice of the best literal syntax requires human judgement. A linter cannot tell you when 1.00 is better than 1.0 or 1. I would choose a more verbose \N{control-NNNN} over shorter \uNNNN when I want to make it obvious to the human reader of my code that I use a control character rather than anything else. > > You can't just > make this a syntax error because a code point may be reserved one > Python version and a letter in another, depending on which versions of > the Unicode tables are being used by those versions of Python. That's true, but why would you write \N{reserved-NNNN} instead of \uNNNN to begin with? I would assume you would only choose a longer spelling when it is important for your program that you use a reserved character and your program will not work correctly with the UCD version where the NNNN code point is assigned. 
> > That > would conflict with Unicode itself, which says that unknown code > points must be treated as characters. This is way too fragile to be > allowed to cause syntax errors. You can always avoid syntax errors by using \uNNNN. If you choose to specify the character type you hopefully do it for a good reason. > > .. > > It might be on rare occasions be useful to be strict about fixed-for- > all-time types like surrogate and private use. There are only five type prefixes: control-, reserved-, non-character-, private-use-, and surrogate-. With the possible exception or reserved-, on a rare occasion when you want to be explicit about the character type, it is useful to be strict. In case of reserved-, I cannot think of any legitimate use for a reserved character in a string literal, so if strictness is a problem in this case, I would disallow \N{reserved-NNNN} altogether. > (But even those weren't fixed for all time in the past!) Now they are: control- property is immutable since version 1.1.5, surrogate- and private-use- since 2.0, and noncharacter- since 3.1.0. (See .) Moreover, since 2.1.0, "The enumeration of General_Category property values is fixed. No new values will be added." -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Aug 2 04:25:35 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 1 Aug 2013 22:25:35 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51FB0F80.1040001@mrabarnett.plus.com> References: <51F39A87.5030209@pearwood.info> <51FB0F80.1040001@mrabarnett.plus.com> Message-ID: On Thu, Aug 1, 2013 at 9:46 PM, MRAB wrote: > We could follow Perl or Ruby, or both of them, or even allow braces > with any of the hex escapes. > That choice is unfortunately precluded by backwards compatibility because both "\u1FFFF" and "\x1FFFF" are valid strings. (Are braces optional in Perl's \x{..} or Ruby's \u{..}?) 
Also, the upper-case U is more in-line with U+ notation and \N escape. If we are looking for "one obvious way," I think it should be \U with \x and \u remaining the other less obvious ways. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri Aug 2 05:30:29 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Aug 2013 12:30:29 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FB0F80.1040001@mrabarnett.plus.com> Message-ID: <87haf8n50q.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > On Thu, Aug 1, 2013 at 9:46 PM, MRAB wrote: >> We could follow Perl or Ruby, or both of them, or even allow >> braces with any of the hex escapes. > That choice is unfortunately precluded by backwards compatibility > because both "\u1FFFF" and "\x1FFFF" are valid strings. (Are braces > optional in Perl's \x{..} or Ruby's \u{..}?) Also, the upper-case U > is more in-line with U+ notation and \N escape. If we are looking > for "one obvious way," I think it should be \U with \x and \u > remaining the other less obvious ways. -1. The obvious way forward is \N{U+1FFFF}. That *looks* like an algorithmically generated name, and (wow!) that's what it *is*.[1] The existing \U, \u, and \x escapes are fine as they are. They can't really be deprecated because they're needed for portability to older Python versions which won't have any of the proposed extensions. Changing the syntax of \U to allow braces with a variable-width hexadecimal argument is only a minor compatibility break, but please have pity on the folks who support python-list. They'll forever be dealing with questions like "I know I've seen other people write '\U3bb', why do I get a weird syntax error?" and "I use Python 3.3. Why do I get a syntax error with '\U{3BB}'?" On the other hand, \N{U+1FFFF} will currently get a lookup failure.
I think that's OK, since currently code needs to be prepared for that to fail anyway since it raises an error, and users will be used to it because it's easy to typo Unicode names when typing from memory -- they're pretty regular but not 100% so. Footnotes: [1] Of course, it's also an invalid code point in any Unicode stream. ;-) From steve at pearwood.info Fri Aug 2 06:11:39 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 02 Aug 2013 14:11:39 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> Message-ID: <51FB317B.2090907@pearwood.info> On 02/08/13 09:55, Alexander Belopolsky wrote: > The original proposal was to allow \U+NNNN escape as a shortcut for > \U0000NNNN. This is a clear readability improvement while \N{U+001B}, for > example, is not an improvement over \N{ESCAPE}. However, for more obscure > control characters, \N{control-NNNN} may be clearer than any currently > available spelling. For example, \N{control-001E} is easier to understand > than \036, \x1e, \u001E, \N{RS} or even the most verbose \N{INFORMATION > SEPARATOR TWO}. Despite the vigorous objections to a variable-length escape sequence[1] I still consider that the One Obvious Way to refer to a Unicode code-point numerically is by U+NNNN with 4-6 hex digits. Add a backslash to turn it into an escape sequence, and we have \U+NNNN. If I'm still around when Python 4000 is under development, I'll propose that syntax as an outright replacement for legacy escapes \xNN \oNNN \uNNNN and \U00NNNNNN (for strings, but not bytes, where \xNN is still the OOWTDI). But that's a *long* way away. In the meantime, we're constrained by backward compatibility to keep existing escape formats. 
There is considerable opposition to another variable-length escape sequence without delimiters, and \N{U+NNNN} seems to be a reasonable compromise to me even though it is actually longer than the current \U00NNNNNN escape. I consider this proposal to be about two things, conformity with Unicode notation, and clarity, not length. If somebody wishes to champion the proposal to support code-point labels, please start a separate thread. The two features are independent. [1] None of which persuade me -- many languages have variable-length octal escapes, and this is the first time I've ever heard anyone complain about them being harmful. -- Steven From abarnert at yahoo.com Fri Aug 2 06:47:24 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 1 Aug 2013 21:47:24 -0700 (PDT) Subject: [Python-ideas] Remove tty module In-Reply-To: <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> Message-ID: <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> From: "random832 at fastmail.us" Sent: Thursday, August 1, 2013 1:18 PM > On Thu, Jul 25, 2013, at 11:58, Andrew Barnert wrote: >> Faking termios on Windows (and presumably faking attributes for the >> cmd.exe console window) would probably be almost as much work as faking >> curses, and a lot less useful, so I'm not sure that would be worth > doing. > > What about faking conio on Unix? There's no rule that says > cross-platform python APIs have to be inspired by POSIX C APIs. It's a > very simple API and is sufficient to build a pure-python windowing > library on top of. That's a great idea. The quasi-standardized core of conio is definitely not enough to write a windowing library. Although some of the old DOS implementations had gotoxy, setfg, and setbg functions, the Win32 implementations don't have those; they just have...
well, the same functions as MSVCRT's console APIs, which the stdlib already wraps (see http://docs.python.org/3/library/msvcrt.html#console-i-o for details). And really, all we were looking for was a way to do raw input and a few related things in a cross-platform way (and, on Unix, an easier way than tty/termios), and those APIs seem like a good match. And they'd be pretty easy to implement on Unix. Something like this (for Unix; the Windows implementations would just call the msvcrt functions):

    def _check_tty(f):
        if not f.isatty():
            raise RuntimeError('consoleio on non-tty')

    @contextmanager
    def _tty_context(f, raw=False, echo=True):
        fd = f.fileno()
        stash = termios.tcgetattr(fd)
        tty.setraw(fd)
        yield
        termios.tcsetattr(fd, termios.TCSAFLUSH, stash)

    _pushback = []

    def kbhit():
        pass  # Not sure how to write this in a cross-platform way...

    def getwch():
        _check_tty(sys.stdin)
        if _pushback:
            return _pushback.pop(0)
        with _tty_context(sys.stdin, raw=True, echo=False):
            return sys.stdin.read(1)

    def ungetwch(wch):
        _check_tty(sys.stdin)
        _pushback.append(wch)

    def putwch(wch):
        _check_tty(sys.stdout)
        sys.stdout.write(wch)
        sys.stdout.flush()

    # For the non-"wide" versions... just use sys.stdout.buffer instead of sys.stdout?

If we can require the user to call enable(input_only=False) before using any consoleio functions, and disable() before using normal I/O (maybe "with consoleio.context(input_only=False):" to wrap it), it could be simpler (especially the kbhit part), and more efficient and reliable. Of course it is possible to have a TTY that isn't on stdin/stdout, or even two TTYs, but I don't think there's any need for the extra complexity.
Anyway, if you're interested, I could clean this up, test it out, and put something up on PyPI, and then we could see if it gets enough traction to be worth considering stdlib-ifying. From rymg19 at gmail.com Fri Aug 2 07:43:38 2013 From: rymg19 at gmail.com (Ryan) Date: Fri, 02 Aug 2013 00:43:38 -0500 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> Message-ID: <8b816f14-4032-48f1-af8a-268913f61931@email.android.com> I actually got something simple working. It's only been tested on Android:

import re

def isbinary(fpath):
    with open(fpath, 'r') as f:
        data = re.sub(r'(^(\'|\")|(\'|\")$)', '', repr(f.read()).replace('\\n', '\n'))
        binchars = re.findall(r'\\x[0123456789abcdef]{2}', data)
        per = (float(len(binchars)) / float(len(data))) * 100
        # no \xNN escapes found means the file looks like text, not binary
        if int(per) == 0:
            return False
        else:
            return True

Ryan wrote: >That's a pretty good idea. Or it could be like this: > >if fh.printable(): > >It would have an optional argument: the number of bytes to read in. >Default is 512. So, if we wanted 1024 bytes instead of 512: > >if fh.printable(1024): > >David Mertz wrote: > >>On Wed, Jul 31, 2013 at 5:42 PM, Chris Angelico >>wrote: >> >>> This sounds more like a job for a file-like object, maybe a subclass >>> of file that reads (and buffers) the first 512 bytes, guesses >>whether >>> it's text or binary, and then watches everything that goes through >>> after that and revises its guess later on. >> >> >>Something like: >> >> if fh.read(512).isprintable(): >> do_the_ascii_stuff(fh) >> else: >> do_the_bin_stuff(fh) >> >> >> >>-- >>Keeping medicines from the bloodstreams of the sick; food >>from the bellies of the hungry; books from the hands of the >>uneducated; technology from the underdeveloped; and putting >>advocates of freedom in prisons. Intellectual property is >>to the 21st century what the slave trade was to the 16th.
>> >> >>------------------------------------------------------------------------ >> >>_______________________________________________ >>Python-ideas mailing list >>Python-ideas at python.org >>http://mail.python.org/mailman/listinfo/python-ideas > >-- >Sent from my Android phone with K-9 Mail. Please excuse my brevity. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Aug 2 08:03:25 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 1 Aug 2013 23:03:25 -0700 (PDT) Subject: [Python-ideas] Remove tty module In-Reply-To: <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Andrew Barnert Sent: Thursday, August 1, 2013 9:47 PM > From: "random832 at fastmail.us" > Sent: Thursday, August 1, 2013 1:18 PM >> What about faking conio on Unix? > > That's a great idea. See https://github.com/abarnert/consoleio for a rough implementation. You might even be able to "pip install git+https://github.com/abarnert/consoleio". It does not work on Python 2.x, although I think it wouldn't be that hard to make it do so. It probably works on 3.0-3.2, but I haven't tested on anything earlier than 3.3.0.
I also haven't tested on Windows, but if it doesn't work, it should be trivial to fix. I went with the idea of only allowing consoleio functions inside an enabling() context (or explicit enable() and disable() calls) instead of switching on the fly. It makes kbhit easier to implement and to use, and it's generally simpler, cleaner, and more efficient, and I don't think anyone will complain too much. Anyway, if something like this were added to the stdlib, it definitely wouldn't allow us to deprecate tty or termios (especially since it uses them - but even if it didn't, sometimes you need more flexibility), but it would allow us to add a note at the top saying "If you're looking for simple, more-portable raw I/O, see the consoleio module."

From alexander.belopolsky at gmail.com Fri Aug 2 08:17:43 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 2 Aug 2013 02:17:43 -0400 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <87haf8n50q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51F39A87.5030209@pearwood.info> <51FB0F80.1040001@mrabarnett.plus.com> <87haf8n50q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID:

On Thu, Aug 1, 2013 at 11:30 PM, Stephen J. Turnbull wrote:
>
> -1. The obvious way forward is \N{U+1FFFF}. That *looks* like an
> algorithmically generated name, and (wow!) that's what it *is*.[1]

The only problem is that this is not a conforming name according to the Unicode standard. The standard is very explicit in its recommendation on how the names should be generated: "Use in APIs. APIs which return the value of a Unicode 'character name' for a given code point might vary somewhat in their behavior. An API which is defined as strictly returning the value of the Unicode Name property (the 'na' attribute), should return a null string for any Unicode code point other than graphic or format characters, as that is the actual value of the property for such code points.
On the other hand, an API which returns a name for Unicode code points, but which is expected to provide useful, unique labels for unassigned, reserved code points and other special code point types, should return the value of the Unicode Name property for any code point for which it is non-null, but should otherwise construct a code point label to stand in for a character name."

The recommendation on what should be accepted as a valid name is more relaxed: "... it can be more effective for a user interface to use names that were translated or otherwise adjusted to meet the expectations of the targeted user community. By also listing the formal character name, a user interface could ensure that users can unambiguously refer to the character by the name documented in the Unicode Standard."

This does not literally preclude treating U+NNNN as a character name, but it looks like such use is discouraged: "A constructed code point label is distinguished from the designation of the code point itself (for example, 'U+0009' or 'U+FFFF'), which is also a unique identifier."

> [1] Of course, it's also an invalid code point in any Unicode stream. ;-)

This is not accurate. U+1FFFF is a valid code point and its generated label is . Noncharacters "are forbidden for use in open interchange of Unicode text data. ... Applications are free to use any of these noncharacter code points internally but should never attempt to exchange them." (See Chapter 16.7 Noncharacters.) In Python 0x1FFFF is a valid code point:

>>> chr(0x1FFFF)
'\U0001ffff'

An application written in Python can use strings containing '\U0001ffff' internally, but should not interchange such strings with other applications.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stephen at xemacs.org Fri Aug 2 09:36:57 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 02 Aug 2013 16:36:57 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FA6988.6050307@pearwood.info> <87mwp0nb9d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87fvusmtly.fsf@uwakimon.sk.tsukuba.ac.jp>

Alexander Belopolsky writes:
> On Thu, Aug 1, 2013 at 9:15 PM, Stephen J. Turnbull wrote:
>> Alexander Belopolsky writes:
>>> This misses the point of adding the code point type prefix.
>> Not really. That would just pass the responsibility for enforcing
>> consistency to linters, instead of the translator.
> I have not seen a linter yet that would suggest that "\x41" should be
> written as "A".

Irrelevant. All I suggest the linter do is the "is \N{control-0x21} consistent in the sense that U+0021 is a control character?" check. That's what you said is the point. I just want that check done outside of the compiler.

>> You can't just make this a syntax error because a code point may
>> be reserved one Python version and a letter in another, depending
>> on which versions of the Unicode tables are being used by those
>> versions of Python.
> That's true, but why would you write \N{reserved-NNNN} instead of
> \uNNNN to begin with?

I wouldn't. The problem isn't writing "\N{reserved-50000}". It's the other way around: I want to *write* "\N{control-50000}" which expresses my intent in Python 3.5 and not have it blow up in Python 3.4 which uses an older UCD where U+50000 is unassigned.

> With the possible exception of reserved-, on a rare occasion when you
> want to be explicit about the character type, it is useful to be
> strict.

As explained above, strictness is not backward compatible with older versions of the UCD that might be in use in older versions of Python.

From stephen at xemacs.org Fri Aug 2 10:32:45 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 02 Aug 2013 17:32:45 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51FB0F80.1040001@mrabarnett.plus.com> <87haf8n50q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ehacmr0y.fsf@uwakimon.sk.tsukuba.ac.jp>

Alexander Belopolsky writes:
> On Thu, Aug 1, 2013 at 11:30 PM, Stephen J. Turnbull wrote:
>> -1. The obvious way forward is \N{U+1FFFF}. That *looks* like an
>> algorithmically generated name, and (wow!) that's what it *is*.
> The only problem is that this is not a conforming name according to
> the Unicode standard. The standard is very explicit in its
> recommendation on how the names should be generated: "Use in
> APIs. APIs which return the value of a Unicode 'character name' [...]

This whole section of the standard is irrelevant. Of course unicodedata.name('A') should *return* 'LATIN CAPITAL LETTER A', but we're discussing the possibility of extending what unicodedata.lookup() should *accept*.

> The recommendation on what should be accepted as a valid name is
> more relaxed: "... it can be more effective for a user interface to
> use names that were translated or otherwise adjusted to meet the
> expectations of the targeted user community."

It seems to me that's exactly what those of us who advocate using \N{} are saying.

> This does not literally preclude treating U+NNNN as a character
> name, but it looks like such use is discouraged: "A constructed
> code point label is distinguished from the designation of the code
> point itself (for example, 'U+0009' or 'U+FFFF'), which is also a
> unique identifier."

I don't see any such implication. What's being said here is that an application should not expect a conforming implementation to treat "U+0009" and "control-0009" identically in all respects. For example, "control-0009" might be subjected to the kind of consistency check you want. Or only one of the two might be acceptable to a name lookup function.
Or you might have to use different functions to convert them to characters. Steve From clay.sweetser at gmail.com Fri Aug 2 13:19:10 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Fri, 2 Aug 2013 07:19:10 -0400 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: Actually On Aug 2, 2013 2:07 AM, "Andrew Barnert" wrote: > > From: Andrew Barnert > > Sent: Thursday, August 1, 2013 9:47 PM > > > > From: "random832 at fastmail.us" > > Sent: Thursday, August 1, 2013 1:18 PM > > > >> What about faking conio on Unix? > > > > That's a great idea. > > > See https://github.com/abarnert/consoleio for a rough implementation. > > You might even be able to "pip install git+ https://github.com/abarnert/consoleio". > > It does not work on Python 2.x, although I think it wouldn't be that hard to make it do so. It probably works on 3.0-3.2, but I haven't tested on anything earlier than 3.3.0. I also haven't tested on Windows, but if it doesn't work, it should be trivial to fix. > > I went with the idea of only allowing consoleio functions inside an enabling() context (or explicit enable() and disable() calls) instead of switching on the fly. It makes khbit easier to implement and to use, and it's generally simpler, cleaner, and more efficient, and I don't think anyone will complain too much. > > Anyway, if something like were added to the stdlib, it definitely wouldn't allow us to deprecate tty or termios (especially since it uses them? 
but even if it didn't, sometimes you need more flexibility), but it would allow us to add a note at the top saying "If you're using looking for simple, more-portable raw I/O, see the consoleio module." Not to mention finally putting an obvious end to all those questions on stack overflow and friends, on how to do simple, non-blocking, cross platform console input. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Aug 2 14:48:33 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 2 Aug 2013 22:48:33 +1000 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On 1 August 2013 09:17, Andrew Barnert wrote: > But anyway, I think you're mostly agreeing with me. When neither lambda nor def feels right, it's usually not because you really want a multi-line expression, or a multi-line anonymous function, but because you want to get the petty details of the function "out of the way" of the important code, right? Yeah, I've been grappling with this problem for years, which is why I have two competing deferred PEPs about it :) For certain kinds of problem, the natural way to think about them is as "I want to do X", and as an incidental part of doing X, you need to define a function that does Y. Sorting, complex comprehensions, various flavours of event driven programming (especially GUI programming, where the yield-driven approach of PEP 3156 may not be appropriate, as well as the low level transport code for PEP 3156 style systems). 
Lambda isn't a great solution because it embeds all the complexity of the function directly in the main expression, obscuring the fact that the overall operation is "do X". Ruby's block syntax is an elegant solution, but the specific problem with adapting that to Python is deciding how to spell the forward reference to the trailing definition. Ruby solves that problem through a convention (the block is just the last positional argument), but Python has no such convention - there's a wide variety of signatures for higher order operations, and a syntax that required new signatures for everything is no solution at all. PEP 403 is the approach I dislike least so far, but that's a far cry from being something I'm willing to propose for inclusion in the language. The "@in" does hint at the out of order execution nicely, but it's also a bit too heavy (drawing attention away from the subsequent simple statement), and as an ordinary name, the forward reference doesn't quite stand out enough. PEP 3150 could possibly be improved by having the hidden function implicitly end with (a more efficient equivalent of) "return types.SimpleNamespace(**locals())" and introducing "?" as a forward reference to that result:

    sorted_data = sorted(data, key=?.k) given:
        def k(item):
            return item.attr1, item.attr2

But it's hardly what one could call a *concise* syntax. On the other hand, it *does* let you do some pretty neat things, like:

    dispatch_table = vars(?) given:
        def command1(*args, **kwds):
            ...
        def command2(*args, **kwds):
            ...
        def command3(*args, **kwds):
            ...

I would also tweak the early binding syntax to require an additional keyword to make it read more like English:

    seq = []
    for i in range(10):
        seq.append(?.f) given i=i in:
            def f(): return i
    assert [f() for f in seq] == list(range(10))

Using up another precious symbol would be a big call, but it's starting to feel more like something of sufficient power to justify new syntax.

Cheers,
Nick.
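None of the proposed spellings exist today, but the "consume the function defined just below" pattern can be approximated with an ordinary decorator. A rough sketch for comparison - the `given` helper name is made up here, and this has none of the scoping semantics of PEP 403 or PEP 3150:

```python
def given(consumer):
    """Pass the decorated function to *consumer* and rebind the
    function's name to whatever *consumer* returns."""
    def decorate(func):
        return consumer(func)
    return decorate

data = [{'x': 4, 'y': 2}, {'x': 1, 'y': 1}, {'x': 9, 'y': 3}]

# 'result' ends up bound to the sorted list, not to the key function,
# so the "do X" part still leads and the key definition trails it.
@given(lambda k: sorted(data, key=k))
def result(obj):
    return obj['x'] / obj['y']
```

The obvious cost is that the decorated name no longer refers to a function, which is exactly the kind of surprise the dedicated syntax proposals are trying to avoid.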
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.us Fri Aug 2 14:53:45 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 02 Aug 2013 08:53:45 -0400 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: <1375448025.28109.5021267.0C96584B@webmail.messagingengine.com> On Fri, Aug 2, 2013, at 0:47, Andrew Barnert wrote: > The quasi-standardized core of conio is definitely not enough to write a > windowing library. Although some of the old DOS implementations had > gotoxy, setfg, and setbg functions, the Win32 implementations don't have > those; they just have? well, the same functions as MSVCRT's console APIs, > which the stdlib already wraps Yes, but it wraps them in msvcrt; I'm proposing moving it to a cross-platform "conio" module. I had for some reason thought there was a gotoxy function in there. Regardless, that's no reason not to add one to the python library. As for kbhit, you could probably implement it on unix with a call to select. If the tty file descriptor is ready for reading, then return true. The one possible wrinkle is that getwch could block if an incomplete multibyte character is read - something that cannot happen on windows. 
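The select()-based approach described above might look like the following on POSIX - a sketch only, which deliberately ignores the partial-multibyte-character wrinkle just mentioned:

```python
import select
import sys

def kbhit(fd=None, timeout=0.0):
    """Return True if a read on *fd* (default: stdin) would not block.

    POSIX-only sketch of a select()-based kbhit(); *fd* may be any
    file object or raw file descriptor accepted by select().
    """
    if fd is None:
        fd = sys.stdin
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)
```

With a zero timeout this is a pure poll, matching conio's non-blocking kbhit() behavior.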
From random832 at fastmail.us Fri Aug 2 15:15:18 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 02 Aug 2013 09:15:18 -0400 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <1375449318.1941.5022851.37DC1F3C@webmail.messagingengine.com> On Fri, Aug 2, 2013, at 2:03, Andrew Barnert wrote: > I went with the idea of only allowing consoleio functions inside an > enabling() context (or explicit enable() and disable() calls) instead of > switching on the fly. It makes khbit easier to implement and to use, and > it's generally simpler, cleaner, and more efficient, and I don't think > anyone will complain too much. > > Anyway, if something like were added to the stdlib, it definitely > wouldn't allow us to deprecate tty or termios (especially since it uses > them? but even if it didn't, sometimes you need more flexibility), but it > would allow us to add a note at the top saying "If you're using looking > for simple, more-portable raw I/O, see the consoleio module." I don't think deprecating termios was ever on the table. As for "sometimes you need more flexibility" - as I understood it, the problem is that tty occupies an intermediate stage of flexibility/complexity - it's unlikely that you need more than consoleio without needing termios. (or curses, if we add screen manipulation functions) I really do think this should also include a clrscr/gotoxy (and attribute functions - you mentioned setbg/setfg in DOS versions, but conio.sourceforge.net has textcolor/textbackground/textattr instead), to provide a cross-platform way to do those things (and a foundation for an eventual windowing library). 
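On VT100-style terminals those calls reduce to writing ANSI escape sequences, so the Unix side of such a module needs very little code. A sketch - the function names simply mirror conio's, and a Windows implementation would use the console API instead:

```python
import sys

CSI = '\x1b['  # ANSI Control Sequence Introducer

def gotoxy(x, y, out=sys.stdout):
    # ANSI cursor positioning is 1-based and takes row;column order
    out.write('%s%d;%dH' % (CSI, y, x))

def clrscr(out=sys.stdout):
    # Erase the whole display, then home the cursor
    out.write(CSI + '2J' + CSI + 'H')

def textcolor(color, out=sys.stdout):
    # color is 0-7, the classic ANSI foreground palette (30-37)
    out.write('%s3%dm' % (CSI, color))
```

Whether output should go through the same enable()/disable() gate as input is one of the design questions raised above.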
I can implement these for win32 this weekend. We might want a separate enable() call/context for cursor manipulation; you have to scroll to the top of the buffer for it to work properly (and this is what native fullscreen apps such as vim do).

From solipsis at pitrou.net Fri Aug 2 15:55:02 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Aug 2013 15:55:02 +0200 Subject: [Python-ideas] Enhance definition of functions References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: <20130802155502.70623e1a@pitrou.net>

Le Fri, 2 Aug 2013 22:48:33 +1000, Nick Coghlan a écrit :
>
> I would also tweak the early binding syntax to require an additional
> keyword to make it read more like English:
>
> seq = []
> for i in range(10):
>     seq.append(?.f) given i=i in:
>         def f(): return i
> assert [f() for f in seq] == list(range(10))

I think any "inline function" proposal should focus on callback-based programming for its use cases. In this context, you usually have one or two callbacks (two in Twisted-style programming: one for success, one for failure), passed positionally to a consuming function:

    loop.create_connection((host, port), @cb, @eb) where:
        def cb(sock):
            # Do something with socket
        def eb(exc):
            logging.exception(
                "Failed connecting to %s:%s", host, port)

Regards

Antoine.
From ncoghlan at gmail.com Fri Aug 2 17:46:37 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 3 Aug 2013 01:46:37 +1000 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <20130802155502.70623e1a@pitrou.net> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> Message-ID: On 2 August 2013 23:55, Antoine Pitrou wrote: > Le Fri, 2 Aug 2013 22:48:33 +1000, > Nick Coghlan a > ?crit : >> >> I would also tweak the early binding syntax to require an additional >> keyword to make it read more like English: >> >> seq = [] >> for i in range(10): >> seq.append(?.f) given i=i in: >> def f(): return i >> assert [f() for f in seq] == list(range(10)) > > I think any "inline function" proposal should focus on callback-based > programming for its use cases. I think callback based programming is a *good* use case, certainly, but not the only one. > In this context, you usually have one or > two callbacks (two in Twisted-style programming: one for success, one > for failure), passed positionally to a consuming function: > > loop.create_connection((host, port), @cb, @eb) where: > def cb(sock): > # Do something with socket > def eb(exc): > logging.exception( > "Failed connecting to %s:%s", host, port) We can't use 'where' because we know it conflicts with the SQL sense of the term in too many APIs. We're reasonably sure we can get away with "given" without too much conflict, though. Using "@" as the marker character is also problematic, since the following degenerate case will probably confuse the parser (due to it looking too much like a decorator clause): @something() given: ... I liked the notion of "?" as suggesting doubt and uncertainty - an element of "leave this undefined for now, we'll fill it in later". 
While the out of order execution is related to decorators (hence @in for PEP 403), I think PEP 3150 is more of a different notion, especially with the revisions I suggested in this thread. I believe your example still looks reasonable with the "?." notation for the forward reference: loop.create_connection((host, port), ?.cb, ?.eb) given: def cb(sock): # Do something with socket def eb(exc): logging.exception( "Failed connecting to %s:%s", host, port) Anyway, not something that's going to happen for 3.4, but a problem I'm happy to keep chipping away at - some day we might find a proposed solution that doesn't send Guido screaming in the other direction :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Aug 2 18:00:48 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Aug 2013 18:00:48 +0200 Subject: [Python-ideas] Enhance definition of functions References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> Message-ID: <20130802180048.43ca437e@pitrou.net> Le Sat, 3 Aug 2013 01:46:37 +1000, Nick Coghlan a ?crit : > > > In this context, you usually have one or > > two callbacks (two in Twisted-style programming: one for success, > > one for failure), passed positionally to a consuming function: > > > > loop.create_connection((host, port), @cb, @eb) where: > > def cb(sock): > > # Do something with socket > > def eb(exc): > > logging.exception( > > "Failed connecting to %s:%s", host, port) > > We can't use 'where' because we know it conflicts with the SQL sense > of the term in too many APIs. We're reasonably sure we can get away > with "given" without too much conflict, though. How about reusing "with"? There's no ambiguity with context managers since the syntactic context is different. 
> Using "@" as the marker character is also problematic, since the > following degenerate case will probably confuse the parser (due to it > looking too much like a decorator clause): > > @something() given: > ... No, that would simply be forbidden. In this proposal, "@" can only mark names of parameters in function calls. We already reuse "*" and "**" for a specific meaning in front of function call parameters, so there's a precedent for such polysemy. > I liked the notion of "?" as suggesting doubt and uncertainty - an > element of "leave this undefined for now, we'll fill it in later". I don't really like it :-) "?" has other meanings traditionally: as part of the ternary operator in C-like languages (many of them), as a wildcard character in pattern matching languages, as a marker of optional matchers in regular expressions. Also, I really don't like the idea that "?" represents a full-blown object with attribute access capabilities and whatnot. It smells too much like Perl-style (Ruby-style?) magic variables. My proposal is more limited: it's a syntactic addition, but it doesn't create new runtime objects or types. Regards Antoine. From abarnert at yahoo.com Fri Aug 2 18:25:05 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 09:25:05 -0700 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <20130802180048.43ca437e@pitrou.net> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> Message-ID: <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> On Aug 2, 2013, at 9:00, Antoine Pitrou wrote: >> Using "@" as the marker character is also problematic, since the >> following degenerate case will probably confuse the parser (due to it >> looking too much like a decorator clause): >> >> @something() given: >> ... 
> > No, that would simply be forbidden. In this proposal, "@" can only mark > names of parameters in function calls. We already reuse "*" and "**" > for a specific meaning in front of function call parameters, so there's > a precedent for such polysemy. That's fine if callbacks are the _only_ case you want to handle, but as Nick just explained, there are many other cases that are also useful. The middle of an if expression in a comprehension, for example, isn't a function parameter. Also, when you have a long function call expression--as you almost always do in, say, PyObjC or PyWin32 GUIs--you often want to put each parameter on its own line. While that won't confuse the parser, it could easily confuse a human, who will see "@callback," on a line by itself and think "decorator". It's probably worth taking some real examples from a bunch of different domains where you've defined something out-of-line but would use this proposal if you could, and rewriting them with each variation to see what they look like. From abarnert at yahoo.com Fri Aug 2 18:28:20 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 09:28:20 -0700 Subject: [Python-ideas] Remove tty module In-Reply-To: References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <4D9FC980-821E-4615-A966-0A7B36AA5503@yahoo.com> On Aug 2, 2013, at 4:19, Clay Sweetser wrote: > but it would allow us to add a note at the top saying "If you're using looking for simple, more-portable raw I/O, see the consoleio module." > > Not to mention finally putting an obvious end to all those questions on stack overflow and friends, on how to do simple, non-blocking, cross platform console input. > Exactly. That's my motivation for getting involved here. 
Many novices want to know how to do something trivial like "press any key to continue", and having to explain the tty and termios modules, and all the stuff you have to go through to deal with edge cases that will affect even trivial programs, is painful. I would love to be able to give them a link to the docs and say: with consoleio.enabled(): anykey = consoleio.getwch() -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.walter at gmail.com Fri Aug 2 18:41:22 2013 From: michael.walter at gmail.com (Michael Walter) Date: Fri, 2 Aug 2013 18:41:22 +0200 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> Message-ID: On Fri, Aug 2, 2013 at 6:25 PM, Andrew Barnert wrote: > On Aug 2, 2013, at 9:00, Antoine Pitrou wrote: > > >> Using "@" as the marker character is also problematic, since the > >> following degenerate case will probably confuse the parser (due to it > >> looking too much like a decorator clause): > >> > >> @something() given: > >> ... > > > > No, that would simply be forbidden. In this proposal, "@" can only mark > > names of parameters in function calls. We already reuse "*" and "**" > > for a specific meaning in front of function call parameters, so there's > > a precedent for such polysemy. > > That's fine if callbacks are the _only_ case you want to handle, but as > Nick just explained, there are many other cases that are also useful. The > middle of an if expression in a comprehension, for example, isn't a > function parameter. 
> > Also, when you have a long function call expression--as you almost always > do in, say, PyObjC or PyWin32 GUIs--you often want to put each parameter on > its own line. While that won't confuse the parser, it could easily confuse > a human, who will see "@callback," on a line by itself and think > "decorator". > > It's probably worth taking some real examples from a bunch of different > domains where you've defined something out-of-line but would use this > proposal if you could, and rewriting them with each variation to see what > they look like. Is the reason to mark up the variables that are defined in the with/where/given block only a technical one (to support the parser)? Best wishes, Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Aug 2 18:42:19 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 2 Aug 2013 12:42:19 -0400 Subject: [Python-ideas] Support Unicode code point labels (Was: notation) Message-ID: I am starting a new thread to discuss an idea that is orthogonal to Steven D'Aprano's \U+NNNN proposal. The Unicode Standard defines five types of code points for which it does not provide a unique Name property. These types are: Control, Reserved, Noncharacter, Private-use and Surrogate. When a unique descriptive label is required for any such code point, the standard recommends constructing a label as follows: "For each code point type without character names, code point labels are constructed by using a lowercase prefix derived from the code point type, followed by a hyphen-minus and then a 4- to 6-digit hexadecimal representation of the code point." I propose adding support for these labels to unicodedata.lookup(), \N{..} and unicodedata.name() (or unicodedata.label()). In the previous thread, there was a disagreement on whether invalid labels (such as reserved-0009 instead of control-0009) should be accepted. 
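The construction rule quoted above is mechanical enough to sketch with today's unicodedata. In this sketch, label() and the prefix mapping are the *proposal*, not an existing API; the noncharacter ranges (U+FDD0..U+FDEF plus the last two code points of every plane) come from the standard:

```python
import unicodedata

def _is_noncharacter(cp):
    # U+FDD0..U+FDEF, and U+xxFFFE/U+xxFFFF in every plane
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# General_Category values without a Name property, minus Cn, which
# splits into noncharacter and reserved
_PREFIXES = {'Cc': 'control', 'Co': 'private-use', 'Cs': 'surrogate'}

def label(ch):
    """Return the Unicode Name property if *ch* has one, otherwise a
    constructed code point label such as 'control-0009'."""
    name = unicodedata.name(ch, '')
    if name:
        return name
    cp = ord(ch)
    if _is_noncharacter(cp):
        prefix = 'noncharacter'
    else:
        prefix = _PREFIXES.get(unicodedata.category(ch), 'reserved')
    return '{}-{:04X}'.format(prefix, cp)
```

The lookup direction (accepting such labels in \N{...}) would be the inverse of this mapping, plus whatever validation policy is agreed on below.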
I will address this in my response to Stephen Turnbull's e-mail below. Another question is how to add support for generating the labels in a backward compatible manner. Currently unicodedata.name() raises ValueError when no name is available for a code point: >>> unicodedata.name(chr(0x0009)) Traceback (most recent call last): File "", line 1, in ValueError: no such name Since unicodedata.name() also supports specifying default, it is unlikely that users write code like this try: name = unicodedata.name(x) except ValueError: name = 'U+{:04X}'.format(ord(x)) instead of name = unicodedata.name(x, '') or 'U+{:04X}'.format(ord(x)) However, the most conservative approach is not to change the behavior of unicodedata.name() and provide a new function unicodedata.label(). On Fri, Aug 2, 2013 at 3:36 AM, Stephen J. Turnbull wrote: > > Alexander Belopolsky writes: > > .. why would you write \N{reserved-NNNN} instead of > > \uNNNN to begin with? > > I wouldn't. The problem isn't writing "\N{reserved-50000}". It's > the other way around: I want to *write* "\N{control-50000}" which > expresses my intent in Python 3.5 and not have it blow up in Python > 3.4 which uses an older UCD where U+50000 is unassigned. "\N{control-50000}" will blow up in every past, present or future Python version. Since Unicode 1.1.5, "The General_Category property value Control (Cc) is immutable: the set of code points with that value will never change." > > With the possible exception or reserved-, on a rare occasion when you > > want to be explicit about the character type, it is useful to be > > strict. > > As explained above, strictness is not backward compatible with older > versions of the UCD that might be in use in older versions of Python. > This is not an issue for versions of Python that currently exist because they do not support \N{-NNNN} syntax at all. What may happen if my proposal is accepted is that \N{reserved-50000} will be valid in Python 3.N but invalid in 3.N+1 for some N > 3. 
If this becomes an issue, we can solve this problem when the time comes. It is always easier to relax the rules than to make them stricter. Yet, I still don't see the problem. You can already write

assert unicodedata.category(chr(0x50000)) == 'Cn'

in your code and this will blow up in any future version that will use UCD with U+50000 assigned. You can think of "\N{-NNNN}" as syntactic sugar for "\uNNNN" followed by an assert. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Fri Aug 2 18:49:21 2013 From: rymg19 at gmail.com (Ryan) Date: Fri, 02 Aug 2013 11:49:21 -0500 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <20130802180048.43ca437e@pitrou.net> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> Message-ID: It does feel like Perl, but what if there was a keyword after the symbol? It'd be more readable and not Perl-ish, but it wouldn't confuse the parser (or at least I wouldn't think it would). Antoine Pitrou wrote: >Le Sat, 3 Aug 2013 01:46:37 +1000, >Nick Coghlan a >écrit : >> >> > In this context, you usually have one or >> > two callbacks (two in Twisted-style programming: one for success, >> > one for failure), passed positionally to a consuming function: >> > >> > loop.create_connection((host, port), @cb, @eb) where: >> > def cb(sock): >> > # Do something with socket >> > def eb(exc): >> > logging.exception( >> > "Failed connecting to %s:%s", host, port) >> >> We can't use 'where' because we know it conflicts with the SQL sense >> of the term in too many APIs. We're reasonably sure we can get away >> with "given" without too much conflict, though. >How about reusing "with"? There's no ambiguity with context managers >since the syntactic context is different.
> >> Using "@" as the marker character is also problematic, since the >> following degenerate case will probably confuse the parser (due to it >> looking too much like a decorator clause): >> >> @something() given: >> ... > >No, that would simply be forbidden. In this proposal, "@" can only mark >names of parameters in function calls. We already reuse "*" and "**" >for a specific meaning in front of function call parameters, so there's >a precedent for such polysemy. > >> I liked the notion of "?" as suggesting doubt and uncertainty - an >> element of "leave this undefined for now, we'll fill it in later". > >I don't really like it :-) "?" has other meanings traditionally: as >part >of the ternary operator in C-like languages (many of them), as a >wildcard character in pattern matching languages, as a marker of >optional matchers in regular expressions. > >Also, I really don't like the idea that "?" represents a full-blown >object with attribute access capabilities and whatnot. It smells too >much like Perl-style (Ruby-style?) magic variables. My proposal is more >limited: it's a syntactic addition, but it doesn't create new runtime >objects or types. > >Regards > >Antoine. > > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Fri Aug 2 18:49:59 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 3 Aug 2013 02:49:59 +1000 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> Message-ID: On 3 August 2013 02:41, Michael Walter wrote: > On Fri, Aug 2, 2013 at 6:25 PM, Andrew Barnert wrote: >> >> On Aug 2, 2013, at 9:00, Antoine Pitrou wrote: >> >> >> Using "@" as the marker character is also problematic, since the >> >> following degenerate case will probably confuse the parser (due to it >> >> looking too much like a decorator clause): >> >> >> >> @something() given: >> >> ... >> > >> > No, that would simply be forbidden. In this proposal, "@" can only mark >> > names of parameters in function calls. We already reuse "*" and "**" >> > for a specific meaning in front of function call parameters, so there's >> > a precedent for such polysemy. >> >> That's fine if callbacks are the _only_ case you want to handle, but as >> Nick just explained, there are many other cases that are also useful. The >> middle of an if expression in a comprehension, for example, isn't a function >> parameter. >> >> Also, when you have a long function call expression--as you almost always >> do in, say, PyObjC or PyWin32 GUIs--you often want to put each parameter on >> its own line. While that won't confuse the parser, it could easily confuse a >> human, who will see "@callback," on a line by itself and think "decorator". >> >> It's probably worth taking some real examples from a bunch of different >> domains where you've defined something out-of-line but would use this >> proposal if you could, and rewriting them with each variation to see what >> they look like. 
> > > Is the reason to mark up the variables that are defined in the > with/where/given block only a technical one (to support the parser)? It's more than just the parser that needs that extra help to make it implementable :) However, the syntactic marker is helpful for human readers, too - it makes it clear which names come from the statement local namespace, and which are just ordinary name references. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Fri Aug 2 18:57:19 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 09:57:19 -0700 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375448025.28109.5021267.0C96584B@webmail.messagingengine.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1375448025.28109.5021267.0C96584B@webmail.messagingengine.com> Message-ID: <539994AC-2A07-402B-A759-09B90587A79C@yahoo.com> On Aug 2, 2013, at 5:53, random832 at fastmail.us wrote: > On Fri, Aug 2, 2013, at 0:47, Andrew Barnert wrote: >> The quasi-standardized core of conio is definitely not enough to write a >> windowing library. Although some of the old DOS implementations had >> gotoxy, setfg, and setbg functions, the Win32 implementations don't have >> those; they just have? well, the same functions as MSVCRT's console APIs, >> which the stdlib already wraps > > Yes, but it wraps them in msvcrt; I'm proposing moving it to a > cross-platform "conio" module. My suggestion is to leave them there, and also leave termios and tty there, and just have consoleio use them as appropriate (in the same way tty already uses termios). > > I had for some reason thought there was a gotoxy function in there. > Regardless, that's no reason not to add one to the python library. Sure there is. It's harder to implement, and will be less portable. 
Even just getting the screen width is tricky without curses. And more importantly, a gotoxy function can only work after you've taken over the whole terminal in the same way curses does. There are a lot of things you may want to do--from getwch to setting colors--that don't require that. So, a module that didn't let you getwch unless you enter a curses-like mode would be less useful. I think a simple consoleio module that just does nonblocking I/O is a useful thing. A separate module that does non-full-screen formatting (and there are dozens of these on PyPI) makes a nice complement. (Especially since there are good use cases for using that _without_ a terminal--e.g., creating ANSI art text files--but also of course many good use cases for doing both together.) A curses/conio wrapper for full-screen GUIs seems like an almost entirely separate thing, except for the fact that both would happen to use some of the same msvcrt calls on Windows. > As for kbhit, you could probably implement it on unix with a call to > select. If the tty file descriptor is ready for reading, then return > true. The one possible wrinkle is that getwch could block if an > incomplete multibyte character is read - something that cannot happen on > windows. There are other wrinkles. For example, on some posix platforms you also need to fcntl the fd into nonblocking mode. Meanwhile, multibyte characters are not actually a problem. At an actual console, if you type one, all of the bytes become ready at the same time, so you can getwch. On a serial line, that isn't true, but it isn't true on Windows either, so you have to loop around kbhit and getch and decode manually. Keys that trigger escape sequences are likewise the same. 
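Roughly, the select()-based kbhit being discussed would look something like this (POSIX-only sketch; it assumes the fd is already in raw or cbreak mode, and the name kbhit is just borrowed from conio):

```python
import select

def kbhit(fd, timeout=0.0):
    # Return True if at least one byte can be read from fd without
    # blocking. Sketch only: it says a byte is ready, not that a
    # complete (multibyte) character is.
    ready, _, _ = select.select([fd], [], [], timeout)
    return bool(ready)
```

As noted above, a subsequent getwch-style read can still block on an incomplete multibyte character even when this returns True.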
Meanwhile, multi-keystroke sequences don't trigger kbhit until the last keystroke (and then, of course, they may trigger multiple characters), but again that's already true on Windows (although possibly more noticeable on posix, especially on OS X, where users typing option-e followed by e to get é is part of many users' unconscious muscle memory). There are also problems that are Windows specific that already affect msvcrt and fancier implementations that I haven't made any attempt to deal with, like the fact that getwch can return half a surrogate pair. See the caveats in my readme, and the todo file, for everything I've discovered so far. And please experiment with the code and find all the problems I haven't discovered, because I'm sure there are plenty. From abarnert at yahoo.com Fri Aug 2 19:11:32 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 10:11:32 -0700 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375449318.1941.5022851.37DC1F3C@webmail.messagingengine.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1375423405.3108.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1375449318.1941.5022851.37DC1F3C@webmail.messagingengine.com> Message-ID: <7B1B9775-B486-4F3E-B8D5-AAC262FF3E0F@yahoo.com> On Aug 2, 2013, at 6:15, random832 at fastmail.us wrote: > On Fri, Aug 2, 2013, at 2:03, Andrew Barnert wrote: >> I went with the idea of only allowing consoleio functions inside an >> enabling() context (or explicit enable() and disable() calls) instead of >> switching on the fly. It makes kbhit easier to implement and to use, and >> it's generally simpler, cleaner, and more efficient, and I don't think >> anyone will complain too much.
>> >> Anyway, if something like this were added to the stdlib, it definitely >> wouldn't allow us to deprecate tty or termios (especially since it uses >> them--but even if it didn't, sometimes you need more flexibility), but it >> would allow us to add a note at the top saying "If you're looking >> for simple, more-portable raw I/O, see the consoleio module." > I don't think deprecating termios was ever on the table. As for > "sometimes you need more flexibility" - as I understood it, the problem > is that tty occupies an intermediate stage of flexibility/complexity - > it's unlikely that you need more than consoleio without needing termios. > (or curses, if we add screen manipulation functions) What about cbreak mode? It's a useful thing, there's no obvious way to fit it into the conio-style paradigm, and tty wraps it up for you. The fact that you also need termios to stash and restore the terminal settings if you want to leave cbreak mode is a problem, but that might be best handled by adding a simple wrapper for that to tty, rather than trying to add cbreak mode to consoleio. > I really do think this should also include a clrscr/gotoxy As I said in my last email, that implies that we need either curses or a whole lot of code rather than just termios, and more importantly that nobody can use consoleio without going into a curses full-screen mode. We could of course have two different modes that you can enable (just raw I/O vs. curses full screen), where the functionality that's common to both has the same names both ways, which you suggest later. But I'm wary about that, because getwch, kbhit, etc. based on curses will have many subtle differences from implementations based on select and raw mode.
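The simple tty wrapper I mentioned might look something like this (a sketch; the name cbreak and the context-manager shape are illustrative, not an existing tty API):

```python
import contextlib
import sys
import termios
import tty

@contextlib.contextmanager
def cbreak(fd=None):
    # Enter cbreak mode, restoring the saved termios settings on exit.
    # Sketch only: error handling for non-tty fds is omitted.
    if fd is None:
        fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)
        yield fd
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```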
> (and > attribute functions - you mentioned setbg/setfg in DOS versions, but > conio.sourceforge.net has textcolor/textbackground/textattr instead) This is one of the traditional problems with conio--there were various different libraries for it, some following Microsoft's names, some Borland's. (Borland also had some extended functionality that got compiled and linked into your code if you used it whose license was never totally clear, and there were various clones of that functionality. I think the library you're looking at on PyPI is a wrapper around such code.) At any rate, formatted output is a pretty different problem from raw I/O, and I think it should be solved separately--especially since there are already so many mature solutions out there (some of them including functionality we'd never dream of adding to the stdlib, like dithering jpg files to ASCII art). And as I said in the other email, there are good use cases for wanting one without the other. The fact that they both happened to be implemented with related APIs in DOS, and that there are emulators for those APIs for Windows, doesn't mean they have to go together in the stdlib. From steve at pearwood.info Fri Aug 2 19:45:19 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 03 Aug 2013 03:45:19 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python Message-ID: <51FBF02F.1000202@pearwood.info> I have raised an issue on the tracker to add a statistics module to Python's standard library: http://bugs.python.org/issue18606 and have been asked to write a PEP. Attached is my draft PEP. Feedback is requested, thanks in advance. 
-- Steven -------------- next part -------------- PEP: xxx Title: Adding A Statistics Module To The Standard Library Version: $Revision$ Last-Modified: $Date$ Author: Steven D'Aprano Status: Draft Type: Standards Track Content-Type: text/plain Created: 01-Aug-2013 Python-Version: 3.4 Post-History: Abstract This PEP proposes the addition of a module for common statistics functions such as mean, median, variance and standard deviation to the Python standard library. Rationale The proposed statistics module is motivated by the "batteries included" philosophy towards the Python standard library. Statistical functions such as mean, standard deviation and others are obvious and useful batteries, familiar to any Secondary School student. Even cheap scientific calculators typically include multiple statistical functions, such as: - mean - population and sample variance - population and sample standard deviation - linear regression - correlation coefficient Graphing calculators aimed at Secondary School students typically include all of the above, plus some or all of: - median - mode - functions for calculating the probability of random variables from the normal, t, chi-squared, and F distributions - inference on the mean and others[1]. Likewise spreadsheet applications such as Microsoft Excel, LibreOffice and Gnumeric include rich collections of statistical functions[2]. In contrast, Python currently has no standard way to calculate even the simplest and most obvious statistical functions such as mean. For those who need statistical functions in Python, there are two obvious solutions: - install numpy and/or scipy[3]; - or use a Do It Yourself solution. Numpy is perhaps the most full-featured solution, but it has a few disadvantages: - It may be overkill for many purposes. The documentation for numpy even warns "It can be hard to know what functions are available in numpy. 
This is not a complete list, but it does cover most of them."[4] and then goes on to list over 270 functions, only a small number of which are related to statistics. - Numpy is aimed at those doing heavy numerical work, and may be intimidating to those who don't have a background in computational mathematics and computer science. For example, numpy.mean takes four arguments:

    mean(a, axis=None, dtype=None, out=None)

although fortunately for the beginner or casual numpy user, three are optional and numpy.mean does the right thing in simple cases:

    >>> numpy.mean([1, 2, 3, 4])
    2.5

- For many people, installing numpy may be difficult or impossible. For example, people in corporate environments may have to go through a difficult, time-consuming process before being permitted to install third-party software. For the casual Python user, having to learn about installing third-party packages in order to average a list of numbers is unfortunate. This leads to option number 2, DIY statistics functions. At first glance, this appears to be an attractive option, due to the apparent simplicity of common statistical functions. For example:

    def mean(data):
        return sum(data)/len(data)

    def variance(data):
        # Use the Computational Formula for Variance.
        n = len(data)
        ss = sum(x**2 for x in data) - (sum(data)**2)/n
        return ss/(n-1)

    def standard_deviation(data):
        return math.sqrt(variance(data))

The above appears to be correct with a casual test:

    >>> data = [1, 2, 4, 5, 8]
    >>> variance(data)
    7.5

But adding a constant to every data point should not change the variance:

    >>> data = [x+1e12 for x in data]
    >>> variance(data)
    0.0

And variance should *never* be negative:

    >>> variance(data*100)
    -1239429440.1282566

By contrast, the proposed reference implementation gets the exactly correct answer 7.5 for the first two examples, and a reasonably close answer for the third: 6.012. numpy does no better[5].
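For comparison, even a simple two-pass computation avoids both failures; the following is only an illustrative sketch, not the proposed reference implementation:

```python
import math

def variance(data):
    # Two-pass sample variance: subtract the mean first, so the
    # squared terms stay small and catastrophic cancellation is
    # avoided. Sketch only.
    data = list(data)
    n = len(data)
    mean = math.fsum(data) / n
    ss = math.fsum((x - mean) ** 2 for x in data)
    return ss / (n - 1)

data = [1, 2, 4, 5, 8]
print(variance(data))                      # 7.5
print(variance([x + 1e12 for x in data]))  # 7.5 -- the shift cancels
```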
Even simple statistical calculations contain traps for the unwary, starting with the Computational Formula itself. Despite the name, it is numerically unstable and can be extremely inaccurate, as can be seen above. It is completely unsuitable for computation by computer[6]. This problem plagues users of many programming languages, not just Python[7], as coders reinvent the same numerically inaccurate code over and over again[8], or advise others to do so[9]. It isn't just the variance and standard deviation. Even the mean is not quite as straight-forward as it might appear. The above implementation seems to simple to have problems, but it does: - The built-in sum can lose accuracy when dealing with floats of wildly differing magnitude. Consequently, the above naive mean fails this "torture test" with an error of 100%:

    assert mean([1e30, 1, 3, -1e30]) == 1

- Using math.fsum inside mean will make it more accurate with float data, but it also has the side-effect of converting any arguments to float even when unnecessary. E.g. we should expect the mean of a list of Fractions to be a Fraction, not a float. While the above mean implementation does not fail quite as catastrophically as the naive variance does, a standard library function can do much better than the DIY versions. Comparison To Other Languages/Packages The proposed statistics library is not intended to be a competitor to such third-party libraries as numpy/scipy, or to proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators. Most programming languages have little or no built-in support for statistics functions. Some exceptions: R R (and its proprietary cousin, S) is a programming language designed for statistics work. It is extremely popular with statisticians and is extremely feature-rich[10]. C# The C# LINQ package includes extension methods to calculate the average of enumerables[11].
Ruby Ruby does not ship with a standard statistics module, despite some apparent demand[12]. Statsample appears to be a feature-rich third-party library, aiming to compete with R[13]. PHP PHP has an extremely feature-rich (although mostly undocumented) set of advanced statistical functions[14]. Delphi Delphi includes standard statistical functions including Mean, Sum, Variance, TotalVariance, MomentSkewKurtosis in its Math library[15]. GNU Scientific Library The GNU Scientific Library includes standard statistical functions, percentiles, median and others[16]. One innovation I have borrowed from the GSL is to allow the caller to optionally specify the pre-calculated mean of the sample (or an a priori known population mean) when calculating the variance and standard deviation[17]. Design Decisions Of The Module In the statistics module, I have aimed for the following design features: - Correctness over speed. It is easier to speed up a correct but slow function than to correct a fast but buggy one. - The caller should, where possible, avoid having to make decisions about implementations. The caller should not have to decide whether to call a one-pass function for data in an iterator, or a two-pass function for data in a list. Instead, the function should do the right thing in each case. - Where there is the possibility of different results depending on one-pass or two-pass algorithms, that possibility should be documented and the caller can then make the appropriate choice. (In most cases, such differences will be tiny.) - Functions should, as much as possible, honour any type of numeric data. E.g. the mean of a list of Decimals should be a Decimal, not a float. When this is not possible, treat float as the "lowest common data type". - Although functions support data sets of floats, Decimals or Fractions, there is no guarantee that *mixed* data sets will be supported. (But on the other hand, they aren't explicitly rejected either.)
- Plenty of documentation, aimed at readers who understand the basic concepts but may not know (for example) which variance they should use (population or sample?). Mathematicians and statisticians have a terrible habit of being inconsistent with both notation and terminology[18], and having spent many hours making sense of the contradictory/confusing definitions in use, it is only fair that I do my best to clarify rather than obfuscate the topic. - But avoid going into tedious[19] mathematical detail. Specification As the proposed reference implementation is in pure Python, other Python implementations can easily make use of the module unchanged, or adapt it as they see fit. Previous Discussions This proposal has been previously discussed here[20]. Open Issues My intention is to start small and grow the library, rather than try to include everything from the start. - At this stage, I am unsure of the best API for multivariate statistical functions such as linear regression, correlation coefficient, and covariance. Possible APIs include: * Separate arguments for x and y data: function([x0, x1, ...], [y0, y1, ...]) * A single argument for (x, y) data: function([(x0, y0), (x1, y1), ...]) * Selecting arbitrary columns from a 2D array: function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2) * Some combination of two or more of the above. In the absence of a consensus of preferred API for multivariate stats, I will defer including such multivariate functions until Python 3.5. - Likewise, functions for calculating probability of random variables and inference testing will be deferred until 3.5. 
References [1] http://support.casio.com/pdf/004/CP330PLUSver310_Soft_E.pdf [2] Gnumeric: https://projects.gnome.org/gnumeric/functions.shtml LibreOffice: https://help.libreoffice.org/Calc/Statistical_Functions_Part_One https://help.libreoffice.org/Calc/Statistical_Functions_Part_Two https://help.libreoffice.org/Calc/Statistical_Functions_Part_Three https://help.libreoffice.org/Calc/Statistical_Functions_Part_Four https://help.libreoffice.org/Calc/Statistical_Functions_Part_Five [3] Scipy: http://scipy-central.org/ Numpy: http://www.numpy.org/ [4] http://wiki.scipy.org/Numpy_Functions_by_Category [5] Tested with numpy 1.6.1 and Python 2.7. [6] http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ [7] http://rosettacode.org/wiki/Standard_deviation [8] https://bitbucket.org/larsyencken/simplestats/src/c42e048a6625/src/basic.py [9] http://stackoverflow.com/questions/2341340/calculate-mean-and-variance-with-one-iteration [10] http://www.r-project.org/ [11] http://msdn.microsoft.com/en-us/library/system.linq.enumerable.average.aspx [12] https://www.bcg.wisc.edu/webteam/support/ruby/standard_deviation [13] http://ruby-statsample.rubyforge.org/ [14] http://www.php.net/manual/en/ref.stats.php [15] http://www.ayton.id.au/gary/it/Delphi/D_maths.htm#Delphi%20Statistical%20functions. [16] http://www.gnu.org/software/gsl/manual/html_node/Statistics.html [17] http://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html [18] http://mathworld.wolfram.com/Skewness.html [19] At least, tedious to those who don't like this sort of thing. [20] http://mail.python.org/pipermail/python-ideas/2011-September/011524.html Copyright This document has been placed in the public domain. 
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From solipsis at pitrou.net Fri Aug 2 19:46:24 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Aug 2013 19:46:24 +0200 Subject: [Python-ideas] Enhance definition of functions References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> Message-ID: <20130802194624.3166b0a6@fsol> On Fri, 2 Aug 2013 09:25:05 -0700 Andrew Barnert wrote: > On Aug 2, 2013, at 9:00, Antoine Pitrou wrote: > > >> Using "@" as the marker character is also problematic, since the > >> following degenerate case will probably confuse the parser (due to it > >> looking too much like a decorator clause): > >> > >> @something() given: > >> ... > > > > No, that would simply be forbidden. In this proposal, "@" can only mark > > names of parameters in function calls. We already reuse "*" and "**" > > for a specific meaning in front of function call parameters, so there's > > a precedent for such polysemy. > > That's fine if callbacks are the _only_ case you want to handle, but as Nick just explained, there are many other cases that are also useful. The middle of an if expression in a comprehension, for example, isn't a function parameter. There may be many other cases, but almost any time people complain about the lack of inline functions, it's in situations where they are declaring callbacks (i.e. GUI- or network-programming). So, yes, I don't think the other use cases should be a primary point of concern. > Also, when you have a long function call expression--as you almost always > do in, say, PyObjC or PyWin32 GUIs--you often want to put each > parameter on its own line. 
While that won't confuse the parser, it > could easily confuse a human, who will see "@callback," on a line by > itself and think "decorator". Even if indented, inside a parenthesis and not preceding a similarly indented "def"? Regards Antoine. From random832 at fastmail.us Fri Aug 2 19:56:14 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 02 Aug 2013 13:56:14 -0400 Subject: [Python-ideas] get(w)ch, non-letter keys, and cross-platform non-blocking input. Message-ID: <1375466174.21273.5116415.56F56E76@webmail.messagingengine.com> The current behavior of getch and getwch on windows on receiving e.g. the "up arrow" key is to return two values on subsequent calls: 0xE0 0x48. The problem, other than the obvious of being split across two events, is that this cannot be distinguished between ordinary input of the character 0xE0. For getwch, this is U+00E0 LATIN SMALL LETTER A WITH GRAVE (followed by 0x48 'H'). For getch, ordinary values are returned in the DOS character set (as defined with the chcp command), in which 0xE0 is various characters such as a greek alpha in cp437, a capital O with acute in cp850, or a greek omega in cp737. This additionally makes getch unusable as-is for non-ascii characters. The obvious solution for windows is to write an entirely new function that calls ReadConsoleInput and returns a "keypress event" object or tuple instead of a single character. On Unix, there's a different problem. The fact that text input is byte-oriented means multiple bytes need to be read (necessitating multiple read calls for unbuffered input) for a multibyte character, or for an escape sequence for a non-graphical key. And if you're doing non-blocking input, you would want a timeout in case the final byte of the sequence never arrives. You may want a timeout anyway, to handle manual input of ESC differently from the start of an escape sequence. And handling escape sequences at all introduces a dependency on terminfo. 
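A rough sketch of the timeout part on Unix, using select (the name read_key is illustrative; a real version would also decode multibyte characters and consult terminfo):

```python
import os
import select

ESC = b'\x1b'

def read_key(fd, esc_timeout=0.05):
    # Read one byte; if it is ESC, keep reading while more bytes
    # arrive within esc_timeout, so a bare Escape press can be told
    # apart from an escape sequence. Assumes fd is already in raw
    # mode. Sketch only: esc_timeout is an arbitrary choice.
    ch = os.read(fd, 1)
    if ch != ESC:
        return ch
    seq = ch
    while select.select([fd], [], [], esc_timeout)[0]:
        seq += os.read(fd, 1)
    return seq
```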
Or alternately, people may want a lighter solution like a dictionary of escape sequences to meanings - or a heavier one like parsing the sequences generated by xterm's modifyOtherKeys feature for combinations not supported by terminfo [such as ctrl+shift+letter]. From ethan at stoneleaf.us Fri Aug 2 20:14:08 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 02 Aug 2013 11:14:08 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: <51FBF6F0.1030305@stoneleaf.us> On 08/02/2013 10:45 AM, Steven D'Aprano wrote: > It isn't just the variance and standard deviation. Even the mean is not > quite as straight-forward as it might appear. The above implementation > seems to simple to have problems, but it does: "seems too simple" (two o's are needed ;) Looks good! Thanks for the work! -- ~Ethan~ From brian at python.org Fri Aug 2 20:37:58 2013 From: brian at python.org (Brian Curtin) Date: Fri, 2 Aug 2013 13:37:58 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: On Fri, Aug 2, 2013 at 12:45 PM, Steven D'Aprano wrote: > I have raised an issue on the tracker to add a statistics module to Python's > standard library: > > http://bugs.python.org/issue18606 > > and have been asked to write a PEP. Attached is my draft PEP. Feedback is > requested, thanks in advance. Like everything else we add, shouldn't a module live in the Python ecosystem, stand out as the best of breed, and *then* be proposed for inclusion?
From michelelacchia at gmail.com Fri Aug 2 20:42:04 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Fri, 2 Aug 2013 20:42:04 +0200 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF6F0.1030305@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <51FBF6F0.1030305@stoneleaf.us> Message-ID: Wow, this is great news! FWIW when I was first learning Python I wrote a Python module with many statistical functions as an exercise: https://github.com/rubik/pyst/blob/master/pyst/pyst.py I should point out that this was no more than an exercise, so it has been unmaintained for a long time. However it may still be useful to someone and I thought I would mention it here. It also doesn't guard against some gotchas, like the one regarding math.fsum Steven mentioned. On 02 Aug 2013 20:15, "Ethan Furman" wrote: > On 08/02/2013 10:45 AM, Steven D'Aprano wrote: > > It isn't just the variance and standard deviation. Even the mean is not >> quite as straight-forward as it might appear. The above >> implementation >> seems to simple to have problems, but it does: >> > > "seems too simple" (two o's are needed ;) > > Looks good! Thanks for the work! > > -- > ~Ethan~ > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Fri Aug 2 20:53:19 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Fri, 2 Aug 2013 20:53:19 +0200 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: I should also add that the implementation in the above-mentioned module is very naive. As for Steven's implementation I think it's very accurate. I have one question though.
Why is there a class for 'median' with various methods, but not one for 'stdev' and 'variance' with maybe two methods, 'population' and 'sample'? In those cases there is a problem with what __new__ should return. It could either raise an exception or choose a default implementation. I'm asking just for the sake of consistency. Thanks, Michele On 2 Aug 2013 19:48, "Steven D'Aprano" wrote: > I have raised an issue on the tracker to add a statistics module to > Python's standard library: > > http://bugs.python.org/issue18606 > > and have been asked to write a PEP. Attached is my draft PEP. Feedback is > requested, thanks in advance. > > > > -- > Steven > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From amcnabb at mcnabbs.org Fri Aug 2 22:18:48 2013 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Fri, 2 Aug 2013 15:18:48 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <20130802201848.GC20955@mcnabbs.org> On Fri, Aug 02, 2013 at 01:37:58PM -0500, Brian Curtin wrote: > > Like everything else we add, shouldn't a module live in the Python > ecosystem, standout as the best of breed, and *then* be proposed for > inclusion? As Steven pointed out, numpy/scipy are best of breed in the Python ecosystem, but they're too "advanced" for inclusion in the standard library. There's room for a standard implementation, but the module wouldn't be complex enough to require years of development outside the standard library.
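[One shape the consistency Michele asks about could take -- purely illustrative, not Steven's actual draft -- is a single `variance` function with a `population` flag rather than a class with methods:]

```python
import math

def mean(data):
    data = list(data)
    return math.fsum(data) / len(data)

def variance(data, population=False):
    # Sum of squared deviations, divided by n for the population
    # variance or by n - 1 for the (unbiased) sample variance.
    data = list(data)
    m = mean(data)
    ss = math.fsum((x - m) ** 2 for x in data)
    n = len(data)
    return ss / n if population else ss / (n - 1)

print(variance([1, 2, 3, 4]))                   # sample: 5/3
print(variance([1, 2, 3, 4], population=True))  # population: 1.25
```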
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From random832 at fastmail.us Fri Aug 2 22:39:16 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 02 Aug 2013 16:39:16 -0400 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375473944.25641.5158051.75C90860@webmail.messagingengine.com> References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com> <1375388332.27282.4765835.58C6D730@webmail.messagingengine.com> <1375418844.67829.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1375448025.28109.5021267.0C96584B@webmail.messagingengine.com> <539994AC-2A07-402B-A759-09B90587A79C@yahoo.com> <1375473944.25641.5158051.75C90860@webmail.messagingengine.com> Message-ID: <1375475956.1850.5179375.0ED618BE@webmail.messagingengine.com> I forgot to hit "reply all". Omitted some of this that was in reply to something off-list, just in case that wasn't an oversight (it was boring stuff about having multiple consoles open anyway) On Fri, Aug 2, 2013, at 16:05, random832 at fastmail.us wrote: > > > On Fri, Aug 2, 2013, at 12:57, Andrew Barnert wrote: > > Sure there is. It's harder to implement, and will be less portable. Even > > just getting the screen width is tricky without curses. > > Why would it be less portable? Right now, we have curses, which only > works on unix. Implementing a simple set of functions for both unix and > windows is more portable. > > I am thinking in terms of "implement core functionality for multiple > platforms, then implement anything more complex on top of that in pure > python" - this necessarily means duplicating some of the work that > curses does now for unix systems. What's wrong with this approach? > > > And more importantly, a gotoxy function can only work after you've taken > > over the whole terminal in the same way curses does. > > Uh, all you have to do for that is clear the screen... 
and clrscr() was > going to be the next function I was going to propose > > There are a lot of > > things you may want to do--from getwch to setting colors--that don't > > require that. So, a module that didn't let you getwch unless you enter a > > curses-like mode would be less useful. > > > > I think a simple consoleio module that just does nonblocking I/O is a > > useful thing. A separate module that does non-full-screen formatting (and > > there are dozens of these on PyPI) makes a nice complement. (Especially > > since there are good use cases for using that _without_ a terminal--e.g., > > creating ANSI art text files--but also of course many good use cases for > > doing both together.) A curses/conio wrapper for full-screen GUIs seems > > like an almost entirely separate thing, except for the fact that both > > would happen to use some of the same msvcrt calls on Windows. > > > As for kbhit, you could probably implement it on unix with a call to > > > select. If the tty file descriptor is ready for reading, then return > > > true. The one possible wrinkle is that getwch could block if an > > > incomplete multibyte character is read - something that cannot happen on > > > windows. > > > > There are other wrinkles. For example, on some posix platforms you also > > need to fcntl the fd into nonblocking mode. > > What platforms are those? I thought the whole POINT of select was that > the file descriptor doesn't have to be in nonblocking mode, since > otherwise you could just attempt to read and have it return > EWOULDBLOCK/EAGAIN. > > > Meanwhile, multibyte characters are not actually a problem. At an actual > > console, if you type one, all of the bytes become ready at the same time, > > so you can getwch. > > Yes, the problem is if you type a single character in a non-UTF8 > character set that python _thinks_ is the first byte of a UTF8 > character. 
This is really more of an issue for escape sequences than > multibyte characters, since in the multibyte case you could just say > it's a misconfiguration so it's only natural that it leads to bad > behavior. > > > On a serial line, that isn't true, but it isn't true > > on Windows either, > > How is it not true on windows? The physical input on windows is unicode > characters; any translation to multibyte characters happens within > getch, getwch will never even _see_ anything that's not a whole unicode > codepoint. The only reason you get multiple values for arrow keys is > because getwch translates it _into_ multiple values from a lower-level > source that generates a single event (see my other post where I propose > bypassing this) > > > There are also problems that are Windows specific that already affect > > msvcrt and fancier implementations that I haven't made any attempt to > > deal with, like the fact that getwch can return half a surrogate pair. > > Surrogate pair support on the console is terrible in general. You might > get half a surrogate pair and never get the other half, because there is > no part of the data path that actually deals with whole code points [so > it's possible the pair never existed] > > > See the caveats in my readme, and the todo file, for everything I've > > discovered so far. And please experiment with the code and find all the > > problems I haven't discovered, because I'm sure there are plenty. > > The way I see it, there are five possible types of keyboard events: > > A unicode character is typed (if multiple characters are typed, this can > be multiple events, even if it came from one key) - you still might want > additional info to differentiate ctrl-h from ctrl-shift-h or backspace. > Obviously, this is a normal event that you should be able to read and > kbhit should return true. > > An "action" key is typed, e.g. arrows, home, end, etc. You almost always > want to be able to read this and it should trigger kbhit. 
> > A modifier key or dead key is pressed. Generally, you don't want to read > this or trigger kbhit, and on unix systems it is impossible to do so. > Escape sequence on unix. > > A key is released. Same as above, you don't want this event, and it's > not possible on unix, barring some seriously esoteric xterm feature I'm > not aware of. > > Mouse events. On either unix or windows, this is part of the same > "stream" as keyboard events, and you only get it if you ask for it. > > > What about cbreak mode? It's a useful thing, there's no obvious way to fit it into the conio-style paradigm, and tty wraps it up for you. > > I'd think cbreak mode basically consists of calling getche() all the > time. What's the difference, other than that? > > Windows acts weird if you mix line buffering and getch, by the way, > which we may have to simply tolerate: type a line that's 10 characters, > read 5, call getch, then read 5 more, and the second read will actually > get the next 5 characters from the first line, even with no userspace > buffering [directly calling os.read; haven't tried kernel32.ReadFile or > ReadConsole yet - it's _possible_ there's a buffer we can flush > somewhere]. > > >As I said in my last email, that implies that we need either curses or a whole lot of code rather than just termios, and more importantly that nobody can use consoleio without going into a curses full-screen mode. > > > > We could of course have two different modes that you can enable (just raw I/O vs. curses full screen), where the functionality that's common to both has the same names both ways, which you suggest later. But I'm wary about that, because getwch, kbhit, etc. based on curses will have many subtle differences from implementations based on select and raw mode. > > I still don't understand your objection. 
People would be able to use the > rest of consoleio all the time, just without using those functions (one > of which, clrscr, basically _is_ "going into a curses full-screen mode" > - i'm not sure what else you think going into a full-screen mode > consists of) Input and output are basically completely independent > (other than echoing), anyway, why would they have to use an output > function to be able to use an input function? Why would being in full > screen output mode have any effect on getwch or kbhit? Unless you're > proposing using the _actual_ curses input functions, which I never so > much as breathed a word of. -- Random832 From ron3200 at gmail.com Sat Aug 3 00:04:43 2013 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 02 Aug 2013 17:04:43 -0500 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On 08/02/2013 07:48 AM, Nick Coghlan wrote: > On 1 August 2013 09:17, Andrew Barnert wrote: >> But anyway, I think you're mostly agreeing with me. When neither lambda nor def feels right, it's usually not because you really want a multi-line expression, or a multi-line anonymous function, but because you want to get the petty details of the function "out of the way" of the important code, right? > > Yeah, I've been grappling with this problem for years, which is why I > have two competing deferred PEPs about it :) > > For certain kinds of problem, the natural way to think about them is > as "I want to do X", and as an incidental part of doing X, you need to > define a function that does Y. Sorting, complex comprehensions, > various flavours of event driven programming (especially GUI > programming, where the yield-driven approach of PEP 3156 may not be > appropriate, as well as the low level transport code for PEP 3156 > style systems). 
Lambda isn't a great solution because it embeds all > the complexity of the function directly in the main expression, > obscuring the fact that the overall operation is "do X". I'm not sure how much a function adds to complexity. Generally they simplify things by moving the complexity to within the function. But short one off functions are a different thing. In most cases they are in the following situations: 1. Reuse a small bock of code over and over again locally. 2. Capture a value now, to be used later with that code block later. 3. Be sent into or out of a context so it can be used with values that can't be accessed locally or presently. > Using up another precious symbol would be a big call, but it's > starting to feel more like something of sufficient power to justify > new syntax. My feelings is that this type of thing requires a lower level solution, rather than a higher level abstraction. All of the above cases could be handled nicely if we could define a code block without a signature and call it with separately defined dictionary. It might look something like... seq = [] for i in range(10): ns = dict(i=i) co = def: i seq.append(ns with co) assert [expr() for expr in seq] == list(range(10)) It could be shortened to... seq = [({'i':i} with def:i) for i in range(10)] assert [expr() for expr in seq] == list(range(10)) The 'def:..' expression could return a code object. Which by itself wouldn't be callable. To make it callable, you would need to combine it with a signature object, or a name space. (Some sanity checks could be made at that time.) A signature object would return a name space constructor. This part is the part we don't need in many cases. (*) So being able to use an already constructed dictionary (or a yet to be constructed dictionary) as a name space has some advantages. The 'with' keyword is used here to combine two objects together. At least one of them would need to know what to do with the other. 
In this case it's the dictionary object that knows how to take a code object and return a callable frame like object. Probably it has a __with_code__ method. It may be too general of a definition of 'with', but possibly that can also be a good thing. If it always results in a callable object, then it would have just enough consistency to be easy to figure out when you come across them later without being too restrictive. But this part is just an option as a dictionary could just have a method.. callable_frame = {'i':i}.callable_with(def: i) Cheers, Ron From haoyi.sg at gmail.com Sat Aug 3 00:20:43 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Sat, 3 Aug 2013 06:20:43 +0800 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <20130802194624.3166b0a6@fsol> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> <20130802194624.3166b0a6@fsol> Message-ID: PEP 3150 is actually pretty simple to implement using macros: >>> @in(map(_, [1, 2, 3])) ... def thing(x): ... return x + 1 ... >>> thing [2, 3, 4] The macro is about 5 lines long, since it's exactly the same as my 15-line quick lambda macro (i.e. f[_ + _] -> lambda a, b: a + b); this should have worked out of the box, requiring no additional code: @f[map(_, [1, 2, 3]] def thing(x): return x + 1 But it doesn't due to the parser not liking [] in decorators; boo for arbitrary syntactic restrictions =( =( =(. This only works for higher-order-functions with a single callback. As things stand now, the status-quo solution for multi-callback functions is to make a class and inherit from it, and fill in the methods you need to fill in. 
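[For comparison, the single-callback behaviour of the macro-based @in above can be approximated in plain Python with an ordinary decorator -- a sketch only, with `in_` chosen as the name because `in` is a keyword:]

```python
def in_(caller):
    # Immediately apply `caller` to the decorated function and bind
    # the *result* (not the function itself) to the function's name.
    def wrap(fn):
        return caller(fn)
    return wrap

@in_(lambda f: list(map(f, [1, 2, 3])))
def thing(x):
    return x + 1

# `thing` is now the mapped list, not a function.
print(thing)  # [2, 3, 4]
```

Unlike the macro version, the callback still needs a throwaway name and a lambda wrapper around the call site, which is exactly the noise the macro removes.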
There wasn't any PEP for this but that's what people are doing all over the place: many many classes exist purely to let people fill in the missing functions. Not because the class objects have any state, or you ever intend to create objects which will live for more than one function call: class MyClass(BaseRequestClass): def print_data(d): print d def print_error(failure): sys.sys.stderr.write(str(failure)) result = MyClass().make_request() # never gonna use MyClass() ever again! Java uses this pattern too, and I do not like it. One possibility, though, would just codify/streamline the status quo, via macros: @in(make_request(_, _)) class result: def print_data(d): print d def print_error(failure): sys.sys.stderr.write(str(failure)) Which would desugar (using macro-magic and metaclasses) into def print_data(d): print d def print_error(failure): sys.sys.stderr.write(str(failure)) result = make_request(print_data, print_error) Potential bikesheds are over whether to use `_` to leave holes, or `print_data` and `print_error` as named arguments, as well as whether `result` should be a class or function def. Overall, though, it looks exactly like PEP3150 except this public_name = ?.MeaningfulClassName(*params) given: class MeaningfulClassName(): ... becomes this: @in(_.MeaningfulClassName(*params)) class public_name: class MeaningfulClassName(): ... Which is pretty close. I can put up the implementations of all of these things if anyone's interested in playing around with the syntax/semantics in the REPL. -Haoyi On Sat, Aug 3, 2013 at 1:46 AM, Antoine Pitrou wrote: > On Fri, 2 Aug 2013 09:25:05 -0700 > Andrew Barnert wrote: > > On Aug 2, 2013, at 9:00, Antoine Pitrou wrote: > > > > >> Using "@" as the marker character is also problematic, since the > > >> following degenerate case will probably confuse the parser (due to it > > >> looking too much like a decorator clause): > > >> > > >> @something() given: > > >> ... > > > > > > No, that would simply be forbidden. 
In this proposal, "@" can only mark > > > names of parameters in function calls. We already reuse "*" and "**" > > > for a specific meaning in front of function call parameters, so there's > > > a precedent for such polysemy. > > > > That's fine if callbacks are the _only_ case you want to handle, but as > Nick just explained, there are many other cases that are also useful. The > middle of an if expression in a comprehension, for example, isn't a > function parameter. > > There may be many other cases, but almost any time people complain > about the lack of inline functions, it's in situations where they are > declaring callbacks (i.e. GUI- or network-programming). So, yes, I > don't think the other use cases should be a primary point of concern. > > > Also, when you have a long function call expression--as you almost always > > do in, say, PyObjC or PyWin32 GUIs--you often want to put each > > parameter on its own line. While that won't confuse the parser, it > > could easily confuse a human, who will see "@callback," on a line by > > itself and think "decorator". > > Even if indented, inside a parenthesis and not preceding a similarly > indented "def"? > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Sat Aug 3 00:52:02 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 15:52:02 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <20130802201848.GC20955@mcnabbs.org> References: <51FBF02F.1000202@pearwood.info> <20130802201848.GC20955@mcnabbs.org> Message-ID: <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> On Aug 2, 2013, at 13:18, Andrew McNabb wrote: > On Fri, Aug 02, 2013 at 01:37:58PM -0500, Brian Curtin wrote: >> >> Like everything else we add, shouldn't a module live in the Python >> ecosystem, standout as the best of breed, and *then* be proposed for >> inclusion? > > As Steven pointed out, numpy/scipy are best of breed in the Python > ecosystem, but they're too "advanced" for inclusion in the standard > library. There's room for a standard implementation, but the module > wouldn't be complex enough to require years of development outside the > standard library. Years of development, no. But a few months on PyPI (with people pointing to it from places like python-list and StackOverflow) would capture a lot wider experience and testing than just a discussion on this list. > Also, if it's reasonably possible to make the implementation work for 3.0-3.3 (or even 2.6-3.3) a PyPI module will remain useful as a quasi-official backport even after acceptance in the stdlib. So, I don't think short-circuiting the process is a good idea unless there's a really compelling reason to do so. From abarnert at yahoo.com Sat Aug 3 01:19:46 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 16:19:46 -0700 Subject: [Python-ideas] Fwd: Remove tty module References: <2EFB5020-47EB-451C-99C2-DD2E11B92D14@yahoo.com> Message-ID: Looks like I replied to Random's accidentally-offline reply instead of the right one. So, to amplify the confusion, I'll just forward it here. 
:) Sent from a random iPhone Begin forwarded message: > From: Andrew Barnert > Date: August 2, 2013, 16:18:36 PDT > To: "random832 at fastmail.us" > Subject: Re: [Python-ideas] Remove tty module > > On Aug 2, 2013, at 13:05, random832 at fastmail.us wrote: > >> >> >> On Fri, Aug 2, 2013, at 12:57, Andrew Barnert wrote: >>> Sure there is. It's harder to implement, and will be less portable. Even >>> just getting the screen width is tricky without curses. >> >> Why would it be less portable? Right now, we have curses, which only >> works on unix. > > Because curses isn't available on every platform termios is. And it isn't appropriate for every use case where termios is. (For the most dramatic case, getch makes sense on a serial line; gotoxy does not. But even on an actual terminal, there are many cases where you don't want to take over the terminal, break scrollback, etc.) > >> Implementing a simple set of functions for both unix and >> windows is more portable. > > Sure, but implementing an even simpler set of functions that works on a broader range of unix plus windows and can be used in a broader range of cases is even _more_ portable. > > And of course it's also simpler, meaning less coding, less debugging, less bikeshedding, etc. > >> I am thinking in terms of "implement core functionality for multiple >> platforms, then implement anything more complex on top of that in pure >> python" - this necessarily means duplicating some of the work that >> curses does now for unix systems. What's wrong with this approach? > > Well, duplicating the work of curses instead of just using curses may well be a mistake. > > But otherwise, there's nothing wrong with this; it's just a potentially (and, I think, desirably) separate idea from implementing basic raw terminal I/O. 
> > As I said before, I think we ultimately may want three packages: > > * a raw console I/O package > * a simple formatting (color) package > * a simple foundation for console GUIs > > The first two are nearly orthogonal, and could easily be built to work together when appropriate but also work separately when desired. Also, the second one already has dozens of existing implementations to choose from, and the first took me a few minutes to hack up a prototype. > > The third, meanwhile, is going to require a lot more work. It may make use of the first two always, only on Windows (using curses on Unix), or never, but an end user is rarely if ever going to use it in the same program as the others. > >>> And more importantly, a gotoxy function can only work after you've taken >>> over the whole terminal in the same way curses does. >> >> Uh, all you have to do for that is clear the screen... and clrscr() was >> going to be the next function I was going to propose >> >> There are a lot of >>> things you may want to do--from getwch to setting colors--that don't >>> require that. So, a module that didn't let you getwch unless you enter a >>> curses-like mode would be less useful. >>> >>> I think a simple consoleio module that just does nonblocking I/O is a >>> useful thing. A separate module that does non-full-screen formatting (and >>> there are dozens of these on PyPI) makes a nice complement. (Especially >>> since there are good use cases for using that _without_ a terminal--e.g., >>> creating ANSI art text files--but also of course many good use cases for >>> doing both together.) A curses/conio wrapper for full-screen GUIs seems >>> like an almost entirely separate thing, except for the fact that both >>> would happen to use some of the same msvcrt calls on Windows. >>>> As for kbhit, you could probably implement it on unix with a call to >>>> select. If the tty file descriptor is ready for reading, then return >>>> true. 
The one possible wrinkle is that getwch could block if an >>>> incomplete multibyte character is read - something that cannot happen on >>>> windows. >>> >>> There are other wrinkles. For example, on some posix platforms you also >>> need to fcntl the fd into nonblocking mode. >> >> What platforms are those? I thought the whole POINT of select was that >> the file descriptor doesn't have to be in nonblocking mode, since >> otherwise you could just attempt to read and have it return >> EWOULDBLOCK/EAGAIN. >> >>> Meanwhile, multibyte characters are not actually a problem. At an actual >>> console, if you type one, all of the bytes become ready at the same time, >>> so you can getwch. >> >> Yes, the problem is if you type a single character in a non-UTF8 >> character set that python _thinks_ is the first byte of a UTF8 >> character. > > Only if it thinks your terminal is UTF-8. Which it shouldn't. > >> This is really more of an issue for escape sequences than >> multibyte characters, since in the multibyte case you could just say >> it's a misconfiguration so it's only natural that it leads to bad >> behavior. > > Yes, escape sequences are a problem even with proper configuration. > > But again, this is a problem that conio-style code had always had, from the DOS days up to the current msvcrt implementation. > >>> On a serial line, that isn't true, but it isn't true >>> on Windows either, >> >> How is it not true on windows? The physical input on windows is unicode >> characters; any translation to multibyte characters happens within >> getch, getwch will never even _see_ anything that's not a whole unicode >> codepoint. The only reason you get multiple values for arrow keys is >> because getwch translates it _into_ multiple values from a lower-level >> source that generates a single event (see my other post where I propose >> bypassing this) > > But people writing conio-style code today are using either msvcrt or libraries like python-conio, where it _is_ a problem. 
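[The select()-based kbhit idea quoted above can be sketched for POSIX roughly as follows. This is an illustration only, not the proposed consoleio API; `getch` here uses cbreak mode via the stdlib tty module:]

```python
import os
import select
import sys
import termios
import tty

def kbhit(fd=None):
    """True if at least one byte is waiting to be read from `fd`."""
    if fd is None:
        fd = sys.stdin.fileno()
    # A zero timeout makes this a pure poll; select() does not
    # require the descriptor to be in non-blocking mode.
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)

def getch(fd=None):
    """Read one raw byte with the terminal in cbreak mode."""
    if fd is None:
        fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # line buffering off; ^C still delivers a signal
        return os.read(fd, 1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)
```

Note this returns bytes, not characters, so the multibyte/escape-sequence wrinkles discussed in this thread remain for the caller to handle.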
> >>> There are also problems that are Windows specific that already affect >>> msvcrt and fancier implementations that I haven't made any attempt to >>> deal with, like the fact that getwch can return half a surrogate pair. >> >> Surrogate pair support on the console is terrible in general. You might >> get half a surrogate pair and never get the other half, because there is >> no part of the data path that actually deals with whole code points [so >> it's possible the pair never existed] >> >>> See the caveats in my readme, and the todo file, for everything I've >>> discovered so far. And please experiment with the code and find all the >>> problems I haven't discovered, because I'm sure there are plenty. >> >> The way I see it, there are five possible types of keyboard events: >> >> A unicode character is typed (if multiple characters are typed, this can >> be multiple events, even if it came from one key) - you still might want >> additional info to differentiate ctrl-h from ctrl-shift-h or backspace. >> Obviously, this is a normal event that you should be able to read and >> kbhit should return true. >> >> An "action" key is typed, e.g. arrows, home, end, etc. You almost always >> want to be able to read this and it should trigger kbhit. >> >> A modifier key or dead key is pressed. Generally, you don't want to read >> this or trigger kbhit, and on unix systems it is impossible to do so. >> Escape sequence on unix. >> >> A key is released. Same as above, you don't want this event, and it's >> not possible on unix, barring some seriously esoteric xterm feature I'm >> not aware of. >> >> Mouse events. On either unix or windows, this is part of the same >> "stream" as keyboard events, and you only get it if you ask for it. >> >>> What about cbreak mode? It's a useful thing, there's no obvious way to fit it into the conio-style paradigm, and tty wraps it up for you. >> >> I'd think cbreak mode basically consists of calling getche() all the >> time. 
What's the difference, other than that? > > No, raw/cbreak/cooked is entirely orthogonal to echo/noecho. For example, in raw mode, ^C is just a character; in cbreak mode, as in cooked mode, it's a signal. Cbreak basically means to turn off line discipline, but leave on everything else. > >> Windows acts weird if you mix line buffering and getch, by the way, >> which we may have to simply tolerate: type a line that's 10 characters, >> read 5, call getch, then read 5 more, and the second read will actually >> get the next 5 characters from the first line, even with no userspace >> buffering [directly calling os.read; haven't tried kernel32.ReadFile or >> ReadConsole yet - it's _possible_ there's a buffer we can flush >> somewhere]. > > This is a big part of the reason I decided to require enable/disable pairs and just say that normal stdin/out is illegal (but not necessarily checked) inside an enable and consoleio illegal outside of one. > >>> As I said in my last email, that implies that we need either curses or a whole lot of code rather than just termios, and more importantly that nobody can use consoleio without going into a curses full-screen mode. >>> >>> We could of course have two different modes that you can enable (just raw I/O vs. curses full screen), where the functionality that's common to both has the same names both ways, which you suggest later. But I'm wary about that, because getwch, kbhit, etc. based on curses will have many subtle differences from implementations based on select and raw mode. >> >> I still don't understand your objection. People would be able to use the >> rest of consoleio all the time, just without using those functions (one >> of which, clrscr, basically _is_ "going into a curses full-screen mode" >> - i'm not sure what else you think going into a full-screen mode >> consists of) > > Read the source to curses. 
It consists of doing a bunch of additional termios stuff you don't need for raw mode, doing various bizarre workarounds for different platforms, setting env variables to interact with a variety of different terminal emulators, sending a variety of escape sequences to control things like soft labels and take over scrolling and scrollback on terminals that support it, etc. There's a reason lib_newterm.c is 352 lines long. > >> Input and output are basically completely independent >> (other than echoing), anyway, why would they have to use an output >> function to be able to use an input function? Why would being in full >> screen output mode have any effect on getwch or kbhit? Unless you're >> proposing using the _actual_ curses input functions, which I never so >> much as breathed a word of. > > But I've repeatedly said that if you want full-screen graphics, you _do_ want to use curses, rather than try to reproduce decades worth of development and debugging to get it to work on a wide variety of platforms, handle features you haven't thought of, etc. > >>> For that matter, there's no reason you shouldn't be able to consoleio on a second tty or pty if you've got a handle to one. >> >> I agree, but this is tricky to define in a cross-platform way, >> particularly since on windows you A) absolutely cannot have a second >> console without doing _seriously_ tricky things with multiple processes >> [you can attach to any process's console, so we could spawn a subprocess >> just to get its console] and B) even if you could, the console consists >> of a pair of handles, not just one (but see A, what you really need is a >> process ID...), C) using a serial port is very different from using the >> console. >> >> Also, /dev/tty has the advantage of being available to a program which >> has its input and output redirected, and python already uses it for >> getpass, so "you really should have to know a little bit about what >> you're doing." is already a lost cause. 
And I say this despite >> considering and rejecting calling AllocConsole if the windows version of >> the API is enabled from a pythonw process - redirection is a more common >> use case than wanting to do console I/O from a windows program, and >> msvcrt already has about half a dozen pitfalls we _can't_ fix from >> python, not least being the fact that calling msvcrt.getwch [etc] for >> the first time is a bell you can't unring. >> >>> Although, come to think of it, I think windows has APIs to set a terminal HANDLE as your terminal for getwch and friends, so maybe a param for that as well? >> >> Not really. There's an API to _close_ the console and open a new one or >> attach to the one belonging to a different process, but it's very >> global, there's no way to actually pass it in. And if you're the only >> process attached to a console and you close it, it goes away. So you >> could do a very hacky thing with subprocesses and switching on every >> call, I suppose. >> >> -- >> Random832 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sat Aug 3 01:06:22 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 02 Aug 2013 16:06:22 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <20130802201848.GC20955@mcnabbs.org> <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> Message-ID: <51FC3B6E.6030208@stoneleaf.us> On 08/02/2013 03:52 PM, Andrew Barnert wrote: > On Aug 2, 2013, at 13:18, Andrew McNabb wrote: > >> On Fri, Aug 02, 2013 at 01:37:58PM -0500, Brian Curtin wrote: >>> >>> Like everything else we add, shouldn't a module live in the Python >>> ecosystem, standout as the best of breed, and *then* be proposed for >>> inclusion? 
> >> As Steven pointed out, numpy/scipy are best of breed in the Python >> ecosystem, but they're too "advanced" for inclusion in the standard >> library. There's room for a standard implementation, but the module >> wouldn't be complex enough to require years of development outside the >> standard library. > > Years of development, no. But a few months on PyPI (with people pointing to it from places like python-list and StackOverflow) would capture a lot wider experience and testing than just a discussion on this list. > >> > > Also, if it's reasonably possible to make the implementation work for 3.0-3.3 (or even 2.6-3.3) a PyPI module will remain useful as a quasi-official backport even after acceptance in the stdlib. > > So, I don't think short-circuiting the process is a good idea unless there's a really compelling reason to do so. The compelling reasons are listed in the PEP. The two most important in my mind are: 1) easy to get wrong if doing it DIY 2) not being able to access third-party code (or only with great pain) -- ~Ethan~ From alexander.belopolsky at gmail.com Sat Aug 3 02:10:06 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 2 Aug 2013 20:10:06 -0400 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: On Fri, Aug 2, 2013 at 2:37 PM, Brian Curtin wrote: > Like everything else we add, shouldn't a module live in the Python > ecosystem, standout as the best of breed, and *then* be proposed for > inclusion? > NumPy (and Numeric before it) has always stood out in the Python ecosystem. There have been several attempts to bring it into stdlib, but instead various ideas served as an inspiration for new stdlib features. (Most recent such feature was memoryview.) I like Steve's module, but I don't think it will compete well with NumPy in the wild. The only advantage feature-wise it will have over NumPy is immediate availability in the stdlib.
Implementation-wise, however, it has many advantages: it is small, it focuses on correctness and compatibility rather than speed, it is written in modern pure python, etc. I don't think these advantages will make any current NumPy users switch, but this is exactly what I want to see in stdlib. In short, +1 on the PEP. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Aug 3 04:15:57 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 19:15:57 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FC3B6E.6030208@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <20130802201848.GC20955@mcnabbs.org> <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> <51FC3B6E.6030208@stoneleaf.us> Message-ID: <468A3372-E2FD-4766-81B6-3D1969D97473@yahoo.com> On Aug 2, 2013, at 16:06, Ethan Furman wrote: > On 08/02/2013 03:52 PM, Andrew Barnert wrote: >> On Aug 2, 2013, at 13:18, Andrew McNabb wrote: >> >>> On Fri, Aug 02, 2013 at 01:37:58PM -0500, Brian Curtin wrote: >>>> >>>> Like everything else we add, shouldn't a module live in the Python >>>> ecosystem, standout as the best of breed, and *then* be proposed for >>>> inclusion? >>> >>> As Steven pointed out, numpy/scipy are best of breed in the Python >>> ecosystem, but they're too "advanced" for inclusion in the standard >>> library. There's room for a standard implementation, but the module >>> wouldn't be complex enough to require years of development outside the >>> standard library. >> >> Years of development, no. But a few months on PyPI (with people pointing to it from places like python-list and StackOverflow) would capture a lot wider experience and testing than just a discussion on this list. >> >> >> Also, if it's reasonably possible to make the implementation work for 3.0-3.3 (or even 2.6-3.3) a PyPI module will remain useful as a quasi-official backport even after acceptance in the stdlib. 
>> >> So, I don't think short-circuiting the process is a good idea unless there's a really compelling reason to do so. > > The compelling reasons are listed in the PEP. The two most important in my mind are: > > 1) easy to get wrong if doing it DIY > > 2) not being able to access third-party code (or only with great pain) Those are definitely compelling reasons for the module to _exist_, but not compelling reasons to avoid the normal process for getting it into the stdlib. Is there any reason to believe that this module would not benefit from wider exposure and use before finalizing it? Is it so urgent that we can't afford to wait for that to happen? Is it inappropriate for PyPI for some reason? From ncoghlan at gmail.com Sat Aug 3 04:47:40 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 3 Aug 2013 12:47:40 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <468A3372-E2FD-4766-81B6-3D1969D97473@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <20130802201848.GC20955@mcnabbs.org> <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> <51FC3B6E.6030208@stoneleaf.us> <468A3372-E2FD-4766-81B6-3D1969D97473@yahoo.com> Message-ID: On 3 Aug 2013 12:19, "Andrew Barnert" wrote: > > On Aug 2, 2013, at 16:06, Ethan Furman wrote: > > > On 08/02/2013 03:52 PM, Andrew Barnert wrote: > >> On Aug 2, 2013, at 13:18, Andrew McNabb wrote: > >> > >>> On Fri, Aug 02, 2013 at 01:37:58PM -0500, Brian Curtin wrote: > >>>> > >>>> Like everything else we add, shouldn't a module live in the Python > >>>> ecosystem, standout as the best of breed, and *then* be proposed for > >>>> inclusion? > >>> > >>> As Steven pointed out, numpy/scipy are best of breed in the Python > >>> ecosystem, but they're too "advanced" for inclusion in the standard > >>> library. There's room for a standard implementation, but the module > >>> wouldn't be complex enough to require years of development outside the > >>> standard library. 
> >> > >> Years of development, no. But a few months on PyPI (with people pointing to it from places like python-list and StackOverflow) would capture a lot wider experience and testing than just a discussion on this list. > >> > >> > >> Also, if it's reasonably possible to make the implementation work for 3.0-3.3 (or even 2.6-3.3) a PyPI module will remain useful as a quasi-official backport even after acceptance in the stdlib. > >> > >> So, I don't think short-circuiting the process is a good idea unless there's a really compelling reason to do so. > > > > The compelling reasons are listed in the PEP. The two most important in my mind are: > > > > 1) easy to get wrong if doing it DIY > > > > 2) not being able to access third-party code (or only with great pain) > > Those are definitely compelling reasons for the module to _exist_, but not compelling reasons to avoid the normal process for getting it into the stdlib. > > Is there any reason to believe that this module would not benefit from wider exposure and use before finalizing it? Is it so urgent that we can't afford to wait for that to happen? Is it inappropriate for PyPI for some reason? Yes: on PyPI, there's little reason to believe that anyone would choose this over one of the more sophisticated options that isn't suitable for the standard library. We did much the same thing when redesigning the ipaddress API: because the changes were to benefit beginners rather than experts, there was little chance the new API could compete with ipaddr and netaddr, so it was added directly. Raymond made the call a while ago for a minimal stats library to fill the gap between "roll your own broken version" and "use NumPy/SciPy", and several of us agreed that was a good idea. Steven's PEP is ultimately a response to that request. Accordingly, I think this would be an appropriate use of the "provisional API" status.
The PEP should cover this issue though, and explicitly call out the proposed API as provisional (and perhaps more directly call out the genesis of the idea). It should also be made available for use with earlier versions of Python on PyPI, but I don't consider that a gating criterion for inclusion. Cheers, Nick. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Sat Aug 3 05:03:35 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 02 Aug 2013 23:03:35 -0400 Subject: [Python-ideas] Remove tty module Message-ID: <1375499015.21955.5260727.61B1FBC4@webmail.messagingengine.com> Eh, and everyone _insists_ it's a good idea not to have reply-to headers. On Fri, Aug 2, 2013, at 19:18, Andrew Barnert wrote: > On Aug 2, 2013, at 13:05, random832 at fastmail.us wrote: > > Why would it be less portable? Right now, we have curses, which only > > works on unix. > > Because curses isn't available on every platform termios is. Once again, I'm not suggesting _actually using curses_, I'm talking about implementing basic functionality from scratch, using just terminfo/termcap data. And I don't know what you think is hard about getting the screen width... it's just in an environment variable (it's handling resize events that is complicated, but we could simply not handle them on systems where it doesn't work by the known mechanisms) Reading $COLUMNS is not hard. Sending escape sequences is not hard. 99.99% of apps on 99.99% of terminals don't need anything else in terms of actual output to the terminal. > And it isn't appropriate for every use case where termios is. (For the > most dramatic case, getch makes sense on a serial line; gotoxy does > not.
But even on an actual terminal, there are many cases where you > don't want to take over the terminal, break scrollback, etc.) I don't get what you think that "taking over the terminal" actually involves. And that wouldn't happen unless those functions are actually called, anyway. You wouldn't have to go into that mode just to use getch. like I said, except for echo, input is independent of output, and screen manipulation functions are purely an output thing. Also, getch doesn't really make sense on a non-terminal because its meaning is tied up with the terminal line discipline. On a non-terminal you would use os.read. getch is all about turning on cbreak and turning off echo before calling read, then changing it back after. > > Implementing a simple set of functions for both unix and > > windows is more portable. > > Sure, but implementing an even simpler set of functions that works on a > broader range of unix plus windows and can be used in a broader range of > cases is even _more_ portable. I don't know why you are arguing against implementing these other functions. They don't affect the functions you want to implement. > And of course it's also simpler, meaning less coding, less debugging, > less bikeshedding, etc. > > > I am thinking in terms of "implement core functionality for multiple > > platforms, then implement anything more complex on top of that in pure > > python" - this necessarily means duplicating some of the work that > > curses does now for unix systems. What's wrong with this approach? > > Well, duplicating the work of curses instead of just using curses may > well be a mistake. curses isn't available on windows. The best curses-like implementation available for windows doesn't use the console. There is _nothing_ cross-platform available right now. 
And curses is overkill for _precisely_ the reasons you think it's necessary - because it requires you to take over the screen and use its special input functions if you want to do any cursor movement. > > But otherwise, there's nothing wrong with this; it's just a potentially > (and, I think, desirably) separate idea from implementing basic raw > terminal I/O. I don't see why they shouldn't go in the same module. > As I said before, I think we ultimately may want three packages: > > * a raw console I/O package You keep saying "raw I/O", but this functionality is all about input. There's no "O" in your I/O. > Only if it thinks your terminal is UTF-8. Which it shouldn't. I see it as an extension of the win32 issue with surrogate pairs - some terminal environments may not provide validation for pasted data (or for something like a "stuff" keybinding on screen), so even when it's supposed to be UTF-8 the next byte may never come. But it's _something_ that's misconfigured, so it's a low priority. > But people writing conio-style code today are using either msvcrt or > libraries like python-conio, where it _is_ a problem. It's still not the same problem. If you type a genuine multibyte character, like in shift-JIS or something, then the second byte is available immediately even though you read it in another call, just like what we're talking about on unix. There's never an "orphan lead byte" like I was saying, where the trail byte might block or might never come. The fact that it's a second call is immaterial. > >> What about cbreak mode? It's a useful thing, there's no obvious way > >> to fit it into the conio-style paradigm, and tty wraps it up for > >> you. > > > > I'd think cbreak mode basically consists of calling getche() all the > > time. What's the difference, other than that? > > No, raw/cbreak/cooked is entirely orthogonal to echo/noecho. I am not talking about echo/noecho, I just thought by cbreak you meant cbreak+echo. 
The point is both getch and getche are inherently cbreak. But it sounds like you're actually talking about not disabling signals. I actually don't think getch should disable signals, or it should be an option passed to it. It was actually implementation-dependent on DOS. (and it doesn't disable control-break on windows, only control-C) But in both cases you're reading a single character immediately, there's no reason to have a persistent "mode" that alters the behavior of os.read. You're already looking at a kernel context switch on every keystroke, so switching the terminal mode on every call isn't a huge cost on top of that. > For example, > in raw mode, ^C is just a character; in cbreak mode, as in cooked mode, > it's a signal. Cbreak basically means to turn off line discipline, but > leave on everything else. > > > Windows acts weird if you mix line buffering and getch, by the way, > > This is a big part of the reason I decided to require enable/disable > pairs and just say that normal stdin/out is illegal (but not necessarily > checked) inside an enable and consoleio illegal outside of one. I don't think that was really necessary. And I think it's contributed to your thinking in terms of "taking over the screen", thinking that if we add gotoxy you will have to add that to your enable function and it will happen to people who don't want it. > Read the source to curses. It consists of doing a bunch of additional > termios stuff you don't need for raw mode, doing various bizarre > workarounds for different platforms, setting env variables to interact > with a variety of different terminal emulators, sending a variety of > escape sequences to control things like soft labels and take over > scrolling and scrollback on terminals that support it, etc. There's a > reason lib_newterm.c is 352 lines long. And my whole point is NOT DOING those things. 
None of those are required to clear the screen and go to a coordinate point on 99.999% of terminals; all you have to do is send two escape sequences. In most of the remaining .001%, all you have to do is send another escape sequence beforehand. You don't even have to turn off cooked mode. People put this stuff in their prompts. A lot of curses is also geared to performance, making sure you have the shortest possible byte sequence for a screen update, especially on terminals that don't support region scrolling. > But I've repeatedly said that if you want full-screen graphics, you _do_ > want to use curses, rather than try to reproduce decades worth of > development and debugging to get it to work on a wide variety of > platforms, handle features you haven't thought of, etc. That's cargo-cult thinking. Soft labels? When have you ever used a terminal with them? When have you ever heard of an app that sets them? And you just listed it off like something naturally no-one could do without. From abarnert at yahoo.com Sat Aug 3 06:48:14 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 2 Aug 2013 21:48:14 -0700 Subject: [Python-ideas] Remove tty module In-Reply-To: <1375499015.21955.5260727.61B1FBC4@webmail.messagingengine.com> References: <1375499015.21955.5260727.61B1FBC4@webmail.messagingengine.com> Message-ID: On Aug 2, 2013, at 20:03, random832 at fastmail.us wrote: > Eh, and everyone _insists_ it's a good idea not to have reply-to > headers. > > On Fri, Aug 2, 2013, at 19:18, Andrew Barnert wrote: >> On Aug 2, 2013, at 13:05, random832 at fastmail.us wrote: >>> Why would it be less portable? Right now, we have curses, which only >>> works on unix. >> >> Because curses isn't available on every platform termios is. > > Once again, I'm not suggesting _actually using curses_, I'm talking > about implementing basic functionality from scratch, using just > terminfo/termcap data. 
And I'm suggesting that if you really want to do curses type stuff, you _do_ want to use curses, for the reasons I explained. > And I don't know what you think is hard about > getting the screen width... it's just in an environment variable (it's > handling resize events that is complicated, but we could simply not > handle them on systems where it doesn't work by the known mechanisms) First, COLUMNS is not an environment variable, it's a shell variable. Which is why os.environ['COLUMNS'] will raise a KeyError. Or even running a bash subshell. Try this: cat >foo #!/bin/bash echo $COLUMNS ^d chmod +x foo ./foo Does it print 80, or a blank line? On top of that, even if you do know how to get it, it's not always correct. For example, if you run a program that ignores SIGWINCH, resize the window, and have it run a child process, the child will get the original width, not the new one. The usual way around both problems in shell scripts is to use the tput tool... Which is part of curses. Curses has the appropriate solutions for all of these kinds of problems on all terminals. If you want to reimplement it from scratch, you will have to discover and solve all of them yourself. > Reading $COLUMNS is not hard. Sending escape sequences is not hard. > 99.99% of apps on 99.99% of terminals don't need anything else in terms > of actual output to the terminal. If you're willing to break Mac scrolling, not interoperable properly with the scrollback buffer in various terminals, hang on Cygwin, not handle terminal width properly in a variety of common cases, ... >> And it isn't appropriate for every use case where termios is. (For the >> most dramatic case, getch makes sense on a serial line; gotoxy does >> not. But even on an actual terminal, there are many cases where you >> don't want to take over the terminal, break scrollback, etc.) > > I don't get what you think that "taking over the terminal" actually > involves. 
I explained it repeatedly; I'm not sure how else I can explain it. > And that wouldn't happen unless those functions are actually > called, anyway. You wouldn't have to go into that mode just to use > getch. like I said, except for echo, input is independent of output, and > screen manipulation functions are purely an output thing. > > Also, getch doesn't really make sense on a non-terminal because its > meaning is tied up with the terminal line discipline. On a non-terminal > you would use os.read. getch is all about turning on cbreak and turning > off echo before calling read, then changing it back after. Raw mode and cbreak mode are not the same thing. Every implementation of getch I've found turns on raw mode (that is, turns off icanon). Do you think this isn't necessary? >>> Implementing a simple set of functions for both unix and >>> windows is more portable. >> >> Sure, but implementing an even simpler set of functions that works on a >> broader range of unix plus windows and can be used in a broader range of >> cases is even _more_ portable. > > I don't know why you are arguing against implementing these other > functions. They don't affect the functions you want to implement. But they do affect getting a module designed, built, tested, used, maintained, and accepted into the stdlib. If the module isn't considered done until it can be used to build portable console GUIs, it will be much longer before it's done. Which is why I think they belong as separate modules. I've never said I don't think your functionality shouldn't exist, just that it shouldn't be shoehorned into the much simpler and smaller module I want to build. >> And of course it's also simpler, meaning less coding, less debugging, >> less bikeshedding, etc. 
>> >>> I am thinking in terms of "implement core functionality for multiple >>> platforms, then implement anything more complex on top of that in pure >>> python" - this necessarily means duplicating some of the work that >>> curses does now for unix systems. What's wrong with this approach? >> >> Well, duplicating the work of curses instead of just using curses may >> well be a mistake. > > curses isn't available on windows. Sure. Which is exactly why I suggested that a fullscreen GUI module should implement a subset of what curses can do instead of all of it, and should do so by using curses on Unix and different functions on Windows. > The best curses-like implementation > available for windows doesn't use the console. There is _nothing_ > cross-platform available right now. And curses is overkill for > _precisely_ the reasons you think it's necessary - because it requires > you to take over the screen and use its special input functions if you > want to do any cursor movement. Exactly. Which is a reason the full screen GUI module should be separate from the raw I/O module. >> But otherwise, there's nothing wrong with this; it's just a potentially >> (and, I think, desirably) separate idea from implementing basic raw >> terminal I/O. > > I don't see why they shouldn't go in the same module. Again, so the one can get finished quickly while you're just getting started on the other. > >> As I said before, I think we ultimately may want three packages: >> >> * a raw console I/O package > > You keep saying "raw I/O", but this functionality is all about input. > There's no "O" in your I/O. putch and friends may not be very powerful, maybe not even very useful, but they are certainly output. >> Only if it thinks your terminal is UTF-8. Which it shouldn't. 
> > I see it as an extension of the win32 issue with surrogate pairs - some > terminal environments may not provide validation for pasted data (or for > something like a "stuff" keybinding on screen), so even when it's > supposed to be UTF-8 the next byte may never come. But it's _something_ > that's misconfigured, so it's a low priority. > >> But people writing conio-style code today are using either msvcrt or >> libraries like python-conio, where it _is_ a problem. > > It's still not the same problem. If you type a genuine multibyte > character, like in shift-JIS or something, then the second byte is > available immediately even though you read it in another call, just like > what we're talking about on unix. There's never an "orphan lead byte" > like I was saying, where the trail byte might block or might never come. > The fact that it's a second call is immaterial. > >>>> What about cbreak mode? It's a useful thing, there's no obvious way >>>> to fit it into the conio-style paradigm, and tty wraps it up for >>>> you. >>> >>> I'd think cbreak mode basically consists of calling getche() all the >>> time. What's the difference, other than that? >> >> No, raw/cbreak/cooked is entirely orthogonal to echo/noecho. > > I am not talking about echo/noecho, I just thought by cbreak you meant > cbreak+echo. Well, I didn't. By cbreak, I meant cbreak. As opposed to raw and canon/cooked. Any of the three can be used with or without echo. > The point is both getch and getche are inherently cbreak. I think they're inherently raw. But whichever one you choose, the point is that there are three primary modes, and my suggested module will only handle one of them, and therefore tty will still be the only easy way to get the third. > But it sounds like you're actually talking about not disabling signals. Nobody said anything about disabling signals. The issue (or rather this small subissue) is that different terminal modes map different sequences to signals. 
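A minimal sketch may make the mode distinction concrete (illustrative code on POSIX, not the proposed API): `tty.setcbreak` clears ICANON and ECHO but leaves ISIG alone, so ^C still raises SIGINT, while `tty.setraw` would clear ISIG too, making ^C just another byte.

```python
import contextlib
import os
import termios
import tty


@contextlib.contextmanager
def cbreak(fd):
    """Enable/disable pairing: put fd in cbreak mode, then restore.

    setcbreak() clears ICANON (line discipline) and ECHO but keeps
    ISIG, so signal characters keep working -- cbreak, not raw.
    """
    old = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)


def getch(fd):
    """Read one byte immediately, without waiting for a newline."""
    with cbreak(fd):
        return os.read(fd, 1)
```

Switching the mode on every call, as here, is the per-keystroke approach discussed above; a module that keeps the mode enabled across calls would hoist the `cbreak()` context out of `getch`.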
How (or whether) you deal with those signals is a separate question that I don't think is even relevant here. > I actually don't think getch should disable signals, or it should be an > option passed to it. It was actually implementation-dependent on DOS. > (and it doesn't disable control-break on windows, only control-C) But in > both cases you're reading a single character immediately, there's no > reason to have a persistent "mode" that alters the behavior of os.read. Yes there is. With the obvious posix implementation, if you switch to raw mode for kbhit, switch back to cooked mode, and switch back to raw mode for getch, the character may no longer be immediately available. If we used higher-level functions instead of a direct mapping of conio, so you could getch with a timeout instead of polling kbhit, it would be possible to work around that. But otherwise, it isn't. And meanwhile, windows has exactly the issue you raised in your previous email if you mix conio and stdio, and I don't know of any way around that except to declare that illegal. > You're already looking at a kernel context switch on every keystroke, so > switching the terminal mode on every call isn't a huge cost on top of > that. It's a correctness issue, not a performance one. >> For example, >> in raw mode, ^C is just a character; in cbreak mode, as in cooked mode, >> it's a signal. Cbreak basically means to turn off line discipline, but >> leave on everything else. >> >>> Windows acts weird if you mix line buffering and getch, by the way, >> >> This is a big part of the reason I decided to require enable/disable >> pairs and just say that normal stdin/out is illegal (but not necessarily >> checked) inside an enable and consoleio illegal outside of one. > > I don't think that was really necessary. 
And I think it's contributed to > your thinking in terms of "taking over the screen", thinking that if we > add gotoxy you will have to add that to your enable function and it will > happen to people who don't want it. > >> Read the source to curses. It consists of doing a bunch of additional >> termios stuff you don't need for raw mode, doing various bizarre >> workarounds for different platforms, setting env variables to interact >> with a variety of different terminal emulators, sending a variety of >> escape sequences to control things like soft labels and take over >> scrolling and scrollback on terminals that support it, etc. There's a >> reason lib_newterm.c is 352 lines long. > > And my whole point is NOT DOING those things. None of those are required > to clear the screen and go to a coordinate point on 99.999% of > terminals; So Cygwin makes up less than 0.001%? The first of those workarounds is to prevent a hang on Cygwin. (IIRC, you have to leak a /dev/tty handle before you can use stdin as a tty in some fares. IIR incorrectly... Then I'd write code that hangs.) > all you have to do is send two escape sequences. In most of > the remaining .001%, all you have to do is send another escape sequence > beforehand. You don't even have to turn off cooked mode. People put this > stuff in their prompts. Which is why the fancy prompt I built on my old Fedora box doesn't work right when I ssh in from my Mac or my Ubuntu box... > A lot of curses is also geared to performance, making sure you have the > shortest possible byte sequence for a screen update, especially on > terminals > that don't support region scrolling. Yes, a lot of it is. But there's also a lot that's about working correctly on different platforms/terminals. 
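As an aside on the screen-width question raised earlier in the exchange: the reliable route on POSIX is the TIOCGWINSZ ioctl (which is what tput and curses consult, and what Python 3.3's os.get_terminal_size wraps), rather than $COLUMNS. A hedged sketch, with an illustrative function name:

```python
import fcntl
import struct
import termios


def terminal_width(fd=1, fallback=80):
    """Kernel-reported column count for the tty on fd.

    $COLUMNS is a shell variable (usually not exported), so asking the
    kernel via TIOCGWINSZ is the reliable route; it also reflects
    window resizes immediately.
    """
    try:
        # struct winsize: rows, cols, xpixel, ypixel (4 unsigned shorts)
        _rows, cols, _xpix, _ypix = struct.unpack(
            "HHHH", fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8))
    except OSError:
        return fallback  # fd is not a terminal (pipe, file, ...)
    return cols or fallback
```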
>> But I've repeatedly said that if you want full-screen graphics, you _do_ >> want to use curses, rather than try to reproduce decades worth of >> development and debugging to get it to work on a wide variety of >> platforms, handle features you haven't thought of, etc. > > That's cargo-cult thinking. Soft labels? When have you ever used > a terminal with them? When have you ever heard of an app that sets them? > And you just listed it off like something naturally no-one could do > without. If you're just going to pick one piece out of a paragraph and ignore the rest of it, it's going to be very hard to have a productive discussion. Sure, it's been a long time since I've even thought about soft labels. But scrollback buffers, to take the very next phrase, are something we both use every day. I would also like to have scrolling integrate properly with OS scrolling in Terminal.app, as emacs or any curses or app does, but many other fullscreen apps do not. And so on. If you want to go through all the features and workarounds curses deals with and decide which ones you do and don't need, of course you can. You also probably want to explore the various alternatives like termbox to see where they fall down and whether you care. I think you'll have a much easier time wrapping a small subset of curses to implement the functionality you want (and writing a separate Windows implementation, of course) than doing it from scratch. But I'm not planning to write that module, so what I think isn't as important here. From tjreedy at udel.edu Sat Aug 3 07:23:23 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 03 Aug 2013 01:23:23 -0400 Subject: [Python-ideas] get(w)ch, non-letter keys, and cross-platform non-blocking input. 
In-Reply-To: <1375466174.21273.5116415.56F56E76@webmail.messagingengine.com> References: <1375466174.21273.5116415.56F56E76@webmail.messagingengine.com> Message-ID: On 8/2/2013 1:56 PM, random832 at fastmail.us wrote: > The current behavior of getch and getwch on windows Do you mean the curses module windows functions (which is mostly *nix and not obviously on Windows) or the msvcrt module Windows functions? I am a little confused because you talked about both 'windows' and 'unix' as separate things. -- Terry Jan Reedy From steve at pearwood.info Sat Aug 3 09:10:58 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 03 Aug 2013 17:10:58 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <20130802201848.GC20955@mcnabbs.org> <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> Message-ID: <51FCAD02.4020308@pearwood.info> On 03/08/13 08:52, Andrew Barnert wrote: >> As Steven pointed out, numpy/scipy are best of breed in the Python >> ecosystem, but they're too "advanced" for inclusion in the standard >> library. There's room for a standard implementation, but the module >> wouldn't be complex enough to require years of development outside the >> standard library. > > Years of development, no. But a few months on PyPI (with people pointing to it from places like python-list and StackOverflow) would capture a lot wider experience and testing than just a discussion on this list. I have had a similar (but much more extensive) module on PyPI for 30+ months, and although there have been a reasonable number of downloads, I've had very little feedback. https://pypi.python.org/pypi/stats/ The only negative feedback I've received was an extended argument that circular_mean is invalid, never mind what the mathematicians say. I do not accept that claim. (In short, the mean of (say) 10° and 350° is 180°
which is pointing in the wrong direction; circular_mean returns 0° which is probably what you want. Google for "mean of circular quantities" if you want to know more.) > Also, if it's reasonably possible to make the implementation work for 3.0-3.3 (or even 2.6-3.3) a PyPI module will remain useful as a quasi-official backport even after acceptance in the stdlib. I am happy to target 3.3 and keep it on PyPI; I'll look into backporting to previous versions. -- Steven From steve at pearwood.info Sat Aug 3 09:15:46 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 03 Aug 2013 17:15:46 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <468A3372-E2FD-4766-81B6-3D1969D97473@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <20130802201848.GC20955@mcnabbs.org> <5C687C9F-014D-4D12-ABEE-E3A019237C78@yahoo.com> <51FC3B6E.6030208@stoneleaf.us> <468A3372-E2FD-4766-81B6-3D1969D97473@yahoo.com> Message-ID: <51FCAE22.1060108@pearwood.info> On 03/08/13 12:15, Andrew Barnert wrote: > Is there any reason to believe that this module would not benefit from wider exposure and use before finalizing it? Is it so urgent that we can't afford to wait for that to happen? Is it inappropriate for PyPI for some reason? Based on my experience with the module here: https://pypi.python.org/pypi/stats/ I believe that people will use a standard library module if it is available, or they will download and install a full-featured numpy style package if they are serious, heavy users of numerical code, but the sort of casual users who just want to calculate the average of a bunch of numbers, or calculate the standard deviation for some school work, aren't likely to download a third-party package. -- Steven From stephen at xemacs.org Sat Aug 3 16:35:32 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 03 Aug 2013 23:35:32 +0900 Subject: [Python-ideas] Support Unicode code point labels (Was: notation) In-Reply-To: References: Message-ID: <87a9kyn8p7.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > Yet, I still don't see the problem. > You can already write > assert unicodedata.category(chr(0x50000)) == > 'Cn' > in your code and this will blow up in any future version that > will use UCD with U+50000 assigned. That's not a problem. As you say, "presumably you're doing that for good reason." The problem is that someone will use code written by someone using a future version and run it with a past version, and the assert will trigger. I don't see any good reason why it should. The Unicode Standard explicitly specifies how unknown code points should be handled. Raising an exception is not part of that spec. From alexander.belopolsky at gmail.com Sat Aug 3 21:47:05 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 3 Aug 2013 15:47:05 -0400 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: On Fri, Aug 2, 2013 at 1:45 PM, Steven D'Aprano wrote: > I have raised an issue on the tracker to add a statistics module to > Python's standard library: > > http://bugs.python.org/issue18606 > > and have been asked to write a PEP. Attached is my draft PEP. Feedback is > requested, thanks in advance. > The PEP does not mention statistics.sum(), but the reference implementation includes it. I am not sure stdlib needs the third sum function after builtins.sum and math.fsum. I think it will be better to improve builtins.sum instead.
From alexander.belopolsky at gmail.com Sat Aug 3 22:59:08 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 3 Aug 2013 16:59:08 -0400 Subject: [Python-ideas] Support Unicode code point labels (Was: notation) In-Reply-To: <87a9kyn8p7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87a9kyn8p7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Aug 3, 2013 at 10:35 AM, Stephen J. Turnbull wrote: > The problem is that someone will use code written by someone using a > future version and run it with a past version, and the assert will > trigger. I don't see any good reason why it should. The Unicode > Standard explicitly specifies how unknown code points should be > handled. Raising an exception is not part of that spec. It looks like we are running into a confusion between code points and their name property. I agree that a conforming function that returns the name for a code point should not raise an exception. The standard offers two alternatives: return an empty string or return a generated label. In this sense unicodedata.name() is not conforming: >>> unicodedata.name('\u0009') Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: no such name However, it is trivial to achieve conforming behavior: >>> unicodedata.name('\u0009', '') '' I propose adding a unicodedata.label() function that will return 'control-0009' in this case. I think this proposal is fully inline with the standard. For the inverse operation, unicodedata.lookup(), I don't see anything in the standard that precludes raising an exception on an unknown name. If that was a problem, we would have it already. In Python >=3.2: >>> unicodedata.lookup('BELL') '🔔'
But in Python 3.1: >>> unicodedata.lookup('BELL') Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: "undefined character name 'BELL'" The only potential problem that I see with my proposal is that it is reasonable to expect that if '\N{whatever}' works in one version it will work the same in all versions after that. My proposal will break this expectation only in the case of '\N{reserved-NNNN}'. Once a code point NNNN is assigned '\N{reserved-NNNN}' will become a syntax error. If you agree that this is the only problematic case, let's focus on it. I cannot think of any reason to deliberately use reserved characters other than to stress-test your unicode handling software. In this application, you probably want to see an error once NNNN is assigned because your tests will no longer cover the unassigned character case. Can you suggest any other use? From ethan at stoneleaf.us Sun Aug 4 00:23:38 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 03 Aug 2013 15:23:38 -0700 Subject: [Python-ideas] Enhance definition of functions In-Reply-To: <20130802194624.3166b0a6@fsol> References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com> <1375312667.43041.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130802155502.70623e1a@pitrou.net> <20130802180048.43ca437e@pitrou.net> <43FEADED-0036-4726-9830-2772F1A98583@yahoo.com> <20130802194624.3166b0a6@fsol> Message-ID: <51FD82EA.4080807@stoneleaf.us> On 08/02/2013 10:46 AM, Antoine Pitrou wrote: > > There may be many other cases, but almost any time people complain > about the lack of inline functions, it's in situations where they are > declaring callbacks (i.e. GUI- or network-programming). So, yes, I > don't think the other use cases should be a primary point of concern. So long as the other use-cases are still considered and allowed for.
;) -- Ethan From random832 at fastmail.us Sun Aug 4 00:58:22 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sat, 03 Aug 2013 18:58:22 -0400 Subject: [Python-ideas] get(w)ch, non-letter keys, and cross-platform non-blocking input. In-Reply-To: References: <1375466174.21273.5116415.56F56E76@webmail.messagingengine.com> Message-ID: <1375570702.12485.5474787.3D3E62AF@webmail.messagingengine.com> On Sat, Aug 3, 2013, at 1:23, Terry Reedy wrote: > On 8/2/2013 1:56 PM, random832 at fastmail.us > wrote: > > The current behavior of getch and getwch on windows > > Do you mean the curses module windows functions (which is mostly *nix > and not obviously on Windows) or the msvcrt module Windows functions? I > am a little confused because you talked about both 'windows' and 'unix' > as separate things. I meant the msvcrt functions. When I talked about unix later in the email I was talking about what is possible or not possible on unix, not what is currently implemented in any particular library. Sorry for the confusion. From python at mrabarnett.plus.com Sun Aug 4 01:57:13 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 04 Aug 2013 00:57:13 +0100 Subject: [Python-ideas] Support Unicode code point labels (Was: notation) In-Reply-To: References: <87a9kyn8p7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51FD98D9.6040309@mrabarnett.plus.com> On 03/08/2013 21:59, Alexander Belopolsky wrote: > > On Sat, Aug 3, 2013 at 10:35 AM, Stephen J. Turnbull > wrote: > > The problem is that someone will use code written by someone using a > future version and run it with a past version, and the assert will > trigger. I don't see any good reason why it should. The Unicode > Standard explicitly specifies how unknown code points should be > handled. Raising an exception is not part of that spec. > > > It looks like we are running into a confusion between code points and > their name property. 
I agree that a conforming function that returns > the name for a code point should not raise an exception. The standard > offers two alternatives: return an empty string or return a generated > label. In this sense unicodedata.name() is > not conforming: > > >>> unicodedata.name('\u0009') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: no such name > > However, it is trivial to achieve conforming behavior: > > >>> unicodedata.name('\u0009', '') > '' > > I propose adding a unicodedata.label() function that will > return 'control-0009' in this case. I think this proposal is fully > inline with the standard. > > For the inverse operation, unicodedata.lookup(), I don't see anything in > the standard that precludes raising an exception on an unknown name. If > that was a problem, we would have it already. > > In Python >=3.2: > > >>> unicodedata.lookup('BELL') > '🔔' > > But in Python 3.1: > > >>> unicodedata.lookup('BELL') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > KeyError: "undefined character name 'BELL'" > > The only potential problem that I see with my proposal is that it is > reasonable to expect that if '\N{whatever}' works in one version it will > work the same in all versions after that. My proposal will break this > expectation only in the case of '\N{reserved-NNNN}'. Once a code point > NNNN is assigned '\N{reserved-NNNN}' will become a syntax error. > I think that's to be expected. When a codepoint is assigned, it's no longer "reserved". It's just unfortunate that it'll break the code. > If you agree that this is the only problematic case, let's focus on it. > I cannot think of any reason to deliberately use reserved characters > other than to stress-test your unicode handling software. In this > application, you probably want to see an error once NNNN is assigned > because your tests will no longer cover the unassigned character case.
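Alexander's proposed label() behaviour is easy to prototype on top of the existing unicodedata module. The sketch below is hypothetical (codepoint_label is not a real stdlib function); it derives labels from the character's general category, following the code point label convention in the Unicode standard:

```python
import unicodedata

def codepoint_label(ch):
    """Return the character's name, or a code point label such as
    'control-0009' when it has no name (a sketch of the proposed
    label() function; no such function exists in the stdlib)."""
    name = unicodedata.name(ch, '')
    if name:
        return name
    cp = ord(ch)
    cat = unicodedata.category(ch)
    kind = {'Cc': 'control', 'Co': 'private-use',
            'Cs': 'surrogate', 'Cn': 'reserved'}.get(cat, 'reserved')
    # Noncharacters are also category Cn, but get their own label type.
    if cat == 'Cn' and (0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE):
        kind = 'noncharacter'
    return '%s-%04X' % (kind, cp)
```

Under this scheme codepoint_label('\u0009') gives 'control-0009', the value the proposal asks for, while named characters fall through to their usual name.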
> Actually putting '\N{reserved-NNNN}' in your code would be a bad idea because at some point in the future you won't be able to run the code at all! > Can you suggest any other use? > From steve at pearwood.info Sun Aug 4 03:51:49 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 04 Aug 2013 11:51:49 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <51FDB3B5.4080506@pearwood.info> On 04/08/13 05:47, Alexander Belopolsky wrote: > The PEP does not mention statistics.sum(), but the reference implementation > includes it. I am not sure stdlib needs the third sum function after > builtins.sum and math.fsum. I think it will be better to improve > builtins.sum instead. I don't know enough C to volunteer to do that. If the built-in sum() is improved to the point it passes my unit tests, I would consider using it in the future. However, it is traditional to expose a sum() function under Stats in scientific calculators, and I think that whether I use my own, or the built-in, the statistics module should continue to expose it as a public function. For the same reason, I'm very slightly +0.01 leaning towards adding a sum2 function for calculating the sum of squares, but on the other hand it is simple enough to do with a generator expression: sum(x**2 for x in data) so I thought I'd leave it out and see if there is demand for it. 
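A hedged sketch of what such a sum-of-squares helper might look like, built on math.fsum for float accuracy (sum_sq and sum_central_sq are hypothetical names, not part of the proposed module):

```python
import math

def sum_sq(data):
    """Sum of squares, using math.fsum to avoid accumulating
    float rounding error (hypothetical helper)."""
    return math.fsum(x * x for x in data)

def sum_central_sq(data):
    """Sum of squared deviations from the mean, the quantity a
    variance calculation actually needs (hypothetical helper)."""
    data = list(data)
    mu = math.fsum(data) / len(data)
    return math.fsum((x - mu) ** 2 for x in data)
```

For example, sum_sq([1, 2, 3]) is 14.0 and sum_central_sq([1, 2, 3]) is 2.0.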
-- Steven From python at mrabarnett.plus.com Sun Aug 4 04:00:52 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 04 Aug 2013 03:00:52 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FDB3B5.4080506@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FDB3B5.4080506@pearwood.info> Message-ID: <51FDB5D4.9040901@mrabarnett.plus.com> On 04/08/2013 02:51, Steven D'Aprano wrote: > On 04/08/13 05:47, Alexander Belopolsky wrote: > >> The PEP does not mention statistics.sum(), but the reference >> implementation includes it. I am not sure stdlib needs the third >> sum function after builtins.sum and math.fsum. I think it will be >> better to improve builtins.sum instead. > > > I don't know enough C to volunteer to do that. If the built-in sum() > is improved to the point it passes my unit tests, I would consider > using it in the future. However, it is traditional to expose a sum() > function under Stats in scientific calculators, and I think that > whether I use my own, or the built-in, the statistics module should > continue to expose it as a public function. > > For the same reason, I'm very slightly +0.01 leaning towards adding a > sum2 function for calculating the sum of squares, but on the other > hand it is simple enough to do with a generator expression: > > sum(x**2 for x in data) > > so I thought I'd leave it out and see if there is demand for it. > If you do add it, a better name might be "sum_sq" or similar. 
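Since this sub-thread keeps contrasting builtins.sum, math.fsum and the proposed statistics.sum, it is worth showing concretely what separates the first two, namely correctly rounded float summation:

```python
import math

data = [0.1] * 10
print(sum(data))        # 0.9999999999999999: each += rounds, errors accumulate
print(math.fsum(data))  # 1.0: fsum tracks exact partial sums
```

This is the accuracy gap a statistics-oriented sum is expected to close; whether it should also share the builtin's name is the question being argued here.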
From eliben at gmail.com Sun Aug 4 04:00:59 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 3 Aug 2013 19:00:59 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: On Sat, Aug 3, 2013 at 12:47 PM, Alexander Belopolsky wrote: > > On Fri, Aug 2, 2013 at 1:45 PM, Steven D'Aprano wrote: >> >> I have raised an issue on the tracker to add a statistics module to >> Python's standard library: >> >> http://bugs.python.org/issue18606 >> >> and have been asked to write a PEP. Attached is my draft PEP. Feedback is >> requested, thanks in advance. > > > The PEP does not mention statistics.sum(), but the reference implementation > includes it. I am not sure stdlib needs the third sum function after > builtins.sum and math.fsum. I think it will be better to improve > builtins.sum instead. While I'm somewhat -0.5 on the general idea of the statistics module (competing with well-established, super-optimized and by-themselves-famous numeric libraries Python has does not sound like a worthy goal), I have to agree with Alexander w.r.t. "sum". Strongly -1 from me on having functions with the same name as existing stdlib functions but different functionality. This is very much unpythonic. Eli From steve at pearwood.info Sun Aug 4 04:16:50 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 04 Aug 2013 12:16:50 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FDB3B5.4080506@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FDB3B5.4080506@pearwood.info> Message-ID: <51FDB992.5080305@pearwood.info> On 04/08/13 11:51, Steven D'Aprano wrote: > I don't know enough C to volunteer to do that. If the built-in sum() is improved to the point it passes my unit tests, I would consider using it in the future. Actually, on further thought, I don't think I would. 
A statistic sum should be restricted to only operate on numbers, not on arbitrary non-numeric values that happen to support the + operator (e.g. lists). As far as I know, the *only* stats function that is defined to work with non-numeric data is mode() (which my code supports). So even if the built-in was improved, I'd still need to wrap it with something vaguely like this:

def sum(data, start=0):
    if not isinstance(start, numbers.Number):
        raise ...
    result = builtins.sum(data, start)
    if not isinstance(result, numbers.Number):
        raise ...
    return result

(In hindsight, the decision to allow built-in sum to support non-numbers seems more and more unfortunate to me.) Being able to add numbers, and get a nice error if a non-numeric type slips into your data, is part of the API for statistics libraries. Built-in sum() doesn't meet that requirement. That they happen to have the same name is neither here nor there. That's why we don't force everything into one giant flat namespace. -- Steven From joshua at landau.ws Sun Aug 4 04:42:45 2013 From: joshua at landau.ws (Joshua Landau) Date: Sun, 4 Aug 2013 03:42:45 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: On 4 August 2013 03:00, Eli Bendersky wrote: > On Sat, Aug 3, 2013 at 12:47 PM, Alexander Belopolsky > wrote: > > > > On Fri, Aug 2, 2013 at 1:45 PM, Steven D'Aprano > wrote: > >> > >> I have raised an issue on the tracker to add a statistics module to > >> Python's standard library: > >> > >> http://bugs.python.org/issue18606 > >> > >> and have been asked to write a PEP. Attached is my draft PEP. Feedback > is > >> requested, thanks in advance. > > > > > > The PEP does not mention statistics.sum(), but the reference > implementation > > includes it. I am not sure stdlib needs the third sum function after > > builtins.sum and math.fsum. I think it will be better to improve > > builtins.sum instead.
> > While I'm somewhat -0.5 on the general idea of the statistics module > (competing with well-established, super-optimized and > by-themselves-famous numeric libraries Python has does not sound like > a worthy goal), I don't believe it is, in the general case. This is for those cases where you might go only with reluctance with numpy, or even be forced to roll your own. Numpy is a beast that some people, me included, haven't needed to learn, yet statistics often come up in a lot of algorithms. Not to mention the full third-second lag to import numpy ;). > I have to agree with Alexander w.r.t. "sum". Strongly > -1 from me on having functions with the same name as existing stdlib > functions but different functionality. This is very much unpythonic. > I don't agree that this is a segregation that has to happen, but I agree that it's not something that stdlib does AFAIK. I think that's a tradition worth keeping. Additionally it's not immediately obvious to any newcomer why statistics.sum is implemented differently to builtins.sum - this should be made evident from the name (akin to fsum). statistics.sum is a statistical sum of numeric data optimised to be correct. builtins.sum is, as far as the user can tell, just iterated addition. They both have their place but they're different places and it should be more immediately obvious where. Finally -- do we need math.fsum if we have statistics.sum? ? I just noticed fsum says "a float is required" when given invalid data despite accepting generic numerics. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Sun Aug 4 05:25:50 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 4 Aug 2013 13:25:50 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: On 4 Aug 2013 12:44, "Joshua Landau" wrote: > > On 4 August 2013 03:00, Eli Bendersky wrote: >> >> On Sat, Aug 3, 2013 at 12:47 PM, Alexander Belopolsky >> wrote: >> > >> > On Fri, Aug 2, 2013 at 1:45 PM, Steven D'Aprano wrote: >> >> >> >> I have raised an issue on the tracker to add a statistics module to >> >> Python's standard library: >> >> >> >> http://bugs.python.org/issue18606 >> >> >> >> and have been asked to write a PEP. Attached is my draft PEP. Feedback is >> >> requested, thanks in advance. >> > >> > >> > The PEP does not mention statistics.sum(), but the reference implementation >> > includes it. I am not sure stdlib needs the third sum function after >> > builtins.sum and math.fsum. I think it will be better to improve >> > builtins.sum instead. >> >> While I'm somewhat -0.5 on the general idea of the statistics module >> (competing with well-established, super-optimized and >> by-themselves-famous numeric libraries Python has does not sound like >> a worthy goal), > > > I don't believe it is, in the general case. This is for those cases where you might go only with reluctance with numpy, or even be forced to roll your own. Numpy is a beast that some people, me included, haven't need to learn yet statistics often come in use in a lot of algorithms. Not to mention the full third-second lag to import numpy ;). > >> >> I have to agree with Alexander w.r.t. "sum". Strongly >> -1 from me on having functions with the same name as existing stdlib >> functions but different functionality. This is very much unpythonic. > > > I don't agree that this is a segregation that has to happen, but I agree that it's not something that stdlib does AFAIK. I think that's a tradition worth keeping. 
Additionally it's not immediately obvious to any newcomer why statistics.sum is implemented differently to builtins.sum - this should be made evident from the name (akin to fsum). > > statistics.sum is a statistical sum of numeric data optimised to be correct. builtins.sum is, as far as the user can tell, just iterated addition. They both have their place but they're different places and it should be more immediately obvious where. > > Finally -- do we need math.fsum? if we have statistics.sum? Right, statistics.sum should be seen as a more obvious replacement for math.fsum, rather than replacing the builtin sum. (However, it may make sense for statistics.sum to use math.fsum internally). A pre-emptive FAQ answer may also be appropriate. Cheers, Nick. > > ? I just noticed fsum says "a float is required" when given invalid data despite accepting generic numerics. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andres.osinski at gmail.com Sun Aug 4 07:48:57 2013 From: andres.osinski at gmail.com (Andres Osinski) Date: Sun, 4 Aug 2013 02:48:57 -0300 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: Could not care less so long as there is consistency. On Sun, Aug 4, 2013 at 12:25 AM, Nick Coghlan wrote: > > On 4 Aug 2013 12:44, "Joshua Landau" wrote: > > > > On 4 August 2013 03:00, Eli Bendersky wrote: > >> > >> On Sat, Aug 3, 2013 at 12:47 PM, Alexander Belopolsky > >> wrote: > >> > > >> > On Fri, Aug 2, 2013 at 1:45 PM, Steven D'Aprano > wrote: > >> >> > >> >> I have raised an issue on the tracker to add a statistics module to > >> >> Python's standard library: > >> >> > >> >> http://bugs.python.org/issue18606 > >> >> > >> >> and have been asked to write a PEP. 
Attached is my draft PEP. > Feedback is > >> >> requested, thanks in advance. > >> > > >> > > >> > The PEP does not mention statistics.sum(), but the reference > implementation > >> > includes it. I am not sure stdlib needs the third sum function after > >> > builtins.sum and math.fsum. I think it will be better to improve > >> > builtins.sum instead. > >> > >> While I'm somewhat -0.5 on the general idea of the statistics module > >> (competing with well-established, super-optimized and > >> by-themselves-famous numeric libraries Python has does not sound like > >> a worthy goal), > > > > > > I don't believe it is, in the general case. This is for those cases > where you might go only with reluctance with numpy, or even be forced to > roll your own. Numpy is a beast that some people, me included, haven't need > to learn yet statistics often come in use in a lot of algorithms. Not to > mention the full third-second lag to import numpy ;). > > > >> > >> I have to agree with Alexander w.r.t. "sum". Strongly > >> -1 from me on having functions with the same name as existing stdlib > >> functions but different functionality. This is very much unpythonic. > > > > > > I don't agree that this is a segregation that has to happen, but I agree > that it's not something that stdlib does AFAIK. I think that's a tradition > worth keeping. Additionally it's not immediately obvious to any newcomer > why statistics.sum is implemented differently to builtins.sum - this should > be made evident from the name (akin to fsum). > > > > statistics.sum is a statistical sum of numeric data optimised to be > correct. builtins.sum is, as far as the user can tell, just iterated > addition. They both have their place but they're different places and it > should be more immediately obvious where. > > > > Finally -- do we need math.fsum? if we have statistics.sum? > > Right, statistics.sum should be seen as a more obvious replacement for > math.fsum, rather than replacing the builtin sum. 
(However, it may make > sense for statistics.sum to use math.fsum internally). > > A pre-emptive FAQ answer may also be appropriate. > > Cheers, > Nick. > > > > > ? I just noticed fsum says "a float is required" when given invalid data > despite accepting generic numerics. > > -- Andrés Osinski http://www.andresosinski.com.ar/ From ethan at stoneleaf.us Sun Aug 4 09:07:04 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 04 Aug 2013 00:07:04 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <51FDFD98.3010902@stoneleaf.us> On 08/03/2013 07:00 PM, Eli Bendersky wrote: > > While I'm somewhat -0.5 on the general idea of the statistics module > (competing with well-established, super-optimized and > by-themselves-famous numeric libraries Python has does not sound like > a worthy goal), Sure, competing with already established libraries is silly. Fortunately, that's not what is happening here. This PEP is about providing a minimal, common set of statistics functions for the average person. > I have to agree with Alexander w.r.t. "sum". Strongly > -1 from me on having functions with the same name as existing stdlib > functions but different functionality. This is very much unpythonic. I thought the whole point of name spaces was to be able to have the same name mean different things in different contexts. Surely no one expects to be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be used.
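The behavioural difference behind Ethan's namespace argument can be made concrete: builtins.sum happily concatenates whatever supports +, while a statistics-style sum is supposed to reject non-numbers. A minimal sketch (numeric_sum is a hypothetical name, not the PEP's implementation):

```python
import numbers

def numeric_sum(data, start=0):
    """Add numbers only, raising TypeError for values that merely
    support '+', such as lists or strings (hypothetical sketch)."""
    if not isinstance(start, numbers.Number):
        raise TypeError('start must be a number, got %r' % (start,))
    total = start
    for x in data:
        if not isinstance(x, numbers.Number):
            raise TypeError('non-numeric value in data: %r' % (x,))
        total += x
    return total
```

Here builtins.sum([[1], [2]], []) quietly returns [1, 2], while numeric_sum([[1], [2]]) raises TypeError; the two functions share a name but not a contract, which is exactly the point under dispute.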
-- ~Ethan~ From stephen at xemacs.org Sun Aug 4 10:16:22 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 04 Aug 2013 17:16:22 +0900 Subject: [Python-ideas] Support Unicode code point labels (Was: notation) In-Reply-To: References: <87a9kyn8p7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8738qpna5l.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > It looks like we are running into a confusion between code points > and their name property. No. There is no confusion, not on my part, anyway. Let me state my position in full, without rationale. The name property is an entry in the UCD, indexed by code point. unidata.name(codepoint) should return that property, and nothing else (perhaps returning an empty string or placeholder string that cannot be a character name, instead of an exception, for a codepoint that doesn't have a name property). The code_point_type-nnnn construct, a label, should never be returned by unidata.name(). I have no position on whether a new method such as .label() should be added to support deriving labels from code points. (I suspect it's a YAGNI but I have no good evidence for that.) The question is how to handle strings that purport to uniquely describe some Unicode code point. First, since the code point is described uniquely, I see no need (except perhaps backward compatibility) for a new method. If the backward incompatibility is judged small (I think it is), then the .lookup() method should be extended to handle strings that are not the name property of any character. Second, if the string argument is the name property of a Unicode character, that character's code point should be returned. I think these first two points are non-controversial (modulo one's opinion on the backward compatibility issue). Third, I contend that unidata.lookup() should recognize the "U+nnnn" format and return int("nnnn", 16).
Further, use of "\N{U+nnnn}" is preferable to Steven's proposed "\U+nnnn" escape sequence, or variants using braces to delimit the code point. Fourth, I find it acceptable that unidata.lookup() should recognize the "code_point_type-nnnn" label format for any code_point_type defined in Unicode, and return int("nnnn", 16). (Again, personally I think it's a YAGNI, but others might find the redundant code point type information useful for consistency checking.) Further, unidata.lookup() should not raise an exception if code_point_type is inconsistent with nnnn. This consistency checking should be left up to programs like pylint. From ncoghlan at gmail.com Sun Aug 4 11:49:10 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 4 Aug 2013 19:49:10 +1000 Subject: [Python-ideas] Support Unicode code point labels (Was: notation) In-Reply-To: <8738qpna5l.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87a9kyn8p7.fsf@uwakimon.sk.tsukuba.ac.jp> <8738qpna5l.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 4 August 2013 18:16, Stephen J. Turnbull wrote: > Alexander Belopolsky writes: > > > It looks like we are running into a confusion between code points > > and their name property. > > No. There is no confusion, not on my part, anyway. Let me state my > position in full, without rationale. And just for the record: my position is consistent with Stephen's, including the "You Ain't Gonna Need It" call for the "code_point_type-nnnn" format. Cheers, Nick. 
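The lookup extensions Stephen enumerates (and Nick endorses) can be prototyped in a few lines. lookup_extended is a hypothetical name, and per Stephen's fourth point it deliberately skips any consistency check between the label type and the code point:

```python
import re
import unicodedata

_LABEL = re.compile(
    r'(?:U\+|(?:control|reserved|noncharacter|private-use|surrogate)-)'
    r'([0-9A-Fa-f]{4,6})$')

def lookup_extended(s):
    """Resolve 'U+nnnn' and 'code_point_type-nnnn' strings to a code
    point, falling back to unicodedata.lookup() for character names
    (hypothetical sketch of the behaviour described above)."""
    m = _LABEL.match(s)
    if m:
        # No check that the label type matches nnnn; per the position
        # stated above, that belongs in linters such as pylint.
        return int(m.group(1), 16)
    return ord(unicodedata.lookup(s))
```

So lookup_extended('U+0041') and lookup_extended('control-0009') return 0x41 and 0x09 directly, while a real name like 'LATIN SMALL LETTER A' still goes through unicodedata.lookup().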
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eliben at gmail.com Sun Aug 4 14:51:45 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 4 Aug 2013 05:51:45 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FDFD98.3010902@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: > On 08/03/2013 07:00 PM, Eli Bendersky wrote: >> >> >> While I'm somewhat -0.5 on the general idea of the statistics module >> (competing with well-established, super-optimized and >> by-themselves-famous numeric libraries Python has does not sound like >> a worthy goal), > > > Sure, competing with already established libraries is silly. Fortunately, > that's not what is happening here. This PEP is about providing a minimal, > common set of statistics functions for the average person. I'm really not sure who this average person is, but everyone keeps talking about him. Is it the same person for whom Dummies books are written? Anyhow, "minimal" is a dangerous slope. With such a module in the stdlib, I'm 100% sure we'll get a constant stream of - please add just this function (from SciPy) - it's so useful to the "average person" - requests. This is unavoidable. And it will be difficult to judge at that point why certain functionality belongs or does not belong here. So over time we'll end up with a partial Greenspun, by containing an ad hoc, slow implementation of half of Numpy/SciPy. Efforts are better spent in writing a new tutorial on Numpy that shows how to do the stuff statistics.py does. Call it "Numpy statistics for the average person". >> I have to agree with Alexander w.r.t. "sum". Strongly >> -1 from me on having functions with the same name as existing stdlib >> functions but different functionality. This is very much unpythonic.
> > > I thought the whole point of name spaces was to be able to have the same > name mean different things in different contexts. Surely no one expects to > be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be used. This is not a fair comparison. As a pop quiz, try to imagine the difference between 'open' and 'gzip.open' - do you immediately come up with the differences in their functionalities? Now, how about 'sum' and 'statistics.sum'? I definitely struggle with the latter. That may be because I'm average, of course. [Sorry to have beaten on this average thing so much; patronization drives me mad] Eli From ncoghlan at gmail.com Sun Aug 4 15:41:01 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 4 Aug 2013 23:41:01 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: On 4 August 2013 22:51, Eli Bendersky wrote: > On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: >> On 08/03/2013 07:00 PM, Eli Bendersky wrote: >>> >>> >>> While I'm somewhat -0.5 on the general idea of the statistics module >>> (competing with well-established, super-optimized and >>> by-themselves-famous numeric libraries Python has does not sound like >>> a worthy goal), >> >> >> Sure, competing with already established libraries is silly. Fortunately, >> that's not what is happening here. This PEP is about providing a minimal, >> common set of statistics functions for the average person. > > I'm really not sure who this average person is, but everyone keeps > talking about him. Is it the same person for whom Dummies books are > written? > > Anyhow, "minimal" is a dangerous slope. With such a module in the > stdlib, I'm 100% sure we'll get a constant stream of - please add just > this function (from SciPy) - it's so useful to the "average person" - > requests. This is unavoidable. 
And it will be difficult to judge at > that point why certain functionality belongs or does not belong here. This is why the PEP needs to reference Raymond's original proposal to create and add this library. The "average person" in this context is "students of a very experienced Python instructor". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From clay.sweetser at gmail.com Sun Aug 4 15:41:09 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Sun, 4 Aug 2013 09:41:09 -0400 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: On Aug 4, 2013 8:53 AM, "Eli Bendersky" wrote: > > On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: > > On 08/03/2013 07:00 PM, Eli Bendersky wrote: > >> > >> > >> While I'm somewhat -0.5 on the general idea of the statistics module > >> (competing with well-established, super-optimized and > >> by-themselves-famous numeric libraries Python has does not sound like > >> a worthy goal), > > > > > > Sure, competing with already established libraries is silly. Fortunately, > > that's not what is happening here. This PEP is about providing a minimal, > > common set of statistics functions for the average person. > > I'm really not sure who this average person is, but everyone keeps > talking about him. Is it the same person for whom Dummies books are > written? > > Anyhow, "minimal" is a dangerous slope. With such a module in the > stdlib, I'm 100% sure we'll get a constant stream of - please add just > this function (from SciPy) - it's so useful to the "average person" - > requests. This is unavoidable. And it will be difficult to judge at > that point why certain functionality belongs or does not belong here. 
> So over time we'll end up with a partial Greenspun, by containing an > ad hoc, slow implementation of half of Numpy/SciPy. > > Efforts are better spent in writing a new tutorial on Numpy that shows > how to do the stuff statistics.py does. Call it "Numpy statistics for > the average person". By this same logic, had common modules such as math not already been proposed, any proposal to add them now would be rejected. Why have the math module, when numpy is available? Why have asyncore (ill-designed as some may call it) or any of the port and connection libraries, when twisted and tornado are available? Would you want them removed when Python 4000 comes along? If a good statistics module, with a well-defined scope, is created, then I believe there will be minimal requests for additions. For those requests that do come along, one only has to look at the mail archives to see how often a proposal for addition of something into a standard library module succeeds to know that it is unlikely that a statistics module will "accumulate" features. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Sun Aug 4 15:53:37 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 4 Aug 2013 06:53:37 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: On Sun, Aug 4, 2013 at 6:41 AM, Clay Sweetser wrote: > > On Aug 4, 2013 8:53 AM, "Eli Bendersky" wrote: >> >> On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: >> > On 08/03/2013 07:00 PM, Eli Bendersky wrote: >> >> >> >> >> >> While I'm somewhat -0.5 on the general idea of the statistics module >> >> (competing with well-established, super-optimized and >> >> by-themselves-famous numeric libraries Python has does not sound like >> >> a worthy goal), >> > >> > >> > Sure, competing with already established libraries is silly. 
>> > Fortunately, >> > that's not what is happening here. This PEP is about providing a >> > minimal, >> > common set of statistics functions for the average person. >> >> I'm really not sure who this average person is, but everyone keeps >> talking about him. Is it the same person for whom Dummies books are >> written? >> >> Anyhow, "minimal" is a dangerous slope. With such a module in the >> stdlib, I'm 100% sure we'll get a constant stream of - please add just >> this function (from SciPy) - it's so useful to the "average person" - >> requests. This is unavoidable. And it will be difficult to judge at >> that point why certain funcitonality belongs or does not belong here. >> So over time we'll end up with a partial Greenspun, by containing an >> ad hoc, slow implementation of half of Numpy/SciPy. >> >> Efforts are better spent in writing a new tutorial on Numpy that shows >> how to do the stuff statistics.py does. Call it "Numpy statistics for >> the average person". > By this same logic, had common modules such as math not already been > proposed, any proposal to add them now would be rejected. Why have the math > module, when numpy is available? Why have asyncore (Ill designed as some may > call it) or any of the port and connection libraries, when twisted and > tornado are available? Would you want them removed when Python 4000 comes > along? Comparison with existing, historical code that pre-dated most of the 3rd party libs out there is irrelevant, of course. Had the stdlib been designed today, I'm sure it would look differently, and yet this is not the situation we're in. > If a good statistics module, with a well defined scope, is created, then I > believe there will be minimal requests for additions. On what is this belief based? Years of observing this mailing list? Once you have foo and bar in "statistics", every discussion will end up justifying why they are better than "baz" that was left out. 
> For those requests that do come along, one only has to look at the mail > archives to see how often a proposal for addition of something into a > standard library module succeeds to know that it is unlikely that a > statistics module will "accumulate" features. Right, so it's better to nip it in the bud. There's a good reason the stdlib does not grow new features every second Friday. It's because there is a group of people who has to stick with it for years maintaining all that code. It's perfectly OK to look critically at all new proposals. Having one way to do it is a Python design goal. Eli From ncoghlan at gmail.com Sun Aug 4 16:35:26 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 5 Aug 2013 00:35:26 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: On 4 August 2013 23:53, Eli Bendersky wrote: > Right, so it's better to nip it in the bud. There's a good reason the > stdlib does not grow new features every second Friday. It's because > there is a group of people who has to stick with it for years > maintaining all that code. It's perfectly OK to look critically at all > new proposals. Having one way to do it is a Python design goal. Creation of a module like this was *explicitly requested* by core developers (originally Raymond, but supported by others, including me), to provide a robust implementation of "high school statistics" in the standard library. This isn't something Steven just dropped on us out of nowhere - the idea had its genesis on python-ideas, spawned Steven's stats module on PyPI, and has now come full circle with the proposal for a cut-down version of the stats module as "statistics" in the standard library. 
(And yes, this history should be covered in the PEP, rather than assuming people know it already) Just as the math module covers the basics of trigonometry, without covering everything else that is provided by NumPy/SciPy, this proposed module covers the basics of statistics. It's the included battery - numpy/scipy is the nuclear reactor that needs to stay as a third party download. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Sun Aug 4 17:24:09 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 04 Aug 2013 08:24:09 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: <51FE7219.4010804@stoneleaf.us> On 08/04/2013 06:53 AM, Eli Bendersky wrote: > > Having one way to do it is a Python design goal. 1) Having one *obvious* way to do it is a Python design goal. 2) Third-party libs are not part of Python. -- ~Ethan~ From ethan at stoneleaf.us Sun Aug 4 17:20:58 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 04 Aug 2013 08:20:58 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: <51FE715A.6080102@stoneleaf.us> On 08/04/2013 05:51 AM, Eli Bendersky wrote: > On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: > > Anyhow, "minimal" is a dangerous slope. With such a module in the > stdlib, I'm 100% sure we'll get a constant stream of - please add just > this function (from SciPy) - it's so useful to the "average person" - > requests. This is unavoidable. And it will be difficult to judge at > that point why certain funcitonality belongs or does not belong here. > So over time we'll end up with a partial Greenspun, by containing an > ad hoc, slow implementation of half of Numpy/SciPy. Fair point. 
> Efforts are better spent in writing a new tutorial on Numpy that shows > how to do the stuff statistics.py does. Call it "Numpy statistics for > the average person". Sounds useful. >> I thought the whole point of name spaces was to be able to have the same >> name mean different things in different contexts. Surely no one expects to >> be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be used. > > This is not a fair comparison. As a pop quiz, try to imagine the > difference between 'open' and 'gzip.open' - do you immediately come up > with the differences in their functionalities? Now, how about 'sum' > and 'statistics.sum'? It's an absolutely fair comparison. Different modules, same name. Their functionalities? No, I don't immediately come up with the differences, unless "gzip.open must have something to do with gzip files" counts. Coincidentally, that's the same difference I immediately come up with for sum and statistics.sum -- "statistics.sum must have something to do with statistics"; and I would never think about it again unless I had a problem with statistics. > I definitely struggle with the latter. That may be because I'm > average, of course. Why should you have to be able to? 1) That's why we have documentation. 2) If you are summing objects of type xyz why would you reach for something called statistics.sum? > [Sorry to have beaten on this average thing so much; patronization > drives me mad] No offense intended. I definitely count myself in the "average" camp when it comes to statistics. 
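For what it's worth, the gzip.open/open comparison above can be made concrete: gzip.open deliberately mirrors the built-in open() API (including text modes, since Python 3.3), the main difference being that the bytes on disk are gzip-compressed. A quick sketch; the temporary file name is arbitrary:

```python
import gzip
import os
import tempfile

# gzip.open shares open()'s mode-style API; only the on-disk format
# differs.  The temporary path here is arbitrary.
path = os.path.join(tempfile.mkdtemp(), "demo.txt.gz")

with gzip.open(path, "wt") as f:   # text mode, like open(path, "w")
    f.write("same name, different namespace")

with gzip.open(path, "rt") as f:   # read it back, like open(path, "r")
    print(f.read())                # -> same name, different namespace
```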
-- ~Ethan~ From joshua at landau.ws Mon Aug 5 00:51:36 2013 From: joshua at landau.ws (Joshua Landau) Date: Sun, 4 Aug 2013 23:51:36 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FE715A.6080102@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> Message-ID: On 08/04/2013 05:51 AM, Eli Bendersky wrote: > On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: > > Anyhow, "minimal" is a dangerous slope. With such a module in the > stdlib, I'm 100% sure we'll get a constant stream of - please add just > this function (from SciPy) - it's so useful to the "average person" - > requests. This is unavoidable. And it will be difficult to judge at > that point why certain functionality belongs or does not belong here. > So over time we'll end up with a partial Greenspun, by containing an > ad hoc, slow implementation of half of Numpy/SciPy. > I disagree. Has numpy led to an unreasonable number of additions to the math module? Why would it be different for statistics modules? On 4 August 2013 16:20, Ethan Furman wrote: > I thought the whole point of name spaces was to be able to have the same >>> name mean different things in different contexts. Surely no one expects >>> to >>> be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be >>> used. >>> >> >> This is not a fair comparison. As a pop quiz, try to imagine the >> difference between 'open' and 'gzip.open' - do you immediately come up >> with the differences in their functionalities? Now, how about 'sum' >> and 'statistics.sum'? >> > > It's an absolutely fair comparison. Different modules, same name. Their > functionalities? No, I don't immediately come up with the differences, > unless "gzip.open must have something to do with gzip files" counts. I'd say it does count. 
> Coincidentally, that's the same difference I immediately come up with for > sum and statistics.sum -- "statistics.sum must have something to do with > statistics"; and I would never think about it again unless I had a problem > with statistics. That's not really true -- statistics.sum is better named statistics.precise_sum. It's not only useful when doing statistics. I tend to think of it this way: the name should make it clear when you should read the documentation. gzip.open is obvious; you should read it when you work with gzip files. statistics.sum is not because *all* sums are statistical sums. An accurate name à la "precise_sum" would make it obvious that you should read the docs whenever you're doing sums that need precision. The docs should quickly say that it deals with loss of precision dealing with variations in orders of magnitude and other floating point mischiefs (including sum([0.1]*10)). On a third point, would it make sense for this to be maths.statistics? It'd increase discoverability for exactly the target audience and it seems to make sense to me. (We could, as a bonus, then easily deprecate math.fsum in favour of math.statistics.sum). -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Mon Aug 5 01:38:04 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 5 Aug 2013 00:38:04 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: On 2 August 2013 18:45, Steven D'Aprano wrote: > I have raised an issue on the tracker to add a statistics module to Python's > standard library: > > http://bugs.python.org/issue18606 > > and have been asked to write a PEP. Attached is my draft PEP. Feedback is > requested, thanks in advance. Excellent work Steven! 
If I had a penny for every time I rolled-my-own mean() I could buy you a drink or two to thank you for this (actually at current UK prices one pint would be a stretch but we could call it two and I'd cover the difference!). I've just begun to look at the PEP but I've already noticed one thing that seems to be missing from the rationale (and from this email thread). There is much discussion about numpy/scipy as the obvious way to get stats functions and why some users might not want to install/use them. A significant point that is worth mentioning is that numpy/scipy are CPython-specific. There is the numpypy project that aims to bridge part of this gap between CPython and pypy but it's incomplete and I don't know of similar efforts for the other Python implementations. There may be alternative libraries that expose stats functions for Jython etc, but AFAIK there's currently no decent cross-implementation solution for even basic stats like the mean. Your proposal would rectify that problem and it should say this in the PEP. (Expect a more detailed response to the proposal/implementation in the next few days.) Oscar From ethan at stoneleaf.us Mon Aug 5 01:28:00 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 04 Aug 2013 16:28:00 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> Message-ID: <51FEE380.1090504@stoneleaf.us> On 08/04/2013 03:51 PM, Joshua Landau wrote: > >> Coincidentally, that's the same difference I immediately come up >> with for sum and statistics.sum -- "statistics.sum must have >> something do to with statistics"; and I would never think about >> it again unless I had a problem with statistics. > > > That's not really true -- statistics.sum is better named > statistics.precise_sum. It's not only useful when doing statistics. 
That may be, but it is in the statistics module, so I'm not going to think of it unless I'm doing something requiring statistics. > I tend to think of it this way: the name should make it clear when > you should read the documentation. gzip.open is obvious; you should > read it when you work with gzip files. statistics.sum is not because > *all* sums are statistical sums. Apparently you know more about stats than I do. I must admit I'm curious how the sum 10 is statistical (to make it interesting we can say it's the value of the dollars I have in my pocket). At any rate, you can see that I, at least, would read the docs to see what statistics.sum would do. -- ~Ethan~ From ethan at stoneleaf.us Mon Aug 5 02:06:43 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 04 Aug 2013 17:06:43 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> Message-ID: <51FEEC93.8020907@stoneleaf.us> On 08/04/2013 03:51 PM, Joshua Landau wrote: > > I tend to think of it this way: the name should make it clear when you should read the documentation. If the function is in a module you haven't dealt with before, you should read the docs. 
-- ~Ethan~ From joshua at landau.ws Mon Aug 5 02:38:39 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 5 Aug 2013 01:38:39 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FEE380.1090504@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> <51FEE380.1090504@stoneleaf.us> Message-ID: On 5 August 2013 00:28, Ethan Furman wrote: > On 08/04/2013 03:51 PM, Joshua Landau wrote: > >> >> Coincidentally, that's the same difference I immediately come up >>> with for sum and statistics.sum -- "statistics.sum must have >>> something to do with statistics"; and I would never think about >>> it again unless I had a problem with statistics. >>> >> >> That's not really true -- statistics.sum is better named >> statistics.precise_sum. It's not only useful when doing statistics. >> > > That may be, but it is in the statistics module, so I'm not going to > think of it unless I'm doing something requiring statistics. That's mostly true and part of the justification for my proposal of putting this under "math". Statistics is a branch of math and often when you need precise numerical results you'd think of the math module even if you didn't firstly think you were doing statistics. I tend to think of it this way: the name should make it clear when >> you should read the documentation. gzip.open is obvious; you should >> read it when you work with gzip files. statistics.sum is not because >> *all* sums are statistical sums. >> > > Apparently you know more about stats than I do. I must admit I'm curious > how the sum 10 is statistical (to make it interesting we can say it's the > value of the dollars I have in my pocket). At any rate, you can see that > I, at least, would read the docs to see what statistics.sum would do. À la Wikipedia, "Statistics is the study of the collection, organization, analysis, interpretation and presentation of data". 
Hence a summation is statistical, irrespective of the specifics. It's just pedantry by this point but I was trying to convey the idea that it's not clear-cut when someone is doing "statistics" if you're really just using sum. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Mon Aug 5 02:40:16 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 5 Aug 2013 01:40:16 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FEEC93.8020907@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> <51FEEC93.8020907@stoneleaf.us> Message-ID: On 5 August 2013 01:06, Ethan Furman wrote: > On 08/04/2013 03:51 PM, Joshua Landau wrote: > >> >> I tend to think of it this way: the name should make it clear when you >> should read the documentation. >> > > If the function is in a module you haven't dealt with before, you should > read the docs. Only if you plan on using it. The point was that the name should hint when you should think of using it and the docs should confirm. Obviously you shouldn't use an unfamiliar function before you know what it does ;). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Mon Aug 5 03:40:27 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 05 Aug 2013 11:40:27 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: <51FF028B.6020409@pearwood.info> On 04/08/13 22:51, Eli Bendersky wrote: > On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: >> On 08/03/2013 07:00 PM, Eli Bendersky wrote: >>> >>> >>> While I'm somewhat -0.5 on the general idea of the statistics module >>> (competing with well-established, super-optimized and >>> by-themselves-famous numeric libraries Python has does not sound like >>> a worthy goal), >> >> >> Sure, competing with already established libraries is silly. Fortunately, >> that's not what is happening here. This PEP is about providing a minimal, >> common set of statistics functions for the average person. > > I'm really not sure who this average person is, but everyone keeps > talking about him. Is it the same person for whom Dummies books are > written? > > Anyhow, "minimal" is a dangerous slope. With such a module in the > stdlib, I'm 100% sure we'll get a constant stream of - please add just > this function (from SciPy) - it's so useful to the "average person" - > requests. This is unavoidable. And it will be difficult to judge at > that point why certain funcitonality belongs or does not belong here. > So over time we'll end up with a partial Greenspun, by containing an > ad hoc, slow implementation of half of Numpy/SciPy. [only half serious] Perhaps we should have a pure-Python implementation of numpy/scipy, for non-C based Pythons. If I recall correctly, PyPy had to engage in a massive effort to get numpy even partially working. The pure-Python part of the stdlib is not just the stdlib for CPython, but potentially for the entire Python universe. 
> Efforts are better spent in writing a new tutorial on Numpy that shows > how to do the stuff statistics.py does. Call it "Numpy statistics for > the average person". That does not help those who are unable to install numpy due to restrictive policies about what software can be installed. The choice is not either statistics or better tutorials. We can have both, if somebody volunteers to write those tutorials, or neither. I am not volunteering to write numpy tutorials. >>> I have to agree with Alexander w.r.t. "sum". Strongly >>> -1 from me on having functions with the same name as existing stdlib >>> functions but different functionality. This is very much unpythonic. >> >> >> I thought the whole point of name spaces was to be able to have the same >> name mean different things in different contexts. Surely no one expects to >> be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be used. > > This is not a fair comparison. As a pop quiz, try to imagine the > difference between 'open' and 'gzip.open' - do you immediately come up > with the differences in their functionalities? Now, how about 'sum' > and 'statistics.sum'? As far as gzip.open goes, I have no idea. Like most people, I expect that there is some difference -- perhaps it only works on gzip files? is the API different in some way? -- but beyond that vague idea that "it is in a different module, therefore it must be different *somehow*" I have no idea how it actually differs from the built-in, or codecs.open. I would have to look them up to find out what the differences actually are. I expect that any even moderately competent user will think the same way: "statistics.sum is in a different module, presumably it is different somehow, I should look it up to find out how". 
If you're going to assume a user who is familiar enough with the gzip module to immediately know the differences between gzip.open and builtins.open, then I should also be permitted to assume a user who is familiar with the statistics module and can likewise immediately come up with the differences: - built-in sum supports any object which supports + except str and bytes, even if what + does is nothing like addition in the usual sense; - built-in sum may be fast (written in C) or horribly slow (algorithmically O(n**2) for some types); - built-in sum may be inaccurate for floats; while - statistics.sum supports numbers only; - statistics.sum should always[1] be O(n) but may be slow (currently no C-accelerator version); - statistics.sum may be more accurate for floats. I don't expect people to know this without being told. Frankly, I don't even expect the typical numerically naive user to use statistics.sum when it is so much shorter to type "sum". I can provide a better numeric sum, but I can't force people to use it. But the statistics module uses it extensively, neither the built-in sum nor math.fsum are suitable for my purposes, and I wish to expose that functionality to users who are willing to use it. And finally, I categorically refuse to call it any variation of statistics.ssum or statistics.statistics_sum. We have namespaces for a reason. If anyone wishes to start bikeshedding names, I will consider reasonable alternatives that don't repeat the name of the namespace. [1] Excluding weird numeric types that do bizarre things. 
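Steven's float-accuracy point is easy to demonstrate with what is already in the stdlib; math.fsum is used below purely as a stand-in for whatever higher-precision algorithm statistics.sum would use (the proposed module itself is not assumed to exist), on the thread's own sum([0.1]*10) example:

```python
import math

values = [0.1] * 10   # the example used earlier in the thread

# Built-in sum accumulates one rounding error per addition...
print(sum(values))         # -> 0.9999999999999999
# ...while math.fsum tracks exact partial sums and rounds only once.
print(math.fsum(values))   # -> 1.0
```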
-- Steven From joshua at landau.ws Mon Aug 5 04:02:38 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 5 Aug 2013 03:02:38 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FF028B.6020409@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FF028B.6020409@pearwood.info> Message-ID: On 5 August 2013 02:40, Steven D'Aprano wrote: > On 04/08/13 22:51, Eli Bendersky wrote: > >> On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman wrote: >> >>> On 08/03/2013 07:00 PM, Eli Bendersky wrote: >>> >>>> >>>> >>>> While I'm somewhat -0.5 on the general idea of the statistics module >>>> (competing with well-established, super-optimized and >>>> by-themselves-famous numeric libraries Python has does not sound like >>>> a worthy goal), >>>> >>> >>> >>> Sure, competing with already established libraries is silly. >>> Fortunately, >>> that's not what is happening here. This PEP is about providing a >>> minimal, >>> common set of statistics functions for the average person. >>> >> >> I'm really not sure who this average person is, but everyone keeps >> talking about him. Is it the same person for whom Dummies books are >> written? >> >> Anyhow, "minimal" is a dangerous slope. With such a module in the >> stdlib, I'm 100% sure we'll get a constant stream of - please add just >> this function (from SciPy) - it's so useful to the "average person" - >> requests. This is unavoidable. And it will be difficult to judge at >> that point why certain funcitonality belongs or does not belong here. >> So over time we'll end up with a partial Greenspun, by containing an >> ad hoc, slow implementation of half of Numpy/SciPy. >> > > [only half serious] > Perhaps we should have a pure-Python implementation of numpy/scipy, for > non-C based Pythons. If I recall correctly, PyPy had to engage in a massive > effort to get numpy even partially working. 
The pure-Python part of the > stdlib is not just the stdlib for CPython, but potentially for the entire > Python universe. > > > > Efforts are better spent in writing a new tutorial on Numpy that shows >> how to do the stuff statistics.py does. Call it "Numpy statistics for >> the average person". >> > > That does not help those who are unable to install numpy due to > restrictive policies about what software can be installed. > > The choice is not either statistics or better tutorials. We can have both, > if somebody volunteers to write those tutorials, or neither. I am not > volunteering to write numpy tutorials. > > > > I have to agree with Alexander w.r.t. "sum". Strongly >>>> -1 from me on having functions with the same name as existing stdlib >>>> functions but different functionality. This is very much unpythonic. >>>> >>> >>> >>> I thought the whole point of name spaces was to be able to have the same >>> name mean different things in different contexts. Surely no one expects >>> to >>> be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be >>> used. >>> >> >> This is not a fair comparison. As a pop quiz, try to imagine the >> difference between 'open' and 'gzip.open' - do you immediately come up >> with the differences in their functionalities? Now, how about 'sum' >> and 'statistics.sum'? >> > > As far as gzip.open goes, I have no idea. Like most people, I expect that > there is some difference -- perhaps it only works on gzip files? is the API > different in some way? -- but beyond that vague idea that "it is in a > different module, therefore it must be different *somehow*" I have have no > idea how it actually differs from the built-in, or codecs.open. I would > have to look them up to find out what the differences actually are. > > I expect that any even moderately competent user will think the same way: > "statistics.sum is in a different module, presumably it is different > somehow, I should look it up to find out how". 
> As I'd said somewhere earlier, the name should be such that you only have to know the name to know whether it's relevant. I don't believe you if you say you thought gzip.open had nothing to do with gzip -- you know at least that you can ignore it until you're interested in gzip files. I don't expect people to know this without being told. Frankly, I don't > even expect the typical numerically naive user to use statistics.sum when > it is so much shorter to type "sum". I can provide a better numeric sum, > but I can't force people to use it. But the statistics module uses it > extensively, neither the built-in sum nor math.fsum are suitable for my > purposes, and I wish to expose that functionality to users who are willing > to use it. > I can agree that this shouldn't be a replacement for builtins.sum but I don't think that it shouldn't be obvious what problem it solves. If you're coming up with inaccurate sums a name like "precise_sum" would be very guiding. "statistics.sum" doesn't hint at the differences. To go back to gzip again, you'll know what it does whenever it's relevant. The same is not true of a miscellaneous "sum" from a "statistics" module. > And finally, I categorically refuse to call it any variation of > statistics.ssum or statistics.statistics_sum. We have namespaces for a > reason. If anyone wishes to start bikeshedding names, I will consider > reasonable alternatives that don't repeat the name of the namespace. > I don't think anyone proposed that. I've proposed "precise_sum" as a possibility although there's probably a shorter variant somewhere. -------------- next part -------------- An HTML attachment was scrubbed... 
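The accuracy problem behind this whole naming debate is easy to reproduce; a minimal sketch (not the statistics module's actual code) contrasting the sums under discussion:

```python
import math

data = [1e30, 1, 3, -1e30]

# Built-in sum accumulates left to right, so the small terms vanish
# into the float 1e30 and the total collapses to zero.
print(sum(data))                     # 0.0

# math.fsum tracks exact partial sums and rounds only once at the end.
print(math.fsum(data))               # 4.0

# With ints, built-in sum is already exact -- the issue is float-specific.
print(sum([10**30, 1, 3, -10**30]))  # 4
```

This is the same "torture test" quoted later in the thread from the pre-PEP; whatever the function ends up being called, it has to pass it.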
URL: From steve at pearwood.info Mon Aug 5 04:04:16 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 05 Aug 2013 12:04:16 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> Message-ID: <51FF0820.9040102@pearwood.info> On 05/08/13 08:51, Joshua Landau wrote: > On a third point, would it make sense for this to be maths.statistics? It'd > increase discoverability for exactly the target audience and it seems to > make sense to me. (We could, as a bonus, then easily deprecate math.fsum in > favour of math.statistics.sum). I would be okay with that in principle, but there's at least one step needed first. There is/was a policy that math is (mostly) just a lightweight wrapper around the platform C math library. While that's less true now than it used to be, there would need to be agreement that it was appropriate to bury that policy once and for all. -- Steven From steve at pearwood.info Mon Aug 5 04:24:48 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 05 Aug 2013 12:24:48 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FF028B.6020409@pearwood.info> Message-ID: <51FF0CF0.2040400@pearwood.info> On 05/08/13 12:02, Joshua Landau wrote: >>> >>This is not a fair comparison. As a pop quiz, try to imagine the >>> >>difference between 'open' and 'gzip.open' - do you immediately come up >>> >>with the differences in their functionalities? Now, how about 'sum' >>> >>and 'statistics.sum'? >>> >> >> > >> >As far as gzip.open goes, I have no idea. Like most people, I expect that >> >there is some difference -- perhaps it only works on gzip files? is the API >> >different in some way? 
-- but beyond that vague idea that "it is in a >> >different module, therefore it must be different*somehow*" I have have no >> >idea how it actually differs from the built-in, or codecs.open. I would >> >have to look them up to find out what the differences actually are. >> > >> >I expect that any even moderately competent user will think the same way: >> >"statistics.sum is in a different module, presumably it is different >> >somehow, I should look it up to find out how". >> > > As I'd said somewhere earlier, the name should be such that you only have > to know the name to know whether it's relevant. I don't believe you if you > say you thought gzip.open had nothing to do with gzip -- you know at least > that you can ignore it until you're interested in gzip files. I didn't say that I thought gzip.open had "nothing" to do with gzip. I said I didn't know if it *only* works on gzip files. Without looking it up, perhaps it does the equivalent of: def open(filename, *args): if filename.endswith('gzip'): ... else: return builtins.open(filename, *args) I probably wouldn't write it that way, but I didn't write the gzip module and I can't rule it out without checking the docs or the source. The point is that any reasonably competent user will expect that there is *some* difference between two otherwise similar names in different namespaces, but it is asking too much to expect the name alone to clue them in on all the differences. Or even *any* of the differences. One might legitimately have artist.draw() and gunslinger.draw() methods, and somebody ignorant of art or Western gunslingers may have no idea what the differences are. [...] > I can agree that this shouldn't be a replacement for builtins.sum but I > don't think that it shouldn't be obvious what solution it solves. If you're > coming up with inaccurate sums a name like "precise_sum" would be very > guiding. "statistics.sum" doesn't hint at the differences. 
Built-in sum is infinitely precise if you pass it ints or Fractions. math.fsum is also high-precision (although not infinitely so), but it coerces everything to floats. If we're going to insist that the name makes it obvious what problem it solves, we'll end up with a name like statistics.high_precision_numeric_only_sum_without_coercing_to_float() which is just ridiculous. Obviously some differences will remain non-obvious. Reading the name is not a substitute for reading the docs. > To go back to gzip again, you'll know what it does whenever it's relevant. > The same is not true of a miscellaneous "sum" from a "statistics" module. It's the sum you should use when you are doing statistics. If you want to know *why* you should use it rather than built-in sum, read the docs, or ask on python-list at python.org. There's only so much knowledge that can be encoded into a single name. -- Steven From guido at python.org Mon Aug 5 04:30:26 2013 From: guido at python.org (Guido van Rossum) Date: Sun, 4 Aug 2013 19:30:26 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FF0CF0.2040400@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FF028B.6020409@pearwood.info> <51FF0CF0.2040400@pearwood.info> Message-ID: This argument is getting tedious. Instead of arguing who said or meant what when, get the code working. On Sunday, August 4, 2013, Steven D'Aprano wrote: > On 05/08/13 12:02, Joshua Landau wrote: > > >>This is not a fair comparison. As a pop quiz, try to imagine the >>>> >>difference between 'open' and 'gzip.open' - do you immediately come up >>>> >>with the differences in their functionalities? Now, how about 'sum' >>>> >>and 'statistics.sum'? >>>> >> >>>> >>> > >>> >As far as gzip.open goes, I have no idea. Like most people, I expect >>> that >>> >there is some difference -- perhaps it only works on gzip files? is the >>> API >>> >different in some way? 
-- but beyond that vague idea that "it is in a >>> >different module, therefore it must be different*somehow*" I have have >>> no >>> >idea how it actually differs from the built-in, or codecs.open. I would >>> >have to look them up to find out what the differences actually are. >>> > >>> >I expect that any even moderately competent user will think the same >>> way: >>> >"statistics.sum is in a different module, presumably it is different >>> >somehow, I should look it up to find out how". >>> > >>> >> As I'd said somewhere earlier, the name should be such that you only have >> to know the name to know whether it's relevant. I don't believe you if you >> say you thought gzip.open had nothing to do with gzip -- you know at least >> that you can ignore it until you're interested in gzip files. >> > > I didn't say that I thought gzip.open had "nothing" to do with gzip. I > said I didn't know if it *only* works on gzip files. Without looking it up, > perhaps it does the equivalent of: > > def open(filename, *args): > if filename.endswith('gzip'): > ... > else: > return builtins.open(filename, *args) > > > I probably wouldn't write it that way, but I didn't write the gzip module > and I can't rule it out without checking the docs or the source. > > The point is that any reasonably competent user will expect that there is > *some* difference between two otherwise similar names in different > namespaces, but it is asking too much to expect the name alone to clue them > in on all the differences. Or even *any* of the differences. One might > legitimately have artist.draw() and gunslinger.draw() methods, and somebody > ignorant of art or Western gunslingers may have no idea what the > differences are. > > > [...] > >> I can agree that this shouldn't be a replacement for builtins.sum but I >> don't think that it shouldn't be obvious what solution it solves. If >> you're >> coming up with inaccurate sums a name like "precise_sum" would be very >> guiding. 
"statistics.sum" doesn't hint at the differences. >> > > Built-in sum is infinitely precise if you pass it ints or Fractions. > math.fsum is also high-precision (although not infinitely so), but it > coerces everything to floats. If we're going to insist that the name makes > it obvious what problem it solves, we'll end up with a name like > > statistics.high_precision_numeric_only_sum_without_coercing_to_float() > > which is just ridiculous. Obviously some differences will remain > non-obvious. Reading the name is not a substitute for reading the docs. > > > To go back to gzip again, you'll know what it does whenever it's relevant. >> The same is not true of a miscellaneous "sum" from a "statistics" module. >> > > It's the sum you should use when you are doing statistics. If you want to > know *why* you should use it rather than built-in sum, read the docs, or > ask on python-list at python.org. There's only so much knowledge that can be > encoded into a single name. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Aug 5 04:59:06 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 05 Aug 2013 11:59:06 +0900 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> I couldn't find a list of functions proposed for inclusion in the statistics package in the pre-PEP, only lists of functions in other implementations that "suggest" the content of this package. Did I miss something? I can't agree with your rationale for inclusion based on the imprecision in math.sum. 
(That doesn't mean I'm opposed to inclusion, but it does cause me to raise some questions below.) Correcting the numerical instability issues you describe doesn't result in improvement in statistical accuracy in the applications I'm aware of. Rather, you're effectively assuming that data values are given with infinite precision and infinite accuracy. Are there any applications of statistics where that assumption makes sense? And some of your arguments are basically incorrect when considered from the standpoint of *interpreting*, rather than *computing*, statistics: Steven D'Aprano writes: > - The built-in sum can lose accuracy when dealing with floats of wildly > differing magnitude. Consequently, the above naive mean fails this > "torture test" with an error of 100%: > > assert mean([1e30, 1, 3, -1e30]) == 1 100%? This is a relative error of sqrt(2)*1e-30. The mean is simply not an appropriate choice of unit in statistics, especially not when it's 0 to 30 decimal places in standard deviation units. > - Using math.fsum inside mean will make it more accurate with > float data, Not necessarily. It will be more statistically accurate if statistical accuracy == numerical precision, but in most statistical applications this is nowhere near the case. My point throughout is that if high-precision calculation matters in statistics, you've got more fundamental problems in your data than precision of calculation can address. Garbage in, garbage out applies no matter how good the algorithms are. So I would throw out all these appealing arguments that depend on confounding numerical accuracy and statistical accuracy, and replace it with a correct argument showing how precision does matter in statistical interpretation: The first step in interpreting variation in data (including dealing with ill-conditioned data) is standardization of the data to a series with variance 1 (and often, mean 0). 
Standardization requires accurate computation of the mean and standard deviation of the raw series. However, naive computation of mean and standard deviation can lose precision very quickly. Because precision bounds accuracy, it is important to use the most precise possible algorithms for computing mean and standard deviation, or the results of standardization are themselves useless. This (in combination with your examples) makes it clear why having such functions in Python makes sense. However, it remains unclear to me that other statistical functions are really needed. Without having actually thought about it[1], I suspect to think that replacing math.sum with the proposed statistics.sum, adding mean and standard_deviation functions to math, and moving the existing math.sum to math.faster_sum would be sufficient to address the real needs here. (Of course, math.faster_sum should be documented to be avoided in applications where ill-conditioned data might arise -- this includes any case, such as computing variance, where a series is generated as the difference of two series with similar means.) I also wonder about the utility of a "statistics" package that has no functionality for presenting and operating on the most fundamental "statistic" of all: the (empirical) distribution. Eg my own statistics package will *never* suffer from ill-conditioned data (it's only used for dealing with generated series of 10-100 data points with a maximum dynamic range of about 100 to 1), but it's important for my purposes to be able to flexibly deal with distributions (computing modes and arbitrary percentiles, "bootstrap" random functions, recognize multimodality, generate histograms, etc). That's only an example, specific to teaching (and I use spreadsheets and R, not Python, for demonstrations of actual computational applications). I think the wide variety of applications of distributions merits consideration of their inclusion in a "batteries included" statistical package. 
Footnotes: [1] Because the PEP doesn't specify a list of functions. From stephen at xemacs.org Mon Aug 5 05:13:45 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 05 Aug 2013 12:13:45 +0900 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FE715A.6080102@stoneleaf.us> <51FEE380.1090504@stoneleaf.us> Message-ID: <87y58glthy.fsf@uwakimon.sk.tsukuba.ac.jp> Joshua Landau writes: > Statistics is a branch of math It's not. *Probability* is a branch of math. Statistics *uses* math (not limited to probability) heavily, but its fundamental mode of thinking is applied, not mathematical: interpretation of statistics inherently requires domain knowledge. > I must admit I'm curious how the sum 10 is statistical (to make it > interesting we can say it's the value of the dollars I have in my > pocket). Because it's an integral with respect to a positive measure (ie, a distribution) defined on the set of currency units. ;-) From ethan at stoneleaf.us Mon Aug 5 04:52:27 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 04 Aug 2013 19:52:27 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <51FF028B.6020409@pearwood.info> <51FF0CF0.2040400@pearwood.info> Message-ID: <51FF136B.3090501@stoneleaf.us> On 08/04/2013 07:30 PM, Guido van Rossum wrote: > This argument is getting tedious. +1 >Instead of arguing who said or meant what when, get the code working. I think the code is working. Perhaps you meant: Include the relevant ideas and history in the PEP, and put it on Py-Dev? -- ~Ethan~ From stephen at xemacs.org Mon Aug 5 05:22:05 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 05 Aug 2013 12:22:05 +0900 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51FBF02F.1000202@pearwood.info> <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87wqo0lt42.fsf@uwakimon.sk.tsukuba.ac.jp> Sorry for the self-followup, but hopefully I can catch this before wasting too many people's time responding to a stupid and mostly unimportant error. Stephen J. Turnbull writes: > I can't agree with your rationale for inclusion based on the > imprecision in math.sum. Oops, brain bubble. There is no math.sum. The argument applies with s/math.sum/builtin sum/. > Without having actually thought about it[1], I suspect > to think that replacing math.sum with the proposed statistics.sum, This should read "adding math.sum with the implementation of the proposed statistics.sum". > adding mean and standard_deviation functions to math, OK. > and moving the existing math.sum to math.faster_sum Delete this. > would be sufficient to address the real needs here. (Of course, > math.faster_sum s/math.faster_sum/builtin sum/ here. > should be documented to be avoided in applications where > ill-conditioned data might arise. From rymg19 at gmail.com Mon Aug 5 07:07:44 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 05 Aug 2013 00:07:44 -0500 Subject: [Python-ideas] Error messages for shared libraries for other platform Message-ID: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Here are my experiences in accidentally getting a .so/.dll file for the wrong chip/CPU type: Windows: %1 is not a valid Win32 application *nix/Android: A long message about improper memory mappings and such. I'd like to propose the concept of better errors in these cases. Both Windows and Posix errors in this case are horrible, and it'd be nice for them to actually be written in English. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 5 07:34:09 2013 From: guido at python.org (Guido van Rossum) Date: Sun, 4 Aug 2013 22:34:09 -0700 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: Do you know how to fix this? On Sunday, August 4, 2013, Ryan wrote: > Here are my experiences in accidently getting a .so/.dll file for the > wrong chip/CPU type: > > Windows: > > %1 is not a valid Win32 application > > *nix/Android: > > A long message about improper memory mappings and such. > > I'd like to propose the concept of better errors in these cases. Both > Windows and Posix errors is this case are horrible, and it'd be nice for > them to actually be written in English. > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ronaldoussoren at mac.com Mon Aug 5 07:55:26 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 5 Aug 2013 07:55:26 +0200 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FDB3B5.4080506@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FDB3B5.4080506@pearwood.info> Message-ID: <1A7283A8-0BDA-4C50-BD41-2FCB6AC4CD7C@mac.com> On 4 Aug, 2013, at 3:51, Steven D'Aprano wrote: > On 04/08/13 05:47, Alexander Belopolsky wrote: > >> The PEP does not mention statistics.sum(), but the reference implementation >> includes it. I am not sure stdlib needs the third sum function after >> builtins.sum and math.fsum. I think it will be better to improve >> builtins.sum instead. > > > I don't know enough C to volunteer to do that. 
If the built-in sum() is improved to the point it passes my unit tests, I would consider using it in the future. Does math.fsum pass your tests? From the description and references to the cookbook it seems that your sum is functionally equivalent to math.fsum. Ronald From rymg19 at gmail.com Mon Aug 5 08:01:18 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 05 Aug 2013 01:01:18 -0500 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: I don't really know C. At all. I was thinking the errors could be caught at a higher level, something like (the code isn't runnable): except windowserror as ex: if ex.string == '%1 is not...: raise_error_here Guido van Rossum wrote: >Do you know how to fix this? > >On Sunday, August 4, 2013, Ryan wrote: > >> Here are my experiences in accidently getting a .so/.dll file for the >> wrong chip/CPU type: >> >> Windows: >> >> %1 is not a valid Win32 application >> >> *nix/Android: >> >> A long message about improper memory mappings and such. >> >> I'd like to propose the concept of better errors in these cases. Both >> Windows and Posix errors is this case are horrible, and it'd be nice >for >> them to actually be written in English. >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> > > >-- >--Guido van Rossum (on iPad) -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
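Ryan's pseudocode above can be fleshed out into something runnable. This is a hypothetical sketch only -- the `friendly_import` name and the matched message fragments are assumptions based on the errors quoted in this thread, and matching loader-error text is inherently fragile since real messages vary by OS version and locale:

```python
import re

# Message fragments are assumptions taken from this thread; they are
# not an exhaustive or locale-safe list.
_WRONG_ARCH_PATTERNS = [
    r"%1 is not a valid Win32 application",
    r"cannot map segment from",
]

def looks_like_wrong_arch(message):
    """Heuristically decide whether a loader error message suggests
    a shared library built for the wrong CPU or platform."""
    return any(re.search(p, message) for p in _WRONG_ARCH_PATTERNS)

def friendly_import(name):
    """Import *name*, re-raising wrong-architecture loader errors
    with a plain-English explanation."""
    try:
        return __import__(name)
    except ImportError as ex:
        if looks_like_wrong_arch(str(ex)):
            raise ImportError(
                "%s seems to be compiled for a different "
                "CPU or platform" % name) from ex
        raise
```

This only works for errors that surface as catchable ImportErrors, which is exactly the limitation discussed below for failures that happen during interpreter startup.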
URL: From joshua at landau.ws Mon Aug 5 08:15:18 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 5 Aug 2013 07:15:18 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <1A7283A8-0BDA-4C50-BD41-2FCB6AC4CD7C@mac.com> References: <51FBF02F.1000202@pearwood.info> <51FDB3B5.4080506@pearwood.info> <1A7283A8-0BDA-4C50-BD41-2FCB6AC4CD7C@mac.com> Message-ID: On 5 August 2013 06:55, Ronald Oussoren wrote: > > On 4 Aug, 2013, at 3:51, Steven D'Aprano wrote: > > > On 04/08/13 05:47, Alexander Belopolsky wrote: > > > >> The PEP does not mention statistics.sum(), but the reference > implementation > >> includes it. I am not sure stdlib needs the third sum function after > >> builtins.sum and math.fsum. I think it will be better to improve > >> builtins.sum instead. > > > > > > I don't know enough C to volunteer to do that. If the built-in sum() is > improved to the point it passes my unit tests, I would consider using it in > the future. > > Does math.fsum pass your tests? From the description and references to the > cookbook it seems that your sum is functionally equivalent to math.fsum. > math.fsum casts to floats, so no. Just an educated guess based off of his previous comments. -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.sweetser at gmail.com Mon Aug 5 08:38:04 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Mon, 5 Aug 2013 02:38:04 -0400 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: (I have only had this happen with an improper DLL for the main Python executable, don't know about extension modules) The problem is, these errors tend to happen to the main Python executable, as its code and the code contained in shared libraries are being loaded into memory. 
Because of this, there is no easy way to catch these errors, as they happen *before* Python is fully initialized and running. The only way I can think of to fix this would be to have a pre-script or program run the Python executable as a subprocess, and analyze stdout for these errors. Another solution would be to have a troubleshooting page at python.org explaining these errors. "The trouble with having an open mind, of course, is that people will come along and insist of putting things in it." - Terry Pratchett I don't really know C. At all. I was thinking the errors could be caught at a higher level, something like(the code isn't runnable): except windowserror as ex: if ex.string == '%1 is not...: raise_error_here Guido van Rossum wrote: > > Do you know how to fix this? > > On Sunday, August 4, 2013, Ryan wrote: > >> Here are my experiences in accidently getting a .so/.dll file for the >> wrong chip/CPU type: >> >> Windows: >> >> %1 is not a valid Win32 application >> >> *nix/Android: >> >> A long message about improper memory mappings and such. >> >> I'd like to propose the concept of better errors in these cases. Both >> Windows and Posix errors is this case are horrible, and it'd be nice for >> them to actually be written in English. >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> > > -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
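Clay's subprocess idea is workable today without any interpreter changes. A rough sketch (the `check_import` helper name is made up for illustration, and the stderr text you would go on to scan for is platform-specific):

```python
import subprocess
import sys

def check_import(module_name):
    """Try importing *module_name* in a fresh interpreter process.

    Running the import out-of-process means even loader failures that
    happen before Python is fully initialized still show up here, as a
    non-zero exit status plus whatever the dynamic linker printed to
    stderr -- they cannot take down the parent process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        capture_output=True, text=True,
    )
    return proc.returncode == 0, proc.stderr

ok, err = check_import("math")  # a healthy stdlib module: ok is True, err is empty
```

A wrapper like this could then pattern-match `err` and print a friendlier diagnosis, which is essentially the pre-script approach Clay describes.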
URL: From abarnert at yahoo.com Mon Aug 5 08:58:04 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 4 Aug 2013 23:58:04 -0700 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: <92750E5C-3DF7-44F2-970C-181E7A655E15@yahoo.com> I suspect that on most platforms, we just get a NULL return from dlopen, call dlerror, and use the error string it returns. A quick test on OS X seems to bear that out. So, short of parsing the dlerror string (or trying to parse the elf/mach-o/etc. headers ourselves), I'm not sure what we could do. Sent from a random iPhone On Aug 4, 2013, at 23:01, Ryan wrote: > I don't really know C. At all. I was thinking the errors could be caught at a higher level, something like(the code isn't runnable): > > except windowserror as ex: > if ex.string == '%1 is not...: > raise_error_here > > > Guido van Rossum wrote: >> >> Do you know how to fix this? >> >> On Sunday, August 4, 2013, Ryan wrote: >>> Here are my experiences in accidently getting a .so/.dll file for the wrong chip/CPU type: >>> >>> Windows: >>> >>> %1 is not a valid Win32 application >>> >>> *nix/Android: >>> >>> A long message about improper memory mappings and such. >>> >>> I'd like to propose the concept of better errors in these cases. Both Windows and Posix errors is this case are horrible, and it'd be nice for them to actually be written in English. >>> -- >>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
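The other route Andrew mentions -- parsing the binary's own headers rather than dlerror()'s prose -- is at least feasible for ELF, where the target machine is a fixed-offset field in the header. A sketch, with a deliberately incomplete table of machine codes (the function name and scope are illustrative; Mach-O and PE would need their own parsers):

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}

def elf_machine(path):
    """Return the target architecture of an ELF shared object,
    or None if the file is not ELF at all."""
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        return None
    # EI_DATA (byte 5) gives the byte order; e_machine is a u16 at offset 18.
    endian = "<" if header[5] == 1 else ">"
    code = struct.unpack_from(endian + "H", header, 18)[0]
    return ELF_MACHINES.get(code, hex(code))
```

On an import failure, comparing this value against the running interpreter's architecture would let the error message say "this .so targets ARM, you are on x86-64" instead of quoting the linker.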
URL: From __peter__ at web.de Mon Aug 5 09:46:03 2013 From: __peter__ at web.de (Peter Otten) Date: Mon, 05 Aug 2013 09:46:03 +0200 Subject: [Python-ideas] Allow filter(items) Message-ID: filter(items) looks much cleaner than filter(None, items) and is easy to understand. Fewer people would use alternative spellings like filter(bool, items) filter(len, items) filter(lambda s: s != "", strings) The signature change may lead you to spell filter(predicate, items) # correct as filter(items, predicate) # wrong but this is a noisy error. I think the advantage of making the magic None redundant outweighs this potential pitfall. From shane at umbrellacode.com Mon Aug 5 14:55:17 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 5 Aug 2013 05:55:17 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: +1. Isn't None's meaning in this context basically the identity function? If so, then passing the iterable to filter directly seems to make more sense. On Aug 5, 2013, at 12:46 AM, Peter Otten <__peter__ at web.de> wrote: > filter(items) > > looks much cleaner than > > filter(None, items) > > and is easy to understand. Fewer people would use alternative spellings like > > filter(bool, items) > filter(len, items) > filter(lambda s: s != "", strings) > > The signature change may lead you to spell > > filter(predicate, items) # correct > > as > > filter(items, predicate) # wrong > > but this is a noisy error. I think the advantage of making the magic None > redundant outweighs this potential pitfall. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
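For readers unfamiliar with the magic None argument being discussed: passing None as the predicate means "keep the truthy elements", so the spellings in Peter's message are interchangeable for most inputs (filter(len, ...) is the exception, since it only works on sized elements):

```python
items = [0, 1, "", "spam", None, [], [2], False, True]

# None as the predicate keeps the truthy elements.
print(list(filter(None, items)))   # [1, 'spam', [2], True]

# The explicit alternatives behave identically here.
print(list(filter(bool, items)))   # [1, 'spam', [2], True]
print([x for x in items if x])     # [1, 'spam', [2], True]
```

The proposal is simply to let `filter(items)` mean what `filter(None, items)` means today.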
URL: From rymg19 at gmail.com Mon Aug 5 16:53:35 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 05 Aug 2013 09:53:35 -0500 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: <28f83432-f58a-41b6-8e95-ab5213d1c94e@email.android.com> These errors are referring to extension modules. i.e. LLVMPy, the wrong TkInter(I accidentally installed Python 32-bit on top of Python 64-bit), and stuff like that. Clay Sweetser wrote: >(I have only had this happen with an improper DLL for the main Python >executable, don't know about extension modules) >The problem is, these errors tend to happen to the main Python >executable, >as it's code and the code contained in shared libraries are being >loaded >into memory. >Because of this, there is no easy way to catch these errors, as they >happen >*before* Python is fully initialized and running. >The only way I can think of to fix this would be to have a pre-script >or >program run the Python executable as a subprocess, and analyze stdout >for >these errors. Another solution would be to have a troubleshooting page >at >python.org explaining these errors. > >"The trouble with having an open mind, of course, is that people will >come >along and insist of putting things in it." - Terry Pratchett >I don't really know C. At all. I was thinking the errors could be >caught at >a higher level, something like(the code isn't runnable): > >except windowserror as ex: >if ex.string == '%1 is not...: >raise_error_here > > >Guido van Rossum wrote: >> >> Do you know how to fix this? >> >> On Sunday, August 4, 2013, Ryan wrote: >> >>> Here are my experiences in accidently getting a .so/.dll file for >the >>> wrong chip/CPU type: >>> >>> Windows: >>> >>> %1 is not a valid Win32 application >>> >>> *nix/Android: >>> >>> A long message about improper memory mappings and such. 
>>> >>> I'd like to propose the concept of better errors in these cases. >Both >>> Windows and Posix errors is this case are horrible, and it'd be nice >for >>> them to actually be written in English. >>> -- >>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >>> >> >> >-- >Sent from my Android phone with K-9 Mail. Please excuse my brevity. > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Mon Aug 5 16:58:52 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 05 Aug 2013 09:58:52 -0500 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: <92750E5C-3DF7-44F2-970C-181E7A655E15@yahoo.com> References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> <92750E5C-3DF7-44F2-970C-181E7A655E15@yahoo.com> Message-ID: <8bf9a62b-b5f0-4ed3-a11d-aed6111ff7a8@email.android.com> Well, the error seems to follow this pattern: ImportError: Cannot load library: loadsegments[number]: *number* cannot map segment from `library.so` at ... On Windows, it always says '%1 is not a valid Win32 application' So, could you catch the exceptions and compare the string or perform a regex match or the like? Andrew Barnert wrote: >I suspect that on most platforms, we just get a NULL return from >dlopen, call dlerror, and use the error string it returns. A quick test >on OS X seems to bear that out. So, short of parsing the dlerror string >(or trying to parse the elf/mach-o/etc. headers ourselves), I'm not >sure what we could do. > >Sent from a random iPhone > >On Aug 4, 2013, at 23:01, Ryan wrote: > >> I don't really know C. At all. 
I was thinking the errors could be >caught at a higher level, something like(the code isn't runnable): >> >> except windowserror as ex: >> if ex.string == '%1 is not...: >> raise_error_here >> >> >> Guido van Rossum wrote: >>> >>> Do you know how to fix this? >>> >>> On Sunday, August 4, 2013, Ryan wrote: >>>> Here are my experiences in accidently getting a .so/.dll file for >the wrong chip/CPU type: >>>> >>>> Windows: >>>> >>>> %1 is not a valid Win32 application >>>> >>>> *nix/Android: >>>> >>>> A long message about improper memory mappings and such. >>>> >>>> I'd like to propose the concept of better errors in these cases. >Both Windows and Posix errors is this case are horrible, and it'd be >nice for them to actually be written in English. >>>> -- >>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwpolska at gmail.com Mon Aug 5 17:04:18 2013 From: kwpolska at gmail.com (=?UTF-8?B?Q2hyaXMg4oCcS3dwb2xza2HigJ0gV2Fycmljaw==?=) Date: Mon, 5 Aug 2013 17:04:18 +0200 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: <8bf9a62b-b5f0-4ed3-a11d-aed6111ff7a8@email.android.com> References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> <92750E5C-3DF7-44F2-970C-181E7A655E15@yahoo.com> <8bf9a62b-b5f0-4ed3-a11d-aed6111ff7a8@email.android.com> Message-ID: On Mon, Aug 5, 2013 at 4:58 PM, Ryan wrote: > Well, the error seems to follow this pattern: > > ImportError: Cannot load library: loadsegments[number]: *number* cannot map > segment from `library.so` at ... 
> > On Windows, it always says '%1 is not a valid Win32 application' …unless you have Windows in a language that is not English. > So, could you catch the exceptions and compare the string or perform a regex > match or the like? No. Unless you wanted to hard-code all translations of this message. Or find an API to identify this message. -- Chris “Kwpolska” Warrick PGP: 5EAAEA16 stop html mail | always bottom-post | only UTF-8 makes sense From steve at pearwood.info Mon Aug 5 17:12:14 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 06 Aug 2013 01:12:14 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <51FFC0CE.2030709@pearwood.info> On 03/08/13 04:53, Michele Lacchia wrote: > As for Steven's implementation I think it's very accurate. I have one > question though. Why is there a class for 'median' with various methods and > not one for 'stdev' and 'variance' with maybe two methods, 'population' and > 'sample'? Those familiar with calculator statistics will expect separate functions for sample variance and population variance, and the same for standard deviation. This is a de facto standard in nearly everything I've looked at (although numpy is a conspicuous exception), so I chose to follow the same convention. On the other hand, median is less commonly found on calculators and I did not want to overload the beginner with too many top-level median functions, so I made a decision to bless the version taught in secondary schools as the default (even though it is probably the least useful), and provide the others as methods.
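The shape described above, a default median that "provides the others as methods", can be sketched as a callable instance whose variants hang off it as attributes. This is only an illustration of the pattern, with made-up variant names (`low`, `high`); it is not the module's actual code:

```python
class _Median:
    """Callable that also exposes variant estimators as attributes."""

    def __call__(self, data):
        # Default: the version taught in secondary schools.
        data = sorted(data)
        n = len(data)
        if n == 0:
            raise ValueError("no median for empty data")
        mid = n // 2
        if n % 2:
            return data[mid]
        return (data[mid - 1] + data[mid]) / 2

    def low(self, data):
        # Smaller of the two middle values for even-sized data.
        data = sorted(data)
        return data[(len(data) - 1) // 2]

    def high(self, data):
        # Larger of the two middle values for even-sized data.
        data = sorted(data)
        return data[len(data) // 2]

median = _Median()

assert median([1, 3, 5]) == 3
assert median([1, 3, 5, 7]) == 4.0
assert median.low([1, 3, 5, 7]) == 3
assert median.high([1, 3, 5, 7]) == 5
```

Because the instance is callable, median(data) keeps the familiar function spelling while the variants stay discoverable under the one name.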
A previous version of this module had a single median function that took an optional argument to select between different kinds of median: median(data, scheme='grouped') I have come to the conclusion that having separate methods on median not only simplifies the implementation, but it also reads better: median.grouped(data) -- Steven From oscar.j.benjamin at gmail.com Mon Aug 5 17:58:04 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 5 Aug 2013 16:58:04 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: On 2 August 2013 18:45, Steven D'Aprano wrote: > I have raised an issue on the tracker to add a statistics module to Python's > standard library: > > http://bugs.python.org/issue18606 > > and have been asked to write a PEP. Attached is my draft PEP. Feedback is > requested, thanks in advance. Having looked at the reference implementation I'm slightly confused about the mode API/implementation. It took a little while for me to understand what the ``window`` parameter is for (I was only able to understand it by studying the source) but I've got it now. ISTM that the mode class is splicing two fundamentally different things together: 1) Finding the most frequently occurring values in some collection of data. 2) Estimating the location of the peak of a hypothetical continuous probability distribution from which some real-valued numeric data is drawn. The 2) part does not seem like something that is normally in secondary school maths. It's also not common AFAIK in other statistical packages (at least not under the name mode). If scipy has this then it has a different name because scipy.stats.mode just does case 1). The same goes for MATLAB's mode function, MS Excel, LibreOffice and basically anything else I can remember using. 
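For contrast, case 1, the "most frequently occurring values" behaviour those packages implement, takes only a few lines with collections.Counter. This is an illustration of that behaviour, not the code of any package mentioned:

```python
from collections import Counter

def discrete_modes(data):
    """Return all values that occur with the highest frequency."""
    counts = Counter(data)
    if not counts:
        raise ValueError("no mode for empty data")
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# A unique mode, and a tie between two values:
assert discrete_modes([1, 2, 2, 3]) == [2]
assert discrete_modes([1, 1, 2, 2, 3]) == [1, 2]
```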
Also the API for invoking case 2) which is conceptually a completely different thing is to call mode(data, window=3) which seems very cryptic given the significant conceptual and algorithmic differences that are invoked as a result. (What's wrong with using window=2 anyway?) I would suggest that mode should be split into two separate entities for these two different operations. But, then really I don't expect many people to use the 2) part and it doesn't really come under the "minimal" specification described in the PEP. So instead I think that it should just be removed to simplify the documentation and implementation of mode for the common case. Oscar From amcnabb at mcnabbs.org Mon Aug 5 18:14:16 2013 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 5 Aug 2013 11:14:16 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> Message-ID: <20130805161416.GD10672@mcnabbs.org> On Sun, Aug 04, 2013 at 05:51:45AM -0700, Eli Bendersky wrote: > > Efforts are better spent in writing a new tutorial on Numpy that shows > how to do the stuff statistics.py does. Call it "Numpy statistics for > the average person". > [Sorry to have beaten on this average thing so much; patronization > drives me mad] As someone who uses numpy heavily, I think that Numpy is no replacement for having a basic statistics module in the standard library. Numpy is heavy in dependencies (Fortran, LAPACK, etc.), in load time, and conceptually. It is difficult to install from source, so it's unavailable on many systems with Python. Numpy is great, but it involves enough commitment that it's not something to rely on casually. I use Numpy all the time, but I would also use a stats library in the standard library. 
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From michelelacchia at gmail.com Mon Aug 5 18:23:48 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Mon, 5 Aug 2013 18:23:48 +0200 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FFC0CE.2030709@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FFC0CE.2030709@pearwood.info> Message-ID: Thank you! Yep, that really makes sense now! On 05 Aug 2013 17:13, "Steven D'Aprano" wrote: > On 03/08/13 04:53, Michele Lacchia wrote: > > As for Steven's implementation I think it's very accurate. I have one >> question though. Why is there a class for 'median' with various methods >> and >> not one for 'stdev' and 'variance' with maybe two methods, 'population' >> and >> 'sample'? >> > > Those familiar with calculator statistics will expect separate functions > for sample variance and population variance, and the same for standard > deviation. This is a de facto standard in nearly everything I've looked at > (although numpy is a conspicuous exception), so I chose to follow the same > convention. > > On the other hand, median is less commonly found on calculators and I did > not want to overload the beginner with too many top-level median functions, > so I made a decision to bless the version taught in secondary schools as > the default (even though it is probably the least useful), and provide the > others as methods.
> > A previous version of this module had a single median function that took > an optional argument to select between different kinds of median: > > median(data, scheme='grouped') > > I have come to the conclusion that having separate methods on median not > only simplifies the implementation, but it also reads better: > > median.grouped(data) > > > -- > Steven > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Mon Aug 5 18:23:35 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 5 Aug 2013 17:23:35 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <20130805161416.GD10672@mcnabbs.org> References: <51FBF02F.1000202@pearwood.info> <51FDFD98.3010902@stoneleaf.us> <20130805161416.GD10672@mcnabbs.org> Message-ID: On 5 August 2013 17:14, Andrew McNabb wrote: > On Sun, Aug 04, 2013 at 05:51:45AM -0700, Eli Bendersky wrote: >> >> Efforts are better spent in writing a new tutorial on Numpy that shows >> how to do the stuff statistics.py does. Call it "Numpy statistics for >> the average person". > >> [Sorry to have beaten on this average thing so much; patronization >> drives me mad] > > As someone who uses numpy heavily, I think that Numpy is no replacement > for having a basic statistics module in the standard library. Numpy is > heavy in dependencies (Fortran, LAPACK, etc.), in load time, and > conceptually. It is difficult to install from source, so it's > unavailable on many systems with Python. Numpy is great, but it > involves enough commitment that it's not something to rely on casually. > > I use Numpy all the time, but I would also use a stats library in the > standard library. I'd like to second Andrew's points. 
I wouldn't normally consider a Python installation complete until numpy and many others are installed but that doesn't mean I wouldn't use this or that it shouldn't be in the stdlib. To offer two particular ways in which Steven's library is better than numpy's stats functions even if they are available: 1) It can work with iterators in some cases where numpy would require a concrete collection. 2) It can take advantage of Python's infinite range and infinite/arbitrary precision numeric types: integer, Decimal and Fraction in cases where numpy would just coerce everything to float. Oscar From abarnert at yahoo.com Mon Aug 5 18:31:15 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 5 Aug 2013 09:31:15 -0700 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: <8bf9a62b-b5f0-4ed3-a11d-aed6111ff7a8@email.android.com> References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> <92750E5C-3DF7-44F2-970C-181E7A655E15@yahoo.com> <8bf9a62b-b5f0-4ed3-a11d-aed6111ff7a8@email.android.com> Message-ID: <115DD239-4597-413A-8AB2-C240D25BA05A@yahoo.com> Each of the major *nix ldd/dyld families has a different set of errors, each one has half a dozen or more errors in its set, and most of them are localized. And usually the string you get back is dynamically generated by concatenating multiple different strings together. Let's take an example. You try to import foo.so. The module itself is fine, but it depends on libfoo.dylib. You happen to have three copies of libfoo.dylib on the path, but one is too old a version of libfoo to meet the requirement, while the next is an i386/ppc fat binary when you need x86_64, and the third is not a mach-o library at all. So, you get a four-line error message explaining that foo.so couldn't be opened because no valid libfoo image could be found, and then explaining what was wrong with each libfoo candidate. The explanations will be different between OS X 10.6 vs.
10.7-8, between English and German, etc., and completely unrelated to the messages on Linux or FreeBSD. Besides all the parsing problems, how do you want that to appear in Python in the end result? Sent from a random iPhone On Aug 5, 2013, at 7:58, Ryan wrote: > Well, the error seems to follow this pattern: > > ImportError: Cannot load library: loadsegments[number]: *number* cannot map segment from `library.so` at ... No, that's one of a dozen or so different errors you can get on recent Linux/gnu ldd in English. > On Windows, it always says '%1 is not a valid Win32 application' > > So, could you catch the exceptions and compare the string or perform a regex match or the like? > > Andrew Barnert wrote: >> >> I suspect that on most platforms, we just get a NULL return from dlopen, call dlerror, and use the error string it returns. A quick test on OS X seems to bear that out. So, short of parsing the dlerror string (or trying to parse the elf/mach-o/etc. headers ourselves), I'm not sure what we could do. >> >> Sent from a random iPhone >> >> On Aug 4, 2013, at 23:01, Ryan wrote: >> >>> I don't really know C. At all. I was thinking the errors could be caught at a higher level, something like(the code isn't runnable): >>> >>> except windowserror as ex: >>> if ex.string == '%1 is not...: >>> raise_error_here >>> >>> >>> Guido van Rossum wrote: >>>> >>>> Do you know how to fix this? >>>> >>>> On Sunday, August 4, 2013, Ryan wrote: >>>>> Here are my experiences in accidently getting a .so/.dll file for the wrong chip/CPU type: >>>>> >>>>> Windows: >>>>> >>>>> %1 is not a valid Win32 application >>>>> >>>>> *nix/Android: >>>>> >>>>> A long message about improper memory mappings and such. >>>>> >>>>> I'd like to propose the concept of better errors in these cases. Both Windows and Posix errors is this case are horrible, and it'd be nice for them to actually be written in English. >>>>> -- >>>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
>>> >>> -- >>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Aug 5 18:38:30 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 06 Aug 2013 02:38:30 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <51FFD506.7080608@pearwood.info> On 06/08/13 01:58, Oscar Benjamin wrote: > Having looked at the reference implementation I'm slightly confused > about the mode API/implementation. It took a little while for me to > understand what the ``window`` parameter is for (I was only able to > understand it by studying the source) but I've got it now. > > ISTM that the mode class is splicing two fundamentally different > things together: > 1) Finding the most frequently occurring values in some collection of data. > 2) Estimating the location of the peak of a hypothetical continuous > probability distribution from which some real-valued numeric data is > drawn. Both of these -- the most frequent value, and the peak in a distribution -- are called the mode, and are fundamentally the same thing, and only differ between continuous and discrete data. In both cases, you are estimating a population mode from a sample. With discrete data, you can count the values, and the one with the highest frequency is the sample mode. With continuous data, you almost certainly will find that every value is unique. 
There are two approaches to calculating the sample mode for continuous data: bin the data first, then count the frequencies of the bins; or quoting from "Numerical Recipes" (reference in the source), by the technique known in the literature as "Estimating the rate of an inhomogeneous Poisson process from Jth waiting times". That's a mouthful, which is probably why it's so hard to find anything online about it. But check the reference given in the source. Any of the "Numerical Recipes..." by Press et al should have it. (There are versions for C, Fortran and Pascal.) > The 2) part does not seem like something that is normally in secondary > school maths. This is not *just* aimed at secondary school stats :-) >It's also not common AFAIK in other statistical packages > (at least not under the name mode). Press et al claim it is poorly known, but much better than the binning method. It saddens me that twenty years on, it's still poorly known. Does my explanation satisfy your objection? If not, I will consider deferring mode for 3.5, which will give me some time to think about a better API and documentation. -- Steven From mertz at gnosis.cx Mon Aug 5 18:56:58 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 5 Aug 2013 09:56:58 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On Mon, Aug 5, 2013 at 12:46 AM, Peter Otten <__peter__ at web.de> wrote: > filter(None, items) > filter(bool, items) > filter(len, items) > filter(lambda s: s != "", strings) These are NOT "alternatives" ... they mean very different things (but will in some cases produce the same result. Well, except that 'None' and 'bool' are equivalent above. -0.5 so far on this. Since a one-argument version of filter() is now simply an error, this change won't break any existing code. 
However, I feel like it will invite future errors: having to be explicit about what predicate you are filtering by--even if it is bool()--keeps users honest in actually stating what the predicate is rather than relying on "implicit identity function". > filter(items, predicate) # wrong > but this is a noisy error. I think the advantage of making the magic None > redundant outweighs this potential pitfall. This may not be a noisy error, nor any error at all that Python can detect. It is perfectly possible for a Python object to have both a .__call__() and a .__iter__() method. It can well be the case that only the *programmer* and not the language needs to decide which thing is the iterator and which one the predicate. Yours, David... -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From shane at umbrellacode.com Mon Aug 5 19:22:08 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 5 Aug 2013 10:22:08 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <5531F48E-F879-417D-AA0E-71169769AB85@umbrellacode.com> However: filter(None, items) and filter(lambda item: item, items) are alternatives, correct? If so, then was it also correct to say that None as the first parameter is the same as the identity function? If those things are true, then I still think passing the iterator itself as the first argument is actually clearer and more accurate than the current approach; either that, or an explicit identity function, as None does not have a __call__ that acts as an identity function or boolean evaluation, so its use here doesn't make much sense to me; it's just the way it's always worked.
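The equivalence asked about above is easy to confirm: a None predicate keeps exactly the truthy items, the same result as an explicit identity lambda or bool. A quick check:

```python
items = [0, 1, '', 'a', None, [], [2], False, True]

# All three spellings select exactly the truthy items.
kept = list(filter(None, items))
assert kept == list(filter(lambda item: item, items))
assert kept == list(filter(bool, items))
assert kept == [1, 'a', [2], True]
```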
On Aug 5, 2013, at 9:56 AM, David Mertz wrote: > On Mon, Aug 5, 2013 at 12:46 AM, Peter Otten <__peter__ at web.de> wrote: >> filter(None, items) >> filter(bool, items) >> filter(len, items) >> filter(lambda s: s != "", strings) > > These are NOT "alternatives" ... they mean very different things (but > will in some cases produce the same result. Well, except that 'None' > and 'bool' are equivalent above. > > -0.5 so far on this. Since a one-argument version of filter() is now > simply an error, this change won't break any existing code. However, > I feel like it will invite future errors: having to be explicit about > what predicate you are filtering by--even if it is bool()--keeps users > honest in actually stating what the predicate is rather than relying > on "implicit identity function". > >> filter(items, predicate) # wrong >> but this is a noisy error. I think the advantage of making the magic None >> redundant outweighs this potential pitfall. > > This may not be a noisy error, nor any error at all that Python can > detect. It is perfectly possible for a Python object to have both a > .__call__() and a .__iter__() method. It can well be the case that > only the *programmer* and not the language needs to decide which thing > is the iterator and which one predicate. > > Yours, David... > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Mon Aug 5 19:31:10 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 5 Aug 2013 10:31:10 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <5531F48E-F879-417D-AA0E-71169769AB85@umbrellacode.com> References: <5531F48E-F879-417D-AA0E-71169769AB85@umbrellacode.com> Message-ID: On Mon, Aug 5, 2013 at 10:22 AM, Shane Green wrote: > However: > > filter(None, items) and > filter(lambda item: item, items) > are alternatives, correct? If so, then was it also correct to say that None > as the first parameter is the same as the identity function? Of course. The above are equivalent, and both are also equivalent to the only form *I* would ever think of using: filter(bool, items). Doing a bool(...) is always implied in evaluation--or maybe more accurately, applying it is idempotent. I agree that 'None' as a placeholder predicate is strange and unintuitive. I would probably be at least +0.5 on deprecating that, and requiring predicates to be actual callables. However, that change *would* break some existing code. > If those things are true, then I still think passing the iterator itself as > the first argument is actually clearer and more accurate than the current > approach; either that, or an explicit identity function, as None does not > have a __call__ that acts as an identity function or boolean evaluation, so > its use here doesn't make much sense to me; it's just the way it's always > worked. If we could go back in time and reverse the order of 'pred' and 'iter' in the API of filter(), I'd probably support that. However, every single Python program that uses filter() now assumes the current argument order. The breakage there is far too great to consider now. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons.
Intellectual property is to the 21st century what the slave trade was to the 16th. From markus at unterwaditzer.net Mon Aug 5 19:32:39 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Mon, 05 Aug 2013 19:32:39 +0200 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: -0, for reasons already mentioned. While I agree that filter(None, items) is counterintuitive, filter(bool, items) looks very readable to me. I think modifying the behavior of filter like this will at most trick users into thinking they need to specify the iterable first. -- Markus (from phone) Peter Otten <__peter__ at web.de> wrote: >filter(items) > >looks much cleaner than > >filter(None, items) > >and is easy to understand. Fewer people would use alternative spellings >like > >filter(bool, items) >filter(len, items) >filter(lambda s: s != "", strings) > >The signature change may lead you to spell > >filter(predicate, items) # correct > >as > >filter(items, predicate) # wrong > >but this is a noisy error. I think the advantage of making the magic >None >redundant outweighs this potential pitfall. > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas From shane at umbrellacode.com Mon Aug 5 19:51:19 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 5 Aug 2013 10:51:19 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <5531F48E-F879-417D-AA0E-71169769AB85@umbrellacode.com> Message-ID: Yes, but omitting an identity function and just passing its input (an iterable in this case) seems like a reasonable "shortcut" in general. And wouldn't it be relatively easy to make the None parameter passed to filter for this usage optional, without introducing any compatibility issues?
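In principle the predicate could be made optional without touching existing two-argument calls, by dispatching on the number of arguments. A hypothetical wrapper (a sketch of the idea, not a change to the real builtin) might look like:

```python
import builtins

def filter2(function, iterable=None):
    """Hypothetical filter: with one argument, keep the truthy items.

    filter2(items)        behaves like filter(None, items)
    filter2(pred, items)  keeps the unchanged two-argument behaviour
    """
    if iterable is None:
        # One-argument form: the sole argument is the iterable.
        function, iterable = None, function
    return builtins.filter(function, iterable)

assert list(filter2([0, 1, '', 'a'])) == [1, 'a']
assert list(filter2(str.isalpha, 'a1b2')) == ['a', 'b']
```

An object that is both callable and iterable would still be ambiguous under this scheme; the sketch simply treats a missing second argument as the one-argument form.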
On Aug 5, 2013, at 10:31 AM, David Mertz wrote: > On Mon, Aug 5, 2013 at 10:22 AM, Shane Green wrote: >> However: >> >> filter(None, items) and >> filter(lambda item: item, items) >> are alternatives, correct? If so, then was it also correct to say that None >> as the first parameter is the same as the identity function? > > Of course. The above are equivalent, and both are also equivalent to > the only form *I* would ever think of using: filter(bool, items). > Doing a bool(...) is always implied in evaluation--or maybe more > accurately, applying it is idempotent. > > I agree that 'None' as a placeholder predicate is strange and > unintuitive. I would probably be at least +0.5 on deprecating that, > and requiring predicates be actual callables. However, that change > *would* break some existing code. > >> If those things are true, then I still think passing the iterator itself as >> the first argument is actually more clear and accurate than the current >> approach; either that, or an explicit identity function, as None does not >> have a __call__ that acts as an identity function or boolean evaluation so >> it?s use here doesn?t make much sense to me; it?s just the way it?s always >> worked. > > If we could go back in time and reverse the order of 'pred' and 'iter' > in the API of filter(), I'd probably support that. However, every > single Python program that uses filter() now assumes the current > argument order. The breakage there is far too great to consider now. > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oscar.j.benjamin at gmail.com Mon Aug 5 19:58:30 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 5 Aug 2013 18:58:30 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FFD506.7080608@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <51FFD506.7080608@pearwood.info> Message-ID: On 5 August 2013 17:38, Steven D'Aprano wrote: > On 06/08/13 01:58, Oscar Benjamin wrote: > >> Having looked at the reference implementation I'm slightly confused >> about the mode API/implementation. It took a little while for me to >> understand what the ``window`` parameter is for (I was only able to >> understand it by studying the source) but I've got it now. >> >> ISTM that the mode class is splicing two fundamentally different >> things together: >> 1) Finding the most frequently occurring values in some collection of >> data. >> 2) Estimating the location of the peak of a hypothetical continuous >> probability distribution from which some real-valued numeric data is >> drawn. > > Both of these -- the most frequent value, and the peak in a distribution -- > are called the mode, and are fundamentally the same thing, and only differ > between continuous and discrete data. > > In both cases, you are estimating a population mode from a sample. With discrete data the sample will often be the population and your mode() function will return the exact value that is indisputably the mode. > With > discrete data, you can count the values, and the one with the highest > frequency is the sample mode. With continuous data, you almost certainly > will find that every value is unique. There are two approaches to > calculating the sample mode for continuous data: There are many more than two approaches. This is why I don't really think it is suitable for the stdlib stats module. 
Computing the mode of a sample of data having discrete values is a well-defined problem (actually there are controversial aspects; see below) and there is essentially one basic method for doing it. Estimating the mode from a finite sample drawn from a continuous probability distribution is not a well-posed problem: there is no non-arbitrary way to do it. Every method uses heuristics and AFAIK every method has parameters that must be arbitrarily specified (such as ``window``). Different methods or parameter choices can in some cases give wildly different results so I think that this is an algorithm that needs to be used carefully and shouldn't be documented as *the* way to compute the mode() for continuous numbers. At the least the docstring should explain how a user should choose the value of window and what it does! > bin the data first, then > count the frequencies of the bins; or quoting from "Numerical Recipes" > (reference in the source), by the technique known in the literature as > "Estimating the rate of an inhomogeneous Poisson process from Jth waiting > times". That's a mouthful, which is probably why it's so hard to find > anything online about it. But check the reference given in the source. Any > of the "Numerical Recipes..." by Press et al should have it. (There are > versions for C, Fortran and Pascal.) I have the C version published about 10 years after yours but I think I've lent it to someone. I understand what it's doing from the code and the name though. >> It's also not common AFAIK in other statistical packages >> (at least not under the name mode). > > Press et al claim it is poorly known, but much better than the binning > method. It saddens me that twenty years on, it's still poorly known. It is essentially a binning method but it's one that allows the location and size of the bins to be chosen by the data rather than arbitrarily fixed a priori. In that sense it is better than the standard binning method. 
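The "Jth waiting times" technique under discussion can be sketched in a few lines: sort the data, slide a window of `window` consecutive waiting times along the order statistics, and take the midpoint of the narrowest span as the mode estimate. This is a simplified illustration of the idea, with the density normalisation dropped; it is not the reference implementation:

```python
def estimate_mode(data, window=3):
    """Crude continuous-mode estimate: midpoint of the tightest span
    of `window` consecutive waiting times in the sorted sample."""
    xs = sorted(data)
    if len(xs) <= window:
        raise ValueError("need more than `window` data points")
    # The densest region is where `window` waiting times cover the least width.
    i = min(range(len(xs) - window), key=lambda i: xs[i + window] - xs[i])
    return (xs[i] + xs[i + window]) / 2

# Data bunched near 5.0 with outliers on both sides:
sample = [1.0, 4.8, 4.9, 5.0, 5.1, 5.2, 9.0]
assert 4.5 < estimate_mode(sample, window=3) < 5.5
```

Different `window` values can give quite different answers on real data, which is exactly the arbitrariness being pointed out here.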
None of the mode() functions from other stats packages that I was listing include either the binning method or the Poisson process method. They just compute the most frequently occurring values in a sequence and make no attempt to estimate the mode of a continuous distribution. > Does my explanation satisfy your objection? I would have called it a suggestion rather than an objection. I'm certainly not objecting to this module; I just want it to be as good as possible (according to my own definition of good!). > If not, I will consider > deferring mode for 3.5, which will give me some time to think about a better > API and documentation. I also find two other aspects of the mode function a little odd: I can't work out why I would want the max_modes parameter to be anything other than 1 or infinity. In fact I normally want it to be infinity. I've looked at how a couple of other stats packages handle this and it seems like the most common thing is just to arbitrarily return any old mode (which is rubbish). Yours will by default raise an error if there isn't a unique mode (which is better). But it seems odd to do mode(data, max_modes=float('inf')) to say that I want all the modes. My preference really is just that modes() returns a list of all modes and the user should decide what to do with however many values they get back. The other thing is about this idea that if all values are equally common then there is "no mode". I want to say that every value is a mode rather than none. Otherwise you get strange differences between e.g. [2,2,3,3,4,4] and [1,2,2,3,3,4,4]. I've checked on the interweb though and it seems that most people disagree with me on this point, so never mind!
Oscar

From ethan at stoneleaf.us Mon Aug 5 19:43:52 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 05 Aug 2013 10:43:52 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <51FFE458.2070604@stoneleaf.us>

On 08/05/2013 12:46 AM, Peter Otten wrote: > filter(items) > > looks much cleaner than > > filter(None, items)

+1 `range` already behaves like this to support the common case. -- ~Ethan~

From storchaka at gmail.com Mon Aug 5 20:25:22 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 05 Aug 2013 21:25:22 +0300 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID:

05.08.13 20:32, Markus Unterwaditzer wrote: > -0, for reasons already mentioned. While I agree that filter(None, items) is counterintuitive, filter(bool, items) looks very readable to me.

filter(bool, items) is redundant in the same sense as `if bool(x)`, `for x in iter(items)`, or `"%s" % str(x)`. The result of the predicate is implicitly converted to boolean.

From jbvsmo at gmail.com Mon Aug 5 20:31:38 2013 From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=) Date: Mon, 5 Aug 2013 15:31:38 -0300 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID:

-1 There should be only one way and there are lots of ways already. Use a genexp for readability: (x for x in items if x)

João Bernardo

2013/8/5 Serhiy Storchaka > 05.08.13 20:32, Markus Unterwaditzer wrote: > > -0, for reasons already mentioned. While I agree that filter(None, items) >> is counterintuitive, filter(bool, items) looks very readable to me. >> > > filter(bool, items) is redundant in the same sense as `if bool(x)`, `for x in > iter(items)`, or `"%s" % str(x)`. The result of the predicate is implicitly > converted to boolean.
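For the record, the three spellings compared in this thread select exactly the same elements; only the surface syntax differs. A quick self-contained check (the `items` sample list is invented for illustration):

```python
items = [0, 1, "", "spam", None, [], [2], False, True]

a = list(filter(None, items))   # predicate None means "keep truthy"
b = list(filter(bool, items))   # explicit but redundant predicate
c = [x for x in items if x]     # the comprehension spelling
assert a == b == c == [1, "spam", [2], True]
```

So the proposal is purely about spelling the common case more cleanly, not about new behaviour.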
> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From shane at umbrellacode.com Mon Aug 5 21:11:51 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 5 Aug 2013 12:11:51 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <4D0B0998-BBC3-4991-BEC2-D5073A7A8930@umbrellacode.com>

+1 (still) for the additional reason that it just makes sense:

filter(bool, items) => "Filter this list of items using predicate."
filter(items) => "Filter this list of items."

To me it makes perfect sense that the second, shorthand form would mean filter falsy items from the list; it's precisely what I would expect a function named "filter" to do to a list of items in the absence of other arguments. Finally, while it's true "bool" is explicit but redundant when you consider None's meaning in this context to be "bool", if you consider its meaning to be the identity function instead -- which is the interpretation I tend to think of because filter, if I'm not mistaken, predates True/False and __bool__, etc. -- then it can be argued that the use of "bool" is superfluous rather than explicit/redundant.

On Aug 5, 2013, at 11:31 AM, João Bernardo wrote: > -1 There should be only one way and there are lots of ways already. Use a genexp for readability: (x for x in items if x) > > João Bernardo > > > 2013/8/5 Serhiy Storchaka > 05.08.13 20:32, Markus Unterwaditzer wrote: > > -0, for reasons already mentioned. While I agree that filter(None, items) is counterintuitive, filter(bool, items) looks very readable to me. > > filter(bool, items) is redundant in the same sense as `if bool(x)`, `for x in iter(items)`, or `"%s" % str(x)`. The result of the predicate is implicitly converted to boolean.
> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL:

From ethan at stoneleaf.us Mon Aug 5 21:03:56 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 05 Aug 2013 12:03:56 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <51FFF71C.8060108@stoneleaf.us>

On 08/05/2013 11:31 AM, João Bernardo wrote: > -1 There should be only one way and there are lots of ways already. Use a genexp for readability: (x for x in items if x)

One Obvious Way *not* Only One Way. :) -- ~Ethan~

From steve at pearwood.info Mon Aug 5 22:04:17 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 06 Aug 2013 06:04:17 +1000 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: <52000541.6040000@pearwood.info>

On 03/08/13 03:45, Steven D'Aprano wrote: > I have raised an issue on the tracker to add a statistics module to Python's standard library: > > http://bugs.python.org/issue18606

Thanks to everyone who has given feedback, it has been very humbling and informative. I have a revised proto-PEP just about ready for (hopefully final) feedback, but before I do there is one potentially major stumbling block: whether or not the statistics module should have its own version of sum.

Against the idea ----------------

* One Obvious Way / Only One Way -- there are already two ways to calculate a sum (builtins.sum and math.fsum), no need for a third.

* Even if there is a need, we should aim to fix the problems with the existing sum functions rather than add a third.
* Even if we can't, don't call it "sum", call it something else. "precise_sum" was the only suggestion given so far.

(If I have missed any objections, I apologize.)

In favour ---------

* Speaking as the module author, it is my considered opinion that I cannot (easily, or at all) get the behaviour I expect from the statistics module using the existing sum functions without a lot of pain. See below.

* For backward compatibility, I don't think we can change the existing sum functions. E.g.:

- built-in sum accepts non-numeric values if they support the + operator, that won't change before Python 4000;

- built-in sum can be inaccurate with floats;

- math.fsum coerces everything to float.

* Even if we could change one of the existing sum functions (math.fsum is probably the better candidate) I personally don't know enough C to do so. Either somebody else steps up and volunteers, or any such change is deferred indefinitely.

Now that Decimal has an accelerated C version in CPython, it is more important than ever before to treat it (and Fraction) as first-class numeric types, and avoid coercing them to float unless necessary. So I consider it a Must Have that stats functions support Decimal and Fraction data without unnecessarily converting them to floats. This rules out fsum:

py> from decimal import Decimal as D
py> data = [D("0.1"), D("0.3")]
py> math.fsum(data)
0.4
py> statistics.sum(data)
Decimal('0.4')

On the other hand, the built-in sum is demonstrably inaccurate with floats, which is why fsum exists in the first place:

py> data = [1e100, 1, -1e100, 1]
py> sum(data)
1.0
py> math.fsum(data)
2.0
py> statistics.sum(data)
2.0

Consequently the statistics module includes its own version of sum. Never mind the implementation, that may change in the future. Regardless of the implementation, the interface of statistics.sum is distinct from both of the existing sum functions. There are three versions of sum because they each do different things.
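The interpreter sessions above can be condensed into a self-contained check of the two stdlib behaviours (the proposed statistics.sum did not exist yet, so only builtins.sum and math.fsum are exercised here):

```python
import math
from decimal import Decimal

# builtins.sum silently loses both 1s to rounding against 1e100 ...
data = [1e100, 1, -1e100, 1]
assert sum(data) == 1.0
# ... while math.fsum tracks exact partial sums and keeps them:
assert math.fsum(data) == 2.0

# Conversely, math.fsum coerces Decimal to float, whereas builtins.sum
# preserves the operand type:
dec = [Decimal("0.1"), Decimal("0.3")]
assert sum(dec) == Decimal("0.4")
assert isinstance(math.fsum(dec), float)
```

Each existing function gets one of the two properties right (accuracy vs. type preservation), which is exactly the gap the third sum is meant to close.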
Currently, I can do this, both internally within other functions such as mean, and externally, when I just want a total: total = statistics.sum(data) and get the right result regardless of the numeric type of data[1]. Without it, I have to do something like this: # Make sure data is a list, and not an iterator. if any(isinstance(x, float) for x in data): total = math.fsum(data) else: total = sum(data) Are there still objections to making statistics.sum public? If the only way to move forward is to make it a private implementation detail, I will do so, but I really think that I have built a better sum and hope to keep it as a public function. Show of hands please, +1 or -1 on statistics.sum. [1] Well, not complex numbers. -- Steven From ethan at stoneleaf.us Mon Aug 5 22:20:46 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 05 Aug 2013 13:20:46 -0700 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: <5200091E.4050603@stoneleaf.us> On 08/05/2013 01:04 PM, Steven D'Aprano wrote: > > Show of hands please, +1 or -1 on statistics.sum. +1 From alexander.belopolsky at gmail.com Mon Aug 5 22:34:28 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 5 Aug 2013 16:34:28 -0400 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On Mon, Aug 5, 2013 at 4:04 PM, Steven D'Aprano wrote: > > * For backward compatibility, I don't think we can change the existing sum > functions. E.g.: > > - built-in sum accepts non-numeric values if they support > the + operator, that won't change before Python 4000; > I don't see this as a problem. 
To the contrary, I find designs that deliberately inhibit duck-typing to be "unpythonic." > - built-in sum can be inaccurate with floats; > This can be fixed. The implementation already special-cases floats. > > - math.fsum coerces everything to float. > And does not support the start argument. Note that this was a deliberate choice: /* Note 5: The signature of math.fsum() differs from __builtin__.sum() because the start argument doesn't make sense in the context of accurate summation. Since the partials table is collapsed before returning a result, sum(seq2, start=sum(seq1)) may not equal the accurate result returned by sum(itertools.chain(seq1, seq2)). */ See < http://hg.python.org/cpython/file/c6d4564dc86f/Modules/mathmodule.c#l993>. > > * Even if we could change one of the existing sum functions (math.fsum is > probably the better candidate) I personally don't know enough C to do so. > Either somebody else steps up and volunteers, or any such change is > deferred indefinitely. > I'll be happy to help with coding once the specs are agreed upon. > Show of hands please, +1 or -1 on statistics.sum. > -1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Aug 5 22:36:39 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 05 Aug 2013 16:36:39 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On 8/5/2013 3:46 AM, Peter Otten wrote: > filter(items) > > looks much cleaner than > > filter(None, items) > > and is easy to understand. True, if you call the iterable 'items', but filter(abc) is less obvious. Is 'abc' an iterable to be filtered, or a predicate missing the iterable to filter? Optional leading args cannot be directly expressed as a Python signature. They are a nuisance to document as well as to simulate. I consider range, the only example I can think of at the moment, as something *not* to be imitated. 
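Terry's question (is `filter(abc)` an iterable to be filtered, or a predicate missing its iterable?) is not hypothetical, since a single object can play both roles. A sketch with an invented `WordSet` class:

```python
class WordSet:
    # A container that is iterable *and* callable as a membership predicate.
    def __init__(self, words):
        self._words = set(words)

    def __iter__(self):
        return iter(self._words)

    def __contains__(self, item):
        return item in self._words

    __call__ = __contains__


allowed = WordSet(["spam", "eggs"])
# As a predicate: keep only the known words.
assert sorted(filter(allowed, ["spam", "ham", "eggs"])) == ["eggs", "spam"]
# As an iterable: a one-argument filter(allowed) could not tell which
# role was intended.
assert sorted(allowed) == ["eggs", "spam"]
```

With the proposed signature, `filter(allowed)` would be genuinely ambiguous rather than a noisy error.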
Range gets away with the oddity because start and stop must both be ints. For filter, 'predicate' and 'iterable' are neither identical nor disjoint. As for the last point, an object can have both .__iter__ and .__call__. For example, a collection with __contains__ could have __call__ = __contains__. Then 'instance(x)' is the same as 'x in instance', except that the instance is a callable predicate that can be passed to functions like filter, whereas the expression 'in instance' cannot be.

> Fewer people would use alternative spellings like > > filter(bool, items) > filter(len, items) > filter(lambda s: s != "", strings) > > The signature change may lead you to spell > filter(predicate, items) # correct > as > filter(items, predicate) # wrong

I think it is easy to guess that it will confuse people.

> but this is a noisy error.

Maybe, but not guaranteed to be. See above.

> I think the advantage of making the magic None > redundant outweighs this potential pitfall.

I think the opposite. -1 -- Terry Jan Reedy

From mertz at gnosis.cx Mon Aug 5 22:43:44 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 5 Aug 2013 13:43:44 -0700 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID:

On Mon, Aug 5, 2013 at 1:04 PM, Steven D'Aprano wrote: > * One Obvious Way / Only One Way -- there are already two ways to > calculate a sum (builtins.sum and math.fsum), no need for a third. > * Even if we can't, don't call it "sum", call it something else. > "precise_sum" was the only suggestion given so far. >

I think I am probably +1 on statistics.sum(). Actually, I think it should be called math.statistics.sum(), but that's a smaller issue. However, one name that hasn't been mentioned might be even better: statistics._sum().
You need it within the module, and it needs behavior that isn't quite the same as either builtins.sum() or math.fsum() (though it may utilize those internally). Calling it by such a "private" name seems to concord pretty well with Python naming conventions. Moreover, someone other than the module author CAN use it if they want to. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL:

From python at mrabarnett.plus.com Tue Aug 6 00:36:36 2013 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 05 Aug 2013 23:36:36 +0100 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: <520028F4.6030208@mrabarnett.plus.com>

On 05/08/2013 21:04, Steven D'Aprano wrote: > > Are there still objections to making statistics.sum public? If the only way to move forward is to make it a private implementation detail, I will do so, but I really think that I have built a better sum and hope to keep it as a public function. > > Show of hands please, +1 or -1 on statistics.sum. >

On balance, +1, as long as the help explains its purpose, of course. I was thinking that as there's a module called "math" (a short name) the statistics module should be called "stats", but then I remembered that there's already a module called "stat", which could be confusing...
From joshua at landau.ws Tue Aug 6 00:41:20 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 5 Aug 2013 23:41:20 +0100 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On 5 August 2013 21:04, Steven D'Aprano wrote: > Thanks to everyone who has given feedback, it has been very humbling and > informative. I have a revised proto-PEP just about ready for (hopefully > final) feedback, but before I do there is one potentially major stumbling > block: whether or not the statistics module should have it's own version of > sum. > ... > Are there still objections to making statistics.sum public? If the only > way to move forward is to make it a private implementation detail, I will > do so, but I really think that I have built a better sum and hope to keep > it as a public function. > > Show of hands please, +1 or -1 on statistics.sum. > +1 for any of [math.]statistics.[[precise]_]sum or equiv. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Aug 6 01:15:33 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 5 Aug 2013 19:15:33 -0400 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] Message-ID: On Mon, Aug 5, 2013 at 6:36 PM, MRAB wrote: > I was thinking that as there's a module called "math" (a short name) > the statistics module should be called "stats", but then I remembered > that there's already a module called "stat", which could be confusing... > FWIW, I would also prefer a shorter name (or even folding statistics into math altogether.) We already have pstat module and no-one complained that it can be confused with stat. In general, I don't like when stdlib steals good English words from the user. If stats does not work - consider statslib. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Tue Aug 6 01:23:26 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 6 Aug 2013 00:23:26 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: Message-ID: On 6 August 2013 00:15, Alexander Belopolsky wrote: > > On Mon, Aug 5, 2013 at 6:36 PM, MRAB wrote: > >> I was thinking that as there's a module called "math" (a short name) >> the statistics module should be called "stats", but then I remembered >> that there's already a module called "stat", which could be confusing... >> > > FWIW, I would also prefer a shorter name (or even folding statistics into > math altogether.) > > We already have pstat module and no-one complained that it can be confused > with stat. In general, I don't like when stdlib steals good English words > from the user. If stats does not work - consider statslib. > As someone who disagrees, what's wrong with "import statistics as stats"? Are you saying it would get in the way if you wanted your own statistics module? -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Aug 6 01:31:03 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 5 Aug 2013 19:31:03 -0400 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: Message-ID: On Mon, Aug 5, 2013 at 7:23 PM, Joshua Landau wrote: > As someone who disagrees, what's wrong with "import statistics as stats"? > Are you saying it would get in the way if you wanted your own statistics > module? No. More likely it will get in the way of the "statistics" variable. import statistics as stats # A few hundred lines later: stats.mean(statistics) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Aug 6 01:33:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Aug 2013 09:33:13 +1000 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: If filter accepted keyword arguments, then that would offer a much cleaner way to skip the predicate. That should become much easier to do (without speed consequences) once Clinic lands in the default branch. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Aug 6 01:46:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Aug 2013 09:46:47 +1000 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On 6 Aug 2013 08:43, "Joshua Landau" wrote: > > On 5 August 2013 21:04, Steven D'Aprano wrote: >> >> Thanks to everyone who has given feedback, it has been very humbling and informative. I have a revised proto-PEP just about ready for (hopefully final) feedback, but before I do there is one potentially major stumbling block: whether or not the statistics module should have it's own version of sum. > > ... >> >> Are there still objections to making statistics.sum public? If the only way to move forward is to make it a private implementation detail, I will do so, but I really think that I have built a better sum and hope to keep it as a public function. >> >> Show of hands please, +1 or -1 on statistics.sum. > > > +1 for any of [math.]statistics.[[precise]_]sum or equiv. +1 from me, too. "Ducktyping" in the case of the statistics module refers to something being a Real number (in the mathematical sense). Doing this kind of basic statistical analysis with complex numbers or containers that happen to implement a "+" operation doesn't make sense. 
(Note: one advantage of calling the new module math.statistics is that it leaves the door open to a possible future cmath.statistics, since there *are* some statistical operations that make sense with complex numbers, too) Cheers, Nick. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Tue Aug 6 02:01:40 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 06 Aug 2013 01:01:40 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: Message-ID: <52003CE4.9050408@mrabarnett.plus.com> On 06/08/2013 00:23, Joshua Landau wrote: > On 6 August 2013 00:15, Alexander Belopolsky > > > wrote: > > > On Mon, Aug 5, 2013 at 6:36 PM, MRAB > wrote: > > I was thinking that as there's a module called "math" (a short name) > the statistics module should be called "stats", but then I > remembered > that there's already a module called "stat", which could be > confusing... > > > FWIW, I would also prefer a shorter name (or even folding statistics > into math altogether.) > > We already have pstat module and no-one complained that it can be > confused with stat. In general, I don't like when stdlib steals > good English words from the user. If stats does not work - consider > statslib. > > > As someone who disagrees, what's wrong with "import statistics as > stats"? Are you saying it would get in the way if you wanted your own > statistics module? > Well, "statistics" is no longer than "subprocess" anyway. 
From alexander.belopolsky at gmail.com Tue Aug 6 02:11:44 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 5 Aug 2013 20:11:44 -0400 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52003CE4.9050408@mrabarnett.plus.com> References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: On Mon, Aug 5, 2013 at 8:01 PM, MRAB wrote: > Well, "statistics" is no longer than "subprocess" anyway. Right, but subprocess was never intended for use in the "calculator mode." -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Tue Aug 6 02:41:22 2013 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 6 Aug 2013 10:41:22 +1000 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <52000541.6040000@pearwood.info> Message-ID: <20130806004122.GA45267@cskk.homeip.net> On 06Aug2013 06:04, Steven D'Aprano wrote: | Show of hands please, +1 or -1 on statistics.sum. For me: +1 on statistics.sum, and +0.8 for calling it "sum" instead of the painfully cumbersome "precise_sum". Regarding the latter, it is only going to end up in code as "sum" if someone imports it; concerned coders can always "import sum as precise_sum" or the like. And since I've not said it, I'm +1 on the statistics module itself too. Cheers, -- Cameron Simpson From ethan at stoneleaf.us Tue Aug 6 03:12:47 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 05 Aug 2013 18:12:47 -0700 Subject: [Python-ideas] statistics module: where should it live? Message-ID: <52004D8F.60906@stoneleaf.us> Somebody suggested 'math.statistics' as that would leave the door open for a 'cmath.statistics'. 
+1 From joshua at landau.ws Tue Aug 6 03:40:04 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 6 Aug 2013 02:40:04 +0100 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806004122.GA45267@cskk.homeip.net> References: <52000541.6040000@pearwood.info> <20130806004122.GA45267@cskk.homeip.net> Message-ID: On 6 August 2013 01:41, Cameron Simpson wrote: > On 06Aug2013 06:04, Steven D'Aprano wrote: > | Show of hands please, +1 or -1 on statistics.sum. > > For me: +1 on statistics.sum, and +0.8 for calling it "sum" instead > of the painfully cumbersome "precise_sum". > > Regarding the latter, it is only going to end up in code as "sum" > if someone imports it; concerned coders can always "import sum as > precise_sum" or the like. > Just to be clear, in my view the purpose of naming it distinctly from builtins.sum was to aid in the discovery phase. Hence an "import as" is not an alternative. (I also feel like it's not a big deal, so I wouldn't feel offended if the matter was ignored.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Aug 6 04:10:37 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 06 Aug 2013 12:10:37 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51FBF02F.1000202@pearwood.info> <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52005B1D.8090100@pearwood.info> On 05/08/13 12:59, Stephen J. Turnbull wrote: > I couldn't find a list of functions proposed for inclusion in the > statistics package in the pre-PEP, only lists of functions in other > implementations that "suggest" the content of this package. Did I > miss something? Not really. I haven't seen the full public API of modules listed in other PEPs, so I didn't include it in mine. Perhaps I didn't look hard enough. 
Here's the current public API:

- add_partial Utility for performing high-precision sums.
- mean Arithmetic mean (average) of data.
- median Median (middle value) of data.
- median.high Median, taking the high value in ties.
- median.low Median, taking the low value in ties.
- median.grouped Median, adjusting for grouped data.
- mode Mode (most common value) of data.
- mode.collate Helper for mode.
- mode.extract Helper for mode.
- pstdev Population standard deviation of data.
- pvariance Population variance of data.
- StatisticsError Exception for statistics errors.
- stdev Sample standard deviation of data.
- sum High-precision sum of data.
- variance Sample variance of data.

After discussion with Oscar, I am leaning towards changing the API for mode, so mode.collate and mode.extract may not survive.

[...]

> And some of your > arguments are basically incorrect when considered from the standpoint > of *interpreting*, rather than *computing*, statistics: > > Steven D'Aprano writes: > > > - The built-in sum can lose accuracy when dealing with floats of wildly > > differing magnitude. Consequently, the above naive mean fails this > > "torture test" with an error of 100%: > > > > assert mean([1e30, 1, 3, -1e30]) == 1 > > 100%? This is a relative error of sqrt(2)*1e-30.

I don't understand your calculation here. Where are you getting the values 2 and 1e-30 from? The exact value of the arithmetic mean of the four values given is exactly 1. (Total of 4, divided by 4, is 1.) The calculated value is 0, which is an absolute error of 1, or a relative error of (1-0)/1 = 100%.

[...]
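Steven's arithmetic is easy to verify directly: the naive mean returns 0 because adding 1 (or 3) to 1e30 rounds straight back to 1e30, so the small addends vanish before the cancellation.

```python
import math

def naive_mean(data):
    # The textbook one-liner under discussion, built on builtins.sum.
    return sum(data) / len(data)

data = [1e30, 1, 3, -1e30]
approx = naive_mean(data)             # 1e30 + 1 rounds back to 1e30
exact = math.fsum(data) / len(data)   # fsum keeps the lost units

assert approx == 0.0
assert exact == 1.0
assert abs(exact - approx) / abs(exact) == 1.0   # 100% relative error
```

The relative error is measured against the true mean (1), not against the magnitude of the inputs (1e30), which is where the sqrt(2)*1e-30 figure went wrong.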
> So I would throw out all these appealing arguments that depend on > confounding numerical accuracy and statistical accuracy, and replace > it with a correct argument showing how precision does matter in > statistical interpretation: > > The first step in interpreting variation in data (including > dealing with ill-conditioned data) is standardization of the data > to a series with variance 1 (and often, mean 0). Standardization > requires accurate computation of the mean and standard deviation of > the raw series. However, naive computation of mean and standard > deviation can lose precision very quickly. Because precision > bounds accuracy, it is important to use the most precise possible > algorithms for computing mean and standard deviation, or the > results of standardization are themselves useless.

Thanks for the contribution.

[...]

> I also wonder about the utility of a "statistics" package that has no > functionality for presenting and operating on the most fundamental > "statistic" of all: the (empirical) distribution. Eg my own > statistics package will *never* suffer from ill-conditioned data (it's > only used for dealing with generated series of 10-100 data points with > a maximum dynamic range of about 100 to 1), but it's important for my > purposes to be able to flexibly deal with distributions (computing > modes and arbitrary percentiles, "bootstrap" random functions, > recognize multimodality, generate histograms, etc). That's only an > example, specific to teaching (and I use spreadsheets and R, not > Python, for demonstrations of actual computational applications).

It's early days, and it is better to start the module small and grow it than to try to fit everything and the kitchen sink in from Day One.

> I think the wide variety of applications of distributions merits > consideration of their inclusion in a "batteries included" statistical > package.

I'm happy to discuss this further with you off-list.
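Stephen's standardization step can be sketched in a few lines; this naive two-pass version (population variance, `math.fsum` for the sums) is an illustration only, and the name `standardize` is invented here rather than part of any proposed API.

```python
import math

def standardize(data):
    # Rescale data to mean 0 and (population) standard deviation 1.
    xs = list(data)
    mu = math.fsum(xs) / len(xs)
    sigma = math.sqrt(math.fsum((x - mu) ** 2 for x in xs) / len(xs))
    return [(x - mu) / sigma for x in xs]
```

His point is that the quality of the z-scores this produces is bounded by the precision of the mean and standard deviation feeding them, which is the argument for an accurate internal sum.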
-- Steven From rymg19 at gmail.com Tue Aug 6 04:21:03 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 05 Aug 2013 21:21:03 -0500 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: <0de4705f-c7d6-493f-a264-a2b38012fa72@email.android.com> +99. In Python, if you don't have to put multiple lines of code for a single task, you shouldn't have to. That's my opinion, at least. What other stuff might this statistics module have? It's own array format like NumPy? I'm curious. Steven D'Aprano wrote: >On 03/08/13 03:45, Steven D'Aprano wrote: >> I have raised an issue on the tracker to add a statistics module to >Python's standard library: >> >> http://bugs.python.org/issue18606 > > >Thanks to everyone who has given feedback, it has been very humbling >and informative. I have a revised proto-PEP just about ready for >(hopefully final) feedback, but before I do there is one potentially >major stumbling block: whether or not the statistics module should have >it's own version of sum. > > >Against the idea >---------------- > > >* One Obvious Way / Only One Way -- there are already two ways to >calculate a sum (builtins.sum and math.fsum), no need for a third. > >* Even if there is a need, we should aim to fix the problems with the >existing sum functions rather than add a third. > >* Even if we can't, don't call it "sum", call it something else. >"precise_sum" was the only suggestion given so far. > >(If I have missed any objections, I apologize.) > > > >In favour >--------- > >* Speaking as the module author, it is my considered opinion that I >cannot (easily, or at all) get the behaviour I expect from the >statistics module using the existing sum functions without a lot of >pain. See below. > >* For backward compatibility, I don't think we can change the existing >sum functions. 
E.g.: > > - built-in sum accepts non-numeric values if they support > the + operator, that won't change before Python 4000; > > - built-in sum can be inaccurate with floats; > > - math.fsum coerces everything to float. > >* Even if we could change one of the existing sum functions (math.fsum >is probably the better candidate) I personally don't know enough C to >do so. Either somebody else steps up and volunteers, or any such change >is deferred indefinitely. > > >Now that Decimal has an accelerated C version in CPython, it is more >important than ever before to treat it (and Fraction) as first-class >numeric types, and avoid coercing them to float unless necessary. So I >consider it a Must Have that stats functions support Decimal and >Fraction data without unnecessarily converting them to floats. This >rules out fsum: > >py> from decimal import Decimal as D >py> data = [D("0.1"), D("0.3")] >py> math.fsum(data) >0.4 >py> statistics.sum(data) >Decimal('0.4') > > >On the other hand, the built-in sum is demonstrably inaccurate with >floats, which is why fsum exists in the first place: > >py> data = [1e100, 1, -1e100, 1] >py> sum(data) >1.0 >py> math.fsum(data) >2.0 >py> statistics.sum(data) >2.0 > > >Consequently the statistics module includes its own version of sum. >Never mind the implementation, that may change in the future. >Regardless of the implementation, the interface of statistics.sum is >distinct from both of the existing sum functions. There are three >versions of sum because they each do different things. > >Currently, I can do this, both internally within other functions such >as mean, and externally, when I just want a total: > >total = statistics.sum(data) > >and get the right result regardless of the numeric type of data[1]. >Without it, I have to do something like this: > ># Make sure data is a list, and not an iterator.
>if any(isinstance(x, float) for x in data): > total = math.fsum(data) >else: > total = sum(data) > > >Are there still objections to making statistics.sum public? If the only >way to move forward is to make it a private implementation detail, I >will do so, but I really think that I have built a better sum and hope >to keep it as a public function. > >Show of hands please, +1 or -1 on statistics.sum. > > > > >[1] Well, not complex numbers. > >-- >Steven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Tue Aug 6 04:26:33 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 06 Aug 2013 03:26:33 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <52005B1D.8090100@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> <52005B1D.8090100@pearwood.info> Message-ID: <52005ED9.2080905@mrabarnett.plus.com> On 06/08/2013 03:10, Steven D'Aprano wrote: > On 05/08/13 12:59, Stephen J. Turnbull wrote: >> I couldn't find a list of functions proposed for inclusion in the >> statistics package in the pre-PEP, only lists of functions in other >> implementations that "suggest" the content of this package. Did I >> miss something? > > Not really. I haven't seen the full public API of modules listed in other PEPs, so I didn't include it in mine. Perhaps I didn't look hard enough. > > Here's the current public API: > > - add_partial Utility for performing high-precision sums. > - mean Arithmetic mean (average) of data. > - median Median (middle value) of data. > - median.high Median, taking the high value in ties. > - median.low Median, taking the low value in ties. 
> - median.grouped Median, adjusting for grouped data. > - mode Mode (most common value) of data. > - mode.collate Helper for mode. > - mode.extract Helper for mode. > - pstdev Population standard deviation of data. > - pvariance Population variance of data. How about "popstdev" and "popvariance" instead? The "p" is not as clear to me as "pop". > - StatisticsError Exception for statistics errors. > - stdev Sample standard deviation of data. > - sum High-precision sum of data. > - variance Sample variance of data. > [snip] From joshua at landau.ws Tue Aug 6 04:29:50 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 6 Aug 2013 03:29:50 +0100 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <0de4705f-c7d6-493f-a264-a2b38012fa72@email.android.com> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> <0de4705f-c7d6-493f-a264-a2b38012fa72@email.android.com> Message-ID: On 6 August 2013 03:21, Ryan wrote: > +99. In Python, if you don't have to put multiple lines of code for a > single task, you shouldn't have to. That's my opinion, at least. > > What other stuff might this statistics module have? It's own array format > like NumPy? I'm curious. > What would be the purpose of that? The module purposefully doesn't even do multivariate data yet so multidimensional arrays are irrelevant and typed arrays are something else entirely. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Tue Aug 6 05:11:38 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 5 Aug 2013 20:11:38 -0700 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On Aug 5, 2013, at 16:46, Nick Coghlan wrote: > Note: one advantage of calling the new module math.statistics is that it leaves the door open to a possible future cmath.statistics, since there *are* some statistical operations that make sense with complex numbers, too) Actually, a _lot_ of the operations make sense with complex numbers--sum and mean, most obviously. Even sum of squares, variance, stdev, etc. have unambiguous meanings, even if they're not as obviously useful. So... What happens if you call them on complex numbers. (I'd be happy with either "it works when it makes sense" or "TypeError", I'm just curious which.) Anyway, +1 on the overall idea, +0.5 on the simple name "sum", and I guess +0 or -0 on making it math.statistics instead of just statistics depending on the answer to the previous question. From abarnert at yahoo.com Tue Aug 6 05:14:15 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 5 Aug 2013 20:14:15 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <52005ED9.2080905@mrabarnett.plus.com> References: <51FBF02F.1000202@pearwood.info> <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> <52005B1D.8090100@pearwood.info> <52005ED9.2080905@mrabarnett.plus.com> Message-ID: On Aug 5, 2013, at 19:26, MRAB wrote: > On 06/08/2013 03:10, Steven D'Aprano wrote: >> On 05/08/13 12:59, Stephen J. Turnbull wrote: >>> I couldn't find a list of functions proposed for inclusion in the >>> statistics package in the pre-PEP, only lists of functions in other >>> implementations that "suggest" the content of this package. Did I >>> miss something? >> >> Not really. 
I haven't seen the full public API of modules listed in other PEPs, so I didn't include it in mine. Perhaps I didn't look hard enough. >> >> Here's the current public API: >> >> - add_partial Utility for performing high-precision sums. >> - mean Arithmetic mean (average) of data. >> - median Median (middle value) of data. >> - median.high Median, taking the high value in ties. >> - median.low Median, taking the low value in ties. >> - median.grouped Median, adjusting for grouped data. >> - mode Mode (most common value) of data. >> - mode.collate Helper for mode. >> - mode.extract Helper for mode. >> - pstdev Population standard deviation of data. >> - pvariance Population variance of data. > > How about "popstdev" and "popvariance" instead? The "p" is not as clear > to me as "pop". It took me a second to figure out how to parse "popstdev" into the correct three words. I'm guessing that will be a one-time cost that most people who need the function will never even notice... But I think the same may be true for learning what pstdev means. > >> - StatisticsError Exception for statistics errors. >> - stdev Sample standard deviation of data. >> - sum High-precision sum of data. >> - variance Sample variance of data. > [snip] > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From rymg19 at gmail.com Tue Aug 6 06:55:50 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 05 Aug 2013 23:55:50 -0500 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> <0de4705f-c7d6-493f-a264-a2b38012fa72@email.android.com> Message-ID: There is no purpose. It was just an example. @whoeveriswritingthemodule: I'll be happy to help if needed in coding or reStructuredText(i.e. documentation). 
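For readers skimming the thread: the fallback Steven describes upthread (math.fsum when floats are present, the built-in sum otherwise) can be packaged as a small helper. This is only a sketch of the idea under discussion; `precise_sum` is an illustrative name, not the module's actual API or implementation.

```python
import math
from decimal import Decimal
from fractions import Fraction

def precise_sum(data):
    # Sketch only: use math.fsum when any float is present (for accuracy);
    # otherwise use the built-in sum, which is exact for ints and Fractions
    # and keeps Decimals as Decimals instead of coercing to float.
    data = list(data)  # make sure data is a list, not an iterator
    if any(isinstance(x, float) for x in data):
        return math.fsum(data)
    return sum(data)

print(precise_sum([1e100, 1, -1e100, 1]))             # prints 2.0, not 1.0
print(precise_sum([Decimal("0.1"), Decimal("0.3")]))  # prints 0.4
print(precise_sum([Fraction(1, 3), Fraction(1, 6)]))  # prints 1/2
```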
Joshua Landau wrote: >On 6 August 2013 03:21, Ryan wrote: > >> +99. In Python, if you don't have to put multiple lines of code for a >> single task, you shouldn't have to. That's my opinion, at least. >> >> What other stuff might this statistics module have? It's own array >format >> like NumPy? I'm curious. >> > >What would be the purpose of that? The module purposefully doesn't even >do >multivariate data yet so multidimensional arrays are irrelevant and >typed >arrays are something else entirely. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Tue Aug 6 07:22:28 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Tue, 6 Aug 2013 07:22:28 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: What about the fraction module? It could be in math but it isn't. It could also be frac or fracs, but it is not. For consistency the module name IMHO should be statistics. Il giorno 06/ago/2013 02:12, "Alexander Belopolsky" < alexander.belopolsky at gmail.com> ha scritto: > > On Mon, Aug 5, 2013 at 8:01 PM, MRAB wrote: > >> Well, "statistics" is no longer than "subprocess" anyway. > > > Right, but subprocess was never intended for use in the "calculator mode." > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua at landau.ws Tue Aug 6 07:31:17 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 6 Aug 2013 06:31:17 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: On 6 August 2013 06:22, Michele Lacchia wrote: > > Il giorno 06/ago/2013 02:12, "Alexander Belopolsky" ha scritto: >> >> On Mon, Aug 5, 2013 at 8:01 PM, MRAB wrote: >>> >>> Well, "statistics" is no longer than "subprocess" anyway. >> >> Right, but subprocess was never intended for use in the "calculator mode." > > What about the fraction module? It could be in math but it isn't. It could also be frac or fracs, but it is not. For consistency the module name IMHO should be statistics. To be fair that's a false analogy as math is module for mathematical functions (of which statistics is a subset) and fractions contains numeric types. Additionally, whether things are inside other modules hasn't been very constant over time -- there are several cases where modules were combined and spliced although mostly in the Python 2 to 3 border. From ncoghlan at gmail.com Tue Aug 6 07:42:39 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Aug 2013 15:42:39 +1000 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: On 6 August 2013 15:31, Joshua Landau wrote: > On 6 August 2013 06:22, Michele Lacchia wrote: >> >> Il giorno 06/ago/2013 02:12, "Alexander Belopolsky" ha scritto: >>> >>> On Mon, Aug 5, 2013 at 8:01 PM, MRAB wrote: >>>> >>>> Well, "statistics" is no longer than "subprocess" anyway. >>> >>> Right, but subprocess was never intended for use in the "calculator mode." >> >> What about the fraction module? It could be in math but it isn't. It could also be frac or fracs, but it is not. For consistency the module name IMHO should be statistics. 
> > To be fair that's a false analogy as math is module for mathematical > functions (of which statistics is a subset) and fractions contains > numeric types. > > Additionally, whether things are inside other modules hasn't been very > constant over time -- there are several cases where modules were > combined and spliced although mostly in the Python 2 to 3 border. The main reason we've switched to nesting things inside other namespaces is when a name is somewhat ambiguous on its own, or risks a name clash with a PyPI project. concurrent.futures is so named to help avoid confusion with the finance industry notion of "futures" unittest.mock avoids colliding with the PyPI original In this case, since Steven's module doesn't handle complex numbers (as I understand it), putting it inside the "math" namespace helps make that clear. If it does handle complex numbers where appropriate, then the top level name would make more sense. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From raymond.hettinger at gmail.com Tue Aug 6 08:34:48 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 5 Aug 2013 23:34:48 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <3464DB8E-B684-44F0-9D69-5F1A4E529417@gmail.com> On Aug 5, 2013, at 10:32 AM, Markus Unterwaditzer wrote: > -0, for reasons already mentioned. While i agree that filter(None, items) is counterintuitive, filter(bool, items) looks very readable to me. > > I think modifying the behavior of filter like this will at most trick users into thinking they need to specify the iterable first. I concur. Put me down for a -1. Raymond -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Tue Aug 6 08:53:50 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 6 Aug 2013 07:53:50 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On Mon, Aug 5, 2013 at 8:46 AM, Peter Otten <__peter__ at web.de> wrote: > filter(items) > > looks much cleaner than > > filter(None, items) > > and is easy to understand. Fewer people would use alternative spellings like > > filter(bool, items) As an alternative (and simpler) to the proposal currently on the table: Would it improve clarity if the magic of None were instead applied to 'bool'? The third spelling here would thus be blessed with a usage recommendation and possible performance improvement (eliminating a redundant call, but only if the arg 'is bool'), while still having most of the readability that the one-arg form aims for. (Obviously backward compat will mean that None will continue to have its magic, but it would be the less-recommended option. New code would be encouraged to use filter(bool,...) which would have the exact same effect.) ChrisA From __peter__ at web.de Tue Aug 6 09:28:29 2013 From: __peter__ at web.de (Peter Otten) Date: Tue, 06 Aug 2013 09:28:29 +0200 Subject: [Python-ideas] Allow filter(items) References: Message-ID: Chris Angelico wrote: > On Mon, Aug 5, 2013 at 8:46 AM, Peter Otten > <__peter__ at web.de> wrote: >> filter(items) >> >> looks much cleaner than >> >> filter(None, items) >> >> and is easy to understand. Fewer people would use alternative spellings >> like >> >> filter(bool, items) > > As an alternative (and simpler) to the proposal currently on the > table: Would it improve clarity if the magic of None were instead > applied to 'bool'? I was really thinking of the Python side. There is a first_true() function as a candidate for inclusion into itertools which is basically next(filter(None, items), None) and I felt that e. g. 
next(filter(None, (line.strip() for line in file)), "") warrants a new function while next(filter(line.strip() for line in file), "") does not. Looks like it won't fly... The implementation already has an optimization to treat bool like None -- the former is never called: [Python/bltinmodule.c] if (lz->func == Py_None || lz->func == (PyObject *)&PyBool_Type) { ok = PyObject_IsTrue(item); This is done on every next() call, and while my instinct would be to translate PyBool_Type to Py_None once in the constructor I doubt that this has a noticeable impact on performance. > The third spelling here would thus be blessed with > a usage recommendation and possible performance improvement > (eliminating a redundant call, but only if the arg 'is bool'), while > still having most of the readability that the one-arg form aims for. > (Obviously backward compat will mean that None will continue to have > its magic, but it would be the less-recommended option. New code would > be encouraged to use filter(bool,...) which would have the exact same > effect.) > > ChrisA From rosuav at gmail.com Tue Aug 6 09:34:32 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 6 Aug 2013 08:34:32 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On Tue, Aug 6, 2013 at 8:28 AM, Peter Otten <__peter__ at web.de> wrote: > The implementation already has an optimization to treat bool like None -- > the former is never called: > > [Python/bltinmodule.c] > > if (lz->func == Py_None || lz->func == (PyObject *)&PyBool_Type) { > ok = PyObject_IsTrue(item); > > This is done on every next() call, and while my instinct would be to > translate PyBool_Type to Py_None once in the constructor I doubt that this > has a noticeable impact on performance. Okay. Sounds like there's already an answer to those who want more readability: Just use filter(bool,...). Maybe I'm just not seeing the obvious problem with this version? 
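Concretely, None and bool behave identically as a filter predicate, and the first_true() candidate Peter mentions is a one-liner on top of that. The helper below is a simplified sketch of the itertools recipe, not a stdlib function:

```python
def first_true(iterable, default=None):
    # simplified itertools recipe: first truthy item, else default
    return next(filter(None, iterable), default)

lines = ["", "  \n", "spam\n", "eggs\n"]
print(first_true(line.strip() for line in lines))   # prints spam
print(first_true([0, 0.0, ""], default="nothing"))  # prints nothing

# None and bool select exactly the same items as the predicate
items = [0, 1, "", "x", [], [2]]
assert list(filter(None, items)) == list(filter(bool, items)) == [1, "x", [2]]
```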
ChrisA From dickinsm at gmail.com Tue Aug 6 09:49:49 2013 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 6 Aug 2013 08:49:49 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: The complex numbers argument seems like a red herring to me: I can't imagine why we'd ever want a cmath.statistics module. I can't see any problem with having future functions in math.statistics that can either take float inputs and return a float result, or take complex inputs and return a complex result. One of the biggest reasons that it's useful to have both math.sqrt and cmath.sqrt is that the latter returns complex numbers for negative *float* inputs, and that's going to be undesirable for many (but desirable for some). I can't think of any kind of statistics that would need that kind of separation. Any of math.statistics, math.stats, statistics or stats works for me. My (weak) preference would be for a 'statistics' top-level module. Mark On Tue, Aug 6, 2013 at 6:42 AM, Nick Coghlan wrote: > On 6 August 2013 15:31, Joshua Landau wrote: > > On 6 August 2013 06:22, Michele Lacchia > wrote: > >> > >> Il giorno 06/ago/2013 02:12, "Alexander Belopolsky" < > alexander.belopolsky at gmail.com> ha scritto: > >>> > >>> On Mon, Aug 5, 2013 at 8:01 PM, MRAB > wrote: > >>>> > >>>> Well, "statistics" is no longer than "subprocess" anyway. > >>> > >>> Right, but subprocess was never intended for use in the "calculator > mode." > >> > >> What about the fraction module? It could be in math but it isn't. It > could also be frac or fracs, but it is not. For consistency the module name > IMHO should be statistics. > > > > To be fair that's a false analogy as math is module for mathematical > > functions (of which statistics is a subset) and fractions contains > > numeric types. 
> > > > Additionally, whether things are inside other modules hasn't been very > > constant over time -- there are several cases where modules were > > combined and spliced although mostly in the Python 2 to 3 border. > > The main reason we've switched to nesting things inside other > namespaces is when a name is somewhat ambiguous on its own, or risks a > name clash with a PyPI project. > > concurrent.futures is so named to help avoid confusion with the > finance industry notion of "futures" > unittest.mock avoids colliding with the PyPI original > > In this case, since Steven's module doesn't handle complex numbers (as > I understand it), putting it inside the "math" namespace helps make > that clear. If it does handle complex numbers where appropriate, then > the top level name would make more sense. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Aug 6 10:23:05 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 06 Aug 2013 11:23:05 +0300 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: 06.08.13 10:34, Chris Angelico wrote: > Okay. Sounds like there's already an answer to those who want more > readability: Just use filter(bool,...). Maybe I'm just not seeing the > obvious problem with this version? Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? From rosuav at gmail.com Tue Aug 6 10:25:32 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 6 Aug 2013 09:25:32 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On Tue, Aug 6, 2013 at 9:23 AM, Serhiy Storchaka wrote: > 06.08.13 10:34, Chris Angelico wrote: > >> Okay.
Sounds like there's already an answer to those who want more >> readability: Just use filter(bool,...). Maybe I'm just not seeing the >> obvious problem with this version? > > Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? They're more readable than 'if None(...)' is. ChrisA From raymond.hettinger at gmail.com Tue Aug 6 10:25:51 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 6 Aug 2013 01:25:51 -0700 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52003CE4.9050408@mrabarnett.plus.com> References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: <9A7089A9-22AA-46C9-BACF-BF4A8E320051@gmail.com> On Aug 5, 2013, at 5:01 PM, MRAB wrote: > Well, "statistics" is no longer than "subprocess" anyway. It also has the advantage of being obvious. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Aug 6 11:01:58 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 6 Aug 2013 02:01:58 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <509A16D4-FA3F-4315-9D66-DD5E3BB8BA45@umbrellacode.com> On Aug 6, 2013, at 1:25 AM, Chris Angelico wrote: >> Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? That highlights the repetitive, rather than explicit, nature of bool in this application. The fact is that None symbolized the identity function (as does bool, it would appear) in this case, and it makes perfect sense to replace the identity function f(x) and input x with x itself; it also makes perfect sense to have a filter function that operates on a collection without a predicate, making predicate an optional transformation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Aug 6 11:02:06 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Tue, 06 Aug 2013 18:02:06 +0900 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <51FFD506.7080608@pearwood.info> Message-ID: <87txj3kx9t.fsf@uwakimon.sk.tsukuba.ac.jp> Oscar Benjamin writes: > >> It's also not common AFAIK in other statistical packages > >> (at least not under the name mode). > > > > Press et al claim it is poorly known, but much better than the > > binning method. It saddens me that twenty years on, it's still > > poorly known. In what sense is it "better" than the binning method? If you're working with tax data or subsidy data, your bins will be given to you (the brackets). Similarly for geographical data (political boundaries), and so on. I've almost never found choice of bins to be a problem (but my use cases are such that either the bins are given or they don't much matter because there's enough data to approximate a density graphically). Does it properly identify multiple modes (preferably including lower peaks), or does it involve a single-peakedness assumption? > My preference really is just that modes() returns a list of all > modes and the user should decide what to do with however many > values they get back. +1 It might be useful to have helper functions or methods to make common selections. > The other thing is about this idea that if all values are equally > common then there is "no mode". I want to say that every value is a > mode rather than none. +1 One of the things that I teach my students is that the mode (and median) always exist, but in some distributions they're not very informative. I'd be disappointed if that teaching were falsified in Python's stdlib. I also hate edge cases like this: > Otherwise you get strange differences between > e.g.: [2,2,3,3,4,4] and [1,2,2,3,3,4,4]. I've checked on the interweb > though and it seems that most people disagree with me on this point so > never mind!
In fact, in my biased sample (math-averse, math-differently-abled MBA students), invariably students start out by saying there is no mode unless it's unique, are convinced of the existence of multimodalness by examples involving physical dimensions of men and women when aggregated as "human beings", and most look at examples like Oscar's and are convinced that "there is no unique mode" and "every value is modal" are the best ways to speak of these edge cases. From clay.sweetser at gmail.com Tue Aug 6 12:33:42 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Tue, 6 Aug 2013 06:33:42 -0400 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On Aug 5, 2013 4:07 PM, "Steven D'Aprano" wrote: > > Show of hands please, +1 or -1 on statistics.sum. > > > > > [1] Well, not complex numbers. > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Aug 6 14:18:09 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Aug 2013 14:18:09 +0200 Subject: [Python-ideas] statistics module: where should it live? References: <52004D8F.60906@stoneleaf.us> Message-ID: <20130806141809.1c1c7a8c@pitrou.net> Le Mon, 05 Aug 2013 18:12:47 -0700, Ethan Furman wrote: > Somebody suggested 'math.statistics' as that would leave the door > open for a 'cmath.statistics'. I'm absolutely -10 (*) on any temptation to build thematic packages like that. "urllib.request" and friends are a disaster when it comes to discovering, remembering and typing those names. "Flat is better than nested".
Kudos to Tim Peters for understanding why it is so, even before Python tried to do differently :-) (*) You may interpret it in binary form if you like :-) Regards Antoine. From oscar.j.benjamin at gmail.com Tue Aug 6 14:58:02 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 6 Aug 2013 13:58:02 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <87txj3kx9t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51FBF02F.1000202@pearwood.info> <51FFD506.7080608@pearwood.info> <87txj3kx9t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6 August 2013 10:02, Stephen J. Turnbull wrote: > Oscar Benjamin writes: > > > >> It's also not common AFAIK in other statistical packages > > >> (at least not under the name mode). > > > > > > Press et al claim it is poorly known, but much better than the > > > binning method. It saddens me that twenty years on, it's still > > > poorly known. > > In what sense is it "better" than the binning method? The book that Steven is referencing is written (primarily) for the benefit of scientists. I think it is expected that if you're trying to estimate the mode of a continuously distributed quantity then it is because, say, you have experimental data from a skewed distribution. I'm not sure though as I've just borrowed a 1999 edition (in C) from a colleague's desk and this particular method/algorithm isn't included (it doesn't give any method to compute the mode). > If you're > working with tax data or subsidy data, your bins will be given to you > (the brackets). That's a good point. It would be useful if a mode function could use the appropriate bins where they are predetermined. Of course you can bin them yourself and call modes(). Scipy/Matlab etc. provide the bin-counting functionality separately under hist or histogram rather than mode. > Similarly for geographical data (political > boundaries), and so on. It's definitely your job to bin those!
> I've almost never found choice of bins to be > a problem (but my use cases are such that either the bins are given or > they don't much matter because there's enough data to approximate a > density graphically). > > Does it properly identify multiple modes (preferably including lower > peaks), or does it involve a single-peakedness assumption? It doesn't assume single-peakedness. There are a couple of strategies for identifying possible additional modes after finding the first (see mode.extract). > > My preference really is just that modes() returns a list of all > > modes and the user should decide what to do with however many > > values they get back. > > +1 > > I might be useful to have helper functions or methods to make common > selections. Perhaps modes() could return all modes and mode() could return 1 if there's exactly 1 or otherwise raise an error. Oscar From oscar.j.benjamin at gmail.com Tue Aug 6 14:59:54 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 6 Aug 2013 13:59:54 +0100 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On 5 August 2013 21:04, Steven D'Aprano wrote: > > Show of hands please, +1 or -1 on statistics.sum. > +1 From oscar.j.benjamin at gmail.com Tue Aug 6 15:09:58 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 6 Aug 2013 14:09:58 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: On 6 August 2013 08:49, Mark Dickinson wrote: > The complex numbers argument seems like a red herring to me: I can't > imagine why we'd ever want a cmath.statistics module. 
I can't see any > problem with having future functions in math.statistics that can either take > float inputs and return a float result, or take complex inputs and return a > complex result. The complex numbers argument is a red herring. Steven's module already accepts whatever types make sense in the appropriate context, including complex numbers: >>> from statistics import * >>> mean([1j, 2j, 3j]) 2j >>> mode([1, 2, 3j, 3j]) 3j >>> median([1j, 2j]) Traceback (most recent call last): File "", line 1, in File ".\statistics.py", line 462, in __new__ data = sorted(data) TypeError: unorderable types: complex() < complex() >>> from fractions import Fraction >>> dicerolls = list(range(1, 6+1)) >>> dicerolls [1, 2, 3, 4, 5, 6] >>> mean(dicerolls) 3.5 >>> fdice = [Fraction(n) for n in dicerolls] >>> fdice [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1), Fraction(4, 1), Fraction(5, 1), Fraction(6, 1)] >>> mean(fdice) Fraction(7, 2) >>> print(mean(fdice)) 7/2 >>> print(pvariance(fdice)) 35/12 >>> mode('abracadabra') 'a' >>> median.low(['spam', 'ham', 'eggs']) 'ham' >>> from datetime import datetime, timedelta >>> now = datetime.now() >>> dates = [now + timedelta(days=n) for n in range(5)] >>> dates [datetime.datetime(2013, 8, 6, 14, 5, 35, 614027), datetime.datetime(2013, 8, 7, 14, 5, 35, 614027), datetime.datetime(2013, 8, 8, 14, 5, 35, 614027), datetime.datetime(2013, 8, 9, 14, 5, 35, 614027), datetime.datetime(2013, 8, 10, 14, 5, 35, 614027)] >>> median.low(dates) datetime.datetime(2013, 8, 8, 14, 5, 35, 614027) Oscar From solipsis at pitrou.net Tue Aug 6 15:26:00 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Aug 2013 15:26:00 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] References: <52003CE4.9050408@mrabarnett.plus.com> Message-ID: <20130806152600.3293aa71@pitrou.net> Le Tue, 6 Aug 2013 15:42:39 +1000, Nick Coghlan a ?crit : > > The main reason we've switched to nesting things inside other > 
namespaces is when a name is somewhat ambiguous on its own, or risks a > name clash with a PyPI project. > > concurrent.futures is so named to help avoid confusion with the > finance industry notion of "futures" Which is completely silly. Polysemy is a fact of life in any natural language and disambiguation is done through context, not by being extremely wordy like a Java programmer. Regards Antoine. From mal at egenix.com Tue Aug 6 15:46:00 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 06 Aug 2013 15:46:00 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806152600.3293aa71@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> Message-ID: <5200FE18.2000409@egenix.com> On 06.08.2013 15:26, Antoine Pitrou wrote: > Le Tue, 6 Aug 2013 15:42:39 +1000, > Nick Coghlan a > ?crit : >> >> The main reason we've switched to nesting things inside other >> namespaces is when a name is somewhat ambiguous on its own, or risks a >> name clash with a PyPI project. >> >> concurrent.futures is so named to help avoid confusion with the >> finance industry notion of "futures" > > Which is completely silly. Polysemy is a fact of life in any natural > language and disambiguation is done through context, not by being > extremely wordy like a Java programmer. While context works in natural languages (and computers have a hard time understanding it :-)), it doesn't work for the Python import mechanism, so I don't follow you. Python has grown a lot since the days most of the stdlib modules/packages were added, so we have to pay more attention to name clashes. "math.statistics" looks like a decent name, IMO. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Tue Aug 6 15:55:54 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Aug 2013 15:55:54 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> Message-ID: <20130806155554.1524fde0@pitrou.net> Le Tue, 06 Aug 2013 15:46:00 +0200, "M.-A. Lemburg" a ?crit : > > Python has grown a lot since the days most of the stdlib > modules/packages were added, so we have to pay more attention > to name clashes. Of course we can pay attention to name clashes. This is done through checking at PyPI, though, not by speculating that someone may think "finance" when they encounter the word "futures". Regards Antoine. From oscar.j.benjamin at gmail.com Tue Aug 6 15:59:06 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 6 Aug 2013 14:59:06 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On 5 August 2013 08:46, Peter Otten <__peter__ at web.de> wrote: > filter(items) > > looks much cleaner than > > filter(None, items) > > and is easy to understand. I agree. I don't use filter very often and when I do I always have to think carefully about the order of the arguments. I'd prefer it if it were more like sort etc.: filter(numbers, key=lambda x: x<5) Oscar From mal at egenix.com Tue Aug 6 16:19:18 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 06 Aug 2013 16:19:18 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806155554.1524fde0@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> Message-ID: <520105E6.1010800@egenix.com> On 06.08.2013 15:55, Antoine Pitrou wrote: > Le Tue, 06 Aug 2013 15:46:00 +0200, > "M.-A. Lemburg" a ?crit : >> >> Python has grown a lot since the days most of the stdlib >> modules/packages were added, so we have to pay more attention >> to name clashes. > > Of course we can pay attention to name clashes. This is done through > checking at PyPI, though, not by speculating that someone may think > "finance" when they encounter the word "futures". True. Apart from avoiding name clashes, I think adding a bit of extra context by means of placing the module into package also helps people trying to determine the meaning of the module. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Tue Aug 6 16:23:18 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Aug 2013 16:23:18 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> Message-ID: <20130806162318.79347d00@pitrou.net> Le Tue, 06 Aug 2013 16:19:18 +0200, "M.-A. Lemburg" a ?crit : > On 06.08.2013 15:55, Antoine Pitrou wrote: > > Le Tue, 06 Aug 2013 15:46:00 +0200, > > "M.-A. Lemburg" a ?crit : > >> > >> Python has grown a lot since the days most of the stdlib > >> modules/packages were added, so we have to pay more attention > >> to name clashes. > > > > Of course we can pay attention to name clashes. This is done through > > checking at PyPI, though, not by speculating that someone may think > > "finance" when they encounter the word "futures". > > True. > > Apart from avoiding name clashes, I think adding a bit of extra > context by means of placing the module into package also helps people > trying to determine the meaning of the module. Well, "statistics" sounds clear enough to me :-) (like "logging" or "unittest") Regards Antoine. From ncoghlan at gmail.com Tue Aug 6 16:33:25 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Aug 2013 00:33:25 +1000 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806162318.79347d00@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> Message-ID: On 7 August 2013 00:23, Antoine Pitrou wrote: > Le Tue, 06 Aug 2013 16:19:18 +0200, > "M.-A. 
Lemburg" a ?crit : > >> On 06.08.2013 15:55, Antoine Pitrou wrote: >> > Le Tue, 06 Aug 2013 15:46:00 +0200, >> > "M.-A. Lemburg" a ?crit : >> >> >> >> Python has grown a lot since the days most of the stdlib >> >> modules/packages were added, so we have to pay more attention >> >> to name clashes. >> > >> > Of course we can pay attention to name clashes. This is done through >> > checking at PyPI, though, not by speculating that someone may think >> > "finance" when they encounter the word "futures". >> >> True. >> >> Apart from avoiding name clashes, I think adding a bit of extra >> context by means of placing the module into package also helps people >> trying to determine the meaning of the module. > > Well, "statistics" sounds clear enough to me :-) > (like "logging" or "unittest") Yeah, Steven's own stats module on PyPI is the main name clash we need to avoid, and "statistics" handles that nicely. The other nice thing about using the top level name is that converting math to a package would be a pain, so +1 for the simple option :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Tue Aug 6 17:12:33 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 06 Aug 2013 17:12:33 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806162318.79347d00@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> Message-ID: <52011261.40809@egenix.com> On 06.08.2013 16:23, Antoine Pitrou wrote: > Le Tue, 06 Aug 2013 16:19:18 +0200, > "M.-A. Lemburg" a ?crit : > >> On 06.08.2013 15:55, Antoine Pitrou wrote: >>> Le Tue, 06 Aug 2013 15:46:00 +0200, >>> "M.-A. 
Lemburg" a ?crit : >>>> >>>> Python has grown a lot since the days most of the stdlib >>>> modules/packages were added, so we have to pay more attention >>>> to name clashes. >>> >>> Of course we can pay attention to name clashes. This is done through >>> checking at PyPI, though, not by speculating that someone may think >>> "finance" when they encounter the word "futures". >> >> True. >> >> Apart from avoiding name clashes, I think adding a bit of extra >> context by means of placing the module into package also helps people >> trying to determine the meaning of the module. > > Well, "statistics" sounds clear enough to me :-) > (like "logging" or "unittest") Sure. I was thinking of the more exotic "futures" or often used terms such as "request" that can benefit from added context. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Tue Aug 6 17:18:57 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Aug 2013 17:18:57 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52011261.40809@egenix.com> Message-ID: <20130806171857.1053acb8@pitrou.net> Le Tue, 06 Aug 2013 17:12:33 +0200, "M.-A. Lemburg" a ?crit : > On 06.08.2013 16:23, Antoine Pitrou wrote: > > Le Tue, 06 Aug 2013 16:19:18 +0200, > > "M.-A. Lemburg" a ?crit : > > > >> On 06.08.2013 15:55, Antoine Pitrou wrote: > >>> Le Tue, 06 Aug 2013 15:46:00 +0200, > >>> "M.-A. Lemburg" a ?crit : > >>>> > >>>> Python has grown a lot since the days most of the stdlib > >>>> modules/packages were added, so we have to pay more attention > >>>> to name clashes. > >>> > >>> Of course we can pay attention to name clashes. This is done > >>> through checking at PyPI, though, not by speculating that someone > >>> may think "finance" when they encounter the word "futures". > >> > >> True. > >> > >> Apart from avoiding name clashes, I think adding a bit of extra > >> context by means of placing the module into package also helps > >> people trying to determine the meaning of the module. > > > > Well, "statistics" sounds clear enough to me :-) > > (like "logging" or "unittest") > > Sure. I was thinking of the more exotic "futures" or often used > terms such as "request" that can benefit from added context. In that case, compound / abbreviated names can also be used ("urllib", "httpreq"...). They also avoid the collision threat nicely. Regards Antoine. From mal at egenix.com Tue Aug 6 17:36:19 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 06 Aug 2013 17:36:19 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806171857.1053acb8@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52011261.40809@egenix.com> <20130806171857.1053acb8@pitrou.net> Message-ID: <520117F3.1060507@egenix.com> On 06.08.2013 17:18, Antoine Pitrou wrote: > Le Tue, 06 Aug 2013 17:12:33 +0200, > "M.-A. Lemburg" a ?crit : > >> On 06.08.2013 16:23, Antoine Pitrou wrote: >>> Le Tue, 06 Aug 2013 16:19:18 +0200, >>> "M.-A. Lemburg" a ?crit : >>> >>>> On 06.08.2013 15:55, Antoine Pitrou wrote: >>>>> Le Tue, 06 Aug 2013 15:46:00 +0200, >>>>> "M.-A. Lemburg" a ?crit : >>>>>> >>>>>> Python has grown a lot since the days most of the stdlib >>>>>> modules/packages were added, so we have to pay more attention >>>>>> to name clashes. >>>>> >>>>> Of course we can pay attention to name clashes. This is done >>>>> through checking at PyPI, though, not by speculating that someone >>>>> may think "finance" when they encounter the word "futures". >>>> >>>> True. >>>> >>>> Apart from avoiding name clashes, I think adding a bit of extra >>>> context by means of placing the module into package also helps >>>> people trying to determine the meaning of the module. >>> >>> Well, "statistics" sounds clear enough to me :-) >>> (like "logging" or "unittest") >> >> Sure. I was thinking of the more exotic "futures" or often used >> terms such as "request" that can benefit from added context. > > In that case, compound / abbreviated names can also be used ("urllib", > "httpreq"...). They also avoid the collision threat nicely. 
I think the world has changed since the days of 8.3 DOS names :-) The typing argument is not all that important anymore when editors and interactive tools like ipython take care of most of the typing for you via auto completion. Nowadays, code readability takes precedent and so we can use the luxury of names that tell a story, rather than play rot13 on your brain :-) That doesn't mean we need to go all java about names, but we also don't have to revert to "req" when we really mean "request". -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Tue Aug 6 17:45:01 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Aug 2013 17:45:01 +0200 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52011261.40809@egenix.com> <20130806171857.1053acb8@pitrou.net> <520117F3.1060507@egenix.com> Message-ID: <20130806174501.794fa777@pitrou.net> Le Tue, 06 Aug 2013 17:36:19 +0200, "M.-A. Lemburg" a ?crit : > >> > >> Sure. I was thinking of the more exotic "futures" or often used > >> terms such as "request" that can benefit from added context. 
> > > > In that case, compound / abbreviated names can also be used > > ("urllib", "httpreq"...). They also avoid the collision threat > > nicely. > > I think the world has changed since the days of 8.3 DOS names :-) > > The typing argument is not all that important anymore > when editors and interactive tools like ipython take care > of most of the typing for you via auto completion. ipython isn't available in every environment, and it's not in everyone's taste either :-) Editors are generally not that smart about auto-completion either. Usually they will auto-complete names which are already in the current file. IDEs with sophisticated Python plugins may do better, but not everyone likes to use them. Regards Antoine. From oscar.j.benjamin at gmail.com Tue Aug 6 17:49:17 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 6 Aug 2013 16:49:17 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: On 2 August 2013 18:45, Steven D'Aprano wrote: > I have raised an issue on the tracker to add a statistics module to Python's > standard library: > > http://bugs.python.org/issue18606 > > and have been asked to write a PEP. Attached is my draft PEP. Feedback is > requested, thanks in advance. I have another query/suggestion for the statistics module. Taking the example from the PEP: >>> from statistics import * >>> data = [1, 2, 4, 5, 8] >>> data = [x+1e12 for x in data] >>> variance(data) 7.5 However: >>> variance(iter(data)) 7.4999542236328125 Okay so that's a small difference and it's unlikely to upset many people. But being something of a numerical obsessive I do often get upset about things like this. It's not that I mind the size of the error but rather that I dislike having the calculation implicitly changed. 
I want to think that it doesn't matter whether I pass an iterator or a list because either I get an error or I get the same result. Now I understand that the reason is a switch from a 2-pass algorithm to a 1-pass algorithm and that you want to support working directly with iterators rather than just collections. However, toy examples aside, I'm not sure that there is much of a practical use-case for computing *individual* statistics on a single pass. Whenever I've wanted to compute statistics on a single pass I've wanted to compute *multiple* statistics in *the same* single pass. Really I think that the use-cases are basically like this: 1) You can just put the data in a collection in memory (the common case). 2) Your data is too large to go in memory but you can iterate over it from the disk, or network, or a computational generator or whatever. Since the iteration is expensive or unrepeatable you want to compute everything in one pass (happens sometimes but certainly a lot less common than case 1)). 3) Your data/computation is distributed and you want to compute statistics in a distributed/parallel framework and merge them later (a very specialised setup that possibly warrants having its own implementation of the statistical routines anyway). Currently the API of the statistics module is only really suited to case 1). I think that it would be better to limit it to that case to simplify the implementation and make the output always consistent. In other words I think it should just require a collection, reject iterators, and use as many passes as it needs to get the best results. This would make the implementation simpler in a number of areas. An alternative API would be better for single-pass statistics (perhaps deferred for now). 
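The switch Oscar describes is essentially the difference between a two-pass calculation and a one-pass running update. The following sketch illustrates the general phenomenon with a textbook Welford-style accumulator; it is not the statistics module's actual implementation:

```python
def variance_twopass(data):
    # Pass 1 finds the mean; pass 2 sums squared deviations from it.
    # Requires a sequence that can be iterated twice (and n >= 2).
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / (n - 1)

def variance_onepass(iterable):
    # Welford's running update: works on a one-shot iterator, but
    # it accumulates rounding differently than the two-pass form.
    n, mean, m2 = 0, 0.0, 0.0
    for x in iterable:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

data = [x + 1e12 for x in [1, 2, 4, 5, 8]]
print(variance_twopass(data))        # 7.5
print(variance_onepass(iter(data)))  # close to 7.5, but not identical
```

Both functions agree exactly on well-scaled data; the large offset of 1e12 is what exposes the difference between the two accumulation orders.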
In the past if I've made myself APIs for this they look more like this: >>> stats = iterstats('mean', 'min', 'var', 'count') >>> stats.consume_data([1, 2, 3, 4]) >>> stats.compute_statistics() {'mean': 2.5, 'min': 1, 'var': 1.666, 'count': 4} >>> stats.consume_data([5, 6, 7, 8]) ... To satisfy use-case 3) is more complicated but it basically amounts to being able to do something like: >>> allstats = iterstats.merge([stats1, stats2, ...]) Oscar From oscar.j.benjamin at gmail.com Tue Aug 6 18:15:15 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 6 Aug 2013 17:15:15 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: On 6 August 2013 16:49, Oscar Benjamin wrote: > Really I think that the use-cases are basically like this: > > 1) You can just put the data in a collection in memory (the common case). > 2) Your data is too large to go in memory but you can iterate over it > from the disk, or network, or a computational generator or whatever. > Since the iteration is expensive or unrepeatable you want to compute > everything in one pass (happens sometimes but certainly a lot less > common than case 1)). > 3) Your data/computation is distributed and you want to compute > statistics in a distributed/parallel framework and merge them later (a > very specialised setup that possibly warrants having its own > implementation of the statistical routines anyway). 4) You want to be able to save/reload state midway through computing statistics and get intermediate results. This could be e.g. a script that periodically runs and collates data from log-files. 
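An iterstats-style object for use-cases 2)-4) can be built on running sufficient statistics (count, mean, sum of squared deviations), which also combine pairwise for the merge step. A sketch of the kind of object described above — the `RunningStats` name and method signatures are invented for illustration:

```python
class RunningStats:
    """Accumulate count/mean/M2 so partial results can be merged."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def consume_data(self, data):
        # Welford's update: one pass, no need to keep the data around.
        for x in data:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination of two partial summaries
        # (Chan et al.'s parallel variance formula).
        combined = RunningStats()
        combined.n = self.n + other.n
        delta = other.mean - self.mean
        combined.mean = self.mean + delta * other.n / combined.n
        combined.m2 = (self.m2 + other.m2
                       + delta ** 2 * self.n * other.n / combined.n)
        return combined

    def variance(self):
        return self.m2 / (self.n - 1)

a, b = RunningStats(), RunningStats()
a.consume_data([1, 2, 3, 4])
b.consume_data([5, 6, 7, 8])
whole = a.merge(b)
print(whole.n, whole.mean)  # 8 4.5
```

Because the state is three plain numbers, it also pickles trivially, which covers the save/reload scenario in use-case 4).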
Oscar From rymg19 at gmail.com Tue Aug 6 18:36:51 2013 From: rymg19 at gmail.com (Ryan) Date: Tue, 06 Aug 2013 11:36:51 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: That could easily be fixed: variance(list(iter(data))) It could take place on the inside. Oscar Benjamin wrote: >On 2 August 2013 18:45, Steven D'Aprano wrote: >> I have raised an issue on the tracker to add a statistics module to >Python's >> standard library: >> >> http://bugs.python.org/issue18606 >> >> and have been asked to write a PEP. Attached is my draft PEP. >Feedback is >> requested, thanks in advance. > >I have another query/suggestion for the statistics module. > >Taking the example from the PEP: > >>>> from statistics import * >>>> data = [1, 2, 4, 5, 8] >>>> data = [x+1e12 for x in data] >>>> variance(data) >7.5 > >However: > >>>> variance(iter(data)) >7.4999542236328125 > >Okay so that's a small difference and it's unlikely to upset many >people. But being something of a numerical obsessive I do often get >upset about things like this. It's not that I mind the size of the >error but rather that I dislike having the calculation implicitly >changed. I want to think that it doesn't matter whether I pass an >iterator or a list because either I get an error or I get the same >result. > > > >Oscar >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From michelelacchia at gmail.com Tue Aug 6 21:44:49 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Tue, 6 Aug 2013 21:44:49 +0200 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: Yes but then you lose all the advantages of iterators. What's the point in that? 
Furthermore it's not guaranteed that you can always convert an iterator into a list. As has already been said, you could run out of memory, for instance. On 6 Aug 2013 18:37, "Ryan" wrote: > That could easily be fixed: > > variance(list(iter(data))) > > It could take place on the inside. > > Oscar Benjamin wrote: > > >On 2 August 2013 18:45, Steven D'Aprano wrote: > >> I have raised an issue on the tracker to add a statistics module to > >Python's > >> standard library: > >> > >> http://bugs.python.org/issue18606 > >> > >> and have been asked to write a PEP. Attached is my draft PEP. > >Feedback is > >> requested, thanks in advance. > > > >I have another query/suggestion for the statistics module. > > > >Taking the example from the PEP: > > > >>>> from statistics import * > >>>> data = [1, 2, 4, 5, 8] > >>>> data = [x+1e12 for x in data] > >>>> variance(data) > >7.5 > > > >However: > > > >>>> variance(iter(data)) > >7.4999542236328125 > > > >Okay so that's a small difference and it's unlikely to upset many > >people. But being something of a numerical obsessive I do often get > >upset about things like this. It's not that I mind the size of the > >error but rather that I dislike having the calculation implicitly > >changed. I want to think that it doesn't matter whether I pass an > >iterator or a list because either I get an error or I get the same > >result. > > > > > > > >Oscar > >_______________________________________________ > >Python-ideas mailing list > >Python-ideas at python.org > >http://mail.python.org/mailman/listinfo/python-ideas > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Wed Aug 7 00:14:15 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 6 Aug 2013 15:14:15 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> On Aug 6, 2013, at 12:44, Michele Lacchia wrote: > Yes but then you lose all the advantages of iterators. What's the point in that? > Furthermore it's not guaranteed that you can always converting an iterator into a list. As it has already been said, you could run out of memory, for instance. > And the places where the stdlib/builtins do that automatic conversion--even when it's well motivated and almost always harmless once you think about it, like str.join--are surprising to most people. (Following up on str.join as an example, just about every question whose answer is str.join([...]) ends up with someone suggesting a genexpr instead of a listcomp, someone else explaining that it doesn't actually save any memory in that case, just wastes a bit of time, then some back and forth until everyone finally gets it.) The question is whether it would be even _more_ surprising to return an error, or a less accurate result. I don't know the answer to that. > Il giorno 06/ago/2013 18:37, "Ryan" ha scritto: >> That could easily be fixed: >> >> variance(list(iter(data))) >> >> It could take place on the inside. >> >> Oscar Benjamin wrote: >> >> >On 2 August 2013 18:45, Steven D'Aprano wrote: >> >> I have raised an issue on the tracker to add a statistics module to >> >Python's >> >> standard library: >> >> >> >> http://bugs.python.org/issue18606 >> >> >> >> and have been asked to write a PEP. Attached is my draft PEP. >> >Feedback is >> >> requested, thanks in advance. >> > >> >I have another query/suggestion for the statistics module. 
>> > >> >Taking the example from the PEP: >> > >> >>>> from statistics import * >> >>>> data = [1, 2, 4, 5, 8] >> >>>> data = [x+1e12 for x in data] >> >>>> variance(data) >> >7.5 >> > >> >However: >> > >> >>>> variance(iter(data)) >> >7.4999542236328125 >> > >> >Okay so that's a small difference and it's unlikely to upset many >> >people. But being something of a numerical obsessive I do often get >> >upset about things like this. It's not that I mind the size of the >> >error but rather that I dislike having the calculation implicitly >> >changed. I want to think that it doesn't matter whether I pass an >> >iterator or a list because either I get an error or I get the same >> >result. >> > >> >> >> > >> > >> >Oscar >> >_______________________________________________ >> >Python-ideas mailing list >> >Python-ideas at python.org >> >http://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Wed Aug 7 00:57:54 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 06 Aug 2013 15:57:54 -0700 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806174501.794fa777@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52011261.40809@egenix.com> <20130806171857.1053acb8@pitrou.net> <520117F3.1060507@egenix.com> <20130806174501.794fa777@pitrou.net> Message-ID: <52017F72.8010902@stoneleaf.us> On 08/06/2013 08:45 AM, Antoine Pitrou wrote: > > IDEs with sophisticated Python > plugins may do better, but not everyone likes to use them. +1 Although I still prefer statistics as the module name. No need to inflict unnecessary abbreviations on non-English speakers. -- ~Ethan~ From tjreedy at udel.edu Wed Aug 7 01:47:41 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 06 Aug 2013 19:47:41 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On 8/6/2013 4:23 AM, Serhiy Storchaka wrote: > 06.08.13 10:34, Chris Angelico wrote: >> Okay. Sounds like there's already an answer to those who want more >> readability: Just use filter(bool,...). Maybe I'm just not seeing the >> obvious problem with this version? > > Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? No and irrelevant. This is simply not a parallel situation. 
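The filter(bool, ...) suggestion relies on the fact that a first argument of None and of bool select exactly the same elements:

```python
items = [0, 1, '', 'spam', None, [], [2]]

# A first argument of None means "keep the truthy items"...
assert list(filter(None, items)) == [1, 'spam', [2]]

# ...and passing bool is equivalent, arguably spelling the intent out:
assert list(filter(bool, items)) == [1, 'spam', [2]]

# Both are shorthand for an explicit truth-test predicate:
assert list(filter(lambda x: x, items)) == [1, 'spam', [2]]
```

So the readability question is purely about spelling; the three forms produce identical results.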
-- Terry Jan Reedy From tjreedy at udel.edu Wed Aug 7 01:56:29 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 06 Aug 2013 19:56:29 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On 8/6/2013 3:34 AM, Chris Angelico wrote: > On Tue, Aug 6, 2013 at 8:28 AM, Peter Otten <__peter__ at web.de> wrote: >> The implementation already has an optimization to treat bool like None -- >> the former is never called: >> >> [Python/bltinmodule.c] >> >> if (lz->func == Py_None || lz->func == (PyObject *)&PyBool_Type) { >> ok = PyObject_IsTrue(item); >> >> This is done on every next() call, and while my instinct would be to >> translate PyBool_Type to Py_None once in the constructor I doubt that this >> has a noticeable impact on performance. > > Okay. Sounds like there's already an answer to those who want more > readability: Just use filter(bool,...). I think we should change the doc to say that the default is (implicit) 'bool', which I think it would be if the function were defined today. -- Terry Jan Reedy From steve at pearwood.info Wed Aug 7 03:06:40 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 07 Aug 2013 11:06:40 +1000 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> Message-ID: <52019DA0.6030307@pearwood.info> On 07/08/13 00:33, Nick Coghlan wrote: > On 7 August 2013 00:23, Antoine Pitrou wrote: >> Le Tue, 06 Aug 2013 16:19:18 +0200, >> "M.-A. Lemburg" a ?crit : [...] >>> Apart from avoiding name clashes, I think adding a bit of extra >>> context by means of placing the module into package also helps people >>> trying to determine the meaning of the module. 
>> >> Well, "statistics" sounds clear enough to me :-) >> (like "logging" or "unittest") > > Yeah, Steven's own stats module on PyPI is the main name clash we need > to avoid, and "statistics" handles that nicely. The other nice thing > about using the top level name is that converting math to a package > would be a pain, so +1 for the simple option :) On the other hand, moving math to a package would lower the barrier to adding new functions to it in the future. Wouldn't these two steps be sufficient to make math a package? 1. Move math.cpython-34.so to _math.cpython-34.so 2. Add math/__init__.py containing a single line "from _math import *" As far as the name goes, to cut back on bike-shedding, I'm going to rule out any names other than these three: 1) statistics 2) statslib 3) math.stats Top-level "stats" is ruled out because of possibility of confusion with "stat". My preference is math.stats because it allows the expansion of math. Others may consider that a disadvantage. Thoughts? -- Steven From steve at pearwood.info Wed Aug 7 03:34:29 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 07 Aug 2013 11:34:29 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> Message-ID: <5201A425.7090403@pearwood.info> On 07/08/13 01:49, Oscar Benjamin wrote: > Taking the example from the PEP: > >>>> from statistics import * >>>> data = [1, 2, 4, 5, 8] >>>> data = [x+1e12 for x in data] >>>> variance(data) > 7.5 > > However: > >>>> variance(iter(data)) > 7.4999542236328125 > > Okay so that's a small difference and it's unlikely to upset many > people. But being something of a numerical obsessive I do often get > upset about things like this. It's not that I mind the size of the > error but rather that I dislike having the calculation implicitly > changed. 
I want to think that it doesn't matter whether I pass an > iterator or a list because either I get an error or I get the same > result. That's fantastic feedback and exactly the sort of thing I want to hear :-) This is mentioned under "Design Decisions" in the PEP, and treated as a feature, but I'm open to revising that behaviour. 3.4 feature-freeze is quite close, and I don't want to hold up acceptance of the PEP (which doesn't even have a number yet!) for one-pass stats calculations. So I'm going to take this approach: - The difference between variance(list(data)) and variance(iter(data)) is an artifact of implementation, not a feature, so is subject to change. - I doubt I will reject iterators, but I may internally convert them to lists (median already does this). - For the time being, all documentation examples will only show lists being used. - I will defer for 3.5 a set of one-pass functions that return running statistics (I already have code for coroutines to do this, but they're not ready for the std lib). -- Steven From rymg19 at gmail.com Wed Aug 7 03:47:52 2013 From: rymg19 at gmail.com (Ryan) Date: Tue, 06 Aug 2013 20:47:52 -0500 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52019DA0.6030307@pearwood.info> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52019DA0.6030307@pearwood.info> Message-ID: I say statistics. That doesn't really fit overly well in math, and statslib sounds C-ish. Steven D'Aprano wrote: >On 07/08/13 00:33, Nick Coghlan wrote: >> On 7 August 2013 00:23, Antoine Pitrou wrote: >>> Le Tue, 06 Aug 2013 16:19:18 +0200, >>> "M.-A. Lemburg" a ?crit : >[...] 
>>>> Apart from avoiding name clashes, I think adding a bit of extra >>>> context by means of placing the module into package also helps >people >>>> trying to determine the meaning of the module. >>> >>> Well, "statistics" sounds clear enough to me :-) >>> (like "logging" or "unittest") >> >> Yeah, Steven's own stats module on PyPI is the main name clash we >need >> to avoid, and "statistics" handles that nicely. The other nice thing >> about using the top level name is that converting math to a package >> would be a pain, so +1 for the simple option :) > > >On the other hand, moving math to a package would lower the barrier to >adding new functions to it in the future. > >Wouldn't these two steps be sufficient to make math a package? > >1. Move math.cpython-34.so to _math.cpython-34.so > >2. Add math/__init__.py containing a single line "from _math import *" > > >As far as the name goes, to cut back on bike-shedding, I'm going to >rule out any names other than these three: > >1) statistics >2) statslib >3) math.stats > > >Top-level "stats" is ruled out because of possibility of confusion with >"stat". > >My preference is math.stats because it allows the expansion of math. >Others may consider that a disadvantage. > >Thoughts? > > > > >-- >Steven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Aug 7 03:52:15 2013 From: rymg19 at gmail.com (Ryan) Date: Tue, 06 Aug 2013 20:52:15 -0500 Subject: [Python-ideas] ElementTree iterparse string Message-ID: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> ElementTree iterparse only works with file names or file objects. What if there was an iterparse for strings? Like iterparsestring or iterfromstring or iterstring, etc. 
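For what it's worth, iterparse already accepts any file object, so a string can be fed to it today by wrapping it in one. This sketch uses io.BytesIO (io.StringIO generally works as well, but bytes let the parser honor any XML encoding declaration itself):

```python
import io
import xml.etree.ElementTree as ET

xml = "<root><item>a</item><item>b</item></root>"

# Wrap the in-memory string in a file-like object; no dedicated
# "iterparsestring" function is needed for this.
events = []
for event, elem in ET.iterparse(io.BytesIO(xml.encode("utf-8"))):
    events.append((event, elem.tag, elem.text))
    elem.clear()  # drop element contents once handled to limit memory growth

print(events)  # [('end', 'item', 'a'), ('end', 'item', 'b'), ('end', 'root', None)]
```

Clearing each element as its end event arrives is the usual way to keep the partially built tree from holding everything at once.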
Even though the string is stored in memory, storing the entire tree along with the string is a major minus, especially since a string takes less memory than an ElementTree instance or a root element. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Aug 7 03:55:54 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 6 Aug 2013 18:55:54 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> [item for item in items if item] vs [item for item in items if bool(item)] Isn?t an optional predicate function also a very common programming pattern? On Aug 6, 2013, at 4:47 PM, Terry Reedy wrote: > On 8/6/2013 4:23 AM, Serhiy Storchaka wrote: >> 06.08.13 10:34, Chris Angelico ???????(??): >>> Okay. Sounds like there's already an answer to those who want more >>> readability: Just use filter(bool,...). Maybe I'm just not seeing the >>> obvious problem with this version? >> >> Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? > > No and irrelevant. This is simply not a parallel situation. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Wed Aug 7 04:05:10 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Wed, 7 Aug 2013 10:05:10 +0800 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> Message-ID: > I agree. I don't use filter very often and when I do I always have to think carefully about the order of the arguments. 
I'd prefer it if it were more like sort etc. OTOH, map filter and reduce all have a nice symmetry in thing(func, list). I guess the logic is that the sort predicate is optional, and the func for these other things isn't, but anyway... Boo for inconsistent argument orders =( -Haoyi On Wed, Aug 7, 2013 at 9:55 AM, Shane Green wrote: > [item for item in items if item] vs [item for item in items if > bool(item)] > > Isn?t an *optional* predicate function also a very common programming > pattern? > > > > > On Aug 6, 2013, at 4:47 PM, Terry Reedy wrote: > > On 8/6/2013 4:23 AM, Serhiy Storchaka wrote: > > 06.08.13 10:34, Chris Angelico ???????(??): > > Okay. Sounds like there's already an answer to those who want more > readability: Just use filter(bool,...). Maybe I'm just not seeing the > obvious problem with this version? > > > Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? > > > No and irrelevant. This is simply not a parallel situation. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
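The asymmetry being lamented here is easy to see side by side: the functional builtins take the callable first and make it mandatory (or a magic None), while the sorting-style APIs take the iterable first with an optional key:

```python
words = ["spam", "", "eggs", "ham", ""]

# Callable-first, from the functional school: thing(func, iterable).
assert list(map(str.upper, words)) == ["SPAM", "", "EGGS", "HAM", ""]
assert list(filter(len, words)) == ["spam", "eggs", "ham"]

# Iterable-first with an optional keyword callable: sorted/min/max style.
assert sorted(words, key=len) == ["", "", "ham", "spam", "eggs"]
assert max(words, key=len) == "spam"  # ties resolve to the first maximal item
```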
URL: From python at mrabarnett.plus.com Wed Aug 7 04:18:28 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 07 Aug 2013 03:18:28 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52019DA0.6030307@pearwood.info> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52019DA0.6030307@pearwood.info> Message-ID: <5201AE74.30009@mrabarnett.plus.com> On 07/08/2013 02:06, Steven D'Aprano wrote: > On 07/08/13 00:33, Nick Coghlan wrote: >> On 7 August 2013 00:23, Antoine Pitrou wrote: >>> Le Tue, 06 Aug 2013 16:19:18 +0200, >>> "M.-A. Lemburg" a ?crit : > [...] >>>> Apart from avoiding name clashes, I think adding a bit of extra >>>> context by means of placing the module into package also helps people >>>> trying to determine the meaning of the module. >>> >>> Well, "statistics" sounds clear enough to me :-) >>> (like "logging" or "unittest") >> >> Yeah, Steven's own stats module on PyPI is the main name clash we need >> to avoid, and "statistics" handles that nicely. The other nice thing >> about using the top level name is that converting math to a package >> would be a pain, so +1 for the simple option :) > > > On the other hand, moving math to a package would lower the barrier to adding new functions to it in the future. > > Wouldn't these two steps be sufficient to make math a package? > > 1. Move math.cpython-34.so to _math.cpython-34.so > > 2. Add math/__init__.py containing a single line "from _math import *" > > > As far as the name goes, to cut back on bike-shedding, I'm going to rule out any names other than these three: > > 1) statistics > 2) statslib > 3) math.stats > > > Top-level "stats" is ruled out because of possibility of confusion with "stat". > > My preference is math.stats because it allows the expansion of math. 
Others may consider that a disadvantage. > > Thoughts? > I'm not keen on "statslib". From joshua at landau.ws Wed Aug 7 04:19:36 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 7 Aug 2013 03:19:36 +0100 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52019DA0.6030307@pearwood.info> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52019DA0.6030307@pearwood.info> Message-ID: On 7 August 2013 02:06, Steven D'Aprano wrote: > As far as the name goes, to cut back on bike-shedding, I'm going to rule out > any names other than these three: > > 1) statistics > 2) statslib Ugh. -1 to statslib. > 3) math.stats > > My preference is math.stats because it allows the expansion of math. Others > may consider that a disadvantage. Despite being happy this idea was taken so warmly I think being a subcategory of math is actually a bad idea. math and cmath are two sides of the same coin -- math always converts to float and cmath always converts to complex. I don't think a library like statistics fits on either one of those sides. It "feels" wrong. So I'm +1 to statistics, a bloodied stake-in-the-heart to statlib and a -0.5 to math.stats. From shane at umbrellacode.com Wed Aug 7 04:46:36 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 6 Aug 2013 19:46:36 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> Message-ID: <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> It seems kind of like there should be a filtered operation like there is a sorted one. (and why not have a list.filter(predicate=None) to perform in place filtering for that matter?) On Aug 6, 2013, at 7:05 PM, Haoyi Li wrote: > > I agree. 
I don't use filter very often and when I do I always have to > think carefully about the order of the arguments. I'd prefer it if it > were more like sort etc. > > OTOH, map filter and reduce all have a nice symmetry in thing(func, list). I guess the logic is that the sort predicate is optional, and the func for these other things isn't, but anyway... > > Boo for inconsistent argument orders =( > > -Haoyi > > > On Wed, Aug 7, 2013 at 9:55 AM, Shane Green wrote: > [item for item in items if item] vs [item for item in items if bool(item)] > > Isn?t an optional predicate function also a very common programming pattern? > > > > > On Aug 6, 2013, at 4:47 PM, Terry Reedy wrote: > >> On 8/6/2013 4:23 AM, Serhiy Storchaka wrote: >>> 06.08.13 10:34, Chris Angelico ???????(??): >>>> Okay. Sounds like there's already an answer to those who want more >>>> readability: Just use filter(bool,...). Maybe I'm just not seeing the >>>> obvious problem with this version? >>> >>> Are `if bool(...)` or `if bool(...) == True` more readable than `if ...`? >> >> No and irrelevant. This is simply not a parallel situation. >> >> -- >> Terry Jan Reedy >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Wed Aug 7 04:46:53 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 7 Aug 2013 03:46:53 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On 5 August 2013 08:46, Peter Otten <__peter__ at web.de> wrote: > filter(items) > > looks much cleaner than > > filter(None, items) > > and is easy to understand. 
Fewer people would use alternative spellings like > > filter(bool, items) > filter(len, items) > filter(lambda s: s != "", strings) > > The signature change may lead you to spell > > filter(predicate, items) # correct > > as > > filter(items, predicate) # wrong > > but this is a noisy error. I think the advantage of making the magic None > redundant outweighs this potential pitfall. I'll put forward my side on the issue. I'm for "filter(items)" just because I'd never use "filter(lambda i: ..., items)" over "(i for i in items if ...)". To me the filter(items) form is the only form that actually makes sense for me in the majority of cases as I rarely have the key function predefined. Additionally, keyword arguments can't solve this completely as "filter(iterable, key=pred)" would raise an error just like: >>> (lambda x=0, y=0: ...)(0, x=0) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: <lambda>() got multiple values for argument 'x' unless we break backward compatibility or make some very odd inconsistencies. That said, there's something to letting keyword arguments reorder function calls -- this isn't the only time I've wanted to do it. It'd make most sense if more builtins took keyword arguments though. From joshua at landau.ws Wed Aug 7 04:52:48 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 7 Aug 2013 03:52:48 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> Message-ID: On 7 August 2013 03:46, Shane Green wrote: > It seems kind of like there should be a filtered operation like there is a > sorted one.

sorted actually just converts to a list and then runs the sort method AFAICT. sort is defined in-place because it's efficient that way. None of this applies to filter. From shane at umbrellacode.com Wed Aug 7 05:09:20 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 6 Aug 2013 20:09:20 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> Message-ID: <1800C49C-E517-403B-B65E-1CCC13F18E71@umbrellacode.com> Okay, but those are kind of implementation detail reasons, not language consistency, right? On Aug 6, 2013, at 7:52 PM, Joshua Landau wrote: > On 7 August 2013 03:46, Shane Green wrote: >> It seems kind of like there should be a filtered operation like there is a >> sorted one. > > filtered would be identical to filter as-is. sorted is a counterpart > to a method, not another function. > >> (and why not have a list.filter(predicate=None) to perform in place >> filtering for that matter?) > > sorted actually just converts to a list and then runs the sort method > AFAICT. sort is defined in-place because it's efficient that way. None > of this applies to filter. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Aug 7 05:09:27 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 6 Aug 2013 20:09:27 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> Message-ID: <40950C6C-3036-4598-87B6-9CD2CD541E0A@yahoo.com> On Aug 6, 2013, at 19:52, Joshua Landau wrote: >> (and why not have a list.filter(predicate=None) to perform in place >> filtering for that matter?) 
> > sorted actually just converts to a list and then runs the sort method > AFAICT. sort is defined in-place because it's efficient that way. None > of this applies to filter. A lot of newcomers from C-like languages expect in-place filtering to be more efficient, and put a whole lot of effort into writing complex and buggy code that avoids all the linear del a[i] calls before finally testing and seeing that a[:] = filter() is still faster in almost every case. Maybe the docs should actually explain (where? no idea...) that filtering in-place isn't more efficient, and that's why there's no filter method and filtered function? From shane at umbrellacode.com Wed Aug 7 05:29:06 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 6 Aug 2013 20:29:06 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <40950C6C-3036-4598-87B6-9CD2CD541E0A@yahoo.com> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> <40950C6C-3036-4598-87B6-9CD2CD541E0A@yahoo.com> Message-ID: I knew when I wrote the first email that I shouldn?t have deleted (aside from being terribly inefficient) from what I wrote :-) My point is that these are implementation details: as a newcomer who saw a sorted function, would it make sense to expect a filtered one; and as a newcomer who saw list.sort, would it make sense to expect list.filter? Because I don?t think it necessarily follows that a newcomer who noticed lists have a filter method meant that filter method was the most efficient approach to filtering. On Aug 6, 2013, at 8:09 PM, Andrew Barnert wrote: > On Aug 6, 2013, at 19:52, Joshua Landau wrote: > >>> (and why not have a list.filter(predicate=None) to perform in place >>> filtering for that matter?) >> >> sorted actually just converts to a list and then runs the sort method >> AFAICT. sort is defined in-place because it's efficient that way. None >> of this applies to filter. 
> > A lot of newcomers from C-like languages expect in-place filtering to be more efficient, and put a whole lot of effort into writing complex and buggy code that avoids all the linear del a[i] calls before finally testing and seeing that a[:] = filter() is still faster in almost every case. Maybe the docs should actually explain (where? no idea...) that filtering in-place isn't more efficient, and that's why there's no filter method and filtered function? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Aug 7 05:32:07 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 7 Aug 2013 12:32:07 +0900 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <51FFF71C.8060108@stoneleaf.us> References: <51FFF71C.8060108@stoneleaf.us> Message-ID: <20993.49079.3786.797429@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > On 08/05/2013 11:31 AM, ? wrote: > > -1 There should be only one way and there are lots of ways > > already. Use a genexp for readability: (x for x in items if x) > > One Obvious Way *not* Only One Way. :) The actual statement is in the middle: There should be one-- and preferably only one --obvious way to do it. I assume that's the "only one" the OP meant. Of course there's more than one way; that's inherent in a language where you can define named functions, then call them by name -- or inline their definitions if you prefer. But it's definitely better for readability if everybody can agree on TOOWTDI. From stephen at xemacs.org Wed Aug 7 05:38:34 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 7 Aug 2013 12:38:34 +0900 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <20993.49466.66097.881733@uwakimon.sk.tsukuba.ac.jp> Peter Otten writes: > filter(items) > > looks much cleaner than > > filter(None, items) Sure. But the filter function is the important thing here. 'items' is probably a conceptual dummy (the reader already knows what 'items' refers to and is expecting it to be filtered). I'd be more sympathetic to omitting the function if you were suggesting that containers grow a filter method: items.filter() Nevertheless, I still would prefer items.filter(bool) So, -1. From stephen at xemacs.org Wed Aug 7 05:14:58 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 7 Aug 2013 12:14:58 +0900 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: <20993.48050.391880.335250@uwakimon.sk.tsukuba.ac.jp> David Mertz writes: > However, one name that hasn't been mentioned might be even > better: statistics._sum(). -1 statistics.sum() is needed any time you want to take the sum of a function of the difference of series with similar means, because you're likely to get a large number of small differences, and a few large differences, with the large differences pretty much offsetting each other. The canonical example is the series of differences of a series and its mean (ie, the average of the squares of that series is the variance, which is why statistics.sum is needed internally to Steven's package), but such constructions occur frequently in statistical analysis. One example is in linear regression with nearly collinear regressors. Another is in "standardizing" variates to have mean zero and variance one. (Perhaps that is -- or should be -- included in the statistics package, but it seems to violate the "not every 3-line function" rule of thumb.) 
So it should be a "public" name to encourage people to use it. From stephen at xemacs.org Wed Aug 7 06:01:51 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 7 Aug 2013 13:01:51 +0900 Subject: [Python-ideas] Module name [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <20130806174501.794fa777@pitrou.net> References: <52003CE4.9050408@mrabarnett.plus.com> <20130806152600.3293aa71@pitrou.net> <5200FE18.2000409@egenix.com> <20130806155554.1524fde0@pitrou.net> <520105E6.1010800@egenix.com> <20130806162318.79347d00@pitrou.net> <52011261.40809@egenix.com> <20130806171857.1053acb8@pitrou.net> <520117F3.1060507@egenix.com> <20130806174501.794fa777@pitrou.net> Message-ID: <20993.50863.578004.287301@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > ipython isn't available in every environment, and it's not in > everyone's taste either :-) "import bp" where ./bp.py contains the "boilerplate" imports you need for this particular application is always available, though. If that's not worth doing, then the burden of typing a few long names is rather small. So the typing argument simply doesn't apply to module names. From stephen at xemacs.org Wed Aug 7 06:03:52 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 7 Aug 2013 13:03:52 +0900 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: <20993.50984.766662.930881@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Show of hands please, +1 or -1 on statistics.sum. +0.5 I'd like to see Alexander's suggestion of special-casing numerical sums in built-in sum considered more carefully before going with statistics.sum as a separate function. From stephen at xemacs.org Wed Aug 7 06:25:12 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 7 Aug 2013 13:25:12 +0900 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <52005B1D.8090100@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> <52005B1D.8090100@pearwood.info> Message-ID: <20993.52264.398255.891538@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > > > Consequently, the above naive mean fails this > > > "torture test" with an error of 100%: > > > > > > assert mean([1e30, 1, 3, -1e30]) == 1 > > > > 100%? This is a relative error of sqrt(2)*1e-30. > > I don't understand your calculation here. Where are you getting the > values 2 and 1e-30 from? The standard deviation of the example data. Your calculation of relative error is statistically irrelevant, unless you can assert 30 decimal places of accuracy in the measurements 1e30 and -1e30. If you just have data and no theory about where it came from, the relevant unit is the standard deviation. > > I also wonder about the utility of a "statistics" package that has no > > functionality for presenting and operating on the most fundamental > > "statistic" of all: the (empirical) distribution. > > It's early days, and it is better to start the module small and > grow it than to try to fit everything and the kitchen sink in from Day > One. OK. > I'm happy to discuss this further with you off-list. Me too, although my implementation is way far from ready for prime time, and the curriculum committee just nuked that whole course so I have no interest in fixing it independent of this discussion. But I'll see what resources I can scrape up if the implementation is of interest. Other interested parties, feel free to contact me for addition to the CC list. 
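For reference, the stdlib already exposes one compensated-summation routine, and it passes the torture test discussed above; the sketch below is not the proposed statistics.sum (which handles more types than floats), just a demonstration of why plain sum() fails it:

```python
import math

data = [1e30, 1, 3, -1e30]

def naive_mean(values):
    # 1e30 + 1 rounds straight back to 1e30, so the small terms vanish
    # and the running total collapses to 0.0 once -1e30 is added.
    return sum(values) / len(values)

def mean(values):
    # math.fsum keeps exact partial sums, so nothing is lost to
    # intermediate rounding even when huge values cancel.
    return math.fsum(values) / len(values)

assert naive_mean(data) == 0.0  # the 100% error from the torture test
assert mean(data) == 1.0        # passes
```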
Steve From joshua at landau.ws Wed Aug 7 06:45:50 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 7 Aug 2013 05:45:50 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <20993.52264.398255.891538@uwakimon.sk.tsukuba.ac.jp> References: <51FBF02F.1000202@pearwood.info> <87zjswlu6d.fsf@uwakimon.sk.tsukuba.ac.jp> <52005B1D.8090100@pearwood.info> <20993.52264.398255.891538@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 7 August 2013 05:25, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > > > Consequently, the above naive mean fails this > > > > "torture test" with an error of 100%: > > > > > > > > assert mean([1e30, 1, 3, -1e30]) == 1 > > > > > > 100%? This is a relative error of sqrt(2)*1e-30. > > > > I don't understand your calculation here. Where are you getting the > > values 2 and 1e-30 from? > > The standard deviation of the example data. > > Your calculation of relative error is statistically irrelevant, unless > you can assert 30 decimal places of accuracy in the measurements 1e30 > and -1e30. If you just have data and no theory about where it came > from, the relevant unit is the standard deviation. It depends what you're using the mean for. If you divide by the mean (to make the new data's mean 1) an error like this can be the difference between dividing by 0 and dividing by 5 and you get very different results in those cases, hence error relative to the mathematically true value *is* relevant. Not being a statistics person I'm not able to say how often this? would be the case but I wouldn't ignore it entirely either. ? Error relative to the true value being more significant than relative to std. 
deviation From ncoghlan at gmail.com Wed Aug 7 07:21:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Aug 2013 15:21:47 +1000 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> Message-ID: On 7 August 2013 12:05, Haoyi Li wrote: >> I agree. I don't use filter very often and when I do I always have to > think carefully about the order of the arguments. I'd prefer it if it > were more like sort etc. > > OTOH, map filter and reduce all have a nice symmetry in thing(func, list). I > guess the logic is that the sort predicate is optional, and the func for > these other things isn't, but anyway... > > Boo for inconsistent argument orders =( Right, the signatures of map, filter and functools.reduce all date from a time before iterators became such a key language feature. To switch from their functional forms to iterator focused equivalents, you might leave map alone and define revised filtering and reduction operations: def filtered(iterable, pred=None): """Filter out false values from an iterable. Accepts an optional predicate function.""" ... def reduced(start, iterable, op): """Reduces an iterable to a single value, given a start value and binary operator.""" ... These might make better candidates for itertools inclusion than the proposed "next_true", since they take the current functional APIs and redesign them to be iterator focused. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Aug 7 08:04:23 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Aug 2013 08:04:23 +0200 Subject: [Python-ideas] ElementTree iterparse string References: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> Message-ID: <20130807080423.3ec89885@fsol> On Tue, 06 Aug 2013 20:52:15 -0500 Ryan wrote: > ElementTree iterparse only works with file names or file objects. What if there was an iterparse for strings? 
Like iterparsestring or iterfromstring or iterstring, etc. Even though the string is stored in memory, storing the entire tree along with the string is a major minus, especially since a string takes less memory than an ElementTree instance or a root element. Take a look at IncrementalParser: http://docs.python.org/dev/library/xml.etree.elementtree.html#incremental-parsing Regards Antoine. From steve at pearwood.info Wed Aug 7 09:08:12 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 7 Aug 2013 17:08:12 +1000 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> Message-ID: <20130807070812.GG17751@ando> On Tue, Aug 06, 2013 at 07:46:36PM -0700, Shane Green wrote: > It seems kind of like there should be a filtered operation like there is a sorted one. In Python 2, that is spelled filter(predicate, values). In Python 3, filter returns a lazy iterator rather than an eager list, so you can write list(filter(predicate, values)) instead. > (and why not have a list.filter(predicate=None) to perform in place filtering for that matter?) Because it's not 1972, we have more than 64K of memory, and most in-place operations should be re-written to return a new list instead :-) That's a glib answer, of course, but in general making an external filtered copy, then writing back into the list, will be faster than modifying the list in place: values[:] = filter(predicate, values) It's also more flexible. Here is how to filter only the last 100 items: values[-100:] = filter(predicate, values[-100:]) The filter function itself doesn't need to know where the data is coming from or where it is going. Another reason is, filter being a method implies that all sequences (or at least, all list-like sequences) need to implement that method. 
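Concretely, the slice-assignment idiom above can be sketched as follows (the predicate and the sample list are made up for illustration, not from the thread):

```python
def is_positive(x):
    """Hypothetical predicate used only for this illustration."""
    return x > 0

values = [3, -1, 0, 7, -5, 2]
# Build the filtered copy, then write it back through a slice
# assignment: the list object keeps its identity, so any other
# references to it see the filtered contents too.
values[:] = filter(is_positive, values)
assert values == [3, 7, 2]

# The same idiom restricted to only the last two items:
values = [3, -1, 0, 7, -5, 2]
values[-2:] = filter(is_positive, values[-2:])
assert values == [3, -1, 0, 7, 2]
```

The filter call itself is unchanged in both cases; only the slice on the left-hand side decides which part of the list is rewritten.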
Being a function means that it only needs to be implemented once. But most of all, I expect it is because filter is an operation that comes from the functional programming school of thought, and modifying data structures in-place is anathema to functional programming. -- Steven From rosuav at gmail.com Wed Aug 7 09:37:40 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 7 Aug 2013 08:37:40 +0100 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <20130807070812.GG17751@ando> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <4EDEBF88-4A81-4EE6-893C-2F922BE0A669@umbrellacode.com> <20130807070812.GG17751@ando> Message-ID: On Wed, Aug 7, 2013 at 8:08 AM, Steven D'Aprano wrote: > Because it's not 1972, we have more than 64K of memory, and most > in-place operations should be re-written to return a new list instead > :-) > > That's a glib answer, of course... I tried for some time to figure out why glibc demanded a new list be created... ChrisA From shane at umbrellacode.com Wed Aug 7 09:42:47 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 00:42:47 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> Message-ID: Yes, and that's why 'filtered' could be more significant than a second name for 'filter'. But, if we're really methodical, we should consider adding an in-place 'filter(predicate=None)' method to list, because having one seems to make sense for all the reasons having an in-place sort method does, with the exception of implementation details; and the new filtered() method that reflects list's built-in method but extends its API and returns an iterator would make perfect sense being defined in __builtin__ right alongside sorted. On Aug 6, 2013, at 10:21 PM, Nick Coghlan wrote: > On 7 August 2013 12:05, Haoyi Li wrote: >> I agree.
I don't use filter very often and when I do I always have to >> think carefully about the order of the arguments. I'd prefer it if it >> were more like sort etc. >> >> OTOH, map filter and reduce all have a nice symmetry in thing(func, list). I >> guess the logic is that the sort predicate is optional, and the func for >> these other things isn't, but anyway... >> >> Boo for inconsistent argument orders =( > > Right, the signatures of map, filter and functools.reduce all date > from a time before iterators became such a key language feature. > > To switch from their functional forms to iterator focused equivalents, > you might leave map alone and define revised filtering and reduction > operations: >
> def filtered(iterable, pred=None):
>     """Filter out false values from an iterable. Accepts an
>     optional predicate function."""
>     ...
>
> def reduced(start, iterable, op):
>     """Reduces an iterable to a single value, given a start value
>     and binary operator."""
>     ...
>
> These might make better candidates for itertools inclusion than the > proposed "next_true", since they take the current functional APIs and > redesign them to be iterator focused. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Aug 7 10:43:12 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Aug 2013 17:43:12 +0900 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> Message-ID: <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Shane Green writes: > we should consider adding an in-place 'filter(predicate=None)' > method to list -1 on the default. One of the things I like about Python is that it mostly manages to eschew magic values (like "None" meaning "bool") and spurious brevity (like defaulting the predicate).
While "bool" may be the most common predicate here, it's not obvious to me what the absence of the predicate means. On both counts, I'm against this (or any) default. Specifically, in most of the applications where I personally would want to use something like "filter", zeros and empty lists are typically valid values. So I'd want to filter "None" or similar "not available" values. Therefore I would expect "is not None" to be the default. Steve From storchaka at gmail.com Wed Aug 7 10:55:17 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 07 Aug 2013 11:55:17 +0300 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 07.08.13 11:43, Stephen J. Turnbull wrote: > One of the things I like about Python is that it mostly manages to > eschew magic values (like "None" meaning "bool") and spurious brevity > (like defaulting the predicate). While "bool" may be the most common > predicate here, it's not obvious to me what the absence of the > predicate means. On both counts, I'm against this (or any) default. > > Specifically, in most of the applications where I personally would > want to use something like "filter", zeros and empty lists are typically > valid values. So I'd want to filter "None" or similar "not available" > values. Therefore I would expect "is not None" to be the default. The default predicate is not "bool". The default is identity function (lambda x: x).
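A quick check (with made-up sample data) makes the point concrete: passing None, bool, or an explicit identity function to filter keeps exactly the same items.

```python
data = [0, 1, '', 'a', [], [2], None, False, True]

# filter(None, ...) keeps precisely the true values -- the same items
# a bool predicate would keep -- without calling a predicate per item.
kept = list(filter(None, data))
assert kept == list(filter(bool, data))
assert kept == list(filter(lambda x: x, data))
assert kept == [1, 'a', [2], True]
```

Whether one describes that default as "bool", "identity", or "no predicate at all" is the naming question being argued here; the observable behaviour is identical.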
From shane at umbrellacode.com Wed Aug 7 11:07:50 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 02:07:50 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <40E437BC-BC1C-4EA2-A3D9-C47965157736@umbrellacode.com> Okay, that makes sense. Sadly I think I typed it that way mostly out of habit; sometimes I fall back to when keyword arguments were a bit more exceptional than they are today. That would probably be the best way to accept an explicit predicate, don't you think? You make a good point: even though I'm all for filter(items), invoking items.filter() with no arguments works as well, because you don't have as obvious a connection between identity(items) being replaced by items like there is in the other case. Then again, when you invoke an in-place filter on a collection of items without any arguments, what else would you expect it to drop but its falsy entries? I'm undecided on predicate as keyword vs. required param; agree default predicate=None is a bad choice. On Aug 7, 2013, at 1:43 AM, Stephen J. Turnbull wrote: > Shane Green writes: > >> we should consider adding an in-place 'filter(predicate=None)' >> method to list > > -1 on the default. > > One of the things I like about Python is that it mostly manages to > eschew magic values (like "None" meaning "bool") and spurious brevity > (like defaulting the predicate). While "bool" may be the most common > predicate here, it's not obvious to me what the absence of the > predicate means. On both counts, I'm against this (or any) default. > > Specifically, in most of the applications where I personally would > want to use something like "filter", zeros and empty lists are typically > valid values. So I'd want to filter "None" or similar "not available" > values.
Therefore I would expect "is not None" to be the default. > > Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Aug 7 11:11:02 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 02:11:02 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7E57DAF0-BE03-4B49-9A34-28565A6EBB34@umbrellacode.com> You ever want to add an efficient ident() to __builtins__ so you could stop using lambda and normalize a bunch of things? On Aug 7, 2013, at 1:55 AM, Serhiy Storchaka wrote: > On 07.08.13 11:43, Stephen J. Turnbull wrote: >> One of the things I like about Python is that it mostly manages to >> eschew magic values (like "None" meaning "bool") and spurious brevity >> (like defaulting the predicate). While "bool" may be the most common >> predicate here, it's not obvious to me what the absence of the >> predicate means. On both counts, I'm against this (or any) default. >> >> Specifically, in most of the applications where I personally would >> want to use something like "filter", zeros and empty lists are typically >> valid values. So I'd want to filter "None" or similar "not available" >> values. Therefore I would expect "is not None" to be the default. > > The default predicate is not "bool". The default is identity function (lambda x: x). > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From alexander.belopolsky at gmail.com Wed Aug 7 11:43:06 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 7 Aug 2013 05:43:06 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <7E57DAF0-BE03-4B49-9A34-28565A6EBB34@umbrellacode.com> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> <7E57DAF0-BE03-4B49-9A34-28565A6EBB34@umbrellacode.com> Message-ID: On Wed, Aug 7, 2013 at 5:11 AM, Shane Green wrote: > You ever want to add an efficient ident() to __builtins__ so you could > stop using lambda and normalize a bunch of things? See or tl;dr <http://mail.python.org/pipermail/python-ideas/2009-March/003647.html>. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Aug 7 12:01:38 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 7 Aug 2013 06:01:38 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On Mon, Aug 5, 2013 at 3:46 AM, Peter Otten <__peter__ at web.de> wrote: > filter(items) > > looks much cleaner than > > filter(None, items) > > From Guido's time machine comes a pronouncement: <http://bugs.python.org/issue2186#msg63026>. :-) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From shane at umbrellacode.com Wed Aug 7 12:38:44 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 03:38:44 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> <7E57DAF0-BE03-4B49-9A34-28565A6EBB34@umbrellacode.com> Message-ID: <444793C1-B52D-4015-AF94-3F0835096C7D@umbrellacode.com> > On Aug 7, 2013, at 2:43 AM, Alexander Belopolsky wrote: > >> >> On Wed, Aug 7, 2013 at 5:11 AM, Shane Green wrote: >> You ever want to add an efficient ident() to __builtins__ so you could stop using lambda and normalize a bunch of things? >> >> See or tl;dr . > Can't believe no one has ever thought of it before? ;-) Thanks, that succinct and effective counter argument was the outcome of a lot of discussion! -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Wed Aug 7 13:10:40 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 7 Aug 2013 12:10:40 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> Message-ID: On Aug 6, 2013 11:19 PM, "Andrew Barnert" wrote: > > On Aug 6, 2013, at 12:44, Michele Lacchia wrote: >> >> Yes but then you lose all the advantages of iterators. What's the point in that? >> Furthermore it's not guaranteed that you can always converting an iterator into a list. As it has already been said, you could run out of memory, for instance. > > And the places where the stdlib/builtins do that automatic conversion--even when it's well motivated and almost always harmless once you think about it, like str.join--are surprising to most people.
(Following up on str.join as an example, just about every question whose answer is str.join([...]) ends up with someone suggesting a genexpr instead of a listcomp, someone else explaining that it doesn't actually save any memory in that case, just wastes a bit of time, then some back and forth until everyone finally gets it.) > > The question is whether it would be even _more_ surprising to return an error, or a less accurate result. I don't know the answer to that. I'm going to make the claim (with no supporting data) that more than 95% of the time, when a user calls variance(iterator) they will be guilty of premature optimisation. Really the cases where you can't build a collection are rare. People will still do it though just because it's satisfying to do everything with iterators in constant memory (I'm often guilty of this kind of thing). However unlike str.join there's no one pass algorithm that can be as accurate so it's not purely a performance question. An error can inform users that a one pass API exists and the documentation for the one pass API can explain that it is less accurate. That way the user is properly informed of the tradeoffs and can respond appropriately (or inappropriately!). Oscar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oscar.j.benjamin at gmail.com Wed Aug 7 13:14:31 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 7 Aug 2013 12:14:31 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <5201A425.7090403@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5201A425.7090403@pearwood.info> Message-ID: On Aug 7, 2013 2:35 AM, "Steven D'Aprano" wrote: > > On 07/08/13 01:49, Oscar Benjamin wrote: > >> Taking the example from the PEP: >> >>>>> from statistics import * >>>>> data = [1, 2, 4, 5, 8] >>>>> data = [x+1e12 for x in data] >>>>> variance(data) >> >> 7.5 >> >> However: >> >>>>> variance(iter(data)) >> >> 7.4999542236328125 >> >> Okay so that's a small difference and it's unlikely to upset many >> people. But being something of a numerical obsessive I do often get >> upset about things like this. It's not that I mind the size of the >> error but rather that I dislike having the calculation implicitly >> changed. I want to think that it doesn't matter whether I pass an >> iterator or a list because either I get an error or I get the same >> result. > > > That's fantastic feedback and exactly the sort of thing I want to hear :-) > > This is mentioned under "Design Decisions" in the PEP, and treated as a feature, but I'm open to revising that behaviour. 3.4 feature-freeze is quite close, and I don't want to hold up acceptance of the PEP (which doesn't even have a number yet!) for one-pass stats calculations. So I'm going to take this approach: > > - The difference between variance(list(data)) and variance(iter(data)) is an artifact of implementation, not a feature, so is subject to change. > > - I doubt I will reject iterators, but I may internally convert them to lists (median already does this). > > - For the time being, all documentation examples will only show lists being used. 
> > - I will defer for 3.5 a set of one-pass functions that return running statistics (I already have code for coroutines to do this, but they're not ready for the std lib). Sounds good to me! Oscar -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Aug 7 14:14:06 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Aug 2013 21:14:06 +0900 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87mwotlmup.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > The default predicate is not "bool". The default is identity function > (lambda x: x). IIUC correctly, the intent is "bool", it just happens that it's more efficient to spell it "lambda x: x" when you don't actually need a Boolean value. From eliben at gmail.com Wed Aug 7 14:50:01 2013 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 7 Aug 2013 05:50:01 -0700 Subject: [Python-ideas] ElementTree iterparse string In-Reply-To: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> References: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> Message-ID: On Tue, Aug 6, 2013 at 6:52 PM, Ryan wrote: > ElementTree iterparse only works with file names or file objects. What if > there was an iterparse for strings? Like iterparsestring or iterfromstring > or iterstring, etc. Even though the string is stored in memory, storing the > entire tree along with the string is a major minus, especially since a > string takes less memory than an ElementTree instance or a root element. > > Hi Ryan, 1) This question is more suitable for python-list@ or Stack Overflow 2) In addition to Antoine's suggestion, note that ET takes file-like objects everywhere, which means you can use StringIO. Look at the tests in Lib/test/test_xml_etree.py for some examples. This is a common idiom in Python programming. 
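Eli's StringIO suggestion might look like this in Python 3 (the toy document below is invented for the example):

```python
import io
import xml.etree.ElementTree as ET

xml = "<root><item>1</item><item>2</item></root>"

# iterparse() accepts any file-like object, so wrapping the string in
# io.StringIO gives incremental parsing without writing a temp file.
tags = [elem.tag for _event, elem in ET.iterparse(io.StringIO(xml))]
assert tags == ["item", "item", "root"]
```

By default iterparse reports "end" events, so elements arrive in the order their closing tags are seen; for byte strings, io.BytesIO works the same way.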
Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Aug 7 16:12:10 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 07 Aug 2013 09:12:10 -0500 Subject: [Python-ideas] ElementTree iterparse string In-Reply-To: References: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> Message-ID: <6b8a9464-d832-49b2-b600-343239f07953@email.android.com> Actually, it was a feature suggestion, not a question. And, I have the 3.3 version of the docs, so I didn't realize they already added that new feature. Eli Bendersky wrote: >On Tue, Aug 6, 2013 at 6:52 PM, Ryan wrote: > >> ElementTree iterparse only works with file names or file objects. >What if >> there was an iterparse for strings? Like iterparsestring or >iterfromstring >> or iterstring, etc. Even though the string is stored in memory, >storing the >> entire tree along with the string is a major minus, especially since >a >> string takes less memory than an ElementTree instance or a root >element. >> >> >Hi Ryan, > >1) This question is more suitable for python-list@ or Stack Overflow >2) In addition to Antoine's suggestion, note that ET takes file-like >objects everywhere, which means you can use StringIO. Look at the tests >in >Lib/test/test_xml_etree.py for some examples. This is a common idiom in >Python programming. > >Eli -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From masklinn at masklinn.net Wed Aug 7 16:21:10 2013 From: masklinn at masklinn.net (Masklinn) Date: Wed, 7 Aug 2013 16:21:10 +0200 Subject: [Python-ideas] ElementTree iterparse string In-Reply-To: <6b8a9464-d832-49b2-b600-343239f07953@email.android.com> References: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> <6b8a9464-d832-49b2-b600-343239f07953@email.android.com> Message-ID: <9DCD9375-A4C1-4CAF-A28C-2B69B1B6F1E7@masklinn.net> On 2013-08-07, at 16:12 , Ryan wrote: > Actually, it was a feature suggestion, not a question. > > And, I have the 3.3 version of the docs, so I didn't realize they already added that new feature. If you're talking about using StringIO, it's not a new feature, it was already there before elementtree was even merged to the stdlib. Although it could be clearer, the 2.7 doc for iterparse already notes > source is a filename or file object containing XML data. as in other methods & utility functions (parse, ElementTree.parse and ElementTree.write) "file object" should really be understood as "file-like object". And more precisely, for parsing it's an object with a `read(size_hint: int)` method. From shane at umbrellacode.com Wed Aug 7 16:21:49 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 07:21:49 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <87mwotlmup.fsf@uwakimon.sk.tsukuba.ac.jp> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> <87mwotlmup.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <196791EE-1398-4BD9-91B2-6C4E7CC95DAC@umbrellacode.com> If I read a snippet from the earlier emails correctly, it's interesting to note that, if not for a trick in the C code that turned bool back into None, the predicate bool would in fact be a redundant transformation. On Aug 7, 2013, at 5:14 AM, Stephen J. Turnbull wrote: > Serhiy Storchaka writes: > >> The default predicate is not "bool".
The default is identity function >> (lambda x: x). > > IIUC correctly, the intent is "bool", it just happens that it's more > efficient to spell it "lambda x: x" when you don't actually need a > Boolean value. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristjan at ccpgames.com Wed Aug 7 16:23:36 2013 From: kristjan at ccpgames.com (Kristján Valur Jónsson) Date: Wed, 7 Aug 2013 14:23:36 +0000 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None Message-ID: I just added a patch to the tracker: http://bugs.python.org/issue18677 Here is its text: A proposed patch adds two features to context managers: 1) It has always irked me that it was impossible to assemble nested context managers in the python language. See issue #5251. The main problem, that exceptions in __enter__ cannot be properly handled, is fixed by introducing a new core exception, ContextManagerExit. When raised by __enter__(), the body that the context manager protects is skipped. This exception is in the spirit of other semi-internal exceptions such as GeneratorExit and StopIteration. Using this exception, contextlib.nested can properly handle the case where the body isn't run because of an internal __enter__ exception which is handled by an outer __exit__. 2) The mechanism used in implementing ContextManagerExit above is easily extended to allowing a special context manager: None. This is useful for having _optional_ context managers. E.g. code like this:

    with performance_timer():
        do_work()

    def performance_timer():
        if profiling:
            return accumulator
        return None

None becomes the trivial context manager and its __enter__ and __exit__ calls are skipped, along with their overhead. This patch implements both features.
In addition, it:
1) reintroduces contextlib.nested, which is based on nested_delayed
2) introduces contextlib.nested_delayed, which solves the other problem with previous versions of nested, that an inner context manager expression shouldn't be evaluated early. contextlib.nested evaluates callables returning context managers, rather than managers directly.
3) allows contextlib.contextmanager decorated functions to not yield, which amounts to skipping the protected body (implicitly raising ContextManagerExit)
4) adds unittests for the whole thing.
Cheers, Kristján -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Aug 7 16:41:16 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 07 Aug 2013 09:41:16 -0500 Subject: [Python-ideas] ElementTree iterparse string In-Reply-To: <9DCD9375-A4C1-4CAF-A28C-2B69B1B6F1E7@masklinn.net> References: <260efc16-8404-493e-906d-8e51301c7540@email.android.com> <6b8a9464-d832-49b2-b600-343239f07953@email.android.com> <9DCD9375-A4C1-4CAF-A28C-2B69B1B6F1E7@masklinn.net> Message-ID: <8d746eb7-004f-4796-ad6e-c794f95df80d@email.android.com> I was referring to the other suggestion, IncrementalParser. Masklinn wrote: >On 2013-08-07, at 16:12 , Ryan wrote: >> Actually, it was a feature suggestion, not a question. >> >> And, I have the 3.3 version of the docs, so I didn't realize they >already added that new feature. >If you're talking about using StringIO, it's not a new feature, it was >already there before elementtree was even merged to the stdlib. >Although >it could be clearer, the 2.7 doc for iterparse already notes > >> source is a filename or file object containing XML data. > >as in other methods & utility functions (parse, ElementTree.parse and >ElementTree.write) "file object" should really be understood as >"file-like object". And more precisely, for parsing it's an object with >a `read(size_hint: int)` method.
>_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Aug 7 18:01:11 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 7 Aug 2013 09:01:11 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> Message-ID: <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> On Aug 7, 2013, at 4:10, Oscar Benjamin wrote: > On Aug 6, 2013 11:19 PM, "Andrew Barnert" wrote: > > > > On Aug 6, 2013, at 12:44, Michele Lacchia wrote: > >> > >> Yes but then you lose all the advantages of iterators. What's the point in that? > >> Furthermore it's not guaranteed that you can always converting an iterator into a list. As it has already been said, you could run out of memory, for instance. > > > > And the places where the stdlib/builtins do that automatic conversion--even when it's well motivated and almost always harmless once you think about it, like str.join--are surprising to most people. (Following up on str.join as an example, just about every question whose answer is str.join([...]) ends up with someone suggesting a genexpr instead of a listcomp, someone else explaining that it doesn't actually save any memory in that case, just wastes a bit of time, then some back and forth until everyone finally gets it.) > > > > The question is whether it would be even _more_ surprising to return an error, or a less accurate result. I don't know the answer to that. > > I'm going to make the claim (with no supporting data) that more than 95% of the time, when a user calls variance(iterator) they will be guilty of premature optimisation. > I think you're probably right. 
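The accuracy trade-off behind variance(iterator) can be made concrete with a sketch (illustrative only, not the statistics module's actual code) comparing a textbook two-pass computation against a one-pass Welford-style update:

```python
import math

def variance_two_pass(data):
    # Sample variance via the two-pass formula: first the mean,
    # then the sum of squared deviations. Needs a full collection.
    n = len(data)
    mean = math.fsum(data) / n
    return math.fsum((x - mean) ** 2 for x in data) / (n - 1)

def variance_one_pass(iterable):
    # Welford's streaming update: single pass, constant memory,
    # but intermediate rounding can drift from the two-pass result.
    n, mean, m2 = 0, 0.0, 0.0
    for x in iterable:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# The PEP's example data, shifted by 1e12 to stress the arithmetic.
data = [x + 1e12 for x in [1, 2, 4, 5, 8]]
assert variance_two_pass(data) == 7.5          # exact here
assert abs(variance_one_pass(iter(data)) - 7.5) < 0.1  # close, not exact
```

The one-pass version only needs an iterator, which is exactly why accepting iterators tempts an implementation into the (slightly) less accurate algorithm.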
In the similar cases that come up with, e.g., str.join(iterator), there is usually no reason whatsoever to believe that any memory or speed cost will make any difference. Often people get into arguments over a half dozen strings (where, even if it _did_ matter, which it doesn't, N is so low that algorithmic complexity isn't even relevant). > Really the cases where you can't build a collection are rare. People will still do it though just because it's satisfying to do everything with iterators in constant memory (I'm often guilty of this kind of thing). > Or so that a sequence of operations can be pipelined, possibly leading to better cache behavior. Or just because iterators are the pythonic (or python3-ic?) way to do it. > However unlike str.join there's no one pass algorithm that can be as accurate so it's not purely a performance question. > But the point is that str.join doesn't use a one-pass algorithm, it just constructs a list so it can do it in two passes. And it's been suggested on this thread that variance could easily do the same thing. So there are three choices. Using a one-pass algorithm would be surprising because it's less accurate. Automatic listification would be surprising because you went out of your way to pass lazy iterators around and variance broke the benefits. An exception would be surprising because almost every other function in the stdlib that takes lists also takes iterators, even when there are good reasons not to. I think you still may be right that the error is the way to go. You'll learn the problem quickly, and the workaround will be obvious, and the reason for it will be available in the docs. The other two potential surprises may not be as discoverable. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rymg19 at gmail.com Wed Aug 7 18:28:17 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 07 Aug 2013 11:28:17 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> Message-ID: <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> What if it was an option. i.e. statistics.sum(myiter, listconv=True) Andrew Barnert wrote: >On Aug 7, 2013, at 4:10, Oscar Benjamin >wrote: > >> On Aug 6, 2013 11:19 PM, "Andrew Barnert" wrote: >> > >> > On Aug 6, 2013, at 12:44, Michele Lacchia > wrote: >> >> >> >> Yes but then you lose all the advantages of iterators. What's the >point in that? >> >> Furthermore it's not guaranteed that you can always converting an >iterator into a list. As it has already been said, you could run out of >memory, for instance. >> > >> > And the places where the stdlib/builtins do that automatic >conversion--even when it's well motivated and almost always harmless >once you think about it, like str.join--are surprising to most people. >(Following up on str.join as an example, just about every question >whose answer is str.join([...]) ends up with someone suggesting a >genexpr instead of a listcomp, someone else explaining that it doesn't >actually save any memory in that case, just wastes a bit of time, then >some back and forth until everyone finally gets it.) >> > >> > The question is whether it would be even _more_ surprising to >return an error, or a less accurate result. I don't know the answer to >that. >> >> I'm going to make the claim (with no supporting data) that more than >95% of the time, when a user calls variance(iterator) they will be >guilty of premature optimisation. >> >I think you're probably right. 
In the similar cases that come up with, >e.g., str.join(iterator), there is usually no reason whatsoever to >believe that any memory or speed cost will make any difference. Often >people get into arguments over a half dozen strings (where, even if it >_did_ matter, which it doesn't, N is so low that algorithmic complexity >isn't even relevant). >> Really the cases where you can't build a collection are rare. People >will still do it though just because it's satisfying to do everything >with iterators in constant memory (I'm often guilty of this kind of >thing). >> >Or so that a sequence of operations can be pipelined, possibly leading >to better cache behavior. Or just because iterators are the pythonic >(or python3-ic?) way to do it. >> However unlike str.join there's no one pass algorithm that can be as >accurate so it's not purely a performance question. >> >But the point is that str.join doesn't use a one-pass algorithm, it >just constructs a list so it can do it in two passes. And it's been >suggested on this thread that variance could easily do the same thing. > >So there are three choices. Using a one-pass algorithm would be >surprising because it's less accurate. Automatic listification would be >surprising because you went out of your way to pass lazy iterators >around and variance broke the benefits. An exception would be >surprising because almost every other function in the stdlib that takes >lists also takes iterators, even when there are good reasons not to. > >I think you still may be right that the error is the way to go. You'll >learn the problem quickly, and the workaround will be obvious, and the >reason for it will be available in the docs. The other two potential >surprises may not be as discoverable. 
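The trade-off described in the quoted text, a listifying two-pass algorithm versus a less accurate single pass, can be made concrete. The sketch below is an illustration only; the function names are hypothetical and the code is not from the proposed statistics module:

```python
def variance_two_pass(data):
    # Listify first (as str.join does internally), so iterators work too;
    # subtracting the mean before squaring keeps the summed terms small.
    data = list(data)
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_naive_one_pass(data):
    # Single pass over running sums of x and x*x; prone to catastrophic
    # cancellation when the mean is large relative to the spread.
    n = s = ss = 0
    for x in data:
        n += 1
        s += x
        ss += x * x
    return (ss - s * s / n) / (n - 1)
```

On small, well-scaled data the two agree; shift the same values by 1e9 and the naive version's result typically degrades badly, which is exactly the accuracy concern raised in the thread.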
> >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Wed Aug 7 18:48:17 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 07 Aug 2013 17:48:17 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: <52027A51.4020901@mrabarnett.plus.com> On 07/08/2013 17:28, Ryan wrote: > What if it was an option. i.e. > > statistics.sum(myiter, listconv=True) > [snip] Why have an option? You can just do this: statistics.sum(list(myiter)) From tjreedy at udel.edu Wed Aug 7 19:20:04 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 07 Aug 2013 13:20:04 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8/7/2013 4:55 AM, Serhiy Storchaka wrote: > The default predicate is not "bool". The default is identity function > (lambda x: x). Not really. None means "do not apply *any* predicate function". -- Terry Jan Reedy From tjreedy at udel.edu Wed Aug 7 19:31:40 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 07 Aug 2013 13:31:40 -0400 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: On 8/7/2013 6:01 AM, Alexander Belopolsky wrote: > From Guido's time machine comes a pronouncement: . 
As Raymond points out in a later message, it is standard in Python for None passed to a function to mean 'no argument, do something that does not require a value for this parameter'. It does not mean 'use a value of None for this parameter and do with None whatever would be done with any other value'. Usually, only trailing parameters are allowed to be given None and then it is given in the definition as a default. def f(a, b=None, c=None): pass can be called f(x), f(x, y), f(x, y, z), and f(x, None, z) or f(x, c=z). Because of the last option, there is never a need for the user to explicitly pass None. What is unusual about filter is not the use of None to mean 'no argument, behave differently', but the need for it to be written explicitly by the caller to get the null behavior. -- Terry Jan Reedy From michael.walter at gmail.com Wed Aug 7 20:05:27 2013 From: michael.walter at gmail.com (Michael Walter) Date: Wed, 7 Aug 2013 20:05:27 +0200 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <52027A51.4020901@mrabarnett.plus.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> <52027A51.4020901@mrabarnett.plus.com> Message-ID: So that the default can be "true" ;) On Wed, Aug 7, 2013 at 6:48 PM, MRAB wrote: > On 07/08/2013 17:28, Ryan wrote: > >> What if it was an option. i.e. >> >> statistics.sum(myiter, listconv=True) >> >> [snip] > Why have an option? You can just do this: > > statistics.sum(list(myiter)) > > ______________________________**_________________ > > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
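Terry's description of filter's None convention is easy to check directly; a small illustration using only the builtin filter:

```python
values = [0, 1, '', 'spam', None, [], [2], False]

# With None as the "predicate", filter applies no function at all:
# it simply keeps the items that are true in a boolean context.
assert list(filter(None, values)) == [1, 'spam', [2]]

# The equivalent comprehension truth-tests each item directly.
assert list(filter(None, values)) == [x for x in values if x]
```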
URL: From rymg19 at gmail.com Wed Aug 7 20:27:29 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 07 Aug 2013 13:27:29 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> <52027A51.4020901@mrabarnett.plus.com> Message-ID: Scratch that. I just realized that this isn't Pascal, meaning types can be checked. The option is still a thought, however. Beats a documentation warning that most people(including me) would probably ignore out of lack of patience(again like me). Michael Walter wrote: >So that the default can be "true" ;) > > >On Wed, Aug 7, 2013 at 6:48 PM, MRAB >wrote: > >> On 07/08/2013 17:28, Ryan wrote: >> >>> What if it was an option. i.e. >>> >>> statistics.sum(myiter, listconv=True) >>> >>> [snip] >> Why have an option? You can just do this: >> >> statistics.sum(list(myiter)) >> >> ______________________________**_________________ >> >> Python-ideas mailing list >> Python-ideas at python.org >> >http://mail.python.org/**mailman/listinfo/python-ideas >> > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Wed Aug 7 20:47:16 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 07 Aug 2013 19:47:16 +0100 Subject: [Python-ideas] Add 'interleave' function to itertools? Message-ID: <52029634.9070502@mrabarnett.plus.com> On python-list at python.org there was a thread "make elements of a list twice or more." where the OP wanted to repeat items in a list. 
For example, given: [b, a, c] the output should be: [b, b, a, a, c, c] It occurred to me that this was just a special case of interleaving iterables. Going the opposite way with a list is simple enough: >>> numbers = [0, 1, 2, 3, 4, 5] >>> evens = numbers[0 : : 2] >>> odds = numbers[1 : : 2] >>> evens [0, 2, 4] >>> odds [1, 3, 5] but how do you interleave them again? >>> [number for pair in zip(evens, odds) for number in pair] [0, 1, 2, 3, 4, 5] I'm suggesting adding an 'interleave' function to itertools: >>> list(interleave(evens, odds)) [0, 1, 2, 3, 4, 5] The function would stop when any of the iterables became exhausted (compare 'zip'). There could also be a related 'interleave_longest' function (compare 'zip_longest'). Here are the definitions: def interleave(*iterables): """Return an interleave object whose .__next__() method returns an element from each iterable argument in turn. The .__next__() method continues until the shortest iterable in the argument sequence is exhausted and then it raises StopIteration. """ sources = [iter(iterable) for iterable in iterables] try: while True: for iterable in sources: yield next(iterable) except StopIteration: pass def interleave_longest(*iterables, fillvalue=None): """Return an interleave_longest object whose .__next__() method returns an element from each iterable argument in turn. The .__next__() method continues until the longest iterable in the argument sequence is exhausted and then it raises StopIteration. When the shorter iterables are exhausted, the fillvalue is substituted in their place. The fillvalue defaults to None or can be specified by a keyword argument. 
""" sources = [iter(iterable) for iterable in iterables] if not sources: return remaining = len(sources) while True: for index, iterable in enumerate(sources): try: yield next(iterable) except StopIteration: remaining -= 1 if not remaining: return sources[index] = repeat(fillvalue) yield fillvalue From vito.detullio at gmail.com Wed Aug 7 21:19:19 2013 From: vito.detullio at gmail.com (Vito De Tullio) Date: Wed, 07 Aug 2013 21:19:19 +0200 Subject: [Python-ideas] Add 'interleave' function to itertools? References: <52029634.9070502@mrabarnett.plus.com> Message-ID: MRAB wrote: > def interleave(*iterables): > """Return a interleave object whose .__next__() method returns an > element from each iterable argument in turn. The .__next__() > method continues until the shortest iterable in the argument > sequence is exhausted and then it raises StopIteration. > """ > > sources = [iter(iterable) for iterable in iterables] > > try: > while True: > for iterable in sources: > yield next(iterable) > except StopIteration: > pass isn't something like def interleave(*iterables): for zipped_iterables in zip(*iterables): yield from zipped_iterables sufficient? (and the same for interleave_longest / zip_longest) -- By ZeD From bwmaister at gmail.com Wed Aug 7 21:22:33 2013 From: bwmaister at gmail.com (Brandon W Maister) Date: Wed, 7 Aug 2013 15:22:33 -0400 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: > Andrew Barnert wrote: >> >> But the point is that str.join doesn't use a one-pass algorithm, it just >> constructs a list so it can do it in two passes. And it's been suggested on >> this thread that variance could easily do the same thing. >> >> .. 
snip >> >> I think you still may be right that the error is the way to go. You'll >> learn the problem quickly, and the workaround will be obvious, and the >> reason for it will be available in the docs. The other two potential >> surprises may not be as discoverable. >> > I agree that the error is the way to go, but what about `variance(iterable, one_pass=True)` defaulting to False, with an exception that says "Warning: passing an iterable and using the one pass algorithm can lead to a slight loss in accuracy" if an iterable is passed in? Normally the stdlib accepts iterables anywhere a concrete collection works, but normally also passing in an iterable doesn't change the semantics of a function. brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Wed Aug 7 22:02:14 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 07 Aug 2013 23:02:14 +0300 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> Message-ID: On 07.08.13 22:19, Vito De Tullio wrote: > MRAB wrote: > >> def interleave(*iterables): >> """Return a interleave object whose .__next__() method returns an >> element from each iterable argument in turn. The .__next__() >> method continues until the shortest iterable in the argument >> sequence is exhausted and then it raises StopIteration. >> """ >> >> sources = [iter(iterable) for iterable in iterables] >> >> try: >> while True: >> for iterable in sources: >> yield next(iterable) >> except StopIteration: >> pass > > isn't something like > > def interleave(*iterables): > for zipped_iterables in zip(*iterables): > yield from zipped_iterables > > sufficient? chain.from_iterable(zip(*iterables)) From python at mrabarnett.plus.com Wed Aug 7 22:12:53 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 07 Aug 2013 21:12:53 +0100 Subject: [Python-ideas] Add 'interleave' function to itertools? 
In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> Message-ID: <5202AA45.10408@mrabarnett.plus.com> On 07/08/2013 20:19, Vito De Tullio wrote: > MRAB wrote: > >> def interleave(*iterables): >> """Return a interleave object whose .__next__() method returns an >> element from each iterable argument in turn. The .__next__() >> method continues until the shortest iterable in the argument >> sequence is exhausted and then it raises StopIteration. >> """ >> >> sources = [iter(iterable) for iterable in iterables] >> >> try: >> while True: >> for iterable in sources: >> yield next(iterable) >> except StopIteration: >> pass > > isn't something like > > def interleave(*iterables): > for zipped_iterables in zip(*iterables): > yield from zipped_iterables > > sufficient? > > (and the same for interleave_longest / zip_longest) > You're probably correct, although there's still the corner case of what should happen when an iterable is exhausted. For example, what should list(interleave([0, 2, 4], [1, 3])) return? Should it be [0, 1, 2, 3, 4] or [0, 1, 2, 3]? In other words, if there are 'n' iterables, should 'interleave' always yield a multiple of 'n' items? Should there be a function for each case? And are these functions worthy of inclusion in itertools? :-) From abarnert at yahoo.com Wed Aug 7 22:54:19 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 7 Aug 2013 13:54:19 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: Message-ID: <53A803D5-99D6-41CB-A88F-827B0965DA34@yahoo.com> On Aug 7, 2013, at 10:31, Terry Reedy wrote: > On 8/7/2013 6:01 AM, Alexander Belopolsky wrote: > >> From Guido's time machine comes a pronouncement: > . > > As Raymond points out in a later message, it is standard in Python for None passed to a function to mean 'no argument, do something that does not require an value for this parameter'. 
It does not mean 'use a value of None for this parameter and do with None whatever would be done with any other value'. > > Usually, only trailing parameters are allowed to be given None and then it is given in the definition as a default. > def f(a, b=None, c=None): pass > can be called f(x), f(x, y), f(x, y, z), and f(x, None, z) or f(x, c=z). Because of the last option, there is never a need for the user to explicitly pass None. > > What is unusual about filter is not the use of None to mean 'no argument, behave differently', but the need for it to be written explicitly by the caller to get the null behavior. There are other functions like that. For example, to set a max split, but still get the default behavior of splitting on any whitespace instead of a specific character, you have to write s.split(None, 2). I'm not suggesting that it's an ideal design, just that it's not uniquely special to filter. From abarnert at yahoo.com Wed Aug 7 22:57:36 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 7 Aug 2013 13:57:36 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: On Aug 7, 2013, at 12:22, Brandon W Maister wrote: > >> Andrew Barnert wrote: >>> >>> But the point is that str.join doesn't use a one-pass algorithm, it just constructs a list so it can do it in two passes. And it's been suggested on this thread that variance could easily do the same thing. >>> >>> .. snip >>> >>> I think you still may be right that the error is the way to go. You'll learn the problem quickly, and the workaround will be obvious, and the reason for it will be available in the docs. The other two potential surprises may not be as discoverable. 
> > I agree that the error is the way to go, but what about `variance(iterable, one_pass=True)` defaulting to False, with an exception that says "Warning: passing an iterable and using the one pass algorithm can lead to a slight loss in accuracy" if an iterable is passed in? First, I think you meant "iterator" there, not "iterable". (A list is an iterable, but not an iterator.) Second, I'd expect that onepass=True would lead to a loss in accuracy no matter what the other argument was. Finally, if passing onepass=True causes it to raise an exception, why even have the option? > > Normally the stdlib accepts iterables anywhere a concrete collection works, but normally also passing in an iterable doesn't change the semantics of a function. > > brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Aug 7 23:00:47 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 7 Aug 2013 14:00:47 -0700 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> Message-ID: <7FBD6A67-E2E2-408D-AB4A-D66BA26D5347@yahoo.com> On Aug 7, 2013, at 13:02, Serhiy Storchaka wrote: > 07.08.13 22:19, Vito De Tullio ???????(??): >> MRAB wrote: >> >>> def interleave(*iterables): >>> """Return a interleave object whose .__next__() method returns an >>> element from each iterable argument in turn. The .__next__() >>> method continues until the shortest iterable in the argument >>> sequence is exhausted and then it raises StopIteration. >>> """ >>> >>> sources = [iter(iterable) for iterable in iterables] >>> >>> try: >>> while True: >>> for iterable in sources: >>> yield next(iterable) >>> except StopIteration: >>> pass >> >> isn't something like >> >> def interleave(*iterables): >> for zipped_iterables in zip(*iterables): >> yield from zipped_iterables >> >> sufficient? 
> > chain.from_iterable(zip(*iterables)) Yet another reason why chain.from_iterable should be more discoverable. The operation is obviously just flattening/chaining a zip, and the only reason that isn't the obvious code for many people is that they don't know how to flatten. From joshua at landau.ws Wed Aug 7 23:29:49 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 7 Aug 2013 22:29:49 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> Message-ID: On 7 August 2013 17:01, Andrew Barnert wrote: > On Aug 7, 2013, at 4:10, Oscar Benjamin wrote: > > On Aug 6, 2013 11:19 PM, "Andrew Barnert" wrote: >> >> On Aug 6, 2013, at 12:44, Michele Lacchia >> wrote: >>> >>> Yes but then you lose all the advantages of iterators. What's the point >>> in that? >>> Furthermore it's not guaranteed that you can always converting an >>> iterator into a list. As it has already been said, you could run out of >>> memory, for instance. >> >> And the places where the stdlib/builtins do that automatic >> conversion--even when it's well motivated and almost always harmless once >> you think about it, like str.join--are surprising to most people. (Following >> up on str.join as an example, just about every question whose answer is >> str.join([...]) ends up with someone suggesting a genexpr instead of a >> listcomp, someone else explaining that it doesn't actually save any memory >> in that case, just wastes a bit of time, then some back and forth until >> everyone finally gets it.) >> >> The question is whether it would be even _more_ surprising to return an >> error, or a less accurate result. I don't know the answer to that. 
> > I'm going to make the claim (with no supporting data) that more than 95% of > the time, when a user calls variance(iterator) they will be guilty of > premature optimisation. > > I think you're probably right. In the similar cases that come up with, e.g., > str.join(iterator), there is usually no reason whatsoever to believe that > any memory or speed cost will make any difference. Often people get into > arguments over a half dozen strings (where, even if it _did_ matter, which > it doesn't, N is so low that algorithmic complexity isn't even relevant). > > Really the cases where you can't build a collection are rare. People will > still do it though just because it's satisfying to do everything with > iterators in constant memory (I'm often guilty of this kind of thing). > > Or so that a sequence of operations can be pipelined, possibly leading to > better cache behavior. Or just because iterators are the pythonic (or > python3-ic?) way to do it. > > However unlike str.join there's no one pass algorithm that can be as > accurate so it's not purely a performance question. > > But the point is that str.join doesn't use a one-pass algorithm, it just > constructs a list so it can do it in two passes. And it's been suggested on > this thread that variance could easily do the same thing. > > So there are three choices. Using a one-pass algorithm would be surprising > because it's less accurate. Automatic listification would be surprising > because you went out of your way to pass lazy iterators around and variance > broke the benefits. An exception would be surprising because almost every > other function in the stdlib that takes lists also takes iterators, even > when there are good reasons not to. > > I think you still may be right that the error is the way to go. You'll learn > the problem quickly, and the workaround will be obvious, and the reason for > it will be available in the docs. The other two potential surprises may not > be as discoverable. 
My preference is a documentation warning and leave it at that *or* automatic coercion to a list. Anything else is treating this issue as a *way* bigger deal than it currently is. The advantage of the first is that it allows one-pass algorithms. The advantage of the second is correctness by default. Whichever is preferred, I'd really rather not do some of the other workarounds like extra arguments or raising errors. One pass algorithms are important if speed is important. Correctness by default is important because this is a library that values correctness over speed (...I think we have our answer). Losing correctness to allow for faster algorithms is, in my opinion, anathema to the purpose of this library. From tjreedy at udel.edu Wed Aug 7 23:39:07 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 07 Aug 2013 17:39:07 -0400 Subject: [Python-ideas] Allow filter(items) Message-ID: On 8/7/2013 4:54 PM, Andrew Barnert wrote: > On Aug 7, 2013, at 10:31, Terry Reedy wrote: > >> On 8/7/2013 6:01 AM, Alexander Belopolsky wrote: >> >>> From Guido's time machine comes a pronouncement: >> . >> >> As Raymond points out in a later message, it is standard in Python for None passed to a function to mean 'no argument, do something that does not require an value for this parameter'. It does not mean 'use a value of None for this parameter and do with None whatever would be done with any other value'. >> >> Usually, only trailing parameters are allowed to be given None and then it is given in the definition as a default. >> def f(a, b=None, c=None): pass >> can be called f(x), f(x, y), f(x, y, z), and f(x, None, z) or f(x, c=z). Because of the last option, there is never a need for the user to explicitly pass None. >> >> What is unusual about filter is not the use of None to mean 'no argument, behave differently', but the need for it to be written explicitly by the caller to get the null behavior. > > There are other functions like that. 
For example, to set a max split, but still get the default behavior of splitting on any whitespace instead of a specific character, you have to write s.split(None, 2). As with 'f(x, None, z) or f(x, c=z)', that, *or* >>> 'a b c d'.split(maxsplit=2) ['a', 'b', 'c d'] .split is similar to filter in using None to mean 'do something different'. In this case, it is 'do not split on any one string, but on any whitespace span'. I believe this is equivalent to re.split('\s+', ...), but without importing the re module and invoking its full overhead. > I'm not suggesting that it's an ideal design, just that it's not uniquely special to filter. If one tries the 'or' option with filter filter(iterable=[0,1,2]) Traceback (most recent call last): File "", line 1, in filter(iterable=[0,1,2]) TypeError: filter() does not take keyword arguments and even if it did, one would get a TypeError for passing only 1 arg when 2 are needed. -- Terry Jan Reedy From shane at umbrellacode.com Wed Aug 7 23:43:47 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 14:43:47 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Aug 7, 2013, at 10:20 AM, Terry Reedy wrote: > On 8/7/2013 4:55 AM, Serhiy Storchaka wrote: > >> The default predicate is not "bool". The default is identity function >> (lambda x: x). > > Not really. None means "do not apply *any* predicate function". > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas Which is the identity function predicate and not bool. Written as [item for item in items if predicate(item)] None translates: [item for item in items if predicate(item)] into [item for item in items if item] not 
[item for item in items if bool(item)] So predicate is defined such that predicate(item) == item, which is the definition of the identity function, not bool. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Aug 7 23:49:37 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 7 Aug 2013 14:49:37 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <72D798E2-8C41-4DC8-8C2B-AF4DB907F909@umbrellacode.com> On Aug 7, 2013, at 2:43 PM, Shane Green wrote: > On Aug 7, 2013, at 10:20 AM, Terry Reedy wrote: > >> On 8/7/2013 4:55 AM, Serhiy Storchaka wrote: >> >>> The default predicate is not "bool". The default is identity function >>> (lambda x: x). >> >> Not really. None means "do not apply *any* predicate function". >> >> -- >> Terry Jan Reedy >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > Which is the identity function predicate and not bool. > > Wrriten as [item for item in items if predicate(item)] > > None translates: > [item for item in items if predicate(item)] > ? into ? > [item for item in items if item] > ? not ? > [item for item in items if bool(item)] > > So predicate is defined such that predicate(item) == item, which is the definition of the identity function, not bool. (sorry for the big text in my lsat response, I?m sitting far away and forgot to shrink it back down?which, incidentally, is also my excuse for anything really stupid I say). I wanted to add that my answer is basically geared around answering the question, okay, if you had to define this with a predicate, what would the predicate be? I?m not literally saying the identity function is used. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bwmaister at gmail.com Thu Aug 8 00:08:48 2013 From: bwmaister at gmail.com (Brandon W Maister) Date: Wed, 7 Aug 2013 18:08:48 -0400 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: On Wed, Aug 7, 2013 at 4:57 PM, Andrew Barnert wrote: > On Aug 7, 2013, at 12:22, Brandon W Maister wrote: > > I agree that the error is the way to go, but what about > `variance(iterable, one_pass=True)` defaulting to False, with an exception > that says "Warning: passing an iterable and using the one pass algorithm > can lead to a slight loss in accuracy" if an iterable is passed in? > > > First, I think you meant "iterator" there, not "iterable". (A list is an > iterable, but not an iterator.) > Oops, you're completely correct. Second, I'd expect that onepass=True would lead to a loss in accuracy no > matter what the other argument was. > > Finally, if passing onepass=True causes it to raise an exception, why even > have the option? > Sorry, I was unclear: I should have written something more like `allow_one_pass=True` and defaulting to False. Passing in True would silence the error, and force users to mean what they say. I don't feel strongly that it's a _good_ idea: I have never written code that would need to take advantage of a one pass algorithm, and I have no idea how common such code would be. If someone is at the point where they're trying to optimize their variance function they are probably at the point where they're moving beyond the stdlib anyway? brandon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Thu Aug 8 00:08:31 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 7 Aug 2013 15:08:31 -0700 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <5202B8EF.8050307@udel.edu> References: <53A803D5-99D6-41CB-A88F-827B0965DA34@yahoo.com> <5202B8EF.8050307@udel.edu> Message-ID: <0ECB09EC-B77D-42DF-BF8B-252E9592FC0B@yahoo.com> On Aug 7, 2013, at 14:15, Terry Reedy wrote: > > > On 8/7/2013 4:54 PM, Andrew Barnert wrote: >> On Aug 7, 2013, at 10:31, Terry Reedy wrote: >> >>> On 8/7/2013 6:01 AM, Alexander Belopolsky wrote: >>> >>>> From Guido's time machine comes a pronouncement: >>> . >>> >>> As Raymond points out in a later message, it is standard in Python for None passed to a function to mean 'no argument, do something that does not require an value for this parameter'. It does not mean 'use a value of None for this parameter and do with None whatever would be done with any other value'. >>> >>> Usually, only trailing parameters are allowed to be given None and then it is given in the definition as a default. >>> def f(a, b=None, c=None): pass >>> can be called f(x), f(x, y), f(x, y, z), and f(x, None, z) or f(x, c=z). Because of the last option, there is never a need for the user to explicitly pass None. >>> >>> What is unusual about filter is not the use of None to mean 'no argument, behave differently', but the need for it to be written explicitly by the caller to get the null behavior. >> >> There are other functions like that. For example, to set a max split, but still get the default behavior of splitting on any whitespace instead of a specific character, you have to write s.split(None, 2). > > As with "f(x, None, z) or f(x, c=z)', that, *or* > >>> 'a b c d'.split(maxsplit=2) > ['a', 'b', 'c d'] > > .split is similar to filter in using None to mean 'do something different'. In this case, it is 'do not split an any one string but on any whitespace span. 
I believe this is equivalent to re.split('\s+', ...), but without importing the re module and invoking its full overhead. > >> I'm not suggesting that it's an ideal design, just that it's not uniquely special to filter. > > If one tries the 'or' option with filter > > filter(iterable=[0,1,2]) > Traceback (most recent call last): > File "", line 1, in > filter(iterable=[0,1,2]) > TypeError: filter() does not take keyword arguments That's just an implementation detail. The split method didn't always take keyword args either; now it does. We could easily define filter as (in pseudo-Python instead of C): def filter(predicate=None, iterable=None): if iterable is None: raise TypeError("can't filter None") ... Then you could call it as filter(None, bariter) or filter(iterable=bariter), just like split. It's still clumsier than having optional, often-skipped arguments last, but my point is that doesn't make it unique. > > and even if it did, one would get a TypeError for passing only 1 arg when 2 are needed. > > -- > Terry Jan Reedy From mertz at gnosis.cx Thu Aug 8 00:20:13 2013 From: mertz at gnosis.cx (David Mertz) Date: Wed, 7 Aug 2013 15:20:13 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: I think having a one-pass option is useful, even at a loss of precision. I do not believe that a warning should be a run-time thing though, but simply a matter of documentation. I guess the optional flag (with list-conversion as default) is the best approach; although a function with a whole different name doesn't seem implausible either... but indeed, the final goal really should be to allow accumulation of many stats in the same single pass in this mode (not sure what that API would be).
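As for David's closing question about what an accumulate-many-stats API might look like: one conceivable shape, sketched here with entirely invented names (this is not a proposal from the thread), is an accumulator object that is fed values once and can report several statistics at any point.

```python
class RunningStats:
    """Hypothetical single-pass accumulator (Welford-style update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # running sum of squared deviations
        self.minimum = float('inf')
        self.maximum = float('-inf')

    def add(self, x):
        # One update per data point; nothing is stored but the summary.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def pvariance(self):
        return self._m2 / self.n

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(x)
print(stats.mean, stats.pvariance())   # mean is 5.0, pop. variance is 4.0 (up to float rounding)
```

The point of the shape, rather than the particular update rule, is that a billion numbers trickling in from a file or an instrument never need to be held in memory at once.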
The case of really long numeric streams feels like it is common enough to warrant this capability. We might well have a billion numbers in a file on a disk... or they might trickle in slowly from an actual instrument. That either uses too much memory or requires too much delay when intermediate results should be possible. Here's a question for the actual statisticians on the list (I'm not close to this). Would having a look-ahead window of moderate size (probably configurable) do enough good in numeric accuracy to be worthwhile? Obviously, creating pathological cases is still possible, but in the "normal" situation, does this matter enough? I.e. if the function were to read 100 numbers from an iterator, perform some manipulation on their ordering or scaling, produce that better intermediate result, then do the same with the next chunk of 100 numbers, is this enough of a win to have as an option? On Wed, Aug 7, 2013 at 3:08 PM, Brandon W Maister wrote: > > > > On Wed, Aug 7, 2013 at 4:57 PM, Andrew Barnert wrote: > >> On Aug 7, 2013, at 12:22, Brandon W Maister wrote: >> >> I agree that the error is the way to go, but what about >> `variance(iterable, one_pass=True)` defaulting to False, with an exception >> that says "Warning: passing an iterable and using the one pass algorithm >> can lead to a slight loss in accuracy" if an iterable is passed in? >> >> >> First, I think you meant "iterator" there, not "iterable". (A list is an >> iterable, but not an iterator.) >> > > Oops, you're completely correct. > > Second, I'd expect that onepass=True would lead to a loss in accuracy no >> matter what the other argument was. >> >> Finally, if passing onepass=True causes it to raise an exception, why >> even have the option? >> > > Sorry, I was unclear: I should have written something more like > `allow_one_pass=True` and defaulting to False. Passing in True would > silence the error, and force users to mean what they say.
> > I don't feel strongly that it's a _good_ idea: I have never written code > that would need to take advantage of a one pass algorithm, and I have no > idea how common such code would be. If someone is at the point where > they're trying to optimize their variance function they are probably at the > point where they're moving beyond the stdlib anyway? > > brandon > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Aug 8 00:24:37 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 07 Aug 2013 15:24:37 -0700 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: <5202C925.2010208@stoneleaf.us> On 08/07/2013 03:20 PM, David Mertz wrote: > > Here's a question for the actual statisticians on the list (I'm not close to this). Would having a look-ahead window of > moderate size (probably configurable) do enough good in numeric accuracy to be worthwhile? Obviously, creating > pathological cases is still possible, but in the "normal" situation, does this matter enough? I.e. 
if the function were > to read 100 numbers from an iterator, perform some manipulation on their ordering or scaling, produce that better > intermediate result, then do the same with the next chunk of 100 numbers, is this enough of a win to have as an option? I have a follow-up question: considering the built-in error when calculating statistics, is the difference between sequence and iterator significant? Would we be just as well served with `it = iter(sequence)` and always using the one-pass algorithm? -- ~Ethan~ From joshua at landau.ws Thu Aug 8 00:27:18 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 7 Aug 2013 23:27:18 +0100 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: <7FBD6A67-E2E2-408D-AB4A-D66BA26D5347@yahoo.com> References: <52029634.9070502@mrabarnett.plus.com> <7FBD6A67-E2E2-408D-AB4A-D66BA26D5347@yahoo.com> Message-ID: On 7 August 2013 22:00, Andrew Barnert wrote: > On Aug 7, 2013, at 13:02, Serhiy Storchaka wrote: > >> 07.08.13 22:19, Vito De Tullio wrote: >>> MRAB wrote: >>> >>>> def interleave(*iterables): >>>> """Return an interleave object whose .__next__() method returns an >>>> element from each iterable argument in turn. The .__next__() >>>> method continues until the shortest iterable in the argument >>>> sequence is exhausted and then it raises StopIteration. >>>> """ >>>> >>>> sources = [iter(iterable) for iterable in iterables] >>>> >>>> try: >>>> while True: >>>> for iterable in sources: >>>> yield next(iterable) >>>> except StopIteration: >>>> pass >>> >>> isn't something like >>> >>> def interleave(*iterables): >>> for zipped_iterables in zip(*iterables): >>> yield from zipped_iterables >>> >>> sufficient? >> >> chain.from_iterable(zip(*iterables)) > > Yet another reason why chain.from_iterable should be more discoverable. The operation is obviously just flattening/chaining a zip, and the only reason that isn't the obvious code for many people is that they don't know how to flatten. I agree.
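A side-by-side check of the two spellings is instructive: they agree when the inputs have equal lengths, but the generator version emits a partial final round before it hits the exhausted iterator, while the zip-based spelling only ever yields whole rounds. A quick sketch, using MRAB's generator as posted:

```python
from itertools import chain

def interleave(*iterables):
    # MRAB's generator, as posted in the thread
    sources = [iter(iterable) for iterable in iterables]
    try:
        while True:
            for iterable in sources:
                yield next(iterable)
    except StopIteration:
        pass

data = (['a', 'b', 'c'], [1, 2], ['x', 'y', 'z'])
print(list(interleave(*data)))                # ['a', 1, 'x', 'b', 2, 'y', 'c']
print(list(chain.from_iterable(zip(*data))))  # ['a', 1, 'x', 'b', 2, 'y']
```

Note the trailing 'c' in the first result: the generator only stops when it actually asks the exhausted iterator for a value, partway through the third round.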
I don't think we can get chain.from_iterable renamed to something better easily¹, but I'm still hopeful that someone will update the implementation for http://www.python.org/dev/peps/pep-0448/ for me, and that solves it nicely². ¹ At least it wasn't easy last time we tried ² chain.from_iterable(x) == (*i for i in x) From rymg19 at gmail.com Thu Aug 8 02:03:43 2013 From: rymg19 at gmail.com (Ryan) Date: Wed, 07 Aug 2013 19:03:43 -0500 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <5202C925.2010208@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> <5202C925.2010208@stoneleaf.us> Message-ID: <2356e859-7b98-4e90-91ab-14f2e4a0f99e@email.android.com> That would mostly be an unnecessary line of code. Doing that to a list is just like doing list(iterator). The difference is that it would make all lists lose that precision, and we'd end up with another problem on your hands. Ethan Furman wrote: >On 08/07/2013 03:20 PM, David Mertz wrote: >> >> Here's a question for the actual statisticians on the list (I'm not >close to this). Would having a look-ahead window of >> moderate size (probably configurable) do enough good in numeric >accuracy to be worthwhile? Obviously, creating >> pathological cases is still possible, but in the "normal" situation, >does this matter enough? I.e. if the function were >> to read 100 numbers from an iterator, perform some manipulation on >their ordering or scaling, produce that better >> intermediate result, then do the same with the next chunk of 100 >numbers, is this enough of a win to have as an option? > >I have a follow-up question: considering the built-in error when >calculating statistics, is the difference between
Would we be just as well served >with `it = iter(sequence)` and always using the >one-pass algorithm? > >-- >~Ethan~ >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Aug 8 02:19:50 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 08 Aug 2013 10:19:50 +1000 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <5202C925.2010208@stoneleaf.us> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> <5202C925.2010208@stoneleaf.us> Message-ID: <5202E426.6090107@pearwood.info> On 08/08/13 08:24, Ethan Furman wrote: > On 08/07/2013 03:20 PM, David Mertz wrote: >> >> Here's a question for the actual statisticians on the list (I'm not close to this). Would having a look-ahead window of >> moderate size (probably configurable) do enough good in numeric accuracy to be worthwhile? Obviously, creating >> pathological cases is still possible, but in the "normal" situation, does this matter enough? I.e. if the function were >> to read 100 numbers from an iterator, perform some manipulation on their ordering or scaling, produce that better >> intermediate result, then do the same with the next chunk of 100 numbers, is this enough of a win to have as an option? > > I have a follow-up question: considering the built-in error when calculating statistics, is the difference between sequence and iterator significant? Would we be just as well served with `it = iter(sequence)` and always using the one-pass algorithm? 
To the best of my knowledge, there is no widely-known algorithm for accurately calculating variance in chunks. There's a two-pass algorithm, and a one-pass algorithm, and that's what people use. My guess is that you could adapt the one-pass algorithm to look-ahead in chunks, but to my mind that adds complexity for no benefit. Computationally, there is a small, but real, difference in the two algorithms, as can be seen by the fact that they give different results. Does the difference make a difference *in practice*, given that most real-world measurements are only accurate to 2-4 decimal places? No, not really. If your data is accurate to 3 decimal places, you've got no business quoting the variance to 15 decimal places. But, people will compare the std lib variance to numpy's variance, to Excel's variance, to their calculator's variance, and when there is a discrepancy, I would rather it be in our favour rather than have to make the excuse "well you see we picked the slightly less accurate algorithm in order to prematurely optimize for enormous data sets too big to fit into memory". The two-pass algorithm stays. I'm deferring to 3.5 a set of one-pass iterator friendly functions that will be suitable for calculating multiple statistics from a single data stream without building a list. -- Steven From stephen at xemacs.org Thu Aug 8 03:12:12 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 08 Aug 2013 10:12:12 +0900 Subject: [Python-ideas] Allow filter(items) In-Reply-To: <72D798E2-8C41-4DC8-8C2B-AF4DB907F909@umbrellacode.com> References: <2390EE27-ECD3-4ECE-A56C-BC5849AD42F2@umbrellacode.com> <87r4e5lwm7.fsf@uwakimon.sk.tsukuba.ac.jp> <72D798E2-8C41-4DC8-8C2B-AF4DB907F909@umbrellacode.com> Message-ID: <87a9ktkmtv.fsf@uwakimon.sk.tsukuba.ac.jp> Shane Green writes: > I wanted to add that my answer is basically geared around answering > the question, okay, if you had to define this with a predicate, what > would the predicate be? 
I'm not literally saying the identity > function is used. But then, because of the way Python treats expressions in Boolean contexts, you cannot distinguish any of the three treatments "use the iterable element as-is" (how None is actually implemented), "apply the identity function lambda x: x", and "apply bool", because all of them use the result of applying bool to the iterable element: bool(x) == bool((lambda z: z)(x)) == bool(bool(x)) for all x where the outer bool is probably inlined in the interpreter rather than an actual application of the function bool. From stephen at xemacs.org Thu Aug 8 04:30:57 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 08 Aug 2013 11:30:57 +0900 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> Message-ID: <878v0clxr2.fsf@uwakimon.sk.tsukuba.ac.jp> David Mertz writes: > The case of really long numeric streams feels like it is > common enough to warrant this capability. We might well have a > billion numbers in a file on a disk... or they might trickle in > slowly from an actual instrument. The latter is the important question, I think, and Steven has already said that he has online algorithms (ie, updating as data becomes available) in mind for future implementation. > Here's a question for the actual statisticians on the list (I'm not > close to this). Would having a look-ahead window of moderate size > (probably configurable) do enough good in numeric accuracy to be > worthwhile? There's a better approach. In both the "large list" and "possibly infinite iterator" cases, a *distribution* of "moderate degree of refinement" can summarize all of the data without losing precision.
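One concrete form of the summary Stephen alludes to is the mergeable (n, mean, M2) triple used by the parallel variance algorithm attributed to Chan et al.: each chunk of data is reduced to three numbers, and two summaries can be combined without revisiting the data. A minimal sketch (the function names are mine, for illustration only):

```python
def summarize(chunk):
    """Return the (n, mean, M2) summary of a non-empty chunk of numbers."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)  # sum of squared deviations
    return n, mean, m2

def merge(a, b):
    """Combine two summaries (Chan et al.'s pairwise update)."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta ** 2 * na * nb / n
    return n, mean, m2

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n, mean, m2 = merge(summarize(data[:3]), summarize(data[3:]))
print(mean, m2 / n)   # same mean and population variance as a single pass over all six values
```

The population variance is then m2 / n, and summaries from any number of chunks, files, or machines can be folded together in the same way.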
From clay.sweetser at gmail.com Thu Aug 8 05:29:52 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Wed, 7 Aug 2013 23:29:52 -0400 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> Message-ID: On Aug 7, 2013 12:05 PM, "Andrew Barnert" wrote: > So there are three choices. Using a one-pass algorithm would be surprising because it's less accurate. Automatic listification would be surprising because you went out of your way to pass lazy iterators around and variance broke the benefits. An exception would be surprising because almost every other function in the stdlib that takes lists also takes iterators, even when there are good reasons not to. Just curious, where does this pop up (an stdlib function taking an iterator, when using an iterator with such a function is a bad idea)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Thu Aug 8 06:01:26 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 08 Aug 2013 06:01:26 +0200 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: <5202AA45.10408@mrabarnett.plus.com> References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> Message-ID: MRAB, 07.08.2013 22:12: > And are these functions worthy of inclusion in itertools? :-) The fact that they are a short construction of the existing tools indicates that they are better suited for the recipes section than the itertools functions section of the module docs. There is already a roundrobin() recipe. Stefan From vito.detullio at gmail.com Thu Aug 8 07:48:20 2013 From: vito.detullio at gmail.com (Vito De Tullio) Date: Thu, 08 Aug 2013 07:48:20 +0200 Subject: [Python-ideas] Add 'interleave' function to itertools? 
References: <52029634.9070502@mrabarnett.plus.com> Message-ID: Serhiy Storchaka wrote: >> def interleave(*iterables): >> for zipped_iterables in zip(*iterables): >> yield from zipped_iterables >> >> sufficient? > > chain.from_iterable(zip(*iterables)) I always forget it... My first approach was to call chain(), the second sum(..., []), I fell back on a for + yield approach... (at least the last one is correct :D) -- By ZeD From storchaka at gmail.com Thu Aug 8 09:09:43 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 08 Aug 2013 10:09:43 +0300 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> Message-ID: 08.08.13 07:01, Stefan Behnel wrote: > MRAB, 07.08.2013 22:12: >> And are these functions worthy of inclusion in itertools? :-) > > The fact that they are a short construction of the existing tools indicates > that they are better suited for the recipes section than the itertools > functions section of the module docs. There is already a roundrobin() recipe. I think this reason is applicable to the first_true()/coalesce() function (issue18652). From ronaldoussoren at mac.com Thu Aug 8 10:52:58 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 8 Aug 2013 10:52:58 +0200 Subject: [Python-ideas] statistics.sum [was Re: Pre-PEP: adding a statistics module to Python] In-Reply-To: <52000541.6040000@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <52000541.6040000@pearwood.info> Message-ID: On 5 Aug, 2013, at 22:04, Steven D'Aprano wrote: > > > Show of hands please, +1 or -1 on statistics.sum. -1, it would be better to change one of the other sum functions (either math.fsum should work with non-floats, or builtins.sum should do the right thing with floats).
It's confusing to have multiple functions that do more or less the same thing, and more so when the reason for having multiple versions is due to complexity in the semantics of floating point numbers. Ronald From ronaldoussoren at mac.com Thu Aug 8 13:31:50 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 8 Aug 2013 13:31:50 +0200 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: References: Message-ID: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> On 7 Aug, 2013, at 16:23, Kristján Valur Jónsson wrote: > I just added a patch to the tracker: http://bugs.python.org/issue18677 > > Here is its text: > > A proposed patch adds two features to context managers: > > 1) It has always irked me that it was impossible to assemble nested context managers in the python language. See issue > #5251 > . > The main problem, that exceptions in __enter__ cannot be properly handled, is fixed by introducing a new core exception, ContextManagerExit. When raised by __enter__(), the body that the context manager protects is skipped. This exception is in the spirit of other semi-internal exceptions such as GeneratorExit and StopIteration. Using this exception, contextlib.nested can properly handle the case where the body isn't run because of an internal __enter__ exception which is handled by an outer __exit__. This appears to be similar to the mechanism in PEP 377 which was rejected. > > 2) The mechanism used in implementing ContextManagerExit above is easily extended to allowing a special context manager: None. This is useful for having _optional_ context managers. E.g. code like this: > with performance_timer(): > do_work() > > def performance_timer(): > if profiling: > return accumulator > return None > > None becomes the trivial context manager and its __enter__ and __exit__ calls are skipped, along with their overhead.
How bad is the overhead of a trivial contextmanager (that is, one with empty bodies for both __enter__ and __exit__)? Ronald From oscar.j.benjamin at gmail.com Thu Aug 8 13:38:01 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 8 Aug 2013 12:38:01 +0100 Subject: [Python-ideas] Pre-PEP: adding a statistics module to Python In-Reply-To: <5202E426.6090107@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <36195A9C-E35E-4F15-A01F-1940C39CDE21@yahoo.com> <438AB4B9-AF17-44AB-B115-AC542F00797D@yahoo.com> <3a0252ad-8aea-4769-bcfd-e5db13e3fe8f@email.android.com> <5202C925.2010208@stoneleaf.us> <5202E426.6090107@pearwood.info> Message-ID: On 8 August 2013 01:19, Steven D'Aprano wrote: > On 08/08/13 08:24, Ethan Furman wrote: >> >> On 08/07/2013 03:20 PM, David Mertz wrote: >>> >>> Here's a question for the actual statisticians on the list (I'm not close >>> to this). Would having a look-ahead window of >>> moderate size (probably configurable) do enough good in numeric accuracy >>> to be worthwhile? Obviously, creating >>> pathological cases is still possible, but in the "normal" situation, does >>> this matter enough? I.e. if the function were >>> to read 100 numbers from an iterator, perform some manipulation on their >>> ordering or scaling, produce that better >>> intermediate result, then do the same with the next chunk of 100 numbers, >>> is this enough of a win to have as an option? >> >> >> I have a follow-up question: considering the built-in error when >> calculating statistics, is the difference between sequence and iterator >> significant? Would we be just as well served with `it = iter(sequence)` and >> always using the one-pass algorithm? > > To the best of my knowledge, there is no widely-known algorithm for > accurately calculating variance in chunks. There's a two-pass algorithm, and > a one-pass algorithm, and that's what people use. 
My guess is that you could > adapt the one-pass algorithm to look-ahead in chunks, but to my mind that > adds complexity for no benefit. It's true that chunkifying algorithms are rarely used when computing over a single-pass. However there are algorithms for working with chunks that are used for distributed/parallel computation. A quick test on my machine indicates that it is possible to get an accuracy that comes between that of the 1-pass and 2-pass algorithms. Depending on the API for 1-pass statistics it might be that the data naturally arrives in chunks or that there are other reasons to chunk it anyway in which case it could be good to use these. Also it's useful if there's ever an API for merging stats computed in different places. The method below is apparently due to Chan et. al but I'd be lying if I said I consulted anything apart from Wikipedia so see here for more info: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm > Computationally, there is a small, but real, difference in the two > algorithms, as can be seen by the fact that they give different results. > Does the difference make a difference *in practice*, given that most > real-world measurements are only accurate to 2-4 decimal places? No, not > really. If your data is accurate to 3 decimal places, you've got no business > quoting the variance to 15 decimal places. > > But, people will compare the std lib variance to numpy's variance, to > Excel's variance, to their calculator's variance, and when there is a > discrepancy, I would rather it be in our favour rather than have to make the > excuse "well you see we picked the slightly less accurate algorithm in order > to prematurely optimize for enormous data sets too big to fit into memory". > > The two-pass algorithm stays. I'm deferring to 3.5 a set of one-pass > iterator friendly functions that will be suitable for calculating multiple > statistics from a single data stream without building a list. 
You have the right approach here. I don't think numpy or any of the statistical libraries you listed in the PEP provides 1-pass statistics. AFAIK they all use the 2-pass algorithm to compute variance. The difference may be small but one method is definitively more accurate and usually more efficient (really an additional pass over a list is nothing compared to the actual computation). Some code is below and here's some example output (it uses random data so this isn't repeatable): $ python3 stats.py 10000 Data points: 10000 Exact result: 250000020133.92606 pvariance: 250000020133.92606 error: 0.0e+00 pvar1pass: 250000020133.92624 error: -1.83e-04 pvarchunk: 250000020133.92606 error: 0.00e+00 Data points: 2000000 pvariance: 249999999484.0627 pvar1pass: 249999999484.0557 error: 7.02e-03 pvarchunk: 249999999484.0625 error: 2.14e-04 So Steven's pvariance function can often compute the variance to within machine precision of the true value (that's as accurate as it possibly could be). The 1-pass variance has an error that's on the order of machine precision (plenty accurate enough). Chunking with chunksize 100 gives an error that's about 10 times smaller (presumably there's a sqrt(N) effect). The code: #!/usr/bin/env python from itertools import takewhile, islice, repeat import statistics def chunks(iterable, chunksize=100): '''Break iterable into chunks. >>> for chunk in chunks(range(8), 3): ... print(chunk) [0, 1, 2] [3, 4, 5] [6, 7] ''' islices = map(islice, repeat(iter(iterable)), repeat(chunksize)) return takewhile(bool, map(list, islices)) def pvarchunks(iterable, chunksize=100): '''Estimate variance in 1-pass by chunkifying iterable. 
>>> # Use an exact computation for demonstration >>> from fractions import Fraction >>> dice = [Fraction(n) for n in range(1, 6+1)] >>> print(pvarchunks(dice, chunksize=4)) 35/12 >>> # For comparison >>> import statistics >>> print(statistics.pvariance(dice)) 35/12 ''' count = 0 # number of items seen xbar = 0 # current estimate of mean ssqdev = 0 # current estimate of sum of squared deviation for chunk in chunks(iterable, chunksize): counti = len(chunk) xbari = statistics.mean(chunk) ssqdevi = statistics._direct(chunk, xbari, counti) # Merge counti, xbari, ssqdevi delta = (((xbari - xbar)**2) * (count*counti)) / (count + counti) xbar = (count*xbar + counti*xbari) / (count + counti) ssqdev = ssqdev + ssqdevi + delta count = count + counti # Compute the final result return ssqdev / count if __name__ == "__main__": import doctest doctest.testmod() import sys from random import gauss, shuffle from fractions import Fraction # Test numerical accuracy and compare with other methods # N is the size of the test. N1 = int(sys.argv[1]) if sys.argv[1:] else 200 N2 = int(sys.argv[2]) if sys.argv[2:] else 1000000 # Try to be numerically awkward data = [gauss(0, 1) for n in range(N1 // 2)] data += [gauss(1000000, 1) for n in range(N1 // 2)] shuffle(data) print('Data points:', len(data)) # First we'll get an exact result using fractions # This is really slow... 
fnums = [Fraction(x) for x in data] exact_var = statistics.pvariance(fnums) print('Exact result:', float(exact_var)) # compare with pvariance on floats pvarfloat = statistics.pvariance(data) print('pvariance:', pvarfloat) print(' error: %.1e' % (exact_var - pvarfloat)) # compare with 1-pass algorithm pvar1pass = statistics.pvariance(iter(data)) print('pvar1pass:', pvar1pass) print(' error: %.2e' % (exact_var - pvar1pass)) # compare with chunked 1-pass algorithm pvarchunk = pvarchunks(data, chunksize=100) print('pvarchunk:', pvarchunk) print(' error: %.2e' % (exact_var - pvarchunk)) # Okay now we'll try a really large amount of data # (and not bother with fractions) data = [gauss(0, 1) for n in range(N2)] data += [gauss(1000000, 1) for n in range(N2)] shuffle(data) print('Data points:', len(data)) # pvariance on floats (use as true answer in this case) pvarfloat = statistics.pvariance(data) print('pvariance:', pvarfloat) pvar1pass = statistics.pvariance(iter(data)) print('pvar1pass:', pvar1pass) print(' error: %.2e' % (pvarfloat - pvar1pass)) # Now compare with chunks pvarchunk = pvarchunks(data, chunksize=100) print('pvarchunk:', pvarchunk) print(' error: %.2e' % (pvarfloat - pvarchunk)) Oscar From ronaldoussoren at mac.com Thu Aug 8 14:34:21 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 8 Aug 2013 14:34:21 +0200 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: References: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> Message-ID: <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> On 8 Aug, 2013, at 14:14, Kristján Valur Jónsson wrote: > > > ________________________________________ > Frá: Ronald Oussoren [ronaldoussoren at mac.com] >> This appears to be similar to the mechanism in PEP 377 http://www.python.org/dev/peps/pep-0377/ >> which was rejected. > Indeed, it is similar, but simpler. (I was unaware of Nick's PEP).
The PEP contains the fallacy that in case of __enter__ raising the SkipStatement (or in my case, ContextManagerExit) exception, something needs to be assigned to the "as" target. The proposal I'm making is to make it possible to do in a single context manager, what it is possible to do with two context managers currently: > @contextmanager > def errordude(): > 1 // 0 > yield > @contextmanager > def handler(): > try: > yield > except ZeroDivisionError: > pass > with handler as a, errordude as b: > do_stuff(a, b) > > do_stuff will be silently skipped, and b won't be assigned to. > I'm proposing a mechanism where the same could be done with a single context manager: > with combined as a, b: > do_stuff(a, b) > > Currently, you have programmatic capabilities (silent skipping of the statement) with two context managers that you don't have with a single one. That's just plain odd. > > I don't think the original objections to PEP 377 are valid. My approach introduces very little added complexity. The use case is "correctly being able to combine context managers". It is a "completeness" argument too, that you can do with a single context manager what you can do with two nested ones. Skipping the body makes it possible to introduce flow control in a context manager, which could make it harder to read code. That is, currently the body is executed unless __enter__ raises an exception, while with your proposal the body might be skipped entirely. I haven't formed an opinion yet on whether or not that is a problem, but PEP 343 (introduction of the with statement) references a claim that hiding flow control in macros can be bad. Ronald From clay.sweetser at gmail.com Thu Aug 8 14:44:41 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Thu, 08 Aug 2013 08:44:41 -0400 Subject: [Python-ideas] Add 'interleave' function to itertools?
In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> Message-ID: <351d2676-e0b9-485e-9ff8-b64341f778fe@email.android.com> Serhiy Storchaka wrote: >08.08.13 07:01, Stefan Behnel wrote: >> MRAB, 07.08.2013 22:12: >>> And are these functions worthy of inclusion in itertools? :-) >> >> The fact that they are a short construction of the existing tools >indicates >> that they are better suited for the recipes section than the >itertools >> functions section of the module docs. There is already a roundrobin() >recipe. > >I think this reason is applicable to the first_true()/coalesce() >function (issue18652). > Not necessarily... The one line solution that the function in issue 18652 proposes is not one that people might immediately come up with, or solve quite so elegantly. Not that I'm totally against it just being added to the recipes section, but I feel that the recipes section is too often overlooked to be much help (Might there be a way to remedy that?). > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- "The trouble with having an open mind, of course, is that people will come along and insist on putting things in it." - Terry Pratchett From kristjan at ccpgames.com Thu Aug 8 14:14:52 2013 From: kristjan at ccpgames.com (Kristján Valur Jónsson) Date: Thu, 8 Aug 2013 12:14:52 +0000 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> References: , <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> Message-ID: ________________________________________ Frá: Ronald Oussoren [ronaldoussoren at mac.com] > This appears to be similar to the mechanism in PEP 377 http://www.python.org/dev/peps/pep-0377/ > which was rejected. Indeed, it is similar, but simpler. (I was unaware of Nick's PEP).
The PEP contains the fallacy that in case of __enter__ raising the SkipStatement (or in my case, ContextManagerExit) exception, something needs to be assigned to the "as" target. The proposal I'm making is to make it possible to do in a single context manager what it is possible to do with two context managers currently:

@contextmanager
def errordude():
    1 // 0
    yield

@contextmanager
def handler():
    try:
        yield
    except ZeroDivisionError:
        pass

with handler() as a, errordude() as b:
    do_stuff(a, b)

do_stuff will be silently skipped, and b won't be assigned to. I'm proposing a mechanism where the same could be done with a single context manager:

with combined as a, b:
    do_stuff(a, b)

Currently, you have programmatic capabilities (silent skipping of the statement) with two context managers that you don't have with a single one. That's just plain odd. I don't think the original objections to PEP 377 are valid. My approach introduces very little added complexity. The use case is "correctly being able to combine context managers". It is a "completeness" argument too, that you can do with a single context manager what you can do with two nested ones.

> 2) The mechanism used in implementing ContextManagerExit above is easily extended to allowing a special context manager: None. This is useful for having _optional_ context managers. E.g. code like this:
>
> with performance_timer():
>     do_work()
>
> def performance_timer():
>     if profiling:
>         return accumulator
>     return None
>
> None becomes the trivial context manager and its __enter__ and __exit__ calls are skipped, along with their overhead.

> How bad is the overhead of a trivial contextmanager (that is, one with empty bodies for both __enter__ and __exit__)?

Valid question.
Let's try:

#testnone.py
import timeit

class trivial:
    def __enter__(self):
        pass
    def __exit__(self, a, b, c):
        pass

trivial = trivial()

print("trivial")
print(timeit.timeit("with trivial: pass", setup = "from __main__ import trivial"))
print("none")
print(timeit.timeit("with None: pass", setup = "from __main__ import trivial"))
print("trivial + sum")
print(timeit.timeit("with trivial: 1+1", setup = "from __main__ import trivial"))
print("none + sum")
print(timeit.timeit("with None: 1+1", setup = "from __main__ import trivial"))

yields:

trivial
0.786079668918435
none
0.07240212102423305
trivial + sum
0.9069962424632848
none + sum
0.11295328499933865

As you can see, the function call overhead is tremendously high. Two complete function calls with bells and whistles are omitted by using the None context manager. Ronald From masklinn at masklinn.net Thu Aug 8 15:28:15 2013 From: masklinn at masklinn.net (Masklinn) Date: Thu, 8 Aug 2013 15:28:15 +0200 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> Message-ID: On 2013-08-08, at 06:01 , Stefan Behnel wrote:

> MRAB, 07.08.2013 22:12:
>> And are these functions worthy of inclusion in itertools? :-)
>
> The fact that they are a short construction of the existing tools indicates
> that they are better suited for the recipes section than the itertools
> functions section of the module docs. There is already a roundrobin() recipe.

I'm really not fond at all of the "recipes" section, at least under its current incarnation: it's nice to have examples of composing itertools functions to build more specialized or higher-level tools, but recipes functions should be available in itertools or a submodule: the work has been done already, it's wasteful and annoying to have to copy/paste these functions to some arbitrary location every time they are needed, that's what libraries are for.
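For context, the roundrobin() recipe referred to here is short but far from obvious. A sketch reproduced from memory (so treat it as illustrative rather than the canonical docs text):

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for nxt in nexts:
                yield nxt()
        except StopIteration:
            # Drop the exhausted iterator from the rotation.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

print(''.join(roundrobin('ABC', 'D', 'EF')))  # ADEBFC
```

The cycle/islice trick for removing an exhausted iterator is exactly the kind of non-obvious composition that makes people want the recipe shipped as a function rather than copy/pasted.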
At least there's more_itertools on PyPI, but I think one should not need a third-party package for functions which are defined (though not usable) in the standard library itself. From solipsis at pitrou.net Thu Aug 8 15:47:06 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 8 Aug 2013 15:47:06 +0200 Subject: [Python-ideas] Add 'interleave' function to itertools? References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> Message-ID: <20130808154706.458d80a0@pitrou.net> Le Thu, 8 Aug 2013 15:28:15 +0200, Masklinn a écrit :

> On 2013-08-08, at 06:01 , Stefan Behnel wrote:
>
>> MRAB, 07.08.2013 22:12:
>>> And are these functions worthy of inclusion in itertools? :-)
>>
>> The fact that they are a short construction of the existing tools
>> indicates that they are better suited for the recipes section than
>> the itertools functions section of the module docs. There is
>> already a roundrobin() recipe.
>
> I'm really not fond at all of the "recipes" section, at least under
> its current incarnation: it's nice to have examples of composing
> itertools functions to build more specialized or higher-level tools,
> but recipes functions should be available in itertools or a
> submodule: the work has been done already, it's wasteful and annoying
> to have to copy/paste these functions to some arbitrary location
> every time they are needed, that's what libraries are for.

I agree with this, plus there's always the risk of making a mistake when pasting them, and unit tests are not included. Regards Antoine. From storchaka at gmail.com Thu Aug 8 16:05:04 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 08 Aug 2013 17:05:04 +0300 Subject: [Python-ideas] Make test.test_support an alias to test.support Message-ID: When backporting tests to 2.7, one of the changes you have to make is replacing "support" with "test_support". Sometimes this is the only change.
I propose renaming the test.test_support module to test.support (this will simplify backporting patches which change the test.support module) and making test.test_support an alias to test.support. I.e. Lib/test/test_support.py should contain:

from test.support import *
from test.support import __all__

From steve at pearwood.info Thu Aug 8 16:28:33 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 09 Aug 2013 00:28:33 +1000 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <51FBF02F.1000202@pearwood.info> References: <51FBF02F.1000202@pearwood.info> Message-ID: <5203AB11.3010502@pearwood.info> Attached is the second draft of the pre-PEP for adding a statistics module to Python. A brief summary of the most important changes:

- it's a top-level module called "statistics", not "math.stats";

- statistics.sum stays (although that's subject to revision, if builtin.sum or math.fsum are "fixed" before 3.4 feature-freeze);

- one-pass algorithm for variance is deferred until Python 3.5; until then, lists are the preferred data structure (iterators will continue to work, provided they are small enough to be converted to lists).

Last chance to object or suggest changes before I submit this to Python-Dev. Thanks again to everyone who provided feedback. -- Steven -------------- next part --------------

PEP: xxx
Title: Adding A Statistics Module To The Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 01-Aug-2013
Python-Version: 3.4
Post-History:

Abstract

This PEP proposes the addition of a module for common statistics functions such as mean, median, variance and standard deviation to the Python standard library.

Rationale

The proposed statistics module is motivated by the "batteries included" philosophy towards the Python standard library.
Raymond Hettinger and other senior developers have requested a quality statistics library that falls somewhere in between high-end statistics libraries and ad hoc code.[1] Statistical functions such as mean, standard deviation and others are obvious and useful batteries, familiar to any Secondary School student. Even cheap scientific calculators typically include multiple statistical functions such as:

- mean
- population and sample variance
- population and sample standard deviation
- linear regression
- correlation coefficient

Graphing calculators aimed at Secondary School students typically include all of the above, plus some or all of:

- median
- mode
- functions for calculating the probability of random variables from the normal, t, chi-squared, and F distributions
- inference on the mean

and others[2]. Likewise spreadsheet applications such as Microsoft Excel, LibreOffice and Gnumeric include rich collections of statistical functions[3]. In contrast, Python currently has no standard way to calculate even the simplest and most obvious statistical functions such as mean. For those who need statistical functions in Python, there are two obvious solutions:

- install numpy and/or scipy[4];
- or use a Do It Yourself solution.

Numpy is perhaps the most full-featured solution, but it has a few disadvantages:

- It may be overkill for many purposes. The documentation for numpy even warns "It can be hard to know what functions are available in numpy. This is not a complete list, but it does cover most of them."[5] and then goes on to list over 270 functions, only a small number of which are related to statistics.

- Numpy is aimed at those doing heavy numerical work, and may be intimidating to those who don't have a background in computational mathematics and computer science.
For example, numpy.mean takes four arguments:

    mean(a, axis=None, dtype=None, out=None)

although fortunately for the beginner or casual numpy user, three are optional and numpy.mean does the right thing in simple cases:

    >>> numpy.mean([1, 2, 3, 4])
    2.5

- For many people, installing numpy may be difficult or impossible. For example, people in corporate environments may have to go through a difficult, time-consuming process before being permitted to install third-party software. For the casual Python user, having to learn about installing third-party packages in order to average a list of numbers is unfortunate.

This leads to option number 2, DIY statistics functions. At first glance, this appears to be an attractive option, due to the apparent simplicity of common statistical functions. For example:

    def mean(data):
        return sum(data)/len(data)

    def variance(data):
        # Use the Computational Formula for Variance.
        n = len(data)
        ss = sum(x**2 for x in data) - (sum(data)**2)/n
        return ss/(n-1)

    def standard_deviation(data):
        return math.sqrt(variance(data))

The above appears to be correct with a casual test:

    >>> data = [1, 2, 4, 5, 8]
    >>> variance(data)
    7.5

But adding a constant to every data point should not change the variance:

    >>> data = [x+1e12 for x in data]
    >>> variance(data)
    0.0

And variance should *never* be negative:

    >>> variance(data*100)
    -1239429440.1282566

By contrast, the proposed reference implementation gets the exactly correct answer 7.5 for the first two examples, and a reasonably close answer for the third: 6.012. numpy does no better[6]. Even simple statistical calculations contain traps for the unwary, starting with the Computational Formula itself. Despite the name, it is numerically unstable and can be extremely inaccurate, as can be seen above. It is completely unsuitable for computation by computer[7].
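The failure is easy to reproduce, and a straightforward two-pass implementation (a sketch for comparison, not the proposed reference implementation) already avoids the catastrophic cancellation:

```python
def naive_variance(data):
    # The "Computational Formula": one pass, but numerically unstable.
    n = len(data)
    ss = sum(x**2 for x in data) - (sum(data)**2)/n
    return ss/(n-1)

def two_pass_variance(data):
    # Subtract the mean first, then sum squared deviations: stable.
    n = len(data)
    m = sum(data)/n
    return sum((x - m)**2 for x in data)/(n - 1)

data = [x + 1e12 for x in [1, 2, 4, 5, 8]]
print(naive_variance(data))     # nowhere near the true value
print(two_pass_variance(data))  # 7.5
```

Squaring values near 1e12 pushes the intermediate sums to around 1e24, where a double's spacing is on the order of 1e8, so the subtraction in the naive formula destroys all the significant digits.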
This problem plagues users of many programming languages, not just Python[8], as coders reinvent the same numerically inaccurate code over and over again[9], or advise others to do so[10]. It isn't just the variance and standard deviation. Even the mean is not quite as straight-forward as it might appear. The above implementation seems too simple to have problems, but it does:

- The built-in sum can lose accuracy when dealing with floats of wildly differing magnitude. Consequently, the above naive mean fails this "torture test":

      assert mean([1e30, 1, 3, -1e30]) == 1

  returning 0 instead of 1, a purely computational error of 100%.

- Using math.fsum inside mean will make it more accurate with float data, but it also has the side-effect of converting any arguments to float even when unnecessary. E.g. we should expect the mean of a list of Fractions to be a Fraction, not a float.

While the above mean implementation does not fail quite as catastrophically as the naive variance does, a standard library function can do much better than the DIY versions. The example above involves an especially bad set of data, but even for more realistic data sets accuracy is important. The first step in interpreting variation in data (including dealing with ill-conditioned data) is often to standardize it to a series with variance 1 (and often mean 0). This standardization requires accurate computation of the mean and variance of the raw series. Naive computation of mean and variance can lose precision very quickly. Because precision bounds accuracy, it is important to use the most precise algorithms for computing mean and variance that are practical, or the results of standardization are themselves useless.

Comparison To Other Languages/Packages

The proposed statistics library is not intended to be a competitor to such third-party libraries as numpy/scipy, or to proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab.
It is aimed at the level of graphing and scientific calculators. Most programming languages have little or no built-in support for statistics functions. Some exceptions:

R

    R (and its proprietary cousin, S) is a programming language designed for statistics work. It is extremely popular with statisticians and is extremely feature-rich[11].

C#

    The C# LINQ package includes extension methods to calculate the average of enumerables[12].

Ruby

    Ruby does not ship with a standard statistics module, despite some apparent demand[13]. Statsample appears to be a feature-rich third-party library, aiming to compete with R[14].

PHP

    PHP has an extremely feature-rich (although mostly undocumented) set of advanced statistical functions[15].

Delphi

    Delphi includes standard statistical functions including Mean, Sum, Variance, TotalVariance, MomentSkewKurtosis in its Math library[16].

GNU Scientific Library

    The GNU Scientific Library includes standard statistical functions, percentiles, median and others[17]. One innovation I have borrowed from the GSL is to allow the caller to optionally specify the pre-calculated mean of the sample (or an a priori known population mean) when calculating the variance and standard deviation[18].

Design Decisions Of The Module

My intention is to start small and grow the library as needed, rather than try to include everything from the start. Consequently, the current reference implementation includes only a small number of functions: mean, variance, standard deviation, median, mode. (See the reference implementation for a full list.) I have aimed for the following design features:

- Correctness over speed. It is easier to speed up a correct but slow function than to correct a fast but buggy one.

- Concentrate on data in sequences, allowing two passes over the data, rather than potentially compromise on accuracy for the sake of a one-pass algorithm.
Functions expect that data will be passed as a list or other sequence; if given an iterator, they may internally convert to a list.

- Functions should, as much as possible, honour any type of numeric data. E.g. the mean of a list of Decimals should be a Decimal, not a float. When this is not possible, treat float as the "lowest common data type".

- Although functions support data sets of floats, Decimals or Fractions, there is no guarantee that *mixed* data sets will be supported. (But on the other hand, they aren't explicitly rejected either.)

- Plenty of documentation, aimed at readers who understand the basic concepts but may not know (for example) which variance they should use (population or sample?). Mathematicians and statisticians have a terrible habit of being inconsistent with both notation and terminology[19], and having spent many hours making sense of the contradictory/confusing definitions in use, it is only fair that I do my best to clarify rather than obfuscate the topic.

- But avoid going into tedious[20] mathematical detail.

Specification

As the proposed reference implementation is in pure Python, other Python implementations can easily make use of the module unchanged, or adapt it as they see fit.

What Should Be The Name Of The Module?

This will be a top-level module "statistics". There was some interest in turning math into a package, and making this a sub-module of math, but the general consensus eventually agreed on a top-level module. Other potential but rejected names included "stats" (too much risk of confusion with the existing "stat" module), and "statslib" (described as "too C-like").

Previous Discussions

This proposal has been previously discussed here[21].

Frequently Asked Questions

Q: Shouldn't this module spend time on PyPI before being considered for the standard library?

A: Older versions of this module have been available on PyPI[22] since 2010. Being much simpler than numpy, it does not require many years of external development.
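For concreteness, the behaviour of the two existing sums on the torture-test data from the Rationale can be demonstrated directly (a quick sketch):

```python
import math
from fractions import Fraction

data = [1e30, 1, 3, -1e30]

print(sum(data) / len(data))        # 0.0 -- built-in sum absorbs the 1 and 3
print(math.fsum(data) / len(data))  # 1.0 -- fsum tracks exact partial sums

# fsum is accurate for floats, but coerces every argument to float:
result = math.fsum([Fraction(1, 3)] * 3)
print(type(result))                 # a float, not a Fraction
```

The built-in sum fails because 1e30 + 1 rounds straight back to 1e30, so the small terms vanish before the cancellation; fsum keeps exact partial sums, but at the cost of returning a float regardless of the input type.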
Q: Does the standard library really need yet another version of ``sum``?

A: This proved to be the most controversial part of the reference implementation. In one sense, clearly three sums is two too many. But in another sense, yes. The reasons why the two existing versions are unsuitable are described here[23] but the short summary is:

- the built-in sum can lose precision with floats;

- the built-in sum accepts any non-numeric data type that supports the + operator, apart from strings and bytes;

- math.fsum is high-precision, but coerces all arguments to float.

There is some interest in "fixing" one or the other of the existing sums. If this occurs before 3.4 feature-freeze, the decision to keep statistics.sum can be re-considered.

Q: Will this module be backported to older versions of Python?

A: The module currently targets 3.3, and I will make it available on PyPI for 3.3 for the foreseeable future. Backporting to older versions of the 3.x series is likely (but not yet decided). Backporting to 2.7 is less likely but not ruled out.

Q: Is this supposed to replace numpy?

A: No. While it is likely to grow over the years (see open issues below) it is not aimed to replace, or even compete directly with, numpy. Numpy is a full-featured numeric library aimed at professionals, the nuclear reactor of numeric libraries in the Python ecosystem. This is just a battery, as in "batteries included", and is aimed at an intermediate level somewhere between "use numpy" and "roll your own version".

Open and Deferred Issues

- At this stage, I am unsure of the best API for multivariate statistical functions such as linear regression, correlation coefficient, and covariance. Possible APIs include:

  * Separate arguments for x and y data:
    function([x0, x1, ...], [y0, y1, ...])

  * A single argument for (x, y) data:
    function([(x0, y0), (x1, y1), ...])

  * Selecting arbitrary columns from a 2D array:
    function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2)

  * Some combination of the above.
In the absence of a consensus of preferred API for multivariate stats, I will defer including such multivariate functions until Python 3.5.

- Likewise, functions for calculating probability of random variables and inference testing (e.g. Student's t-test) will be deferred until 3.5.

- There is considerable interest in including one-pass functions that can calculate multiple statistics from data in iterator form, without having to convert to a list. The experimental "stats" package on PyPI includes co-routine versions of statistics functions. Including these will be deferred to 3.5.

References

[1] http://mail.python.org/pipermail/python-dev/2010-October/104721.html
[2] http://support.casio.com/pdf/004/CP330PLUSver310_Soft_E.pdf
[3] Gnumeric: https://projects.gnome.org/gnumeric/functions.shtml
    LibreOffice:
    https://help.libreoffice.org/Calc/Statistical_Functions_Part_One
    https://help.libreoffice.org/Calc/Statistical_Functions_Part_Two
    https://help.libreoffice.org/Calc/Statistical_Functions_Part_Three
    https://help.libreoffice.org/Calc/Statistical_Functions_Part_Four
    https://help.libreoffice.org/Calc/Statistical_Functions_Part_Five
[4] Scipy: http://scipy-central.org/ Numpy: http://www.numpy.org/
[5] http://wiki.scipy.org/Numpy_Functions_by_Category
[6] Tested with numpy 1.6.1 and Python 2.7.
[7] http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
[8] http://rosettacode.org/wiki/Standard_deviation
[9] https://bitbucket.org/larsyencken/simplestats/src/c42e048a6625/src/basic.py
[10] http://stackoverflow.com/questions/2341340/calculate-mean-and-variance-with-one-iteration
[11] http://www.r-project.org/
[12] http://msdn.microsoft.com/en-us/library/system.linq.enumerable.average.aspx
[13] https://www.bcg.wisc.edu/webteam/support/ruby/standard_deviation
[14] http://ruby-statsample.rubyforge.org/
[15] http://www.php.net/manual/en/ref.stats.php
[16] http://www.ayton.id.au/gary/it/Delphi/D_maths.htm#Delphi%20Statistical%20functions.
[17] http://www.gnu.org/software/gsl/manual/html_node/Statistics.html
[18] http://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html
[19] http://mathworld.wolfram.com/Skewness.html
[20] At least, tedious to those who don't like this sort of thing.
[21] http://mail.python.org/pipermail/python-ideas/2011-September/011524.html
[22] https://pypi.python.org/pypi/stats/
[23] http://mail.python.org/pipermail/python-ideas/2013-August/022630.html

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

From storchaka at gmail.com Thu Aug 8 16:41:59 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 08 Aug 2013 17:41:59 +0300 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: <351d2676-e0b9-485e-9ff8-b64341f778fe@email.android.com> References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <351d2676-e0b9-485e-9ff8-b64341f778fe@email.android.com> Message-ID: 08.08.13 15:44, Clay Sweetser wrote:

> Not necessarily... The one-line solution that the function in issue 18652 proposes is not one that people might immediately come up with, or solve quite so elegantly. Not that I'm totally against it just being added to the recipes section, but I feel that the recipes section is too often overlooked to be much help (Might there be a way to remedy that?).

I think some simple functions in the itertools recipes (such as tabulate() or quantify()) are not even worth defining as a separate function. Just inline the body (which is just a combination of two functions) and the reader will not need to look in the documentation or look up the definition of an unknown function.
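To illustrate that point with one of the recipes named above: quantify() is a single expression, so inlining it costs almost nothing (a sketch):

```python
def quantify(iterable, pred=bool):
    "Count how many times the predicate is true (the itertools recipe)."
    return sum(map(pred, iterable))

values = [1, 2, 3, 4, 5]

# Using the recipe as a named helper...
via_recipe = quantify(values, lambda x: x % 2 == 0)

# ...versus simply inlining its one-line body at the call site:
inlined = sum(map(lambda x: x % 2 == 0, values))

print(via_recipe, inlined)  # 2 2
```

The inlined form is self-explanatory at the call site, which is the argument for not shipping such one-liners as named functions.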
From guido at python.org Thu Aug 8 16:58:07 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 8 Aug 2013 07:58:07 -0700 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <5203AB11.3010502@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> Message-ID: --Guido van Rossum (sent from Android phone) On Aug 8, 2013 7:29 AM, "Steven D'Aprano" wrote: > - one-pass algorithm for variance is deferred until Python 3.5, until then, lists are the preferred data structure (iterators will continue to work, provided they are small enough to be converted to lists). How do you plan to distinguish between lists and iterators? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Thu Aug 8 16:51:15 2013 From: rymg19 at gmail.com (Ryan) Date: Thu, 08 Aug 2013 09:51:15 -0500 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <5203AB11.3010502@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> Message-ID: <076d1960-2bc0-4a4b-8684-4f0ad1df42c7@email.android.com> Looks great to me! Just remember to put a big warning in the docs about the list/iterator problem. Steven D'Aprano wrote: >Attached is the second draft of the pre-PEP for adding a statistics >module to Python. A brief summary of the most important changes: > >- it's a top-level module called "statistics", not "math.stats"; > >- statistics.sum stays (although that's subject to revision, if >builtin.sum or math.fsum are "fixed" before 3.4 feature-freeze); > >- one-pass algorithm for variance is deferred until Python 3.5, until >then, lists are the preferred data structure (iterators will continue >to work, provided they are small enough to be converted to lists). > >Last chance to object or suggest changes before I submit this to >Python-Dev. > >Thanks again to everyone who provided feedback. 
> --
> Steven
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Aug 8 17:23:25 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 09 Aug 2013 01:23:25 +1000 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> Message-ID: <5203B7ED.5020002@pearwood.info> On 09/08/13 00:58, Guido van Rossum wrote:

> --Guido van Rossum (sent from Android phone)
> On Aug 8, 2013 7:29 AM, "Steven D'Aprano" wrote:
>> - one-pass algorithm for variance is deferred until Python 3.5, until
>> then, lists are the preferred data structure (iterators will continue to
>> work, provided they are small enough to be converted to lists).
>
> How do you plan to distinguish between lists and iterators?

I haven't yet decided between these two snippets:

# option 1
if data is iter(data):
    data = list(data)

# option 2
if not isinstance(data, collections.Sequence):
    data = list(data)

although I'm currently leaning towards option 1.

-- Steven From guido at python.org Thu Aug 8 17:46:00 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 8 Aug 2013 08:46:00 -0700 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <5203B7ED.5020002@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <5203B7ED.5020002@pearwood.info> Message-ID: On Thu, Aug 8, 2013 at 8:23 AM, Steven D'Aprano wrote:

> On 09/08/13 00:58, Guido van Rossum wrote:
>> How do you plan to distinguish between lists and iterators?
> I haven't yet decided between these two snippets:
>
> # option 1
> if data is iter(data):
>     data = list(data)
>
> # option 2
> if not isinstance(data, collections.Sequence):
>     data = list(data)
>
> although I'm currently leaning towards option 1.

#1 sounds fine. -- --Guido van Rossum (python.org/~guido) From oscar.j.benjamin at gmail.com Thu Aug 8 17:48:12 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 8 Aug 2013 16:48:12 +0100 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <5203AB11.3010502@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> Message-ID: On 8 August 2013 15:28, Steven D'Aprano wrote:

> Attached is the second draft of the pre-PEP for adding a statistics module
> to Python. A brief summary of the most important changes:

It all looks good to me.

About this part in the PEP though:

'''
Open and Deferred Issues

- At this stage, I am unsure of the best API for multivariate statistical
  functions such as linear regression, correlation coefficient, and
  covariance. Possible APIs include:

  * Separate arguments for x and y data:
    function([x0, x1, ...], [y0, y1, ...])

  * A single argument for (x, y) data:
    function([(x0, y0), (x1, y1), ...])

  * Selecting arbitrary columns from a 2D array:
    function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2)

  * Some combination of the above.

In the absence of a consensus of preferred API for multivariate stats,
I will defer including such multivariate functions until Python 3.5.
'''

I don't think there's been any discussion about this so there's no lack of consensus. Or would you just prefer to defer it for now? I'm just going to say that it basically doesn't matter which of the first two options you go for; the third one with the 2D array and indices is an unnecessary complication.
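Concretely, the first two candidate forms are trivially interconvertible with zip(), which is one reason the choice between them matters so little (a quick sketch):

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Separate-arguments form -> pairs form:
pairs = list(zip(xs, ys))   # [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Pairs form -> separate-arguments form (zip(*...) transposes):
xs2, ys2 = (list(col) for col in zip(*pairs))

print(xs2 == xs and ys2 == ys)  # True
```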
Whichever form you used there would always be situations where the data would need to be transposed because it is in one or other of the forms. Numpy actually provides both forms and a transposed variant with a ``rowvar`` argument e.g.:

>>> help(numpy.corrcoef)
Help on function corrcoef in module numpy.lib.function_base:

corrcoef(x, y=None, rowvar=1, bias=0, ddof=None)
    Return correlation coefficients.
    ...
    Parameters
    ----------
    x : array_like
        A 1-D or 2-D array containing multiple variables and observations.
        Each row of `m` represents a variable, and each column a single
        observation of all those variables. Also see `rowvar` below.
    y : array_like, optional
        An additional set of variables and observations. `y` has the same
        shape as `m`.
    rowvar : int, optional
        If `rowvar` is non-zero (default), then each row represents a
        variable, with observations in the columns. Otherwise, the
        relationship is transposed: each column represents a variable,
        while the rows contain observations.

To me that seems like a bit of a mess (particularly since numpy arrays are so easily transposed). The more important question would be whether you intend to compute covariance/correlation matrices rather than just individual pairwise values. If the intention is to compute individual values then I would say just keep the API clean and simple with:

>>> correlation(xdata, ydata)
0.7812312312

A signature like that hardly needs an explanation. Oscar From python at mrabarnett.plus.com Thu Aug 8 18:36:40 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 08 Aug 2013 17:36:40 +0100 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> Message-ID: <5203C918.2040702@mrabarnett.plus.com> On 08/08/2013 16:48, Oscar Benjamin wrote:

> On 8 August 2013 15:28, Steven D'Aprano wrote:
>> Attached is the second draft of the pre-PEP for adding a statistics module
>> to Python.
>> A brief summary of the most important changes:
>
> It all looks good to me.
>
> About this part in the PEP though:
> '''
> Open and Deferred Issues
>
> - At this stage, I am unsure of the best API for multivariate statistical
>   functions such as linear regression, correlation coefficient, and
>   covariance. Possible APIs include:
>
>   * Separate arguments for x and y data:
>     function([x0, x1, ...], [y0, y1, ...])
>
>   * A single argument for (x, y) data:
>     function([(x0, y0), (x1, y1), ...])
>
>   * Selecting arbitrary columns from a 2D array:
>     function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2)
>
>   * Some combination of the above.
>
> In the absence of a consensus of preferred API for multivariate stats,
> I will defer including such multivariate functions until Python 3.5.
> '''

I tend to prefer the second form. If your data is in the form of a pair of lists, then it's easy enough to zip them anyway.

> I don't think there's been any discussion about this so there's no
> lack of consensus. Or would you just prefer to defer it for now?
>
> I'm just going to say that it basically doesn't matter which of the
> first two options you go for; the third one with the 2D array and
> indices is an unnecessary complication.

+1. It isn't trying to be numpy. From rymg19 at gmail.com Thu Aug 8 18:38:33 2013 From: rymg19 at gmail.com (Ryan) Date: Thu, 08 Aug 2013 11:38:33 -0500 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <5203B7ED.5020002@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <5203B7ED.5020002@pearwood.info> Message-ID: I believe you can also shorten option 1 to:

data = list(data) if data is iter(data) else data

And, wouldn't option 2 change any data structure that isn't a list? Would that be a problem or a benefit?
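For background on why the `data is iter(data)` test works: iterators are required to return themselves from __iter__, while containers hand out a fresh iterator on every call. A quick sketch (`ensure_list` is a hypothetical helper name, not from the thread):

```python
data_list = [1, 2, 3]
data_iter = iter(data_list)

print(iter(data_iter) is data_iter)  # True  -- iterators return themselves
print(iter(data_list) is data_list)  # False -- lists hand out new iterators

def ensure_list(data):
    # Materialize genuine iterators; leave lists and other containers alone.
    return list(data) if iter(data) is data else data

print(ensure_list(iter(range(3))))   # [0, 1, 2]
print(ensure_list((1, 2, 3)))        # (1, 2, 3) -- tuple passes through
```

So option 1 only converts single-shot iterators, whereas option 2 would also copy tuples, ranges and any other non-Sequence container.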
Steven D'Aprano wrote: >On 09/08/13 00:58, Guido van Rossum wrote: >> --Guido van Rossum (sent from Android phone) >> On Aug 8, 2013 7:29 AM, "Steven D'Aprano" >wrote: >>> - one-pass algorithm for variance is deferred until Python 3.5, >until >> then, lists are the preferred data structure (iterators will continue >to >> work, provided they are small enough to be converted to lists). >> >> How do you plan to distinguish between lists and iterators? > > >I haven't yet decided between these two snippets: > ># option 1 >if data is iter(data): > data = list(data) > > ># option 2 >if not isinstance(data, collections.Sequence): > data = list(data) > > >although I'm currently leaning towards option 1. > > > >-- >Steven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Aug 8 18:50:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 8 Aug 2013 09:50:08 -0700 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: <20130808154706.458d80a0@pitrou.net> References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> Message-ID: <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> On Aug 8, 2013, at 6:47, Antoine Pitrou wrote: > Le Thu, 8 Aug 2013 15:28:15 +0200, > Masklinn a ?crit : >> On 2013-08-08, at 06:01 , Stefan Behnel wrote: >> >>> MRAB, 07.08.2013 22:12: >>>> And are these functions worthy of inclusion in itertools? :-) >>> >>> The fact that they are a short construction of the existing tools >>> indicates that they are better suited for the recipes section than >>> the itertools functions section of the module docs. There is >>> already a roundrobin() recipe. 
>> >> I'm really not fond at all of the "recipes" section, at least under >> its current incarnation: it's nice to have examples of composing >> itertools functions to build more specialized or higher-level tools, >> but recipes functions should be available in itertools or a >> submodule: the work has been done already, it's wasteful and annoying >> to have to copy/paste these functions to some arbitrary location >> every time they are needed, that's what libraries are for. > > I agree with this, plus there's always the risk of making a mistake > when pasting them, and unit tests are not included. And you have to either "from itertools import *" or edit the recipes after pasting. I agree that at least some of them belong in the actual module. Sure, a few are so trivial that after the first copy/paste/edit you'll be able to write them yourself thereafter. And others are rarely useful. But grouper, which is the one-word answer to dozens of StackOverflow questions per year, or unique_everseen, which people regularly put effort into trying every which way to optimize before realizing the recipe already did it, etc., those should be in the module. Of course having the source in the docs does serve a teaching purpose (although with no explanations, it's not always as much of a purpose as you'd like--I can't think of anyone who's ever seen grouper who couldn't have written it himself but who can understand it without a long explanation). But that's not a problem. There's no reason the source to selected functions can't be in the docs even though it's also in the module. While we could debate about which ones are useful, and which are either too trivial or too rare, how much harm is there in just adding all of them? 
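grouper, cited above as the one-word StackOverflow answer, is a good illustration: the canonical recipe is three lines, yet the repeated-iterator trick at its heart is exactly what non-experts get subtly wrong when reconstructing it. Essentially as it appears in the itertools recipes section:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks.

    grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    """
    # The *same* iterator repeated n times: each output tuple pulls n
    # successive items because every column advances the one iterator.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```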
From abarnert at yahoo.com Thu Aug 8 18:54:44 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 8 Aug 2013 09:54:44 -0700 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <5203B7ED.5020002@pearwood.info> Message-ID: On Aug 8, 2013, at 9:38, Ryan wrote: > I believe you can also shorten option 1 to: > > data = list(data) if data is iter(data) else data Sure, but why? > > And, wouldn't option 2 change any data structure that isn't a list? Would that be problem or a benefit? No, tuple, many third-party types, etc. are Sequences without being lists. It would definitely change _some_ data structures into lists, with an obvious cost and no real benefit. (Unless we're planning a future C-accelerates version that does fast indexing.) But of the ones you'd be most likely to pass in, it will do the right thing (listify most iterators, leave tuples alone, etc.). > > Steven D'Aprano wrote: >> >> On 09/08/13 00:58, Guido van Rossum wrote: >>> --Guido van Rossum (sent from Android phone) >>> On Aug 8, 2013 7:29 AM, "Steven D'Aprano" wrote: >>>> - one-pass algorithm for variance is deferred until Python 3.5, until >>> then, lists are the preferred data structure (iterators will continue to >>> work, provided they are small enough to be converted to lists). >>> >>> How do you plan to distinguish between lists and iterators? >> >> >> I haven't yet decided between these two snippets: >> >> # option 1 >> if data is iter(data): >> data = list(data) >> >> >> # option 2 >> if not isinstance(data, collections.Sequence): >> data = list(data) >> >> >> although I'm currently leaning towards option 1. > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Thu Aug 8 19:42:43 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 8 Aug 2013 10:42:43 -0700 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> Message-ID: I agree that most of the recipes for clever combinations of itertools functions--at least those that have intuitive and obvious names--should simply be put into the module itself (and per Andrew, they can nonetheless by documented as recipes). Even if these implementations can be done as one-liners that stitch together a few existing functions, that's not really any reason not to include them in the namespace. There are too many ways to do those one-liners slightly wrong, and no need to add to the cognitive burden of someone who just wants to, e.g. interleave. On Thu, Aug 8, 2013 at 9:50 AM, Andrew Barnert wrote: > On Aug 8, 2013, at 6:47, Antoine Pitrou wrote: > > > Le Thu, 8 Aug 2013 15:28:15 +0200, > > Masklinn a ?crit : > >> On 2013-08-08, at 06:01 , Stefan Behnel wrote: > >> > >>> MRAB, 07.08.2013 22:12: > >>>> And are these functions worthy of inclusion in itertools? :-) > >>> > >>> The fact that they are a short construction of the existing tools > >>> indicates that they are better suited for the recipes section than > >>> the itertools functions section of the module docs. There is > >>> already a roundrobin() recipe. 
> >> > >> I'm really not fond at all of the "recipes" section, at least under > >> its current incarnation: it's nice to have examples of composing > >> itertools functions to build more specialized or higher-level tools, > >> but recipes functions should be available in itertools or a > >> submodule: the work has been done already, it's wasteful and annoying > >> to have to copy/paste these functions to some arbitrary location > >> every time they are needed, that's what libraries are for. > > > > I agree with this, plus there's always the risk of making a mistake > > when pasting them, and unit tests are not included. > > And you have to either "from itertools import *" or edit the recipes after > pasting. > > I agree that at least some of them belong in the actual module. Sure, a > few are so trivial that the first time, you copy/paste/edit you'll be able > to write it yourself from thereafter. And others are rarely useful. But > grouper, which is the one-word answer to dozens of StackOverflow questions > per year, or unique_everseen, which people regularly put effort into trying > every which way to optimize before realizing the recipe already did it, > etc., those should be in the module. > > Of course having the source in the docs does serve a teaching purpose > (although with no explanations, it's not always as much of a purpose as > you'd like--I can't think of anyone who's ever seen grouper who couldn't > have written it himself but who can understand it without a long > explanation). But that's not a problem. There's no reason the source to > selected functions can't be in the docs even though it's also in the module. > > While we could debate about which ones are useful, and which are either > too trivial or too rare, how much harm is there in just adding all of them? 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Aug 8 19:47:36 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 8 Aug 2013 10:47:36 -0700 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> Message-ID: On Thu, Aug 8, 2013 at 10:42 AM, David Mertz wrote: > I agree that most of the recipes for clever combinations of itertools > functions--at least those that have intuitive and obvious names--should > simply be put into the module itself (and per Andrew, they can nonetheless > by documented as recipes). > > Even if these implementations can be done as one-liners that stitch together > a few existing functions, that's not really any reason not to include them > in the namespace. There are too many ways to do those one-liners slightly > wrong, and no need to add to the cognitive burden of someone who just wants > to, e.g. interleave. I'm not sure if I completely buy this line of argument. Coming up with good names for all of them will become an issue, and people who can easily reconstruct the recipes from first principles won't necessarily be using them -- they may even have to look up the recipe by name before they understand what is going on. (Then again, feel free to ignore me. I rarely use the itertools module anyway.) 
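Guido's point about reconstruction cuts both ways: the existing roundrobin() recipe -- the behaviour "interleave" asks for -- is short but not something most users would rederive correctly. Essentially as it appears in the itertools recipes section, with the shadowed `next` name renamed and a comment added:

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    """roundrobin('ABC', 'D', 'EF') --> A D E B F C

    Take one item from each iterable in turn, dropping iterables
    as they are exhausted.
    """
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_func in nexts:
                yield next_func()
        except StopIteration:
            # One input ran dry: rebuild the cycle without it.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

print(''.join(roundrobin('ABC', 'D', 'EF')))   # ADEBFC
```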
-- --Guido van Rossum (python.org/~guido) From storchaka at gmail.com Thu Aug 8 20:32:16 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 08 Aug 2013 21:32:16 +0300 Subject: [Python-ideas] Deprecating rarely used str methods Message-ID: The str class have some very rarely used methods. When was the last time you used swapcase() (not counting the time of apprenticeship)? These methods take place in a source code, documentation and a memory of developers. Due to the large number of methods, some of the really necessary methods can be neglected. I propose to deprecate rarely used methods (especially that for most of them there is another, more obvious way) and remove them in 4.0. s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) s.center(width) == '{:^{}}'.format(s, width) s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width,n) str.swapcase() is just not needed. str.expandtabs([tabsize]) is rarely used and can be moved to the textwrap module. From tim.peters at gmail.com Thu Aug 8 20:48:23 2013 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 8 Aug 2013 13:48:23 -0500 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> Message-ID: [David Mertz] >> I agree that most of the recipes for clever combinations of itertools >> functions--at least those that have intuitive and obvious names--should >> simply be put into the module itself (and per Andrew, they can nonetheless >> by documented as recipes). 
>> >> Even if these implementations can be done as one-liners that stitch together >> a few existing functions, that's not really any reason not to include them >> in the namespace. There are too many ways to do those one-liners slightly >> wrong, and no need to add to the cognitive burden of someone who just wants >> to, e.g. interleave. [Guido] > I'm not sure if I completely buy this line of argument. Coming up with > good names for all of them will become an issue, and people who can > easily reconstruct the recipes from first principles won't necessarily > be using them -- they may even have to look up the recipe by name > before they understand what is going on. > > (Then again, feel free to ignore me. I rarely use the itertools module anyway.) That's the thing: not everyone who could benefit from these "trivial" functions has an itertools view of the world; I'd wager that _most_ people are not among those "who can easily reconstruct the recipes from first principles". Canning simple operations is primarily for their benefit, not for itertools experts. For example, there's a long enhancement request about adding a "first_true" function, returning the first item in its iterable argument that evaluates to a true value (with an optional `pred` argument to define truthfulness, and an optional default value if no iterate "is true"): http://bugs.python.org/issue18652 It's an easy one-liner to an itertools expert, but the history in the bug report shows that the author has seen wide adoption of his package supplying this function. The typical non-expert codes a loop, and typically gets some end cases wrong (see the bug report for embarrassing examples - LOL ;-) ). To the non-expert, which itertools-like functionalities are "core" - which should be derived from which others - is a mystery. So I'm with David on throwing them all in, "at least those that have intuitive and obvious names". 
Let a thousand redundancies bloom ;-) From python at mrabarnett.plus.com Thu Aug 8 20:58:34 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 08 Aug 2013 19:58:34 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: <5203EA5A.4060405@mrabarnett.plus.com> On 08/08/2013 19:32, Serhiy Storchaka wrote: > The str class have some very rarely used methods. When was the last time > you used swapcase() (not counting the time of apprenticeship)? These > methods take place in a source code, documentation and a memory of > developers. Due to the large number of methods, some of the really > necessary methods can be neglected. I propose to deprecate rarely used > methods (especially that for most of them there is another, more obvious > way) and remove them in 4.0. > > s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) > s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) > s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) > s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) > s.center(width) == '{:^{}}'.format(s, width) > s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) > You could apply the same kind of reasoning that's used with regex: why use .format when .ljust, etc, is shorter and faster? > str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width,n) > > str.swapcase() is just not needed. > I've never used .swapcase, and I doubt I ever will. > str.expandtabs([tabsize]) is rarely used and can be moved to the > textwrap module. > I use it to ensure that indentation is always 4 spaces and never tabs in the regex release in case I've missed a setting in an editor somewhere, so it _is_ an important method! :-) From mal at egenix.com Thu Aug 8 21:16:27 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 08 Aug 2013 21:16:27 +0200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: <5203EE8B.2020205@egenix.com> On 08.08.2013 20:32, Serhiy Storchaka wrote: > The str class have some very rarely used methods. When was the last time you used swapcase() (not > counting the time of apprenticeship)? These methods take place in a source code, documentation and a > memory of developers. Due to the large number of methods, some of the really necessary methods can > be neglected. I propose to deprecate rarely used methods (especially that for most of them there is > another, more obvious way) and remove them in 4.0. > > s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) > s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) > s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) > s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) > s.center(width) == '{:^{}}'.format(s, width) > s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) > > str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width,n) I don't think any of the .format() alternatives you listed is anywhere near as obvious as the methods :-) > str.swapcase() is just not needed. If you have you ever typed on a keyboard with shift lock enabled, you'd know what it's needed for ;-) Seriously, that one could get deprecated, provided that a helper function with the same functionality is provided elsewhere. > str.expandtabs([tabsize]) is rarely used and can be moved to the textwrap module. TABs still exist in lots of files, so I'd rather not get rid of this method. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 08 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From joshua at landau.ws Thu Aug 8 21:17:16 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 8 Aug 2013 20:17:16 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EA5A.4060405@mrabarnett.plus.com> References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: On 8 August 2013 19:58, MRAB wrote: > On 08/08/2013 19:32, Serhiy Storchaka wrote: >> >> The str class have some very rarely used methods. When was the last time >> you used swapcase() (not counting the time of apprenticeship)? These >> methods take place in a source code, documentation and a memory of >> developers. Due to the large number of methods, some of the really >> necessary methods can be neglected. I propose to deprecate rarely used >> methods (especially that for most of them there is another, more obvious >> way) and remove them in 4.0. Planning all the way to 4.0? >> s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) >> s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) >> s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) >> s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) >> s.center(width) == '{:^{}}'.format(s, width) >> s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) >> > You could apply the same kind of reasoning that's used with regex: why > use .format when .ljust, etc, is shorter and faster? Because .format (or modulo) is the standard way to format? Honestly when first scanning this post I had no idea what "ljust" was until I saw that equivalents were given. 
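For readers like Joshua meeting these methods for the first time, the equivalences in Serhiy's table are easy to verify directly; this snippet just exercises them, using no API beyond what both sides of the thread already cite:

```python
s, width, fill = 'py', 6, '.'

assert s.ljust(width)        == '{:<{}}'.format(s, width)        == '%-*s' % (width, s)
assert s.ljust(width, fill)  == '{:{}<{}}'.format(s, fill, width)
assert s.rjust(width)        == '{:>{}}'.format(s, width)        == '%*s' % (width, s)
assert s.rjust(width, fill)  == '{:{}>{}}'.format(s, fill, width)
assert s.center(width)       == '{:^{}}'.format(s, width)
assert s.center(width, fill) == '{:{}^{}}'.format(s, fill, width)

n = 42
assert str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width, n)

print(s.ljust(width, fill), s.rjust(width, fill), s.center(width, fill))
# py.... ....py ..py..
```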
>> str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width,n) >> >> str.swapcase() is just not needed. >> > I've never used .swapcase, and I doubt I ever will. Same >> str.expandtabs([tabsize]) is rarely used and can be moved to the >> textwrap module. >> > I use it to ensure that indentation is always 4 spaces and never tabs > in the regex release in case I've missed a setting in an editor > somewhere, so it _is_ an important method! :-) Would you be fine with it in textwrap? I don't use this (but I would use a .reindent method...). From mal at egenix.com Thu Aug 8 21:19:58 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 08 Aug 2013 21:19:58 +0200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EE8B.2020205@egenix.com> References: <5203EE8B.2020205@egenix.com> Message-ID: <5203EF5E.2020301@egenix.com> On 08.08.2013 21:16, M.-A. Lemburg wrote: > On 08.08.2013 20:32, Serhiy Storchaka wrote: >> str.expandtabs([tabsize]) is rarely used and can be moved to the textwrap module. > > TABs still exist in lots of files, so I'd rather not get rid > of this method. BTW: If more and more text formatting functions get moved to textwrap then the module should be renamed to textformat. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 08 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From storchaka at gmail.com Thu Aug 8 21:36:40 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 08 Aug 2013 22:36:40 +0300 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EA5A.4060405@mrabarnett.plus.com> References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: On 08.08.13 21:58, MRAB wrote: > On 08/08/2013 19:32, Serhiy Storchaka wrote: >> s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) >> s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) >> s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) >> s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) >> s.center(width) == '{:^{}}'.format(s, width) >> s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) >> > You could apply the same kind of reasoning that's used with regex: why > use .format when .ljust, etc, is shorter and faster? Because in the common case the result of ljust/rjust is concatenated with other strings for output. Instead of print('|' + name.ljust(20) + '|' + str(value).rjust(8) + '|') you can write just print('|%-20s|%8s|' % (name, value)) or print('|{:<20}|{:>8}|'.format(name, value)) which are shorter and cleaner. >> str.expandtabs([tabsize]) is rarely used and can be moved to the >> textwrap module. >> > I use it to ensure that indentation is always 4 spaces and never tabs > in the regex release in case I've missed a setting in an editor > somewhere, so it _is_ an important method! :-) Yes, I am not proposing to remove it entirely. But this method is not used in most applications, in contrast to such popular methods as split() or strip(). It looks like one of the multiline formatting methods from the textwrap module. 
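Serhiy's readability claim is at least verifiable: the three spellings he compares really do build identical rows, so the disagreement is purely about which one reads better:

```python
name, value = 'variance', 4.5

# The three spellings from Serhiy's example, side by side.
by_methods = '|' + name.ljust(20) + '|' + str(value).rjust(8) + '|'
by_percent = '|%-20s|%8s|' % (name, value)
by_format  = '|{:<20}|{:>8}|'.format(name, value)

assert by_methods == by_percent == by_format
print(by_format)
# |variance            |     4.5|
```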
From mertz at gnosis.cx Thu Aug 8 21:48:21 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 8 Aug 2013 12:48:21 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: -1 on all the parts. I've used Python long enough to remember being thankful that many of the functions in the 'string' module were duplicated as str methods. It's generally convenient to have them there. I can't actually argue that I use str.swapcase() much (maybe ever), but all the rest are handy and things I actually use. Yes, you can technically get the results out of str.format(), but the format mini-language is rather difficult and arcane and I can't really use it without the documentation open in another window. On Thu, Aug 8, 2013 at 11:32 AM, Serhiy Storchaka wrote: > The str class have some very rarely used methods. When was the last time > you used swapcase() (not counting the time of apprenticeship)? These > methods take place in a source code, documentation and a memory of > developers. Due to the large number of methods, some of the really > necessary methods can be neglected. I propose to deprecate rarely used > methods (especially that for most of them there is another, more obvious > way) and remove them in 4.0. > > s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) > s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) > s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) > s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) > s.center(width) == '{:^{}}'.format(s, width) > s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) > > str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width,n) > > str.swapcase() is just not needed. > > str.expandtabs([tabsize]) is rarely used and can be moved to the textwrap > module. 
> > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Aug 8 22:23:33 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 8 Aug 2013 14:23:33 -0600 Subject: [Python-ideas] Make test.test_support an alias to test.support In-Reply-To: References: Message-ID: On Thu, Aug 8, 2013 at 8:05 AM, Serhiy Storchaka wrote: > When backporting tests to 2.7 one of changes which you should do is change > "support" to "test_support". Sometimes this is only a change. > > I propose rename the test.test_support module to test.support (this will > simplify backporting patches which changes the test.support module) and > make test.test_support an alias to test.support. I.e. > Lib/test/test_support.py should contains. Just to be explicit about it, you propose making the change in 2.7, right? -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu Aug 8 23:34:37 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 08 Aug 2013 17:34:37 -0400 Subject: [Python-ideas] Make test.test_support an alias to test.support In-Reply-To: References: Message-ID: On 8/8/2013 10:05 AM, Serhiy Storchaka wrote: > When backporting tests to 2.7 one of changes which you should do is > change "support" to "test_support". Sometimes this is only a change. You mean, 'the only change'. Yes, very annoying when it is the only change. 
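The proposed test_support.py body is just a star re-export, and its effect can be sketched without touching the real test package. All module names below are stand-ins for illustration, not the actual test.support module:

```python
import sys
import types

# A stand-in for the renamed module (the real proposal concerns
# test.support; these flat names are just for illustration).
support = types.ModuleType('support')
support.__all__ = ['verbose', 'run_unittest']
support.verbose = False
support.run_unittest = lambda *test_cases: None
support._helper = 'private, not re-exported'
sys.modules['support'] = support

# The entire body of the proposed alias module would be:
#     from support import *
#     from support import __all__
test_support = types.ModuleType('test_support')
exec("from support import *\nfrom support import __all__",
     test_support.__dict__)
sys.modules['test_support'] = test_support

# Both import paths now resolve to the very same objects, and
# names outside __all__ are not re-exported.
print(test_support.run_unittest is support.run_unittest)  # True
print(hasattr(test_support, '_helper'))                   # False
```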
> I propose rename the test.test_support module to test.support (this will > simplify backporting patches which changes the test.support module) and > make test.test_support an alias to test.support. I.e. > Lib/test/test_support.py should contains: > > from test.support import * > from test.support import __all__ While I strongly feel we should have done this, and some other aliases (like Tkinter as alias for tkinter, etc, and dump 'lib-tk') in 2.7.0, it seems too late for these changes now. The problem, as with any new 2.7 feature, is that if 'support' were added in 2.7.6, test code that depends on 'support' would not run on 2.7.5-. While the contents of /test are 'internal use only', it seems to me that being able to run the test suite is a documented, public feature. However, we could say that one should only run the 2.7.z test suite with the 2.7.z interpreter and stdlib. If we do that, I would like to change the illegal name 'lib-tk' to '_libtk' and change the tk/ttk tests accordingly. If 'can change or be removed without notice between releases of Python.' refers to bug-fix releases as well as version releases, I might propose a few more changes, or rather, to also backport proposed changes for 3.4 to 2.7. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Fri Aug 9 00:09:23 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 09 Aug 2013 10:09:23 +1200 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> Message-ID: <52041713.5040405@canterbury.ac.nz> > On Aug 8, 2013 7:29 AM, "Steven D'Aprano" > wrote: > > - one-pass algorithm for variance is deferred until Python 3.5, until > then, lists are the preferred data structure (iterators will continue to > work, provided they are small enough to be converted to lists). Does this multi-pass algorithm being talked about use a predetermined number of passes? 
If so, then surely any reiterable object will do, not necessarily a list. -- Greg From greg.ewing at canterbury.ac.nz Fri Aug 9 00:21:59 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 09 Aug 2013 10:21:59 +1200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EE8B.2020205@egenix.com> References: <5203EE8B.2020205@egenix.com> Message-ID: <52041A07.3090000@canterbury.ac.nz> M.-A. Lemburg wrote: > If you have you ever typed on a keyboard with shift lock enabled, ...on a Windows or Linux PC. If you're using a Mac, no algorithm will rescue you from that blunder. :-) -- Greg From python at mrabarnett.plus.com Fri Aug 9 00:58:13 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 08 Aug 2013 23:58:13 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <52041A07.3090000@canterbury.ac.nz> References: <5203EE8B.2020205@egenix.com> <52041A07.3090000@canterbury.ac.nz> Message-ID: <52042285.9050808@mrabarnett.plus.com> On 08/08/2013 23:21, Greg Ewing wrote: > M.-A. Lemburg wrote: >> If you have you ever typed on a keyboard with shift lock enabled, > > ...on a Windows or Linux PC. If you're using a Mac, > no algorithm will rescue you from that blunder. :-) > If you're using a Mac, 30% of the keystrokes will disappear off to Apple. :-) From alexander.belopolsky at gmail.com Fri Aug 9 01:17:41 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 8 Aug 2013 19:17:41 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EA5A.4060405@mrabarnett.plus.com> References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: On Thu, Aug 8, 2013 at 2:58 PM, MRAB wrote: > On 08/08/2013 19:32, Serhiy Storchaka wrote: > >> .. > > str.swapcase() is just not needed. >> >> I've never used .swapcase, and I doubt I ever will. This group need an FAQ. Any volunteers to create a "Frequently Rejected Ideas" list? 
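Greg's point becomes concrete with a two-pass sample variance: the number of passes is fixed at two, so any re-iterable (list, tuple, range) works unchanged, and only one-shot iterators need the materializing guard from earlier in the thread. A hypothetical sketch, not the pre-PEP's actual implementation:

```python
def variance(data):
    """Sample variance via the classic two-pass algorithm."""
    if data is iter(data):       # one-shot iterator: materialize it first
        data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    mean = sum(data) / n                                   # pass 1
    return sum((x - mean) ** 2 for x in data) / (n - 1)    # pass 2

print(variance([2, 4, 4, 4, 5, 5, 7, 9]))        # 4.571428571428571
print(variance(range(1, 5)))                     # re-iterable: no copy needed
print(variance(iter([2, 4, 4, 4, 5, 5, 7, 9])))  # caught by the guard
```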
http://mail.python.org/pipermail/python-dev/2011-September/113488.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Aug 9 03:45:38 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 09 Aug 2013 11:45:38 +1000 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: <520449C2.2020705@pearwood.info> On 09/08/13 04:32, Serhiy Storchaka wrote: > The str class have some very rarely used methods. When was the last time you used swapcase() (not counting the time of apprenticeship)? These methods take place in a source code, documentation and a memory of developers. Due to the large number of methods, some of the really necessary methods can be neglected. I propose to deprecate rarely used methods (especially that for most of them there is another, more obvious way) and remove them in 4.0. -1 It's true that, if swapcase were proposed today, it probably wouldn't be accepted. And maybe it should have been removed in Python 3. But it wasn't. Deprecating working code just because (in the opinion of some people) it's not useful enough causes code churn for no real benefit. Somebody out there is using methods ljust, rjust, center, zfill (I use those last two), yes, even swapcase -- somebody is using it, somewhere. You will break their code for very little benefit. Deprecation is only useful when there is a concrete plan to remove the feature. Python 4000 is so far in the future that it's more of a dream than an actual concrete planned release (this is why I don't refer to it as Python 4). There may not even be a Python 4000. Who knows? For the next ten, or fifteen, or fifty, years, there will be features in CPython flagged as deprecated, but still working, which means that as other implementations become Python 3 compatible they will be tempted to remove these features early.
That's a bad thing, since then different implementations of Python running the same version will not be able to run the same code. The chaos and pain this will cause outweighs by far any benefit you might get from eventually removing features. When there are concrete plans for a Python 4000, that is the time to think about backwards incompatible changes. Or, following the ten-year transition period for Python 2 -> 3, maybe the BDFL will decide that Python 4000 will *not* break backwards compatibility. I was reading somebody's blog the other day talking about how he ran FORTRAN 66 code unchanged in a modern Fortran compiler and it worked perfectly. The cost of keeping swapcase etc. is tiny. They're stable code, unlikely to need much in the way of maintenance, and the pain from removing them is likely to outweigh what benefit there is. > s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) Especially given that your alternatives are far less readable than a method call, and difficult to override in a subclass. -- Steven From steve at pearwood.info Fri Aug 9 04:02:07 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 09 Aug 2013 12:02:07 +1000 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <52041713.5040405@canterbury.ac.nz> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <52041713.5040405@canterbury.ac.nz> Message-ID: <52044D9F.2020004@pearwood.info> On 09/08/13 08:09, Greg Ewing wrote: >> On Aug 8, 2013 7:29 AM, "Steven D'Aprano" > wrote: >> > - one-pass algorithm for variance is deferred until Python 3.5, until then, lists are the preferred data structure (iterators will continue to work, provided they are small enough to be converted to lists). > > Does this multi-pass algorithm being talked about use > a predetermined number of passes? If so, then surely > any reiterable object will do, not necessarily a list. Correct. 
Any re-iterable sequence is expected to work, e.g. range objects, tuples, array.array. If you find a sequence type that doesn't work, that's a bug to be fixed. For the record, mean and mode uses a single pass; median currently sorts the data, which I guess counts as a single pass[1]; variance uses two passes. If some future version includes skew and kurtosis[2], they will require three passes. [1] Anyone want to contribute a version of median that uses Quickselect to avoid sorting? To make it faster than the built-in sort, it will probably need to be written in C. [2] But which ones? I know of at least four different definitions of each, three of which are in common use. Annoyingly, the differences are not documented. -- Steven From ron3200 at gmail.com Fri Aug 9 04:17:25 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 08 Aug 2013 21:17:25 -0500 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EF5E.2020301@egenix.com> References: <5203EE8B.2020205@egenix.com> <5203EF5E.2020301@egenix.com> Message-ID: On 08/08/2013 02:19 PM, M.-A. Lemburg wrote: > On 08.08.2013 21:16, M.-A. Lemburg wrote: >> On 08.08.2013 20:32, Serhiy Storchaka wrote: >>> str.expandtabs([tabsize]) is rarely used and can be moved to the textwrap module. >> >> TABs still exist in lots of files, so I'd rather not get rid >> of this method. > > BTW: If more and more text formatting functions get moved to textwrap > then the module should be renamed to textformat. My vote is for texttools. But bike colors aside, I've often thought the textwrap module name is too specialised. A broader meaning name could allow other text processing functions and classes to be added/moved to it. Making it possible to reduce the number of top level modules in future versions. Cheers, Ron From stephen at xemacs.org Fri Aug 9 04:39:31 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 09 Aug 2013 11:31:31 +0900 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <52044D9F.2020004@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <52041713.5040405@canterbury.ac.nz> <52044D9F.2020004@pearwood.info> Message-ID: <8738qjlh98.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > [2] But which ones? I know of at least four different definitions > of each [of skew and kurtosis], three of which are in common > use. Annoyingly, the differences are not documented. Do you mean "definitions" or "implementations"? From steve at pearwood.info Fri Aug 9 08:43:23 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 09 Aug 2013 16:43:23 +1000 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <8738qjlh98.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <52041713.5040405@canterbury.ac.nz> <52044D9F.2020004@pearwood.info> <8738qjlh98.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52048F8B.4060600@pearwood.info> On 09/08/13 12:39, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > [2] But which ones? I know of at least four different definitions > > of each [of skew and kurtosis], three of which are in common > > use. Annoyingly, the differences are not documented. > > Do you mean "definitions" or "implementations"? Yes :-) Or possibly both. Skew and kurtosis are especially egregious examples of mathematicians being inconsistent in their terminology, notation and definitions, but as far as I have been able to determine, there are three common formulae for the third moment about the mean skewness: Population skewness: γ₁ = ∑((x-µ)/σ)³ ÷ n This, at least, everyone agrees on. Biased sample skewness uses the same formula for γ₁, substituting the sample mean and sample standard deviation for population mean µ and standard deviation σ: g₁ = ∑((x-a)/sₙ)³ ÷ n = √n ∑(x-a)³ / (∑(x-a)²)**(3/2) Note that's the *uncorrected* (n degrees of freedom) version of sample standard deviation, not the n-1 version. There are at least three bias-corrected formulae for sample skewness. This is the version of skewness used by SAS, SPSS, Excel, LibreOffice: G₁ = √(n(n-1)) ÷ (n-2) × g₁ And this is the version of skewness used by MINITAB: b₁ = ((n-1)/n)**(3/2) × g₁ There's a third I read about in a paper, but it doesn't appear to have been used anywhere. The paper's authors claim it has better properties than either of the above two. Annoyingly, the notation g₁, G₁ and b₁ is sometimes used interchangeably, and I recall seeing somebody using K₁ or k₁ for one of the above (but I forget which one), but as near as I can determine, the above are the most common notations. Kurtosis is much the same. There are two definitions for population kurtosis: Pearson's kurtosis, or kurtosis proper: β₂ = ∑((x-µ)/σ)⁴ ÷ n Fisher's kurtosis, or excess kurtosis: γ₂ = β₂ - 3 Sample kurtosis g₂ is just β₂ using sample mean and standard deviation. Again, there are at least three bias-corrected versions of the sample kurtosis. Here is the version used by SAS, SPSS, Excel, LibreOffice: G₂ = (n-1)(g₂(n+1) + 6) ÷ ((n-2)(n-3)) = n(n+1)/((n-1)(n-2)(n-3)) × ∑(x-a)⁴ / s⁴ - 3(n-1)²/((n-2)(n-3)) And the version from MINITAB: b₂ = ((n-1)/n)² × g₂ - 3 plus another version in the paper I mentioned above. And, like with skewness, people are inconsistent with notation, only sometimes worse if they don't distinguish between the "excess" or "proper" kurtosis. -- Steven From solipsis at pitrou.net Fri Aug 9 10:10:17 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 10:10:17 +0200 Subject: [Python-ideas] Deprecating rarely used str methods References: Message-ID: <20130809101017.1fd043f5@pitrou.net> Le Thu, 08 Aug 2013 21:32:16 +0300, Serhiy Storchaka a écrit : > The str class have some very rarely used methods.
When was the last > time you used swapcase() (not counting the time of apprenticeship)? > These methods take place in a source code, documentation and a memory > of developers. Due to the large number of methods, some of the really > necessary methods can be neglected. I propose to deprecate rarely > used methods (especially that for most of them there is another, more > obvious way) and remove them in 4.0. > > s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) > s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) > s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) > s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) > s.center(width) == '{:^{}}'.format(s, width) > s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) -1 for deprecating those 3, they are much more readable than the format equivalent (seriously, those smiley-like format specs are horrible...). > str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % > (width,n) > > str.swapcase() is just not needed. I'm ok with yanking those two. > str.expandtabs([tabsize]) is rarely used and can be moved to the > textwrap module. No opinion. Regards Antoine. From bruce at leapyear.org Fri Aug 9 10:25:25 2013 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 9 Aug 2013 01:25:25 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: On Aug 8, 2013 12:42 PM, "Serhiy Storchaka" wrote: > you can write just > > print('|%-20s|%8s|' % (name, value)) > > or > > print('|{:<20}|{:>8}|'.format(name, value)) > > which are shorter and cleaner. In what universe are those cleaner? I have to think much harder to figure out what they mean and a typo may completely mangle it w/o failing. Deprecation should be used for removing things that cause problems not for "cleaning up" things that are working fine. If it ain't broke don't fix it. 
--- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Fri Aug 9 12:07:32 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Fri, 09 Aug 2013 11:07:32 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <520449C2.2020705@pearwood.info> References: <520449C2.2020705@pearwood.info> Message-ID: On 09/08/2013 2:45am, Steven D'Aprano wrote: > It's true that, if swapcase were proposed today, it probably wouldn't be > accepted. Or it would throw an error on a string of more than one character. I think it is really a "character method" rather than a string method. -- Richard From mal at egenix.com Fri Aug 9 12:37:52 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 09 Aug 2013 12:37:52 +0200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> Message-ID: <5204C680.3040203@egenix.com> On 09.08.2013 12:07, Richard Oudkerk wrote: > On 09/08/2013 2:45am, Steven D'Aprano wrote: >> It's true that, if swapcase were proposed today, it probably wouldn't be >> accepted. > > Or it would throw an error on a string of more than one character. I think it is really a > "character method" rather than a string method. Just a bit of history: s.swapcase() exists, because a while back we moved from functions in string.py to methods on string and unicode objects in Python 2. The swapcase() function in string.py has been there right from Python's day 1 in VCS history: http://hg.python.org/cpython-fullhistory/annotate/5570dbb1ce55/Lib/string.py#l38 author Guido van Rossum date Sat, 13 Oct 1990 19:23:40 +0000 (1990-10-13) Instead of deprecating this piece of history, perhaps we should start deprecating the caps lock key ;-) ... just think of what would give us: no more accidental yelling on the Internet, confused Nigerian spammers, fewer problems with password entries, just to name a few things. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 09 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From greg.ewing at canterbury.ac.nz Fri Aug 9 12:52:17 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 09 Aug 2013 22:52:17 +1200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5204C680.3040203@egenix.com> References: <520449C2.2020705@pearwood.info> <5204C680.3040203@egenix.com> Message-ID: <5204C9E1.1080307@canterbury.ac.nz> M.-A. Lemburg wrote: > Instead of deprecating this piece of history, perhaps we should start > deprecating the caps lock key ;-) ... just think of what would give > us: no more accidental yelling on the Internet, confused Nigerian > spammers, fewer problems with password entries, just to name > a few things. And it would give us room to put the control key back where it belongs! 
+1 -- Greg From oscar.j.benjamin at gmail.com Fri Aug 9 13:27:20 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 9 Aug 2013 12:27:20 +0100 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: <52048F8B.4060600@pearwood.info> References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <52041713.5040405@canterbury.ac.nz> <52044D9F.2020004@pearwood.info> <8738qjlh98.fsf@uwakimon.sk.tsukuba.ac.jp> <52048F8B.4060600@pearwood.info> Message-ID: On 9 August 2013 07:43, Steven D'Aprano wrote: > On 09/08/13 12:39, Stephen J. Turnbull wrote: >> >> Steven D'Aprano writes: >> >> > [2] But which ones? I know of at least four different definitions >> > of each [of skew and kurtosis], three of which are in common >> > use. Annoyingly, the differences are not documented. >> >> Do you mean "definitions" or "implementations"? > > > Yes :-) > > Or possibly both. Skew and kurtosis are especially egregious examples of > mathematicians being inconsistent in their terminology, notation and > definitions, but as far as I have been able to determine, there are three > common formulae for the third moment about the mean skewness: Perhaps they just shouldn't be included then. Does anyone here *really* have an application for computing skewness/kurtosis (don't just imagine one)? I don't think I ever compute these directly although I have used normality tests that might have used them internally (i.e. for the D'Agostino K-squared test as computed by numpy's skewtest function). I would find skew and kurtosis functions not very useful unless I also have the appropriate statistical tests and statistical testing is a massive can of worms. 
Oscar From steve at pearwood.info Fri Aug 9 13:39:41 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 09 Aug 2013 21:39:41 +1000 Subject: [Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python In-Reply-To: References: <51FBF02F.1000202@pearwood.info> <5203AB11.3010502@pearwood.info> <52041713.5040405@canterbury.ac.nz> <52044D9F.2020004@pearwood.info> <8738qjlh98.fsf@uwakimon.sk.tsukuba.ac.jp> <52048F8B.4060600@pearwood.info> Message-ID: <5204D4FD.4030209@pearwood.info> On 09/08/13 21:27, Oscar Benjamin wrote: > On 9 August 2013 07:43, Steven D'Aprano wrote: >> Or possibly both. Skew and kurtosis are especially egregious examples of >> mathematicians being inconsistent in their terminology, notation and >> definitions, but as far as I have been able to determine, there are three >> common formulae for the third moment about the mean skewness: > > Perhaps they just shouldn't be included then. Does anyone here > *really* have an application for computing skewness/kurtosis (don't > just imagine one)? Skewness and kurtosis are useful when you need to know a data set's skewness and kurtosis :-) I understand that kurtosis is especially used in economics. The main use for skew is, I believe, checking whether data is sufficiently close to normal that tests which assume normality will be valid. But I'm not an expert on these. Do not fear, I am not proposing that they be included in the statistics module. -- Steven From abarnert at yahoo.com Fri Aug 9 18:12:03 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 9 Aug 2013 09:12:03 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: <05D82B96-00B8-4DF3-BCEE-744302894895@yahoo.com> On Aug 8, 2013, at 11:32, Serhiy Storchaka wrote: People have already said what I want to say on most of these, but: > str(n).zfill(width) == '{:0={}}'.format(n, width) == '%0*.f' % (width,n) This one comes up all the time on Stack Overflow and similar places.
Even when I would use format in my own code, it's a whole lot easier to explain zfill to a novice than to explain the format mini language(s) to someone who's never used anything but bare {} (or %s). > str.swapcase() is just not needed. What about when I need to deliver the secret documents to my CIA contact? If we couldn't swap cases at the baggage carousel, that would affect national security, or at least the ability of Hollywood to write formulaic thrillers. From oscar.j.benjamin at gmail.com Fri Aug 9 18:32:13 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 9 Aug 2013 17:32:13 +0100 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> Message-ID: On 8 August 2013 19:48, Tim Peters wrote: > > To the non-expert, which itertools-like functionalities are "core" - > which should be derived from which others - is a mystery. So I'm with > David on throwing them all in, "at least those that have intuitive and > obvious names". Let a thousand redundancies bloom ;-) They should at least go in a different module. Currently each of those recipes is already available in more-itertools (including the proposed first()): http://pythonhosted.org/more-itertools/api.html Also if they're going to become anything more than recipes then they need to be looked at more carefully. I don't really think many of these belong in the stdlib. For example why not just use generators? I would find generators easier to understand than some of these recipes e.g.: sum(1 for x in data if x > threshold) or sum(x > threshold for x in data) are easier to understand than quantify(data, lambda x: x > threshold) even if you're using a named function. I think that consume() should be included in itertools. 
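For reference, the quantify() being compared in the message above is a recipe from the itertools documentation; the three spellings can be checked against one another directly. A minimal sketch (the sample data and threshold below are made up for illustration):

```python
def quantify(iterable, pred=bool):
    "Count how many times the predicate is true (recipe from the itertools docs)."
    return sum(map(pred, iterable))

data = [3, 7, 1, 9, 4]   # made-up sample data
threshold = 5

a = sum(1 for x in data if x > threshold)    # explicit generator expression
b = sum(x > threshold for x in data)         # summing booleans directly
c = quantify(data, lambda x: x > threshold)  # the recipe
assert a == b == c == 2
```

All three count the same thing; the disagreement in the thread is only about which spelling reads best.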
flatten() is a synonym for chain.from_iterable and there has been some discussion that this could go into builtins (the disagreement only seemed to be about the name). I want take() to have different semantics than it does i.e.: def take(iterable, n): chunk = list(islice(iterable, n)) if len(chunk) != n: raise ValueError('%d items requested but %d found' % (n, len(chunk))) return chunk Similarly grouper() probably doesn't do what anyone really wants (since it discards the end of the iterable). It should really be: def grouper(iterable, chunksize): islices = map(islice, repeat(iter(iterable)), repeat(chunksize)) return takewhile(bool, map(list, islices)) Some recipes are just unnecessary such as random_product() which is really just a long way to write something like: random_pair = (random.choice(s1), random.choice(s2)) Same goes for random_permutation, random_combination, and random_combination_with_replacement: if you're just going to build a collection out of each iterable then what does this have to do with iterators? The partition() recipe is only useful if you can think of a clever way of using it. How would you use it to do anything that would be better than just building a list (since that's what tee() will do internally if you consume either of the iterators)? I use itertools all the time but really the only recipes I would use in my own code are consume() (with the second argument made optional), pairwise() and powerset(). Oscar From abarnert at yahoo.com Fri Aug 9 18:51:17 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 9 Aug 2013 09:51:17 -0700 Subject: [Python-ideas] Add 'interleave' function to itertools?
In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> Message-ID: <060BC8FF-253E-4672-BAE6-23AAF7753D5B@yahoo.com> On Aug 9, 2013, at 9:32, Oscar Benjamin wrote: > Similarly grouper() probably doesn't do what anyone really wants > (since it discards the end of the iterable). What do you mean by that? Every element in the original iterable ends up in one of the groups; nothing is discarded. I use this all the time. (Well, I often end up writing a "pairs" iterator by hand, and don't reach for grouper unless I already need it for different sized groups... But that's probably an argument _for_ putting it in the module, not against.) From abarnert at yahoo.com Fri Aug 9 18:54:44 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 9 Aug 2013 09:54:44 -0700 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> Message-ID: <3E2A8772-343B-4231-BCDC-0E6B4A0D1068@yahoo.com> On Aug 9, 2013, at 9:32, Oscar Benjamin wrote: > They should at least go in a different module. Currently each of those > recipes is already available in more-itertools (including the proposed > first()): > http://pythonhosted.org/more-itertools/api.html How is that a problem? Unless you're doing "from itertools import *" and "from more_itertools import *" (and care which one you get) there's no possible conflict. 
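For reference, the grouper() the last two messages are arguing about is the recipe from the itertools documentation; as Andrew says, it pads the final group with a fillvalue rather than discarding anything. A quick check:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks (recipe from the itertools docs)."
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

groups = list(grouper(range(10), 3))
# Every element appears; the last group is padded out, not dropped.
assert groups == [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
```

Oscar's complaint, later in the thread, is about the padding itself, not about elements being lost.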
From mertz at gnosis.cx Fri Aug 9 19:47:42 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 9 Aug 2013 10:47:42 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5204C9E1.1080307@canterbury.ac.nz> References: <520449C2.2020705@pearwood.info> <5204C680.3040203@egenix.com> <5204C9E1.1080307@canterbury.ac.nz> Message-ID: I'm old enough to also want the function keys on the left, "As God intended!" (I miss how quick those WordPerfect 4.2 shortcuts were with the F keys next to the correctly-placed Control and Alt keys). On Fri, Aug 9, 2013 at 3:52 AM, Greg Ewing wrote: > M.-A. Lemburg wrote: > >> Instead of deprecating this piece of history, perhaps we should start >> deprecating the caps lock key ;-) ... just think of what would give >> us: no more accidental yelling on the Internet, confused Nigerian >> spammers, fewer problems with password entries, just to name >> a few things. >> > > And it would give us room to put the control key > back where it belongs! > > +1 > > -- > Greg > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Fri Aug 9 20:08:09 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 9 Aug 2013 19:08:09 +0100 Subject: [Python-ideas] Add 'interleave' function to itertools? 
In-Reply-To: <3E2A8772-343B-4231-BCDC-0E6B4A0D1068@yahoo.com> References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> <3E2A8772-343B-4231-BCDC-0E6B4A0D1068@yahoo.com> Message-ID: On 9 August 2013 17:54, Andrew Barnert wrote: >> They should at least go in a different module. Currently each of those >> recipes is already available in more-itertools (including the proposed >> first()): >> http://pythonhosted.org/more-itertools/api.html > > How is that a problem? Unless you're doing "from itertools import *" and "from more_itertools import *" (and care which one you get) there's no possible conflict. Sorry I didn't mean that they should go in a different module because of more-itertools. I was just mentioning more-itertools at the same time. They should go in a different module because itertools is a very carefully designed minimal set of primitives and I think it's bad to pollute that by throwing in any old thing for convenience (and based on his previous comments I'm pretty sure that's how Raymond feels). Oscar From oscar.j.benjamin at gmail.com Fri Aug 9 20:14:05 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 9 Aug 2013 19:14:05 +0100 Subject: [Python-ideas] Add 'interleave' function to itertools? In-Reply-To: <060BC8FF-253E-4672-BAE6-23AAF7753D5B@yahoo.com> References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> <060BC8FF-253E-4672-BAE6-23AAF7753D5B@yahoo.com> Message-ID: On 9 August 2013 17:51, Andrew Barnert wrote: >> Similarly grouper() probably doesn't do what anyone really wants >> (since it discards the end of the iterable). > What do you mean by that? Every element in the original iterable ends up in one of the groups; nothing is discarded. > > I use this all the time.
(Well, I often end up writing a "pairs" iterator by hand, and don't reach for grouper unless I already need it for different sized groups... But that's probably an argument _for_ putting it in the module, not against.) Sorry, grouper as it stands does this: >>> list(grouper(range(10), 3)) [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)] If you replace zip_longest with zip it does this: >>> list(grouper(range(10), 3)) [(0, 1, 2), (3, 4, 5), (6, 7, 8)] I've never had any use for either of those. The one I posted does this: >>> list(grouper(range(10), 3)) [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)] The other one I've wanted is >>> list(grouper(range(10), 3)) ValueError('Uneven chunksizes') I consider the grouper() in the recipes to be the second-least useful of the 4 possibilities (the least useful is the one that discards the final part). Oscar From vito.detullio at gmail.com Fri Aug 9 20:44:50 2013 From: vito.detullio at gmail.com (Vito De Tullio) Date: Fri, 09 Aug 2013 20:44:50 +0200 Subject: [Python-ideas] Deprecating rarely used str methods References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: Bruce Leban wrote: >> print('|%-20s|%8s|' % (name, value)) >> print('|{:<20}|{:>8}|'.format(name, value)) >> >> which are shorter and cleaner. [than .ljust, .rjust, .center, .zfill...] > In what universe are those cleaner? I found the .format mini language confusing, but the '%'-style is definitely clearer to me. -- By ZeD From solipsis at pitrou.net Fri Aug 9 21:18:18 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 21:18:18 +0200 Subject: [Python-ideas] Add 'interleave' function to itertools?
References: <52029634.9070502@mrabarnett.plus.com> <5202AA45.10408@mrabarnett.plus.com> <20130808154706.458d80a0@pitrou.net> <129C743A-AB18-4D48-84FB-F0A7A7F86022@yahoo.com> <3E2A8772-343B-4231-BCDC-0E6B4A0D1068@yahoo.com> Message-ID: <20130809211818.4734c9d6@fsol> On Fri, 9 Aug 2013 19:08:09 +0100 Oscar Benjamin wrote: > > Sorry I didn't mean that they should go in a different module because > of more-itertools. I was just mentioning more-itertools at the same > time. > > They should go in a different module because itertools is a very > carefully designed minimal set of primitives and I think it's bad to > pollute that by throwing in any old thing for convenience (and based > on his previous comments I'm pretty sure that's how Raymond > feels). Minimalism is pretty much a subjective thing. Also, it can be a foolish thing to pursue when maintaining an API. For example, the ssl module until Python 2.6 strived for minimalism. What it meant really is that it was so devoid of features that it was actually a threat for the security of Python applications using it. Regards Antoine. From storchaka at gmail.com Fri Aug 9 21:21:09 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 09 Aug 2013 22:21:09 +0300 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: 09.08.13 02:17, Alexander Belopolsky wrote: > This group needs an FAQ. Any volunteers to create a "Frequently Rejected > Ideas" list? This is a great idea. Shouldn't it be the first entry in this FAQ? > http://mail.python.org/pipermail/python-dev/2011-September/113488.html Thank you for the interesting link.
From storchaka at gmail.com Fri Aug 9 21:34:03 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 09 Aug 2013 22:34:03 +0300 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EE8B.2020205@egenix.com> References: <5203EE8B.2020205@egenix.com> Message-ID: 08.08.13 22:16, M.-A. Lemburg wrote: > I don't think any of the .format() alternatives you listed is > anywhere near as obvious as the methods :-) I confess that I was forced to look into the documentation to write these examples. ;-) It's because I don't use new style formatting very often. However, the C-style alternatives look obvious enough to me. > If you have you ever typed on a keyboard with shift lock enabled, > you'd know what it's needed for ;-) Does .swapcase() swap ";" and ":"? Or "q" and "?" (on my computers I use CapsLock to switch keyboard layout between English and Ukrainian)? ;-) From grosser.meister.morti at gmx.net Fri Aug 9 21:43:37 2013 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Fri, 09 Aug 2013 21:43:37 +0200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <20130809101017.1fd043f5@pitrou.net> References: <20130809101017.1fd043f5@pitrou.net> Message-ID: <52054669.8060708@gmx.net> On 08/09/2013 10:10 AM, Antoine Pitrou wrote: > Le Thu, 08 Aug 2013 21:32:16 +0300, > Serhiy Storchaka a > écrit : > >> The str class have some very rarely used methods. When was the last >> time you used swapcase() (not counting the time of apprenticeship)? >> These methods take place in a source code, documentation and a memory >> of developers. Due to the large number of methods, some of the really >> necessary methods can be neglected. I propose to deprecate rarely >> used methods (especially that for most of them there is another, more >> obvious way) and remove them in 4.0.
>> >> s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) >> s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) >> s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) >> s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) >> s.center(width) == '{:^{}}'.format(s, width) >> s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) > > -1 for deprecating those 3, they are much more readable than the format > equivalent (seriously, those smiley-like format specs are horrible...). > You can write ljust/rjust like this: ljust: s+" "*(width-len(s)) rjust: " "*(width-len(s))+s Is this more readable or less? I'm 0 on this. From solipsis at pitrou.net Fri Aug 9 21:51:17 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 21:51:17 +0200 Subject: [Python-ideas] Deprecating rarely used str methods References: <20130809101017.1fd043f5@pitrou.net> <52054669.8060708@gmx.net> Message-ID: <20130809215117.1e4e9250@fsol> On Fri, 09 Aug 2013 21:43:37 +0200 Mathias Panzenböck wrote: > On 08/09/2013 10:10 AM, Antoine Pitrou wrote: > > Le Thu, 08 Aug 2013 21:32:16 +0300, > > Serhiy Storchaka a > > écrit : > > > >> The str class has some very rarely used methods. When was the last > >> time you used swapcase() (not counting your apprenticeship)? > >> These methods take up space in source code, documentation and the memory > >> of developers. Due to the large number of methods, some of the really > >> necessary methods can be neglected. I propose to deprecate rarely > >> used methods (especially since for most of them there is another, more > >> obvious way) and remove them in 4.0. 
> >> > >> s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s) > >> s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width) > >> s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s) > >> s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) > >> s.center(width) == '{:^{}}'.format(s, width) > >> s.center(width, fillchar) == '{:{}^{}}'.format(s, fillchar, width) > > > > -1 for deprecating those 3, they are much more readable than the format > > equivalent (seriously, those smiley-like format specs are horrible...). > > > > You can write ljust/rjust like this: > ljust: s+" "*(width-len(s)) > rjust: " "*(width-len(s))+s > > Is this more readable or less? Rather less, IMHO. Regards Antoine. From storchaka at gmail.com Fri Aug 9 21:48:42 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 09 Aug 2013 22:48:42 +0300 Subject: [Python-ideas] Deprecating repr() and the like Message-ID: repr(), ascii() and one-argument str() can be deprecated and removed in Python 5.0 because they are redundant and can be expressed as a simple formatting: repr(x) == '%r'%(x,) == '{!r}'.format(x) ascii(x) == '%a'%(x,) == '{!a}'.format(x) str(x) == '%s'%(x,) == '{}'.format(x) We also can deprecate string concatenation for the same reason: s1 + s2 == '%s%s'%(s1,s1) == '{}{}'.format(s1,s2) == ''.join([s1,s2]) From storchaka at gmail.com Fri Aug 9 22:00:35 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 09 Aug 2013 23:00:35 +0300 Subject: [Python-ideas] Make test.test_support an alias to test.support In-Reply-To: References: Message-ID: 09.08.13 00:34, Terry Reedy wrote: > On 8/8/2013 10:05 AM, Serhiy Storchaka wrote: >> When backporting tests to 2.7 one of the changes which you should make is >> to change "support" to "test_support". Sometimes this is only a change. > > You mean, 'the only change'. Yes, very annoying when it is the only change. Yes, sorry for the misspelling. 
At least a year ago, in many tests backported from 3.2 to 2.7 it was the only change. > While I strongly feel we should have done this, and some other aliases > (like Tkinter as alias for tkinter, etc, and dump 'lib-tk') in 2.7.0, it > seems too late for these changes now. The problem, as with any new 2.7 > feature, is that if 'support' were added in 2.7.6, test code that > depends on 'support' would not run on 2.7.5-. While the contents of > /test are 'internal use only', it seems to me that being able to run the > test suite is a documented, public feature. Sometimes we add new constants or functions in test_support.py in bugfix releases. Of course some tests for 2.7.x will fail, hang or crash on 2.7.y where y < x, because these tests were added to test a bug or because they use a new feature of test_support.py added between bugfix releases. From storchaka at gmail.com Fri Aug 9 21:51:21 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 09 Aug 2013 22:51:21 +0300 Subject: [Python-ideas] Make test.test_support an alias to test.support In-Reply-To: References: Message-ID: 08.08.13 23:23, Eric Snow wrote: > Just to be explicit about it, you propose making the change in 2.7, right? Yes, of course. We already made some changes in test_support.py in 2.7 if they were needed for new tests. From alexander.belopolsky at gmail.com Fri Aug 9 22:18:22 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 9 Aug 2013 16:18:22 -0400 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods Message-ID: On Fri, Aug 9, 2013 at 3:21 PM, Serhiy Storchaka wrote: > 09.08.13 02:17, Alexander Belopolsky wrote: > > This group needs an FAQ. Any volunteers to create a "Frequently Rejected >> Ideas" list? >> > > This is a great idea. Shouldn't it be the first entry in this FAQ? 
> > http://mail.python.org/**pipermail/python-dev/2011-** >> September/113488.html >> > > +1 Another one that came up recently is http://bugs.python.org/issue2186#msg63026 -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Fri Aug 9 22:19:31 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 09 Aug 2013 16:19:31 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <5203EE8B.2020205@egenix.com> References: <5203EE8B.2020205@egenix.com> Message-ID: <1376079571.1701.8054723.755FB579@webmail.messagingengine.com> On Thu, Aug 8, 2013, at 14:58, MRAB wrote: > I use [expandtabs] to ensure that indentation is always 4 spaces and never tabs > in the regex release in case I've missed a setting in an editor > somewhere, so it _is_ an important method! :-) On Thu, Aug 8, 2013, at 15:16, M.-A. Lemburg wrote: > If you have ever typed on a keyboard with shift lock enabled, > you'd know what [swapcase is] needed for ;-) What are these text editors that let you call a Python function on a range/selection? I've run into both of these situations, and am quite familiar with the Vim commands for fixing them. From random832 at fastmail.us Fri Aug 9 22:25:23 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 09 Aug 2013 16:25:23 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <520449C2.2020705@pearwood.info> References: <520449C2.2020705@pearwood.info> Message-ID: <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> On Thu, Aug 8, 2013, at 21:45, Steven D'Aprano wrote: > Deprecation is only useful when there is a concrete plan to remove the > feature. Python 4000 is so far in the future that it's more of a dream > than an actual concrete planned release (this is why I don't refer to it > as Python 4). There may not even be a Python 4000. Who knows? 
Maybe there needs to be an official "to do list" for ideas to be re-examined when Python 4000 becomes a more concrete thing. I think a "list of warts" has been proposed and rejected recently, but I got the impression that was due to the perceived pejorative-ness of calling something that. I would add str % stuff to the list of things that should be deprecated... has anyone done any work on a converter for that, that could be included in a hypothetical 3to4? From random832 at fastmail.us Fri Aug 9 22:29:14 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 09 Aug 2013 16:29:14 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <20130809101017.1fd043f5@pitrou.net> References: <20130809101017.1fd043f5@pitrou.net> Message-ID: <1376080154.5472.8058283.3C75E1D2@webmail.messagingengine.com> On Fri, Aug 9, 2013, at 4:10, Antoine Pitrou wrote: > -1 for deprecating those 3, they are much more readable than the format > equivalent (seriously, those smiley-like format specs are horrible...). 99% of the time you're not going to use them in the real world. Yes, the format string in this: s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) is unpleasant, but you should think of it more as this: s.rjust(42,'x') == '{:x>42}'.format(s) since 90% of the time width is constant, 99% for fillchar. 
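[Editorial note: the str-method/format equivalences quoted in this thread do check out as written, with one caveat worth recording for center(). A quick sanity check (plain Python, nothing assumed beyond the snippets already posted; the center() caveat is CPython behaviour):]

```python
s, width, fillchar = 'py', 8, '*'

# The ljust/rjust equivalences from Serhiy's proposal hold as written:
assert s.ljust(width) == '{:<{}}'.format(s, width) == '%-*s' % (width, s)
assert s.ljust(width, fillchar) == '{:{}<{}}'.format(s, fillchar, width)
assert s.rjust(width) == '{:>{}}'.format(s, width) == '%*s' % (width, s)
assert s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width)

# ... and the constant-width spelling from the message above:
assert 'abc'.rjust(42, 'x') == '{:x>42}'.format('abc')

# Caveat: str.center() and the '^' format spec do not always agree on
# which side gets the extra fill character when the padding is odd:
assert 'ab'.center(5) == '  ab '            # extra space on the left
assert '{:^5}'.format('ab') == ' ab  '      # extra space on the right
```

[So s.center(width) == '{:^{}}'.format(s, width) holds for many values, but on CPython it is not an identity for every string/width combination.]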
From random832 at fastmail.us Fri Aug 9 22:33:43 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 09 Aug 2013 16:33:43 -0400 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: References: Message-ID: <1376080423.6361.8059615.06C65630@webmail.messagingengine.com> On Fri, Aug 9, 2013, at 15:48, Serhiy Storchaka wrote: > repr(), ascii() and one-argument str() can be deprecated and removed in > Python 5.0 because they are redundant and can be expressed as a simple > formatting: > > repr(x) == '%r'%(x,) == '{!r}'.format(x) > ascii(x) == '%a'%(x,) == '{!a}'.format(x) > str(x) == '%s'%(x,) == '{}'.format(x) > > We also can deprecate string concatenation for the same reason: > > s1 + s2 == '%s%s'%(s1,s1) == '{}{}'.format(s1,s2) == ''.join([s1,s2]) Is this a serious proposal, or a means of expressing displeasure with another recent proposal? I have responses to each of those cases, but need to know which one to post. From taleinat at gmail.com Fri Aug 9 22:39:05 2013 From: taleinat at gmail.com (Tal Einat) Date: Fri, 9 Aug 2013 23:39:05 +0300 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <5203EA5A.4060405@mrabarnett.plus.com> Message-ID: I'll take the bait and make a list. It would be helpful if those who follow this list (and perhaps other discussions, e.g. python-dev) could point out (just by the title) some often discussed and rejected proposals. I'll go over the archives later, but a list of things to look for would help. - Tal Einat On Fri, Aug 9, 2013 at 2:17 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > > > On Thu, Aug 8, 2013 at 2:58 PM, MRAB wrote: > >> On 08/08/2013 19:32, Serhiy Storchaka wrote: >> >>> .. >> >> str.swapcase() is just not needed. >>> >>> I've never used .swapcase, and I doubt I ever will. >> > > This group needs an FAQ. Any volunteers to create a "Frequently Rejected > Ideas" list? 
> > http://mail.python.org/pipermail/python-dev/2011-September/113488.html > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Fri Aug 9 22:40:10 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Fri, 9 Aug 2013 22:40:10 +0200 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: References: Message-ID: Isn't Python 5 too far away to think about it? Il giorno 09/ago/2013 22:07, "Serhiy Storchaka" ha scritto: > repr(), ascii() and one-argument str() can be deprecated and removed in > Python 5.0 because they are redundant and can be expressed as a simple > formatting: > > repr(x) == '%r'%(x,) == '{!r}'.format(x) > ascii(x) == '%a'%(x,) == '{!a}'.format(x) > str(x) == '%s'%(x,) == '{}'.format(x) > > We also can deprecate string concatenation for the same reason: > > s1 + s2 == '%s%s'%(s1,s1) == '{}{}'.format(s1,s2) == ''.join([s1,s2]) > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Aug 9 22:46:13 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 22:46:13 +0200 Subject: [Python-ideas] Deprecating rarely used str methods References: <20130809101017.1fd043f5@pitrou.net> <1376080154.5472.8058283.3C75E1D2@webmail.messagingengine.com> Message-ID: <20130809224613.259450c0@fsol> On Fri, 09 Aug 2013 16:29:14 -0400 random832 at fastmail.us wrote: > On Fri, Aug 9, 2013, at 4:10, Antoine Pitrou wrote: > > -1 for deprecating those 3, they are much more readable than the format > > equivalent (seriously, those smiley-like format specs are horrible...). 
> > 99% of the time you're not going to use them in the real world. > > Yes, the format string in this: > s.rjust(width, fillchar) == '{:{}>{}}'.format(s, fillchar, width) > is unpleasant, but you should think of it more as this: > s.rjust(42,'x') == '{:x>42}'.format(s) ... which doesn't strike me as very readable either, let alone easy to remember. Regards Antoine. From tim.peters at gmail.com Fri Aug 9 22:52:00 2013 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 9 Aug 2013 15:52:00 -0500 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: References: Message-ID: [Serhiy Storchaka] >> repr(), ascii() and one-argument str() can be deprecated and removed in >> Python 5.0 because they are redundant and can be expressed as a simple >> formatting: >> >> repr(x) == '%r'%(x,) == '{!r}'.format(x) >> ascii(x) == '%a'%(x,) == '{!a}'.format(x) >> str(x) == '%s'%(x,) == '{}'.format(x) >> >> We also can deprecate string concatenation for the same reason: >> >> s1 + s2 == '%s%s'%(s1,s1) == '{}{}'.format(s1,s2) == ''.join([s1,s2]) And don't forget the digit 1! It looks too much like lowercase letter L. 
It would be silly to require constructs like 42 // 42 instead, so let's add a new builtin "one": >>> 1 SyntaxError: invalid syntax >>> one == 42 // 42 True >>> one 1 I'm not sure how to replace the confusing output from that last line ;-) From solipsis at pitrou.net Fri Aug 9 22:56:56 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 22:56:56 +0200 Subject: [Python-ideas] Deprecating repr() and the like References: Message-ID: <20130809225656.638135b9@fsol> On Fri, 9 Aug 2013 15:52:00 -0500 Tim Peters wrote: > [Serhiy Storchaka] > >> repr(), ascii() and one-argument str() can be deprecated and removed in > >> Python 5.0 because they are redundant and can be expressed as a simple > >> formatting: > >> > >> repr(x) == '%r'%(x,) == '{!r}'.format(x) > >> ascii(x) == '%a'%(x,) == '{!a}'.format(x) > >> str(x) == '%s'%(x,) == '{}'.format(x) > >> > >> We also can deprecate string concatenation for the same reason: > >> > >> s1 + s2 == '%s%s'%(s1,s1) == '{}{}'.format(s1,s2) == ''.join([s1,s2]) > > And don't forget the digit 1! It looks too much like lowercase letter > L. It would be silly to require constructs like 42 // 42 instead, so > let's add a new builtin "one": > > >>> 1 > SyntaxError: invalid syntax > >>> one == 42 // 42 > True > >>> one > 1 > > I'm not sure how to replace the confusing output from that last line ;-) Just fix one's repr() (which doesn't exist anymore). Regards Antoine. From mal at egenix.com Fri Aug 9 23:28:05 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 09 Aug 2013 23:28:05 +0200 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <5203EE8B.2020205@egenix.com> Message-ID: <52055EE5.6030104@egenix.com> On 09.08.2013 21:34, Serhiy Storchaka wrote: > 08.08.13 22:16, M.-A. Lemburg wrote: >> I don't think any of the .format() alternatives you listed is >> anywhere near as obvious as the methods :-) > > I confess that I was forced to look into the documentation to write these examples. 
;-) It's because I > don't use new-style formatting very often. However, the C-style alternatives look obvious enough to me. > >> If you have ever typed on a keyboard with shift lock enabled, >> you'd know what it's needed for ;-) > > Does .swapcase() swap ";" and ":"? Or "q" and "?" (on my computers I use CapsLock to switch keyboard > layout between English and Ukrainian)? ;-) I guess it's one of those functions that someone invented as an example of an involutory string function for a tutorial late in the 1980s. It then probably developed a life of its own ;-) There's even a YouTube video on it: http://www.youtube.com/watch?v=RPrqhB1dmJA (the author obviously has a Java background ;-)) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 09 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From taleinat at gmail.com Fri Aug 9 23:24:48 2013 From: taleinat at gmail.com (Tal Einat) Date: Sat, 10 Aug 2013 00:24:48 +0300 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: (cross-posting from the original thread) I'll take the bait and make a list. It would be helpful if those who follow this list (and perhaps other forums, e.g. python-dev which I don't read) could point out (even just by headline) some often discussed and rejected proposals. I'll go over the archives later, but a list of things to look for would help. 
On Fri, Aug 9, 2013 at 11:18 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > > > On Fri, Aug 9, 2013 at 3:21 PM, Serhiy Storchaka wrote: > >> 09.08.13 02:17, Alexander Belopolsky wrote: >> >> This group needs an FAQ. Any volunteers to create a "Frequently Rejected >>> Ideas" list? >>> >> >> This is a great idea. Shouldn't it be the first entry in this FAQ? >> >> http://mail.python.org/**pipermail/python-dev/2011-** >>> September/113488.html >>> >> >> > +1 > > Another one that came up recently is > > http://bugs.python.org/issue2186#msg63026 > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Aug 9 23:57:52 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Aug 2013 14:57:52 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <52055EE5.6030104@egenix.com> References: <5203EE8B.2020205@egenix.com> <52055EE5.6030104@egenix.com> Message-ID: On Fri, Aug 9, 2013 at 2:28 PM, M.-A. Lemburg wrote: > I guess it's one of those functions that someone invented as > an example of an involutory string function for a tutorial late in > the 1980s. It then probably developed a life of its own ;-) > Honestly I don't recall why I ever thought I needed it. But it is there in string.py in Python 0.9.1. It almost looks like it was just too cool not to add -- the lower(), upper() and swapcase() functions share a dict mapping between lowercase <--> uppercase. > There's even a YouTube video on it: > > http://www.youtube.com/watch?v=RPrqhB1dmJA > > (the author obviously has a Java background ;-)) Oh, wow, that's insane. He's trained himself to type #{ and #} around each block before filling in the body of the block. 
:-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Aug 10 00:11:15 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 9 Aug 2013 23:11:15 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> Message-ID: On Fri, Aug 9, 2013 at 9:25 PM, wrote: > I would add str % stuff to the list of things that should be > deprecated... has anyone done any work on a converter for that, that > could be included in a hypothetical 3to4? Why should it be deprecated, though? This keeps coming up - it's too useful to throw away. The sprintf codes are well known from other languages, and they're expressive. The only change I might support would be making it an actual function 'sprintf' rather than an operator on the str class, but that feels like bikeshedding and completely unnecessary incompatibility. ChrisA From ron3200 at gmail.com Sat Aug 10 03:51:49 2013 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 09 Aug 2013 20:51:49 -0500 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: References: Message-ID: On 08/09/2013 03:52 PM, Tim Peters wrote: > [Serhiy Storchaka] >>> >>repr(), ascii() and one-argument str() can be deprecated and removed in >>> >>Python 5.0 because they are redundant and can be expressed as a simple >>> >>formatting: >>> >> >>> >>repr(x) == '%r'%(x,) == '{!r}'.format(x) >>> >>ascii(x) == '%a'%(x,) == '{!a}'.format(x) >>> >>str(x) == '%s'%(x,) == '{}'.format(x) >>> >> >>> >>We also can deprecate string concatenation for the same reason: >>> >> >>> >>s1 + s2 == '%s%s'%(s1,s1) == '{}{}'.format(s1,s2) == ''.join([s1,s2]) > And don't forget the digit 1! It looks too much like lowercase letter > L. 
It would be silly to require constructs like 42 // 42 instead, so > let's add a new builtin "one": > >>>> >>>1 > SyntaxError: invalid syntax >>>> >>>one == 42 // 42 > True >>>> >>>one > 1 > > I'm not sure how to replace the confusing output from that last line ;-) Surely it should round trip ... eval(str(one)) == one and int("one") == one ;-) These recent threads, with many little things to improve, replace, include, or deprecate, cause me to think we've reached a point where we need some input and guidance on what direction to go in. Cheers, Ron From steve at pearwood.info Sat Aug 10 04:35:50 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 10 Aug 2013 12:35:50 +1000 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: References: Message-ID: <5205A706.8000703@pearwood.info> On 10/08/13 06:52, Tim Peters wrote: > And don't forget the digit 1! It looks too much like lowercase letter > L. It would be silly to require constructs like 42 // 42 instead, so > let's add a new builtin "one": Heh heh :-) I would like to point out that Apple's Hypertalk language defined constants one through (I think) twelve, so you could write code like this: add one to count get word four of line two of text put it into field "Something" which worked well given the audience it was aimed at. But that sort of verbosity doesn't really suit Python, which aims to be executable pseudo-code rather than pseudo-English. Do you know what else doesn't really suit Python? An over-reliance on Perl-like cryptic symbols. Of the three code snippets: repr(obj).rjust(10) "%10r" % obj "{!r:>10}".format(obj) there is no doubt in my mind that the first is more Pythonic. The second is much terser (and also potentially buggy, if obj happens to be a tuple), while the third manages to combine the cryptic use of symbols from the second with the verbosity of the first, and so satisfies nobody :-) Having the option to give up readability for terseness is a good and positive thing. 
But neither should be the only-one-way to do it. repr() and string methods are the obvious way to do it. If symbols were more Pythonic, rather than named functions, we'd have kept backticks `obj` and removed repr(obj). But we didn't. -1 on deprecating named functions like repr. -100 on deprecating named functions for removal in Python 4000 when there isn't even a schedule for Python 4000 yet. There isn't even a schedule for coming up with a schedule. It is premature to be making concrete code changes with an aim to Python 4000 yet. There may not even be a Python 4000, and if there is, it may not break backwards compatibility. -- Steven From abarnert at yahoo.com Sat Aug 10 04:42:09 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 9 Aug 2013 19:42:09 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> Message-ID: <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> On Aug 9, 2013, at 15:11, Chris Angelico wrote: > On Fri, Aug 9, 2013 at 9:25 PM, wrote: >> I would add str % stuff to the list of things that should be >> deprecated... has anyone done any work on a converter for that, that >> could be included in a hypothetical 3to4? > > Why should it be deprecated, though? I agree that there's no point arguing this out yet again. But I don't understand why so many people seem so baffled by the opposite position. Having two very different and relatively complex mini-languages for the same purpose is a burden. Not having the same format strings as every other language in the world would also be a burden. Nobody can seriously believe that the other side really doesn't understand their point when the points are this obvious. 
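[Editorial note: Steven's three spellings really do agree for a non-tuple value, and the tuple pitfall he mentions is easy to demonstrate. A small check of exactly the snippets from his message, nothing more:]

```python
obj = 3.14

a = repr(obj).rjust(10)        # named-function spelling
b = '%10r' % obj               # %-formatting spelling
c = '{!r:>10}'.format(obj)     # str.format spelling
assert a == b == c == '      3.14'

# The tuple pitfall: %-formatting treats a bare tuple on the right as
# the whole argument list, so a tuple value must be wrapped in a 1-tuple.
t = (1, 2)
assert '%10r' % (t,) == '    (1, 2)'
try:
    '%10r' % t        # looks equivalent, but raises TypeError
except TypeError:
    pass
```

[The first two assertions are why the thread keeps circling: all three spellings produce identical output, so the disagreement is purely about which reads best.]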
From cs at zip.com.au Sat Aug 10 04:31:39 2013 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 10 Aug 2013 12:31:39 +1000 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <52041A07.3090000@canterbury.ac.nz> References: <52041A07.3090000@canterbury.ac.nz> Message-ID: <20130810023139.GA24224@cskk.homeip.net> On 09Aug2013 10:21, Greg Ewing wrote: | M.-A. Lemburg wrote: | >If you have you ever typed on a keyboard with shift lock enabled, | | ...on a Windows or Linux PC. If you're using a Mac, | no algorithm will rescue you from that blunder. :-) One of the first things I do on a new machine, Linux or Mac, is to disable the capslock key. It is a simple keyboard pref on the Mac and a kept-in-a-file xmodmap incantation on Linux (assuming X11 desktop). Cheers, -- Cameron Simpson You my man are a danger to society and should be taken out of society for all our sakes. As to what is done to you once removed I couldn't care less. - Roy G. Culley, Unix Systems Administrator From joshua at landau.ws Sat Aug 10 04:53:51 2013 From: joshua at landau.ws (Joshua Landau) Date: Sat, 10 Aug 2013 03:53:51 +0100 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: <5205A706.8000703@pearwood.info> References: <5205A706.8000703@pearwood.info> Message-ID: On 10 August 2013 03:35, Steven D'Aprano wrote: > Do you know what else doesn't really suit Python? An over-reliance on > Perl-like cryptic symbols. Of the three code snippets: > > repr(obj).rjust(10) > > "%10r" % obj > > "{!r:>10}".format(obj) > > there is no doubt in my mind that the first is more Pythonic. The second is > much terser (and also potentially buggy, if obj happens to be a tuple), > while the third manages to combine the cryptic use of symbols from the > second with the verbosity of the first, and so satisfies nobody :-) > > Having the option to give up readability for terseness is a good and > positive thing. But neither should be the only-one-way to do it. 
repr() and > string methods are the obvious way to do it. Given that I'd use: "{:>10}".format(repr(obj)) I don't think the argument is about terseness. > If symbols were more Pythonic, rather than named functions, we'd have kept > backticks `obj` and removed repr(obj). But we didn't. But no-one actually knew about the backticks ;D (they were a bad idea nonetheless). From oscar.j.benjamin at gmail.com Sat Aug 10 04:54:33 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sat, 10 Aug 2013 03:54:33 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <20130810023139.GA24224@cskk.homeip.net> References: <52041A07.3090000@canterbury.ac.nz> <20130810023139.GA24224@cskk.homeip.net> Message-ID: On 10 August 2013 03:31, Cameron Simpson wrote: > On 09Aug2013 10:21, Greg Ewing wrote: > | M.-A. Lemburg wrote: > | >If you have you ever typed on a keyboard with shift lock enabled, > | > | ...on a Windows or Linux PC. If you're using a Mac, > | no algorithm will rescue you from that blunder. :-) > > One of the first things I do on a new machine, Linux or Mac, is to > disable the capslock key. It is a simple keyboard pref on the Mac > and a kept-in-a-file xmodmap incantation on Linux (assuming X11 > desktop). Interesting. That's a cleverer solution than mine which is to just remove the physical key (with a screwdriver if necessary or whatever else comes to hand). 
Oscar From rosuav at gmail.com Sat Aug 10 05:58:41 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 10 Aug 2013 04:58:41 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> Message-ID: On Sat, Aug 10, 2013 at 3:42 AM, Andrew Barnert wrote: > On Aug 9, 2013, at 15:11, Chris Angelico wrote: > >> On Fri, Aug 9, 2013 at 9:25 PM, wrote: >>> I would add str % stuff to the list of things that should be >>> deprecated... has anyone done any work on a converter for that, that >>> could be included in a hypothetical 3to4? >> >> Why should it be deprecated, though? > > I agree that there's no point arguing this out yet again. > > But I don't understand why so many people seem so baffled by the opposite position. Having two very different and relatively complex mini languages for the same purpose is a burden. Not having the same format strings as every other language in the world would also be a burden. Nobody can seriously believe that the other side really doesn't understand their point when the points are this obvious. Oh, I can see the other side's arguments. If str.format existed and str% didn't, there would be insufficient grounds to add it. But they both exist, and the arguments for removing a feature have to be insanely strong. Status quo wins easily. ChrisA From stefan_ml at behnel.de Sat Aug 10 06:32:58 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 10 Aug 2013 06:32:58 +0200 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: Tal Einat, 09.08.2013 23:24: > It would be helpful if those who follow this list (and perhaps other > forums, e.g. 
python-dev which I don't read) could point out (even just by > headline) some often discussed and rejected proposals. I'll go over the > archives later, but a list of things to look for would help. This is relevant: http://www.python.org/dev/peps/pep-3099/ Stefan From joshua at landau.ws Sat Aug 10 07:36:39 2013 From: joshua at landau.ws (Joshua Landau) Date: Sat, 10 Aug 2013 06:36:39 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> Message-ID: On 10 August 2013 04:58, Chris Angelico wrote: > On Sat, Aug 10, 2013 at 3:42 AM, Andrew Barnert wrote: >> On Aug 9, 2013, at 15:11, Chris Angelico wrote: >> >>> On Fri, Aug 9, 2013 at 9:25 PM, wrote: >>>> I would add str % stuff to the list of things that should be >>>> deprecated... has anyone done any work on a converter for that, that >>>> could be included in a hypothetical 3to4? >>> >>> Why should it be deprecated, though? >> >> I agree that there's no point arguing this out yet again. >> >> But I don't understand why so many people seem so baffled by the opposite position. Having two very different and relatively complex mini languages for the same purpose is a burden. Not having the same format strings as every other language in the world would also be a burden. Nobody can seriously believe that the other side really doesn't understand their point when the points are this obvious. > > > Oh, I can see the other side's arguments. If str.format existed and > str% didn't, there would be insufficient grounds to add it. But they > both exist, and the arguments for removing a feature have to be > insanely strong. Status quo wins easily. But the arguments for deprecating a feature in favour of the other (so as to aid standardisation) without imminent removal plans don't have to be as strong. 
Anyway, I've learnt this is not an argument I can win so I'm not trying to prove anything. From joshua at landau.ws Sat Aug 10 07:37:30 2013 From: joshua at landau.ws (Joshua Landau) Date: Sat, 10 Aug 2013 06:37:30 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <20130810023139.GA24224@cskk.homeip.net> References: <52041A07.3090000@canterbury.ac.nz> <20130810023139.GA24224@cskk.homeip.net> Message-ID: On 10 August 2013 03:31, Cameron Simpson wrote: > On 09Aug2013 10:21, Greg Ewing wrote: > | M.-A. Lemburg wrote: > | >If you have ever typed on a keyboard with shift lock enabled, > | > | ...on a Windows or Linux PC. If you're using a Mac, > | no algorithm will rescue you from that blunder. :-) > > One of the first things I do on a new machine, Linux or Mac, is to > disable the capslock key. It is a simple keyboard pref on the Mac > and a kept-in-a-file xmodmap incantation on Linux (assuming X11 > desktop). Rather, map it to a new modifier. Mod3 is almost always free. From michelelacchia at gmail.com Sat Aug 10 08:05:54 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Sat, 10 Aug 2013 08:05:54 +0200 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: <5205A706.8000703@pearwood.info> References: <5205A706.8000703@pearwood.info> Message-ID: On 10 Aug 2013 04:37, "Steven D'Aprano" wrote: > > Do you know what else doesn't really suit Python? An over-reliance on Perl-like cryptic symbols. Of the three code snippets: > > repr(obj).rjust(10) > > "%10r" % obj > > "{!r:>10}".format(obj) > > there is no doubt in my mind that the first is more Pythonic. The second is much terser (and also potentially buggy, if obj happens to be a tuple), while the third manages to combine the cryptic use of symbols from the second with the verbosity of the first, and so satisfies nobody :-) > I think that it's more about the context you are doing string manipulation in.
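For concreteness, the three spellings quoted above produce identical output for a simple object; only the percent version misbehaves when the object is a tuple. A quick sketch (the value of obj is chosen arbitrarily):

```python
obj = 3.14

a = repr(obj).rjust(10)     # plain str method
b = "%10r" % obj            # percent formatting
c = "{!r:>10}".format(obj)  # str.format with the !r conversion

assert a == b == c == "      3.14"

# The percent spelling is the buggy one for tuples: the tuple is
# unpacked as the argument list instead of being formatted itself.
try:
    "%10r" % (1, 2)
except TypeError:
    pass  # "not all arguments converted during string formatting"

# str.format treats the tuple as an ordinary value:
assert "{!r:>10}".format((1, 2)) == "    (1, 2)"
```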
I would surely use format() if I have to do many complex text manipulations - while logging for example. On the other hand, if I just need ljust(), rjust(), or something similar (maybe even in a more functional context) then I wouldn't use format(). What's true is that I never remember the most complex transformations of the format minilanguage so I always have to look it up in the docs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Sat Aug 10 09:43:30 2013 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 10 Aug 2013 17:43:30 +1000 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: <20130810074330.GA21082@cskk.homeip.net> On 10Aug2013 06:37, Joshua Landau wrote: | On 10 August 2013 03:31, Cameron Simpson wrote: | > One of the first things I do on a new machine, Linux or Mac, is to | > disable the capslock key. It is a simple keyboard pref on the Mac | > and a kept-in-a-file xmodmap incantation on Linux (assuming X11 | > desktop). | | Rather, map it to a new modifier. Mod3 is almost always free. I think the relevant bit for me is: add control = Caps_Lock i.e. make CapsLock another Control key under X11. -- Cameron Simpson Cordless hoses have been around for quite some time. They're called buckets. - Dan Prener From rosuav at gmail.com Sat Aug 10 10:09:30 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 10 Aug 2013 09:09:30 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> Message-ID: On Sat, Aug 10, 2013 at 6:36 AM, Joshua Landau wrote: > On 10 August 2013 04:58, Chris Angelico wrote: >> Oh, I can see the other side's arguments. If str.format existed and >> str% didn't, there would be insufficient grounds to add it. 
But they >> both exist, and the arguments for removing a feature have to be >> insanely strong. Status quo wins easily. > > But the arguments for deprecating a feature in favour of the other (so > as to aid standardisation) without imminent removal plans don't have > to be as strong. What does deprecation mean if you aren't planning to remove it? Why should I change my code if it's not going to break? Deprecation has to be backed by intent to remove. ChrisA From stefan_ml at behnel.de Sat Aug 10 10:41:38 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 10 Aug 2013 10:41:38 +0200 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: Stefan Behnel, 10.08.2013 06:32: > Tal Einat, 09.08.2013 23:24: >> It would be helpful if those who follow this list (and perhaps other >> forums, e.g. python-dev which I don't read) could point out (even just by >> headline) some often discussed and rejected proposals. I'll go over the >> archives later, but a list of things to look for would help. > > This is relevant: > > http://www.python.org/dev/peps/pep-3099/ BTW, the FAQ should go into wiki.python.org, I'd say. Stefan From joshua at landau.ws Sat Aug 10 11:26:27 2013 From: joshua at landau.ws (Joshua Landau) Date: Sat, 10 Aug 2013 10:26:27 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <20130810074330.GA21082@cskk.homeip.net> References: <20130810074330.GA21082@cskk.homeip.net> Message-ID: On 10 August 2013 08:43, Cameron Simpson wrote: > On 10Aug2013 06:37, Joshua Landau wrote: > | On 10 August 2013 03:31, Cameron Simpson wrote: > | > One of the first things I do on a new machine, Linux or Mac, is to > | > disable the capslock key. It is a simple keyboard pref on the Mac > | > and a kept-in-a-file xmodmap incantation on Linux (assuming X11 > | > desktop). > | > | Rather, map it to a new modifier. Mod3 is almost always free. 
> > I think the relevant bit for me is: > > add control = Caps_Lock > > i.e. make CapsLock another Control key under X11. Then you have three (3!!) control keys, two of which are spaced apart by a single button. I find I'm quick to run out of shortcuts; an extra modifier helps keep things flatter (no Ctrl-Alt-Mod-Shift-PgDn ;p). From joshua at landau.ws Sat Aug 10 11:24:10 2013 From: joshua at landau.ws (Joshua Landau) Date: Sat, 10 Aug 2013 10:24:10 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> Message-ID: On 10 August 2013 09:09, Chris Angelico wrote: > On Sat, Aug 10, 2013 at 6:36 AM, Joshua Landau wrote: >> On 10 August 2013 04:58, Chris Angelico wrote: >>> Oh, I can see the other side's arguments. If str.format existed and >>> str% didn't, there would be insufficient grounds to add it. But they >>> both exist, and the arguments for removing a feature have to be >>> insanely strong. Status quo wins easily. >> >> But the arguments for deprecating a feature in favour of the other (so >> as to aid standardisation) without imminent removal plans don't have >> to be as strong. > > What does deprecation mean if you aren't planning to remove it? Why > should I change my code if it's not going to break? Deprecation has to > be backed by intent to remove. without *imminent* removal plans Also, such a measure would primarily affect new code. From tshepang at gmail.com Sat Aug 10 12:36:14 2013 From: tshepang at gmail.com (Tshepang Lekhonkhobe) Date: Sat, 10 Aug 2013 12:36:14 +0200 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: On Sat, Aug 10, 2013 at 10:41 AM, Stefan Behnel wrote: > BTW, the FAQ should go into wiki.python.org, I'd say. Official docs look a lot better than the (super-ugly) wiki. 
rST is also a lot nicer to work with (or I'm just used to it). Anyways, what's the advantage of moving stuff to the wiki? From stefan_ml at behnel.de Sat Aug 10 12:39:15 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 10 Aug 2013 12:39:15 +0200 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: Tshepang Lekhonkhobe, 10.08.2013 12:36: > On Sat, Aug 10, 2013 at 10:41 AM, Stefan Behnel wrote: >> BTW, the FAQ should go into wiki.python.org, I'd say. > > Official docs look a lot better than the (super-ugly) wiki. rST is > also a lot nicer to work with (or I'm just used to it). > > Anyways, what's the advantage of moving stuff to the wiki? That it can be edited by "normal" people? Stefan From tshepang at gmail.com Sat Aug 10 13:07:11 2013 From: tshepang at gmail.com (Tshepang Lekhonkhobe) Date: Sat, 10 Aug 2013 13:07:11 +0200 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: On Sat, Aug 10, 2013 at 12:39 PM, Stefan Behnel wrote: > Tshepang Lekhonkhobe, 10.08.2013 12:36: >> On Sat, Aug 10, 2013 at 10:41 AM, Stefan Behnel wrote: >>> BTW, the FAQ should go into wiki.python.org, I'd say. >> >> Official docs look a lot better than the (super-ugly) wiki. rST is >> also a lot nicer to work with (or I'm just used to it). >> >> Anyways, what's the advantage of moving stuff to the wiki? > > That it can be edited by "normal" people? So, "normal" people are those who can't be bothered to report the issue, and propose an entry (or a fix)? I think that extra layer does help so that a core dev can determine if the entry actually deserves inclusion. Reporting the issue can at least lead to that discussion. The wiki does not have that advantage (other than that core devs have the option of subscribing to the page). 
From ncoghlan at gmail.com Sat Aug 10 13:07:58 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 10 Aug 2013 21:07:58 +1000 Subject: [Python-ideas] Make test.test_support an alias to test.support In-Reply-To: References: Message-ID: On 9 August 2013 07:34, Terry Reedy wrote: > While I strongly feel we should have done this, and some other aliases (like > Tkinter as alias for tkinter, etc, and dump 'lib-tk') in 2.7.0, it seems too > late for these changes now. The problem, as with any new 2.7 feature, is > that if 'support' were added in 2.7.6, test code that depends on 'support' > would not run on 2.7.5-. While the contents of /test are 'internal use > only', it seems to me that being able to run the test suite is a documented, > public feature. > > However, we could say that one should only run the 2.7.z test suite with the > 2.7.z interpreter and stdlib. This is certainly the case - you can't run the test suite for a later version against an earlier version and expect it to work. > If we do that, I would like to change the > illegal name 'lib-tk' to '_libtk' and change the tk/ttk tests accordingly. > If 'can change or be removed without notice between releases of Python.' > refers to bug-fix releases as well as version releases, I might propose a > few more changes, or rather, to also backport proposed changes for 3.4 to > 2.7. Don't go too overboard with it, but the test suite is definitely more open to updates than the standard library itself. (e.g. the conversion of test.support to a package was applied to the 3.3 maintenance branch to keep it in sync with the default branch). And PEP 434 certainly applies to IDLE's test suite in addition to IDLE itself. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From taleinat at gmail.com Sat Aug 10 13:40:23 2013 From: taleinat at gmail.com (Tal Einat) Date: Sat, 10 Aug 2013 14:40:23 +0300 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: On Sat, Aug 10, 2013 at 1:36 PM, Tshepang Lekhonkhobe wrote: > On Sat, Aug 10, 2013 at 10:41 AM, Stefan Behnel wrote: >> BTW, the FAQ should go into wiki.python.org, I'd say. > > Official docs look a lot better than the (super-ugly) wiki. rST is > also a lot nicer to work with (or I'm just used to it). > > Anyways, what's the advantage of moving stuff to the wiki? The official docs are for documentation of Python itself, and are versioned along with the source code. I don't think that the list of frequently rejected proposals belongs there. In my opinion, the wiki is the right place for such a list. One reason is that this list will need to be updated quite often, which is much easier on a wiki, and has nothing to do with release dates of Python versions. - Tal Einat From ubershmekel at gmail.com Sat Aug 10 13:57:45 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 10 Aug 2013 14:57:45 +0300 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: References: Message-ID: On Sat, Aug 10, 2013 at 2:40 PM, Tal Einat wrote: > > In my opinion, the wiki is the right place for such a list. One reason > is that this list will need to be updated quite often, which is much > easier on a wiki, and has nothing to do with release dates of Python > versions. > Ok, let's try: http://wiki.python.org/moin/FrequentlyRejectedIdeas Yuval -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Sat Aug 10 20:50:45 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 10 Aug 2013 14:50:45 -0400 Subject: [Python-ideas] Make test.test_support an alias to test.support In-Reply-To: References: Message-ID: On 8/10/2013 7:07 AM, Nick Coghlan wrote: > On 9 August 2013 07:34, Terry Reedy wrote: >> However, we could say that one should only run the 2.7.z test suite with the >> 2.7.z interpreter and stdlib. > > This is certainly the case - you can't run the test suite for a later > version against an earlier version and expect it to work. Thinking a bit more, of course. New tests for fixed bugs will not pass on earlier releases. That is the point of making bugfix releases. Less obvious is that tests can rely on a bug (not necessarily in the module directly being tested) and break when a future release fixes the bug. >> If we do that, I would like to change the >> illegal name 'lib-tk' to '_libtk' and change the tk/ttk tests accordingly. Looking further, I see that 2.x sys.path has .../lib/lib-tk added *after* .../lib, so I withdraw that idea. The 'problem' is that tk tests need to import lib/lib-tk/test (and its contents) but cannot do so directly because 'import test' would import lib/test instead of lib/lib-tk/test. Currently they engage in contortions to get around the name clash. To me, the better solution would be to eliminate the name clash by renaming lib-tk/test to lib-tk/tk_test or lib-tk/_tktest (to mark it as private). >> If 'can change or be removed without notice between releases of Python.' >> refers to bug-fix releases as well as version releases, I might propose a >> few more changes, or rather, to also backport proposed changes for 3.4 to >> 2.7. I am referring to http://bugs.python.org/issue18604 "Consolidate gui available checks in test.support" Any patch will necessarily change the tk test files in both test/ and lib-tk/test/ since part of the issue is to move gui check code from lib-tk/test/support to test/support.
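The sort of consolidated check discussed in issue 18604 could be sketched roughly like this (purely illustrative; the helper names below are invented for this sketch and are not the actual test.support API):

```python
import unittest

def gui_available():
    """Best-effort probe for a usable display (illustrative only)."""
    try:
        import tkinter
        # Creating and destroying a root window fails (e.g. TclError)
        # when no display is available, so this doubles as the probe.
        root = tkinter.Tk()
        root.destroy()
    except Exception:
        return False
    return True

def requires_gui(test_item):
    """Skip decorator for tests that need a working GUI."""
    return unittest.skipUnless(gui_available(), "GUI not available")(test_item)
```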
I will raise the backport question once there is an agreed-upon patch. > Don't go too overboard with it, but the test suite is definitely more > open to updates than the standard library itself. (e.g. the conversion > of test.support to a package was applied to the 3.3 maintenance branch > to keep it in sync with the default branch). > And PEP 434 certainly applies to IDLE's test suite in addition to IDLE itself. I think the reasoning of the PEP applies in part to the tkinter test suite, including the lib-tk part. That will be part of any backport discussion on the issue. -- Terry Jan Reedy From ben+python at benfinney.id.au Sun Aug 11 03:25:41 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 11 Aug 2013 11:25:41 +1000 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods References: Message-ID: <7wsiyhaui2.fsf@benfinney.id.au> Tshepang Lekhonkhobe writes: > On Sat, Aug 10, 2013 at 12:39 PM, Stefan Behnel wrote: > > Tshepang Lekhonkhobe, 10.08.2013 12:36: > > > Anyways, what's the advantage of moving stuff [from official > > > Python documentation] to the wiki? > > > > That it can be edited by "normal" people? > > So, "normal" people are those who can't be bothered to report the > issue, and propose an entry (or a fix)? Normal people are also those who want to avoid the requirement for reading and signing a legal document assigning special rights to the PSF, just to propose a fix. -- \ ?It's up to the masses to distribute [music] however they want | `\ ? The laws don't matter at that point. People sharing music in | _o__) their bedrooms is the new radio.? 
- Neil Young, 2008-05-06 | Ben Finney From solipsis at pitrou.net Sun Aug 11 12:03:08 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 11 Aug 2013 12:03:08 +0200 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods References: <7wsiyhaui2.fsf@benfinney.id.au> Message-ID: <20130811120308.6f536907@fsol> On Sun, 11 Aug 2013 11:25:41 +1000 Ben Finney wrote: > Tshepang Lekhonkhobe > writes: > > > On Sat, Aug 10, 2013 at 12:39 PM, Stefan Behnel wrote: > > > Tshepang Lekhonkhobe, 10.08.2013 12:36: > > > > Anyways, what's the advantage of moving stuff [from official > > > > Python documentation] to the wiki? > > > > > > That it can be edited by "normal" people? > > > > So, "normal" people are those who can't be bothered to report the > > issue, and propose an entry (or a fix)? > > Normal people are also those who want to avoid the requirement for > reading and signing a legal document assigning special rights to the > PSF, just to propose a fix. I don't think we ask for a CLA when someone submits a 10-line patch. The wiki would be a fine place for some kinds of documentation if it were a decent wiki. As far as I'm concerned, wiki.p.o is such a UI disaster that I'm totally reluctant to touch it (or even read it, actually). Regards Antoine. From storchaka at gmail.com Sun Aug 11 12:30:07 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 11 Aug 2013 13:30:07 +0300 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: <1376080423.6361.8059615.06C65630@webmail.messagingengine.com> References: <1376080423.6361.8059615.06C65630@webmail.messagingengine.com> Message-ID: 09.08.13 23:33, random832 at fastmail.us wrote: > Is this a serious proposal, or a means of expressing displeasure with > another recent proposal? I have responses to each of those cases, but > need to know which one to post. This is totally non-serious.
From storchaka at gmail.com Sun Aug 11 12:30:12 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 11 Aug 2013 13:30:12 +0300 Subject: [Python-ideas] Deprecating repr() and the like In-Reply-To: References: Message-ID: 09.08.13 23:52, Tim Peters wrote: > And don't forget the digit 1! It looks too much like lowercase letter > L. It would be silly to require constructs like 42 // 42 instead, so > let's add a new builtin "one": > >>>> 1 > SyntaxError: invalid syntax >>>> one == 42 // 42 > True >>>> one > 1 > > I'm not sure how to replace the confusing output from that last line ;-) Use Enum. ;-) From stephen at xemacs.org Sun Aug 11 20:21:52 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 12 Aug 2013 03:21:52 +0900 Subject: [Python-ideas] Frequently Rejected Ideas Was: Deprecating rarely used str methods In-Reply-To: <7wsiyhaui2.fsf@benfinney.id.au> References: <7wsiyhaui2.fsf@benfinney.id.au> Message-ID: <87vc3cjdfj.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > Normal people are also those who want to avoid the requirement for > reading and signing a legal document assigning special rights to the > PSF, just to propose a fix. Ben, you are welcome to dislike signing CAs, but please stop spreading FUD about the PSF's CA. The rights explicitly specified in the CA actually constitute *restrictions* on the PSF compared to the rights granted by the licenses themselves. Steve From random832 at fastmail.us Sun Aug 11 23:14:25 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 11 Aug 2013 17:14:25 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> Message-ID: <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> On Fri, Aug 9, 2013, at 23:58, Chris Angelico wrote: > Oh, I can see the other side's arguments.
If str.format existed and > str% didn't, there would be insufficient grounds to add it. But they > both exist, and the arguments for removing a feature have to be > insanely strong. Status quo wins easily. What exactly was the sufficient grounds for adding str.format, and for giving it its own minilanguage instead of using the one % already uses [with extensions like being able to use both positional arguments and %(keyword)s, and maybe something like %1$s for explicit positional arguments]? From rosuav at gmail.com Sun Aug 11 23:42:35 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 11 Aug 2013 22:42:35 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> Message-ID: On Sun, Aug 11, 2013 at 10:14 PM, wrote: > > > On Fri, Aug 9, 2013, at 23:58, Chris Angelico wrote: >> Oh, I can see the other side's arguments. If str.format existed and >> str% didn't, there would be insufficient grounds to add it. But they >> both exist, and the arguments for removing a feature have to be >> insanely strong. Status quo wins easily. > > What exactly was the sufficient grounds for adding str.format, and for > giving it its own minilanguage instead of using the one % already uses > [with extensions like being able to use both positional arguments and > %(keyword)s, and maybe something like %1$s for explicit positional > arguments]? Or %[3]s for the 3rd argument (zero-based, so "%s%[0]s" would duplicate the string). And %{...%} to repeat the inner content for every element of the argument iterable. There's a lot that can be done to extend sprintf notation, if someone wants to. But I don't know that Python wants to do that, unless str.format is to be obsoleted. 
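Some of the extensions being wished for here already exist on the str.format side; a small sketch of the overlap (argument values chosen arbitrarily):

```python
# Explicit positional reuse, which %-style cannot express at all
# without switching to a mapping:
assert "{0}{0}{1}".format("ab", "c") == "ababc"

# Keyword arguments work in both mini-languages:
assert "{x}-{y}".format(x=1, y=2) == "1-2"
assert "%(x)s-%(y)s" % {"x": 1, "y": 2} == "1-2"

# Reusing a value with %-style requires the mapping form:
assert "%(x)s%(x)s" % {"x": "hi"} == "hihi"
```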
Here's a few ideas that could be borrowed, if desired: http://pike.lysator.liu.se/generated/manual/modref/ex/predef_3A_3A/sprintf.html ChrisA From alexander.belopolsky at gmail.com Sun Aug 11 23:48:45 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 11 Aug 2013 17:48:45 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> Message-ID: On Sun, Aug 11, 2013 at 5:14 PM, wrote: > .. > > What exactly was the sufficient grounds for adding str.format, and for > giving it its own minilanguage instead of using the one % already uses > [with extensions like being able to use both positional arguments and > %(keyword)s, and maybe something like %1$s for explicit positional > arguments]? > http://www.python.org/dev/peps/pep-3101/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Aug 12 00:50:45 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 11 Aug 2013 18:50:45 -0400 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> Message-ID: On 8/11/2013 5:48 PM, Alexander Belopolsky wrote: > > > > On Sun, Aug 11, 2013 at 5:14 PM, > > wrote: > > .. > > What exactly was the sufficient grounds for adding str.format, and for > giving it its own minilanguage instead of using the one % already uses > [with extensions like being able to use both positional arguments and > %(keyword)s, and maybe something like %1$s for explicit positional > arguments]? 
> > http://www.python.org/dev/peps/pep-3101/ To add to that, the special treatment of tuples and sometimes dicts leads to bugs or unexpected exceptions. >>> '%s' % [1,] '[1]' >>> '%s' % (1,) '1' >>> '%s' % [1,2,3] '[1, 2, 3]' >>> '%s' % (1,2,3) Traceback (most recent call last): File "<stdin>", line 1, in <module> '%s' % (1,2,3) TypeError: not all arguments converted during string formatting -- Terry Jan Reedy From rosuav at gmail.com Mon Aug 12 00:58:07 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 11 Aug 2013 23:58:07 +0100 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <520449C2.2020705@pearwood.info> <1376079923.4927.8056135.1CD2FE02@webmail.messagingengine.com> <280018CB-FD33-4C06-812E-54C304AECE87@yahoo.com> <1376255665.22589.8591575.338DF56E@webmail.messagingengine.com> Message-ID: On Sun, Aug 11, 2013 at 11:50 PM, Terry Reedy wrote: > To add to that, the special treatment of tuples and sometimes dicts leads to > bugs or unexpected exceptions. >>>> '%s' % [1,] > '[1]' >>>> '%s' % (1,) > '1' >>>> '%s' % [1,2,3] > '[1, 2, 3]' >>>> '%s' % (1,2,3) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > '%s' % (1,2,3) > TypeError: not all arguments converted during string formatting This is why I would prefer printf to be a function, rather than an operator. The operator is a coolness that has a cost. ChrisA From ben+python at benfinney.id.au Mon Aug 12 01:19:21 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 12 Aug 2013 09:19:21 +1000 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki (was: Frequently Rejected Ideas Was: Deprecating rarely used str methods) References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> Message-ID: <7w1u5z6cjq.fsf_-_@benfinney.id.au> "Stephen J.
Turnbull" writes: > Ben Finney writes: > > > Normal people are also those who want to avoid the requirement for > > reading and signing a legal document assigning special rights to the > > PSF, just to propose a fix. > > Ben, you are welcome to dislike signing CAs, but please stop spreading > FUD about the PSF's CA. My claim is factual, not FUD, and is entailed within the terms of the contributor agreement. > The rights explicitly specified in the CA actually constitute > *restrictions* on the PSF compared to the rights granted by the > licenses themselves. The contributor agreement grants to PSF the unilateral power to redistribute the contribution under "any other open source license approved by [the PSF]", a power not granted to other recipients of the contribution. So yes, it arrogates special rights to the PSF. Does this make the PSF awful? No, of course not. But I can't pretend it is acceptable to grant special terms to one party in the community. Antoine Pitrou writes: > On Sun, 11 Aug 2013 11:25:41 +1000 > Ben Finney wrote: > > Normal people are also those who want to avoid the requirement for > > reading and signing a legal document assigning special rights to the > > PSF, just to propose a fix. > > I don't think we ask for a CLA when someone submits a 10-line patch. Not true, at least in my experience. I have been asked to submit a contributor agreement for small patches to the documentation. Since I cannot in good conscience accept the PSF's requirements, they reject such contributions even under an acceptable all-parties-equal license. -- \ "As soon as we abandon our own reason, and are content to rely | `\ upon authority, there is no end to our troubles." - Bertrand | _o__) Russell, _Unpopular Essays_, 1950 | Ben Finney From stephen at xemacs.org Mon Aug 12 05:14:46 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 12 Aug 2013 12:14:46 +0900 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki In-Reply-To: <7w1u5z6cjq.fsf_-_@benfinney.id.au> References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: <87txivk3bt.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > The contributor agreement grants to PSF the unilateral power to > redistribute the contribution under "any other open source license > approved by [the PSF]", a power not granted to other recipients of > the contribution. "The gentleman turns out to lack a full understanding of the issues." It is *not* the CA that grants that power; it is the license (AFL or Apache). Anybody receiving a distribution under those licenses can change the license terms. If you don't like that, don't grant those licenses. Not to the PSF, and not to anybody else. The CA is moot. Under the current Python license, the same power of redistribution is granted to Python's downstream, so there's nothing special about the power mentioned in the CA itself. 'Nuff said. Reply-To set to me; please observe. From tjreedy at udel.edu Mon Aug 12 05:22:34 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 11 Aug 2013 23:22:34 -0400 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki In-Reply-To: <7w1u5z6cjq.fsf_-_@benfinney.id.au> References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: On 8/11/2013 7:19 PM, Ben Finney wrote: > "Stephen J. Turnbull" >> Ben, you are welcome to dislike signing CAs, but please stop spreading >> FUD about the PSF's CA. The PSF has 2 Agreements. One for uploading packages to be redistributed as separate packages on PyPI. The other for accepting contributions to the collective work known as CPython x.y. I do not like parts of the package hosting license, but I agree that Ben's complaints about the contribution license are FUD-like. > My claim is factual, not FUD, and is entailed within the terms of the > contributor agreement. I will disagree below. > >> The rights explicitly specified in the CA actually constitute >> *restrictions* on the PSF compared to the rights granted by the >> licenses themselves. > > The contributor agreement grants to PSF the unilateral power to > redistribute the contribution under "any other open source license > approved by [the PSF]", a power not granted to other recipients of the > contribution. So yes, it arrogates special rights to the PSF. This is deceptive at best. 1. To the extent that a contribution is substantial enough to have copyright, the copyright explicitly remains with the contributor. This is fairly rare for contributions to collective works. 2. A grant of rights in the contribution to PSF only grants those rights to the PSF. WOW. It cannot be otherwise. But since the grant is explicitly not exclusive, the copyright holder is free to grant the same rights in the contributed work to everyone else in the world. It is the choice of the copyright holder whether to grant special rights to the PSF or to grant the same rights to everyone. If you want, write a generic version of the Academic License version whatever, sign it, and post it and a notice on python list that all your contributions to Python via bugs.python.org are available to anyone under the same conditions. Then PSF will definitely not have any special rights to your words. Of course, your generic license can only apply to your words and not anyone else's. 3. The PSF is the copyright holder of the *collective* work and to that extent, it must, as a practical matter, have 'special rights', just as you have special rights to the words you write. If you want to find unfair-to-author's licenses, look everywhere but the open-source software world.
I do not like parts of the package hosting license, but I agree that Ben's complaints about the contribution license are FUDlike. > My claim is factual, not FUD, and is entailed within the terms of the > contributor agreement. I will disagree below. > >> The rights explicitly specified in the CA actually constitute >> *restrictions* on the PSF compared to the rights granted by the >> licenses themselves. > > The contributor agreement grants to PSF the unilateral power to > redistribute the contribution under ?any other open source license > approved by [the PSF]?, a power not granted to other recipients of the > contribution. So yes, it arrogates special rights to the PSF. This is deceptive at best. 1. To the extent that a contribution is substantial enough to have copyright, the copyright explicitly remains with the contributor. This is fairly rare for contributions to collective works. 2. A grant of rights in the contribution to PSF only grants those rights to the PSF. WOW. It cannot be otherwise. But since the grant is explicitly not exclusive, the copyright holder is free to grant the same rights in the contributed word to everyone else in the world. It is the choice of the copyright holder whether to grant special rights to the PSF or to grant the same rights to everyone. If you want, write a generic version of the Academic License version whatever, sign it, and post it and a notice on python list that all your contributions to Python via bugs.python.org are available to anyone under the same conditions. Then PSF will definitely not have any special rights to your words. Of course, your generic license can only apply to your words and not anyone else's. 3. The PSF is the copyright holder of the *collective* work and to that extent, it must, as a practical matter. have 'special rights', just as you have special rights to the words you write. If you want to find unfair-to-author's licenses, look everywhere but the open-source software world. 
-- Terry Jan Reedy From benjamin at python.org Mon Aug 12 05:32:12 2013 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 12 Aug 2013 03:32:12 +0000 (UTC) Subject: [Python-ideas] Deprecating rarely used str methods References: Message-ID: Serhiy Storchaka writes: > str.swapcase() is just not needed. I would quite love to get rid of this method, since it's basically useless and wrong on non-ASCII strings. From ncoghlan at gmail.com Mon Aug 12 05:34:31 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 11 Aug 2013 23:34:31 -0400 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki (was: Frequently Rejected Ideas Was: Deprecating rarely used str methods) In-Reply-To: <7w1u5z6cjq.fsf_-_@benfinney.id.au> References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: On 11 Aug 2013 19:20, "Ben Finney" wrote: > > "Stephen J. Turnbull" > writes: > > > Ben Finney writes: > > > > > Normal people are also those who want to avoid the requirement for > > > reading and signing a legal document assigning special rights to the > > > PSF, just to propose a fix. > > > > Ben, you are welcome to dislike signing CAs, but please stop spreading > > FUD about the PSF's CA. > > My claim is factual, not FUD, and is entailed within the terms of the > contributor agreement. > > > The rights explicitly specified in the CA actually constitute > > *restrictions* on the PSF compared to the rights granted by the > > licenses themselves. > > The contributor agreement grants to PSF the unilateral power to > redistribute the contribution under “any other open source license > approved by [the PSF]”, a power not granted to other recipients of the > contribution. So yes, it arrogates special rights to the PSF. > > Does this make the PSF awful? No, of course not. But I can't pretend it > is acceptable to grant special terms to one party in the community. 
We don't do it for fun - we do it because we don't have the right to relicense some of the previously donated source code, and don't want to spend the lawyer time needed to determine if we can get by without those relicensing rights for new contributions while complying with those existing obligations. People that care about this can either offer to fund the lawyer time to figure out if ALv2 contributions could be accepted without relicensing rights, or accept that Python's complex licensing history means that contributions on a "licence in = licence out" basis are not currently considered feasible, and that a desire to contribute solely to projects with pristine licensing histories is currently incompatible with a desire to contribute directly to CPython. It's a pretty simple choice, and I consider it very poor form to use freely provided PSF communication channels to lobby against a licensing model the PSF believes it is legally obliged to use (choosing not to contribute directly yourself is a different story, as that's an individual ethical decision). Regards, Nick. > > > Antoine Pitrou > writes: > > > On Sun, 11 Aug 2013 11:25:41 +1000 > > Ben Finney wrote: > > > Normal people are also those who want to avoid the requirement for > > > reading and signing a legal document assigning special rights to the > > > PSF, just to propose a fix. > > > > I don't think we ask for a CLA when someone submits a 10-line patch. > > Not true, at least in my experience. I have been asked to submit a > contributor agreement for small patches to the documentation. Since I > cannot in good conscience accept the PSF's requirements, they reject > such contributions even under an acceptable all-parties-equal license. > > -- > \ “As soon as we abandon our own reason, and are content to rely | > `\ upon authority, there is no end to our troubles.” 
—Bertrand | > _o__) Russell, _Unpopular Essays_, 1950 | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Aug 12 05:48:24 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 11 Aug 2013 23:48:24 -0400 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki (was: Frequently Rejected Ideas Was: Deprecating rarely used str methods) In-Reply-To: References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: On 11 Aug 2013 23:34, "Nick Coghlan" wrote: > > > On 11 Aug 2013 19:20, "Ben Finney" wrote: > > > > "Stephen J. Turnbull" > > writes: > > > > > Ben Finney writes: > > > > > > > Normal people are also those who want to avoid the requirement for > > > > reading and signing a legal document assigning special rights to the > > > > PSF, just to propose a fix. > > > > > > Ben, you are welcome to dislike signing CAs, but please stop spreading > > > FUD about the PSF's CA. > > > > My claim is factual, not FUD, and is entailed within the terms of the > > contributor agreement. > > > > > The rights explicitly specified in the CA actually constitute > > > *restrictions* on the PSF compared to the rights granted by the > > > licenses themselves. > > > > The contributor agreement grants to PSF the unilateral power to > > redistribute the contribution under “any other open source license > > approved by [the PSF]”, a power not granted to other recipients of the > > contribution. So yes, it arrogates special rights to the PSF. > > > > Does this make the PSF awful? No, of course not. But I can't pretend it > > is acceptable to grant special terms to one party in the community. 
> > We don't do it for fun - we do it because we don't have the right to relicense some of the previously donated source code, and don't want to spend the lawyer time needed to determine if we can get by without those relicensing rights for new contributions while complying with those existing obligations. > > People that care about this can either offer to fund the lawyer time to figure out if ALv2 contributions could be accepted without relicensing rights, or accept that Python's complex licensing history means that contributions on a "licence in = licence out" basis are not currently considered feasible, and that a desire to contribute solely to projects with pristine licensing histories is currently incompatible with a desire to contribute directly to CPython. A couple more complexities for alternative proposals to deal with: - avoid any perception of conflicting with GPLv2. - provide the PSF itself with a similar level of legal protection to what it currently receives. Cheers, Nick. > > It's a pretty simple choice, and I consider it very poor form to use freely provided PSF communication channels to lobby against a licensing model the PSF believes it is legally obliged to use (choosing not to contribute directly yourself is a different story, as that's an individual ethical decision). > > Regards, > Nick. > > > > > > > Antoine Pitrou > > writes: > > > > > On Sun, 11 Aug 2013 11:25:41 +1000 > > > Ben Finney wrote: > > > > Normal people are also those who want to avoid the requirement for > > > > reading and signing a legal document assigning special rights to the > > > > PSF, just to propose a fix. > > > > > > I don't think we ask for a CLA when someone submits a 10-line patch. > > > > Not true, at least in my experience. I have been asked to submit a > > contributor agreement for small patches to the documentation. 
Since I > > cannot in good conscience accept the PSF's requirements, they reject > > such contributions even under an acceptable all-parties-equal license. > > > > -- > > \ “As soon as we abandon our own reason, and are content to rely | > > `\ upon authority, there is no end to our troubles.” —Bertrand | > > _o__) Russell, _Unpopular Essays_, 1950 | > > Ben Finney > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Mon Aug 12 07:14:24 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 12 Aug 2013 15:14:24 +1000 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: <7wbo53h4nj.fsf@benfinney.id.au> Nick Coghlan writes: > On 11 Aug 2013 19:20, "Ben Finney" wrote: > > Does this make the PSF awful? No, of course not. But I can't pretend > > it is acceptable to grant special terms to one party in the > > community. > > We don't do it for fun - we do it because we don't have the right to > relicense some of the previously donated source code, and don't want > to spend the lawyer time needed to determine if we can get by without > those relicensing rights for new contributions while complying with > those existing obligations. You're right. For the benefit of this forum: I've discussed this with Nick in person, and we agree that the PSF is in a bind on this matter because of awkward ancient license terms on some code in Python. We'd both prefer that the PSF could accept “license in = license out”, that is, no contributor agreement needed. 
It seems to me that the Apache License grants PSF everything they need, with no need for a contributor agreement; but neither of us has the legal expertise to know, and without that expertise, it's PSF that bears the risk. So, for what it's worth, I don't have ill will to the PSF on this matter. > […] I consider it very poor form to use freely provided PSF > communication channels to lobby against a licensing model the PSF > believes it is legally obliged to use (choosing not to contribute > directly yourself is a different story, as that's an individual > ethical decision). My apologies, I agree this is inappropriate. -- \ “If you continue running Windows, your system may become | `\ unstable.” —Microsoft, Windows 95 bluescreen error message | _o__) | Ben Finney From abarnert at yahoo.com Mon Aug 12 08:00:58 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 11 Aug 2013 23:00:58 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: Message-ID: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> On Aug 11, 2013, at 20:32, Benjamin Peterson wrote: > Serhiy Storchaka writes: > >> str.swapcase() is just not needed. > > I would quite love to get rid of this method, since it's basically useless > and wrong on non-ASCII strings. No it isn't. Most people who say that are mixing up bytes.swapcase with str.swapcase (or talking about Python 2). The bytes method obviously only handles ASCII because bytes objects don't know what encoding they're in (or even whether they're text in the first place) so nothing else would make sense. But the str method handles non-ASCII cases just as well as the rest of Python. For example, '????'.swapcase() == '????'. Of course Unicode case mapping isn't perfect, because it's contextless, and because it doesn't handle language-specific tailoring, and because it has all the usual compromises (like Turkish dotless i). But I doubt you're suggesting that Python reject Unicode. 
Of course your other reason for rejecting it--that it's useless--I think most people agree with. From benjamin at python.org Mon Aug 12 08:10:15 2013 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 12 Aug 2013 06:10:15 +0000 (UTC) Subject: [Python-ideas] Deprecating rarely used str methods References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: Andrew Barnert writes: > > On Aug 11, 2013, at 20:32, Benjamin Peterson python.org> wrote: > > > Serhiy Storchaka ...> writes: > > > >> str.swapcase() is just not needed. > > > > I would quite love to get rid of this method, since it's basically useless > > and wrong on non-ASCII strings. > > No it isn't. I realize it "handles" non-ASCII characters, but I claim there is no sensical behavior. From abarnert at yahoo.com Mon Aug 12 08:40:42 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 11 Aug 2013 23:40:42 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: On Aug 11, 2013, at 23:10, Benjamin Peterson wrote: > Andrew Barnert writes: > >> >> On Aug 11, 2013, at 20:32, Benjamin Peterson python.org> wrote: >> >>> Serhiy Storchaka ...> writes: >>> >>>> str.swapcase() is just not needed. >>> >>> I would quite love to get rid of this method, since it's basically useless >>> and wrong on non-ASCII strings. >> >> No it isn't. > > I realize it "handles" non-ASCII characters, but I claim there is no > sensical behavior. So you want Python to not follow Unicode in general, or just in case mapping, or just in this one function (leaving upper, lower, etc. alone)? Out of curiosity, what language do you use that has no sensible behavior? Most scripts either have sensible case rules, or just don't have cases (so the function is an obvious no-op). 
I know some people think Unicode chose the _wrong_ rule for their script (e.g., the Turkish i mentioned earlier--even if it is what most Turkish computer users wanted, there are purists who insist it should work properly, or that Turkish dotted i and dotless I should be separate characters from their Latin equivalents). But that's not the same as saying there _are_ no good rules for their script. From benjamin at python.org Mon Aug 12 08:54:16 2013 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 12 Aug 2013 06:54:16 +0000 (UTC) Subject: [Python-ideas] Deprecating rarely used str methods References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: Andrew Barnert writes: > > On Aug 11, 2013, at 23:10, Benjamin Peterson wrote: > > > Andrew Barnert ...> writes: > > > >> > >> On Aug 11, 2013, at 20:32, Benjamin Peterson python.org> wrote: > >> > >>> Serhiy Storchaka ...> writes: > >>> > >>>> str.swapcase() is just not needed. > >>> > >>> I would quite love to get rid of this method, since it's basically useless > >>> and wrong on non-ASCII strings. > >> > >> No it isn't. > > > > I realize it "handles" non-ASCII characters, but I claim there is no > > sensical behavior. > > So you want Python to not follow Unicode in general, or just in case mapping, or just in this one function > (leaving upper, lower, etc. alone)? > > Out of curiosity, what language do you use that has no sensible behavior? Most scripts either have sensible > case rules, or just don't have cases (so the function is an obvious no-op). I know some people think Unicode > chose the _wrong_ rule for their script (e.g., the Turkish i mentioned earlier--even if it is what most > Turkish computer users wanted, there are purists who insist it should work properly, or that Turkish > dotted i and dotless I should be separate characters from their Latin equivalents). But that's not the > same as saying there _are_ no good rules for their script. 
In the presence of things like title case, "swapping" case isn't always an operation that can make sense. From abarnert at yahoo.com Mon Aug 12 09:20:33 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 12 Aug 2013 00:20:33 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: On Aug 11, 2013, at 23:54, Benjamin Peterson wrote: > Andrew Barnert writes: > >> >> On Aug 11, 2013, at 23:10, Benjamin Peterson wrote: >> >>> Andrew Barnert ...> writes: >>> >>>> >>>> On Aug 11, 2013, at 20:32, Benjamin Peterson python.org> > wrote: >>>> >>>>> Serhiy Storchaka ...> writes: >>>>> >>>>>> str.swapcase() is just not needed. >>>>> >>>>> I would quite love to get rid of this method, since it's basically useless >>>>> and wrong on non-ASCII strings. >>>> >>>> No it isn't. >>> >>> I realize it "handles" non-ASCII characters, but I claim there is no >>> sensical behavior. >> >> So you want Python to not follow Unicode in general, or just in case > mapping, or just in this one function >> (leaving upper, lower, etc. alone)? >> >> Out of curiosity, what language do you use that has no sensible behavior? > Most scripts either have sensible >> case rules, or just don't have cases (so the function is an obvious > no-op). I know some people think Unicode >> chose the _wrong_ rule for their script (e.g., the Turkish i mentioned > earlier--even if it is what most >> Turkish computer users wanted, there are purists who insist it should work > properly, or that Turkish >> dotted i and dotless I should be separate characters from their Latin > equivalents). But that's not the >> same as saying there _are_ no good rules for their script. > > In the presence of things like title case, "swapping" case isn't always an > operation that can make sense. As far as I know, there aren't any titlecase characters. 
The titlecase function From abarnert at yahoo.com Mon Aug 12 09:27:51 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 12 Aug 2013 00:27:51 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: Sorry about the incomplete message. Retrying... On Aug 11, 2013, at 23:54, Benjamin Peterson wrote: > Andrew Barnert writes: > >> >> On Aug 11, 2013, at 23:10, Benjamin Peterson wrote: >> >>> Andrew Barnert ...> writes: >>> >>>> >>>> On Aug 11, 2013, at 20:32, Benjamin Peterson python.org> > wrote: >>>> >>>>> Serhiy Storchaka ...> writes: >>>>> >>>>>> str.swapcase() is just not needed. >>>>> >>>>> I would quite love to get rid of this method, since it's basically useless >>>>> and wrong on non-ASCII strings. >>>> >>>> No it isn't. >>> >>> I realize it "handles" non-ASCII characters, but I claim there is no >>> sensical behavior. >> >> So you want Python to not follow Unicode in general, or just in case > mapping, or just in this one function >> (leaving upper, lower, etc. alone)? >> >> Out of curiosity, what language do you use that has no sensible behavior? > Most scripts either have sensible >> case rules, or just don't have cases (so the function is an obvious > no-op). I know some people think Unicode >> chose the _wrong_ rule for their script (e.g., the Turkish i mentioned > earlier--even if it is what most >> Turkish computer users wanted, there are purists who insist it should work > properly, or that Turkish >> dotted i and dotless I should be separate characters from their Latin > equivalents). But that's not the >> same as saying there _are_ no good rules for their script. > > In the presence of things like title case, "swapping" case isn't always an > operation that can make sense. There aren't any title case characters. Title casing turns each character into one or more uppercase or lowercase characters. And swapcase will do exactly what you'd expect. 
For example, 'ß'.title() == 'Ss' and it's obvious how that should respond to swapcase. That's a silly example, but hopefully more familiar to most people than Armenian or Church Slavonic; the point is that the same applies to all characters that either turn into two characters, from ligatures like fi to characters that have no precomposed capital form. Are there any counterexamples I'm missing? From abarnert at yahoo.com Mon Aug 12 09:38:20 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 12 Aug 2013 00:38:20 -0700 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: <54381AD1-AEBC-48FB-A9FC-40CF002315BE@yahoo.com> Sorry, missed one last edit. Last one, I promise. On Aug 12, 2013, at 0:27, Andrew Barnert wrote: > Sorry about the incomplete message. Retrying... > > On Aug 11, 2013, at 23:54, Benjamin Peterson wrote: > >> Andrew Barnert writes: >> >>> >>> On Aug 11, 2013, at 23:10, Benjamin Peterson wrote: >>> >>>> Andrew Barnert ...> writes: >>>> >>>>> >>>>> On Aug 11, 2013, at 20:32, Benjamin Peterson python.org> >> wrote: >>>>> >>>>>> Serhiy Storchaka ...> writes: >>>>>> >>>>>>> str.swapcase() is just not needed. >>>>>> >>>>>> I would quite love to get rid of this method, since it's basically useless >>>>>> and wrong on non-ASCII strings. >>>>> >>>>> No it isn't. >>>> >>>> I realize it "handles" non-ASCII characters, but I claim there is no >>>> sensical behavior. >>> >>> So you want Python to not follow Unicode in general, or just in case >> mapping, or just in this one function >>> (leaving upper, lower, etc. alone)? >>> >>> Out of curiosity, what language do you use that has no sensible behavior? >> Most scripts either have sensible >>> case rules, or just don't have cases (so the function is an obvious >> no-op). 
I know some people think Unicode >>> chose the _wrong_ rule for their script (e.g., the Turkish i mentioned >> earlier--even if it is what most >>> Turkish computer users wanted, there are purists who insist it should work >> properly, or that Turkish >>> dotted i and dotless I should be separate characters from their Latin >> equivalents). But that's not the >>> same as saying there _are_ no good rules for their script. >> >> In the presence of things like title case, "swapping" case isn't always an >> operation that can make sense. > > There aren't any title case characters ... that don't have a compatible two-character form. For example, U+01C8 can be interpreted as Lj, which can be swapcased. However, arguably it shouldn't be, because the docs explicitly say that it converts upper to lower and vice versa, which means Lt characters should be left alone just as numbers and punctuation are. So you could argue that it's ambiguous, but you can't argue that nothing makes sense; it's just a choice between two different things that both make sense. Otherwise... > Title casing turns each character into one or more uppercase or lowercase characters. And swapcase will do exactly what you'd expect. > > For example, 'ß'.title() == 'Ss' and it's obvious how that should respond to swapcase. That's a silly example, but hopefully more familiar to most people than Armenian or Church Slavonic; the point is that the same applies to all characters that either turn into two characters, from ligatures like fi to characters that have no precomposed capital form. > > Are there any counterexamples I'm missing? 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From steve at pearwood.info Mon Aug 12 10:07:48 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 12 Aug 2013 18:07:48 +1000 Subject: [Python-ideas] Deprecating rarely used str methods In-Reply-To: References: <0F07186B-F077-4A81-AA91-ECD0D083A1F2@yahoo.com> Message-ID: <20130812080747.GA31296@ando> On Mon, Aug 12, 2013 at 12:27:51AM -0700, Andrew Barnert wrote: > There aren't any title case characters. Title casing turns each > character into one or more uppercase or lowercase characters. And > swapcase will do exactly what you'd expect. Actually, there are titlecase characters: http://www.fileformat.info/info/unicode/category/Lt/list.htm http://unicode.org/faq/casemap_charprop.html I think it is reasonable to remove str.swapcase from Python 4000, if and when it exists, provided it is allowed to break backwards compatibility (which is *not* a given). When there is a planned schedule for Python 4000, then it is time to talk about deprecating str.swapcase. Until then, it's a wart that exists for backwards compatibility, but the pain in removing it far outweighs the pain in merely ignoring it. We the Python community are halfway through a ten-year, painful, backwards-incompatible migration from 2.x to 3.x. To justify further backwards-incompatible changes in the 3.x series will require a much stronger rationale than just "I don't know anyone who uses this method". 
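Both points in this exchange can be checked from the interpreter. The sketch below is illustrative only: it assumes Python 3.3 or later, where str case methods use the full Unicode case mappings.

```python
import unicodedata

# Titlecase characters do exist: U+01C8 has general category "Lt".
print(unicodedata.category('\u01C8'))  # Lt
print(unicodedata.name('\u01C8'))      # LATIN CAPITAL LETTER L WITH SMALL LETTER J

# str.swapcase() is well defined on non-ASCII text via full case mapping.
print('\u0391\u0392\u0393'.swapcase())  # Greek uppercase -> lowercase
print('\u00df'.swapcase())              # German sharp s expands to SS
print('\u00df'.title())                 # Ss
```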
-- Steven From solipsis at pitrou.net Mon Aug 12 10:31:10 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 12 Aug 2013 10:31:10 +0200 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki (was: Frequently Rejected Ideas Was: Deprecating rarely used str methods) References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: <20130812103110.4155d3ea@pitrou.net> Le Mon, 12 Aug 2013 09:19:21 +1000, Ben Finney a écrit : > > > The rights explicitly specified in the CA actually constitute > > *restrictions* on the PSF compared to the rights granted by the > > licenses themselves. > > The contributor agreement grants to PSF the unilateral power to > redistribute the contribution under “any other open source license > approved by [the PSF]”, a power not granted to other recipients of the > contribution. So yes, it arrogates special rights to the PSF. > > Does this make the PSF awful? No, of course not. But I can't pretend > it is acceptable to grant special terms to one party in the community. Ben, I respect your distrust of the CLA's terms (or of CLAs in general), but does that mean you wouldn't contribute the python-daemon implementation under a CLA? AFAICT it has no chance of landing in the official source tree if you aren't willing to sign a CLA for it. > Not true, at least in my experience. I have been asked to submit a > contributor agreement for small patches to the documentation. Since I > cannot in good conscience accept the PSF's requirements, they reject > such contributions even under an acceptable all-parties-equal license. I suppose that depends on whoever reviews your patch :-) Regards Antoine. From flying-sheep at web.de Mon Aug 12 15:42:23 2013 From: flying-sheep at web.de (Philipp A.) 
Date: Mon, 12 Aug 2013 15:42:23 +0200 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <1e485939-9606-4113-b208-57591afa4782@email.android.com> References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> Message-ID: well, the only remotely valid thing to do is to test if the input data is decodable with any of the encodings python knows. if we do some arbitrary threshold, we only get bugs like fedora’s current release name “Schrödinger's Cat” being considered “not text”. i’d never write code like this . PS: why do people still convert stuff to float? we live in python3 world, where 1/2 is 0.5 -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Mon Aug 12 15:52:56 2013 From: masklinn at masklinn.net (Masklinn) Date: Mon, 12 Aug 2013 15:52:56 +0200 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> Message-ID: <34A9E3DE-EEE6-405C-8853-DE4335BDEF92@masklinn.net> On 2013-08-12, at 15:42 , Philipp A. wrote: > well, the only remotely valid thing to do is to test if the input data is > decodable with any of the encodings python knows. Most iso-8859 parts can decode any byte (and thus any byte sequence). Parts 3, 6, 7, 8 and 11 are the only ones not to be defined across all of the [128, 255] range (they're ascii extensions so the [0, 127] range is identical to ascii in all iso-8859 parts) From solipsis at pitrou.net Mon Aug 12 16:01:21 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 12 Aug 2013 16:01:21 +0200 Subject: [Python-ideas] os.path.isbinary References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> Message-ID: <20130812160121.0586e539@pitrou.net> Le Mon, 12 Aug 2013 15:42:23 +0200, "Philipp A." 
a écrit : > well, the only remotely valid thing to do is to test if the input > data is decodable with any of the encodings python knows. > > if we do some arbitrary threshold, we only get bugs like fedora’s > current release name “Schrödinger's Cat” being considered “not text”. > i’d never write code like > this Truly horrible :-( From grosser.meister.morti at gmx.net Mon Aug 12 16:32:49 2013 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Mon, 12 Aug 2013 16:32:49 +0200 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <34A9E3DE-EEE6-405C-8853-DE4335BDEF92@masklinn.net> References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> <34A9E3DE-EEE6-405C-8853-DE4335BDEF92@masklinn.net> Message-ID: <5208F211.8060703@gmx.net> On 08/12/2013 03:52 PM, Masklinn wrote: > On 2013-08-12, at 15:42 , Philipp A. wrote: >> well, the only remotely valid thing to do is to test if the input data is >> decodable with any of the encodings python knows. > > Most iso-8859 parts can decode any byte (and thus any byte sequence). > Are you sure about the null byte? '\0' But yes, just looking if there is a '\0' in the file isn't a good heuristic either. > Parts 3, 6, 7, 8 and 11 are the only ones not to be defined across all > of the [128, 255] range (they're ascii extensions so the [0, 127] range > is identical to ascii in all iso-8859 parts) From random832 at fastmail.us Mon Aug 12 16:46:48 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 12 Aug 2013 10:46:48 -0400 Subject: [Python-ideas] Contributions to official documentation versus contributions to wiki In-Reply-To: References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> Message-ID: <1376318808.31698.8866255.25EC33E4@webmail.messagingengine.com> On Sun, Aug 11, 2013, at 23:22, Terry Reedy wrote: > 2. 
A grant of rights in the contribution to PSF only grants those rights > to the PSF. WOW. It cannot be otherwise. But since the grant is > explicitly not exclusive, the copyright holder is free to grant the same > rights in the contributed work to everyone else in the world. Do you see how granting everyone the right to change to any open source license approved by a unanimous vote of the PSF board really isn't in the spirit of considering everyone equal? Or are you proposing people should grant everyone the right to change to any open source license they choose? What's the point of having licenses at all, then? > 3. The PSF is the copyright holder of the *collective* work and to that > extent, it must, as a practical matter, have 'special rights', just as > you have special rights to the words you write. Er, no. To the extent that it "must" be special it is because it holds the resources used for distribution. Giving the PSF rights that would not be given to, say, someone else seeking to make a fork of python is not necessary in the way you are suggesting it is. From random832 at fastmail.us Mon Aug 12 16:59:18 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 12 Aug 2013 10:59:18 -0400 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <5208F211.8060703@gmx.net> References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> <34A9E3DE-EEE6-405C-8853-DE4335BDEF92@masklinn.net> <5208F211.8060703@gmx.net> Message-ID: <1376319558.2510.8871303.2E44C658@webmail.messagingengine.com> On Mon, Aug 12, 2013, at 10:32, Mathias Panzenböck wrote: > On 08/12/2013 03:52 PM, Masklinn wrote: > > On 2013-08-12, at 15:42 , Philipp A. wrote: > >> well, the only remotely valid thing to do is to test if the input data is > >> decodable with any of the encodings python knows. > > > > Most iso-8859 parts can decode any byte (and thus any byte sequence). > > > > Are you sure about the null byte? 
'\0' > But yes, just looking if there is a '\0' in the file isn't a good > heuristic either. It depends on precisely what is meant by "iso-8859 parts" - and the same with any other character in 0-32 or 127-159 (there is nothing special about the null byte in this regard). But it's typical to think of "iso-8859" encodings as being more like IANA ISO-8859-1, which combines ISO/IEC 8859-1 with the control character definitions from ISO 6429. From rosuav at gmail.com Mon Aug 12 16:59:36 2013 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 12 Aug 2013 15:59:36 +0100 Subject: [Python-ideas] os.path.isbinary In-Reply-To: <5208F211.8060703@gmx.net> References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> <34A9E3DE-EEE6-405C-8853-DE4335BDEF92@masklinn.net> <5208F211.8060703@gmx.net> Message-ID: On Mon, Aug 12, 2013 at 3:32 PM, Mathias Panzenböck wrote: > On 08/12/2013 03:52 PM, Masklinn wrote: >> >> On 2013-08-12, at 15:42 , Philipp A. wrote: >>> >>> well, the only remotely valid thing to do is to test if the input data is >>> decodable with any of the encodings python knows. >> >> >> Most iso-8859 parts can decode any byte (and thus any byte sequence). >> > > Are you sure about the null byte? '\0' > But yes, just looking if there is a '\0' in the file isn't a good heuristic > either. I've often used the presence of a NUL in the data as a simple heuristic for "binary file", though only in places where it won't matter (for instance, showing file size in bytes rather than line count - if a binary file happens to have no \0 and its number of \n gets counted, big deal). Otherwise, not worth the hassle of finding out. 
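The NUL-byte heuristic described here fits in a couple of lines. `looks_binary` below is a hypothetical name for illustration, not a proposed `os.path` API, and it inherits all the caveats from this thread (a binary file with no zero bytes in its first block will be misclassified):

```python
def looks_binary(data, blocksize=8192):
    """Rough heuristic: treat data as binary if its first block
    contains a NUL byte.  Cheap, and fine for uses where an
    occasional wrong answer is harmless."""
    return b'\0' in data[:blocksize]

print(looks_binary(b'hello world\n'))        # False: plain text
print(looks_binary(b'\x7fELF\x01\x01\x00'))  # True: ELF header contains NUL
```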
ChrisA From random832 at fastmail.us Mon Aug 12 17:43:18 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 12 Aug 2013 11:43:18 -0400 Subject: [Python-ideas] Error messages for shared libraries for other platform In-Reply-To: References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: <1376322198.17752.8890375.39292741@webmail.messagingengine.com> On Mon, Aug 5, 2013, at 2:01, Ryan wrote: > if ex.string == '%1 is not...: ex.winerror == 193, surely. The message probably differentiates by locale, and doing this kind of thing is fragile. For your POSIX case, it looks like the underlying error (likely errno == ENOEXEC) is being caught and wrapped up in an ImportError at some lower level, which apparently isn't being done on windows for whatever reason. So maybe this code should A) have an analogous version written on windows and B) have a more clear error message It's possible that either or both of these cases can't distinguish between attempting to load a library of the wrong architecture, and simply loading a file that is corrupt or simply not a DLL/.so file at all. Have you tried doing so? It _might_ still be a common enough case to be worthwhile to go fishing for evidence that it's the case after getting a generic error that could be that or could be something else, but that's different from just unconditionally rewriting the error message. 
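[Editor's sketch of the probing random832 describes, loading a shared library via ctypes and surfacing whatever error the platform loader reports. `try_load` is a hypothetical helper; the error text is platform-supplied, which is exactly why distinguishing "wrong architecture" (ENOEXEC on POSIX, winerror 193 on Windows) from "not a library at all" is as messy as the thread suggests:]

```python
import ctypes

def try_load(path):
    """Attempt to load a shared library, returning None on failure.

    ctypes raises OSError when dlopen()/LoadLibrary() rejects the
    file; the message is whatever dlerror()/FormatMessage supplies,
    so it differs by platform and locale.
    """
    try:
        return ctypes.CDLL(path)
    except OSError as exc:
        print("load failed:", exc)
        return None
```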
From random832 at fastmail.us Mon Aug 12 18:10:28 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 12 Aug 2013 12:10:28 -0400 Subject: [Python-ideas] Error messages on Windows In-Reply-To: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> References: <4011a761-f650-4e1f-a634-2a8054d09834@email.android.com> Message-ID: <1376323828.26410.8896751.6B1675FC@webmail.messagingengine.com> On Mon, Aug 5, 2013, at 1:07, Ryan wrote: > Here are my experiences in accidentally getting a .so/.dll file for the > wrong chip/CPU type: > > Windows: > > %1 is not a valid Win32 application There's got to be some way to fill in %1 with the name of the dll. But a look through the list of all Windows error messages on my system shows a bunch with %values that the caller could not possibly know. From rymg19 at gmail.com Mon Aug 12 18:55:06 2013 From: rymg19 at gmail.com (Ryan) Date: Mon, 12 Aug 2013 11:55:06 -0500 Subject: [Python-ideas] os.path.isbinary In-Reply-To: References: <90b4885a-3858-410d-821f-0c1dab48b420@email.android.com> <1e485939-9606-4113-b208-57591afa4782@email.android.com> Message-ID: <9ac2e7e7-65ae-412d-adea-ba489cc36383@email.android.com> I wrote it on SL4A Python 2.6. That explains the implicit float conversion. SL4A's Python 3 takes up too much of my data partition. "Philipp A." wrote: >well, the only remotely valid thing to do is to test if the input data >is >decodable with any of the encodings python knows. > >if we do some arbitrary threshold, we only get bugs like Fedora's >current >release name "Schrödinger's Cat" being considered "not text". I'd never >write code like >this >. > >PS: why do people still convert stuff to float? we live in python3 >world, >where 1/2 is 0.5 -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kim.grasman at gmail.com Tue Aug 13 13:17:10 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Tue, 13 Aug 2013 13:17:10 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows Message-ID: Hi all, I posted this bug a while back, but haven't had any feedback on it, so I figured I'd open discussion here: http://bugs.python.org/issue18314 I want to have os.unlink recognize NTFS junction points (a prequel to proper symbolic links) and remove them transparently. Currently, os.unlink on a path to a junction point will fail with a WindowsError and the error code 5, access denied. Does this sound controversial, or would anybody be interested in reviewing a patch to this effect? Thanks, - Kim From kristjan at ccpgames.com Tue Aug 13 13:55:44 2013 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Tue, 13 Aug 2013 11:55:44 +0000 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> References: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> Message-ID: > -----Original Message----- > From: Ronald Oussoren [mailto:ronaldoussoren at mac.com] > > Skipping the body makes it possible to introduce flow control in a context > manager, which could make it harder to read code. That is, currently the > body is either executed unless an exception is raised that raises an > exception, while with your proposal the body might be skipped entirely. I > haven't formed an opinion yet on whether or not that is a problem, but PEP > 343 (introduction of the with statement) references > > which claims that hiding flow control in macros can be bad. > Perhaps my example was not clear enough. The body will be skipped entirely if the __enter__() method raises an exception. An outer context manager can then suppress this exception. 
However, you cannot create a single context manager that does both. This, to me, was the most serious problem with the old nested() context manager: nesting of two context managers could not be correctly done for arbitrary context managers.

I am not advocating that people create such context managers, but pointing out that this omission means that you cannot always combine two context managers into one and preserve the semantics, because it is possible to do something with two nested context managers that you cannot achieve with a single one. And this is my completeness argument. Imagine if there were code that could only be written using two nested functions, and that the nested function could not be folded into the outer one without sacrificing semantic correctness? I.e., you could do:

foo = bar(spam(value))

but this:

def ham(value):
    return bar(spam(value))

foo = ham(value)

would not be correct for all possible "bar" and "spam"?

As an example, here is what you can currently do in Python (again, not recommending it but providing as an example):

class ContextManagerExit(Exception):
    pass

@contextmanager
def if_a():
    try:
        yield
    except ContextManagerExit:
        pass

@contextmanager
def if_b(condition):
    if not condition:
        raise ContextManagerExit
    yield

with if_a(), if_b(condition):
    execute_code()  # this line is executed only if "condition" is True

In current Python, it is impossible to create a combined context manager that does this:

if_c = nested(if_a(), if_b(condition))

with if_c:
    execute_code()  # A single context manager cannot both raise an exception from __enter__() _and_ have it suppressed.

With my patch, "if_b()" is all that is needed, because ContextManagerExit is a special exception that is caught and suppressed by the interpreter. And so, it becomes possible to nest arbitrary context managers without sacrificing semantic correctness.

> The overhead is fairly high, although not enough that I'd worry a lot in my code.
But then again, I don't build gaming backends or cloud-scale software :-) ? That said, using None as the "do nothing" context manager is problematic as this could hide problems in users code. I'm sure that I'm not the only person that sometimes forgets to actually return a value from a function. Using that (missing, and hence implicitly None) return value as a context manager currently causes an exception that clearly points out a bug in my code and could silently do the wrong thing with this proposal, e.g: ? def get_lock(obj): ? lck = LockManager.get_lock_for_object(obj) ? # oops forgot to return lck ? with get_lock(42): ? ... ? Using a different singleton instead of None would avoid this drawback. I think something like this is important for context managers that are used as "diagnostic" tools, e.g. timing related context managers that may be disabled in production without changing code. But I agree that None is problematic for the reason you demonstrate, and did consider that. I'm suggesting it here as a demonstration of the concept, and also to reduce the need for yet another built-in. Perhaps the ellipsis could be used, that's everyone's favourite singleton :) K -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Aug 13 16:48:22 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 13 Aug 2013 17:48:22 +0300 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: 07.08.13 17:23, Kristj?n Valur J?nsson ???????(??): > 2) The mechanism used in implementing ContextManagerExit above is easily extended to allowing a special context manager: None. This is useful for having _optional_ context managers. E.g. 
code like this: > with performance_timer(): > do_work() > > def performance_timer(): > if profiling: > return accumulator > return None > > None becomes the trivial context manager and its __enter__ and __exit__ calls are skipped, along with their overhead. +1 to this idea. Of course ExitStack is powerful tool but it is too verbose for many simple cases. Consider simple example: file = open(...) if ... else None file2 = open(...) if ... else None process(file, file2, ...) if file is not None: file.close() if file2 is not None: file2.close() This code is not exception-safe. There are some ways to write it right. 1. With try/finally and a check in finally block: file = open(...) if ... else None try: file2 = open(...) if ... else None try: process(file, file2, ...) finally: if file2 is not None: file2.close() finally: if file is not None: file.close() 2. With a code duplication (note that process can be not a one-line call of a function): if ...: with open(...) as file: if ...: with open(...) as file2: process(file, file2, ...) else: process(file, None, ...) else: if ...: with open(...) as file2: process(None, file2, ...) else: process(None, None, ...) 3. With ExitStack: import contextlib with contextlib.ExitStack() as cm: if ...: file = open(...) cm.enter_context(file) else: file = None if ...: file2 = open(...) cm.enter_context(file2) else: file2 = None process(file, file2, ...) And when the with statement will support None as a "null-manager": file = open(...) if ... else None with file: file2 = open(...) if ... else None with file2: process(file, file2, ...) From masklinn at masklinn.net Tue Aug 13 17:04:24 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 13 Aug 2013 17:04:24 +0200 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: <3036140F-62AD-41C6-BF3F-5C3F56BB0EE2@masklinn.net> On 2013-08-13, at 16:48 , Serhiy Storchaka wrote: > > 3. 
With ExitStack: > > import contextlib > with contextlib.ExitStack() as cm: > if ...: > file = open(...) > cm.enter_context(file) > else: > file = None > if ...: > file2 = open(...) > cm.enter_context(file2) > else: > file2 = None > process(file, file2, ?) Please note that enter_context will return the result of object.__enter__, so: with contextlib.ExitStack() as cm: file = cm.enter_context(open(path1)) if condition1 else None file2 = cm.enter_context(open(path2)) if condition2 else None process(file, file2, ...) or if you really like conditional statements: with contextlib.ExitStack() as cm file = None file2 = None if condition1: file = cm.enter_context(open(path1)) if condition2: file2 = cm.enter_context(open(path2)) process(file, file2, ...) From masklinn at masklinn.net Tue Aug 13 17:29:38 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 13 Aug 2013 17:29:38 +0200 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: Furthermore: On 2013-08-13, at 16:48 , Serhiy Storchaka wrote: > > And when the with statement will support None as a "null-manager": I think that is a terrible idea, and will be a source of woe and bugs more than a solution to anything. If you want an optional context manager, why not create an OptionalContextManager(object | None) which delegates to the underlying non-None CM? That's simple, that's clear, that's explicit, and that's not a hack. 
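[Editor's note: a minimal version of the OptionalContextManager Masklinn describes might look like the sketch below. The class name is his; the implementation is an illustration, not anything in the stdlib:]

```python
class OptionalContextManager:
    """Wrap a context manager or None; with None, enter/exit are no-ops."""

    def __init__(self, cm):
        self._cm = cm

    def __enter__(self):
        # Hand back whatever the wrapped CM's __enter__ returns,
        # or None when there is nothing to manage.
        return self._cm.__enter__() if self._cm is not None else None

    def __exit__(self, *exc_info):
        if self._cm is not None:
            return self._cm.__exit__(*exc_info)
        return False  # never suppress exceptions in the None case
```

Usage matches Serhiy's optional-file pattern: `with OptionalContextManager(open(path) if cond else None) as f:` gives `f` as either the file object or None, so the body can test `f is not None` exactly as before.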
From ethan at stoneleaf.us Tue Aug 13 16:42:53 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 13 Aug 2013 07:42:53 -0700 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: References: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> Message-ID: <520A45ED.7010902@stoneleaf.us> On 08/13/2013 04:55 AM, Kristj?n Valur J?nsson wrote: > > But I agree that None is problematic for the reason you demonstrate, and did consider that. I?m suggesting it here as a > demonstration of the concept, and also to reduce the need for yet another built-in. Perhaps the ellipsis could be used, > that?s everyone?s favourite singleton :) Heh. How about NotImplemented? -- ~Ethan~ From solipsis at pitrou.net Tue Aug 13 17:32:47 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 13 Aug 2013 17:32:47 +0200 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) References: Message-ID: <20130813173247.7f68f476@pitrou.net> Le Tue, 13 Aug 2013 17:29:38 +0200, Masklinn a ?crit : > Furthermore: > > On 2013-08-13, at 16:48 , Serhiy Storchaka wrote: > > > > And when the with statement will support None as a "null-manager": > > I think that is a terrible idea, and will be a source of woe and bugs > more than a solution to anything. I agree. I'd rather have a null_manager() in contextlib, designed expressly for this purpose. Regards Antoine. From ncoghlan at gmail.com Tue Aug 13 17:52:20 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 13 Aug 2013 11:52:20 -0400 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: References: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> Message-ID: On 13 Aug 2013 08:27, "Kristj?n Valur J?nsson" wrote: > Perhaps my example was not clear enough. 
The body will be skipped entirely if the __enter__() method raises an exception. An outer context manager can then suppress this exception. > > However, you cannot create a single context manager that does both. This, to me, was the most serious problem with the old nested() context manager: Nesting of two context managers could not be correctly done for arbitrary context managers nested() was deprecated and removed because it didn't handle files (or any other CM that does resource acquisition in __init__) correctly. The fact you can't factor out arbitrary context managers had nothing to do with it. > I am not advocating that people do create such context manager, but pointing out that this omission means that you cannot always combine two context managers into one and preserve the semantics because it is possible to do something with two nested context managers that you cannot achieve with a single one. And this is my completeness argument. At the moment, a CM cannot prevent execution of the body - it must be paired with an if statement or an inner call that may raise an exception, keeping the flow control at least somewhat visible at the point of execution. The following is also an illegal context manager: @contextmanager def bad_cm(broken=False): if not broken: yield It's illegal for exactly the same reason this is illegal (this is the expanded form of your nested CM example): @contextmanager def bad_cm2(broken=False): class Skip(Exception): pass try: if broken: raise Skip yield except Skip: pass > Imagine if there were code that could only be written using two nested functions, and that the nested function could not be folded into the outer one without sacrificing semantic correctness? I.e., you could do: > > foo = bar(spam(value)) > > but this: > > def ham(value): > return bar(spam(value)) > > foo = ham(value) > > would not be correct for all possible ?bar? and ?spam?? But that's not what you're asking about. 
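[Editor's note: "illegal" here is enforced at runtime — contextlib's machinery raises RuntimeError from __enter__ when a @contextmanager generator finishes without yielding. A quick demonstration of Nick's bad_cm:]

```python
from contextlib import contextmanager

@contextmanager
def bad_cm(broken=False):
    # With broken=True the generator returns without ever yielding;
    # contextlib reports that as an error when the with statement
    # calls __enter__.
    if not broken:
        yield

with bad_cm():  # the well-behaved path works normally
    pass

try:
    with bad_cm(broken=True):
        print("body never runs")
except RuntimeError as exc:
    print(exc)  # generator didn't yield
```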
You're asking for the ability to collapse two independent try statements into one. There are already things you can't factor out as functions - that's why we have generators and context managers. It's also a fact that there are things you can't factor out as single context managers. This is why we have nested context managers and also still have explicit try/except/else/finally statements. > As an example, here is what you can currently do in python (again, not recommending it but providing as an example) > > class ContextManagerExit(): pass > > @contextmanager > def if_a(): > try: > yield > except ContextManagerExit: > pass > > @contextmanager > def if_b(condition): > if not condition: > raise ContextManagerExit > yield > > with if_a(), if_b(condition): > execute_code() #this line is executed only if ?condition? is True Expand it out to the underlying constructs and you will see this code is outright buggy, because the exception handler is too broad: try: if not condition: raise ContextManagerExit execute_code() # ContextManagerExit will be eaten here except ContextManagerExit: pass > In current python, it is impossible to create a combined context manager that does this: > > if_c = nested(if_a(), if_b(condition)) > > with if_c: > execute_code() #A single context manager cannot both raise an exception from __enter__() _and_ have it supressed. This is a feature, not a bug: the with statement body will *always* execute, unless __enter__ raises an exception. Don't be misled by the ability to avoid repeating the with keyword when specifying multiple context managers in the same statement: semantically, that's equivalent to multiple nested with statements, so the outer one always executes, and the inner ones can only skip the body by raising an exception from __enter__. > With my patch, ?if_b()? is all that is needed, because ContextManagerExit is a special exception that is caught and suppressed by the interpreter. 
And so, it becomes possible to nest arbitrary context managers without sacrificing semantic correctness. I'd be open to adding the following context manager to contextlib: @contextmanager def skip(keep=False): class Skip(Exception): pass Skip.caught = None try: yield Skip except Skip as exc: if keep: Skip.caught = exc This would allow certain currently awkward constructs to be expressed more easily without needing to drop back to a try/except block (note that 3.4 already adds contextlib.ignored to easily suppress selected exceptions in a block of code). Creating a custom exception type each time helps avoid various problems with overly broad exception handlers. > But I agree that None is problematic for the reason you demonstrate, and did consider that. I?m suggesting it here as a demonstration of the concept, and also to reduce the need for yet another built-in. Perhaps the ellipsis could be used, that?s everyone?s favourite singleton :) An empty contextlib.ExitStack() instance is already a perfectly serviceable "do nothing" context manager, we don't need (and won't get) another one. Cheers, Nick. From tjreedy at udel.edu Tue Aug 13 19:45:48 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 13 Aug 2013 13:45:48 -0400 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: On 8/13/2013 10:48 AM, Serhiy Storchaka wrote: >> None becomes the trivial context manager and its __enter__ and >> __exit__ calls are skipped, along with their overhead. > And when the with statement will support None as a "null-manager": As I understand the proposal, it is not to actually make None a context manager with __enter__ and __exit__ methods (bad), but to make it a signal to not do the normal calls. Such a use of None as a 'skip' signal is similar to its use as function argument. 
While such use is fine when None is a default argument (the normal case), its use in filter for a required argument is problematical (see recent thread here). I think using None as a signal object here would engender the same confusion here. "How can None be a context manager if it does not have the ... methods?" Given the alternative of an empty stack and a possible addition to contextlib, I do not think we should go for this generalization of None usage. -- Terry Jan Reedy From storchaka at gmail.com Tue Aug 13 21:15:30 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 13 Aug 2013 22:15:30 +0300 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: <3036140F-62AD-41C6-BF3F-5C3F56BB0EE2@masklinn.net> References: <3036140F-62AD-41C6-BF3F-5C3F56BB0EE2@masklinn.net> Message-ID: 13.08.13 18:04, Masklinn ???????(??): > Please note that enter_context will return the result of > object.__enter__, so: > > with contextlib.ExitStack() as cm: > file = cm.enter_context(open(path1)) if condition1 else None > file2 = cm.enter_context(open(path2)) if condition2 else None > process(file, file2, ...) Well, it works. However there are two disadvantages: 1. You need import contextlib. 2. cm.enter_context() increases a length of already long lines so you need split lines. N.B. Currently ExitStack used only in two places in stdlib (and I am the originator of both changes). In most cases it is simpler to do without ExitStack. From storchaka at gmail.com Tue Aug 13 21:19:05 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 13 Aug 2013 22:19:05 +0300 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: 13.08.13 18:29, Masklinn ???????(??): > If you want an optional context manager, why not create an > OptionalContextManager(object | None) which delegates to the underlying > non-None CM? 
That's simple, that's clear, that's explicit, and that's > not a hack. Because OptionalContextManager doesn't support an interface of the underlying object (e.g. doesn't have the read() method). And OptionalContextManager(None) is not None. From storchaka at gmail.com Tue Aug 13 21:23:59 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 13 Aug 2013 22:23:59 +0300 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: 13.08.13 20:45, Terry Reedy ???????(??): > As I understand the proposal, it is not to actually make None a context > manager with __enter__ and __exit__ methods (bad), Why? > but to make it a > signal to not do the normal calls. I agree this is a bad idea. From masklinn at masklinn.net Tue Aug 13 21:30:52 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 13 Aug 2013 21:30:52 +0200 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: <70FBE35C-7BC4-4690-88ED-4A30A6A6C916@masklinn.net> On 2013-08-13, at 21:19 , Serhiy Storchaka wrote: > 13.08.13 18:29, Masklinn ???????(??): >> If you want an optional context manager, why not create an >> OptionalContextManager(object | None) which delegates to the underlying >> non-None CM? That's simple, that's clear, that's explicit, and that's >> not a hack. > > Because OptionalContextManager doesn't support an interface of the underlying object (e.g. doesn't have the read() method). And OptionalContextManager(None) is not None. Irrelevant, OptionalContextManager's __enter__ returns either None or whatever the underlying contextmanager's __enter__ yields. So the true underlying object (or None) is available in the `with` body. OCM is solely there to provide a NOOP context manager for a None input. 
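[Editor's note: for the "optional diagnostic" use case Kristján raised (performance_timer), an already-available do-nothing context manager is an empty ExitStack, as Nick points out elsewhere in the thread — no wrapper class and no special-casing of None needed. A self-contained sketch; the `profiling` flag and `_timer` helper are illustrative:]

```python
import contextlib
import time

profiling = False  # assumption: some module-level switch

@contextlib.contextmanager
def _timer():
    start = time.perf_counter()
    yield
    print("elapsed:", time.perf_counter() - start)

def performance_timer():
    # When disabled, hand back an empty ExitStack: its __enter__ and
    # __exit__ do nothing, so the caller's "with" works unchanged.
    if profiling:
        return _timer()
    return contextlib.ExitStack()

with performance_timer():
    total = sum(range(1000))
```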
From storchaka at gmail.com Tue Aug 13 21:32:45 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 13 Aug 2013 22:32:45 +0300 Subject: [Python-ideas] with None (Was: Re: Enhanced context managers with ContextManagerExit and None) In-Reply-To: References: Message-ID: 13.08.13 22:23, Serhiy Storchaka ???????(??): > 13.08.13 20:45, Terry Reedy ???????(??): >> As I understand the proposal, it is not to actually make None a context >> manager with __enter__ and __exit__ methods (bad), > > Why? > >> but to make it a >> signal to not do the normal calls. > > I agree this is a bad idea. Sorry, I misunderstood you. Skipping the body of the with statement is a bad idea (because it converts with into other control construction). But skipping calls of __enter__() and __exit__() methods looks just as an optimization an implementation detail (at least for None). From kn0m0n3 at gmail.com Tue Aug 13 23:52:48 2013 From: kn0m0n3 at gmail.com (Jason Bursey) Date: Tue, 13 Aug 2013 16:52:48 -0500 Subject: [Python-ideas] cell phone python Message-ID: in python fsf.org paradigm for MetroPCS Kyocera "BrewOS" replacement... matlab/comsol SIM CITY oem on phone etc... have fitness center use levers and pulleys to turn turbine, reverse engineering fitness centers etc... -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Aug 13 23:59:57 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 13 Aug 2013 23:59:57 +0200 Subject: [Python-ideas] cell phone python References: Message-ID: <20130813235957.5a5c7231@fsol> On Tue, 13 Aug 2013 16:52:48 -0500 Jason Bursey wrote: > > matlab/comsol SIM CITY oem on phone etc... > > have fitness center use levers and pulleys to turn turbine, reverse > engineering fitness centers etc... +1. Regards Antoine. 
From ben+python at benfinney.id.au Wed Aug 14 05:12:10 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 14 Aug 2013 13:12:10 +1000 Subject: [Python-ideas] Contributing to Python core via an intermediary (was: Contributions to official documentation versus contributions to wiki) References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> <20130812103110.4155d3ea@pitrou.net> Message-ID: <7wtxitarud.fsf_-_@benfinney.id.au> Antoine Pitrou writes: > Ben, I respect your distrust of the CLA's terms (or of CLAs in > general), but does that mean you wouldn't contribute the python-daemon > implementation under a CLA? I would not sign such a document, correct. > AFAICT it has no chance of landing in the official source tree if you > aren't willing to sign a CLA for it. The work will be licensed to all recipients under the Apache License, and maintained as expected. Could one of the recipients be the person who makes the submission to the PSF for inclusion in Python core? -- \ ?Anyone who believes exponential growth can go on forever in a | `\ finite world is either a madman or an economist.? ?Kenneth | _o__) Boulding | Ben Finney From mal at egenix.com Wed Aug 14 11:03:36 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 14 Aug 2013 11:03:36 +0200 Subject: [Python-ideas] Contributing to Python core via an intermediary In-Reply-To: <7wtxitarud.fsf_-_@benfinney.id.au> References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> <20130812103110.4155d3ea@pitrou.net> <7wtxitarud.fsf_-_@benfinney.id.au> Message-ID: <520B47E8.3020501@egenix.com> On 14.08.2013 05:12, Ben Finney wrote: > Antoine Pitrou > writes: > >> Ben, I respect your distrust of the CLA's terms (or of CLAs in >> general), but does that mean you wouldn't contribute the python-daemon >> implementation under a CLA? > > I would not sign such a document, correct. 
> >> AFAICT it has no chance of landing in the official source tree if you >> aren't willing to sign a CLA for it. > > The work will be licensed to all recipients under the Apache License, > and maintained as expected. Could one of the recipients be the person > who makes the submission to the PSF for inclusion in Python core? Only the copyright holder can enter into the CLA with the PSF, since it grants additional rights that go beyond the initial license, namely that of being able to relicense the code under an open source license. The purpose of the CLA is to prevent further complicating the license details of the Python distribution and to create a complete "paper" trail for each contribution, which tracks the copyright, so that the PSF can defend the IP rights in the distribution. Note that this does not necessarily mean that all code going into the core has to be subject to a CLA. It is still possible to integrate code which has a license compatible with the PSF license, but in general, we'd like to avoid the extra work of having to check and verify such licenses. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Wed Aug 14 11:17:59 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 14 Aug 2013 11:17:59 +0200 Subject: [Python-ideas] Contributing to Python core via an intermediary References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> <20130812103110.4155d3ea@pitrou.net> <7wtxitarud.fsf_-_@benfinney.id.au> <520B47E8.3020501@egenix.com> Message-ID: <20130814111759.6b55cc36@pitrou.net> Le Wed, 14 Aug 2013 11:03:36 +0200, "M.-A. Lemburg" a ?crit : > > Note that this does not necessarily mean that all code going into > the core has to be subject to a CLA. It is still possible to > integrate code which has a license compatible with the > PSF license, but in general, we'd like to avoid the extra work > of having to check and verify such licenses. Is it possible to "check and verify", say, the common form of the (ultra-simple) MIT license (*)? That would make it a safe starting point for contributors. (*) http://opensource.org/licenses/MIT Regards Antoine. From mal at egenix.com Wed Aug 14 12:32:07 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 14 Aug 2013 12:32:07 +0200 Subject: [Python-ideas] Contributing to Python core via an intermediary In-Reply-To: <20130814111759.6b55cc36@pitrou.net> References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> <20130812103110.4155d3ea@pitrou.net> <7wtxitarud.fsf_-_@benfinney.id.au> <520B47E8.3020501@egenix.com> <20130814111759.6b55cc36@pitrou.net> Message-ID: <520B5CA7.2050405@egenix.com> On 14.08.2013 11:17, Antoine Pitrou wrote: > Le Wed, 14 Aug 2013 11:03:36 +0200, > "M.-A. Lemburg" a ?crit : >> >> Note that this does not necessarily mean that all code going into >> the core has to be subject to a CLA. 
It is still possible to >> integrate code which has a license compatible with the >> PSF license, but in general, we'd like to avoid the extra work >> of having to check and verify such licenses. > > Is it possible to "check and verify", say, the common form of the > (ultra-simple) MIT license (*)? That would make it a safe starting point > for contributors. > > (*) http://opensource.org/licenses/MIT The "check and verify" step would have to be done on a case-by-case basis. We would need to make sure that the the copyright holders mentioned in the license are in fact the copyright holders, check that the license doesn't prevent the PSF from modifying the PSF license to address future concerns and also pay close attention to things like patents. The CLA makes this a lot easier for the PSF and everyone invovled, which is why we have it :-) But we're getting off-topic here. Such things should be discussed on the python-legal list: http://mail.python.org/mailman/listinfo/python-legal-sig -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From janzert at janzert.com Wed Aug 14 06:16:36 2013 From: janzert at janzert.com (Janzert) Date: Wed, 14 Aug 2013 00:16:36 -0400 Subject: [Python-ideas] Contributing to Python core via an intermediary In-Reply-To: <7wtxitarud.fsf_-_@benfinney.id.au> References: <7wsiyhaui2.fsf@benfinney.id.au> <20130811120308.6f536907@fsol> <7w1u5z6cjq.fsf_-_@benfinney.id.au> <20130812103110.4155d3ea@pitrou.net> <7wtxitarud.fsf_-_@benfinney.id.au> Message-ID: <520B04A4.7050509@janzert.com> On 8/13/2013 11:12 PM, Ben Finney wrote: > Antoine Pitrou > writes: > >> Ben, I respect your distrust of the CLA's terms (or of CLAs in >> general), but does that mean you wouldn't contribute the python-daemon >> implementation under a CLA? > > I would not sign such a document, correct. > >> AFAICT it has no chance of landing in the official source tree if you >> aren't willing to sign a CLA for it. > > The work will be licensed to all recipients under the Apache License, > and maintained as expected. Could one of the recipients be the person > who makes the submission to the PSF for inclusion in Python core? > IANAL nor am I associated with the PSF and am certainly unqualified to answer. 
In any case, I think the topic has moved well into territory where further conversation and questions should be moved over to the legal-sig list at python-legal-sig at python.org Janzert * legal-sig list webpage can be found at http://mail.python.org/mailman/listinfo/python-legal-sig From kristjan at ccpgames.com Wed Aug 14 23:05:58 2013 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 14 Aug 2013 21:05:58 +0000 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: References: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> , Message-ID: Phew, it is a bit awkward to discuss this in two separate places. Those interested are invited to take a peek at the issue as well: http://bugs.python.org/issue18677 ________________________________________ Frá: Nick Coghlan [ncoghlan at gmail.com] Sent: 13. ágúst 2013 15:52 To: Kristján Valur Jónsson Cc: python-ideas at python.org; Ronald Oussoren Efni: Re: [Python-ideas] Enhanced context managers with ContextManagerExit and None > nested() was deprecated and removed because it didn't handle files (or > any other CM that does resource acquisition in __init__) correctly. > The fact you can't factor out arbitrary context managers had nothing > to do with it. Ok, I appreciate that, although I assumed otherwise. Imho, it is not nested() that is broken but CMs that do resource acquisition in __init__(). I like to think of them as "hybrid". __enter__ is for resource acquisition. The only reason to do it in __init__ is if the object is its own context manager. But that is not a very good pattern. By deprecating and removing nested(), effectively we are giving up and saying: context managers should only be instantiated with a "with" statement, inline, and should only be used with "with" statements. There is nothing magic about "nested()" causing it to be a "bug magnet".
The bug magnet is that some context managers don't take well to being instantiated early. We should have fixed that, rather than effectively prohibiting any programming involving context managers. > At the moment, a CM cannot prevent execution of the body - it must be > paired with an if statement or an inner call that may raise an > exception, keeping the flow control at least somewhat visible at the > point of execution. so: with deal_with_error, acquire_resource as r: foo(r) has more visible flow control than: with acquire_resource_or_deal_with_error as r: foo(r) With the combined with statement, you visually have a single condition manager, if not actually. > The following is also an illegal context manager: > > @contextmanager > def bad_cm(broken=False): > if not broken: > yield Not with my patch :). In fact this now works for all cm_a and cm_b: @contextmanager def pair(cm_a, cm_b): with cm_a as a, cm_b as b: yield a, b contextlib._GeneratorContextManager takes care of raising ContextManagerExit if nothing is yielded. > But that's not what you're asking about. You're asking for the ability > to collapse two independent try statements into one. No, not any more than I'm asking to collapse two "if" statements into one. However, a CM is not a try statement. It is a first class object, just like a function is. If we can do abstract programming with functions, pass callables around, lambdas, do currying, and so on and so forth, why should we not be able to do so with context managers? There are already things you can't factor out as functions - that's why we have generators and context managers. It's also a fact that there are things you can't factor out as single context managers. This is why we have nested context managers and also still have explicit try/except/else/finally statements. 
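[As a reference point for readers of the archive: the sanctioned replacement for nested() is to compose context managers programmatically with contextlib.ExitStack, which entered the stdlib in 3.3. It addresses the leak that killed nested() — partially-entered managers are unwound if a later __enter__ fails — though, as discussed below, it still cannot skip the body. A minimal sketch; the name `combined` is illustrative:]

```python
from contextlib import ExitStack, contextmanager

@contextmanager
def combined(*cms):
    # Enter each manager in order; ExitStack unwinds the already-entered
    # ones if a later __enter__ raises, avoiding nested()'s resource leak.
    with ExitStack() as stack:
        yield [stack.enter_context(cm) for cm in cms]

# Demo: record enter/exit order to show LIFO unwinding.
log = []

@contextmanager
def tracer(name):
    log.append("enter " + name)
    yield name
    log.append("exit " + name)

with combined(tracer("a"), tracer("b")) as (x, y):
    pass
```

[Note that passing already-constructed managers still has the "acquisition in __init__" problem Kristján describes; ExitStack also offers callback() and push() for deferring acquisition until entry.]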
> Expand it out to the underlying constructs and you will see this code > is outright buggy, because the exception handler is too broad: > try: > if not condition: > raise ContextManagerExit > execute_code() # ContextManagerExit will be eaten here > except ContextManagerExit: > pass Now you are just being pedantic :) ContextManagerExit is a private exception that can only be raised by if_a, so there are no stray exceptions. Anyway, this was just to demonstrate how flow control _can_ be done with condition managers if one wanted to do so, intentionally. > In current Python, it is impossible to create a combined context manager that does this: > > if_c = nested(if_a(), if_b(condition)) > > with if_c: > execute_code() # A single context manager cannot both raise an exception from __enter__() _and_ have it suppressed. > >This is a feature, not a bug: the with statement body will *always* >execute, unless __enter__ raises an exception. Don't be misled by the >ability to avoid repeating the with keyword when specifying multiple >context managers in the same statement: semantically, that's >equivalent to multiple nested with statements, so the outer one always >executes, and the inner ones can only skip the body by raising an >exception from __enter__. I realize this, and this is the whole point of my proposal. That __enter__ can raise an exception and have that exception silenced by the context manager machinery, not having to build that silencing around the machinery yourself. The point is: With two arbitrary context managers, cm_a and cm_b, cm_a _can_ silence the exception that was raised by cm_b's __enter__() method. This may be by design, or by accident, but it is possible. And this means that an equivalent cm_c = nested(cm_a, cm_b) is not possible. I want to be able to deal with this edge case so that I can programmatically, in addition to syntactically, nest two context managers.
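[The asymmetry described above can be demonstrated directly in current Python: in the statement form, an outer manager's __exit__ can suppress an exception raised while entering an inner manager, so the body is silently skipped — something no single composed CM object can reproduce. A sketch, with illustrative class names:]

```python
class SuppressOnEnter:
    # Outer manager: swallows any ValueError raised inside its block,
    # including one raised by an inner manager's __enter__.
    def __enter__(self):
        return self
    def __exit__(self, etype, evalue, tb):
        return etype is ValueError  # returning True suppresses it

class Failing:
    # Inner manager: refuses to enter.
    def __enter__(self):
        raise ValueError("resource unavailable")
    def __exit__(self, *exc):
        return False

body_ran = False
with SuppressOnEnter(), Failing():
    body_ran = True
# Failing.__enter__ raised inside SuppressOnEnter's scope; the outer
# __exit__ suppressed it, and the body was skipped with no error.
```

[Because `with a, b:` desugars to nested with statements, the skip works syntactically — but there is no way to bundle the pair into one object `cm_c` with the same behaviour, which is the edge case at issue.]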
> I'd be open to adding the following context manager to contextlib: > > @contextmanager > def skip(keep=False): Great, but that's not what my proposal is about. It's not about flow control, but composability. > An empty contextlib.ExitStack() instance is already a perfectly > serviceable "do nothing" context manager, we don't need (and won't > get) another one. Did you miss the performance argument? Of course I can write a do-nothing context manager. I'm not suggesting "None" because I'm too lazy to write my own. I'm suggesting it because context managers are very useful for other things than managing resources, namely optional monitoring of the program: def myfunc(): ... with app_timer: stuff() if app_is_being_monitored: app_timer = real_app_timing_contextmanager else: app_timer = None A context manager, even a do-nothing one, is currently expensive, consisting of two dynamic function calls. Having a "special" do-nothing context manager singleton would be beneficial in such cases when performance is important. Cheers, Kristján Cheers, Nick. From ncoghlan at gmail.com Wed Aug 14 23:52:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 14 Aug 2013 17:52:47 -0400 Subject: [Python-ideas] Enhanced context managers with ContextManagerExit and None In-Reply-To: References: <529A20D4-55D0-43A2-A7E1-A4FBC1F0FBFC@mac.com> <32B0ADAA-B514-451B-B203-E7640DF8EB93@mac.com> Message-ID: On 14 August 2013 17:05, Kristján Valur Jónsson wrote: > Phew, it is a bit awkward to discuss this in two separate places.
Those interested are invited to take a peek at the issue as well: > http://bugs.python.org/issue18677 I put my real reply on the tracker issue (since that's a better historical record), but the short version is that you've persuaded *me* to go back to wanting to fix this (since your approach is significantly less horrible than what I came up with back in 2009 for PEP 377), which means Guido is the one you need to convince to let you reopen the PEP (I added him to the nosy list on the issue). If Guido still doesn't like it, then it will stay broken :( Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From oscar.j.benjamin at gmail.com Fri Aug 16 20:51:32 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 16 Aug 2013 19:51:32 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) Message-ID: The discussion around having a sum() function in the statistics module reminded me that despite the fact that there are two functions in the stdlib (and more in the third-party libraries I use) I have previously rolled my own sum() function. The reason is that the stdlib does not currently contain a function that can sum Fractions in linear time for many inputs even though it is possible to implement a function that achieves closer to linear performance in more cases. I propose that there could be a sum() function or class-method in the fractions module for this purpose. I would just raise this on the tracker but seeing how many feathers were ruffled by statistics.sum I thought I'd suggest it here first. To demonstrate the problem I'll show how a quick and dirty mergesum() can out-perform sum(): $ cat tmpsum.py # tmpsum.py # Generate data from random import randint, seed from fractions import Fraction as F seed(123456789) # Use the same numbers each time nums = [F(randint(-1000, 1000), randint(1, 1000)) for _ in range(100000)] # 1) mergesum() is more efficient than sum() with Fractions. 
# 2) It assumes associativity of addition since it reorders the sum. # 3) It performs the same number of __add__ operations as sum() # 4) A more complicated iterator version is possible. def mergesum(seq): while len(seq) > 1: new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])] if len(seq) % 2: new.append(seq[-1]) seq = new return seq[0] # Just a quick test assert mergesum(nums[:101]) == sum(nums[:101]) So now let's time sum() with these numbers: $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10]' 'sum(nums)' 1000 loops, best of 3: 206 usec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:100]' 'sum(nums)' 100 loops, best of 3: 6.24 msec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:1000]' 'sum(nums)' 10 loops, best of 3: 320 msec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10000]' 'sum(nums)' 10 loops, best of 3: 6.43 sec per loop Each time we increase the size of the input by a factor of 10 the computation time increases by a factor of about 30. This is caused by computing the gcd() to normalise the result between each increment to the total. As the numerator and denominator of the subtotal get larger the time taken to compute the gcd() after each increment increases. The result is that the algorithm overall costs somewhere between linear and quadratic time [from the above maybe it's O(n**(3/2))]. 
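[On the "more complicated iterator version" mentioned in comment 4) above — a sketch not in the original post: the same pairwise merging can be driven from an arbitrary iterable by keeping a stack of partial sums, where each partial covers a power-of-two run of inputs. This reproduces mergesum()'s pairing order while holding only O(log n) intermediates:]

```python
def mergesum_iter(iterable):
    # stack holds (level, partial) pairs with strictly decreasing levels;
    # a partial at level k is the sum of 2**k consecutive input items.
    stack = []
    for x in iterable:
        level = 0
        # Two partials of equal size merge into one of the next size up,
        # exactly like the binary-carry in a counter.
        while stack and stack[-1][0] == level:
            _, y = stack.pop()
            x = y + x
            level += 1
        stack.append((level, x))
    # Fold the leftover partials, smallest first.
    total = 0
    while stack:
        _, y = stack.pop()
        total = total + y
    return total
```

[Like sum(), it returns 0 for an empty iterable; since Fraction addition is exact and associative, the reordering cannot change the result.]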
Now let's see how mergesum() performs: $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10]' 'mergesum(nums)' 10000 loops, best of 3: 186 usec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:100]' 'mergesum(nums)' 100 loops, best of 3: 2.16 msec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:1000]' 'mergesum(nums)' 10 loops, best of 3: 24.6 msec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10000]' 'mergesum(nums)' 10 loops, best of 3: 256 msec per loop $ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:100000]' 'mergesum(nums)' 10 loops, best of 3: 2.59 sec per loop (I didn't time sum() with a 100000 input to compare with that last run). Notice that mergesum() out-performs sum() for all input sizes and that the time scaling is much closer to linear i.e. 10x the input takes 10x the time. It works by summing adjacent pairs of numbers halving the size of the list each time. This has the benefit that the numerator and denominator in most of additions are a lot smaller so that their gcd() is computed faster. Only the final additions need to use the really big numerator and denominator. The performance of both sum() functions are sensitive to the distribution of inputs and in particular the distribution of denominators but in my own usage a merge-sum algorithm has always been faster. I have found this useful for myself (using even larger lists of numbers) when using the fractions module and I propose that something similar could be added to the fractions module. Oscar From mertz at gnosis.cx Fri Aug 16 22:06:35 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 16 Aug 2013 13:06:35 -0700 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: My intuition was that one might do better than mergesum() by binning numbers. I.e. 
this implementation: def binsum(seq): bins = dict() for f in seq: bins[f.denominator] = bins.get(f.denominator, 0) + f return mergesum(list(bins.values())) Indeed I am right, but the effect doesn't show up until one looks at a fairly large collection of fractions: 538-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums; nums=nums[:50000]' 'mergesum(nums)' 10 loops, best of 3: 806 msec per loop 539-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums; nums=nums[:50000]' 'binsum(nums)' 10 loops, best of 3: 627 msec per loop binsum() beats sum() at much smaller sizes as well, but it doesn't beat simple mergesum() at the small sizes. This is true, BTW, even if binsum() only uses sum() on the last line; but there's an extra boost in speed to use mergesum() there. I'm not sure whether one might get a better binsum() by binning not on denominator itself, but binning together everything with a denominator that is a multiple of a stored denominator. In principle, that could be many fewer bins with negligible gcd() calculation needed; however, the extra testing needed to check for multiples might override that gain. There are several variations that come to mind, but I haven't tested them. On Fri, Aug 16, 2013 at 11:51 AM, Oscar Benjamin wrote: > The discussion around having a sum() function in the statistics module > reminded me that despite the fact that there are two functions in the > stdlib (and more in the third-party libraries I use) I have previously > rolled my own sum() function. The reason is that the stdlib does not > currently contain a function that can sum Fractions in linear time for > many inputs even though it is possible to implement a function that > achieves closer to linear performance in more cases. I propose that > there could be a sum() function or class-method in the fractions > module for this purpose. 
I would just raise this on the tracker but > seeing how many feathers were ruffled by statistics.sum I thought I'd > suggest it here first. > > To demonstrate the problem I'll show how a quick and dirty mergesum() > can out-perform sum(): > > $ cat tmpsum.py > # tmpsum.py > > # Generate data > from random import randint, seed > from fractions import Fraction as F > seed(123456789) # Use the same numbers each time > nums = [F(randint(-1000, 1000), randint(1, 1000)) for _ in > range(100000)] > > # 1) mergesum() is more efficient than sum() with Fractions. > # 2) It assumes associativity of addition since it reorders the sum. > # 3) It performs the same number of __add__ operations as sum() > # 4) A more complicated iterator version is possible. > def mergesum(seq): > while len(seq) > 1: > new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])] > if len(seq) % 2: > new.append(seq[-1]) > seq = new > return seq[0] > > # Just a quick test > assert mergesum(nums[:101]) == sum(nums[:101]) > > So now let's time sum() with these numbers: > > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:10]' 'sum(nums)' > 1000 loops, best of 3: 206 usec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:100]' 'sum(nums)' > 100 loops, best of 3: 6.24 msec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:1000]' 'sum(nums)' > 10 loops, best of 3: 320 msec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:10000]' 'sum(nums)' > 10 loops, best of 3: 6.43 sec per loop > > Each time we increase the size of the input by a factor of 10 the > computation time increases by a factor of about 30. This is caused by > computing the gcd() to normalise the result between each increment to > the total. As the numerator and denominator of the subtotal get larger > the time taken to compute the gcd() after each increment increases. 
> The result is that the algorithm overall costs somewhere between > linear and quadratic time [from the above maybe it's O(n**(3/2))]. > > Now let's see how mergesum() performs: > > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:10]' 'mergesum(nums)' > 10000 loops, best of 3: 186 usec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:100]' 'mergesum(nums)' > 100 loops, best of 3: 2.16 msec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:1000]' 'mergesum(nums)' > 10 loops, best of 3: 24.6 msec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:10000]' 'mergesum(nums)' > 10 loops, best of 3: 256 msec per loop > $ python -m timeit -s 'from tmpsum import mergesum, nums; > nums=nums[:100000]' 'mergesum(nums)' > 10 loops, best of 3: 2.59 sec per loop > > (I didn't time sum() with a 100000 input to compare with that last run). > > Notice that mergesum() out-performs sum() for all input sizes and that > the time scaling is much closer to linear i.e. 10x the input takes 10x > the time. It works by summing adjacent pairs of numbers halving the > size of the list each time. This has the benefit that the numerator > and denominator in most of additions are a lot smaller so that their > gcd() is computed faster. Only the final additions need to use the > really big numerator and denominator. The performance of both sum() > functions are sensitive to the distribution of inputs and in > particular the distribution of denominators but in my own usage a > merge-sum algorithm has always been faster. > > I have found this useful for myself (using even larger lists of > numbers) when using the fractions module and I propose that something > similar could be added to the fractions module. 
> > > Oscar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Mon Aug 19 11:35:34 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 19 Aug 2013 11:35:34 +0200 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: David Mertz, 16.08.2013 22:06: > My intuition was that one might do better than mergesum() by binning > numbers. I.e. this implementation: > > def binsum(seq): > bins = dict() > for f in seq: > bins[f.denominator] = bins.get(f.denominator, 0) + f > return mergesum(list(bins.values())) def mergesum(seq): while len(seq) > 1: new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])] if len(seq) % 2: new.append(seq[-1]) seq = new return seq[0] > Indeed I am right, but the effect doesn't show up until one looks at a > fairly large collection of fractions: > > 538-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, > nums; > nums=nums[:50000]' 'mergesum(nums)' > 10 loops, best of 3: 806 msec per loop > 539-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, > nums; > nums=nums[:50000]' 'binsum(nums)' > 10 loops, best of 3: 627 msec per loop Simply sorting by denominator gives me a visible advantage over the above: def sortsum(seq): def key(f): return f.denominator if isinstance(f, F) else 1 seq = sorted(seq, key=key) if len(seq) < 3: return sum(seq) return mergesum(seq) $ python3.4 -m timeit -s '...; c=nums[:10000]' 'sortsum(c)' 10 loops, best of 3: 76.9 msec per loop $ 
python3.4 -m timeit -s '...; c=nums[:10000]' 'binsum(c)' 10 loops, best of 3: 83.2 msec per loop $ python3.4 -m timeit -s '...; c=nums[:10000]' 'mergesum(c)' 10 loops, best of 3: 106 msec per loop $ python3.4 -m timeit -s '...; c=nums[:1000]' 'sortsum(c)' 100 loops, best of 3: 9.49 msec per loop $ python3.4 -m timeit -s '...; c=nums[:1000]' 'binsum(c)' 100 loops, best of 3: 12.9 msec per loop $ python3.4 -m timeit -s '...; c=nums[:1000]' 'mergesum(c)' 100 loops, best of 3: 9.66 msec per loop $ python3.4 -m timeit -s '...; c=nums[:100]' 'sortsum(c)' 1000 loops, best of 3: 951 usec per loop $ python3.4 -m timeit -s '...; c=nums[:100]' 'mergesum(c)' 1000 loops, best of 3: 937 usec per loop $ python3.4 -m timeit -s '...; c=nums[:10]' 'sortsum(c)' 10000 loops, best of 3: 88.8 usec per loop $ python3.4 -m timeit -s '...; c=nums[:10]' 'mergesum(c)' 10000 loops, best of 3: 80.2 usec per loop So, it's a bit slower for small sequences (15 microseconds for <100 items sounds acceptable to me), but it's quite a bit faster for long sequences. 
It seems to be slowing down a bit for really long sequences, though: $ python3.4 -m timeit -s '...; c=nums[:100000]' 'mergesum(c)' 10 loops, best of 3: 1 sec per loop $ python3.4 -m timeit -s '...; c=nums[:100000]' 'sortsum(c)' 10 loops, best of 3: 748 msec per loop $ python3.4 -m timeit -s '...; c=nums[:100000]' 'binsum(c)' 10 loops, best of 3: 743 msec per loop However, unpacking the fractions for the bin summing makes this way faster for larger sequences: def binsum2(seq): bins = dict() get_bin = bins.get _isinstance = isinstance for f in seq: d, n = (f.denominator,f.numerator) if _isinstance(f,F) else (1,f) bins[d] = get_bin(d, 0) + n return mergesum([ F(n,d) for d, n in sorted(bins.items()) ]) $ python3.4 -m timeit -s '...; c=nums[:10000]' 'sortsum(c)' 10 loops, best of 3: 76.9 msec per loop $ python3.4 -m timeit -s '...; c=nums[:10000]' 'binsum2(c)' 10 loops, best of 3: 21 msec per loop $ python3.4 -m timeit -s '...; c=nums[:1000]' 'sortsum(c)' 100 loops, best of 3: 9.49 msec per loop $ python3.4 -m timeit -s '...; c=nums[:1000]' 'binsum2(c)' 100 loops, best of 3: 8.7 msec per loop But again, slower for short ones: $ python3.4 -m timeit -s '...; c=nums[:100]' 'mergesum(c)' 1000 loops, best of 3: 937 usec per loop $ python3.4 -m timeit -s '...; c=nums[:100]' 'binsum2(c)' 1000 loops, best of 3: 1.34 msec per loop $ python3.4 -m timeit -s '...; c=nums[:10]' 'mergesum(c)' 10000 loops, best of 3: 80.2 usec per loop $ python3.4 -m timeit -s '...; c=nums[:10]' 'binsum2(c)' 10000 loops, best of 3: 137 usec per loop Which is expected, because short sequences make it less likely to actually find common denominators. Assuming that the set of distinct denominators is usually small compared to the number of values, this would be the right tradeoff. Maybe inlining the denominator normalisation instead of creating Fraction instances at the end would give another boost here. 
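[Following up the "inlining the denominator normalisation" idea — a sketch not in the original post, which assumes an all-Fraction input (no float fallback) and uses math.gcd, the modern spelling of the 2013-era fractions.gcd: accumulate bare integer numerators per denominator, then combine the bins over a running common denominator, paying one gcd per bin instead of one per addition.]

```python
from fractions import Fraction
from math import gcd

def binsum3(seq):
    # Integer "+" per bin is much cheaper than Fraction.__add__,
    # which normalises via gcd on every increment.
    bins = {}
    for f in seq:
        bins[f.denominator] = bins.get(f.denominator, 0) + f.numerator
    # Combine num/den with each bin n/d over their lcm: one gcd per bin.
    num, den = 0, 1
    for d, n in bins.items():
        g = gcd(den, d)
        num = num * (d // g) + n * (den // g)
        den = den * (d // g)
    # A single final normalisation when the Fraction is constructed.
    return Fraction(num, den)
```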
In any case, this huge difference in performance speaks for providing some kind of specialised sum() function in the fractions module. Stefan From __peter__ at web.de Mon Aug 19 12:09:49 2013 From: __peter__ at web.de (Peter Otten) Date: Mon, 19 Aug 2013 12:09:49 +0200 Subject: [Python-ideas] Yet another sum function (fractions.sum) References: Message-ID: Oscar Benjamin wrote: > The discussion around having a sum() function in the statistics module > reminded me that despite the fact that there are two functions in the > stdlib (and more in the third-party libraries I use) I have previously > rolled my own sum() function. The reason is that the stdlib does not > currently contain a function that can sum Fractions in linear time for > many inputs even though it is possible to implement a function that > achieves closer to linear performance in more cases. I propose that > there could be a sum() function or class-method in the fractions > module for this purpose. I would just raise this on the tracker but > seeing how many feathers were ruffled by statistics.sum I thought I'd > suggest it here first. If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to def sum(items, start=0): try: specialized_sum = start.__sum__ except AttributeError: return ... # current behaviour return specialized_sum(items, start) sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc. 
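[Peter's sketch above could also be expressed with functools.singledispatch (new in 3.4), dispatching on a mandatory start argument rather than a __sum__ special method — illustrative only, not a proposed stdlib API:]

```python
import math
from functools import singledispatch

@singledispatch
def typed_sum(start, items):
    # Fallback: built-in behaviour for types with no registered override.
    return sum(items, start)

@typed_sum.register(float)
def _(start, items):
    # Floats get math.fsum's numerically exact partial-sums algorithm.
    return start + math.fsum(items)

naive = sum([0.1] * 10)             # plain left-to-right float addition
stable = typed_sum(0.0, [0.1] * 10) # dispatches to the fsum branch
```

[Further specialised types — Fraction, Decimal — would register their own implementations the same way.]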
From p.f.moore at gmail.com Mon Aug 19 12:31:13 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 19 Aug 2013 11:31:13 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: On 19 August 2013 11:09, Peter Otten <__peter__ at web.de> wrote: > If that takes on, and the number of sum implementations grows, maybe there > should be a __sum__() special (class) method, and the sum built-in be > changed roughly to > > def sum(items, start=0): > try: > specialized_sum = start.__sum__ > except AttributeError: > return ... # current behaviour > return specialized_sum(items, start) > > sum(items, 0.0) would then automatically profit from the clever > optimizations of math.fsum() etc. > Two points: 1. Specialising based on the type of the start parameter probably isn't ideal - what you *really* want is to specialise on the type of elements in the list (which is problematic, as lists can contain objects of differing types, so you have to consider those cases - maybe dispatch based on the first element of the list, who knows?) 2. If you do specialise based on start, this can easily be implemented using the new single-dispatch generic functions (you'd have to make start the first argument, but if you're dispatching on it, you'd likely need it to be mandatory anyway so that's not such a big deal). I'm not sure this is a good idea in any case, though - why is sum(items, 0.0) (with a "magic" start parameter which is a float) better than an explicit fsum(items) (where the function name says it's a float sum)? Paul -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From clay.sweetser at gmail.com Mon Aug 19 15:47:31 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Mon, 19 Aug 2013 09:47:31 -0400 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: On Aug 19, 2013 6:09 AM, "Peter Otten" <__peter__ at web.de> wrote: > > If that takes on, and the number of sum implementations grows, maybe there > should be a __sum__() special (class) method, and the sum built-in be > changed roughly to > > def sum(items, start=0): > try: > specialized_sum = start.__sum__ > except AttributeError: > return ... # current behaviour > return specialized_sum(items, start) > > sum(items, 0.0) would then automatically profit from the clever > optimizations of math.fsum() etc. Another possibility (and I'm not suggesting that it's better than the proposed solution above) is to make a module just for the numerous sum implementations people need. -------------- next part -------------- An HTML attachment was scrubbed... URL: From __peter__ at web.de Mon Aug 19 15:58:45 2013 From: __peter__ at web.de (Peter Otten) Date: Mon, 19 Aug 2013 15:58:45 +0200 Subject: [Python-ideas] Yet another sum function (fractions.sum) References: Message-ID: Paul Moore wrote: > On 19 August 2013 11:09, Peter Otten > <__peter__ at web.de> wrote: > >> If that takes on, and the number of sum implementations grows, maybe >> there should be a __sum__() special (class) method, and the sum built-in >> be changed roughly to >> >> def sum(items, start=0): >> try: >> specialized_sum = start.__sum__ >> except AttributeError: >> return ... # current behaviour >> return specialized_sum(items, start) >> >> sum(items, 0.0) would then automatically profit from the clever >> optimizations of math.fsum() etc. >> > > Two points: > > 1. 
Specialising based on the type of the start parameter probably isn't > ideal - what you *really* want is to specialise on the type of elements in > the list You'd need another language for that -- or pypyesque magic ;) > (which is problematic, as lists can contain objects of differing > types, so you have to consider those cases - maybe dispatch based on the > first element of the list, who knows?) > 2. If you do specialise based on start, this can easily be implemented > using the new single-dispatch generic functions (you'd have to make start > the first argument, but if you're dispatching on it, you'd likely need it > to be mandatory anyway so that's not such a big deal). > > I'm not sure this is a good idea in any case, though - why is sum(items, > 0.0) (with a "magic" start parameter which is a float) better than an > explicit fsum(items) (where the function name says it's a float sum)? If there are multiple sum functions, typically one per type of summand, how can you make them easily discoverable? I doubt that fsum is used in many places where it would be appropriate. If you make the optimizations available via the built-in sum() you can add a sentence like "If you provide an explicit start value sum() may pick an algorithm optimized for that type." # needs work to its documentation and be done. From p.f.moore at gmail.com Mon Aug 19 17:18:55 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 19 Aug 2013 16:18:55 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: On 19 August 2013 14:58, Peter Otten <__peter__ at web.de> wrote: > You'd need another language for that -- or pypyesque magic ;) > Yes, that was sort of my point :-) > If there are multiple sum functions, typically one per type of summand, how > can you make them easily discoverable? I doubt that fsum is used in many > places where it would be appropriate. 
> If you make the optimizations available via the built-in sum() you can
> add a sentence like
>
> "If you provide an explicit start value sum() may pick an algorithm
> optimized for that type."  # needs work
>
> to its documentation and be done.

Having a list l for which

    res = sum(l)
    res = sum(l, 0.0)
    res = sum(l, 0)
    res = sum(l, Decimal('0.0'))
    res = sum(l, Fraction(0, 1))

all do subtly (or maybe not so subtly!) different things seems to me to be
a recipe for confusion, if not disaster.

Certainly sum/math.fsum/etc. are not particularly discoverable, but that
can *also* be fixed with documentation. In the documentation of sum(), add
a paragraph:

"""If your values are all of one particular type, specialised functions
for that type may be available - these may be more numerically stable, or
faster, or otherwise more appropriate. The standard library includes
math.fsum for floats, X.sum for X, ..."""

That's just as good if you're starting from "I want to sum some values".
And it's a lot *more* straightforward to find/google the documentation of
math.fsum from code that calls it, than to know where you'd find details
of the specialised algorithm used for sum(l, 0.0) if that was all you had
to go on...

Paul

From stefan_ml at behnel.de  Mon Aug 19 18:38:23 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 19 Aug 2013 18:38:23 +0200
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

Clay Sweetser, 19.08.2013 15:47:
> On Aug 19, 2013 6:09 AM, "Peter Otten" wrote:
>>
>> If that takes on, and the number of sum implementations grows, maybe there
>> should be a __sum__() special (class) method, and the sum built-in be
>> changed roughly to
>>
>> def sum(items, start=0):
>>     try:
>>         specialized_sum = start.__sum__
>>     except AttributeError:
>>         return ...  # current behaviour
>>     return specialized_sum(items, start)
>>
>> sum(items, 0.0) would then automatically profit from the clever
>> optimizations of math.fsum() etc.
>
> Another possibility (and I'm not suggesting that it's better than the
> proposed solution above) is to make a module just for the numerous sum
> implementations people need.

-1

For summing up fractions, the fractions module is The One Obvious Place
to look. For Decimal arithmetic, my first look would always go to the
decimal module. Given that the math module deals with floating point
arithmetic, it's The One Obvious Place to look for summing up floats.

I agree with Paul Moore that the rest can be handled by documentation.

Stefan

From tjreedy at udel.edu  Mon Aug 19 20:10:48 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 19 Aug 2013 14:10:48 -0400
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

On 8/19/2013 5:35 AM, Stefan Behnel wrote:
> David Mertz, 16.08.2013 22:06:
>> My intuition was that one might do better than mergesum() by binning
>> numbers. I.e.
this implementation:
>>
>> def binsum(seq):
>>     bins = dict()
>>     for f in seq:
>>         bins[f.denominator] = bins.get(f.denominator, 0) + f
>>     return mergesum(list(bins.values()))
>
> def mergesum(seq):
>     while len(seq) > 1:
>         new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])]
>         if len(seq) % 2:
>             new.append(seq[-1])
>         seq = new
>     return seq[0]
>
>> Indeed I am right, but the effect doesn't show up until one looks at a
>> fairly large collection of fractions:
>>
>> 538-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums;
>> nums=nums[:50000]' 'mergesum(nums)'
>> 10 loops, best of 3: 806 msec per loop
>> 539-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums;
>> nums=nums[:50000]' 'binsum(nums)'
>> 10 loops, best of 3: 627 msec per loop
>
> Simply sorting by denominator gives me a visible advantage over the above:
>
> def sortsum(seq):
>     def key(f):
>         return f.denominator if isinstance(f, F) else 1
>     seq = sorted(seq, key=key)
>     if len(seq) < 3:
>         return sum(seq)
>     return mergesum(seq)
>
> $ python3.4 -m timeit -s '...; c=nums[:10000]' 'sortsum(c)'
> 10 loops, best of 3: 76.9 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:10000]' 'binsum(c)'
> 10 loops, best of 3: 83.2 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:10000]' 'mergesum(c)'
> 10 loops, best of 3: 106 msec per loop
>
> $ python3.4 -m timeit -s '...; c=nums[:1000]' 'sortsum(c)'
> 100 loops, best of 3: 9.49 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:1000]' 'binsum(c)'
> 100 loops, best of 3: 12.9 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:1000]' 'mergesum(c)'
> 100 loops, best of 3: 9.66 msec per loop
>
> $ python3.4 -m timeit -s '...; c=nums[:100]' 'sortsum(c)'
> 1000 loops, best of 3: 951 usec per loop
> $ python3.4 -m timeit -s '...; c=nums[:100]' 'mergesum(c)'
> 1000 loops, best of 3: 937 usec per loop
>
> $ python3.4 -m timeit -s '...; c=nums[:10]' 'sortsum(c)'
> 10000 loops, best of 3: 88.8 usec per loop
> $ python3.4 -m timeit -s '...; c=nums[:10]' 'mergesum(c)'
> 10000 loops, best of 3: 80.2 usec per loop
>
> So, it's a bit slower for small sequences (15 microseconds for <100 items
> sounds acceptable to me), but it's quite a bit faster for long sequences.
>
> It seems to be slowing down a bit for really long sequences, though:
>
> $ python3.4 -m timeit -s '...; c=nums[:100000]' 'mergesum(c)'
> 10 loops, best of 3: 1 sec per loop
> $ python3.4 -m timeit -s '...; c=nums[:100000]' 'sortsum(c)'
> 10 loops, best of 3: 748 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:100000]' 'binsum(c)'
> 10 loops, best of 3: 743 msec per loop
>
> However, unpacking the fractions for the bin summing makes this way faster
> for larger sequences:
>
> def binsum2(seq):
>     bins = dict()
>     get_bin = bins.get
>     _isinstance = isinstance
>     for f in seq:
>         d, n = (f.denominator, f.numerator) if _isinstance(f, F) else (1, f)
>         bins[d] = get_bin(d, 0) + n
>     return mergesum([ F(n, d) for d, n in sorted(bins.items()) ])
>
> $ python3.4 -m timeit -s '...; c=nums[:10000]' 'sortsum(c)'
> 10 loops, best of 3: 76.9 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:10000]' 'binsum2(c)'
> 10 loops, best of 3: 21 msec per loop
>
> $ python3.4 -m timeit -s '...; c=nums[:1000]' 'sortsum(c)'
> 100 loops, best of 3: 9.49 msec per loop
> $ python3.4 -m timeit -s '...; c=nums[:1000]' 'binsum2(c)'
> 100 loops, best of 3: 8.7 msec per loop
>
> But again, slower for short ones:
>
> $ python3.4 -m timeit -s '...; c=nums[:100]' 'mergesum(c)'
> 1000 loops, best of 3: 937 usec per loop
> $ python3.4 -m timeit -s '...; c=nums[:100]' 'binsum2(c)'
> 1000 loops, best of 3: 1.34 msec per loop
>
> $ python3.4 -m timeit -s '...; c=nums[:10]' 'mergesum(c)'
> 10000 loops, best of 3: 80.2 usec per loop
> $ python3.4 -m timeit -s '...; c=nums[:10]' 'binsum2(c)'
> 10000 loops, best of 3: 137 usec per loop
>
> Which is expected, because short sequences make it less likely to actually
> find common denominators. Assuming that the set of distinct denominators is
> usually small compared to the number of values, this would be the right
> tradeoff.
>
> Maybe inlining the denominator normalisation instead of creating Fraction
> instances at the end would give another boost here.
>
> In any case, this huge difference in performance speaks for providing some
> kind of specialised sum() function in the fractions module.

At first glance, I agree. It would make using Fractions in a real
application more practical.

-- 
Terry Jan Reedy

From oscar.j.benjamin at gmail.com  Tue Aug 20 18:39:59 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 20 Aug 2013 17:39:59 +0100
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

On 19 August 2013 10:35, Stefan Behnel wrote:
> David Mertz, 16.08.2013 22:06:
>> My intuition was that one might do better than mergesum() by binning
>> numbers. I.e. this implementation:
>>
>> def binsum(seq):
>>     bins = dict()
>>     for f in seq:
>>         bins[f.denominator] = bins.get(f.denominator, 0) + f
>>     return mergesum(list(bins.values()))

Good thinking David. Actually for some distributions of inputs this
massively outperforms the other implementations. One common use case I
have for Fractions is to check the accuracy of floating point
computation by repeating the computation with Fractions. In this case
I would be working with Fractions whose denominators are always powers
of 2 (and usually only a handful of different values). And as Stefan
says if we're binning on the denominator then we can make it really
fast by adding the numerators with int.__add__.

> Simply sorting by denominator gives me a visible advantage over the above:
>
> def sortsum(seq):
>     def key(f):
>         return f.denominator if isinstance(f, F) else 1
>     seq = sorted(seq, key=key)
>     if len(seq) < 3:
>         return sum(seq)
>     return mergesum(seq)

Interesting.
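For context, the mergesum() used throughout this thread is David Mertz's pairwise reduction: each pass adds neighbouring elements and halves the list, so every input takes part in only about log2(N) Fraction additions instead of N - 1 additions onto one ever-growing accumulator. A self-contained sketch of it, re-indented, with a check that it agrees with the built-in sum():

```python
from fractions import Fraction

def mergesum(seq):
    # Pairwise reduction: add neighbouring elements, halving the list
    # each pass, so intermediate denominators stay comparatively small.
    while len(seq) > 1:
        new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])]
        if len(seq) % 2:
            new.append(seq[-1])  # carry the odd element over unchanged
        seq = new
    return seq[0]

nums = [Fraction(1, n) for n in range(1, 200)]
assert mergesum(list(nums)) == sum(nums)  # exact arithmetic: same result
```

The result is identical to the built-in sum() because Fraction arithmetic is exact; only the cost of the intermediate additions differs.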
I've developed an iterator version of mergesum:

def imergesum(iterable):
    stack = []
    it = iter(iterable)
    nextpow2 = 1
    while True:
        for n2, val in enumerate(it, 1):
            if n2 == nextpow2:
                stack.append(val)
                nextpow2 *= 2
                break
            elif n2 % 2:
                stack[-1] += val
            else:
                ishift = -1
                while not n2 % 2:
                    val, stack[ishift] = stack[ishift], val
                    ishift -= 1
                    n2 >>= 1
                stack[ishift] += val
        else:
            return mergesum(stack)  # Could just use sum here

It uses log(N) storage so you'd run out of space on an iterator of say
2 ** sys.maxint Fractions but that's unlikely to bother anyone.

I combined the different approaches to make a rationalsum() function
that is scalable and tries to take advantage of the binsum() and
sortsum() improvements where possible:

def rationalsum(iterable, tablesize=1000):
    def binsumgen(iterable):
        iterator = iter(iterable)
        bins = defaultdict(int)
        finished = False
        denom = itemgetter(0)
        while not finished:
            finished = True
            for n, num in zip(range(tablesize), iterator):
                bins[num.denominator] += num.numerator
                finished = False
            if len(bins) >= tablesize or finished:
                dn = sorted(bins.items(), key=denom)
                yield mergesum([F(n, d) for d, n in dn])
                bins.clear()
    return imergesum(binsumgen(iterable))

I tested this on a suite of different rational sequences (full script
at the bottom):
1) float - Fractions made from floats with power of 2 denominators.
2) d1000 - the original numbers with denominators uniform in [1, 1000].
3) d10 - like above but [1, 10].
4) bigint - big ints rather than Fractions (just for comparison).
5) e - the sequence 1 + 1 + 1/2 + 1/3! + 1/4! + ... (this is slow).

In different situations the binsum(), sortsum() and mergesum()
functions perform differently (in some cases wildly so). The
rationalsum() function is never the best but always has the same order
of magnitude as the best. Apart from the bigint case builtins.sum() is
always among the worst performing.
Here's the benchmark output (Python 3.3 on Windows XP 32-bit - the
script takes a while):

Running benchmarks (times in microseconds)...

--------------
float
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 282    | 269    | 298    | 238    | 284    | 249
   100 | 2804   | 2717   | 3045   | 766    | 2689   | 780
  1000 | 29597  | 27468  | 31275  | 2093   | 27151  | 2224
 10000 | 2.9e+5 | 2.7e+5 | 3.1e+5 | 12908  | 2.7e+5 | 13122
100000 | 3.1e+6 | 2.8e+6 | 3.1e+6 | 1.2e+5 | 2.8e+6 | 1.2e+5

--------------
d1000
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 214    | 190    | 203    | 318    | 206    | 343
   100 | 5879   | 2233   | 2578   | 3178   | 2338   | 3197
  1000 | 2.6e+5 | 24866  | 29015  | 21929  | 23344  | 21901
 10000 | 6.1e+6 | 2.7e+5 | 3e+5   | 48111  | 1.8e+5 | 48814
100000 | 6.6e+7 | 2.7e+6 | 3e+6   | 1.5e+5 | 1.8e+6 | 3.5e+5

--------------
bigint
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 1      | 14     | 19     | 19     | 19     | 36
   100 | 15     | 54     | 140    | 59     | 72     | 81
  1000 | 158    | 329    | 1294   | 461    | 499    | 550
 10000 | 1655   | 2957   | 13102  | 4477   | 4497   | 5242
100000 | 16539  | 35929  | 1.3e+5 | 44905  | 55101  | 52732

--------------
e
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 155    | 156    | 164    | 262    | 166    | 274
   100 | 9464   | 2777   | 3190   | 9282   | 2893   | 4079
  1000 | 1.1e+7 | 8.7e+5 | 9.4e+5 | 1.1e+7 | 8.8e+5 | 9e+5

--------------
d10
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 142    | 144    | 156    | 139    | 157    | 153
   100 | 1456   | 1470   | 1822   | 343    | 1541   | 347
  1000 | 15080  | 14978  | 16955  | 1278   | 15557  | 1241
 10000 | 1.5e+5 | 1.5e+5 | 1.6e+5 | 9845   | 1.5e+5 | 10343
100000 | 1.5e+6 | 1.5e+6 | 1.7e+6 | 96015  | 1.5e+6 | 99856
And here's the script:

# tmpsum.py

from __future__ import print_function

def mergesum(seq):
    while len(seq) > 1:
        new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])]
        if len(seq) % 2:
            new.append(seq[-1])
        seq = new
    return seq[0]

def imergesum(iterable):
    stack = []
    it = iter(iterable)
    nextpow2 = 1
    while True:
        for n2, val in enumerate(it, 1):
            if n2 == nextpow2:
                stack.append(val)
                nextpow2 *= 2
                break
            elif n2 % 2:
                stack[-1] += val
            else:
                ishift = -1
                while not n2 % 2:
                    val, stack[ishift] = stack[ishift], val
                    ishift -= 1
                    n2 >>= 1
                stack[ishift] += val
        else:
            return mergesum(stack)

from collections import defaultdict

def binsum(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in bins.items()])

from operator import attrgetter

def sortsum(seq):
    seq = sorted(seq, key=attrgetter('denominator'))
    if len(seq) < 3:
        return sum(seq)
    return mergesum(seq)

from operator import itemgetter

def rationalsum(iterable, tablesize=1000):
    def binsumgen(iterable):
        iterator = iter(iterable)
        bins = defaultdict(int)
        finished = False
        denom = itemgetter(0)
        while not finished:
            finished = True
            for n, num in zip(range(tablesize), iterator):
                bins[num.denominator] += num.numerator
                finished = False
            if len(bins) >= tablesize or finished:
                dn = sorted(bins.items(), key=denom)
                yield mergesum([F(n, d) for d, n in dn])
                bins.clear()
    return imergesum(binsumgen(iterable))

sumfuncs = sum, mergesum, imergesum, binsum, sortsum, rationalsum

# Just a quick test
if True:
    print('testing', end=' ')
    from fractions import Fraction as F
    nums = [F(n, 2*n +1) for n in range(3000)]
    for n in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 2005:
        ns = nums[:n]
        result = sum(ns)
        for func in sumfuncs:
            assert func(ns) == result, func.__name__
        print('.', end='')
    print(' passed!')

if True:
    print('generating data...')
    from random import randint, gauss, seed
    from fractions import Fraction as F
    seed(123456789)
    nmax = 10 ** 5
    nums_d1000 = [F(randint(-1000, 1000), randint(1, 1000)) for _ in range(nmax)]
    nums_d10 = [F(randint(-10, 10), randint(1, 10)) for _ in range(nmax)]
    nums_float = [F(*gauss(0, 1).as_integer_ratio()) for _ in range(nmax)]
    nums_e = [F(1)]
    for n in range(1, 1000):
        nums_e.append(nums_e[-1] / n)
    nums_bigint = [10**65 + randint(1, 100) for n in range(nmax)]
    nums_all = {
        'd1000': nums_d1000,
        'd10': nums_d10,
        'float': nums_float,
        'e': nums_e,
        'bigint': nums_bigint,
    }

if True:
    print('\nRunning benchmarks (times in microseconds)...')

    import timeit

    def mytimeit(stmt, setup):
        n = 10
        time = lambda : timeit.timeit(stmt, setup, number=n) / n
        t = time()
        if t * n < 1e-1:
            while t * n < 1e-2:
                n *= 10
        if n == 10:
            ts = [t, time(), time()]
        else:
            ts = [time(), time(), time()]
        t = min(ts)
        return 1e6 * t

    def fmtnum(n):
        s = str(int(n))
        if len(s) > 5:
            s = '%1.2g' % n
            if s[-4:-1] == 'e+0':
                s = s[:-2] + s[-1]
        return '%-6s' % s

    gapline = '-' * 60
    fnames = [f.__name__ for f in sumfuncs]
    header = '     n | ' + ' | '.join('%-6s' % name[:6] for name in fnames)
    sum = sum
    setup = 'from __main__ import ' + ', '.join(['nums'] + fnames)
    setup += '; nums=nums[:%s]'

    for name, nums in nums_all.items():
        print('\n\n--------------\n%s\n---------------' % name)
        print(header)
        print(gapline)
        for n in 10, 100, 1000, 10000, 100000:
            if n > len(nums): break
            times = []
            for func in sumfuncs:
                stmt = '%s(nums)' % (func.__name__,)
                times.append(mytimeit(stmt, setup % n))
            print(('%6s | ' % n), end='')
            print(' | '.join(fmtnum(t) for t in times))

Oscar

From oscar.j.benjamin at gmail.com  Tue Aug 20 19:30:30 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 20 Aug 2013 18:30:30 +0100
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

On 19 August 2013 11:09, Peter Otten <__peter__ at web.de> wrote:
> Oscar Benjamin wrote:
>
> If that takes on, and the number of sum implementations grows, maybe there
> should be a __sum__() special (class) method, and the sum built-in be
> changed roughly to
>
> def sum(items, start=0):
>     try:
>         specialized_sum = start.__sum__
>     except AttributeError:
>         return ...  # current behaviour
>     return specialized_sum(items, start)
>
> sum(items, 0.0) would then automatically profit from the clever
> optimizations of math.fsum() etc.

fsum() is about controlling accumulated rounding errors rather than
optimisation (although it may be faster I've never needed to check).

I'd rather write
    sum(items, Fraction)
than
    sum(items, Fraction(0))
and either way it's so close to
    Fraction.sum(items)

I understand what you mean about having a single function that does a
good job for all types and it goes into a much broader set of issues
around how Python handles different numeric types. It would be
possible in a backward compatible way provided Fraction.__sum__ falls
back on sum when it finds a non-Rational.

There are lots of other areas in the stdlib where a similar sort of
thinking could apply e.g.:

>>> import math
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F

There's currently no fsum equivalent for Decimals even though a pure
Python implementation could have reasonable performance:

>>> math.fsum([0.1, 0.2, 0.3]) == 0.6
True
>>> math.fsum([D('0.1'), D('0.2'), D('0.3')]) == D('0.6')
False
>>> D(math.fsum([D('0.1'), D('0.2'), D('0.3')])) == D('0.6')
False

sum() would do better in the above but then it fails in other situations:

>>> sum([D('1e50'), D('1'), D('-1e50')]) == 1
False
>>> math.fsum([D('1e50'), D('1'), D('-1e50')]) == 1
True

The math module rounds everything to float losing precision even if
better routines are available:

>>> math.sqrt(D('0.02'))
0.1414213562373095
>>> D('0.02').sqrt()
Decimal('0.1414213562373095048801688724')
>>> math.exp(D(1))
2.718281828459045
>>> D(1).exp()
Decimal('2.718281828459045235360287471')

There's also no support for computing the other transcendental
functions with Decimals e.g. sin, cos etc. without rounding to float.
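The Decimal sum() failure above comes down to the working precision of the current context (28 significant digits by default), under which 1e50 + 1 rounds straight back to 1e50. A hedged sketch of how an fsum-like helper for Decimals could sidestep that by widening the precision locally -- the name dsum and the 60-digit precision are this example's own choices, not a stdlib API:

```python
from decimal import Decimal, localcontext

def dsum(items, prec=60):
    # Hypothetical helper: widen the working precision so intermediate
    # additions such as 1e50 + 1 stay exact instead of being rounded
    # away at the default 28 significant digits.
    with localcontext() as ctx:
        ctx.prec = prec
        return +sum(items, Decimal(0))  # unary + rounds to ctx.prec

# The case that defeats the default context:
assert sum([Decimal('1e50'), Decimal('1'), Decimal('-1e50')]) != 1
assert dsum([Decimal('1e50'), Decimal('1'), Decimal('-1e50')]) == 1
```

A production version would have to size the precision from the inputs rather than hard-code it, which is roughly the kind of bookkeeping math.fsum does for binary floats.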
The math module also fails to find exact results when possible:

>>> a = 10 ** 20 + 1
>>> a
100000000000000000001
>>> math.sqrt(a**2) == a
False
>>> b = F(2, 3)
>>> math.sqrt(b ** 2) == b
False

These ones could just get fixed in the Fractions module:

>>> F(4, 9) ** F(1, 2)
0.6666666666666666
>>> F(2, 3) ** 2
Fraction(4, 9)
>>> a
100000000000000000001
>>> (a ** 2) ** F(1, 2) == a
False

Do you think that any of these things should also be changed so that
there can be e.g. one sqrt() function that does the right thing for
all types?

One way to unify these things is with a load of new dunder methods so
that each type can implement its own response to the standard function.
Another is to have special modules that deal with different things and
e.g. a function for summing Rationals and another for summing inexact
types and so on. Another possibility is that you have decimal functions
in the decimal module and fraction functions in the fractions module
and so on and don't use any duck-typing (except in coercion). That may
seem strange for Python but it's worth remembering that most of the
best numerical code that has ever been written was written in languages
that *require* you to write separate code for each numeric type and
don't give any syntactic support for working with non-native types.

Oscar

From mertz at gnosis.cx  Tue Aug 20 20:14:03 2013
From: mertz at gnosis.cx (David Mertz)
Date: Tue, 20 Aug 2013 11:14:03 -0700
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

Oscar's improvement to my binsum() idea, using Stefan's excellent point
that we should just use int.__add__ on the numerators, is quite elegant.
I.e.:

def binsum(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in bins.items()])

Moreover, looking at Oscar's data, it looks like the improved binsum()
ALWAYS beats rationalsum()[*]

[*] OK, technically there are three cases where this isn't true:
d1000/n=100 and d10/n=1000 where it is about one percent slower (although
I think that is in the noise of timing it); and the "pathological" case
of calculating 'e', where no denominators repeat and where rationalsum()
beats everything else by a large margin at the asymptote (or maybe
imergesum() does, but it is because they behave the same here).

So two thoughts:

(1) Use the much simpler binsum(), which *does* accept an iterator, and
in non-pathological cases will also use moderate memory (i.e. N numbers
will fall into approximately log(N) different denominator classes).

(2) Call this specialized sum something like Fraction.sum(). While the
other functions work on, for example, bigint, the builtin sum() is
vastly better there. I presume a similar result would be true when
concrete optimizations of "Decimal.sum()" are discussed.

The generic case in Python allows for sum()'ing heterogeneous
collections of numbers. Putting specialized dunder methods into e.g.
Fraction.__sum__ is still going to fail in the general case (i.e. fall
back to __builtins__.sum()). The special case where you KNOW you are
summing a homogenous collection of a certain style of number is special
enough that spelling it Fraction.sum() or math.fsum() or Decimal.sum()
is more explicit and no harder.

I guess on a small point of symmetry, if the spellings above are chosen,
I'd like it also to be true that:

    math.fsum == float.sum

On Tue, Aug 20, 2013 at 9:39 AM, Oscar Benjamin wrote:
> [Oscar's message of 17:39, quoted above in full, snipped]

-- 
Keeping medicines from the bloodstreams of the
sick; food from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting advocates of
freedom in prisons. Intellectual property is to the 21st century what
the slave trade was to the 16th.

From oscar.j.benjamin at gmail.com  Wed Aug 21 00:26:23 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 20 Aug 2013 23:26:23 +0100
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

On 20 August 2013 19:14, David Mertz wrote:
> Oscar's improvement to my binsum() idea, using Stefan's excellent point
> that we should just use int.__add__ on the numerators, is quite elegant.
> I.e.:
>
> def binsum(iterable):
>     bins = defaultdict(int)
>     for num in iterable:
>         bins[num.denominator] += num.numerator
>     return mergesum([F(n, d) for d, n in bins.items()])
>
> Moreover, looking at Oscar's data, it looks like the improved binsum()
> ALWAYS beats rationalsum()[*]
>
> [*] OK, technically there are three cases where this isn't true:
> d1000/n=100 and d10/n=1000 where it is about one percent slower (although
> I think that is in the noise of timing it); and the "pathological" case
> of calculating 'e', where no denominators repeat and where rationalsum()
> beats everything else by a large margin at the asymptote (or maybe
> imergesum() does, but it is because they behave the same here).

There's nothing pathological about that case. It's very common to have
series where the denominators all differ.

What you said got me thinking, though, about why it underperformed in
that case and I realised that it's because binsum reorders the numbers
before passing them to mergesum and therefore cannot take advantage of
the natural ordering in the series.
So with all our powers combined I created binsortmergesum (bsms) and
it looks like this:

def bsms(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in sorted(bins.items())])

The new timing results show that bsms is slightly poorer than binsum in
the cases where it does well but does much better in the case where it
did badly. So here's the new timing results:

--------------
bigint
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 2      | 20     | 27     | 28     | 28     | 50     | 31
   100 | 21     | 73     | 208    | 93     | 102    | 126    | 96
  1000 | 212    | 506    | 2162   | 749    | 762    | 943    | 746
 10000 | 2283   | 5644   | 21975  | 7407   | 8503   | 9204   | 7463
100000 | 23940  | 63465  | 2.3e+5 | 75810  | 93666  | 91938  | 74516

--------------
float
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 435    | 417    | 430    | 377    | 426    | 393    | 373
   100 | 4437   | 4298   | 4549   | 1220   | 4351   | 1252   | 1231
  1000 | 45889  | 43373  | 45424  | 3520   | 44004  | 3743   | 3578
 10000 | 4.6e+5 | 4.7e+5 | 4.5e+5 | 23759  | 4.4e+5 | 25135  | 23601
100000 | 4.8e+6 | 4.4e+6 | 4.5e+6 | 2.2e+5 | 4.4e+6 | 2.3e+5 | 2.2e+5

--------------
d1000
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 339    | 296    | 304    | 497    | 322    | 528    | 505
   100 | 9747   | 3554   | 3872   | 5065   | 3655   | 5197   | 5155
  1000 | 4.2e+5 | 39259  | 42661  | 35199  | 37243  | 36470  | 35317
 10000 | 9.3e+6 | 4.1e+5 | 4.3e+5 | 80244  | 2.9e+5 | 81798  | 80658
100000 | 1e+8   | 4.2e+6 | 4.4e+6 | 2.6e+5 | 2.7e+6 | 6e+5   | 2.6e+5

--------------
d10
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 214    | 219    | 226    | 204    | 236    | 232    | 206
   100 | 2286   | 2245   | 2404   | 514    | 2353   | 563    | 519
  1000 | 23485  | 22776  | 24744  | 2130   | 23284  | 2347   | 2119
 10000 | 2.4e+5 | 2.3e+5 | 2.5e+5 | 18265  | 2.3e+5 | 20109  | 18098
100000 | 2.4e+6 | 2.3e+6 | 2.5e+6 | 1.8e+5 | 2.3e+6 | 2e+5   | 1.8e+5

--------------
e
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 239    | 230    | 239    | 387    | 247    | 413    | 386
   100 | 15221  | 4330   | 4794   | 14786  | 4483   | 6431   | 6361
  1000 | 1.6e+7 | 1.4e+6 | 1.5e+6 | 1.5e+7 | 1.5e+6 | 1.6e+6 | 1.8e+6

Oscar

From abarnert at yahoo.com  Wed Aug 21 00:47:27 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 20 Aug 2013 15:47:27 -0700
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

On Aug 20, 2013, at 10:30, Oscar Benjamin wrote:
> Do you think that any of these things should also be changed so that
> there can be e.g. one sqrt() function that does the right thing for
> all types?

So instead of math.sqrt and cmath.sqrt, we just have one function that
decides whether sqrt(-1) is 0+1j or an exception based on guessing
whether you wanted complex numbers? :)

I think in this case "explicit is better" beats "simple is better". I
think having multiple sqrt functions--and, yes, maybe multiple sum
functions--and making the user ask for the right one is appropriate.

From mertz at gnosis.cx  Wed Aug 21 02:16:09 2013
From: mertz at gnosis.cx (David Mertz)
Date: Tue, 20 Aug 2013 17:16:09 -0700
Subject: [Python-ideas] Yet another sum function (fractions.sum)
In-Reply-To: References: Message-ID:

On Tue, Aug 20, 2013 at 3:26 PM, Oscar Benjamin wrote:
> So with all our powers combined I created binsortmergesum (bsms) and
> it looks like this:
>
> def bsms(iterable):
>     bins = defaultdict(int)
>     for num in iterable:
>         bins[num.denominator] += num.numerator
>     return mergesum([F(n, d) for d, n in sorted(bins.items())])
>
I noticed that you didn't do a sort in your binsum(), but I didn't think it would make much difference. Actually, I thought the cost of the sort wouldn't be worth it at all. It's actually a little unclear to me exactly why it makes such a big difference as it does in the 'e' case. Still, I love that function. It's simple, elegant, and fast (when used on Fraction, of course; not everywhere). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From __peter__ at web.de Wed Aug 21 08:25:04 2013 From: __peter__ at web.de (Peter Otten) Date: Wed, 21 Aug 2013 08:25:04 +0200 Subject: [Python-ideas] Yet another sum function (fractions.sum) References: Message-ID: Oscar Benjamin wrote: > On 19 August 2013 11:09, Peter Otten > <__peter__ at web.de> wrote: >> sum(items, 0.0) would then automatically profit from the clever >> optimizations of math.fsum() etc. > > fsum() is about controlling accumulated rounding errors rather than > optimisation (although it may be faster I've never needed to check). As I understand it, you can "optimize" for precision, memory usage, code simplicity -- not just speed. > > I'd rather write > sum(items, Fraction) > than > sum(items, Fraction(0)) > and either way it's so close to > Fraction.sum(items) > > I understand what you mean about having a single function that does a > good job for all types and it goes into a much broader set of issues > around how Python handles different numeric types. It would be > possible in a backward compatible way provided Fraction.__sum__ falls > back on sum when it finds a non-Rational.
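For concreteness in the sum(items, Fraction(0)) discussion above: the built-in sum() already stays exact when given a Fraction start value, and silently switches to float arithmetic when given a float one. A quick illustration:

```python
from fractions import Fraction as F

items = [F(1, d) for d in range(1, 6)]  # 1 + 1/2 + 1/3 + 1/4 + 1/5

# A Fraction start value keeps every intermediate sum exact:
total = sum(items, F(0))
print(total)  # 137/60

# A float start value coerces the whole computation to float:
print(type(sum(items, 0.0)))  # <class 'float'>
```

This is the behaviour that makes the choice of start value (or a per-type sum function) matter in the first place.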
> > There are lots of other areas in the stdlib where a similar sort of > thinking could apply e.g.: > >>>> import math >>>> from decimal import Decimal as D >>>> from fractions import Fraction as F > > There's currently no fsum equivalent for Decimals even though a pure > Python implementation could have reasonable performance: > >>>> math.fsum([0.1, 0.2, 0.3]) == 0.6 > True >>>> math.fsum([D('0.1'), D('0.2'), D('0.3')]) == D('0.6') > False >>>> D(math.fsum([D('0.1'), D('0.2'), D('0.3')])) == D('0.6') > False > > sum() would do better in the above but then it fails in other situations: > >>>> sum([D('1e50'), D('1'), D('-1e50')]) == 1 > False >>>> math.fsum([D('1e50'), D('1'), D('-1e50')]) == 1 > True > > The math module rounds everything to float losing precision even if > better routines are available: > >>>> math.sqrt(D('0.02')) > 0.1414213562373095 >>>> D('0.02').sqrt() > Decimal('0.1414213562373095048801688724') >>>> math.exp(D(1)) > 2.718281828459045 >>>> D(1).exp() > Decimal('2.718281828459045235360287471') > > There's also no support for computing the other transcendental > functions with Decimals e.g. sin, cos etc. without rounding to float. > > The math module also fails to find exact results when possible: > >>>> a = 10 ** 20 + 1 >>>> a > 100000000000000000001 >>>> math.sqrt(a**2) == a > False >>>> b = F(2, 3) >>>> math.sqrt(b ** 2) == b > False > > These ones could just get fixed in the Fractions module: > >>>> F(4, 9) ** F(1, 2) > 0.6666666666666666 >>>> F(2, 3) ** 2 > Fraction(4, 9) >>>> a > 100000000000000000001 >>>> (a ** 2) ** F(1, 2) == a > False > > Do you think that any of these things should also be changed so that > there can be e.g. one sqrt() function that does the right thing for > all types? At the time I only thought about sum(), but yes, for every operation that has one "best" implementation per class there should be a uniform way to make these implementations available. 
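The fsum()/Decimal behaviour quoted above is easy to reproduce on any recent CPython:

```python
import math
from decimal import Decimal as D

# fsum corrects the accumulated float rounding error that plain sum keeps:
print(math.fsum([0.1, 0.2, 0.3]) == 0.6)  # True
print(sum([0.1, 0.2, 0.3]) == 0.6)        # False

# But fsum coerces Decimals to float on the way in, so the exact
# decimal result is lost:
print(math.fsum([D('0.1'), D('0.2'), D('0.3')]) == D('0.6'))  # False
```

This is exactly the "rounds everything to float" complaint: fsum is accurate for binary floats, but there is no equivalent that respects Decimal's radix.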
> One way to unify these things is with a load of new dunder > methods so that each type can implement its own response to the > standard function. Another is to have special modules that deal with > different things and e.g. a function for summing Rationals and another > for summing inexact types and so on. > > Another possibility is that you have decimal functions in the decimal > module and fraction functions in the fractions module and so on and > don't use any duck-typing (except in coercion). That may seem strange > for Python but it's worth remembering that most of the best numerical > code that has ever been written was written in languages that > *require* you to write separate code for each numeric type and don't > give any syntactic support for working with non-native types. Classmethods would be yet another option float.sum(numbers) or complex.sqrt(a) don't look bad, though I'm not sure what the right approach for int.sqrt() would be... From taleinat at gmail.com Wed Aug 21 10:20:33 2013 From: taleinat at gmail.com (Tal Einat) Date: Wed, 21 Aug 2013 11:20:33 +0300 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: On Wed, Aug 21, 2013 at 9:25 AM, Peter Otten <__peter__ at web.de> wrote: > At the time I only thought about sum(), but yes, for every operation that > has one "best" implementation per class there should be a uniform way to > make these implementations available. > > Oscar Benjamin wrote: >> One way to unify these things is with a load of new dunder >> methods so that each type can implement its own response to the >> standard function. Another is to have special modules that deal with >> different things and e.g. a function for summing Rationals and another >> for summing inexact types and so on. >> >> Another possibility is that you have decimal functions in the decimal >> module and fraction functions in the fractions module and so on and >> don't use any duck-typing (except in coercion). 
That may seem strange >> for Python but it's worth remembering that most of the best numerical >> code that has ever been written was written in languages that >> *require* you to write separate code for each numeric type and don't >> give any syntactic support for working with non-native types. > > Classmethods would be yet another option > > float.sum(numbers) > > or > > complex.sqrt(a) > > don't look bad, though I'm not sure what the right approach for int.sqrt() > would be... I must say, this sounds like a classic case for a cookbook recipe, especially considering that the implementations are simple and elegant Python code. - Tal From oscar.j.benjamin at gmail.com Wed Aug 21 12:58:09 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 21 Aug 2013 11:58:09 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: On 21 August 2013 01:16, David Mertz wrote: > On Tue, Aug 20, 2013 at 3:26 PM, Oscar Benjamin > wrote: >> >> So with all our powers combined I created binsortmergesum (bsms) and >> it looks like this: >> >> def bsms(iterable): >> bins = defaultdict(int) >> for num in iterable: >> bins[num.denominator] += num.numerator >> return mergesum([F(n, d) for d, n in sorted(bins.items())]) > > > Wow. I noticed that you didn't do a sort in your binsum(), but I didn't > think it would make much difference. And now I see that Stefan already proposed exactly this function as binsum2. Sorry Stefan, I missed that from your initial post! > Actually, I thought the cost of the > sort wouldn't be worth it at all. It's actually a little unclear to me > exactly why it makes such a big difference as it does in the 'e' case. It's because of the natural ordering in the series. Summing in a random order means that you mix up giant denominators with tiny ones for pretty much every addition step. This means massive gcd computations for every addition. 
Mergesum sums the small denominators with small ones avoiding the extra large denominators. This is particularly important for this series since the denominators are growing super-exponentially. Oscar From oscar.j.benjamin at gmail.com Wed Aug 21 13:27:58 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 21 Aug 2013 12:27:58 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: On 21 August 2013 07:25, Peter Otten <__peter__ at web.de> wrote: > Oscar Benjamin wrote: > >> On 19 August 2013 11:09, Peter Otten >> <__peter__ at web.de> wrote: > >>> sum(items, 0.0) would then automatically profit from the clever >>> optimizations of math.fsum() etc. >> >> fsum() is about controlling accumulated rounding errors rather than >> optimisation (although it may be faster I've never needed to check). > > As I understand it, you can "optimize" for precision, memory usage, code > simplicity -- not just speed. Sorry, you're right. >> Do you think that any of these things should also be changed so that >> there can be e.g. one sqrt() function that does the right thing for >> all types? > > At the time I only thought about sum(), but yes, for every operation that > has one "best" implementation per class there should be a uniform way to > make these implementations available. [snip] > > Classmethods would be yet another option > > float.sum(numbers) > > or > > complex.sqrt(a) > > don't look bad, though I'm not sure what the right approach for int.sqrt() > would be... It's not necessarily per-class.
For sqrt we can write functions targeted at the Rational ABC that will work for anything that fits with that part of the numeric tower (including int) e.g.:

import math

def sqrt_ceil(y):
    xguess = int(math.floor(math.sqrt(y)))
    while xguess ** 2 < y:  # This can be improved
        xguess += 1
    return xguess

def sqrt_floor(y):
    x = sqrt_ceil(y)
    if x ** 2 != y:
        x -= 1
    return x

def sqrt_exact(y):
    if y.denominator != 1:
        return type(y)(sqrt_exact(y.numerator), sqrt_exact(y.denominator))
    x = sqrt_ceil(y)
    if x ** 2 == y:
        return x
    else:
        raise ValueError('No exact rational root')

I think it's reasonable to have things like that in the fractions module since that's where the stdlib implements its concrete Rational type. A similar thing is possible with fsum. Alternative algorithms can achieve the same effect for arbitrary radix (e.g. decimal) numeric types under the appropriate rounding modes so it would be possible to make it do the right thing for decimals while keeping a fast-path for sum. This would lead to a significant performance regression for anyone who was actually hoping that their decimals would get coerced to floats though. So maybe a function in the decimal module could provide the fully general algorithm that works well for sensibly rounded instances of the Real ABC. Oscar From random832 at fastmail.us Wed Aug 21 16:19:12 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 21 Aug 2013 10:19:12 -0400 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: Message-ID: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> On Tue, Aug 20, 2013, at 18:47, Andrew Barnert wrote: > So instead of math.sqrt and cmath.sqrt, we just have one function that > decides whether sqrt(-1) is 0+1j or an exception based on guessing > whether you wanted complex numbers? :) Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers.
If you can't handle complex numbers, you'll get an exception down the line anyway. From stephen at xemacs.org Wed Aug 21 16:45:25 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 21 Aug 2013 23:45:25 +0900 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> Message-ID: <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> random832 at fastmail.us writes: > Why exactly is an exception reasonable? If you don't want complex > numbers, don't take square roots of negative numbers. If you can't > handle complex numbers, you'll get an exception down the line anyway. That's precisely why you want an exception: to terminate the computation as soon as the unexpected condition can be detected. From eric at trueblade.com Wed Aug 21 17:22:03 2013 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 21 Aug 2013 11:22:03 -0400 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5214DB1B.4050807@trueblade.com> On 8/21/2013 10:45 AM, Stephen J. Turnbull wrote: > random832 at fastmail.us writes: > > > Why exactly is an exception reasonable? If you don't want complex > > numbers, don't take square roots of negative numbers. If you can't > > handle complex numbers, you'll get an exception down the line anyway. > > That's precisely why you want an exception: to terminate the > computation as soon as the unexpected condition can be detected. Exactly. Here's an extreme case: Say I do an operation, unexpectedly get a negative number, take the square root, pickle it, and store it in a file. 6 months from now I read the pickle and perform some other operation, and boom, I get an exception. 
I'd much prefer getting the exception today than at a later date. -- Eric. From random832 at fastmail.us Wed Aug 21 18:19:07 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 21 Aug 2013 12:19:07 -0400 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> On Wed, Aug 21, 2013, at 10:45, Stephen J. Turnbull wrote: > random832 at fastmail.us writes: > > > Why exactly is an exception reasonable? If you don't want complex > > numbers, don't take square roots of negative numbers. If you can't > > handle complex numbers, you'll get an exception down the line anyway. > > That's precisely why you want an exception: to terminate the > computation as soon as the unexpected condition can be detected. Isn't that unpythonic? I mean, it's like doing type checking to make sure the object you're passed doesn't implement only half of the duck type you want (or none of a duck type the consumer of a list you're adding it to wants, etc), and we just had a discussion about that (might have been on python-list). Also, not wanting complex numbers seems to me like not wanting negative numbers, but we don't have a positive-subtract function that raises an exception if a < b.

References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <5214DB1B.4050807@trueblade.com> Message-ID: <1377102113.25586.12505301.0DA8DE02@webmail.messagingengine.com> On Wed, Aug 21, 2013, at 11:22, Eric V. Smith wrote: > Exactly. Here's an extreme case: Say I do an operation, unexpectedly get > a negative number, Why can that operation return a negative number instead of raising an exception in the situation where it would return a negative number?
I mean, if you want to get your exceptions as early as possible, however unpythonic that may be, that's the next logical step. Why do we need an implicit* wall between real and complex numbers, but not between positive and negative numbers, or integers and floats? *as in "explicit is better than", as in check your results yourself. From oscar.j.benjamin at gmail.com Wed Aug 21 18:31:30 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 21 Aug 2013 17:31:30 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <1377102113.25586.12505301.0DA8DE02@webmail.messagingengine.com> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <5214DB1B.4050807@trueblade.com> <1377102113.25586.12505301.0DA8DE02@webmail.messagingengine.com> Message-ID: On 21 August 2013 17:21, random832 at fastmail.us wrote: > On Wed, Aug 21, 2013, at 11:22, Eric V. Smith wrote: >> Exactly. Here's an extreme case: Say I do an operation, unexpectedly get >> a negative number, > > Why can that operation return a negative number instead of raising an > exception in the situation where it would return a negative number? I > mean, if you want to get your exceptions as early as possible, however > unpythonic that may be, that's the next logical step. Why do we need an > implicit* wall between real and complex numbers, but not between > positive and negative numbers, or integers and floats? Because people want to use positive and negative integers and floats much more often than they want to use complex numbers. > *as in "explicit is better than", as in check your results yourself. I would say that using cmath.sqrt() is explicitly stating that you're interested in using complex numbers.
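A minimal illustration of the explicit math/cmath split being defended here:

```python
import cmath
import math

print(math.sqrt(25))    # 5.0 -- math.sqrt stays in the real domain
print(cmath.sqrt(-25))  # 5j  -- complex results are an explicit opt-in
try:
    math.sqrt(-25)
except ValueError as exc:
    print(exc)          # fails fast on negatives instead of going complex
```

Importing cmath (or using an imaginary literal) is the "explicit statement" that complex results are wanted; math.sqrt raising ValueError is the early termination argued for above.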
Oscar From abarnert at yahoo.com Wed Aug 21 18:43:03 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 21 Aug 2013 09:43:03 -0700 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> Message-ID: On Aug 21, 2013, at 9:19, random832 at fastmail.us wrote: > On Wed, Aug 21, 2013, at 10:45, Stephen J. Turnbull wrote: >> random832 at fastmail.us writes: >> >>> Why exactly is an exception reasonable? If you don't want complex >>> numbers, don't take square roots of negative numbers. If you can't >>> handle complex numbers, you'll get an exception down the line anyway. >> >> That's precisely why you want an exception: to terminate the >> computation as soon as the unexpected condition can be detected. > > Isn't that unpythonic? I mean, it's like doing type checking No it isn't like type checking. That's the whole point of EAFP--and really, the whole point of exceptions. You write the natural code you would write without checking any preconditions, and then you wrap it in a try/except to deal with any exceptions that arise when the preconditions have turned out to be invalid. You could make a parallel argument about dict.__getitem__, which I think is more obviously wrong. Why should it return an exception instead of returning some "undefined" value like JavaScript? For that matter, why should any function raise an exception; even open could be changed to return a file object that can be checked for validity instead of raising on file not found, like C returning a -1 fd (or NULL FILE*). There's no simple rule that says which things should be pre-checked, which should be handled with exceptions, and which should allow you to propagate errors as far as possible with out-of-range values. 
That's why language design is an art, with tradeoffs requiring judgment calls. > Also, not wanting complex numbers seems to me like not wanting negative > numbers, but we don't have a positive-subtract function that raises an > exception if a < b.

References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> Message-ID: <5214F37F.7050800@mrabarnett.plus.com> On 21/08/2013 15:19, random832 at fastmail.us wrote: > On Tue, Aug 20, 2013, at 18:47, Andrew Barnert wrote: >> So instead of math.sqrt and cmath.sqrt, we just have one function that >> decides whether sqrt(-1) is 0+1j or an exception based on guessing >> whether you wanted complex numbers? :) > > Why exactly is an exception reasonable? If you don't want complex > numbers, don't take square roots of negative numbers.
If you can't >> handle complex numbers, you'll get an exception down the line anyway. >> > I think a simpler rule might be: if the argument is a float then the > result is a float; if the argument is complex then the result is > complex. I like the fact that math.sqrt() raises an error for negative numbers. That error message has been far more useful to me than cmath.sqrt() ever has. > If that's the case, then do we really need to keep them separate, > having math and cmath? And what if the argument's an int? Does the int duck-type as a float or a complex? Or should it raise an error if the root is not an integer? I feel like I'd end up writing: math.sqrt(a + 0j) when I want to get complex roots. Definitely cmath.sqrt(a) is better. The problem I have with sqrt(int) is that it doesn't raise an error and just produces an inaccurate result. Fixing that for the cases where exact results are available could be a big performance regression for some people. Anyway I'd rather use an alternative sqrt_exact() that wasn't expected to be fast but was guaranteed to be accurate or to raise an error. Oscar From stephen at xemacs.org Wed Aug 21 19:32:06 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 22 Aug 2013 02:32:06 +0900 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> Message-ID: <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> random832 at fastmail.us writes: > On Wed, Aug 21, 2013, at 10:45, Stephen J. Turnbull wrote: > > random832 at fastmail.us writes: > > > > > Why exactly is an exception reasonable? If you don't want complex > > > numbers, don't take square roots of negative numbers. If you can't > > > handle complex numbers, you'll get an exception down the line anyway. 
> > That's precisely why you want an exception: to terminate the > > computation as soon as the unexpected condition can be detected. > > Isn't that unpythonic? I mean, it's like doing type checking No, it's not, and it's not. In Python, turning 1 into 1 + 0j is a type conversion[1]:

>>> 1 is 1 + 0j
False
>>> 1 is 1
True

Adding 1 + 0j to 1 requires type checking of the arguments, and a decision about the type of the result. The same is true here; behavior on taking square root is part of the *implementation* of a type, and a negativity check needs to be made before taking the square root. This is *not* the same thing as EAFP vs. LBYL in application programming. Rather, this is the low-level implementation that makes EAFP safe in Python. > Also, not wanting complex numbers seems to me like not wanting > negative numbers, but we don't have a positive-subtract function > that raises an exception if a < b.

From: Jon Foster Date: Wed, 21 Aug 2013 Subject: [Python-ideas] ipaddress: Interface inheriting from Address Message-ID: <52154558.4080102@jon-foster.co.uk>

Hi all, I'd like to propose changing ipaddress.IPv[46]Interface to not inherit from IPv[46]Address. (And I hope I've got the right mailing list?) The "ipaddress" module is currently in the standard library on a provisional basis, so "backwards incompatible changes may occur". But obviously there needs to be a very good reason. I think there is. The problem is that IPv[46]Interface can't currently be passed to a function that expects an IPv[46]Address, because it redefines str(), ".exploded", ".compressed", "==" and "<" in an incompatible way.
E.g. suppose we have:

>>> from ipaddress import IPv4Address, IPv4Interface
>>> my_dict = {IPv4Address('1.2.3.4'): 'Hello'}

Obviously lookup with an IPv4Address works:

>>> addr = IPv4Address('1.2.3.4')
>>> print(my_dict.get(addr))
Hello

But with the IPv4Interface subclass, the lookup doesn't work:

>>> intf = IPv4Interface('1.2.3.4/24')
>>> print(my_dict.get(intf))
None

And that's because equality has been redefined:

>>> IPv4Address('1.2.3.4') == IPv4Interface('1.2.3.4/24')
False

When doing inheritance the usual expectation is that "Functions that use references to base classes must be able to use objects of derived classes without knowing it". This is called the "Liskov Substitution Principle". And IPv4Interface isn't following that principle. More informally, since IPv4Interface inherits from IPv4Address you'd expect that an IPv4Interface "is a" IPv4Address, but it really isn't. It's really a "has a" relationship, which is more commonly done by giving IPv4Interface a property that's an IPv4Address. This problem can't be solved easily by just redefining "==". If we let IPv4Address('1.2.3.4') == IPv4Interface('1.2.3.4/24'), and IPv4Address('1.2.3.4') == IPv4Interface('1.2.3.4/16'), then to keep the normal transitive behaviour of equals we'd have to let IPv4Interface('1.2.3.4/16') == IPv4Interface('1.2.3.4/24'), and that seems wrong. Where people actually know they're comparing IPv4Interface objects they will really want to compare both the IP address and netmask. My proposed solution is just to change IPv4Interface to not inherit from IPv4Address. IPv4Interface already has a "ip" property that gives you the IP address as a proper IPv4Address object. So code written for Python 3.4, using the "ip" property, would be backward-compatible with Python 3.3. And people could obviously start writing code that way today, to be compatible with both Python 3.3 and Python 3.4. I.e.
people should write code like this:

>>> extracted_address = intf.ip
>>> extracted_address
IPv4Address('1.2.3.4')
>>> print(my_dict.get(extracted_address))
Hello

I'm proposing that IPv4Interface would not have a (public) base class, and the only remaining properties and methods on IPv4Interface would be:

__init__()
ip
network
compressed
exploded
with_prefixlen
with_netmask
with_hostmask
__eq__() / __ne__()
__hash__()
__lt__(), __gt__() etc
__str__()

And all these properties and methods would do exactly what they do today. I.e. IPv4Interface becomes just a container for the "ip" and "network" fields, plus the parsing code in __init__() and a few formatting and comparison functions. This is a lot simpler to understand than the current design. With this design, IPv4Interface wouldn't be comparable to IPv4Address or IPv4Network objects. If the user wants to do a comparison like that, they can get the .ip or .network properties and compare that. Explicit is better than implicit. If there is interest in this idea, I'll try to put together a patch next week. Thanks to anyone who's read this far. What do you think? Kind regards, Jon From steve at pearwood.info Thu Aug 22 03:48:35 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 22 Aug 2013 11:48:35 +1000 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52156DF3.2050204@pearwood.info> On 22/08/13 03:32, Stephen J. Turnbull wrote: > random832 at fastmail.us writes: > > Isn't that unpythonic? I mean, it's like doing type checking > > No, it's not, and it's not.
In Python, turning 1 into 1 + 0j is a > type conversion[1]: > >>>> 1 is 1 + 0j > False >>>> 1 is 1 > True Steve, not only is that a dubious, implementation-dependent test, as you recognise in the footnote, but it doesn't even demonstrate the fact you are trying to demonstrate. To wit:

py> [] is []
False

and therefore [] and [] are different types... not. The test I think you want is:

py> type(1) is type(1+0j)
False

As far as whether some hypothetical sqrt method should return a complex number or raise an exception, I'd like to point out that as of Python 3.x, math.sqrt is now the odd-man-out:

py> (-25)**0.5
(3.061515884555943e-16+5j)
py> cmath.sqrt(-25)
5j
py> math.sqrt(-25)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

-- Steven From dickinsm at gmail.com Thu Aug 22 05:02:22 2013 From: dickinsm at gmail.com (Mark Dickinson) Date: Thu, 22 Aug 2013 08:32:22 +0530 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <52156DF3.2050204@pearwood.info> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: On Thu, Aug 22, 2013 at 7:18 AM, Steven D'Aprano wrote: > As far as whether some hypothetical sqrt method should return a complex > number or raise an exception, I'd like to point out that as of Python 3.x, > math.sqrt is now the odd-man-out: > If you restrict your attention to square root operations, then yes. But math.sqrt is behaving just like math.log, math.acos, math.asin, math.atanh, etc. in raising an exception when it could have been returning a complex number.
In that sense, I'd say that it's rather the ** operation that's the odd man out, in that there are very few other ways to get complex numbers with no obvious explicit request for complex numbers (either in the form of a cmath import or use of imaginary literals, or use of the 'complex' constructor, etc.). I've never been particularly comfortable with this aspect of PEP 3141. While I'm ranting: the other part of PEP 3141 that was wrong IMO is the decision to return ints from math.floor and math.ceil (e.g., for floats and Decimal instances). On floats, floor and ceil are simple, fundamental and inexpensive operations. But the conversion to int adds unnecessary overhead to those simple operations.

iwasawa:~ mdickinson$ /opt/local/bin/python2.7 -m timeit -s "x = 1e300; import math" "y = math.floor(x)"
10000000 loops, best of 3: 0.107 usec per loop
iwasawa:~ mdickinson$ /opt/local/bin/python3.3 -m timeit -s "x = 1e300; import math" "y = math.floor(x)"
1000000 loops, best of 3: 0.574 usec per loop

-- Mark From stephen at xemacs.org Thu Aug 22 05:13:29 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 22 Aug 2013 12:13:29 +0900 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: <52156DF3.2050204@pearwood.info> References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: <878uzufmee.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > The test I think you want is: > > py> type(1) is type(1+0j) > False Thank you for the correction. For some reason, I took 'random's emphasis on duck-typing more seriously than I should have.
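Steven's corrected test, and the distinction it rests on, can be checked directly:

```python
# == coerces across the numeric tower; identity and type do not.
print(1 == 1 + 0j)              # True  -- cross-type numeric equality
print(type(1) is type(1 + 0j))  # False -- int and complex are distinct types

# Steven's counter-example: identity says nothing about type.
a, b = [], []
print(a is b)                   # False -- distinct objects, same type
```

This is why `1 is 1 + 0j` demonstrates nothing about types, while `type(1) is type(1 + 0j)` does.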
> As far as whether some hypothetical sqrt method should return a > complex number or raise an exception, I'd like to point out that as > of Python 3.x, math.sqrt is now the odd-man-out: Sure, but from a pure mathematics standpoint, the transcendental functions are inherently complex functions. So it doesn't surprise *me* that 25 ** 0.5 returns a complex number. And since the reals can be naturally embedded in the complex domain, *I'm* not surprised that cmath.sqrt(25) returns a complex result, rather than raising a domain error. Speaking only for myself about "what's surprising". Nevertheless, clearly it makes mathematical sense to distinguish real numbers from complex numbers. Complex analysis goes back before Riemann, yet (smart) mathematicians have continued doing real analysis to this day. So having real and complex math separated in Python is not nonsense; it's a design choice, and it's not clear to me that "odd-man-out" indicates that the choice has really been made -- there are reasons why it might be so that are compatible with both ways of thinking about the issues. From ncoghlan at gmail.com Thu Aug 22 06:09:32 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Aug 2013 14:09:32 +1000 Subject: [Python-ideas] ipaddress: Interface inheriting from Address In-Reply-To: <52154558.4080102@jon-foster.co.uk> References: <52154558.4080102@jon-foster.co.uk> Message-ID: I'm at least mildly in favour of the idea. As you say, containment better reflects the relationship than inheritance. However, it's probably worth checking the ipaddress PEP and related discussions. I seem to recall we considered this approach, but I don't recall *why* we didn't use it. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Thu Aug 22 06:42:04 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 21 Aug 2013 21:42:04 -0700 Subject: [Python-ideas] ipaddress: Interface inheriting from Address In-Reply-To: <52154558.4080102@jon-foster.co.uk> References: <52154558.4080102@jon-foster.co.uk> Message-ID: <2AB4748C-DE07-4286-8ACA-B9DB83D8E671@yahoo.com> On Aug 21, 2013, at 15:55, Jon Foster wrote: > And that's because equality has been redefined: >>>> IPv4Address('1.2.3.4') == IPv4Interface('1.2.3.4/24') > False But why should you expect these to be equal? The interface is-a address, but isn't _that_ address. Of course an interface isn't equal to _any_ address instance (that isn't also an interface instance), but so what? That's actually _typical_ of class hierarchies, not unusual. Imagine that we had Range with start and stop, and SteppedRange with start, stop, and step. Any SteppedRange with a step != 1 would be unequal to any (non-SteppedRange) Range, but it's still usable as a Range. This is exactly the same situation here. In fact, your example shows that you can store addresses and interfaces together, and use them all as addresses, rather than the opposite--in other words, it fits the LSP perfectly. All that being said, it does feel weird that an interface is an address. While it may satisfy the syntax of an address, it doesn't work behaviorally in most typical uses. For example, socket.connect((str(addr), port)) is the most obvious useful thing to do with an address, and it's not going to work if addr is an interface... So, I think you may have found a real problem, even if your argument for why it's a problem isn't that compelling. Meanwhile, PEP 3144 explains that "ipaddr makes extensive use of inheritance to avoid code duplication...", but that bit doesn't apply here, and I don't see any other justification for the inheritance here in the PEP, docs, or source. 
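[Editor's note: Andrew's Range/SteppedRange analogy can be made concrete. These are hypothetical classes, purely to illustrate the LSP point, not anything from ipaddress:]

```python
class Range:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    @property
    def step(self):
        return 1

    def __eq__(self, other):
        if not isinstance(other, Range):
            return NotImplemented
        return (self.start, self.stop, self.step) == (other.start, other.stop, other.step)


class SteppedRange(Range):
    def __init__(self, start, stop, step):
        super().__init__(start, stop)
        self._step = step

    @property
    def step(self):
        return self._step
```

A SteppedRange with step != 1 compares unequal to every plain Range, yet substitutes for one everywhere -- the same relationship Andrew describes between IPv4Interface and IPv4Address.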
So I think we have to dig through the discussions (which may mean going back to the predecessor library's discussions) to see what the intended benefit is. From rosuav at gmail.com Thu Aug 22 09:11:52 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 22 Aug 2013 17:11:52 +1000 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: On Thu, Aug 22, 2013 at 1:02 PM, Mark Dickinson wrote: > In that sense, I'd say that it's rather the ** operation that's the odd man > out, in that there are very few other ways to get complex numbers with no > obvious explicit request for complex numbers 1/2 == 0.5 # int/int --> float (-4.0)**0.5 == (-4.0)**0.5 == 1.2246467991473532e-16+2j # float**float --> complex (though why it doesn't equal 2j exactly, I don't know - surely there's enough precision in these floats to calculate that?) Personally, I don't like the automated casting of int to float, since int covers arbitrary range and float will quietly lose precision; if you flick to float early in a calculation, you may be suddenly surprised by the inaccuracy at the end. But that's the decision Python's made, so it makes good sense for the upcasting of float to complex to work the same way - especially since you can't lose precision by going from float to complex (AFAIK). 
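[Editor's note: Chris's three observations pin down as follows, assuming Python 3 semantics:]

```python
quotient = 1 / 2        # int / int -> float
root = (-4.0) ** 0.5    # float ** float -> complex when the base is negative
widened = complex(0.1)  # float -> complex: the real part is the very same
                        # float, so nothing is lost in the widening
```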
ChrisA From oscar.j.benjamin at gmail.com Thu Aug 22 12:58:30 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 22 Aug 2013 11:58:30 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: On Aug 22, 2013 8:12 AM, "Chris Angelico" wrote: > > On Thu, Aug 22, 2013 at 1:02 PM, Mark Dickinson wrote: > > In that sense, I'd say that it's rather the ** operation that's the odd man > > out, in that there are very few other ways to get complex numbers with no > > obvious explicit request for complex numbers > > 1/2 == 0.5 # int/int --> float > (-4.0)**0.5 == (-4.0)**0.5 == 1.2246467991473532e-16+2j # > float**float --> complex > > (though why it doesn't equal 2j exactly, I don't know - surely there's > enough precision in these floats to calculate that?) __pow__ is less accurate than sqrt: $ py -3.3 Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> 4 ** .5 2.0 >>> (-4) ** .5 (1.2246467991473532e-16+2j) >>> 1j * (4 ** .5) 2j >>> import cmath >>> cmath.sqrt(-4) 2j >>> cmath.sqrt(-4.) 2j sqrt and __pow__ for negative numbers use very different algorithms so they give different results. IIUC (4.).__pow__(.5) is handled directly by the FPU and (-4.).__pow__(.5) is handled in the CPython code base. Clearly, though, it would be better if (-4)**.5 would be modified to return 1j*(4**.5). > Personally, I don't like the automated casting of int to float, since > int covers arbitrary range and float will quietly lose precision; if > you flick to float early in a calculation, you may be suddenly > surprised by the inaccuracy at the end. 
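[Editor's note: Oscar's suggested fix -- route a negative real base through the positive square root -- is easy to sketch as a helper. The name `signed_sqrt` is hypothetical, and this is not what CPython's `**` does today:]

```python
def signed_sqrt(x):
    """Square root of a real number, exact on the imaginary axis for x < 0."""
    if x >= 0:
        return x ** 0.5
    # 1j * (4 ** 0.5) avoids the tiny spurious real part that the
    # polar-form computation behind (-4) ** 0.5 produces.
    return 1j * (-x) ** 0.5
```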
But that's the decision > Python's made, so it makes good sense for the upcasting of float to > complex to work the same way - especially since you can't lose > precision by going from float to complex (AFAIK) If the int is out of range you'll get an error: >>> 1.0 * (10 ** 1000) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: long int too large to convert to float An implicit float conversion does not quietly lose precision. All subsequent values are usually infected and get turned into floats. The implicit float conversion only happens in two cases: 1) You mixed a float into your otherwise exact integer computation. 2) Division. For case 1) the answer is just don't do this. It's usually possible to spot when it's happened because your end result is a float. The exceptions to this are if you're doing something like: while m < n: # do stuff m = ... # Somehow a float can occasionally infect m return n # But we don't return m I think it would be good to be able to know that implicit conversions wouldn't occur but I also dislike putting spurious decimal points everywhere to ensure floating point computation e.g.: x = x / 2. # Unnecessary in Python 3 Note that for loss of accuracy the implicit float conversion from Python 3's true division gives a relative error of ~1e-16 which usually corresponds to a similarly small absolute error. Python 2's floor division has an absolute error of order 1 which is usually a massive error for people who care about accuracy. For me personally though I think that a conversion context would be useful e.g.: with everything_is_exact_or_I_get_an_error(): # compute stuff That would save me *a lot* of testing/debugging. 
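[Editor's note: both halves of Oscar's point are directly observable (Python 3 assumed): an out-of-range implicit conversion fails loudly, and a single stray float infects every later result:]

```python
try:
    1.0 * (10 ** 1000)  # implicit int -> float conversion overflows
    overflowed = False
except OverflowError:
    overflowed = True

total = 1 + 2 + 3.0 + 4  # case 1: one float turns the whole result into a float
half = 4 / 2             # case 2: true division yields a float even when exact
```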
Oscar From rosuav at gmail.com Thu Aug 22 13:11:25 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 22 Aug 2013 21:11:25 +1000 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: On Thu, Aug 22, 2013 at 8:58 PM, Oscar Benjamin wrote: > If the int is out of range you'll get an error: >>>> 1.0 * (10 ** 1000) > Traceback (most recent call last): > File "", line 1, in > OverflowError: long int too large to convert to float > > An implicit float conversion does not quietly lose precision. All > subsequent values are usually infected and get turned into floats. The > implicit float conversion only happens in two cases: > 1) You mixed a float into your otherwise exact integer computation. > 2) Division. >>> (1<<64)*3//2+10 27670116110564327434 >>> (1<<64)*3/2+10 2.7670116110564327e+19 It's not so large that it cannot be converted to floating point, but it's above the point at which floats are accurate to the integer. Therefore precision has been lost. Is it obvious from the second line of code that this will be the case? Obviously if you "mix in" a float, then it'll infect the calculations. But the effect of the / operator is less obvious. Fortunately it's consistent. It will ALWAYS return a float. However, I do see this as "implicit" conversion. Anyway, this is somewhat off-topic for this thread. 
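[Editor's note: Chris's example, restated as explicit comparisons -- the float result is off by exactly the 10 that rounding absorbed, because adjacent floats at this magnitude are 4096 apart:]

```python
exact = (1 << 64) * 3 // 2 + 10  # stays an int: 27670116110564327434
fuzzy = (1 << 64) * 3 / 2 + 10   # / produces a float; the + 10 vanishes
                                 # into the 4096-wide gap between floats
```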
ChrisA From oscar.j.benjamin at gmail.com Thu Aug 22 13:49:52 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 22 Aug 2013 12:49:52 +0100 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: On 22 August 2013 12:11, Chris Angelico wrote: > On Thu, Aug 22, 2013 at 8:58 PM, Oscar Benjamin > wrote: >> If the int is out of range you'll get an error: >>>>> 1.0 * (10 ** 1000) >> Traceback (most recent call last): >> File "", line 1, in >> OverflowError: long int too large to convert to float >> >> An implicit float conversion does not quietly lose precision. All >> subsequent values are usually infected and get turned into floats. The >> implicit float conversion only happens in two cases: >> 1) You mixed a float into your otherwise exact integer computation. >> 2) Division. > >>>> (1<<64)*3//2+10 > 27670116110564327434 >>>> (1<<64)*3/2+10 > 2.7670116110564327e+19 > > It's not so large that it cannot be converted to floating point, but > it's above the point at which floats are accurate to the integer. > Therefore precision has been lost. Is it obvious from the second line > of code that this will be the case? Obviously if you "mix in" a float, > then it'll infect the calculations. But the effect of the / operator > is less obvious. Fortunately it's consistent. It will ALWAYS return a > float. However, I do see this as "implicit" conversion. As I said you need to be careful around division. There's no right answer for integer division (for computers). I'd rather have an implicit conversion than an implicit massively incorrect answer. The result above has a relative error of ~1e-16. 
The result below has a relative error of order 1: $ py -2.7 >>> 3 / 2 1 If you use that in subsequent calculations your subsequent results could be *way* off. In many cases where you expected to compute an integer but end up with a float you'll subsequently get an error: $ py -3.3 >>> a = [1, 2, 3, 4, 5] >>> b = 3 >>> a[b / 2] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: list indices must be integers, not float If you just get an incorrect integer there's no way to know if it's exact or not without checking after every division e.g.: a = b / 2 assert 2 * a == b which is tedious. Oscar From Andy.Henshaw at gtri.gatech.edu Thu Aug 22 15:27:32 2013 From: Andy.Henshaw at gtri.gatech.edu (Henshaw, Andy) Date: Thu, 22 Aug 2013 13:27:32 +0000 Subject: [Python-ideas] Yet another sum function (fractions.sum) In-Reply-To: References: <1377094752.17975.12454933.54287557@webmail.messagingengine.com> <87ioyzf6gq.fsf@uwakimon.sk.tsukuba.ac.jp> <1377101947.24146.12503765.75BBFCDB@webmail.messagingengine.com> <87fvu3eyqx.fsf@uwakimon.sk.tsukuba.ac.jp> <52156DF3.2050204@pearwood.info> Message-ID: From: Python-ideas [mailto:python-ideas-bounces+andy.henshaw=gtri.gatech.edu at python.org] On Behalf Of Chris Angelico Sent: Thursday, August 22, 2013 7:11 AM > >>> (1<<64)*3//2+10 > 27670116110564327434 > >>> (1<<64)*3/2+10 > 2.7670116110564327e+19 > > It's not so large that it cannot be converted to floating point, but it's above > the point at which floats are accurate to the integer. > Therefore precision has been lost. Is it obvious from the second line of code > that this will be the case? Obviously if you "mix in" a float, then it'll infect the > calculations. But the effect of the / operator is less obvious. Fortunately it's > consistent. It will ALWAYS return a float. However, I do see this as "implicit" conversion. > >Anyway, this is somewhat off-topic for this thread. 
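[Editor's note: Oscar's check-after-every-division pattern can at least be folded into a helper so the assert isn't repeated at every call site. The name `exact_div` is hypothetical; this is just a sketch:]

```python
def exact_div(a, b):
    """Integer division that raises instead of silently truncating."""
    quotient, remainder = divmod(a, b)
    if remainder:
        raise ValueError(f"{a!r} is not evenly divisible by {b!r}")
    return quotient
```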
One of the things that I always admired about the Occam programming language was that you had to explicitly state whether to ROUND or TRUNC when converting from integers to floats, for exactly this reason. A 32-bit integer potentially has more precision than a 32-bit float, so you had to tell the compiler how to handle the dropped bits. From rymg19 at gmail.com Sat Aug 24 23:54:18 2013 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 24 Aug 2013 16:54:18 -0500 Subject: [Python-ideas] Add Clang to distutils Message-ID: Well, the name is pretty self-explanatory. I'm working on a patch right now. In my tests (I went through the file and replaced all occurrences of 'gcc' with 'clang'), everything compiled fine. I can't see a reason why it'd hurt something. -- Ryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Aug 25 01:26:45 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 25 Aug 2013 09:26:45 +1000 Subject: [Python-ideas] Add Clang to distutils In-Reply-To: References: Message-ID: <52194135.70906@pearwood.info> On 25/08/13 07:54, Ryan Gonzalez wrote: > Well, the name is pretty self-explanatory. I'm working on a patch right > now. In my tests (I went through the file and replaced all occurrences of > 'gcc' with 'clang'), everything compiled fine. I can't see a reason why > it'd hurt something. Are you suggesting that the maintainer of distutils should take over maintenance of clang, in order to make clang a part of distutils? Do the current maintainers of clang get a say in this? If that's not what you mean, perhaps what you mean isn't quite so self-explanatory as you think. In what sense should clang be added to distutils? gcc isn't currently part of distutils. It's an external dependency, not an internal component. 
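[Editor's note: Andy's Occam comparison carries over to Python's own int-to-float conversion, which silently picks a rounding (ties-to-even) with no ROUND/TRUNC-style choice in the conversion itself:]

```python
n = (1 << 53) + 1  # 9007199254740993: one past float's 53-bit mantissa
f = float(n)       # the conversion silently rounds; the low bit is dropped
```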
-- Steven From brian at python.org Sun Aug 25 02:07:10 2013 From: brian at python.org (Brian Curtin) Date: Sat, 24 Aug 2013 19:07:10 -0500 Subject: [Python-ideas] Add Clang to distutils In-Reply-To: <52194135.70906@pearwood.info> References: <52194135.70906@pearwood.info> Message-ID: On Sat, Aug 24, 2013 at 6:26 PM, Steven D'Aprano wrote: > On 25/08/13 07:54, Ryan Gonzalez wrote: >> >> Well, the name is pretty self-explanatory. I'm working on a patch right >> now. It my tests(I went through the file and replaced all occurrences of >> 'gcc' with 'clang'), everything compiled fine. I can't see an reason why >> it'd hurt something. > > > Are you suggesting that the maintainer of distutils should take over > maintenance of clang, in order to make clang a part of distutils? Do the > current maintainers of clang get a say in this? > > If that's not what you mean, perhaps what you mean isn't quite so > self-explanatory as you think. In what sense should clang be added to > distutils? gcc isn't currently part of distutils. It's an external > dependency, not an internal component. The person is talking about the strings "gcc" and "clang"... From rymg19 at gmail.com Sun Aug 25 02:29:59 2013 From: rymg19 at gmail.com (Ryan) Date: Sat, 24 Aug 2013 19:29:59 -0500 Subject: [Python-ideas] Add Clang to distutils In-Reply-To: <52194135.70906@pearwood.info> References: <52194135.70906@pearwood.info> Message-ID: <1679cbcd-5f05-434c-8a50-b65ce6898870@email.android.com> Sorry...I meant building Python C extensions with Clang. Steven D'Aprano wrote: >On 25/08/13 07:54, Ryan Gonzalez wrote: >> Well, the name is pretty self-explanatory. I'm working on a patch >right >> now. It my tests(I went through the file and replaced all occurrences >of >> 'gcc' with 'clang'), everything compiled fine. I can't see an reason >why >> it'd hurt something. > >Are you suggesting that the maintainer of distutils should take over >maintenance of clang, in order to make clang a part of distutils? 
Do >the current maintainers of clang get a say in this? > >If that's not what you mean, perhaps what you mean isn't quite so >self-explanatory as you think. In what sense should clang be added to >distutils? gcc isn't currently part of distutils. It's an external >dependency, not an internal component. > > >-- >Steven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nad at acm.org Sun Aug 25 03:04:12 2013 From: nad at acm.org (Ned Deily) Date: Sat, 24 Aug 2013 18:04:12 -0700 Subject: [Python-ideas] Add Clang to distutils References: <52194135.70906@pearwood.info> <1679cbcd-5f05-434c-8a50-b65ce6898870@email.android.com> Message-ID: In article <1679cbcd-5f05-434c-8a50-b65ce6898870 at email.android.com>, Ryan wrote: > Sorry...I meant building Python C extensions with Clang. I suggest you open an issue on the Python bug tracker with a diff patch of your suggested changes. But, FWIW, clang is being used today with Distutils on some platforms, at least, like OS X. On current versions of Python on Unix-y platforms, you should be able to dynamically override which compiler Distutils is looking for by using the CC and possibly the LDSHARED environment variables. -- Ned Deily, nad at acm.org From musicdenotation at gmail.com Sun Aug 25 05:24:57 2013 From: musicdenotation at gmail.com (Musical Notation) Date: Sun, 25 Aug 2013 10:24:57 +0700 Subject: [Python-ideas] Multiple statement lambda expressions Message-ID: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> What about this? 
lambda x, y: a = sum(x)/len(x); b = sum(y)/len(y); (a+b)/2;; The double-semicolon notation can also replace indentation for grouping of statements: y=0 for x in list: y=2*y+x if y%13==0: y=12;;;; From epsilonmichael at gmail.com Sun Aug 25 07:01:28 2013 From: epsilonmichael at gmail.com (Michael Mitchell) Date: Sun, 25 Aug 2013 00:01:28 -0500 Subject: [Python-ideas] Multiple statement lambda expressions In-Reply-To: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> References: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> Message-ID: Can you give an example where having multiple statements on one line wouldn't be less readable and debuggable? If either len(x) or len(y) were zero, a ZeroDivisionError would be thrown, but you wouldn't know whether or not it was x or y that caused it. Would a or b in the local scope be overwritten? Does the final "statement" have to be an expression or would None be returned as with named functions? Are we allowing return statements, e.g. "return (a+b)/2", which would allow returning early? If so, wouldn't it be more consistent with named functions to only be able to return with a return statement? Would all lambdas be required to end in double-semicolons? Single-statement ones? No-statement ones? If it were required, this would (in my opinion, unnecessarily) break old code. And if it didn't require it, wouldn't both of the following lines be valid: lambda x: x - 1; x + 3; lambda x: x - 1; x + 3;; which could be tricky to debug if one accidentally included/excluded an extra semi-colon. I think this idea needs some fleshing out. On Sat, Aug 24, 2013 at 10:24 PM, Musical Notation < musicdenotation at gmail.com> wrote: > What about this? 
> lambda x, y: a = sum(x)/len(x); b = sum(y)/len(y); (a+b)/2;; > The double-semicolon notation can also replace indentation for grouping of > statements: > > y=0 > for x in list: > y=2*y+x > if y%13==0: > y=12;;;; > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Aug 25 09:08:24 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 25 Aug 2013 16:08:24 +0900 Subject: [Python-ideas] Add Clang to distutils In-Reply-To: References: Message-ID: <87li3qdz87.fsf@uwakimon.sk.tsukuba.ac.jp> Ryan Gonzalez writes: > Well, the name is pretty self-explanatory. I'm working on a patch > right now. In my tests (I went through the file and replaced all > occurrences of 'gcc' with 'clang'), everything compiled fine. I can't > see a reason why it'd hurt something. Doesn't just CC=clang in the environment give you everything you want? From ncoghlan at gmail.com Sun Aug 25 09:19:39 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 25 Aug 2013 17:19:39 +1000 Subject: [Python-ideas] Multiple statement lambda expressions In-Reply-To: References: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> Message-ID: On 25 August 2013 15:01, Michael Mitchell wrote: > I think this idea needs some fleshing out. Independent of the readability issues associated with any proposal to expand lambdas out to full function definitions, Guido has mandated that it shall be possible to parse Python's grammar with an LL(1) parser. There's no way such a parser can look ahead far enough to see the double semi-colon to change how earlier semi-colons are parsed. 
PEPs 403 and 3150 are still the current "state of the art" for proposals to add the equivalent of multi-line lambda capabilities to Python, and both still have fairly major flaws (which is why they're Deferred rather than under active development). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Sun Aug 25 10:10:53 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 25 Aug 2013 18:10:53 +1000 Subject: [Python-ideas] Multiple statement lambda expressions In-Reply-To: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> References: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> Message-ID: <5219BC0D.7090706@pearwood.info> On 25/08/13 13:24, Musical Notation wrote: > The double-semicolon notation can also replace indentation for grouping of statements: > > y=0 > for x in list: > y=2*y+x > if y%13==0: > y=12;;;; I don't understand what you mean by "replace indentation". You haven't replaced indentation, it is still there. Also, I think that any proposal to remove significant indentation will go nowhere. Python's philosophy is that significant indentation is a feature, not a bug to be removed. You might as well propose getting rid of functions. -- Steven From steve at pearwood.info Sun Aug 25 12:51:43 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 25 Aug 2013 20:51:43 +1000 Subject: [Python-ideas] Multiple statement lambda expressions In-Reply-To: <6E5A8A18-8764-438F-A76F-13E18538D43C@gmail.com> References: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> <5219BC0D.7090706@pearwood.info> <6E5A8A18-8764-438F-A76F-13E18538D43C@gmail.com> Message-ID: <5219E1BF.5060004@pearwood.info> On 25/08/13 19:46, Musical Notation wrote: > This is not a proposal to remove significant indentation, this is to complement it. You can declare "from __future__ import braces" to use it. Try that at the interactive interpreter, you may be surprised. 
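[Editor's note: for comparison, the double-semicolon proposal's own motivating example already writes naturally as a def -- which, per Steven's earlier point, also gets a real __name__ and can carry a docstring:]

```python
def mean_of_means(x, y):
    """What `lambda x, y: a = ...; b = ...; (a+b)/2;;` tries to express."""
    a = sum(x) / len(x)
    b = sum(y) / len(y)
    return (a + b) / 2
```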
-- Steven From musicdenotation at gmail.com Sun Aug 25 13:30:03 2013 From: musicdenotation at gmail.com (Musical Notation) Date: Sun, 25 Aug 2013 18:30:03 +0700 Subject: [Python-ideas] Multiple statement lambda expressions In-Reply-To: <5219E1BF.5060004@pearwood.info> References: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> <5219BC0D.7090706@pearwood.info> <6E5A8A18-8764-438F-A76F-13E18538D43C@gmail.com> <5219E1BF.5060004@pearwood.info> Message-ID: <58041194-58C1-4DD6-B3C3-D1059714CDD2@gmail.com> On Aug 25, 2013, at 17:51, Steven D'Aprano wrote: > Try that at the interactive interpreter, you may be surprised. It isn't usable yet, I propose to use "from __future__ import braces" for that purpose (or rather "from __future__ import noindent") From vito.detullio at gmail.com Sun Aug 25 13:45:14 2013 From: vito.detullio at gmail.com (Vito De Tullio) Date: Sun, 25 Aug 2013 13:45:14 +0200 Subject: [Python-ideas] Multiple statement lambda expressions References: <1F2DD32E-7240-4AB9-B66D-46F6E58088C2@gmail.com> <5219BC0D.7090706@pearwood.info> <6E5A8A18-8764-438F-A76F-13E18538D43C@gmail.com> <5219E1BF.5060004@pearwood.info> <58041194-58C1-4DD6-B3C3-D1059714CDD2@gmail.com> Message-ID: Musical Notation wrote: > On Aug 25, 2013, at 17:51, Steven D'Aprano > wrote: > >> Try that at the interactive interpreter, you may be surprised. > It isn't usable yet I think you missed the joke here... -- By ZeD From rymg19 at gmail.com Sun Aug 25 19:50:35 2013 From: rymg19 at gmail.com (Ryan) Date: Sun, 25 Aug 2013 12:50:35 -0500 Subject: [Python-ideas] Clang in distutils TAKE 2 Message-ID: Ok, I wasn't very clear in my first one, so I'm going to try this again. My idea is to put Clang as a compiler for building Python extensions on Windows. i.e. I would be able to do this: C:\my-python-extension> python setup.py build -c clang -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brian at python.org Sun Aug 25 20:03:58 2013 From: brian at python.org (Brian Curtin) Date: Sun, 25 Aug 2013 13:03:58 -0500 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: On Sun, Aug 25, 2013 at 12:50 PM, Ryan wrote: > Ok, I wasn't very clear in my first one, so I'm going to try this again. > > My idea is to put Clang as a compiler for building Python extensions on > Windows. i.e. I would be able to do this: > > C:\my-python-extension> python setup.py build -c clang You can just submit this on http://bugs.python.org/. From kim.grasman at gmail.com Sun Aug 25 20:26:42 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Sun, 25 Aug 2013 20:26:42 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Ping? Can I clarify something to move this forward? It seems like a good idea to me, but I don't have the history of Py_DeleteFileW -- maybe somebody tried this already? Thanks, - Kim On Tue, Aug 13, 2013 at 1:17 PM, Kim Gräsman wrote: > Hi all, > > I posted this bug a while back, but haven't had any feedback on it, so > I figured I'd open discussion here: > http://bugs.python.org/issue18314 > > I want to have os.unlink recognize NTFS junction points (a prequel to > proper symbolic links) and remove them transparently. > > Currently, os.unlink on a path to a junction point will fail with a > WindowsError and the error code 5, access denied. > > Does this sound controversial, or would anybody be interested in > reviewing a patch to this effect? 
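[Editor's note: until something along the lines of issue 18314 is accepted, callers typically work around the access-denied error themselves. One common sketch (an assumed workaround, not the patch under review) falls back to os.rmdir, which on Windows removes a junction itself without touching its target:]

```python
import os

def remove_path_or_link(path):
    """Remove a file, a symlink, or (on Windows) a junction point.

    Junction points look like directories, so os.unlink refuses them
    with 'access denied'; os.rmdir deletes the junction itself and
    leaves the directory it points to alone.
    """
    try:
        os.unlink(path)
    except OSError:
        os.rmdir(path)
```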
> > Thanks, > - Kim From tjreedy at udel.edu Sun Aug 25 21:20:17 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 25 Aug 2013 15:20:17 -0400 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: On 8/25/2013 2:03 PM, Brian Curtin wrote: > On Sun, Aug 25, 2013 at 12:50 PM, Ryan wrote: >> My idea is to put Clang as a compiler for building Python extensions on >> Windows. i.e. I would be able to do this: >> >> C:\my-python-extension> python setup.py build -c clang Do clang/llvm really work on Windows yet? None of these pages even mention the work. https://en.wikipedia.org/wiki/Clang http://clang.llvm.org/ http://www.llvm.org/ and there does not seem to be Windows binaries. > You can just submit this on http://bugs.python.org/. Please don't unless and until it is really possible. -- Terry Jan Reedy From brian at python.org Sun Aug 25 22:50:01 2013 From: brian at python.org (Brian Curtin) Date: Sun, 25 Aug 2013 15:50:01 -0500 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: On Sun, Aug 25, 2013 at 2:20 PM, Terry Reedy wrote: > On 8/25/2013 2:03 PM, Brian Curtin wrote: >> You can just submit this on http://bugs.python.org/. > > > Please don't unless and until it is really possible. It's a simple feature request. If this isn't actually possible, it just gets rejected. From eliben at gmail.com Mon Aug 26 00:33:34 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 25 Aug 2013 15:33:34 -0700 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: On Sun, Aug 25, 2013 at 12:20 PM, Terry Reedy wrote: > On 8/25/2013 2:03 PM, Brian Curtin wrote: > >> On Sun, Aug 25, 2013 at 12:50 PM, Ryan wrote: >> > > My idea is to put Clang as a compiler for building Python extensions on >>> Windows. i.e. I would be able to do this: >>> >>> C:\my-python-extension> python setup.py build -c clang >>> >> > Do clang/llvm really work on Windows yet? 
None of these pages even mention > the work. > > https://en.wikipedia.org/wiki/**Clang > http://clang.llvm.org/ > http://www.llvm.org/ > > and there does not seem to be Windows binaries. > > Do the best of my knowledge, Clang doen't produce MSVC ABI binaries on Windows at this time. Isn't this a requirement for the official Python build? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Mon Aug 26 00:59:28 2013 From: rymg19 at gmail.com (Ryan) Date: Sun, 25 Aug 2013 17:59:28 -0500 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: <018055fc-9066-4f0b-8916-097b244914c2@email.android.com> I tried Clang, though. It worked for me. Version 3.2. Corresponds to GCC 4.2(I think?). It's more mature than it seems. Plus, the errors beat GCC's by about 10x. Even with GCC 4.7. Eli Bendersky wrote: >On Sun, Aug 25, 2013 at 12:20 PM, Terry Reedy wrote: > >> On 8/25/2013 2:03 PM, Brian Curtin wrote: >> >>> On Sun, Aug 25, 2013 at 12:50 PM, Ryan wrote: >>> >> >> My idea is to put Clang as a compiler for building Python extensions >on >>>> Windows. i.e. I would be able to do this: >>>> >>>> C:\my-python-extension> python setup.py build -c clang >>>> >>> >> Do clang/llvm really work on Windows yet? None of these pages even >mention >> the work. >> >> >https://en.wikipedia.org/wiki/**Clang >> http://clang.llvm.org/ >> http://www.llvm.org/ >> >> and there does not seem to be Windows binaries. >> >> >Do the best of my knowledge, Clang doen't produce MSVC ABI binaries on >Windows at this time. Isn't this a requirement for the official Python >build? > >Eli > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Mon Aug 26 01:23:44 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 25 Aug 2013 16:23:44 -0700 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: <018055fc-9066-4f0b-8916-097b244914c2@email.android.com> References: <018055fc-9066-4f0b-8916-097b244914c2@email.android.com> Message-ID: On Sun, Aug 25, 2013 at 3:59 PM, Ryan wrote: > I tried Clang, though. It worked for me. Version 3.2. > By "tried", do you mean you built a DLL out of a non-trivial Python extension and the Python interpreter on Windows loaded it successfully? > Corresponds to GCC 4.2(I think?). > What do you mean by "corresponds"? > It's more mature than it seems. Plus, the errors beat GCC's by about 10x. > Even with GCC 4.7. > I'm a LLVM/Clang committer, so I know how mature it is. What I'm questioning is current Clang's ability to generate Windows executables that conform to the MSVC ABI. It's not exactly my domain of expertise, but last I heard this isn't yet prime-time ready. Maybe you have more up-to-date information - this is what I'm asking. Eli > Eli Bendersky wrote: >> >> >> >> >> On Sun, Aug 25, 2013 at 12:20 PM, Terry Reedy wrote: >> >>> On 8/25/2013 2:03 PM, Brian Curtin wrote: >>> >>>> On Sun, Aug 25, 2013 at 12:50 PM, Ryan wrote: >>>> >>> >>> My idea is to put Clang as a compiler for building Python extensions on >>>>> Windows. i.e. I would be able to do this: >>>>> >>>>> C:\my-python-extension> python setup.py build -c clang >>>>> >>>> >>> Do clang/llvm really work on Windows yet? None of these pages even >>> mention the work. >>> >>> https://en.wikipedia.org/wiki/**Clang >>> http://clang.llvm.org/ >>> http://www.llvm.org/ >>> >>> and there does not seem to be Windows binaries. >>> >>> >> Do the best of my knowledge, Clang doen't produce MSVC ABI binaries on >> Windows at this time. Isn't this a requirement for the official Python >> build? 
>> >> Eli >> >> >> ------------------------------ >> >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Aug 26 01:59:29 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 25 Aug 2013 19:59:29 -0400 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: On 8/25/2013 4:50 PM, Brian Curtin wrote: > On Sun, Aug 25, 2013 at 2:20 PM, Terry Reedy wrote: >> On 8/25/2013 2:03 PM, Brian Curtin wrote: >>> You can just submit this on http://bugs.python.org/. >> >> Please don't unless and until it is really possible. > > It's a simple feature request. If this isn't actually possible, it > just gets rejected. But is it possible? That seems uncertain. I think requestors should try to determine that first, especially when there is already a thread going. There are 1144 open non-doc feature requests on the tracker. Many have not been touched in years. -- Terry Jan Reedy From steve at pearwood.info Mon Aug 26 02:22:14 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 26 Aug 2013 10:22:14 +1000 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: <521A9FB6.3040804@pearwood.info> On 26/08/13 05:20, Terry Reedy wrote: > Do clang/llvm really work on Windows yet? None of these pages even mention the work. > > https://en.wikipedia.org/wiki/Clang > http://clang.llvm.org/ > http://www.llvm.org/ > > and there does not seem to be Windows binaries. Allegedly clang does exist for Windows: http://clang.llvm.org/get_started.html -- Steven From stephen at xemacs.org Mon Aug 26 04:32:08 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 26 Aug 2013 11:32:08 +0900 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: References: Message-ID: <878uzpdvx3.fsf@uwakimon.sk.tsukuba.ac.jp> Eli Bendersky writes: > Do the best of my knowledge, Clang doen't produce MSVC ABI binaries > on Windows at this time. Isn't this a requirement for the official > Python build? Sure, but AFAICS he's not suggesting that Clang be used for the official build, only that it be available to users who build their own. I agree that it's extra rope for users who want to hang themselves by using different compilers for the main executable and extensions, but isn't that their problem? From eliben at gmail.com Mon Aug 26 04:40:15 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 25 Aug 2013 19:40:15 -0700 Subject: [Python-ideas] Clang in distutils TAKE 2 In-Reply-To: <878uzpdvx3.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878uzpdvx3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Aug 25, 2013 at 7:32 PM, Stephen J. Turnbull wrote: > Eli Bendersky writes: > > > Do the best of my knowledge, Clang doen't produce MSVC ABI binaries > > on Windows at this time. Isn't this a requirement for the official > > Python build? > > Sure, but AFAICS he's not suggesting that Clang be used for the > official build, only that it be available to users who build their > own. > I read it differently. He said "Clang as a compiler for building Python extensions on Windows", by which I understood extensions that get loaded by the official Python build. To this I raised a concern of ABI incompatibility. If his intention was to build Python itself with Clang on Windows, as well as the extensions for it, then it's a different issue altogether. Eli -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.us Mon Aug 26 06:19:52 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 26 Aug 2013 00:19:52 -0400 Subject: [Python-ideas] Have os.unlink remove directories on Unix (was: junction points on Windows) In-Reply-To: References: Message-ID: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> On Sun, Aug 25, 2013, at 14:26, Kim Gräsman wrote: > Ping? > > Can I clarify something to move this forward? It seems like a good > idea to me, but I don't have the history of Py_DeleteFileW -- maybe > somebody tried this already? What happens if you call os.rmdir? And just out of curiosity, what happens if you call msvcrt's _wremove and _wrmdir functions? While we're on the subject of os.remove, can someone explain to me why it doesn't work on directories in Unix? That's the main difference between the C function of that name vs unlink in POSIX, and there doesn't seem to be a "remove a file or directory" function in os at all on unix systems as it stands (whereas both of them seem to be able to remove directories on windows). I'm almost more bothered by the fact that it works on Windows and not on Unix (and a bit by the fact that the "remove" name was used without actually implementing the behavior or calling the POSIX "remove" function) than by the functionality not existing in the first place. But it makes much more sense to add the functionality on Unix than to remove it on windows. Alternately, we could create a distinction between unlink and remove, and only do this in remove. From clay.sweetser at gmail.com Mon Aug 26 06:50:17 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Mon, 26 Aug 2013 00:50:17 -0400 Subject: [Python-ideas] Changing all() and any() to take multiple arguments Message-ID: I believe that the all() and any() library functions should be modified to accept multiple arguments as well as single iterators. 
Currently, code such as this (taken from a sublime text plugin), breaks: def is_enabled(self): return all( self.connected, not self.broadcasting, self.mode is 'client' ) Meaning that one must either write this, and use parentheses to avoid some obscure order of operations error: def is_enabled(self): return ( (self.connected) and (not self.broadcasting) and (self.mode is 'client') ) Or this, and have others wonder at the odd double parentheses or list: def is_enabled(self): return all(( self.connected, not self.broadcasting, self.mode is 'client' )) I can't foresee any compatibility issues for this modification, as it would only make currently broken code work, not the other way around. I searched the mailing list archives with Google to see if there were any past discussions on this topic, however the ambiguousness of my search terms ('all', 'any', 'multiple', 'arguments') meant that the results were too numerous for me to sort through (I gave up after the first 3 pages). Clay Sweetser "The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it. " - Terry Pratchett -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at python.org Mon Aug 26 07:31:00 2013 From: brian at python.org (Brian Curtin) Date: Mon, 26 Aug 2013 00:31:00 -0500 Subject: [Python-ideas] Changing all() and any() to take multiple arguments In-Reply-To: References: Message-ID: On Sun, Aug 25, 2013 at 11:50 PM, Clay Sweetser wrote: > I can't foresee any compatibility issues for this modification, as it would > only make currently broken code work, not the other way around. This is not a strong point in your idea's favor. If people are shipping broken code such as your example which would be a TypeError, there are any number of things they should be doing before we change the function. 
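For reference, the first example above fails only because all() takes exactly one argument; wrapping the conditions in a single tuple already gives the behaviour being asked for. A sketch with stand-in values (the names follow the plugin example, and `==` replaces `is` since string identity is not guaranteed):

```python
# all() accepts one iterable, so a single tuple of conditions works today.
connected, broadcasting, mode = True, False, 'client'

enabled = all((
    connected,
    not broadcasting,
    mode == 'client',   # == rather than 'is': interning is an implementation detail
))

# Passing the conditions as separate arguments is the TypeError from the thread.
try:
    all(connected, not broadcasting)
except TypeError as exc:
    print("separate arguments fail:", exc)
```
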
From solipsis at pitrou.net Mon Aug 26 08:31:03 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 26 Aug 2013 08:31:03 +0200 Subject: [Python-ideas] Clang in distutils TAKE 2 References: Message-ID: <20130826083103.509c51cf@fsol> On Sun, 25 Aug 2013 19:59:29 -0400 Terry Reedy wrote: > On 8/25/2013 4:50 PM, Brian Curtin wrote: > > On Sun, Aug 25, 2013 at 2:20 PM, Terry Reedy wrote: > >> On 8/25/2013 2:03 PM, Brian Curtin wrote: > >>> You can just submit this on http://bugs.python.org/. > >> > >> Please don't unless and until it is really possible. > > > > It's a simple feature request. If this isn't actually possible, it > > just gets rejected. > > But is it possible? That seems uncertain. I think requestors should try > to determine that first, especially when there is already a thread going. The bug tracker is the place for that. > There are 1144 open non-doc feature requests on the tracker. Many have > not been touched in years. Yes, distutils not having an actual maintainer is an ongoing problem :-) Regards Antoine. From abarnert at yahoo.com Mon Aug 26 09:05:04 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 26 Aug 2013 00:05:04 -0700 (PDT) Subject: [Python-ideas] Changing all() and any() to take multiple arguments In-Reply-To: References: Message-ID: <1377500704.53137.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Clay Sweetser Sent: Sunday, August 25, 2013 9:50 PM >I believe that the all() and any() library functions should be modified to accept multiple arguments as well as single iterators. But that would make a number of expressions ambiguous. Is all([0, 0, 0]) true because it has one true argument, or false because it has three false arguments? What about all([0])? Or all([]), or all()? Even if you invented rules for which interpretation wins in these cases, it would still be ambiguous to human readers. 
Especially in cases where (as usual) the arguments aren't actually a static list display, but a variable, comprehension, or other expression. There are a few places in Python where we have such ambiguity, such as giving the string % operator a single value, but I don't think most people think that's a good thing. In fact, a few such cases were eliminated in Python 3.0, and newer functions like str.format or itertools.chain all avoid it. >Currently, code such as this (taken from a sublime text plugin), breaks: >    def is_enabled(self): >        return all( >            self.connected, >            not self.broadcasting, >            self.mode is 'client' >        ) > > >Meaning either that one must write either this, and use parenthesis to avoid some obscure order of operations error: >    def is_enabled(self): >        return ( >            (self.connected) and >            (not self.broadcasting) and >            (self.mode is 'client') >        ) Do you really think anyone doesn't know that the dot in "self.connected" binds more tightly than "and"? The fact that you can add excessive parentheses doesn't mean that you should, or that most people do. And really, I think this is more readable than the way you'd like to write it. For a short, static sequence, "all" is just extra verbiage getting in the way, exactly as in English, where you wouldn't say "All of John, Mary, and Pete went to the ceremony." 
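The edge cases raised above are easy to check against today's semantics, where the single argument is always one iterable:

```python
# Under current semantics, the argument is an iterable of values.
assert all([0, 0, 0]) is False   # three false elements, not "one true list"
assert all([0]) is False
assert all([]) is True           # vacuous truth: no false element found
assert any([]) is False          # no true element found

# A "multiple arguments" reading would have to assign different answers to
# some of these same calls, which is exactly the ambiguity being objected to.
print("current all()/any() edge cases confirmed")
```
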
From rosuav at gmail.com Mon Aug 26 09:23:34 2013 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 26 Aug 2013 17:23:34 +1000 Subject: [Python-ideas] Changing all() and any() to take multiple arguments In-Reply-To: References: Message-ID: On Mon, Aug 26, 2013 at 2:50 PM, Clay Sweetser wrote: > Meaning either that one must write either this, and use parenthesis to avoid > some obscure order of operations error: > def is_enabled(self): > return ( > (self.connected) and > (not self.broadcasting) and > (self.mode is 'client') > ) I'd be inclined toward this option, especially if some of your expressions are complex checks - the 'and' will short-circuit, a tuple won't. It looks perfectly readable, to me. Side point: I'd be cautious about using 'is' with strings, unless you have some way of guaranteeing that both sides have been interned. ChrisA From clay.sweetser at gmail.com Mon Aug 26 09:28:14 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Mon, 26 Aug 2013 03:28:14 -0400 Subject: [Python-ideas] Changing all() and any() to take multiple arguments In-Reply-To: References: Message-ID: On Aug 26, 2013 3:24 AM, "Chris Angelico" wrote: > > On Mon, Aug 26, 2013 at 2:50 PM, Clay Sweetser wrote: > > Meaning either that one must write either this, and use parenthesis to avoid > > some obscure order of operations error: > > def is_enabled(self): > > return ( > > (self.connected) and > > (not self.broadcasting) and > > (self.mode is 'client') > > ) > > I'd be inclined toward this option, especially if some of your > expressions are complex checks - the 'and' will short-circuit, a tuple > won't. It looks perfectly readable, to me. Side point: I'd be cautious > about using 'is' with strings, unless you have some way of > guaranteeing that both sides have been interned. Heh, I had actually changed that snippet somewhat from its original source. 
My own code uses many constants, so it's something like "MODE is CLIENT_TO_CLIENT" (The way commands in sublime text are written makes it almost mandatory to use globals in order to share state). > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From kim.grasman at gmail.com Mon Aug 26 13:42:12 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Mon, 26 Aug 2013 13:42:12 +0200 Subject: [Python-ideas] Have os.unlink remove directories on Unix (was: junction points on Windows) In-Reply-To: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> References: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> Message-ID: Hi random, On Mon, Aug 26, 2013 at 6:19 AM, wrote: > On Sun, Aug 25, 2013, at 14:26, Kim Gr?sman wrote: >> Ping? >> >> Can I clarify something to move this forward? It seems like a good >> idea to me, but I don't have the history of Py_DeleteFileW -- maybe >> somebody tried this already? > > What happens if you call os.rmdir? And just out of curiosity, what > happens if you call msvcrt's _wremove and _wrmdir functions? os.rmdir just delegates to RemoveDirectoryW and so successfully removes junction points too. This seems slightly against the spirit of POSIX: " The rmdir() function shall remove a directory whose name is given by path. The directory shall be removed only if it is an empty directory. [...] If path names a symbolic link, then rmdir() shall fail and set errno to [ENOTDIR]. " - http://pubs.opengroup.org/onlinepubs/009695399/functions/rmdir.html The junction point is removed irrespective of whether the target is empty or not. Junction points are sort of symbolic links, but are removed without error. 
I can't speak for _wremove and _wrmdir, but I assume they're POSIX compat shims, so they probably follow remove and rmdir, possibly with less understanding of links in general. I wouldn't switch to using os.rmdir, however -- if the path names an actual directory rather than a symlink or a junction point, os.rmdir will delete it, whereas os.unlink will fail with access denied (as I believe it should.) > While we're on the subject of os.remove, can someone explain to me why > it doesn't work on directories in Unix? [...] I have nothing to offer here, sorry. It seems a little dangerous to muck about with the details of os file management, there must be millions of lines of code relying on the current behavior in one way or another. Thanks, - Kim From random832 at fastmail.us Mon Aug 26 14:25:55 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 26 Aug 2013 08:25:55 -0400 Subject: [Python-ideas] Have os.unlink remove directories on Unix (was: junction points on Windows) In-Reply-To: References: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> Message-ID: <1377519955.17534.14193997.50075D54@webmail.messagingengine.com> On Mon, Aug 26, 2013, at 7:42, Kim Gräsman wrote: > I wouldn't switch to using os.rmdir, however -- if the path names an > actual directory rather than a symlink or a junction point, os.rmdir > will delete it, whereas os.unlink will fail with access denied (as I > believe it should.) Only if it's empty. You could at least replace whatever your _delete_junction_point function is with it. > > While we're on the subject of os.remove, can someone explain to me why > > it doesn't work on directories in Unix? [...] > > I have nothing to offer here, sorry. It seems a little dangerous to > muck about with the details of os file management, there must be > millions of lines of code relying on the current behavior in one way > or another. 
I just don't like the fact that it's called "remove" but doesn't behave the same as the remove function from POSIX. Of course, remove/_wremove do not remove directories on Windows, and I've found evidence that this was true for some early Unixes as well. And the fact that windows unlink() allows you to remove some (but not all) things that windows considers to be directories is already violating the principle of being thin wrappers around system calls. From drekin at gmail.com Mon Aug 26 16:36:53 2013 From: drekin at gmail.com (Draic Kin) Date: Mon, 26 Aug 2013 16:36:53 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib Message-ID: Hello, it would be nice if a reference pure-Python implementation existed for more parts of the interpreter core, and if the core actually used them. This was done for the import machinery in Python 3.3, creating the importlib library. One potential target for this would be moving the logic of what python.exe does: parsing its arguments, finding the script to run, compiling its code and running it as the __main__ module, then running the REPL if it's in interactive mode afterwards. There could be a stdlib module exposing this logic, using argparse to parse the arguments, an overhauled version of runpy to run the __main__ module, and an overhauled version of code to run the REPL. Python.exe would be just a thin wrapper which bootstraps the interpreter and runs this runner module. What brought me to this idea: I just wanted to use Unicode in the REPL on Windows. It doesn't work because sys.stdin.buffer.raw.read just doesn't read Unicode characters on Windows, and similarly for sys.stdout.buffer.raw.write. See http://bugs.python.org/issue1602 . There is a workaround: one can write custom sys.stdin and sys.stdout objects which use the winapi functions ReadConsoleW and WriteConsoleW called via ctypes. Setting these objects seems to solve the problem: input() and print() work during execution of a script. 
There is, however, a problem in interactive mode, since the Python REPL actually doesn't use sys.stdin for input. It takes input from the real STDIN, but it uses the encoding of sys.stdin, which doesn't seem to make sense, see http://bugs.python.org/issue17620 . So I have written my own REPL based on the stdlib module 'code', but I needed some hook which runs it just after the execution of a script and before the standard REPL starts. There is just the PYTHONSTARTUP environment variable, which works only for the bare Python console, not when running any script. So I needed a script run.py such that "py run.py [ []]" did almost the same thing as "py [ []]". It would run my REPL at the right time. Writing things like run.py is difficult since there are many details one should handle so that the inner script being run behaves the same way as if it was run directly. It is also difficult to test (e.g. http://bugs.python.org/issue18838). It would be easy if there were a reference implementation of how Python itself does it. A script like run.py has more use cases; for example, Ned Batchelder's coverage.py implements its own version. Generally, if more parts of the interpreter core were exposed via the stdlib, issues like the ones mentioned could be handled more easily. Another example: there are some issues when one hits Ctrl-C on input on Windows; it seems that one should detect the condition and wait for the signal to arrive (see http://bugs.python.org/issue18597 , http://bugs.python.org/issue17619 ). I thought that input() was just a thin wrapper around sys.stdin.readline() so it could be easily implemented in pure Python (there was even an idea that input() wouldn't be in Python 3000). So it surprises me that input() is implemented in a very different way and provides an alternative codepath to the low-level reading function than the codepath sys.stdin -> sys.stdin.buffer -> sys.stdin.buffer.raw. If there was only one path, it would be easier to fix issues like these. Thank you for your response, Drekin. 
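The run.py wrapper described here can be roughly approximated with the stdlib pieces named in the mail, runpy and code. This is a hypothetical sketch, not the author's actual script, and it deliberately glosses over the hard details the mail lists (argv fix-up, sys.path[0] handling, faithful tracebacks), which is exactly where a stdlib reference implementation would help:

```python
import sys
import runpy
import code

def main():
    # Hypothetical run.py: execute a target script as __main__, then drop
    # into a REPL over the resulting namespace, approximating
    # "py run.py script.py args" behaving like "py -i script.py args".
    if len(sys.argv) > 1:
        script = sys.argv[1]
        sys.argv = sys.argv[1:]   # make argv look as if the script was run directly
        namespace = runpy.run_path(script, run_name="__main__")
    else:
        namespace = {}
    # A custom REPL (e.g. one using ReadConsoleW-backed streams) would go here.
    code.interact(local=dict(namespace))
```
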
-------------- next part -------------- An HTML attachment was scrubbed... URL: From ram.rachum at gmail.com Mon Aug 26 19:11:56 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Mon, 26 Aug 2013 10:11:56 -0700 (PDT) Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail Message-ID: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> Why do I have to do this: for thing in things: try: my_list.remove(thing) except ValueError: pass When I could do this: my_list.remove(*things, silent_fail=True) Aside from being much more concise, it could be more efficient too, couldn't it? Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Aug 26 19:55:40 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 26 Aug 2013 13:55:40 -0400 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> Message-ID: <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> On Mon, Aug 26, 2013, at 13:11, Ram Rachum wrote: > my_list.remove(*things, silent_fail=True) > > > Aside from being much more concise, it could be more efficient too, > couldn't it? my_list = list(filter(lambda x: x not in things, my_list)) From solipsis at pitrou.net Mon Aug 26 19:55:55 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 26 Aug 2013 19:55:55 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib References: Message-ID: <20130826195555.74aa3fe5@fsol> On Mon, 26 Aug 2013 16:36:53 +0200 Draic Kin wrote: > Hello, it would be nice if reference pure Python implementation existed for > more parts of interpreter core and the core actually used them. This was > done for import machinery in Python 3.3 making importlib library. 
> > One potential target for this would be moving the logic of what python.exe > does ? parsing its arguments, finding the script to run, compiling its code > and running as __main__ module, running REPL if it's in interactive mode > afterwards. There could be a stdlib module exposing this logic, using > argparse to parse the arguments, overhauled version of runpy to run the > __main__ module and overhauled version of code to run the REPL. Python.exe > would be just thin wrapper which bootstraps the interpreter and runs this > runner module. The interpreter needs a lot of information to be bootstrapped; you are proposing that the code which extracts that information be run *after* the interpreter is bootstrapped, which creates a nasty temporal problem. In the end, it may make maintenance *more* difficult, rather than less, to rewrite that code in Python. Regards Antoine. From __peter__ at web.de Mon Aug 26 20:02:44 2013 From: __peter__ at web.de (Peter Otten) Date: Mon, 26 Aug 2013 20:02:44 +0200 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> Message-ID: Ram Rachum wrote: > Why do I have to do this: > > for thing in things: > try: > my_list.remove(thing) > except ValueError: > pass > > > When I could do this: > > my_list.remove(*things, silent_fail=True) > > > Aside from being much more concise, it could be more efficient too, > couldn't it? Have you considered a set instead of a list? my_set.difference_update(things) is certainly more efficient if the items in the list are hashable, occur only once, and you don't care about order. 
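Both suggestions in this subthread fit in a couple of lines; a sketch contrasting them (an order-preserving rebuild versus Peter's set difference):

```python
my_list = [1, 2, 3, 2, 4]
things = {2, 5}   # a set makes each membership test O(1)

# Order-preserving rebuild: one pass over the list, and items that are
# absent (like 5 here) are silently skipped, matching the "silent fail" ask.
my_list = [x for x in my_list if x not in things]
print(my_list)            # -> [1, 3, 4]

# Set variant: if order and duplicates don't matter, difference_update
# removes everything in one call.
my_set = {1, 2, 3, 4}
my_set.difference_update(things)
print(sorted(my_set))     # -> [1, 3, 4]
```

Note the rebuild removes every occurrence of each item, whereas repeated `list.remove` calls would drop only the first occurrence each; which behaviour is wanted depends on the use case.
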
From ned at nedbatchelder.com Mon Aug 26 20:54:43 2013 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 26 Aug 2013 14:54:43 -0400 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: <20130826195555.74aa3fe5@fsol> References: <20130826195555.74aa3fe5@fsol> Message-ID: <521BA473.7070206@nedbatchelder.com> On 8/26/13 1:55 PM, Antoine Pitrou wrote: > On Mon, 26 Aug 2013 16:36:53 +0200 > Draic Kin wrote: >> Hello, it would be nice if reference pure Python implementation existed for >> more parts of interpreter core and the core actually used them. This was >> done for import machinery in Python 3.3 making importlib library. >> >> One potential target for this would be moving the logic of what python.exe >> does ? parsing its arguments, finding the script to run, compiling its code >> and running as __main__ module, running REPL if it's in interactive mode >> afterwards. There could be a stdlib module exposing this logic, using >> argparse to parse the arguments, overhauled version of runpy to run the >> __main__ module and overhauled version of code to run the REPL. Python.exe >> would be just thin wrapper which bootstraps the interpreter and runs this >> runner module. > The interpreter needs a lot of information to be bootstrapped; you are > proposing that the code which extracts that information be run *after* > the interpreter is bootstrapped, which creates a nasty temporal problem. > > In the end, it may make maintenance *more* difficult, rather than less, > to rewrite that code in Python. It seems to me that this argument could have been made against the import rewrite in Python. I don't know enough about the various factors to know what the differences are between the two scenarios (import and startup) to know whether it's a valid argument here or not. Can someone elaborate? I know it would be great to have the startup logic more accessible. --Ned. > > Regards > > Antoine. 
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From brett at python.org Mon Aug 26 21:27:28 2013 From: brett at python.org (Brett Cannon) Date: Mon, 26 Aug 2013 15:27:28 -0400 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: <521BA473.7070206@nedbatchelder.com> References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> Message-ID: On Mon, Aug 26, 2013 at 2:54 PM, Ned Batchelder wrote: > On 8/26/13 1:55 PM, Antoine Pitrou wrote: > >> On Mon, 26 Aug 2013 16:36:53 +0200 >> Draic Kin wrote: >> >>> Hello, it would be nice if reference pure Python implementation existed >>> for >>> more parts of interpreter core and the core actually used them. This was >>> done for import machinery in Python 3.3 making importlib library. >>> >>> One potential target for this would be moving the logic of what >>> python.exe >>> does ? parsing its arguments, finding the script to run, compiling its >>> code >>> and running as __main__ module, running REPL if it's in interactive mode >>> afterwards. There could be a stdlib module exposing this logic, using >>> argparse to parse the arguments, overhauled version of runpy to run the >>> __main__ module and overhauled version of code to run the REPL. >>> Python.exe >>> would be just thin wrapper which bootstraps the interpreter and runs this >>> runner module. >>> >> The interpreter needs a lot of information to be bootstrapped; you are >> proposing that the code which extracts that information be run *after* >> the interpreter is bootstrapped, which creates a nasty temporal problem. >> >> In the end, it may make maintenance *more* difficult, rather than less, >> to rewrite that code in Python. >> > It seems to me that this argument could have been made against the import > rewrite in Python. Not quite. 
Antoine's point is that the flags used to start Python are needed to set up certain things that may influence how using e.g. argparse works. Importlib is in a unique position because I wrote it from the beginning to be bootstrapped, so I designed it to deal with bootstrapping issues. It also led to the code being a little odd and having to work around things like lacking imports, etc. Argparse (and any of its dependencies) have not been designed in such a fashion, especially if they are directly involved in setting key settings in the interpreter. > I don't know enough about the various factors to know what the > differences are between the two scenarios (import and startup) to know > whether it's a valid argument here or not. Can someone elaborate? > The exact location where importlib is bootstrapped is at http://hg.python.org/cpython/file/8fb3a6f9b0a4/Python/pythonrun.c#l387 . I think it happens as soon as humanly possible, but it also is in a very restricted environment that is atypical (e.g. no imports, only uses built-in modules, can't have any alternative encoding, etc.). > > I know it would be great to have the startup logic more accessible. Sure, but there is also a performance consideration to take in. I wrote a blog post once on this topic: http://sayspy.blogspot.ca/2012/12/how-much-of-python-can-be-written-in.html . Basically you might be able to pull off exceptions in Python because they are typically such simple chunks of code, but otherwise everything else is too performance-sensitive. To really dive in you would need to look at the C code and see what happens at what point to know if unmodified Python code could be used instead of the C code (or be willing to write it all from scratch to allow bootstrapping). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Mon Aug 26 21:30:31 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 26 Aug 2013 21:30:31 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> Message-ID: <20130826213031.445503da@fsol> On Mon, 26 Aug 2013 14:54:43 -0400 Ned Batchelder wrote: > On 8/26/13 1:55 PM, Antoine Pitrou wrote: > > On Mon, 26 Aug 2013 16:36:53 +0200 > > Draic Kin wrote: > >> Hello, it would be nice if reference pure Python implementation existed for > >> more parts of interpreter core and the core actually used them. This was > >> done for import machinery in Python 3.3 making importlib library. > >> > >> One potential target for this would be moving the logic of what python.exe > >> does ? parsing its arguments, finding the script to run, compiling its code > >> and running as __main__ module, running REPL if it's in interactive mode > >> afterwards. There could be a stdlib module exposing this logic, using > >> argparse to parse the arguments, overhauled version of runpy to run the > >> __main__ module and overhauled version of code to run the REPL. Python.exe > >> would be just thin wrapper which bootstraps the interpreter and runs this > >> runner module. > > The interpreter needs a lot of information to be bootstrapped; you are > > proposing that the code which extracts that information be run *after* > > the interpreter is bootstrapped, which creates a nasty temporal problem. > > > > In the end, it may make maintenance *more* difficult, rather than less, > > to rewrite that code in Python. > It seems to me that this argument could have been made against the > import rewrite in Python. I don't know enough about the various factors > to know what the differences are between the two scenarios (import and > startup) to know whether it's a valid argument here or not. Can someone > elaborate? 
The import system is algorithmically non-trivial, and it needs to be easily reusable by outside code. These are two good reasons to write the implementation in Python. The early startup sequence, on the other hand, is trivially sequential and can't be reused by user code (by construction: once the interpreter is initialized, you can't run the startup sequence again). The one thing that may be considered is whether there is a point in having two versions of the interactive prompt: the default one in C, and a re-usable Python one in code.py. Regards Antoine. From drekin at gmail.com Mon Aug 26 21:53:16 2013 From: drekin at gmail.com (Draic Kin) Date: Mon, 26 Aug 2013 21:53:16 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> Message-ID: On Mon, Aug 26, 2013 at 9:27 PM, Brett Cannon wrote: > > > > On Mon, Aug 26, 2013 at 2:54 PM, Ned Batchelder wrote: > >> On 8/26/13 1:55 PM, Antoine Pitrou wrote: >> >>> On Mon, 26 Aug 2013 16:36:53 +0200 >>> Draic Kin wrote: >>> >>>> Hello, it would be nice if reference pure Python implementation existed >>>> for >>>> more parts of interpreter core and the core actually used them. This was >>>> done for import machinery in Python 3.3 making importlib library. >>>> >>>> One potential target for this would be moving the logic of what >>>> python.exe >>>> does ? parsing its arguments, finding the script to run, compiling its >>>> code >>>> and running as __main__ module, running REPL if it's in interactive mode >>>> afterwards. There could be a stdlib module exposing this logic, using >>>> argparse to parse the arguments, overhauled version of runpy to run the >>>> __main__ module and overhauled version of code to run the REPL. >>>> Python.exe >>>> would be just thin wrapper which bootstraps the interpreter and runs >>>> this >>>> runner module. 
>>>> >>> The interpreter needs a lot of information to be bootstrapped; you are >>> proposing that the code which extracts that information be run *after* >>> the interpreter is bootstrapped, which creates a nasty temporal problem. >>> >>> In the end, it may make maintenance *more* difficult, rather than less, >>> to rewrite that code in Python. >>> >> It seems to me that this argument could have been made against the import >> rewrite in Python. > > > Not quite. Antoine's point is that the flags used to start Python are > needed to set up certain things that may influence how using e.g. argparse > works. Importlib is in a unique position because I wrote it from the > beginning to be bootstrapped, so I designed it to deal with bootstrapping > issues. It also led to the code being a little odd and having to work > around things like lacking imports, etc. Argparse (and any of its > dependencies) have not been designed in such a fashion, especially if they > are directly involved in setting key settings in the interpreter. > But even if flags have to be parsed before argparse can be used, the startup functionality of localizing code to be run as __main__, running it and running REPL could be moved to stdlib or are there similar issues? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From drekin at gmail.com Mon Aug 26 21:57:50 2013 From: drekin at gmail.com (Draic Kin) Date: Mon, 26 Aug 2013 21:57:50 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: <20130826213031.445503da@fsol> References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> <20130826213031.445503da@fsol> Message-ID: On Mon, Aug 26, 2013 at 9:30 PM, Antoine Pitrou wrote: > On Mon, 26 Aug 2013 14:54:43 -0400 > Ned Batchelder > wrote: > > On 8/26/13 1:55 PM, Antoine Pitrou wrote: > > > On Mon, 26 Aug 2013 16:36:53 +0200 > > > Draic Kin wrote: > > >> Hello, it would be nice if reference pure Python implementation > existed for > > >> more parts of interpreter core and the core actually used them. This > was > > >> done for import machinery in Python 3.3 making importlib library. > > >> > > >> One potential target for this would be moving the logic of what > python.exe > > >> does ? parsing its arguments, finding the script to run, compiling > its code > > >> and running as __main__ module, running REPL if it's in interactive > mode > > >> afterwards. There could be a stdlib module exposing this logic, using > > >> argparse to parse the arguments, overhauled version of runpy to run > the > > >> __main__ module and overhauled version of code to run the REPL. > Python.exe > > >> would be just thin wrapper which bootstraps the interpreter and runs > this > > >> runner module. > > > The interpreter needs a lot of information to be bootstrapped; you are > > > proposing that the code which extracts that information be run *after* > > > the interpreter is bootstrapped, which creates a nasty temporal > problem. > > > > > > In the end, it may make maintenance *more* difficult, rather than less, > > > to rewrite that code in Python. > > It seems to me that this argument could have been made against the > > import rewrite in Python. 
I don't know enough about the various factors > > to know what the differences are between the two scenarios (import and > > startup) to know whether it's a valid argument here or not. Can someone > > elaborate? > > The import system is algorithmically non-trivial, and it needs to be > easily reusable by outside code. These are two good reasons to write the > implementation in Python. The early startup sequence, on the other > hand, is trivially sequential and can't be reused by user code (by > construction: once the interpreter is initialized, you can't run the > startup sequence again). > > The one thing that may be considered is whether there is a point in > having two versions of the interactive prompt: the default one in C, > and a re-usable Python one in code.py. > > There are also subtle differences between the two. E.g. whether to write the prompt to stdout or stderr, how to behave when attributes sys.ps1, sys.ps2 are missing, and of course whether to get input from sys.stdin or the standard STDIN. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Mon Aug 26 22:42:38 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 26 Aug 2013 22:42:38 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> Message-ID: <521BBDBE.1010400@egenix.com> Brett Cannon wrote: > On Mon, Aug 26, 2013 at 2:54 PM, Ned Batchelder wrote: > >> On 8/26/13 1:55 PM, Antoine Pitrou wrote: >> >>> On Mon, 26 Aug 2013 16:36:53 +0200 >>> Draic Kin wrote: >>> >>>> Hello, it would be nice if reference pure Python implementation existed >>>> for >>>> more parts of interpreter core and the core actually used them. This was >>>> done for import machinery in Python 3.3 making importlib library. >>>> >>>> One potential target for this would be moving the logic of what >>>> python.exe >>>> does ? 
parsing its arguments, finding the script to run, compiling its >>>> code >>>> and running as __main__ module, running REPL if it's in interactive mode >>>> afterwards. There could be a stdlib module exposing this logic, using >>>> argparse to parse the arguments, overhauled version of runpy to run the >>>> __main__ module and overhauled version of code to run the REPL. >>>> Python.exe >>>> would be just thin wrapper which bootstraps the interpreter and runs this >>>> runner module. >>>> >>> The interpreter needs a lot of information to be bootstrapped; you are >>> proposing that the code which extracts that information be run *after* >>> the interpreter is bootstrapped, which creates a nasty temporal problem. >>> >>> In the end, it may make maintenance *more* difficult, rather than less, >>> to rewrite that code in Python. >>> >> It seems to me that this argument could have been made against the import >> rewrite in Python. > > > Not quite. Antoine's point is that the flags used to start Python are > needed to set up certain things that may influence how using e.g. argparse > works. Importlib is in a unique position because I wrote it from the > beginning to be bootstrapped, so I designed it to deal with bootstrapping > issues. It also led to the code being a little odd and having to work > around things like lacking imports, etc. Argparse (and any of its > dependencies) have not been designed in such a fashion, especially if they > are directly involved in setting key settings in the interpreter. There are some parts which cannot easily be done in Python at startup time, but a lot of the logic used for running the prompt or loading and running Python scripts can easily be done in Python. See the pyrun code for an example of what can easily be implemented in Python: http://www.egenix.com/products/python/PyRun/ pyrun emulates many Python interpreter startup features in pure Python. 
While working on that emulation, I found quite a few quirky "features" in the interpreter that could use some cleanup or at least proper documentation, e.g. the way sys.path is setup for the various different ways of running Python code from the command prompt. In one of the next pyrun releases, I want to refactor the code into a separate module to make the various bits more accessible and usable outside pyrun. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From sebastien.volle at gmail.com Mon Aug 26 23:07:00 2013 From: sebastien.volle at gmail.com (=?ISO-8859-1?Q?S=E9bastien_Volle?=) Date: Mon, 26 Aug 2013 23:07:00 +0200 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> Message-ID: List comprehension works too: my_list = [ thing for thing in my_list if thing not in things ] But both our solutions don't change my_list in place but require creating a new list, and would not be very practical with big lists. Anyway, I'd rather stick with the simple for loop than making list methods too clever. 
-- seb Le 26 août 2013 19:56, a écrit : > On Mon, Aug 26, 2013, at 13:11, Ram Rachum wrote: > > my_list.remove(*things, silent_fail=True) > > > > > > Aside from being much more concise, it could be more efficient too, > > couldn't it? > > my_list = list(filter(lambda x: x not in things, my_list)) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yoavglazner at gmail.com Mon Aug 26 23:14:00 2013 From: yoavglazner at gmail.com (yoav glazner) Date: Tue, 27 Aug 2013 00:14:00 +0300 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> Message-ID: On Aug 27, 2013 12:07 AM, "Sébastien Volle" wrote: > > List comprehension works too: > > my_list = [ thing for thing in my_list if thing not in things ] > > But both our solutions don't change my_list in place but require creating a new list, and would not be very practical with big lists. Your solution will also be slow on large lists. Since removing an item is O(n) I would pick a better container or mark items as None instead of removing them > > Anyway, I'd rather stick with the simple for loop than making list methods too clever. > > -- seb > > Le 26 août 2013 19:56, a écrit : > >> On Mon, Aug 26, 2013, at 13:11, Ram Rachum wrote: >> > my_list.remove(*things, silent_fail=True) >> > >> > >> > Aside from being much more concise, it could be more efficient too, >> > couldn't it?
>> >> my_list = list(filter(lambda x: x not in things, my_list)) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas Cheers -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Tue Aug 27 00:23:28 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 26 Aug 2013 23:23:28 +0100 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> Message-ID: On 26 August 2013 22:07, Sébastien Volle wrote: > List comprehension works too: > > my_list = [ thing for thing in my_list if thing not in things ] > > But both our solutions don't change my_list in place but require creating a > new list, and would not be very practical with big lists. Aside from the fact that using a list here is basically a stupid idea, the analysis is as such: my_list = [ thing for thing in my_list if thing not in things ] has time complexity O(n * m) where n is the length of my_list and m is the length of things, unless things is made a set, in which case it would be O(n). It has space complexity O(n). for thing in things: try: my_list.remove(thing) except ValueError: pass has time complexity O(n * m) and space complexity O(1). So they're both terrible.
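Joshua's two variants, as a runnable sketch (the data here is made up for illustration; note that `things` is a set, so each membership test is O(1) on average):

```python
my_list = list(range(10)) * 2
things = {3, 4, 5}  # a set, so "thing not in things" is O(1) on average

# Copy-filter: O(n) time, O(n) extra space, drops *every* occurrence.
filtered = [thing for thing in my_list if thing not in things]

# In-place remove loop: O(n * m) time, O(1) extra space,
# but drops at most *one* occurrence of each thing.
in_place = list(my_list)
for thing in things:
    try:
        in_place.remove(thing)
    except ValueError:
        pass

print(filtered)  # [0, 1, 2, 6, 7, 8, 9, 0, 1, 2, 6, 7, 8, 9]
print(in_place)  # the second 3, 4 and 5 survive
```

The two are only equivalent when each value occurs at most once in the list, a point neither post spells out.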
From abarnert at yahoo.com Tue Aug 27 01:19:16 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 26 Aug 2013 16:19:16 -0700 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> Message-ID: <213BE0FD-BEBC-474D-A401-0050B6D5C98B@yahoo.com> On Aug 26, 2013, at 15:23, Joshua Landau wrote: > On 26 August 2013 22:07, S?bastien Volle wrote: >> List comprehension works too: >> >> my_list = [ thing for thing in my_list if thing not in things ] >> >> But both our solutions don't change my_list in place but require creating a >> new list, and would not be very practical with big lists. > > Aside from the fact that using a list here is basically a stupid idea, > the analysis is as such: > > my_list = [ thing for thing in my_list if thing not in things ] > > has time complexity O(n * m) where n is the length of my_list and m is > the length of things, unless things is made a set in which it would be > O(n). > It has space complexity O(n). > > for thing in things: > try: > my_list.remove(thing) > except ValueError: > pass > > has time complexity O(n * m) and space complexity O(1). > > So they're both terrible. In practical terms, except for the edge cases of m being near 0 or near n, in-place removing is almost always significantly slower. In general, if you're filtering a list in-place, you probably shouldn't be. If you're doing it in-place because it's a shared value, just use my_list[:]= instead of my_list=. If you're doing it for atomicity reasons, it doesn't work. If you're doing it as an optimization because the equivalent would be faster in, say, C++, it's probably a pessimization. 
If you're doing it for memory savings, the fact that you're likely to end up with O(m) wasted permanent storage (because shrinking a list usually doesn't resize it, but making a new list and releasing the old obviously does) is often more important than the O(n-m) temporary storage. (Of course half the time, you can just turn the listcomp into a genexpr and get the best of both worlds, but obviously that isn't always appropriate.) From steve at pearwood.info Tue Aug 27 03:47:11 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 27 Aug 2013 11:47:11 +1000 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> Message-ID: <521C051F.1030200@pearwood.info> On 27/08/13 03:11, Ram Rachum wrote: > Why do I have to do this: > > for thing in things: > try: > my_list.remove(thing) > except ValueError: > pass > > > When I could do this: > > my_list.remove(*things, silent_fail=True) How is the remove method supposed to distinguish between existing code that does this: my_list.remove((1, 2, 3)) # remove a single item, a tuple (1, 2, 3) and new code that does this? my_list.remove(1, 2, 3) # remove three items, 1, 2 and 3, without suppressing errors Justify why remove(a, b, c) should mean "remove the first of each of a and b and c" rather than "remove the first of a, or if no a, the first b, or if no b, the first c", that is, this code instead: try: for thing in things: my_list.remove(thing) except ValueError: pass Both are useful. Why is your example more useful than mine? But even more useful than both would be, "remove the first occurring of either a, b or c": indexes = [] for thing in things: try: indexes.append(my_list.index(thing)) except ValueError: pass del my_list[min(indexes)] So that's three distinct behaviours that remove(a, b, c, ...) could mean.
All of them are easy enough to program, and none of them are fundamental list operations. Justify why one of them should be privileged as a method. > Aside from being much more concise, it could be more efficient too, > couldn't it? If you're worried about efficiency of removals, a list is probably the wrong data structure to use. -- Steven From steve at pearwood.info Tue Aug 27 07:24:25 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 27 Aug 2013 15:24:25 +1000 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> Message-ID: <20130827052425.GA4387@ando> On Mon, Aug 26, 2013 at 11:07:00PM +0200, Sébastien Volle wrote: > List comprehension works too: > > my_list = [ thing for thing in my_list if thing not in things ] > > But both our solutions don't change my_list in place but require creating a > new list, and would not be very practical with big lists. That's true, but it is remarkable how on a modern computer, Python's idea of "big" is so different from what the average coder considers big. When I was starting out with Python, I wrote code like I learned to code using Pascal: for i in range(len(mylist)-1, -1, -1): if condition(mylist[i]): del mylist[i] Since that's an inplace deletion, it seems like it should be cheaper than making a copy of mylist: mylist[:] = [x for x in mylist if not condition(x)] Just common sense, right? In a fit of premature optimization, I started writing: N = 1000 # take a wild guess if len(mylist) > N: ... # inplace deletion else: ... # copy and slice Then I decided to systematically determine the value of N. I have not been able to. Copy and slice is always faster, in my experience. If there is such an N that counts as "big" in this sense, it is probably so big that I'm not able to create the list in the first place.
So that probably means hundreds of millions of items, or more. Certainly not what the average coder thinks of as "a big list", say, a couple of screens full of items when printed. > Anyway, I'd rather stick with the simple for loop than making list methods > too clever. +1 We are agreed :-) -- Steven From random832 at fastmail.us Tue Aug 27 07:37:11 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 27 Aug 2013 01:37:11 -0400 Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: <20130827052425.GA4387@ando> References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> <20130827052425.GA4387@ando> Message-ID: <1377581831.29963.14521149.15E77E9E@webmail.messagingengine.com> On Tue, Aug 27, 2013, at 1:24, Steven D'Aprano wrote: > When I was starting out with Python, I wrote code like I learned to code > using Pascal: > > for i in range(len(mylist)-1, -1, -1): > if condition(mylist[i]): > del mylist[i] > > Since that's an inplace deletion, it seems like it should be cheaper > than making a copy of mylist: That's the wrong way to do it - each time you delete an item, you're moving everything behind it by one space. The fact that you're iterating backwards doesn't solve that issue. The _right_ way to do it, as far as there is a right way, is: dst=0 for item in mylist: if not condition(item): mylist[dst] = item dst += 1 mylist[dst:]=[] This avoids moving any item or resizing the list more than once, and is therefore O(n). 
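random832's compaction loop, wrapped into a runnable function for reference:

```python
def compact_in_place(mylist, condition):
    """Drop items satisfying condition, moving each survivor at most once."""
    dst = 0
    for item in mylist:
        if not condition(item):
            mylist[dst] = item
            dst += 1
    mylist[dst:] = []  # a single resize at the end, so the whole pass is O(n)

data = [0, 1, 2, 3, 4, 5]
compact_in_place(data, lambda x: x % 2 == 1)  # drop the odd numbers
print(data)  # [0, 2, 4]
```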
From abarnert at yahoo.com Tue Aug 27 09:58:29 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 27 Aug 2013 00:58:29 -0700 (PDT) Subject: [Python-ideas] Allow multiple arguments to `list.remove` and flag for silent fail In-Reply-To: <1377581831.29963.14521149.15E77E9E@webmail.messagingengine.com> References: <2e0e0eed-01e4-48d3-a9e0-51946940bba4@googlegroups.com> <1377539740.21130.14322453.4818ECA6@webmail.messagingengine.com> <20130827052425.GA4387@ando> <1377581831.29963.14521149.15E77E9E@webmail.messagingengine.com> Message-ID: <1377590309.2614.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: "random832 at fastmail.us" Sent: Monday, August 26, 2013 10:37 PM > That's the wrong way to do it - each time you delete an item, you're > moving everything behind it by one space. The fact that you're iterating > backwards doesn't solve that issue. The _right_ way to do it, as far as > there is a right way, is: > > dst=0 > for item in mylist: > if not condition(item): > mylist[dst] = item > dst += 1 > mylist[dst:]=[] > > This avoids moving any item or resizing the list more than once, and is > therefore O(n). … and it's still slower than a list comprehension. See http://pastebin.com/0zAS7dQ7 and http://pastebin.com/AatAWFBF for transcripts of initial tests with 2.7.2 and 3.3.0. I also tested for smaller values (down to 10000), different predicates, etc.; unless you're keeping almost everything, or using a predicate so slow it swamps everything else, it's consistently 50% slower in 2.7.2 and 60% in 3.3.0 (or 60%/80%, once you subtract out the overhead that's not actually part of either call). Of course if you make N huge enough to run into VM swapping, the in-place might be faster… but they're both so intolerably slow that I didn't finish measuring it.
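A minimal timeit sketch along the lines of Andrew's measurements (the workload and sizes are assumptions; absolute numbers vary with machine and Python version, and each statement copies the list first so every run starts from the full data):

```python
import timeit

setup = "data = list(range(100000)); pred = lambda x: x % 3 == 0"

# Copy-and-filter via slice assignment.
listcomp = """
lst = list(data)
lst[:] = [x for x in lst if not pred(x)]
"""

# random832-style in-place compaction.
compact = """
lst = list(data)
dst = 0
for item in lst:
    if not pred(item):
        lst[dst] = item
        dst += 1
del lst[dst:]
"""

for name, stmt in (("listcomp", listcomp), ("compact", compact)):
    print(name, timeit.timeit(stmt, setup=setup, number=10))
```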
From solipsis at pitrou.net Tue Aug 27 10:11:02 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 27 Aug 2013 10:11:02 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> <20130826213031.445503da@fsol> Message-ID: <20130827101102.00b78f03@pitrou.net> Le Mon, 26 Aug 2013 21:57:50 +0200, Draic Kin a écrit : > > The one thing that may be considered is whether there is a point in > > having two versions of the interactive prompt: the default one in C, > > and a re-usable Python one in code.py. > > > > There are also subtle differences between the two. E.g. whether to > > write > the prompt to stdout or stderr, how to behave when attributes sys.ps1, > sys.ps2 are missing, and of course whether to get input from > sys.stdin or the standard STDIN. Yes... But I'm not sure those differences were intended in the first place. IMO it would make sense to smooth them out. (there's a lot of historical baggage that could explain the discrepancies) Regards Antoine. From drekin at gmail.com Tue Aug 27 10:27:15 2013 From: drekin at gmail.com (Draic Kin) Date: Tue, 27 Aug 2013 10:27:15 +0200 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: <20130827101102.00b78f03@pitrou.net> References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> <20130826213031.445503da@fsol> <20130827101102.00b78f03@pitrou.net> Message-ID: On Tue, Aug 27, 2013 at 10:11 AM, Antoine Pitrou wrote: > Le Mon, 26 Aug 2013 21:57:50 +0200, > Draic Kin a écrit : > > > The one thing that may be considered is whether there is a point in > > > having two versions of the interactive prompt: the default one in C, > > > and a re-usable Python one in code.py. > > > > > > There are also subtle differences between the two. E.g.
whether to > > > write > > the prompt to stdout or stderr, how to behave when attributes sys.ps1, > > sys.ps2 are missing, and of course whether to get input from > > sys.stdin or the standard STDIN. > > Yes... But I'm not sure those differences were intended in the first > place. IMO it would make sense to smooth them out. > (there's a lot of historical baggage that could explain the > discrepancies) > I definitely agree that they should be smoothed out, I was just pointing them out. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Aug 27 12:13:33 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 27 Aug 2013 20:13:33 +1000 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: <20130826195555.74aa3fe5@fsol> References: <20130826195555.74aa3fe5@fsol> Message-ID: On 27 Aug 2013 03:57, "Antoine Pitrou" wrote: > > On Mon, 26 Aug 2013 16:36:53 +0200 > Draic Kin wrote: > > Hello, it would be nice if reference pure Python implementation existed for > > more parts of interpreter core and the core actually used them. This was > > done for import machinery in Python 3.3 making importlib library. > > > > One potential target for this would be moving the logic of what python.exe > > does ? parsing its arguments, finding the script to run, compiling its code > > and running as __main__ module, running REPL if it's in interactive mode > > afterwards. There could be a stdlib module exposing this logic, using > > argparse to parse the arguments, overhauled version of runpy to run the > > __main__ module and overhauled version of code to run the REPL. Python.exe > > would be just thin wrapper which bootstraps the interpreter and runs this > > runner module. > > The interpreter needs a lot of information to be bootstrapped; you are > proposing that the code which extracts that information be run *after* > the interpreter is bootstrapped, which creates a nasty temporal problem. 
> > In the end, it may make maintenance *more* difficult, rather than less, > to rewrite that code in Python. Enabling more of this kind of thing with frozen modules is actually one of my motivations for PEP 432. It can't be done readily until we have a clear separation of "working compiler, event loop and builtin types" from "fully configured interpreter instance", though. runpy.run_path and run_module unfortunately need updating before they can be used to fully emulate normal __main__ execution (the -m switch uses an underscore prefixed private API). There's a tracker issue about those updates that the various authors of a third party runpy alternatives may care to investigate. The problem with the current APIs is you can't run in a preexisting namespace and you can't get the partially populated namespace after an exception. Aside from that, the test suite ensures that the runpy functions gives the same behaviour as the command line (in the case of module execution, that *is* running almost exactly the same code already). Cheers, Nick. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Aug 27 15:18:53 2013 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 27 Aug 2013 09:18:53 -0400 Subject: [Python-ideas] ipaddress: Interface inheriting from Address In-Reply-To: <52154558.4080102@jon-foster.co.uk> References: <52154558.4080102@jon-foster.co.uk> Message-ID: <521CA73D.4010703@trueblade.com> On 08/21/2013 06:55 PM, Jon Foster wrote: > Hi all, > > I'd like to propose changing ipaddress.IPv[46]Interface to not inherit > from IPv[46]Address. I agree that it's odd that an [x]Interface would inherit from an [x]Address. I think it should be a has-a relationship, as you describe with the "ip" property. 
> If there is interest in this idea, I'll try to put together a patch next > week. I'd review the patch. Eric. From jess.austin at gmail.com Tue Aug 27 16:57:32 2013 From: jess.austin at gmail.com (Jess Austin) Date: Tue, 27 Aug 2013 09:57:32 -0500 Subject: [Python-ideas] proposed sequence method: index_subseq() Message-ID: Recently I've repeatedly needed to check whether a particular sequence occurred as a "subsequence" of another. I think this could be a general requirement, so I'd like to ask if anyone else agrees. In Python code, I have the following, although I can rewrite it in C if people on this list like it enough to add it to core: def index_subseq(self, subseq): '''slice over which subseq is a sub-sequence of sequence self''' i = -1 while True: i = self.index(subseq[0], i + 1) if all((a == b) for a, b in zip(subseq, itertools.chain(self[i:], iter(object, None)))): return slice(i, i + len(subseq)) (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5)) # slice(3, 6, None) [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5)) (0, 1, 2, 3, 4).index_subseq((3,4,5)) # ValueError [0, 1, 2, 3, 4].index_subseq((3,4,5)) [This listing omits the monkeypatching into the list and tuple builtins.] The index() method of more specialized sequences like str and bytearray already has behavior much like what I propose for this method, since it doesn't have to worry about elements of those sequences having elements in turn. This method returns a slice object: I considered but decided against returning just the initial index of the slice. As I've written it here, the method doesn't care whether it is passed a list or a tuple as the "subseq" argument. This could be generalized to take any iterable. If anyone has a suggestion for a better name than "index_subseq", that would be good too. If people think this is a good idea, I'll post a patch in the tracker. thanks, Jess -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rosuav at gmail.com Tue Aug 27 17:01:56 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 28 Aug 2013 01:01:56 +1000 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 12:57 AM, Jess Austin wrote: > (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5))) # slice(3, 6, None) > [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5))) > (0, 1, 2, 3, 4).index_subseq((3,4,5))) # ValueError > [0, 1, 2, 3, 4].index_subseq((3,4,5))) Apologies for bikeshedding, but it looks to me like this would do better as a stand-alone function than a method. I was going to suggest itertools, but since its first argument has to be repeatably indexable, that may not be the best place for it; in any case, though, making it a module-level function somewhere allows it to not care about the type of its first arg. ChrisA From taleinat at gmail.com Tue Aug 27 17:27:43 2013 From: taleinat at gmail.com (Tal Einat) Date: Tue, 27 Aug 2013 18:27:43 +0300 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: Message-ID: On Tue, Aug 27, 2013 at 5:57 PM, Jess Austin wrote: > Recently I've repeatedly needed to check whether a particular sequence > occurred as a "subsequence" of another. I think this could be a general > requirement, so I'd like to ask if anyone else agrees. 
In python code, I > have the following, although I can rewrite it in C if people on this list > like it enough to add it to core: > > def index_subseq(self, subseq): > '''slice over which subseq is a sub-sequence of sequence self''' > i = -1 > while True: > i = self.index(subseq[0], i + 1) > if all((a == b) for a, b in zip(subseq, itertools.chain(self[i:], > iter(object, None)))): > return slice(i, i + len(subseq)) > > (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5))) # slice(3, 6, None) > [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5))) > (0, 1, 2, 3, 4).index_subseq((3,4,5))) # ValueError > [0, 1, 2, 3, 4].index_subseq((3,4,5))) > > [This listing omits the monkeypatching into the list and tuple builtins.] > > The index() method of more specialized sequences like str and bytearray > already has behavior much like what I propose for this method, since it > doesn't have to worry about elements of those sequences having elements in > turn. This method returns a slice object: I considered but decided against > returning just the initial index of the slice. As I've written it here, the > method doesn't care whether it is passed a list or a tuple as the "subseq" > argument. This could be generalized to take any iterable. If anyone has a > suggestion for a better name than "index_subseq", that would be good too. > > If people think this is a good idea, I'll post a patch in the tracker. > > thanks, > Jess Hi Jess, There are many algorithms for sub-sequence search, most of which can be significantly more efficient than the one you used under common circumstances. For example, see the Knuth-Morris-Pratt algorithm [1], and an example Python implementation on the ActiveState cookbook [2]. A good implementation that would work relatively well in most common cases could be useful, much as Python's sorting implementation is. I think, however, that it would be better to start this as a 3rd party module rather than push for immediate inclusion in the stdlib. 
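To make the comparison concrete, a KMP-based version of the same API could look something like the following illustrative sketch (written for this discussion, not the cookbook recipe; the name `kmp_index_subseq` is made up here):

```python
def kmp_index_subseq(seq, subseq):
    """Return the slice of seq where subseq first occurs, else raise ValueError.

    Classic Knuth-Morris-Pratt: precompute, for each prefix of subseq, the
    length of its longest proper border, then scan seq in a single pass.
    """
    if not subseq:
        return slice(0, 0)
    # Failure table: fail[k] = length of the longest border of subseq[:k+1].
    fail = [0] * len(subseq)
    k = 0
    for i in range(1, len(subseq)):
        while k and subseq[i] != subseq[k]:
            k = fail[k - 1]
        if subseq[i] == subseq[k]:
            k += 1
        fail[i] = k
    # Scan: O(len(seq) + len(subseq)) comparisons in total.
    k = 0
    for i, item in enumerate(seq):
        while k and item != subseq[k]:
            k = fail[k - 1]
        if item == subseq[k]:
            k += 1
        if k == len(subseq):
            return slice(i - k + 1, i + 1)
    raise ValueError('subsequence not found')
```

Both the failure-table construction and the scan advance through their input exactly once, which is where the O(n+m) bound comes from.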
- Tal [1] http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm [2] http://code.activestate.com/recipes/117214/ From kim.grasman at gmail.com Tue Aug 27 17:47:08 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Tue, 27 Aug 2013 17:47:08 +0200 Subject: [Python-ideas] Have os.unlink remove directories on Unix (was: junction points on Windows) In-Reply-To: <1377519955.17534.14193997.50075D54@webmail.messagingengine.com> References: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> <1377519955.17534.14193997.50075D54@webmail.messagingengine.com> Message-ID: On Mon, Aug 26, 2013 at 2:25 PM, wrote: > On Mon, Aug 26, 2013, at 7:42, Kim Gräsman wrote: >> I wouldn't switch to using os.rmdir, however -- if the path names an >> actual directory rather than a symlink or a junction point, os.rmdir >> will delete it, whereas os.unlink will fail with access denied (as I >> believe it should.) > > Only if it's empty. You could at least replace whatever your > _delete_junction_point function is with it. I don't want to remove empty dirs, only links. > And the fact that windows > unlink() allows you to remove some (but not all) things that windows > considers to be directories is already violating the principle of being > thin wrappers around system calls. Do you mean Python's os.unlink() here, or the Microsoft C-runtime's unlink()? - Kim From jess.austin at gmail.com Tue Aug 27 18:19:50 2013 From: jess.austin at gmail.com (Jess Austin) Date: Tue, 27 Aug 2013 11:19:50 -0500 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: Message-ID: On Tue, Aug 27, 2013 at 10:27 AM, Tal Einat wrote: > There are many algorithms for sub-sequence search, most of which can > be significantly more efficient than the one you used under common > circumstances. For example, see the Knuth-Morris-Pratt algorithm [1], > and an example Python implementation on the ActiveState cookbook [2].
> Good point; O(n+m) is better than O(n*m). Minor observation: KMP would disallow the possibility I raised of subseq being just an iterator, rather than a sequence. I think that's OK, since my use cases haven't had iterators here. It actually seems more likely that the "containing" object will be an iterator, which the recipe you linked would allow. Hmmm.... I really just provided the example code to specify exactly what I meant rather than as a proposed algorithm, but I'm glad you took it seriously. This really does want to be more of a function that takes iterators than a method of sequences. A good implementation that would work relatively well in most common > cases could be useful, much as Python's sorting implementation is. I > think, however, that it would be better to start this as a 3rd party > module rather than push for immediate inclusion in the stdlib. > No push here! I'm just asking questions. Thanks for your insights. cheers, Jess -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Aug 27 19:34:37 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Aug 2013 03:34:37 +1000 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: Message-ID: <521CE32D.1020503@pearwood.info> On 28/08/13 00:57, Jess Austin wrote: > Recently I've repeatedly needed to check whether a particular sequence > occurred as a "subsequence" of another. I think this could be a general > requirement, so I'd like to ask if anyone else agrees. In python code, I > have the following, although I can rewrite it in C if people on this list > like it enough to add it to core: Well, I personally think it is an obvious and useful piece of functionality: http://code.activestate.com/recipes/577850-search-sequences-for-sub-sequence/ but a less naive implementation would be more useful. 
> def index_subseq(self, subseq): > '''slice over which subseq is a sub-sequence of sequence self''' > i = -1 > while True: > i = self.index(subseq[0], i + 1) > if all((a == b) for a, b in zip(subseq, itertools.chain(self[i:], > iter(object, None)))): > return slice(i, i + len(subseq)) > > (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5))) # slice(3, 6, None) > [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5))) > (0, 1, 2, 3, 4).index_subseq((3,4,5))) # ValueError > [0, 1, 2, 3, 4].index_subseq((3,4,5))) > > [This listing omits the monkeypatching into the list and tuple builtins.] Okay, now I'm curious. Unless you're talking about patching the Python compiler, how on earth are you monkey-patching *into* list and tuple? > The index() method of more specialized sequences like str and bytearray > already has behavior much like what I propose for this method, since it > doesn't have to worry about elements of those sequences having elements in > turn. This method returns a slice object: I considered but decided against > returning just the initial index of the slice. I dislike that. The end and step parts of the slice are redundant: end is easily calculated as just start + len(subseq), and step will always be None. A more familiar, and obvious, functionality is to return the starting index, and then either raise an exception if not found, or return some sentinel value (not -1 since it can be used as an index in error). >As I've written it here, the > method doesn't care whether it is passed a list or a tuple as the "subseq" > argument. This could be generalized to take any iterable. If anyone has a > suggestion for a better name than "index_subseq", that would be good too. > > If people think this is a good idea, I'll post a patch in the tracker. +0.5 I found this useful enough to write my own quick and dirty version, but not enough to spend time optimizing it. 
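The index-returning API described above, mirroring list.index(), might look something like this naive sketch (the standalone-function form and the name are assumptions for illustration; callers can recover the slice as slice(i, i + len(subseq))):

```python
def index_subseq(seq, subseq):
    """Return the start index of the first occurrence of subseq in seq.

    Raises ValueError if subseq does not occur, mirroring list.index(),
    rather than returning a sentinel like -1 that is a usable index.
    """
    n, m = len(seq), len(subseq)
    # Naive O(n*m) scan: try every possible start position in turn.
    for i in range(n - m + 1):
        if all(seq[i + j] == subseq[j] for j in range(m)):
            return i
    raise ValueError('subsequence not found')
```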
-- Steven From random832 at fastmail.us Tue Aug 27 19:52:23 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 27 Aug 2013 13:52:23 -0400 Subject: [Python-ideas] Have os.unlink remove directories on Unix (was: junction points on Windows) In-Reply-To: References: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> <1377519955.17534.14193997.50075D54@webmail.messagingengine.com> Message-ID: <1377625943.15893.14771809.07C79927@webmail.messagingengine.com> On Tue, Aug 27, 2013, at 11:47, Kim Gräsman wrote: > I don't want to remove empty dirs, only links. So only call it on links. > > And the fact that windows > > unlink() allows you to remove some (but not all) things that windows > > considers to be directories is already violating the principle of being > > thin wrappers around system calls. > > Do you mean Python's os.unlink() here, or the Microsoft C-runtime's > unlink()? I'm talking about Python's unlink. The fact that it works to remove directory symlinks is what I was talking about here. From kim.grasman at gmail.com Tue Aug 27 20:52:16 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Tue, 27 Aug 2013 20:52:16 +0200 Subject: [Python-ideas] Have os.unlink remove directories on Unix (was: junction points on Windows) In-Reply-To: <1377625943.15893.14771809.07C79927@webmail.messagingengine.com> References: <1377490792.32342.14074717.656C065D@webmail.messagingengine.com> <1377519955.17534.14193997.50075D54@webmail.messagingengine.com> <1377625943.15893.14771809.07C79927@webmail.messagingengine.com> Message-ID: I'm not sure this has anything to do with my original question anymore. > I'm talking about Python's unlink. The fact that it works to remove > directory symlinks is what I was talking about here. I don't know enough about Unix to discuss the merits of changing os.unlink(), but I do believe that it should treat junction points as symlinks on Windows.
I'll try to continue that thread on my original subject. - Kim From tjreedy at udel.edu Tue Aug 27 22:06:48 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 27 Aug 2013 16:06:48 -0400 Subject: [Python-ideas] Move more parts of interpreter core to stdlib In-Reply-To: References: <20130826195555.74aa3fe5@fsol> <521BA473.7070206@nedbatchelder.com> <20130826213031.445503da@fsol> <20130827101102.00b78f03@pitrou.net> Message-ID: On 8/27/2013 4:27 AM, Draic Kin wrote: > On Tue, Aug 27, 2013 at 10:11 AM, Antoine Pitrou > > Draic Kin > > Antoine wrote: > > > > The one thing that may be considered is whether there is a point in > > > > having two versions of the interactive prompt: the default one > > > > in C, and a re-usable Python one in code.py. Idlelib.PyShell subclasses code.InteractiveInterpreter as ModifiedInterpreter. It probably adds the same behavior as code.InteractiveConsole(InteractiveInterpreter), except for a gui rather than text environment. > > > There are also subtle differences between the two. E.g. whether to > > > write the prompt to stdout or stderr, how to behave when attributes > > > sys.ps1, sys.ps2 are missing, and of course whether to get input from > > > sys.stdin or the standard STDIN. The Idle shell window subclasses the Idle editor window, which is based on tk text widgets. The shell puts prompts to and gets code input from its text widget. By default, it uses '>>> ' and '' as prompts (and indents with \t = 8 spaces, which should be changed somehow). > > Yes... But I'm not sure those differences were intended in the first > > place. IMO it would make sense to smooth them out. (there's a lot > > of historical baggage that could explain the discrepancies) Guido did the first 13 commits; ask him ;-). > I definitely agree that they should be smoothed out, I was just pointing > them out. Nice idea, as long as it does not break current user applications ;-).
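The split being discussed here -- the reusable compile/exec machinery living in code.py while the prompts and I/O details differ -- is easy to poke at directly. A minimal sketch against the documented code module API (not IDLE's actual code):

```python
import code
import sys

# code.InteractiveConsole drives the same compile-and-execute machinery as
# the default C prompt; push() returns True while more input is required.
console = code.InteractiveConsole()
needs_more = console.push('def f():')    # incomplete block -> True
console.push('    return 42')
console.push('')                         # blank line closes the block
console.push('result = f()')

# The console's namespace is just a dict on the instance:
print(console.locals['result'])          # -> 42

# The C REPL reads sys.ps1/sys.ps2 (created only in interactive mode);
# InteractiveConsole.interact() sets them itself if they are missing.
sys.ps1 = '>>> '
sys.ps2 = '... '
```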
-- Terry Jan Reedy From jess.austin at gmail.com Tue Aug 27 22:11:47 2013 From: jess.austin at gmail.com (Jess Austin) Date: Tue, 27 Aug 2013 15:11:47 -0500 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: <521CE32D.1020503@pearwood.info> References: <521CE32D.1020503@pearwood.info> Message-ID: On Tue, Aug 27, 2013 at 12:34 PM, Steven D'Aprano wrote: > On 28/08/13 00:57, Jess Austin wrote: > >> >> but a less naive implementation would be more useful. Ummm, OK. Tal Einat already pointed me to KMP. > [This listing omits the monkeypatching into the list and tuple builtins.] >> > > Okay, now I'm curious. Unless you're talking about patching the Python > compiler, how on earth are you monkey-patching *into* list and tuple? That was kind of a little joke. I had a class called "list" which inherited from "real" list and a mixin, and I called that instead of using the "[]" literal. +0.5 Like I replied to Tal, his message inspired a re-think on my part. I won't be posting any of this to the tracker. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Aug 27 22:21:25 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 27 Aug 2013 16:21:25 -0400 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: <521CE32D.1020503@pearwood.info> References: <521CE32D.1020503@pearwood.info> Message-ID: On 8/27/2013 1:34 PM, Steven D'Aprano wrote: > On 28/08/13 00:57, Jess Austin wrote: >> Recently I've repeatedly needed to check whether a particular sequence >> occurred as a "subsequence" of another. I think this could be a general >> requirement, so I'd like to ask if anyone else agrees. I think the need is too specialized for the stdlib. On the other hand, a generalized subsequence module on PyPI would be fine if there is none there already. If you call the initial releases 'alpha', you would be free to experiment with the APIs. > I dislike that.
The end and step parts of the slice are redundant: end > is easily calculated as just start + len(subseq), and step will always > be None. A more familiar, and obvious, functionality is to return the > starting index, and then either raise an exception if not found, or > return some sentinel value (not -1 since it can be used as an index in > error). There are subsequence algorithms that see, for instance, 1,4,7 as a subsequence of 0,1,2,3,4,5,6,7,8. -- Terry Jan Reedy From oscar.j.benjamin at gmail.com Wed Aug 28 03:47:36 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 02:47:36 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? Message-ID: I came across the following today: $ python3.3 Python 3.3.0 (default, Sep 29 2012, 17:14:58) [GCC 4.7.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import numbers >>> import decimal >>> d = decimal.Decimal() >>> isinstance(d, numbers.Number) True >>> isinstance(d, numbers.Complex) False >>> isinstance(d, numbers.Real) False >>> isinstance(d, numbers.Rational) False >>> isinstance(d, numbers.Integral) False That seems plainly absurd to me. Decimals are quite clearly real numbers. I then found the following in PEP-3141 [1]: """ The Decimal Type After consultation with its authors it has been decided that the Decimal type should not at this time be made part of the numeric tower. """ What was the rationale for this decision and does it still apply? 
Oscar References: [1] http://www.python.org/dev/peps/pep-3141/#the-decimal-type From steve at pearwood.info Wed Aug 28 05:11:53 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Aug 2013 13:11:53 +1000 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: <521CE32D.1020503@pearwood.info> Message-ID: <521D6A79.1090206@pearwood.info> On 28/08/13 06:21, Terry Reedy wrote: > On 8/27/2013 1:34 PM, Steven D'Aprano wrote: >> On 28/08/13 00:57, Jess Austin wrote: >>> Recently I've repeatedly needed to check whether a particular sequence >>> occurred as a "subsequence" of another. I think this could be a general >>> requirement, so I'd like to ask if anyone else agrees. > > I think the need is too specialized for the stdlib. This is exactly what strings and bytes do, so it clearly isn't that specialised. The only question is whether it is worth generalizing to sequences other than strings and bytes. [...] >> I dislike that. The end and step parts of the slice are redundant: end >> is easily calculated as just start + len(subseq), and step will always >> be None. A more familiar, and obvious, functionality is to return the >> starting index, and then either raise an exception if not found, or >> return some sentinel value (not -1 since it can be used as an index in >> error). > > There are subsequence algorithms that see, for instance, 1,4,7 as a subsequence of 0,1,2,3,4,5,6,7,8. That's fine. They can invent their own API, but I reckon that returning a slice will still be the wrong thing to do. Consider (1, 3, 8) as a subsequence of (1, 2, 3, 4, 5, 6, 7, 8), what slice should be returned? 
-- Steven From tjreedy at udel.edu Wed Aug 28 06:47:43 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 28 Aug 2013 00:47:43 -0400 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: <521D6A79.1090206@pearwood.info> References: <521CE32D.1020503@pearwood.info> <521D6A79.1090206@pearwood.info> Message-ID: On 8/27/2013 11:11 PM, Steven D'Aprano wrote: > On 28/08/13 06:21, Terry Reedy wrote: > >> There are subsequence algorithms that see, for instance, 1,4,7 as a >> subsequence of 0,1,2,3,4,5,6,7,8. An even step was an accident. It is also a subsequence of 0, 1, 40, 4, 5, 8, 9, 5, 7. > That's fine. They can invent their own API, but I reckon that returning > a slice will still be the wrong thing to do. Consider (1, 3, 8) as a > subsequence of (1, 2, 3, 4, 5, 6, 7, 8), what slice should be returned? The indexes of each item matched. -- Terry Jan Reedy From ncoghlan at gmail.com Wed Aug 28 09:02:20 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 Aug 2013 17:02:20 +1000 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On 28 August 2013 11:47, Oscar Benjamin wrote: > I came across the following today: > > $ python3.3 > Python 3.3.0 (default, Sep 29 2012, 17:14:58) > [GCC 4.7.2] on linux > Type "help", "copyright", "credits" or "license" for more information. >>>> import numbers >>>> import decimal >>>> d = decimal.Decimal() >>>> isinstance(d, numbers.Number) > True >>>> isinstance(d, numbers.Complex) > False >>>> isinstance(d, numbers.Real) > False >>>> isinstance(d, numbers.Rational) > False >>>> isinstance(d, numbers.Integral) > False > > That seems plainly absurd to me. Decimals are quite clearly real > numbers. I then found the following in PEP-3141 [1]: > """ > The Decimal Type > > After consultation with its authors it has been decided that the > Decimal type should not at this time be made part of the numeric > tower. 
> """ > > What was the rationale for this decision and does it still apply? If I recall correctly, it was the fact that isinstance(d, Real) implies isinstance(d, Complex), yet there's no way to do complex arithmetic with Decimal real and imaginary components. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From michelelacchia at gmail.com Wed Aug 28 10:25:42 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Wed, 28 Aug 2013 10:25:42 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: How can isinstance(d, Real) imply isinstance(d, Complex)? Can you elaborate a little more? It makes no sense to me, since the set of real numbers is a subset of the complex one. On Wed, Aug 28, 2013 at 9:02 AM, Nick Coghlan wrote: > On 28 August 2013 11:47, Oscar Benjamin > wrote: > > I came across the following today: > > > > $ python3.3 > > Python 3.3.0 (default, Sep 29 2012, 17:14:58) > > [GCC 4.7.2] on linux > > Type "help", "copyright", "credits" or "license" for more information. > >>>> import numbers > >>>> import decimal > >>>> d = decimal.Decimal() > >>>> isinstance(d, numbers.Number) > > True > >>>> isinstance(d, numbers.Complex) > > False > >>>> isinstance(d, numbers.Real) > > False > >>>> isinstance(d, numbers.Rational) > > False > >>>> isinstance(d, numbers.Integral) > > False > > > > That seems plainly absurd to me. Decimals are quite clearly real > > numbers. I then found the following in PEP-3141 [1]: > > """ > > The Decimal Type > > > > After consultation with its authors it has been decided that the > > Decimal type should not at this time be made part of the numeric > > tower. > > """ > > > > What was the rationale for this decision and does it still apply? > > If I recall correctly, it was the fact that isinstance(d, Real) > implies isinstance(d, Complex), yet there's no way to do complex > arithmetic with Decimal real and imaginary components. > > Cheers, > Nick. 
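For what it's worth, the registration itself is mechanically trivial, since the numeric tower classes are ABCs; the catch described above is that registering Decimal buys no new behaviour. A sketch of what opting in would look like (this is explicitly not what the stdlib does):

```python
import decimal
import numbers

# Registering Decimal as Real is a one-liner, because ABC registration
# is purely nominal -- it changes isinstance() answers and nothing else:
numbers.Real.register(decimal.Decimal)

d = decimal.Decimal('1.5')
print(isinstance(d, numbers.Real))     # now True
print(isinstance(d, numbers.Complex))  # also True: Real subclasses Complex

# ...but registration doesn't create behaviour: mixed Decimal/float
# arithmetic still fails, which is the objection above.
try:
    d + 0.5
except TypeError as exc:
    print('still unsupported:', exc)
```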
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Michele Lacchia -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Wed Aug 28 10:21:36 2013 From: taleinat at gmail.com (Tal Einat) Date: Wed, 28 Aug 2013 11:21:36 +0300 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: Message-ID: On Tue, Aug 27, 2013 at 7:19 PM, Jess Austin wrote: > On Tue, Aug 27, 2013 at 10:27 AM, Tal Einat wrote: >> >> There are many algorithms for sub-sequence search, most of which can >> be significantly more efficient than the one you used under common >> circumstances. For example, see the Knuth-Morris-Pratt algorithm [1], >> and an example Python implementation on the ActiveState cookbook [2]. > > > Good point; O(n+m) is better than O(n*m). Minor observation: KMP would > disallow the possibility I raised of subseq being just an iterator, rather > than a sequence. I think that's OK, since my use cases haven't had iterators > here. It actually seems more likely that the "containing" object will be an > iterator, which the recipe you linked would allow. Hmmm.... You meant that the searched sequence would be an iterator, not the sub-sequence, right? This should be possible with KMP or similar algorithms (e.g. as described under the "Variants" section in the KMP Wikipedia article). >> A good implementation that would work relatively well in most common >> cases could be useful, much as Python's sorting implementation is. I >> think, however, that it would be better to start this as a 3rd party >> module rather than push for immediate inclusion in the stdlib. > > No push here! I'm just asking questions. Thanks for your insights. My pleasure. 
I'd be happy to help with the development of this, if you're going to work on it. - Tal From steve at pearwood.info Wed Aug 28 10:38:12 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Aug 2013 18:38:12 +1000 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: <20130828083812.GB12823@ando> On Wed, Aug 28, 2013 at 10:25:42AM +0200, Michele Lacchia wrote: > How can isinstance(d, Real) imply isinstance(d, Complex)? Can you elaborate > a little more? > It makes no sense to me, since the set of real numbers is a subset of the > complex one. You've just explained it. Since real numbers are a subset of complex numbers, every real number is a complex number with the imaginary part set to zero. py> import numbers py> isinstance(42.0, numbers.Complex) True py> (42.0).imag 0.0 This only applies with abstract base classes like numbers.Complex, not concrete classes like complex. py> isinstance(42.0, complex) False -- Steven From steve at pearwood.info Wed Aug 28 10:43:15 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Aug 2013 18:43:15 +1000 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: References: Message-ID: <20130828084315.GC12823@ando> On Wed, Aug 28, 2013 at 11:21:36AM +0300, Tal Einat wrote: > On Tue, Aug 27, 2013 at 7:19 PM, Jess Austin wrote: > > On Tue, Aug 27, 2013 at 10:27 AM, Tal Einat wrote: > >> > >> There are many algorithms for sub-sequence search, most of which can > >> be significantly more efficient than the one you used under common > >> circumstances. For example, see the Knuth-Morris-Pratt algorithm [1], > >> and an example Python implementation on the ActiveState cookbook [2]. > > > > > > Good point; O(n+m) is better than O(n*m). Minor observation: KMP would > > disallow the possibility I raised of subseq being just an iterator, rather > > than a sequence. I think that's OK, since my use cases haven't had iterators > > here. 
It actually seems more likely that the "containing" object will be an > > iterator, which the recipe you linked would allow. Hmmm.... > > You meant that the searched sequence would be an iterator, not the > sub-sequence, right? This should be possible with KMP or similar > algorithms (e.g. as described under the "Variants" section in the KMP > Wikipedia article). What would be the point of that? Having searched the iterator for some sub-sequence, you will have consumed the iterator and can no longer do anything with the values consumed. It is true that this works: py> 4 in iter([2, 3, 4, 5]) True but I believe that is a side-effect of how the ``in`` operator works, not a deliberate design feature. I think it is perfectly reasonable to insist on actual sequences for sub-sequence testing, even if the algorithm happens to work on iterators. -- Steven From taleinat at gmail.com Wed Aug 28 10:57:52 2013 From: taleinat at gmail.com (Tal Einat) Date: Wed, 28 Aug 2013 11:57:52 +0300 Subject: [Python-ideas] proposed sequence method: index_subseq() In-Reply-To: <20130828084315.GC12823@ando> References: <20130828084315.GC12823@ando> Message-ID: On Wed, Aug 28, 2013 at 11:43 AM, Steven D'Aprano wrote: > On Wed, Aug 28, 2013 at 11:21:36AM +0300, Tal Einat wrote: >> You meant that the searched sequence would be an iterator, not the >> sub-sequence, right? This should be possible with KMP or similar >> algorithms (e.g. as described under the "Variants" section in the KMP >> Wikipedia article). > > What would be the point of that? Having searched the iterator for some > sub-sequence, you will have consumed the iterator and can no longer do > anything with the values consumed. One use case, just off the top of my head, would be to search a bunch of files, some potentially very large, for some text or data. Requiring the loading of each file into memory just for a search is unnecessary. 
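For the streaming case, only a window of len(subseq) items ever needs to be in memory at once; for example, a naive sliding-window sketch (the function name is made up here, and unlike KMP it re-compares the whole window, so it trades worst-case time for simplicity):

```python
from collections import deque
from itertools import islice

def iter_index_subseq(iterable, subseq):
    """Start index of the first occurrence of subseq in an iterable.

    Consumes the iterable; keeps only len(subseq) items in memory.
    Raises ValueError if the sub-sequence never appears.
    """
    subseq = tuple(subseq)
    if not subseq:
        return 0
    it = iter(iterable)
    # Prime a bounded window with the first len(subseq) items.
    window = deque(islice(it, len(subseq)), maxlen=len(subseq))
    if tuple(window) == subseq:
        return 0
    for i, item in enumerate(it, start=1):
        window.append(item)  # maxlen silently evicts the oldest item
        if tuple(window) == subseq:
            return i
    raise ValueError('subsequence not found')
```

The deque's maxlen does the bookkeeping: memory stays O(len(subseq)) no matter how large the input stream is, so a big file can be fed in chunk by chunk.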
> I think it is perfectly reasonable to insist on actual sequences for > sub-sequence testing, even if the algorithm happens to work on > iterators. As far as I can tell, restricting this to searching in sequences would gain nothing, but would make this unfit for searching very large data sets. - Tal From michelelacchia at gmail.com Wed Aug 28 11:34:58 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Wed, 28 Aug 2013 11:34:58 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: <20130828083812.GB12823@ando> References: <20130828083812.GB12823@ando> Message-ID: Damn, it seems that my mind was filled by some kind of fog... Ehm, I now get it, sorry. I really don't know what was passing in my mind before. On Wed, Aug 28, 2013 at 10:38 AM, Steven D'Aprano wrote: > On Wed, Aug 28, 2013 at 10:25:42AM +0200, Michele Lacchia wrote: > > How can isinstance(d, Real) imply isinstance(d, Complex)? Can you > elaborate > > a little more? > > It makes no sense to me, since the set of real numbers is a subset of the > > complex one. > > You've just explained it. Since real numbers are a subset of complex > numbers, every real number is a complex number with the imaginary part > set to zero. > > py> import numbers > py> isinstance(42.0, numbers.Complex) > True > py> (42.0).imag > 0.0 > > > This only applies with abstract base classes like numbers.Complex, not > concrete classes like complex. > > py> isinstance(42.0, complex) > False > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Michele Lacchia -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Wed Aug 28 12:15:35 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 11:15:35 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? 
In-Reply-To: References: Message-ID: On 28 August 2013 08:02, Nick Coghlan wrote: > On 28 August 2013 11:47, Oscar Benjamin wrote: >> >> That seems plainly absurd to me. Decimals are quite clearly real >> numbers. I then found the following in PEP-3141 [1]: >> """ >> The Decimal Type >> >> After consultation with its authors it has been decided that the >> Decimal type should not at this time be made part of the numeric >> tower. >> """ >> >> What was the rationale for this decision and does it still apply? > > If I recall correctly, it was the fact that isinstance(d, Real) > implies isinstance(d, Complex), yet there's no way to do complex > arithmetic with Decimal real and imaginary components. There's also no way to do arithmetic with int/Fraction real and imaginary components (e.g. gaussian integers etc.) but these are still instances of Complex. From the PEP ''' Complex defines the operations that work on the builtin complex type. In short, those are: conversion to complex, bool(), .real, .imag, +, -, *, /, **, abs(), .conjugate(), ==, and !=. ''' complex(decimal) works, Decimal defines all the above operations and has .real and .imag attributes: $ python3.3 Python 3.3.0 (default, Sep 29 2012, 17:14:58) [GCC 4.7.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from decimal import Decimal >>> d = Decimal('4') >>> complex(d) (4+0j) >>> bool(d) True >>> d.real Decimal('4') >>> d.imag Decimal('0') >>> d + d Decimal('8') >>> d - d Decimal('0') >>> d * d Decimal('16') >>> d / d Decimal('1') >>> d ** d Decimal('256') >>> abs(d) Decimal('4') >>> d.conjugate() Decimal('4') >>> d == d True >>> d != d False Oscar From p.f.moore at gmail.com Wed Aug 28 12:22:47 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 28 Aug 2013 11:22:47 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: References: Message-ID: On 28 August 2013 11:15, Oscar Benjamin wrote: > There's also no way to do arithmetic with int/Fraction real and imaginary > components (e.g. gaussian integers etc.) but these are still instances > of Complex. > The difference is that there is no implicit conversion from Decimal to float: >>> from decimal import Decimal as D >>> D('1.5') + 2.3 Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for +: 'Decimal' and 'float' >>> from fractions import Fraction as F >>> F(1,2) + 2.3 2.8 IIRC, that's the reason Decimal is treated specially here - maybe it's not a sufficiently good reason (I don't have an opinion on that) but from what little I recall of the discussions around the time, it's the key distinguishing feature of Decimal in these matters... Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From vernondcole at gmail.com Wed Aug 28 12:30:36 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Wed, 28 Aug 2013 11:30:36 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? Message-ID: Darn right it should return False. Given the principle of least surprise (and my prejudices built up over 40 years as a computer programmer) I would expect that decimal.Decimal data would be stored internally as some form of decimal data, and would store into a database as such. It would be expected to be in a fixed point format. Real, on the other hand, I would expect to be stored as an IEEE double precision floating point number, or something like that. I don't care whether a fixed point decimal number might be defined by a mathematician as "real" -- I care whether it can be processed by an FPU, and whether it will lose precision in large financial calculations. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From drekin at gmail.com Wed Aug 28 12:48:02 2013 From: drekin at gmail.com (Draic Kin) Date: Wed, 28 Aug 2013 12:48:02 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 12:30 PM, Vernon D. Cole wrote: > Darn right it should return False. Given the principle of least surprise > (and my prejudices built up over 40 years as a computer programmer) I would > expect that decimal.Decimal data would be stored internally as some form of > decimal data, and would store into a database as such. It would be > expected to be in a fixed point format. Real, on the other hand, I would > expect to be stored as an IEEE double precision floating point number, or > something like that. > I don't care whether a fixed point decimal number might be defined by a > mathematician as "real" -- I care whether it can be processed by an FPU, > and whether it will lose precision in large financial calculations. > > For the same reason, I could think that isinstance(Decimal, Rational) -> True and issubclass(Rational, Real) -> False. It's more about exact vs. non-exact computations, which is orthogonal to the number hierarchy. Maybe there should be some ExactNumber abstract base class and some convention that exact shouldn't coerce with non-exact. So Decimal + float should raise an exception even if both would be subclasses of Real (and Decimal even of Rational). Or maybe it would be enough if there were just non-exact variants of Real and Complex, since non-exactness is just an issue for them. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Wed Aug 28 12:49:15 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 11:49:15 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: References: Message-ID: On 28 August 2013 11:22, Paul Moore wrote: > On 28 August 2013 11:15, Oscar Benjamin wrote: >> >> There's also no way to do arithmetic with int/Fraction real and imaginary >> components (e.g. Gaussian integers etc.) but these are still instances >> of Complex. > > The difference is that there is no implicit conversion from Decimal to > float: > >>>> from decimal import Decimal as D >>>> D('1.5') + 2.3 > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: unsupported operand type(s) for +: 'Decimal' and 'float' >>>> from fractions import Fraction as F >>>> F(1,2) + 2.3 > 2.8 > > IIRC, that's the reason Decimal is treated specially here - maybe it's not a > sufficiently good reason (I don't have an opinion on that) but from what > little I recall of the discussions around the time, it's the key > distinguishing feature of Decimal in these matters... Why shouldn't there be implicit conversion in Decimal arithmetic? There already is for all the other numeric types. Also explicit conversion seems to be blocked in some cases. This one in particular bothers me (since it's often desirable to get a decimal representation of a fraction): $ python3.3 Python 3.3.0 (default, Sep 29 2012, 17:14:58) [GCC 4.7.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from decimal import Decimal as D >>> from fractions import Fraction as F >>> D(F(1, 2)) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: conversion from Fraction to Decimal is not supported Oscar From oscar.j.benjamin at gmail.com Wed Aug 28 12:53:06 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 11:53:06 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On 28 August 2013 11:48, Draic Kin wrote: > On Wed, Aug 28, 2013 at 12:30 PM, Vernon D. Cole > wrote: >> >> Darn right it should return False.
Given the principle of least surprise >> (and my prejudices built up over 40 years as a computer programmer) I would >> expect that decimal.Decimal data would be stored internally as some form of >> decimal data, and would store into a database as such. It would be expected >> to be in a fixed point format. Real, on the other hand, I would expect to >> be stored as an IEEE double precision floating point number, or something >> like that. >> I don't care whether a fixed point decimal number might be defined by a >> mathematician as "real" -- I care whether it can be processed by an FPU, and >> whether it will loose precision in large financial calculations. Decimals are effectively stored internally as integers. They won't be processed by the FPU which operates on binary floating point types. Also the decimal module has traps for people who want to prevent inexact etc. operations. > For the same reason, I could think that isinstance(Decimal, Rational) -> > True and issubclass(Rational, Real) -> False. It's more about exact vs. > non-exact computations which is orthogonal to number hierarchy. Maybe there > should be some ExactNumber abstract base class and some convention that > exact shouldn't coerce with non-exact. So Decimal + float should raise an > exception even if both would be subclasses of Real (and Decimal even of > Rational). Or maybe it would be enough if there were just non-exact variants > of Real and Complex since non-exactness if just issue of them. I agree that an Exact ABC would be good. The PEP has a reference to exact as a superclass of Rational but I guess it's just left over from a previous edit: http://www.python.org/dev/peps/pep-3141/#numeric-classes Oscar From drekin at gmail.com Wed Aug 28 13:14:26 2013 From: drekin at gmail.com (Draic Kin) Date: Wed, 28 Aug 2013 13:14:26 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? 
In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 12:49 PM, Oscar Benjamin wrote: > On 28 August 2013 11:22, Paul Moore wrote: > > On 28 August 2013 11:15, Oscar Benjamin > wrote: > >> > >> There's also no way to do arithmetic with int/Fraction real and imaginary > >> components (e.g. Gaussian integers etc.) but these are still instances > >> of Complex. > > > > The difference is that there is no implicit conversion from Decimal to > > float: > > > >>>> from decimal import Decimal as D > >>>> D('1.5') + 2.3 > > Traceback (most recent call last): > > File "<stdin>", line 1, in <module> > > TypeError: unsupported operand type(s) for +: 'Decimal' and 'float' > >>>> from fractions import Fraction as F > >>>> F(1,2) + 2.3 > > 2.8 > > > > IIRC, that's the reason Decimal is treated specially here - maybe it's > not a > > sufficiently good reason (I don't have an opinion on that) but from what > > little I recall of the discussions around the time, it's the key > > distinguishing feature of Decimal in these matters... > > Why shouldn't there be implicit conversion in Decimal arithmetic? > There already is for all the other numeric types. Also explicit > conversion seems to be blocked in some cases. This one in particular > bothers me (since it's often desirable to get a decimal representation > of a fraction): > > $ python3.3 > Python 3.3.0 (default, Sep 29 2012, 17:14:58) > [GCC 4.7.2] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> from decimal import Decimal as D > >>> from fractions import Fraction as F > >>> D(F(1, 2)) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: conversion from Fraction to Decimal is not supported > > I would think that it's because you can express a fraction as a finite decimal expansion iff the prime decomposition of the denominator contains only 2s and 5s, since conceptually a decimal is just a fraction with a power of 10 in the denominator. So Decimal is less expressible than Fraction.
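For what it's worth, the terminating-expansion criterion described here is easy to check in code; a minimal sketch (the helper name is mine, not from the thread):

```python
from fractions import Fraction

def has_finite_decimal_expansion(frac):
    # A reduced fraction terminates in base 10 exactly when its
    # denominator has no prime factors other than 2 and 5.
    d = frac.denominator  # Fraction always normalises to lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

has_finite_decimal_expansion(Fraction(1, 2))   # -> True  (0.5)
has_finite_decimal_expansion(Fraction(1, 3))   # -> False (0.333... repeats)
has_finite_decimal_expansion(Fraction(7, 40))  # -> True  (0.175)
```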
-------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Wed Aug 28 13:14:40 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 28 Aug 2013 07:14:40 -0400 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: <1377688480.26811.15068793.5857AF32@webmail.messagingengine.com> On Wed, Aug 28, 2013, at 4:25, Michele Lacchia wrote: > How can isinstance(d, Real) imply isinstance(d, Complex)? Can you > elaborate > a little more? > It makes no sense to me, since the set of real numbers is a subset of the > complex one. I'm not sure what your question is, since that's precisely why it does imply it. From random832 at fastmail.us Wed Aug 28 13:24:05 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 28 Aug 2013 07:24:05 -0400 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: <1377689045.28555.15070181.2430EB57@webmail.messagingengine.com> On Wed, Aug 28, 2013, at 6:30, Vernon D. Cole wrote: > Darn right it should return False. Given the principle of least surprise > (and my prejudices built up over 40 years as a computer programmer) I > would > expect that decimal.Decimal data would be stored internally as some form > of > decimal data, and would store into a database as such. It would be > expected to be in a fixed point format. Real, on the other hand, I would > expect to be stored as an IEEE double precision floating point number, or > something like that. > I don't care whether a fixed point decimal number might be defined by a > mathematician as "real" -- I care whether it can be processed by an FPU, > and whether it will loose precision in large financial calculations. That's not what Real means, and it's not, for example, true of Fraction. 
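The point about Fraction can be checked directly against the numbers tower (a quick illustration, not part of the original mail):

```python
from fractions import Fraction
from numbers import Rational, Real

f = Fraction(1, 3)
print(isinstance(f, Real))         # True: Fraction is registered as a Real...
print(isinstance(f, float))        # False: ...without being an IEEE binary float
print(issubclass(Rational, Real))  # True: the tower nests Rational inside Real
```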
The type you're thinking of is called "float", and the fact that Fortran calls it "real" does not obligate numbers.Real to have the same meaning, no more than the fact that C uses "float" to refer to single precision obligates us to use that term. There's a difference between "principle of least surprise" and arbitrarily importing terminology from an unrelated language. Also, Decimal is a floating point format - a _decimal_, arbitrary-precision, floating point format rather than a binary one. Fixed point means there's a hard limit to the precision (e.g. can never represent anything finer than 1/100000) and a hard limit to the range. From oscar.j.benjamin at gmail.com Wed Aug 28 13:35:48 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 12:35:48 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On 28 August 2013 12:14, Draic Kin wrote: >> Why shouldn't there be implicit conversion in Decimal arithmetic? >> There already is for all the other numeric types. Also explicit >> conversion seems to blocked in some cases. This one in particular >> bothers me (since it's often desirable to get a decimal representation >> of a fraction): >> >> $ python3.3 >> Python 3.3.0 (default, Sep 29 2012, 17:14:58) >> [GCC 4.7.2] on linux >> Type "help", "copyright", "credits" or "license" for more information. >> >>> from decimal import Decimal as D >> >>> from fractions import Fraction as F >> >>> D(F(1, 2)) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: conversion from Fraction to Decimal is not supported >> > I would think that it's because you can express a fraction as finite decimal > expansion iff the prime decomposition of denominator contains only 2s and > 5s, since conceptualy decimal is just a fraction with power of 10 in > denominator. So Decimal is less expressible than Fraction. 
The same is true of float but float(Fraction) happily works and so does float(int), complex(int), float(Decimal) and Decimal(float) (depending on the context Decimal can be either a subset or a superset of float). A recent example where I wanted to do this was: def sum_exact(nums): T = type(nums[0]) return T(sum(map(Fraction, nums))) The above sum function can happily sum anything that is convertible to Fraction (which includes Decimals in Python 3.3). However Decimal(Fraction) fails so you need something like: def sum_exact(nums): T = type(nums[0]) if issubclass(T, Decimal): return T.from_decimal(...) else: ... This just seems unnecessary to me. Oscar From vernondcole at gmail.com Wed Aug 28 14:06:01 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Wed, 28 Aug 2013 13:06:01 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 11:53 AM, Oscar Benjamin wrote: > On 28 August 2013 11:48, Draic Kin wrote: > > On Wed, Aug 28, 2013 at 12:30 PM, Vernon D. Cole > > wrote: > >> > >> Darn right it should return False. Given the principle of least > surprise > >> (and my prejudices built up over 40 years as a computer programmer) I > would > >> expect that decimal.Decimal data would be stored internally as some > form of > >> decimal data, and would store into a database as such. It would be > expected > >> to be in a fixed point format. Real, on the other hand, I would expect > to > >> be stored as an IEEE double precision floating point number, or > something > >> like that. > >> I don't care whether a fixed point decimal number might be defined by > a > >> mathematician as "real" -- I care whether it can be processed by an > FPU, and > >> whether it will loose precision in large financial calculations. > > Decimals are effectively stored internally as integers. That surprises me, a bit, but given the efficiency of modern 64 bit processors it's not a bad choice. 
We've moved on from the day when CPUs like the Intel 4004 and the IBM 360 expected to do most of their work in Binary Coded Decimal. > They won't be > processed by the FPU which operates on binary floating point types. > _Exactly_ my point. > Also the decimal module has traps for people who want to prevent > inexact etc. operations. > > > For the same reason, I could think that isinstance(Decimal, Rational) -> > > True and issubclass(Rational, Real) -> False. It's more about exact vs. > > non-exact computations, which is orthogonal to the number hierarchy. Maybe > there > > should be some ExactNumber abstract base class and some convention that > > exact shouldn't coerce with non-exact. So Decimal + float should raise an > > exception even if both would be subclasses of Real (and Decimal even of > > Rational). Or maybe it would be enough if there were just non-exact > variants > > of Real and Complex, since non-exactness is just an issue for them. That's it. This is a "practicality beats purity" issue. Python types Real and Complex are not exact because we usually don't need exact. "Close enough" is enough. I know that 3.14159 is not the true value for Pi, but it suffices when I am trying to figure out how fast a vehicle will travel with a given size tire. Now, consider when I am processing the arguments for an SQL "execute" method. [*] How do I prepare the values for the underlying db engine? I use a long list which includes lots of "elif isinstance(value, <some type>):" The code for "isinstance(value, Real)" is quite straightforward. The code for "isinstance(value, decimal.Decimal)" requires 18 lines of incredibly obscure Python. I really do need to be able to tell them apart. I agree that an Exact ABC would be good.
The PEP has a reference to > exact as a superclass of Rational but I guess it's just left over from > a previous edit: > http://www.python.org/dev/peps/pep-3141/#numeric-classes > > > Oscar > [*] For this example, I am referring to the code in http://sf.net/projects/adodbapi of which I am the maintainer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From drekin at gmail.com Wed Aug 28 14:25:36 2013 From: drekin at gmail.com (Draic Kin) Date: Wed, 28 Aug 2013 14:25:36 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 2:06 PM, Vernon D. Cole wrote: > On Wed, Aug 28, 2013 at 11:53 AM, Oscar Benjamin < > oscar.j.benjamin at gmail.com> wrote: > >> On 28 August 2013 11:48, Draic Kin wrote: >> > On Wed, Aug 28, 2013 at 12:30 PM, Vernon D. Cole > > >> > wrote: >> >> >> >> Darn right it should return False. Given the principle of least >> surprise >> >> (and my prejudices built up over 40 years as a computer programmer) I >> would >> >> expect that decimal.Decimal data would be stored internally as some >> form of >> >> decimal data, and would store into a database as such. It would be >> expected >> >> to be in a fixed point format. Real, on the other hand, I would >> expect to >> >> be stored as an IEEE double precision floating point number, or >> something >> >> like that. >> >> I don't care whether a fixed point decimal number might be defined >> by a >> >> mathematician as "real" -- I care whether it can be processed by an >> FPU, and >> >> whether it will loose precision in large financial calculations. >> >> Decimals are effectively stored internally as integers. > > That surprises me, a bit, but given the efficiency of modern 64 bit > processors it's not a bad choice. We've moved on from the day when CPUs > like the Intel 4004 and the IBM 360 expected to do most of their work in > Binary Coded Decimal. 
> > >> They won't be >> processed by the FPU which operates on binary floating point types. >> > > _Exactly_ my point. > > >> Also the decimal module has traps for people who want to prevent >> inexact etc. operations. >> >> > For the same reason, I could think that isinstance(Decimal, Rational) -> >> > True and issubclass(Rational, Real) -> False. It's more about exact vs. >> > non-exact computations, which is orthogonal to the number hierarchy. Maybe >> there >> > should be some ExactNumber abstract base class and some convention that >> > exact shouldn't coerce with non-exact. So Decimal + float should raise >> an >> > exception even if both would be subclasses of Real (and Decimal even of >> > Rational). Or maybe it would be enough if there were just non-exact >> variants >> > of Real and Complex, since non-exactness is just an issue for them. >> > > That's it. This is a "practicality beats purity" issue. Python types > Real and Complex are not exact because we usually don't need exact. "Close > enough" is enough. I know that 3.14159 is not the true value for Pi, but > it suffices when I am trying to figure out how fast a vehicle will travel > with a given size tire. > > Python types Real and Complex are abstract base classes, they have no implementation so are neither exact nor non-exact; Python types float and complex are non-exact. > > >> Now, consider when I am processing the arguments for an SQL "execute" >> method. [*] How do I prepare the values for the underlying db engine? I >> use a long list which includes lots of "elif isinstance(value, <some type>):" >> >> The code for "isinstance(value, Real)" is quite straightforward. >> >> The code for "isinstance(value, decimal.Decimal)" requires 18 lines of >> incredibly obscure Python. >> >> I really do need to be able to tell them apart. >> >> Couldn't you tell them apart by isinstance(value, decimal.Decimal) vs. > isinstance(value, float)? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From drekin at gmail.com Wed Aug 28 14:32:10 2013 From: drekin at gmail.com (Draic Kin) Date: Wed, 28 Aug 2013 14:32:10 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 1:35 PM, Oscar Benjamin wrote: > On 28 August 2013 12:14, Draic Kin wrote: > >> Why shouldn't there be implicit conversion in Decimal arithmetic? > >> There already is for all the other numeric types. Also explicit > >> conversion seems to blocked in some cases. This one in particular > >> bothers me (since it's often desirable to get a decimal representation > >> of a fraction): > >> > >> $ python3.3 > >> Python 3.3.0 (default, Sep 29 2012, 17:14:58) > >> [GCC 4.7.2] on linux > >> Type "help", "copyright", "credits" or "license" for more information. > >> >>> from decimal import Decimal as D > >> >>> from fractions import Fraction as F > >> >>> D(F(1, 2)) > >> Traceback (most recent call last): > >> File "", line 1, in > >> TypeError: conversion from Fraction to Decimal is not supported > >> > > I would think that it's because you can express a fraction as finite > decimal > > expansion iff the prime decomposition of denominator contains only 2s and > > 5s, since conceptualy decimal is just a fraction with power of 10 in > > denominator. So Decimal is less expressible than Fraction. > > The same is true of float but float(Fraction) happily works and so > does float(int), complex(int), float(Decimal) and Decimal(float) > (depending on the context Decimal can be either a subset or a superset > of float). A recent example where I wanted to do this was: > > def sum_exact(nums): > T = type(nums[0]) > return T(sum(map(Fraction, nums))) > > The above sum function can happily sum anything that is convertible to > Fraction (which includes Decimals in Python 3.3). 
However > Decimal(Fraction) fails so you need something like: > > def sum_exact(nums): > T = type(nums[0]) > if issubclass(T, Decimal): > return T.from_decimal(...) > else: > ... > > This just seems unnecessary to me. > > Maybe it's because float and complex don't care for exactness. On the other hand Decimal always represents exactly the value of an int or float, but it cannot represent exactly the value of Fraction(1, 3). But it would be nice if one could make a decimal of a given precision from a fraction, or make an exact decimal representation of a fraction if it is possible and raise an exception otherwise. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vernondcole at gmail.com Wed Aug 28 14:47:36 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Wed, 28 Aug 2013 13:47:36 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: I stand corrected. In my mind "Real" and "float" were always synonymous. (Probably a result of overexposure to FORTRAN, or Alzheimer's.) On Wed, Aug 28, 2013 at 1:25 PM, Draic Kin wrote: > > On Wed, Aug 28, 2013 at 2:06 PM, Vernon D. Cole wrote: >> On Wed, Aug 28, 2013 at 11:53 AM, Oscar Benjamin < >> oscar.j.benjamin at gmail.com> wrote: >>> On 28 August 2013 11:48, Draic Kin wrote: >>> > On Wed, Aug 28, 2013 at 12:30 PM, Vernon D. Cole < >>> vernondcole at gmail.com> >>> > wrote: >>> >> >>> >> Darn right it should return False. Given the principle of least >>> surprise >>> >> (and my prejudices built up over 40 years as a computer programmer) I >>> would >>> >> expect that decimal.Decimal data would be stored internally as some >>> form of >>> >> decimal data, and would store into a database as such. It would be >>> expected >>> >> to be in a fixed point format. Real, on the other hand, I would >>> expect to >>> >> be stored as an IEEE double precision floating point number, or >>> something >>> >> like that.
>>> >> I don't care whether a fixed point decimal number might be defined >>> by a >>> >> mathematician as "real" -- I care whether it can be processed by an >>> FPU, and >>> >> whether it will loose precision in large financial calculations. >>> >>> Decimals are effectively stored internally as integers. >> >> That surprises me, a bit, but given the efficiency of modern 64 bit >> processors it's not a bad choice. We've moved on from the day when CPUs >> like the Intel 4004 and the IBM 360 expected to do most of their work in >> Binary Coded Decimal. >> >> >>> They won't be >>> processed by the FPU which operates on binary floating point types. >>> >> >> _Exactly_ my point. >> >> >>> Also the decimal module has traps for people who want to prevent >>> inexact etc. operations. >>> >>> > For the same reason, I could think that isinstance(Decimal, Rational) >>> -> >>> > True and issubclass(Rational, Real) -> False. It's more about exact vs. >>> > non-exact computations which is orthogonal to number hierarchy. Maybe >>> there >>> > should be some ExactNumber abstract base class and some convention that >>> > exact shouldn't coerce with non-exact. So Decimal + float should raise >>> an >>> > exception even if both would be subclasses of Real (and Decimal even of >>> > Rational). Or maybe it would be enough if there were just non-exact >>> variants >>> > of Real and Complex since non-exactness if just issue of them. >>> >> >> That's it. This is a "practicality beats purity" issue. Python types >> Real and Complex are not exact because we usually don't need exact. "Close >> enough" is enough. I know that 3.14159 is not the true value for Pi, but >> it suffices when I am trying to figure out how fast a vehicle will travel >> with a given size tire. >> >> Python types Real and Complex are abstract base classes, they have no > implementation so are neither exact not non-exact, Python types float and > complex are non-exact. 
> > >> Now, consider when I am processing the arguments for an SQL "execute" >> method. [*] How do I prepare the values for the underlying db engine? I >> use a long list which includes lots of "elif isinstance(value, > type>):" >> >> The code for "isinstance(value, Real)" is quite straight forward. >> >> The code for "isinstance(value, decimal.Decimal)" requires 18 lines of >> incredibly obscure Python. >> >> I really do need to be able to tell them apart. >> >> Couldn't you tell them apart by isinstance(value, decimal.Decimal) vs. > isinstance(value, float)? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Wed Aug 28 14:48:30 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 13:48:30 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On 28 August 2013 13:06, Vernon D. Cole wrote: > Now, consider when I am processing the arguments for an SQL "execute" > method. [*] How do I prepare the values for the underlying db engine? I use > a long list which includes lots of "elif isinstance(value, type>):" > > The code for "isinstance(value, Real)" is quite straight forward. > > The code for "isinstance(value, decimal.Decimal)" requires 18 lines of > incredibly obscure Python. > > I really do need to be able to tell them apart. Can you not just reverse the order of those two tests? i.e.: elif isinstance(value, decimal.Decimal): # obscure decimal code elif isinstance(value, Real): # code for real or what about elif isinstance(value, decimal.Decimal) and not isinstance(value, Real): # obscure decimal code or even elif isinstance(value, float): # code for float elif isinstance(value, decimal.Decimal): # obscure decimal code What other types are you hoping to catch by testing against numbers.Real instead of float? 
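A sketch of the first reordering suggested above, testing the concrete Decimal type before the Real ABC so Decimal values never fall through to the generic branch (the function name and string conversions are illustrative, not adodbapi's actual code):

```python
from decimal import Decimal
from numbers import Real

def sql_literal(value):
    # Order matters: check the concrete type first, then the ABC,
    # so a Decimal never reaches the generic Real branch.
    if isinstance(value, Decimal):
        return str(value)            # exact decimal text, no float round-trip
    elif isinstance(value, Real):
        return repr(float(value))    # binary floating point is acceptable here
    raise TypeError("unsupported type: %r" % type(value))

sql_literal(Decimal('1.10'))  # -> '1.10' (trailing zero preserved)
sql_literal(0.5)              # -> '0.5'
```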
What guarantees does the Real ABC give you in this situation that Decimal does not (hence requiring all the obscure code)? BTW it's not a big deal for database code but for intensive numerical code it's worth noting that testing against numbers.Real is significantly slower than testing against float: $ py -3.3 -m timeit -s 'from numbers import Real' 'isinstance(1.0, Real)' 1000000 loops, best of 3: 1.93 usec per loop $ py -3.3 -m timeit -s 'from numbers import Real' 'isinstance(1.0, float)' 1000000 loops, best of 3: 0.217 usec per loop You can use a tuple to speed it up for float: $ py -3.3 -m timeit -s 'from numbers import Real' 'isinstance(1.0, (float, Real))' 1000000 loops, best of 3: 0.277 usec per loop Given that $ py -3.3 -m timeit -s 'a = 123.45; b=987.654' 'a+b' 10000000 loops, best of 3: 0.0718 usec per loop it seems that isinstance(x, float) is roughly the same as 4 arithmetic operations and isinstance(x, Real) is about 40 arithmetic operations. Oscar From oscar.j.benjamin at gmail.com Wed Aug 28 15:13:10 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 28 Aug 2013 14:13:10 +0100 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: On 28 August 2013 13:32, Draic Kin wrote: > On Wed, Aug 28, 2013 at 1:35 PM, Oscar Benjamin > wrote: >> >> The same is true of float but float(Fraction) happily works and so >> does float(int), complex(int), float(Decimal) and Decimal(float) >> (depending on the context Decimal can be either a subset or a superset >> of float). A recent example where I wanted to do this was: >> >> def sum_exact(nums): >> T = type(nums[0]) >> return T(sum(map(Fraction, nums))) >> >> The above sum function can happily sum anything that is convertible to >> Fraction (which includes Decimals in Python 3.3). However >> Decimal(Fraction) fails so you need something like: >> >> def sum_exact(nums): >> T = type(nums[0]) >> if issubclass(T, Decimal): >> return T.from_decimal(...) 
>> else: >> ... >> >> This just seems unnecessary to me. >> > Maybe it's because float and complex don't care for exactness. On the other > hand Decimal always represents exactly the value of an int or float, but it > cannot represent exactly the value of Fraction(1, 3). But it would be nice > if one could make a decimal of a given precision from a fraction, or make an exact > decimal representation of a fraction if it is possible and raise an exception > otherwise. But that's why the decimal module has the Inexact etc. traps. Here's a table of the coercions that are possible in 3.3. Key (i:int, F:Fraction, D:Decimal, f:float, c:complex). If Tr is the type of the row and Tc is the type of the column then X indicates that Tr(Tc) is always possible and exact and R indicates that rounding may occur:
    | i  F  D  f  c
  --+---------------
  i | X  R  R  R
  F | X  X  X  X
  D | X     X  X
  f | R  R  R  X
  c | R  R  R  X  X
There's a little hole in the middle there that makes the real types not quite interoperable. It's easy to implement the appropriate conversion:
def decimal_from_rational(r):
    # Result will be correctly rounded according to the current context.
    # Raises any of the signals Clamped, InvalidOperation, DivisionByZero,
    # Inexact, Rounded, Subnormal, Overflow, or Underflow as appropriate.
    return Decimal(r.numerator) / Decimal(r.denominator)
But it's much more useful if that conversion takes place in Decimal.__new__. Oscar From mbuttu at oa-cagliari.inaf.it Wed Aug 28 20:42:44 2013 From: mbuttu at oa-cagliari.inaf.it (Marco Buttu) Date: Wed, 28 Aug 2013 20:42:44 +0200 Subject: [Python-ideas] Optional keepsep argument in str.split() Message-ID: <521E44A4.1060804@oa-cagliari.inaf.it> What do you think about an optional `keepsep` argument in str.split(), in order to keep the separator?
Something like the `keepends` of str.splitlines(): >>> 'I am\ngoing\nto...'.splitlines(keepends=True) ['I am\n', 'going\n', 'to...'] For instance: >>> 'python3'.split('n') ['pytho', '3'] >>> 'python3'.split('n', keepsep=True) ['python', '3'] Regards, Marco -- Marco Buttu INAF Osservatorio Astronomico di Cagliari Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy Phone: +39 070 71180255 Email: mbuttu at oa-cagliari.inaf.it From python at mrabarnett.plus.com Wed Aug 28 20:57:02 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 28 Aug 2013 19:57:02 +0100 Subject: [Python-ideas] Optional keepsep argument in str.split() In-Reply-To: <521E44A4.1060804@oa-cagliari.inaf.it> References: <521E44A4.1060804@oa-cagliari.inaf.it> Message-ID: <521E47FE.6040809@mrabarnett.plus.com> On 28/08/2013 19:42, Marco Buttu wrote: > What do you think about an optional `keepsep` argument in str.split(), > in order to keep the separator? > Something like the `keepends` of str.splitlines(): > > >>> 'I am\ngoing\nto...'.splitlines(keepends=True) > ['I am\n', 'going\n', 'to...'] > > For instance: > > >>> 'python3'.split('n') > ['pytho', '3'] > >>> 'python3'.split('n', keepsep=True) > ['python', '3'] > If it's a _separator_, should it be attached to the previous part? Shouldn't it be: >>> 'python3'.split('n', keepsep=True) ['pytho', 'n', '3'] That might be why the keyword argument of .splitlines method is called 'keepends'. Usually you're not interested in the separator itself, but only in what it separates. What's your use-case? 
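Pending such a use-case, the proposed behaviour can already be emulated with re.split and a capturing group; a rough sketch (split_keepsep is a hypothetical helper, not an existing API):

```python
import re

def split_keepsep(s, sep):
    # re.split with a capturing group yields [piece, sep, piece, ..., piece];
    # glue each separator back onto the piece that precedes it.
    parts = re.split('({})'.format(re.escape(sep)), s)
    result = [piece + sep_ for piece, sep_ in zip(parts[::2], parts[1::2])]
    if parts[-1]:
        result.append(parts[-1])
    return result

split_keepsep('python3', 'n')              # -> ['python', '3']
split_keepsep('I am\ngoing\nto...', '\n')  # -> ['I am\n', 'going\n', 'to...']
```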
From masklinn at masklinn.net Wed Aug 28 21:36:39 2013 From: masklinn at masklinn.net (Masklinn) Date: Wed, 28 Aug 2013 21:36:39 +0200 Subject: [Python-ideas] Optional keepsep argument in str.split() In-Reply-To: <521E47FE.6040809@mrabarnett.plus.com> References: <521E44A4.1060804@oa-cagliari.inaf.it> <521E47FE.6040809@mrabarnett.plus.com> Message-ID: On 2013-08-28, at 20:57 , MRAB wrote: > On 28/08/2013 19:42, Marco Buttu wrote: >> What do you think about an optional `keepsep` argument in str.split(), >> in order to keep the separator? >> Something like the `keepends` of str.splitlines(): >> >> >>> 'I am\ngoing\nto...'.splitlines(keepends=True) >> ['I am\n', 'going\n', 'to...'] >> >> For instance: >> >> >>> 'python3'.split('n') >> ['pytho', '3'] >> >>> 'python3'.split('n', keepsep=True) >> ['python', '3'] >> > If it's a _separator_, should it be attached to the previous part? > Shouldn't it be: > > >>> 'python3'.split('n', keepsep=True) > ['pytho', 'n', '3'] Which, for what it's worth, is already covered by re.split: >>> re.split(r"(n)", "python3") ['pytho', 'n', '3'] and the "keeping" split can be handled via findall: >>> re.findall(r'([^n]+(?:n|$))', "python3") ['python', '3'] From mbuttu at oa-cagliari.inaf.it Wed Aug 28 21:40:01 2013 From: mbuttu at oa-cagliari.inaf.it (Marco Buttu) Date: Wed, 28 Aug 2013 21:40:01 +0200 Subject: [Python-ideas] Optional keepsep argument in str.split() In-Reply-To: <521E47FE.6040809@mrabarnett.plus.com> References: <521E44A4.1060804@oa-cagliari.inaf.it> <521E47FE.6040809@mrabarnett.plus.com> Message-ID: <521E5211.8070407@oa-cagliari.inaf.it> On 08/28/2013 08:57 PM, MRAB wrote: > On 28/08/2013 19:42, Marco Buttu wrote: >> What do you think about an optional `keepsep` argument in str.split(), >> in order to keep the separator? 
>> Something like the `keepends` of str.splitlines():
>>
>> >>> 'I am\ngoing\nto...'.splitlines(keepends=True)
>> ['I am\n', 'going\n', 'to...']
>>
>> For instance:
>>
>> >>> 'python3'.split('n')
>> ['pytho', '3']
>> >>> 'python3'.split('n', keepsep=True)
>> ['python', '3']
>>
> If it's a _separator_, should it be attached to the previous part?
> Shouldn't it be:
>
> >>> 'python3'.split('n', keepsep=True)
> ['pytho', 'n', '3']

It should be attached to the previous part, exactly as in my example.

> What's your use-case?

I think it could be useful in a lot of use-cases, when you have to parse a string. For instance, if you have some source code, and you want to write it better:

>>> source_code = "int a = 33;cout << a << endl;return 0;"
>>> print('\n'.join(source_code.split(';')))
int a = 33
cout << a << endl
return 0

>>> print('\n'.join(source_code.split(';', keepsep=True)))
int a = 33;
cout << a << endl;
return 0;

-- 
Marco Buttu

INAF Osservatorio Astronomico di Cagliari
Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy
Phone: +39 070 71180255
Email: mbuttu at oa-cagliari.inaf.it

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rymg19 at gmail.com Wed Aug 28 21:44:03 2013
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 28 Aug 2013 14:44:03 -0500
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: <521E44A4.1060804@oa-cagliari.inaf.it>
References: <521E44A4.1060804@oa-cagliari.inaf.it>
Message-ID:

Sounds interesting...not sure about how often it'd be used, since I could always use re:

re.split('(n)', 'python3')

On Wed, Aug 28, 2013 at 1:42 PM, Marco Buttu wrote:
> What do you think about an optional `keepsep` argument in str.split(), in
> order to keep the separator?
> Something like the `keepends` of str.splitlines():
>
> >>> 'I am\ngoing\nto...'.splitlines(keepends=True)
> ['I am\n', 'going\n', 'to...']
>
> For instance:
>
> >>> 'python3'.split('n')
> ['pytho', '3']
> >>> 'python3'.split('n', keepsep=True)
> ['python', '3']
>
> Regards, Marco
>
> --
> Marco Buttu
>
> INAF Osservatorio Astronomico di Cagliari
> Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy
> Phone: +39 070 71180255
> Email: mbuttu at oa-cagliari.inaf.it
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

-- 
Ryan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mbuttu at oa-cagliari.inaf.it Wed Aug 28 23:14:12 2013
From: mbuttu at oa-cagliari.inaf.it (Marco Buttu)
Date: Wed, 28 Aug 2013 23:14:12 +0200
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: References: <521E44A4.1060804@oa-cagliari.inaf.it> <521E47FE.6040809@mrabarnett.plus.com>
Message-ID: <521E6824.4040009@oa-cagliari.inaf.it>

On 08/28/2013 09:36 PM, Masklinn wrote:
> and the "keeping" split can be handled via findall:
>
> >>> re.findall(r'([^n]+(?:n|$))', "python3")
> ['python', '3']

Of course, but this is not built-in and not obvious. Furthermore, a regex is not so trivial as the built-in solution, as you can see:

>>> re.findall(r'([^n]+(?:n|$))', "pythonn3")
['python', '3']
>>> 'pythonn3'.split(sep='n', keepsep=True)
['python', 'n', '3']

Regards,

-- 
Marco Buttu

INAF Osservatorio Astronomico di Cagliari
Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy
Phone: +39 070 71180255
Email: mbuttu at oa-cagliari.inaf.it

From steve at pearwood.info Thu Aug 29 03:32:00 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 29 Aug 2013 11:32:00 +1000
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: References: Message-ID: <521EA490.2070405@pearwood.info> On 28/08/13 20:30, Vernon D. Cole wrote: > Darn right it should return False. Given the principle of least surprise > (and my prejudices built up over 40 years as a computer programmer) I would > expect that decimal.Decimal data would be stored internally as some form of > decimal data, and would store into a database as such. "Some form of decimal data" -- so you're saying "Decimals are decimals". Well, that certainly clarifies matters :-) Out of curiosity, those 40 years as a computer programmer, how much heavy-duty computational mathematics have you done? If you've spent 30 years writing business apps in COBOL and 10 years writing web apps in PHP, I wouldn't expect your prejudices about numeric computations to be trustworthy. (I know mine aren't, and I've spent years dabbling with numeric maths. The only reason I'm not surprised by computational mathematics is because I've learned not to trust my assumptions.) > It would be > expected to be in a fixed point format. Decimal is a floating point format: http://docs.python.org/2/library/decimal.html so if you are assuming that Decimal is a fixed point number, your prejudices are wrong. (Although I think the claims about exactness are misleading. There are plenty of numbers which cannot be represented exactly as decimals, just as they cannot be represented exactly as binary floats. 1/3 is the obvious example. Both float and Decimal suffer from the same inexactness issues, it's just that they sometimes suffer from them for different values.) > Real, on the other hand, I would > expect to be stored as an IEEE double precision floating point number, or > something like that. > I don't care whether a fixed point decimal number might be defined by a > mathematician as "real" -- I care whether it can be processed by an FPU, > and whether it will lose precision in large financial calculations.
Membership of number.Real has little to do with what mathematicians consider real numbers. No floating point or fixed point implementation obeys the rules of real number arithmetic. But it's the traditional name, I expect due to the precedent set by Fortran. -- Steven From steve at pearwood.info Thu Aug 29 04:02:38 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Aug 2013 12:02:38 +1000 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: <521EABBE.10203@pearwood.info> On 28/08/13 20:48, Draic Kin wrote: > For the same reason, I could think that isinstance(Decimal, Rational) -> > True If Decimal were a subclass of Rational, so should float. The only fundamental difference between the two is that one uses base 10 floating point numbers and the other uses base 2. >and issubclass(Rational, Real) -> False. It's more about exact vs. > non-exact computations which is orthogonal to number hierarchy. The numeric tower is precisely about the numeric hierarchy of Number > Complex > Real > Rational > Integral, and since they are all *abstract* base classes, exact and inexact doesn't come into it. Concrete classes can be inexact or exact, or one could implement separate Exact and Inexact towers. In practice, it's hard to think of a concrete way to implement exact real numbers. Maybe a symbolic maths application like Mathematica comes close? > Maybe there > should be some ExactNumber abstract base class and some convention that > exact shouldn't coerce with non-exact. So Decimal + float should raise an > exception even if both would be subclasses of Real (and Decimal even of > Rational). Or maybe it would be enough if there were just non-exact > variants of Real and Complex since non-exactness if just issue of them. Personally, I would implement inexact/exact as an attribute on the type: if type(x).exact: ... sort of thing. 
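A rough sketch of what that might look like (the classes and the `exact` attribute below are purely illustrative -- the stdlib numeric types define no such attribute):

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical subclasses carrying the proposed flag; nothing in the
# stdlib defines an `exact` attribute on numeric types.
class ExactFraction(Fraction):
    exact = True

class InexactDecimal(Decimal):
    exact = False

def careful_add(a, b):
    # Refuse to silently mix exact and inexact operands.
    if getattr(type(a), 'exact', None) != getattr(type(b), 'exact', None):
        raise TypeError('refusing to mix exact and inexact numbers')
    return a + b

print(careful_add(ExactFraction(1, 3), ExactFraction(1, 6)))  # 1/2
```

Mixing an ExactFraction with an InexactDecimal would then raise TypeError instead of coercing.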
-- Steven From steve at pearwood.info Thu Aug 29 04:04:07 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Aug 2013 12:04:07 +1000 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: Message-ID: <521EAC17.2070804@pearwood.info> On 28/08/13 22:06, Vernon D. Cole wrote: > Now, consider when I am processing the arguments for an SQL "execute" > method. [*] How do I prepare the values for the underlying db engine? I > use a long list which includes lots of "elif isinstance(value, <some type>):" > > The code for "isinstance(value, Real)" is quite straightforward. numbers.Real is not merely a synonym for built-in float. For example, should somebody decide to implement a floating point number class based on base 3 rather than 2 or 10 (perhaps they are trying to emulate some old Soviet ternary-based computer), it would be a subclass of Real. > The code for "isinstance(value, decimal.Decimal)" requires 18 lines of > incredibly obscure Python. > > I really do need to be able to tell them apart. You can always tell them apart. Decimal instances are instances of Decimal. Non-Decimal instances are not. -- Steven From jared.grubb at gmail.com Thu Aug 29 07:49:21 2013 From: jared.grubb at gmail.com (Jared Grubb) Date: Wed, 28 Aug 2013 22:49:21 -0700 Subject: [Python-ideas] Optional keepsep argument in str.split() In-Reply-To: <521E5211.8070407@oa-cagliari.inaf.it> References: <521E44A4.1060804@oa-cagliari.inaf.it> <521E47FE.6040809@mrabarnett.plus.com> <521E5211.8070407@oa-cagliari.inaf.it> Message-ID: <65904EC9-2969-48FC-B843-06911E2F1E99@gmail.com> On Aug 28, 2013, at 12:40, Marco Buttu wrote: > On 08/28/2013 08:57 PM, MRAB wrote: >> On 28/08/2013 19:42, Marco Buttu wrote: >>> What do you think about an optional `keepsep` argument in str.split(), >>> in order to keep the separator?
>>> Something like the `keepends` of str.splitlines(): >>> >>> >>> 'I am\ngoing\nto...'.splitlines(keepends=True) >>> ['I am\n', 'going\n', 'to...'] >>> >>> For instance: >>> >>> >>> 'python3'.split('n') >>> ['pytho', '3'] >>> >>> 'python3'.split('n', keepsep=True) >>> ['python', '3'] >>> >> If it's a _separator_, should it be attached to the previous part? >> Shouldn't it be: >> >> >>> 'python3'.split('n', keepsep=True) >> ['pytho', 'n', '3'] > > It should be attached to the previous part, exactly as my example > >> >> What's your use-case? > > I think it could be useful in a lot of use-cases, when you have to parse a string. For instance, > if you have some source code, and you want to write it better: > > >>> source_code = "int a = 33;cout << a << endl;return 0;" > >>> print('\n'.join(source_code.split(';'))) > int a = 33 > cout << a << endl > return 0 > > >>> print('\n'.join(source_code.split(';', keepsep=True))) > int a = 33; > cout << a << endl; > return 0; Split and join are inverses, so it's lossless. You get this behavior by putting the semicolon in: >>> print(';\n'.join(source_code.split(';'))) int a = 33; cout << a << endl; return 0; So I'm not sure this particular use-case is compelling. Jared -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Thu Aug 29 08:26:34 2013 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 28 Aug 2013 23:26:34 -0700 Subject: [Python-ideas] Optional keepsep argument in str.split() In-Reply-To: <65904EC9-2969-48FC-B843-06911E2F1E99@gmail.com> References: <521E44A4.1060804@oa-cagliari.inaf.it> <521E47FE.6040809@mrabarnett.plus.com> <521E5211.8070407@oa-cagliari.inaf.it> <65904EC9-2969-48FC-B843-06911E2F1E99@gmail.com> Message-ID: On Wed, Aug 28, 2013 at 10:49 PM, Jared Grubb wrote: > Split and join are inverses, so it's lossless. > That's not true. 
If a separator is specified, it's lossless, but not in this case: >>> 'a b c d'.split() ['a', 'b', 'c', 'd'] >>> ' '.join('a b c d'.split()) 'a b c d' I don't see a compelling use case for modifying split though. If I wanted to keep separators around, I'd probably want to work with lists like these: ['a', ' ', 'b', ' ', 'c', ' ', 'd'] or [['a', ' '], ['b', ' '], ['c', ' '], ['d', '']] and changing split to return either of those would be a bad idea. --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From drekin at gmail.com Thu Aug 29 10:13:35 2013 From: drekin at gmail.com (Draic Kin) Date: Thu, 29 Aug 2013 10:13:35 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: <521EABBE.10203@pearwood.info> References: <521EABBE.10203@pearwood.info> Message-ID: On Thu, Aug 29, 2013 at 4:02 AM, Steven D'Aprano wrote: > On 28/08/13 20:48, Draic Kin wrote: > >> For the same reason, I could think that isinstance(Decimal, Rational) -> >> True >> > > If Decimal were a subclass of Rational, so should float. The only > fundamental difference between the two is that one uses base 10 floating > point numbers and the other uses base 2. > > Another difference is, that precision of float is fixly limited. I actually thought that Decimal is unlimited the same way as int, however decimal.MAX_PREC is pretty big number. But you're right Decimal shouldn't be subclass of Rational. However the original question was why it is not subclass of Real. > > and issubclass(Rational, Real) -> False. It's more about exact vs. >> non-exact computations which is orthogonal to number hierarchy. 
>> > > The numeric tower is precisely about the numeric hierarchy of Number > > Complex > Real > Rational > Integral, and since they are all *abstract* > base classes, exact and inexact doesn't come into it. Concrete classes can > be inexact or exact, or one could implement separate Exact and Inexact > towers. > In practice, it's hard to think of a concrete way to implement exact real > numbers. Maybe a symbolic maths application like Mathematica comes close? > > > > Maybe there >> should be some ExactNumber abstract base class and some convention that >> exact shouldn't coerce with non-exact. So Decimal + float should raise an >> exception even if both would be subclasses of Real (and Decimal even of >> Rational). Or maybe it would be enough if there were just non-exact >> variants of Real and Complex since non-exactness if just issue of them. >> > > > Personally, I would implement inexact/exact as an attribute on the type: > > if type(x).exact: ... > > > sort of thing. > > There were some points (possible against the idea of issubclass(Decimal, Real) -> True) like Decimal doesn't coerce to float, you cannot convert Fraction to Decimal easily, you cannot sum Fraction + Decimal. Maybe some of the rationale for the behavior is the matter of exact vs. non-exact. So for that reason the exactness rationale could be made explicit by adding some indicator of exacness and base some coercion cases on this indicator. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From mbuttu at oa-cagliari.inaf.it Thu Aug 29 11:25:55 2013
From: mbuttu at oa-cagliari.inaf.it (Marco Buttu)
Date: Thu, 29 Aug 2013 11:25:55 +0200
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: References: <521E44A4.1060804@oa-cagliari.inaf.it>
Message-ID: <521F13A3.3000707@oa-cagliari.inaf.it>

On 08/28/2013 09:44 PM, Ryan Gonzalez wrote:
> Sounds interesting...not sure about how often it'd be used, since I
> could always use re:
>
> re.split('(n)', 'python3')

It is not the same. As I wrote in the first message, the separator has to be attached to its token, in the same way the str.splitlines() `keepends` argument works:

>>> data = "{1: 'one', 2: 'two'}{3: 'three', 4: 'four'}"
>>> import re
>>> for item in re.split('(})', data):
...     print(item)
...
{1: 'one', 2: 'two'
}
{3: 'three', 4: 'four'
}

>>> for item in data.split(sep='}', keepsep=True):
...     print(item)
...
{1: 'one', 2: 'two'}
{3: 'three', 4: 'four'}

Regards, Marco

-- 
Marco Buttu

INAF Osservatorio Astronomico di Cagliari
Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy
Phone: +39 070 71180255
Email: mbuttu at oa-cagliari.inaf.it

From rob.cliffe at btinternet.com Thu Aug 29 12:20:43 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Thu, 29 Aug 2013 11:20:43 +0100
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: <521E5211.8070407@oa-cagliari.inaf.it>
References: <521E44A4.1060804@oa-cagliari.inaf.it> <521E47FE.6040809@mrabarnett.plus.com> <521E5211.8070407@oa-cagliari.inaf.it>
Message-ID: <521F207B.3060903@btinternet.com>

On 28/08/2013 20:40, Marco Buttu wrote:
> It should be attached to the previous part, exactly as my example
>

If there is a leading separator in your original string, you will have to decide whether to keep it prefixed to the first element of your split list.
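For example (using plain str.split, since no keepsep option exists yet):

```python
parts = 'npython3'.split('n')
print(parts)  # ['', 'pytho', '3'] -- a leading separator gives an empty first part

# With the attach-to-previous semantics proposed in this thread, that
# empty first element would turn into a bare separator:
with_sep = [p + 'n' for p in parts[:-1]] + [parts[-1]]
print(with_sep)  # ['n', 'python', '3']
```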
Rob Cliffe

From oscar.j.benjamin at gmail.com Thu Aug 29 15:12:49 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 29 Aug 2013 14:12:49 +0100
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: References: <521EABBE.10203@pearwood.info>
Message-ID:

On 29 August 2013 09:13, Draic Kin wrote:
> On Thu, Aug 29, 2013 at 4:02 AM, Steven D'Aprano wrote:
>> On 28/08/13 20:48, Draic Kin wrote:
>>> For the same reason, I could think that isinstance(Decimal, Rational) ->
>>> True
>>
>> If Decimal were a subclass of Rational, so should float. The only
>> fundamental difference between the two is that one uses base 10 floating
>> point numbers and the other uses base 2.
>
> Another difference is, that precision of float is fixly limited. I actually
> thought that Decimal is unlimited the same way as int, however
> decimal.MAX_PREC is pretty big number. But you're right Decimal shouldn't be
> subclass of Rational. However the original question was why it is not
> subclass of Real.

The precision of Decimal in arithmetic is fixed and usually much smaller than MAX_PREC which is simply the maximum value it can be set to. The default is 28 decimal digits of precision:

$ python3
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal
>>> import decimal
>>> print(decimal.getcontext())
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])
>>> decimal.Decimal(1) / 3
Decimal('0.3333333333333333333333333333')
>>> decimal.MAX_PREC
425000000

The context says prec=28 and that Decimal(1)/3 gives a result with 28 threes. Decimals are exact in the sense that conversion to Decimal from int, str or float is guaranteed to be exact. Note that this is not required by the standards on which it is based.
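To illustrate the exactness of conversion (as opposed to arithmetic) with the default context:

```python
import decimal
from decimal import Decimal, Inexact, localcontext

# Conversion from int is exact even when the value needs more digits
# than the context precision: 2**100 has 31 digits, more than prec=28.
d = Decimal(2**100)
assert d == 2**100  # comparisons are exact too

# Arithmetic is what rounds; with the Inexact trap set, silent rounding
# becomes an exception instead:
with localcontext() as ctx:
    ctx.traps[Inexact] = True
    try:
        Decimal(1) / Decimal(3)
    except Inexact:
        print('1/3 is inexact at this precision')
```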
The standards suggest that conversion from a supported integer type or from a string should be exact *if possible*. The reason for the "if possible" caveat is that not every implementation of the standards would be able to create arbitrary precision Decimals and in fact many would be limited to a smaller precision than 28 (e.g. decimal hardware in hand calculators etc.). The standards only require that inexact conversions should set the Inexact flag and - if the Inexact trap is set - raise the Inexact exception.

It's important to be clear about the distinction between the precision of a Decimal *instance* and the precision of the current *arithmetic context*. While it is possible to exactly convert an int/str/float to a Decimal with a precision that is higher than the current context, any arithmetic operations will be rounded to the context precision according to the context rounding mode (there are 8 different rounding modes and precision is any positive integer). This arithmetic rounding is actually *required* by the IEEE-854 standard unlike the exact conversion from arbitrary precision integers etc. Specifically the standard requires that the result be (effectively) computed exactly and then rounded according to context. This means that individual binary arithmetic operations can behave as if they have a precision that is higher than the current context but as soon as you try to e.g. sum 3 numbers you should assume that you're effectively working with context precision. An example:

>>> d1 = decimal.Decimal('1'*40 + '2')
>>> d2 = decimal.Decimal('-'+ '1'*41)
>>> d1
Decimal('11111111111111111111111111111111111111112')
>>> d2
Decimal('-11111111111111111111111111111111111111111')
>>> d1 + d2  # Computed exact and then rounded (no rounding occurs)
Decimal('1')
>>> (+d1) + (+d2)  # Rounded, computed and rounded again
Decimal('0E+13')
>>> d1 + 0 + d2  # What happens here?
Decimal('-1111111111111')

For this reason sum(Decimals) is just as inaccurate as sum(floats) and Decimals need a decimalsum function just like float's fsum. My first attempt at such a function was the following (all code below was modified for posting and is untested):

# My simplification of the algorithms from
# "Algorithms for Arbitrary Precision Floating Point Arithmetic"
# by Douglas M. Priest 1991.
# http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.3546
#
# This function is my own modification of Raymond Hettinger's recipe to use
# the more general sum_err function from the Priest paper and perform the
# final summation with Fractions

def fixedwidthsum(iterable):
    "Full precision summation for fixed-width floating point types"
    partials = []
    iterator = iter(iterable)
    isfinite = math.isfinite
    for x in iterator:
        # Handle NaN/Inf
        if not isfinite(x):
            return sum(iterator, x)
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi, lo = sum_err(x, y)  # The key modification
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    # Also modified: used fractions to add the partials
    if not partials:
        return 0
    elif isinstance(partials[0], Decimal):
        # This is needed because Decimal(Fraction) doesn't work
        fresult = sum(map(Fraction, partials))  # Assumes Python 3.3
        return Decimal(fresult.numerator) / Decimal(fresult.denominator)
    else:
        return type(partials[0])(sum(map(Fraction, partials)))

def sum_err(a, b):
    if abs(a) < abs(b):
        a, b = b, a
    c = a + b; e = c - a  # Standard Kahan
    # The line below is needed unless the arithmetic is
    # faithful-binary, properly-truncating or correctly-chopping
    g = c - e; h = g - a; f = b - h
    d = f - e  # For Kahan replace f with b
    # The two lines below are needed unless the arithmetic
    # uses round-to-nearest or proper-truncation
    if d + e != f:
        c, d = a, b
    return c, d

The functions above can exactly sum any fixed precision floating point type of any radix (including decimal) under any sensible rounding mode including all the rounding modes in the decimal module. The problem with it though is that Decimals are almost but not quite a fixed precision type: it is possible to create Decimals whose precision exceeds that of the arithmetic context (as in the examples I showed above). If the instance precision does not exceed twice the arithmetic context then we can decompose the decimal into two numbers each of which has a precision within the current context e.g.:

def expand_two(d):
    if d == +d:  # Does d equal itself after rounding
        return [d]
    else:
        return [+d, d-(+d)]

>>> decimal.getcontext().prec = 4
>>> d1 = decimal.Decimal('1234567')
>>> d1
Decimal('1234567')
>>> [+d1, d1-(+d1)]
[Decimal('1.235E+6'), Decimal('-433')]

However once we go to more than twice the context precision there's no duck-typey way to do it: We need to know the instance precision and the context precision and we're better off ripping out the internal Decimal representation than trying to use arithmetic:

def expand_full(d):
    if d == +d:
        return [d]
    prec = decimal.getcontext().prec
    sign, digits, exponent = d.as_tuple()
    expansion = []
    while digits:
        digits, lodigits = digits[:-prec], digits[-prec:]
        expansion.append(decimal.Decimal((sign, lodigits, exponent)))
        exponent += prec
    return expansion

And that should do it. The above functions are jumping through the same kind of hoops that fsum does precisely because Decimals are a floating point type (not exact) based on the IEEE-854 "Standard for radix-independent *floating-point* arithmetic".

Oscar

From oscar.j.benjamin at gmail.com Thu Aug 29 15:19:45 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 29 Aug 2013 14:19:45 +0100
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: References: <521EABBE.10203@pearwood.info>
Message-ID:

On 29 August 2013 09:13, Draic Kin wrote:
> On Thu, Aug 29, 2013 at 4:02 AM, Steven D'Aprano wrote:
>> On 28/08/13 20:48, Draic Kin wrote:
>>> and issubclass(Rational, Real) -> False. It's more about exact vs.
>>> non-exact computations which is orthogonal to number hierarchy. >> >> >> The numeric tower is precisely about the numeric hierarchy of Number > >> Complex > Real > Rational > Integral, and since they are all *abstract* base >> classes, exact and inexact doesn't come into it. Concrete classes can be >> inexact or exact, or one could implement separate Exact and Inexact towers. >> >> In practice, it's hard to think of a concrete way to implement exact real >> numbers. Maybe a symbolic maths application like Mathematica comes close? Yes, Mathematica and more pertinently sympy implement exact real numbers. All fixed- or floating-point number formats are restricted to representing rational numbers. The obvious examples of exact irrational numbers are things like pi, e, sqrt(2) etc. Sympy can represent these exactly and guarantee that sqrt(n)**2 is exactly n for any integer n. >>> Maybe there >>> should be some ExactNumber abstract base class and some convention that >>> exact shouldn't coerce with non-exact. So Decimal + float should raise an >>> exception even if both would be subclasses of Real (and Decimal even of >>> Rational). Or maybe it would be enough if there were just non-exact >>> variants of Real and Complex since non-exactness if just issue of them. >> >> Personally, I would implement inexact/exact as an attribute on the type: >> >> if type(x).exact: ... >> >> sort of thing. Exactness is more complicated than that. Whether or not operation(a, b) is exact depends on: 1) the operation 2) the types of a AND b 3) the values of a and b For example if both operands are ints then results are exact for addition, subtraction, multiplication, and sometimes for division. Exponentiation is exact where the exponent is a non-negative integer but not if the exponent is negative. 
Fractions are exact for division also (except by 0) and for exponentiation where the exponent is any integer but not if the exponent is a non-integer valued Fraction (even if exact results are possible). int(num) is not exact unless num is an integer. Fraction(num) is always exact (or an error). The only senses in which Decimals are exact are that Decimal(str), Decimal(int) and Decimal(float) are exact and - if you set the Inexact trap - you can get an error any time something inexact would have otherwise occurred. (Well you could say that decimals are "exactly rounded" but that's not what we mean by exact here).

> There were some points (possible against the idea of issubclass(Decimal,
> Real) -> True) like Decimal doesn't coerce to float,

Thinking about it now I would rather that float/Decimal operations coerce to Decimal and a FloatOperation error to be raised if set (by someone who doesn't want to mix Decimals and floats).

> you cannot convert
> Fraction to Decimal easily, you cannot sum Fraction + Decimal.

These are just missing features IMO. Presumably if Decimal had been integrated into the numbers hierarchy these would have been added.

> Maybe some of
> the rationale for the behavior is the matter of exact vs. non-exact.

Decimals are not exact!

> So for
> that reason the exactness rationale could be made explicit by adding some
> indicator of exacness and base some coercion cases on this indicator.

The coercion may be exact but the subsequent arithmetic operations are not. Preventing mixed Fraction/Decimal arithmetic does not save you rounding error:

>>> d = decimal.Decimal('.1234')
>>> f = fractions.Fraction('1/3')
>>> d
Decimal('0.1234')
>>> f
Fraction(1, 3)
>>> d * f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'decimal.Decimal' and 'Fraction'

Well it's a good thing that type coercion saved us from being able to get rounding errors. Hang on...
>>> d / 3 Decimal('0.04113333333333333333333333333') That's the exact same rounding error we would have got with the Fraction! Any Decimal whose precision exceeds the current context precision will not have exact arithmetic operations (for any operation): >>> d = decimal.Decimal(11**30) >>> (d/3)*3 == d False Oscar From drekin at gmail.com Thu Aug 29 16:06:34 2013 From: drekin at gmail.com (Draic Kin) Date: Thu, 29 Aug 2013 16:06:34 +0200 Subject: [Python-ideas] isinstance(Decimal(), Real) -> False? In-Reply-To: References: <521EABBE.10203@pearwood.info> Message-ID: On Thu, Aug 29, 2013 at 3:19 PM, Oscar Benjamin wrote: > On 29 August 2013 09:13, Draic Kin wrote: > > On Thu, Aug 29, 2013 at 4:02 AM, Steven D'Aprano > wrote: > >> On 28/08/13 20:48, Draic Kin wrote: > >> > >>> and issubclass(Rational, Real) -> False. It's more about exact vs. > >>> non-exact computations which is orthogonal to number hierarchy. > >> > >> > >> The numeric tower is precisely about the numeric hierarchy of Number > > >> Complex > Real > Rational > Integral, and since they are all *abstract* > base > >> classes, exact and inexact doesn't come into it. Concrete classes can be > >> inexact or exact, or one could implement separate Exact and Inexact > towers. > >> > >> In practice, it's hard to think of a concrete way to implement exact > real > >> numbers. Maybe a symbolic maths application like Mathematica comes > close? > > Yes, Mathematica and more pertinently sympy implement exact real > numbers. All fixed- or floating-point number formats are restricted to > representing rational numbers. The obvious examples of exact > irrational numbers are things like pi, e, sqrt(2) etc. Sympy can > represent these exactly and guarantee that sqrt(n)**2 is exactly n for > any integer n. > > >>> Maybe there > >>> should be some ExactNumber abstract base class and some convention that > >>> exact shouldn't coerce with non-exact. 
So Decimal + float should raise > an > >>> exception even if both would be subclasses of Real (and Decimal even of > >>> Rational). Or maybe it would be enough if there were just non-exact > >>> variants of Real and Complex since non-exactness if just issue of them. > >> > >> Personally, I would implement inexact/exact as an attribute on the type: > >> > >> if type(x).exact: ... > >> > >> sort of thing. > > Exactness is more complicated than that. Whether or not operation(a, > b) is exact depends on: > 1) the operation > 2) the types of a AND b > 3) the values of a and b > > For example if both operands are ints then results are exact for > addition, subtraction, multiplication, and sometimes for division. > Exponentiation is exact where the exponent is a non-negative integer > but not if the exponent is negative. Fractions are exact for division > also (except by 0) and for exponentiation where the exponent is any > integer but not if the exponent is a non-integer valued Fraction (even > if exact results are possible). int(num) is not exact unless num is an > integer. Fraction(num) is always exact (or an error). The only senses > in which Decimals are exact are that Decimal(str, Decimal(int) and > Decimal(float) are exact and - if you set the Inexact trap - you can > get an error any time something inexact would have otherwise occurred. > (Well you could say that decimals are "exactly rounded" but that's not > what we mean by exact here). > > > There were some points (possible against the idea of issubclass(Decimal, > > Real) -> True) like Decimal doesn't coerce to float, > > Thinking about it now I would rather that float/Decimal operations > coerce to Decimal and a FloatOperation error to be raised if set (by > someone who doesn't want to mix Decimals and floats). > > > you cannot convert > > Fraction to Decimal easily, you cannot sum Fraction + Decimal. > > These are just missing features IMO. 
Presumably if Decimal had been > integrated into the numbers hierarchy these would have been added. > > To answer your original question, according to comments in sources of decimal.py and numbers.py, Decimal shouldn't be subclass of Real since it doesn't interoperate with floats and different subclasses of Real should interoperate. From my point of view, if floats weren't more common that decimals, one could turn the same argument around: Decimal subclasses Real but float doesn't since it doesn't interoperate with Decimal. Maybe they should interoperate and as you pointed out, Decimal is more robust in handling errors so maybe float + Decimal should yield Decimal. Then Decimal could be integrated to the number hierarchy. Maybe there is still this problem: what would Decimal + complex return? > Maybe some of > > the rationale for the behavior is the matter of exact vs. non-exact. > > Decimals are not exact! > Got it now :-). I was thinking of Decimals as if they had always MAX_PREC set. > > > So for > > that reason the exactness rationale could be made explicit by adding some > > indicator of exacness and base some coercion cases on this indicator. > > The coercion may be exact but the subsequent arithmetic operations are > not. Preventing mixed Fraction/Decimal arithmetic does not save you > rounding error: > > >>> d = decimal.Decimal('.1234') > >>> f = fractions.Fraction('1/3') > >>> d > Decimal('0.1234') > >>> f > Fraction(1, 3) > >>> d * f > Traceback (most recent call last): > File "", line 1, in > TypeError: unsupported operand type(s) for *: 'decimal.Decimal' and > 'Fraction' > > Well it's a good thing that type coercion saved us from being able to > get rounding errors. Hang on... > > >>> d / 3 > Decimal('0.04113333333333333333333333333') > > That's the exact same rounding error we would have got with the Fraction! 
> Any Decimal whose precision exceeds the current context precision will
> not have exact arithmetic operations (for any operation):
>
> >>> d = decimal.Decimal(11**30)
> >>> (d/3)*3 == d
> False
>
> Oscar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oscar.j.benjamin at gmail.com  Thu Aug 29 16:43:26 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 29 Aug 2013 15:43:26 +0100
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: 
References: <521EABBE.10203@pearwood.info>
Message-ID: 

On 29 August 2013 15:06, Draic Kin wrote:
>
> To answer your original question, according to comments in the sources of
> decimal.py and numbers.py, Decimal shouldn't be a subclass of Real since it
> doesn't interoperate with floats and different subclasses of Real should
> interoperate. From my point of view, if floats weren't more common than
> decimals, one could turn the same argument around: Decimal subclasses Real
> but float doesn't since it doesn't interoperate with Decimal. Maybe they
> should interoperate and as you pointed out, Decimal is more robust in
> handling errors so maybe float + Decimal should yield Decimal. Then Decimal
> could be integrated into the number hierarchy.

At least the explicit Decimal(Fraction) should work (setting the
Inexact flag as necessary). Decimal * Fraction etc. should also work.
It's not trivial to ensure that e.g. Decimal*Fraction gives a
correctly rounded Decimal result.

> Maybe there is still this problem: what would Decimal + complex return?

I guess it would have to be a complex. It would be good if there were
a ComplexDecimal but it's not so important as getting the real numbers
right.


Oscar

From stephen at xemacs.org  Thu Aug 29 17:36:21 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 30 Aug 2013 00:36:21 +0900
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: <521F207B.3060903@btinternet.com>
References: <521E44A4.1060804@oa-cagliari.inaf.it>
	<521E47FE.6040809@mrabarnett.plus.com>
	<521E5211.8070407@oa-cagliari.inaf.it>
	<521F207B.3060903@btinternet.com>
Message-ID: <87li3kcxvu.fsf@uwakimon.sk.tsukuba.ac.jp>

Rob Cliffe writes:

 > On 28/08/2013 20:40, Marco Buttu wrote:
 > > It should be attached to the previous part, exactly as my example
 >
 > If there is a leading separator in your original string, you will have
 > to decide whether to keep it prefixed to the first element of your split
 > list.

No, he wants it affixed to the null first element, not prefixed to the
first non-null element. That's not a problem with his proposal.

The problem with his proposal is that it's quite incoherent. The
semantics of 'separator' is precisely that it doesn't belong to the
preceding element nor to the following element, but rather is an
emergent property of the juxtaposition of *two* items (either of which
might be null!) The C semicolon that he uses as an example is
syntactically not a separator, it's a terminator. That's precisely why
he wants it affixed!

Also, his "use case" isn't really one. "Nobody" really wants "a;b;c;"
to become ["a;", "b;", "c;"] (consider s/a/if var/), and they
"certainly" don't want "a; b; c;" to become ["a;", " b;", " c;"].

Finally, if you *do* for some reason (despite the absolute confidence
that I know better than you that I display above, I'm probably
wrong :-), re.findall("[^;]*;", "a;b;c;") does exactly what you want.

-1 on keepsep in str.split().
Steve

From masklinn at masklinn.net  Thu Aug 29 18:19:57 2013
From: masklinn at masklinn.net (Masklinn)
Date: Thu, 29 Aug 2013 18:19:57 +0200
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: <521E6824.4040009@oa-cagliari.inaf.it>
References: <521E44A4.1060804@oa-cagliari.inaf.it>
	<521E47FE.6040809@mrabarnett.plus.com>
	<521E6824.4040009@oa-cagliari.inaf.it>
Message-ID: 

On 2013-08-28, at 23:14 , Marco Buttu wrote:
> On 08/28/2013 09:36 PM, Masklinn wrote:
>> and the "keeping" split can be handled via findall:
>>
>> >>> re.findall(r'([^n]+(?:n|$))', "python3")
>> ['python', '3']
>
> Of course, but this is not built-in

How is the re module not built-in?

> and not obvious.

Neither is the behavior you want out of "keepsep".

From steve at pearwood.info  Fri Aug 30 02:46:31 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 30 Aug 2013 10:46:31 +1000
Subject: [Python-ideas] Optional keepsep argument in str.split()
In-Reply-To: 
References: <521E44A4.1060804@oa-cagliari.inaf.it>
	<521E47FE.6040809@mrabarnett.plus.com>
	<521E6824.4040009@oa-cagliari.inaf.it>
Message-ID: <521FEB67.1010002@pearwood.info>

On 30/08/13 02:19, Masklinn wrote:
> On 2013-08-28, at 23:14 , Marco Buttu wrote:
>
>> On 08/28/2013 09:36 PM, Masklinn wrote:
>>> and the "keeping" split can be handled via findall:
>>>
>>> >>> re.findall(r'([^n]+(?:n|$))', "python3")
>>> ['python', '3']
>>
>> Of course, but this is not built-in
>
> How is the re module not built-in?

py> import builtins  # use __builtin__ in Python 2
py> 're' in vars(builtins)
False

Of course, not everything needs to be a builtin.


-- 
Steven

From dickinsm at gmail.com  Fri Aug 30 16:54:21 2013
From: dickinsm at gmail.com (Mark Dickinson)
Date: Fri, 30 Aug 2013 15:54:21 +0100
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 28, 2013 at 8:02 AM, Nick Coghlan wrote:

> > The Decimal Type
> >
> > After consultation with its authors it has been decided that the
> > Decimal type should not at this time be made part of the numeric
> > tower.
> > """
> >
> > What was the rationale for this decision and does it still apply?
>
> If I recall correctly, it was the fact that isinstance(d, Real)
> implies isinstance(d, Complex), yet there's no way to do complex
> arithmetic with Decimal real and imaginary components.

In the past, there's also been a general desire to keep the decimal
module API closely focused on the specification, and to avoid adding
too much functionality outside the spec; I suspect that that was the
main motivation. I don't recall Complex entering the discussion, but
it might well have done.

It may be time to revisit the discussion. I'd like to see the Decimal
type being more closely integrated with the rest of the language in
the future (and adding the C version of Decimal was the first step
along that path).

-- 
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dickinsm at gmail.com  Fri Aug 30 17:03:29 2013
From: dickinsm at gmail.com (Mark Dickinson)
Date: Fri, 30 Aug 2013 16:03:29 +0100
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: <521EABBE.10203@pearwood.info>
References: <521EABBE.10203@pearwood.info>
Message-ID: 

On Thu, Aug 29, 2013 at 3:02 AM, Steven D'Aprano wrote:

> On 28/08/13 20:48, Draic Kin wrote:
>
>> For the same reason, I could think that isinstance(Decimal, Rational) ->
>> True
>
> If Decimal were a subclass of Rational, so should float. The only
> fundamental difference between the two is that one uses base 10 floating
> point numbers and the other uses base 2.

Exactly, yes! The "Decimal is a fixed-point type" and "Decimal is exact"
myths seem to be unfortunately common.
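[Editor's note: the point made by Steven and Mark above — that Decimal is base-10 floating point, neither fixed-point nor exact — can be demonstrated directly. This is a sketch using only the stdlib; the values assume the default 28-digit context.]

```python
from decimal import Decimal

# 1/3 has no finite decimal expansion, so Decimal must round it --
# just as float must round 0.1, which has no finite binary expansion.
third = Decimal(1) / Decimal(3)        # 28 significant digits by default
assert third * 3 == Decimal('0.9999999999999999999999999999')
assert third * 3 != Decimal(1)         # rounding error, in base 10

# Conversely, 0.1 is exact in base 10 but inexact in base 2:
assert Decimal('0.1') * 3 == Decimal('0.3')
assert 0.1 * 3 != 0.3
```

Each base makes some values exact and rounds the rest; Decimal simply picks the base that matches human decimal notation.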
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oscar.j.benjamin at gmail.com  Fri Aug 30 17:45:34 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Fri, 30 Aug 2013 16:45:34 +0100
Subject: [Python-ideas] isinstance(Decimal(), Real) -> False?
In-Reply-To: 
References: <521EABBE.10203@pearwood.info>
Message-ID: 

On 30 August 2013 16:03, Mark Dickinson wrote:
> On Thu, Aug 29, 2013 at 3:02 AM, Steven D'Aprano wrote:
>>
>> On 28/08/13 20:48, Draic Kin wrote:
>>>
>>> For the same reason, I could think that isinstance(Decimal, Rational) ->
>>> True
>>
>> If Decimal were a subclass of Rational, so should float. The only
>> fundamental difference between the two is that one uses base 10 floating
>> point numbers and the other uses base 2.
>
> Exactly, yes! The "Decimal is a fixed-point type" and "Decimal is exact"
> myths seem to be unfortunately common.

I should say that this actually arises from properties of the decimal
module that are *not* from the standards. For example, the idea that
conversion to Decimal from string or integer is exact is documented in
the decimal module, but the standards do not require this. The only
part that is standard is really that inexact conversions should
trigger the Inexact exception.

Similarly, the IEEE-854 standard requires single precision, double
precision and optionally two implementation-dependent extended
precisions. The Decimal Arithmetic Specification only explicitly
requires a basic context with 9-digit precision. There's nowhere that
says it should support absurdly high precision (not that that's a bad
thing).

Also, about conversion from string to decimal, the arithmetic
specification says:

'''
A numeric string to finite number conversion is always exact unless
there is an underflow or overflow (see below) or the number of digits
in the decimal-part of the string is greater than the precision in the
context.
In this latter case the coefficient will be rounded (shortened) to
exactly precision digits, using the rounding algorithm, and the
exponent is increased by the number of digits removed. The rounded and
other flags may be set, as if an arithmetic operation had taken place
(see below).
'''
http://speleotrove.com/decimal/daconvs.html

My interpretation of the above is that Decimal(str) should round
according to the precision of the current context rather than return
an exact result as the decimal module does. Conversion to/from binary
float or fractions, and arithmetic interoperation with other types,
are not defined in the standards.

It seems that some people really don't want to mix Decimals and floats
(not that it does actually protect you from rounding error), which is
reasonable. However the FloatOperation trap (which is not part of the
standard) already exists for this case. Similarly for Fractions: in
the cases where mixing Fractions and Decimals would lead to an inexact
result, the Inexact trap exists for people who would want to control
this.

I think that many aspects of the current behaviour are useful but it
should not be confused with being a strict interpretation of any
standard. For example, the exact int/float to Decimal conversion
enables me to easily and efficiently compute the exact sum of any mix
of ints, floats and Decimals:

from decimal import Decimal, Inexact, getcontext

def decimalsum(iterable):
    '''Exact sum of Decimal/int/float mix; Result is *unrounded*'''
    # We need our own context and we can't just set it once because
    # the loop could be over a generator/iterator/coroutine
    ctx = getcontext().copy()
    ctx.traps[Inexact] = True
    one = Decimal(1)
    total = Decimal(0)
    for x in iterable:
        if not isinstance(x, Decimal):
            if isinstance(x, (int, float)):
                x = Decimal(x)
            else:
                raise TypeError
        # Handle NaN/Inf
        if not x.is_finite():
            return sum(map(Decimal, iterable), x)
        # Increase the precision until we get an exact result.
        # Accepting Fractions could turn this into an infinite loop...
        while True:
            try:
                total = total.fma(one, x, ctx)
                break
            except Inexact:
                ctx.prec *= 2
    # Using +total on the line below would round to context.
    return total  # Unrounded!

But this really illustrates how you are supposed to get exact results
with decimals, which is by controlling the context precision and
manipulating the Inexact trap. If conversion from int/float were not
guaranteed to be exact I would just need to move the conversion above
into the while loop. As long as you're doing any arithmetic, exact
conversion does not guarantee exact results (and if you're not doing
arithmetic then what's the point of Decimal?).


Oscar
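[Editor's note: the precision-doubling pattern at the heart of Oscar's decimalsum can be isolated into a minimal, self-contained sketch. `exact_add` is a hypothetical helper name for illustration, not part of the decimal module.]

```python
from decimal import Decimal, Context, Inexact

def exact_add(a, b, start_prec=28):
    """Add two Decimals exactly, doubling precision until no rounding occurs."""
    ctx = Context(prec=start_prec)
    ctx.traps[Inexact] = True   # any rounding now raises instead of rounding
    while True:
        try:
            return ctx.add(a, b)
        except Inexact:
            ctx.prec *= 2       # retry with more digits

# 10**50 + 1 needs 51 significant digits, more than the default 28,
# so a plain context-bound addition would round it to 1E+50:
total = exact_add(Decimal(10) ** 50, Decimal(1))
assert total == Decimal('1' + '0' * 49 + '1')
```

The loop terminates for any pair of finite Decimals because their exact sum always has finitely many digits; as Oscar notes, admitting Fractions (whose decimal expansions can be infinite) would break that guarantee.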