From musicdenotation at gmail.com  Sun Sep  1 10:31:54 2013
From: musicdenotation at gmail.com (Musical Notation)
Date: Sun, 1 Sep 2013 15:31:54 +0700
Subject: [Python-ideas] Another indentation style
Message-ID: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>

In Haskell, you can write:

let x=1
    y=2

In Python, why can't you write:

if True: x=x+1
         y=x

?

From steve at pearwood.info  Sun Sep  1 11:21:00 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 01 Sep 2013 19:21:00 +1000
Subject: [Python-ideas] Another indentation style
In-Reply-To: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
Message-ID: <522306FC.3070202@pearwood.info>

On 01/09/13 18:31, Musical Notation wrote:
> In Haskell, you can write:
>
> let x=1
>     y=2
>
> In Python, why can't you write:
>
> if True: x=x+1
>          y=x
>
> ?

Because allowing that does not let you do anything different or new that
you couldn't do before, it would not make code any clearer or more
understandable, and it would decrease the readability of the code.

The one-line-per-block form:

if condition: do_this()

avoids emphasizing the one-line block, and puts the focus on the `if`.
As far as I am concerned, it is not much more than a convenience for the
interactive interpreter. Dropping the `if` block onto a second line
shares focus between the two equally:

if condition:
    do_this()

If Python allowed the form you want with multi-line blocks:

if condition: do_this()
              do_that()
              do_something_else()

the call to `do_this` would be lost, up there on the same line as the
test. The block structure looks like this:

..............BLOCK
....BLOCK
....BLOCK

instead of:

.............
....BLOCK
....BLOCK
....BLOCK

and that hurts readability.
Worse is the temptation to waste time trying to line everything up:

if condition: do_this()
              do_that()
              do_something_else()
if flag: do_this()
         do_that()
         do_something_else()
if really_long_clause_in_a_boolean_context: do_this()
                                            do_that()
                                            do_something_else()

which obscures the fact that all three `if` blocks are at the same
indent level. Even worse:

if condition:                               do_this()
                                            do_that()
                                            do_something_else()
if flag:                                    do_this()
                                            do_that()
                                            do_something_else()
if really_long_clause_in_a_boolean_context: do_this()
                                            do_that()
                                            do_something_else()

which is just abominable. Python doesn't prevent you from writing ugly
code, but neither does it allow syntax which encourages you to write
ugly code.

-- 
Steven

From rob.cliffe at btinternet.com  Sun Sep  1 13:10:19 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Sun, 01 Sep 2013 12:10:19 +0100
Subject: [Python-ideas] Another indentation style
In-Reply-To: <522306FC.3070202@pearwood.info>
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
	<522306FC.3070202@pearwood.info>
Message-ID: <5223209B.9020806@btinternet.com>

On 01/09/2013 10:21, Steven D'Aprano wrote:
> On 01/09/13 18:31, Musical Notation wrote:
>> In Haskell, you can write:
>>
>> let x=1
>>     y=2
>>
>> In Python, why can't you write:
>>
>> if True: x=x+1
>>          y=x
>>
>> ?
>
> Because allowing that does not let you do anything different or new
> that you couldn't do before, it would not make code any clearer or
> more understandable, and it would decrease the readability of the code.
>
If I can be forgiven for slightly changing the subject (sorry, Musical
Notation): Talking of unconventional ways of indenting: In Python,
writing multiple code statements on a single line, especially with a
semi-colon, appears to be a taboo roughly on a par with appearing naked
in public. But I believe that there are times when it is the clearest
way of writing code, viz. when it makes visually obvious a *pattern* in
the code.
Here is one example, not very different from some "real" code that I
wrote:

def test(condition, a, b):
    if condition=='equals'          : return a==b
    if condition=='is greater than' : return a>b
    if condition=='contains'        : return b in a
    if condition=='starts with'     : return a.startswith(b)
    etc.

And here is some real code that I wrote (not worth explaining in
detail). I am sorry that it breaks another convention, having lines
longer than 80 characters - this happens not to be inconvenient for me,
and it was the best authentic *real* example I could find without
spending a long time searching:

assert PN[0].isalpha()    ; FirstPart  = PN[0] ; PN = PN[1:].lstrip(Seps) # Must be a letter
if PN[0].isalpha()        : FirstPart += PN[0] ; PN = PN[1:].lstrip(Seps) # May be a second letter
assert PN[0].isdigit()    ; FirstPart += PN[0] ; PN = PN[1:].lstrip(Seps) # Must be a digit
if PN and PN[0].isalnum() : FirstPart += PN[0] ; PN = PN[1:]              # May be a letter or digit

(These examples look best with the colons/semicolons/equals
signs/statements lined up vertically. They will probably look ragged in
an e-mail. They should look as intended if they are cut and pasted into
a (fixed-size font) editor.)

Writing the code like this makes apparent:
(1) There is a pattern to the code.
(2) Where the pattern is not quite consistent. E.g. in my second
    example the first line contains "FirstPart =", the other lines
    contain "FirstPart +=". *Seeing* this is half-way to understanding
    it.
(3) The conceptual separation of the whole chunk of code from what
    precedes and what follows it (which can be emphasised by putting a
    blank line before and after it).

*None* of this would be so apparent if the code were written one
statement per line. (Is 'statement' the correct technical term? Please
correct me.) There is also a minor advantage to writing fewer lines of
code - you can see more of the program in one screenful at a time.
(And: that you may find a smarter way of rewriting these specific
examples is not really the point.
In my younger days I might have written:

if wkday==0: return 'Monday'
if wkday==1: return 'Tuesday'
etc.

Nowadays I would probably write something like

return { 0 : 'Monday', 1 : 'Tuesday' ... etc. }[wkday]

And you may have an even better way. Again - not really the point.)

Best wishes,
Rob Cliffe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abarnert at yahoo.com  Sun Sep  1 15:21:39 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 1 Sep 2013 06:21:39 -0700
Subject: [Python-ideas] Another indentation style
In-Reply-To: <5223209B.9020806@btinternet.com>
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
	<522306FC.3070202@pearwood.info>
	<5223209B.9020806@btinternet.com>
Message-ID: <25F9189B-567A-44E6-91B2-9DD5868FAE96@yahoo.com>

On Sep 1, 2013, at 4:10, Rob Cliffe wrote:

> def test(condition, a, b):
>     if condition=='equals'          : return a==b
>     if condition=='is greater than' : return a>b
>     if condition=='contains'        : return b in a
>     if condition=='starts with'     : return a.startswith(b)

This isn't _terrible_... But it argues against the original proposal
even further, because it's an example of a one-line if statement that's
explicitly set off in a way that's unmistakable, because the statements
can't be continued.

That being said, I think it would be much better to write:

_ops = {
    'equals': eq,
    'is greater than': gt,
    ...
}

def test(condition, a, b):
    return _ops[condition](a, b)

... Or maybe the equivalent with methods instead of functions from
operator. Or, if some of the conditions don't already have functions
with nice names to map to (e.g. "within 1%" mapping to
.99*b <= a <= 1.01*b), maybe even inline lambdas.
Besides reducing all the boilerplate repetition, having the mapping in
data instead of code gives you the flexibility to do all kinds of
things that would otherwise be impossible--register new conditions
dynamically, inspect the conditions, reuse them in another function
without a parallel chain of if statements, etc.

From jon at jon-foster.co.uk  Mon Sep  2 01:28:05 2013
From: jon at jon-foster.co.uk (Jon Foster)
Date: Mon, 02 Sep 2013 00:28:05 +0100
Subject: [Python-ideas] ipaddress: Interface inheriting from Address
In-Reply-To: <521CA73D.4010703@trueblade.com>
References: <52154558.4080102@jon-foster.co.uk>
	<521CA73D.4010703@trueblade.com>
Message-ID: <5223CD85.7080100@jon-foster.co.uk>

Hi all,

After looking at ipaddress some more, I've got a few patches to
suggest. I've pushed them all to a Mercurial repository, which you can
see here:
https://bitbucket.org/jonfoster/python-ipaddress/commits/all?page=3

The "Interface not inheriting from Address" patch is:
https://bitbucket.org/jonfoster/python-ipaddress/commits/146d1ffa832fd0b72696a57c806995e6c53601a3

It depends on some refactoring, which I did in a separate commit:
https://bitbucket.org/jonfoster/python-ipaddress/commits/af480dbe385f65da3cf6b20d85a31854bf233772

There are a few other ipaddress patches in that repository, too. I'd
appreciate any feedback you have on these.

Kind regards,

Jon

P.S. I have just signed a Contributor Agreement.

On 27/08/2013 14:18, Eric V. Smith wrote:
> On 08/21/2013 06:55 PM, Jon Foster wrote:
>> Hi all,
>>
>> I'd like to propose changing ipaddress.IPv[46]Interface to not inherit
>> from IPv[46]Address.
>
> I agree that it's odd that an [x]Interface would inherit from an
> [x]Address. I think it should be a has-a relationship, as you describe
> with the "ip" property.
>
>> If there is interest in this idea, I'll try to put together a patch next
>> week.
>
> I'd review the patch.
>
> Eric.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From steve at pearwood.info  Mon Sep  2 02:30:25 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 02 Sep 2013 10:30:25 +1000
Subject: [Python-ideas] Another indentation style
In-Reply-To:
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
	<522306FC.3070202@pearwood.info>
Message-ID: <5223DC21.6010206@pearwood.info>

On 01/09/13 21:49, Musical Notation wrote:
> View my original proposal in a fixed-width font and you will
> understand it.

Your assumption that I didn't use a fixed-width font is wrong. I
*always* use fixed-width fonts for email. And don't imagine that the
only reason I could disagree with your proposal is that I don't
understand it. I understand your proposal very well, and still
disagree.

> That indentation style is quite idiomatic in Haskell.

Irrelevant. Python does not have a "let name=value" statement, so the
Haskell "let" idiom does not apply.
There is an enormous difference between the leading fixed-width "let"
and variable-width "if" clauses:

let x = value
    y = name

let extremely_long_name_that_goes_on_and_on = value
    name = value

let foo = value
    bar = value

versus:

if f: do_this()
      do_that()

if long_condition: do_this()
                   do_that()

elif flag: do_this()
           do_that()

-- 
Steven

From steve at pearwood.info  Mon Sep  2 03:28:53 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 02 Sep 2013 11:28:53 +1000
Subject: [Python-ideas] Another indentation style
In-Reply-To: <5223209B.9020806@btinternet.com>
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
	<522306FC.3070202@pearwood.info>
	<5223209B.9020806@btinternet.com>
Message-ID: <5223E9D5.4040502@pearwood.info>

On 01/09/13 21:10, Rob Cliffe wrote:
> In Python, writing multiple code statements on a single line,
> especially with a semi-colon, appears to be a taboo roughly on a par
> with appearing naked in public. But I believe that there are times
> when it is the clearest way of writing code, viz. when it makes
> visually obvious a *pattern* in the code.
> Here is one example, not very different from some "real" code that I
> wrote:
>
> def test(condition, a, b):
>     if condition=='equals'          : return a==b
>     if condition=='is greater than' : return a>b
>     if condition=='contains'        : return b in a
>     if condition=='starts with'     : return a.startswith(b)
>     etc.

I believe that the pattern is *more* readily apparent written the
conventional way:

def test(condition, a, b):
    if condition=='equals':
        return a==b
    if condition=='is greater than':
        return a>b
    if condition=='contains':
        return b in a
    if condition=='starts with':
        return a.startswith(b)

You can run your eye down indent level 2 and see "condition return
condition return condition return", which avoids needing to scan
left-to-right (a significant slowdown when skimming code), and the
distraction of that great big river of whitespace running down the
middle of the block.
Another issue is the time spent deleting and inserting spaces in the
middle of the lines to keep the return statements lined up after edits.
With no clear benefit, that's just unproductive make-work. (But good if
you're paid by the hour and your boss doesn't cotton on to what you are
doing *wink*)

Later in your post, you write:

> And you may have an even better way.
> Again - not really the point.

But that precisely is the point! If the layout of code is obscuring the
ways it can be simplified, generalized or refactored, then the layout
is *actively* harmful. Now, maybe you have a reason for preferring a
long list of if...return statements instead of the more usual idiom of
a dict lookup. I don't understand your code well enough to comment on
that. But it is obvious to me that splitting the if...return over two
lines certainly doesn't hurt the ability to visualise the pattern in
the code, and probably helps make it even more clear.

> And here is some real code that I wrote (not worth explaining in
> detail). I am sorry that it breaks another convention, having lines
> longer than 80 characters - this happens not to be inconvenient for
> me, and was the best authentic *real* example I could find without
> spending a long time searching:
>
> assert PN[0].isalpha() ; FirstPart = PN[0] ; PN = PN[1:].lstrip(Seps) # Must be a letter
> if PN[0].isalpha() : FirstPart += PN[0] ; PN = PN[1:].lstrip(Seps) # May be a second letter

That second line uses a layout that I wish was a SyntaxError, because
it is ambiguous whether

if cond: statementA; statementB

should be grouped as follows (using braces as visual aids):

if cond:
    { statementA; statementB }

or like this:

{ if cond: statementA }
statementB

Such a shame that it is allowed. I can just count myself fortunate that
I've never seen it before in the wild.

-- 
Steven

From stephen at xemacs.org  Mon Sep  2 05:39:33 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 02 Sep 2013 12:39:33 +0900
Subject: [Python-ideas] Another indentation style
In-Reply-To: <5223E9D5.4040502@pearwood.info>
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
	<522306FC.3070202@pearwood.info>
	<5223209B.9020806@btinternet.com>
	<5223E9D5.4040502@pearwood.info>
Message-ID: <8738pndh8q.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

> def test(condition, a, b):
>     if condition=='equals':
>         return a==b
>     if condition=='is greater than':
>         return a>b
>     if condition=='contains':
>         return b in a
>     if condition=='starts with':
>         return a.startswith(b)
>
> You can run your eye down indent level 2 and see "condition return
> condition return condition return", which avoids needing to scan
> left-to-right

No, you can't, FVO "you" == "me". In fact, I can only read about 20
characters without moving my eyes, and your format encourages my eyes
to zigzag. I find it *much* easier to scan the OP's format for a
particular condition or a particular return expression.

> Another issue is the time spent deleting and inserting spaces in
> the middle of the lines to keep the return statements lined up
> after edits.

"Obviously" you're not an Emacs user (or other editor with powerful
on-the-fly scripting capability). It takes about 10 minutes to write
enough Lisp to maintain that table with one keystroke, including
detecting the beginning and end of the table, the column widths, and
so on. Not everybody uses their editor that way, but people who make
such suites in tabular form *should* -- time is not an issue here.

*Despite* the above, I don't like the OP's format. With Andrew
Barnert's suggestion of a dictionary, I get all the above benefits,
with *less* detritus (no "if", "condition==", "return"; jus' the facs,
ma'am) in the tabular format, conformity to common practice
("intuitive" is just an alternative spelling of "familiar"), and it's
monkey-patchable if I need to add a new condition at runtime.
> > And here is some real code that I wrote (not worth explaining in
> > detail).
> >
> > assert PN[0].isalpha() ; FirstPart = PN[0] ; PN = PN[1:].lstrip(Seps) # Must be a letter
> > if PN[0].isalpha() : FirstPart += PN[0] ; PN = PN[1:].lstrip(Seps) # May be a second letter
>
> That second line uses a layout that I wish was a SyntaxError,
> because it is ambiguous whether
>
> if cond: statementA; statementB
>
> should be grouped as follows (using braces as visual aids):
>
> if cond:
>     { statementA; statementB }
>
> or like this:
>
> { if cond: statementA }
> statementB

I don't have a problem with it from this point of view, because the
clear intent is that the semicolons separate the statements of the
suite controlled by the if, and that's what they do. I do have a
problem with the fact that 'assert' does not introduce a suite.
"Syntax must not look like grit on Tim's screen!"

Besides, it would screw up my Lisp.

From rob.cliffe at btinternet.com  Tue Sep  3 13:47:49 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Tue, 03 Sep 2013 12:47:49 +0100
Subject: [Python-ideas] Another indentation style
In-Reply-To: <1378159467.10776.YahooMailNeo@web184703.mail.ne1.yahoo.com>
References: <470AF9E3-9750-4B11-9BAF-257535DE245E@gmail.com>
	<522306FC.3070202@pearwood.info>
	<5223209B.9020806@btinternet.com>
	<25F9189B-567A-44E6-91B2-9DD5868FAE96@yahoo.com>
	<522367A1.1010804@btinternet.com>
	<4F21A705-17D0-42C9-81FB-24E362272696@yahoo.com>
	<52247D6B.5030900@btinternet.com>
	<3137AD20-5BA2-4034-91EE-DF687EF5A215@yahoo.com>
	<5224AEE3.9020309@btinternet.com>
	<1378159467.10776.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID: <5225CC65.5070100@btinternet.com>

On 02/09/2013 23:04, Andrew Barnert wrote:
> From: Rob Cliffe
> Sent: Monday, September 2, 2013 8:29 AM
>> You seem to have replied just to me, not to the list, which is a
>> pity, because it means this reply is going just to you.
> Your last message was just to me, so I didn't think you wanted it
> going to the list.
Sorry, my mistake, I'm attempting to copy this to the list now.
>
>>>> What do you think is easier and quicker to write (or indeed, to
>>>> understand, if you're not a Python expert)? Say at 1am when you're
>>>> trying to meet a deadline and keeping awake with coffee?
>>> I can write either one easily.
>> OK, you can. (How many years have you been programming in Python?)
>> That does not mean that everybody can. (I can too, but maybe not as
>> easily as you. But I might write it in my simple way at first, to
>> get it working, then come back later and smarten it up.)
> Novices looking for help on Stack Overflow can create dictionaries.
> My coworker who just learned Python over the last two months can.
> This is in the tutorial; it's not some deep magic for experts only.
>
>>> The question is which one I can write without making a stupid
>>> mistake that I'll have to debug six weeks later. Less repetition
>>> and less visual noise means fewer places to make a mistake, and
>>> easier to spot it when you do.
>
>> But the very repetition (and vertical alignment) mean that many of
>> the possible mistakes stand out like a sore thumb. The human brain
>> is good at seeing patterns.
> Repetitive code is exactly where people make the most mistakes. And
> if you've ever debugged any serious project, it really can be hard to
> notice that you used ssock instead of csock in one of eight
> near-identical blocks of code. If you just write one block of code
> instead of eight, it's impossible to make that mistake in the first
> place.
But if the occurrences of csock are vertically aligned, the mistake
stands out.
>> My version (boring, repetitive, unimaginative ... but simple and
>> straightforward), or something involving dictionaries, lambda
>> functions, and having to look up the docs (I didn't actually know
>> about __contains__ or operator.contains)?
>>> The whole point of programming is using your imagination to
>>> eliminate boring, repetitive, and simple tasks.
Er, no. No. The point of programming varies.
And "good" and "bad" code do not exist in isolation; context matters
too (commercial environment? academic?).
>
> No, it really doesn't. Except for learning and language-research
> purposes, when you write a program, it's to accomplish something that
> otherwise you or another person would have to do manually, which
> would be tedious, or difficult, or error-prone. You could go through
> a spreadsheet and count up the number of unique users (column 3) in
> each state (column 2), but it would take days, and you'd make dozens
> of mistakes, and you'd be miserable. Or you could write a program in
> a few minutes or hours.
>
>>>> Obviously this is a judgement call, but I know what my view is.
>>>> As a touchstone: my version could be rewritten in just about any
>>>> other language with minimal effort, just because it is so simple
>>>> and uses no "tricks". How many languages could yours be rewritten
>>>> in as quickly (assuming you're not already an expert in the target
>>>> language)?
>>> Putting functions into a dictionary is not some advanced "trick",
>>> it's a basic idiom. It's used in the tutorial, the FAQ, the stdlib,
>>> etc.
>> Sure it is (in Python). But if I wanted to translate this code into
>> another language (especially in a hurry), I would (as I said before)
>> need minimal knowledge of that language to translate my boring
>> version. (LISP?)
True, but it does add one level of abstraction, and one's brain has to
go through that level to understand the code. The more levels of
abstraction, the less intuitive and harder to understand code becomes
(look at Twisted, possibly the ultimate example).
> Why would you need to translate it into another language in a hurry?
Who knows? We live in an unpredictable world; we adapt to it or die.
If I could predict all the things my manager asks me to do ... well, I
guess I wouldn't need a manager.
>
>> We have got a bit bogged down discussing this particular example
>> (partly my fault).
>> I was simply trying to make the point that there may be
>> circumstances when it is OK, even a good thing, to put 2, 3 or
>> (heaven forbid) 4 statements on a single line, and to illustrate the
>> point with a couple of examples. That (you find that) the examples
>> are less than perfect does not, in itself, mean that my point is
>> entirely wrong.
>
> I said right at the beginning that your code isn't terrible; you're
> the one who insisted that other people will say it is.
>
> And there are certainly examples where writing two statements on a
> line makes sense. I have code like this:
>
> if stop: break
>
> x += a; y += b
>
> So I agree with you that sometimes putting multiple statements on a
> line is a good thing.
Good. I also quite often (not always) write 1-line if-suites like that
(as in my first example). I'm glad we agree on something.
> But doing it so you can avoid using one of Python's fundamental
> features because you're afraid you might have to translate the code
> to a language you barely know is not a good reason to do it.
We'll have to disagree on that. I think sometimes it might be. But my
primary motive was to write the code so that it was simple to write
and simple to understand.

Another (invented) example occurred to me (before I saw yours above):

Version 1:

x1 += 1
x2 += 1
x3 += 1
y1 += 1
y2 += 1
y2 += 1
z1 += 1
z2 += 1
z3 += 1

Version 2 (Rob's version):

x1 += 1 ; x2 += 1 ; x3 += 1
y1 += 1 ; y2 += 1 ; y2 += 1
z1 += 1 ; z2 += 1 ; z3 += 1

In which version is it easier to grasp what this code does? In which
version is it easier to spot the deliberate mistake? (And please don't
rubbish the example by saying it should have been written differently
in the first place. Circumstances *do* alter cases; I could be making
a minor alteration to a huge inherited program which it is not
practicable or necessary to rewrite.)

Rob Cliffe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From techtonik at gmail.com  Wed Sep 11 18:05:22 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 11 Sep 2013 19:05:22 +0300
Subject: [Python-ideas] AST Hash
Message-ID:

Hi,

We need a checksum for code pieces. The goal of the checksum is to
reliably detect pieces of code with absolutely identical behaviour.
Borders of such a checksum can be functions, classes, modules, ...
Practical applications for such checksums are:

- detecting usage of recipes and examples across PyPI packages
- detecting usage of standard stdlib calls
- creating execution safe serialization formats for data
- choosing the class to deserialize data fields of an object based on
  its hash
- enabling consistent validation and testing of results across various
  AST tools

There can be two approaches to building such a checksum:
1. Code Section Hash
2. AST Hash

A Code Section Hash is built from a substring of the source code, cut
on function or class boundaries. This hash is flaky - whitespace and
comment differences ruin it, even when behaviour (and bytecode) stays
the same. It is possible to reduce the effect of whitespace and
comment changes by normalizing the substring - dedenting, reindenting
with 4 spaces, stripping empty lines, comments and trailing
whitespace. And it will still be unreliable and affected by whitespace
changes in the middle of the string. Therefore the 2nd way of hashing
is preferable.

An AST Hash is built on the AST. This excludes any comments,
whitespace etc. and makes the hash strict and reliable. This is the
canonical Default AST Hash.

There are cases when the Default AST Hash may not be enough for
comparison. For example, if local variables are renamed, or docstrings
changed, the behaviour of a function may not change, but its AST hash
will. In these cases additional normalization rules apply, such as
changing all local variable names to var1, var2, ... in order of
appearance, stripping docstrings, etc. Every set of such normalization
rules should have a name.
This will also be the name of the resulting custom AST Hash.

Explicit naming of AST Hashes, and hard-linking of names to the rules
that are used to build them, will settle common ground (a base) for
AST tool interoperability and research papers. As such, it will most
likely require a separate PEP.
--
anatoly t.

From techtonik at gmail.com  Wed Sep 11 18:54:00 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 11 Sep 2013 19:54:00 +0300
Subject: [Python-ideas] Python Sound API (Was: Cross Platform Python
	Sound Module/Library)
Message-ID:

On Sat, Apr 27, 2013 at 11:59 PM, M.-A. Lemburg wrote:
> On 27.04.2013 22:19, anatoly techtonik wrote:
>> On Sat, Apr 27, 2013 at 1:51 PM, Antoine Pitrou wrote:
>>
>>> On Fri, 26 Apr 2013 20:39:30 -0700
>>> Andrew Barnert wrote:
>>>> On Apr 26, 2013, at 16:59, Greg Ewing wrote:
>>>>
>>>>> Oleg Broytman wrote:
>>>>>> Are there cross-platform audio libraries that Python could wrap?
>>>>>
>>>>> There's OpenAL:
>>>>>
>>>>> http://connect.creativelabs.com/openal/default.aspx
>>>>>
>>>> There's actually a bunch of options.
>>>>
>>>> The hard question is picking one and endorsing it as "right", or
>>>> at least "good enough to enshrine in stdlib ala tkinter".
>>>
>>> When you notice how "good enough" tkinter is (and has been for 10
>>> years at least), you realize the trap hidden in this question.
>>>
>>> Really, see my message earlier in this thread. This is better left
>>> to third-party libraries (which already exist, please do some
>>> research).
>>>
>> From the other side, if 80% of cases can be covered without Python
>> packaging problems - that's already an advantage. For example, most
>> people find the date/time functionality in Python enough to avoid
>> using mxDateTime as a dependency. As for audio, most people find it
>> insufficient.
>
> I'm not sure whether 3D audio support is really needed as a core
> feature in a general purpose programming language ;-)
>
> I'd suggest to have a look at http://www.libsdl.org/, which can
> be used from Python via http://pygame.org/

3D audio support is not the basic common cross-platform base layer.
Many devices that run Python are mono-only. What the stdlib should
concentrate on are the basic audio operations needed by people, with
an accent on a pure Python implementation of everything that does not
represent the system layer: safe sound synthesis and sound output in a
canonical format. If people need advanced algorithms and operations,
they are free to use SDL2, OpenAL, FFmpeg and other libs that are
inherently insecure due to the amount of low level C code. Audio even
on Android devices doesn't require any advanced privileges. It is a
basic need for many programs, and an attraction for many creative
people who may use Python as an auxiliary language in their works.

From the stdlib I would also expect abstract scheduling and buffering
algorithms, with explanations and documentation with pictures. I
expect a super simple API for all basic cases, and the ability to use
this API on any platform. I'd expect the Audio API to be multi-level:

Level 1: Beep - make an audio signal to attract attention, at the most
         basic level the OS provides
Level 2: Customize the audio signal used to attract attention (query
         signals, choose, beep)
Level 3: Play a pre-rendered waveform (such as a WAV file) at the most
         basic OS level (default format)
Level 4: Play a continuous pre-rendered stream
Level 5: Mix pre-rendered streams and waveforms
Level 6: Synthesize sound in pure Python
Level 7: Synthesize sound indirectly (using GPU, MIDI interfaces,
         external libs, ...)
Level 8: Audio device control - formats, channels, volumes -
         everything hardware specific
--
anatoly t.
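For what it's worth, Levels 3 and 6 are already within reach of
today's stdlib; a minimal sketch that synthesizes a sine-wave beep in
pure Python and writes it as a canonical WAV file (the filename, tone
frequency and duration are arbitrary choices for illustration):

```python
import math
import struct
import wave

RATE = 44100        # samples per second (CD quality)
FREQ = 440.0        # beep frequency in Hz (A4), an arbitrary choice
SECONDS = 0.5
AMPLITUDE = 0.5     # fraction of full scale, kept below 1 to avoid clipping

# Render one channel of 16-bit signed samples and write them as a WAV
# file, the "canonical format" the proposal mentions.
with wave.open('beep.wav', 'wb') as w:
    w.setnchannels(1)      # mono: the lowest common denominator
    w.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
    w.setframerate(RATE)
    frames = bytearray()
    for i in range(int(RATE * SECONDS)):
        sample = AMPLITUDE * math.sin(2 * math.pi * FREQ * i / RATE)
        frames += struct.pack('<h', int(sample * 32767))
    w.writeframes(bytes(frames))
```

Playing the resulting file back is exactly the part that still needs a
platform-specific layer, which is the gap the proposed API would fill.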
From amauryfa at gmail.com  Wed Sep 11 19:05:52 2013
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 11 Sep 2013 19:05:52 +0200
Subject: [Python-ideas] AST Hash
In-Reply-To:
References:
Message-ID:

2013/9/11 anatoly techtonik

> Hi,
>
> We need a checksum for code pieces. The goal of the checksum is to
> reliably detect pieces of code with absolutely identical behaviour.
> Borders of such checksum can be functions, classes, modules, ...

This looks like a nice project; I think this should first take the
form of an external package. I'm sure there are many details to iron
out before this kind of technique can be widely adopted. For example:

- Is there only one kind of hash? You suggested erasing the
  differences in variable names; are there other possible
  customizations?
- To detect common patterns, is it interesting to hash and index all
  the nodes of an AST tree?
- Is there a central repository to store hashes of recipes? Is Google
  Search enough?

I don't need answers, only a reference implementation that people can
discuss!

Good luck,

> Practical application for such checksums are:
>
> - detecting usage of recipes and examples across PyPI packages
> - detecting usage of standard stdlib calls
> - creating execution safe serialization formats for data
> - choosing class to deserialize data fields of the object based on
>   its hash
> - enable consistent validation and testing of results across various
>   AST tools
>
> There can be two approaches to build such checksum:
> 1. Code Section Hash
> 2. AST Hash
>
> Code Section Hash is built from a substring of a source code, cut on
> function or class boundaries. This hash is flaky - whitespace and
> comment differences ruin it, even when behaviour (and bytecode) stays
> the same. It is possible to reduce the effect of whitespace and
> comment changes by normalizing the substring - dedenting, reindenting
> with 4 spaces, stripping empty lines, comments and trailing
> whitespace.
> And it still will be unreliable and affected by whitespace
> changes in the middle of the string. Therefore a 2nd way of hashing
> is more preferable.
>
> AST Hash is built on the AST. This excludes any comments, whitespace
> etc. and makes the hash strict and reliable. This is a canonical
> Default AST Hash.
>
> There are cases when Default AST Hash may not be enough for
> comparison. For example, if local variables are renamed, or
> docstrings changed, the behaviour of a function may not change, but
> its AST hash will. In these cases additional normalization rules
> apply. Such as changing all local variable names to var1, var2, ...
> in order of appearance, stripping docstrings etc. Every set of such
> normalization rules should have a name. This will also be the name of
> the resulting custom AST Hash.
>
> Explicit naming of AST Hashes and hardlinking of names to rules that
> are used to build them will settle common ground (base) for AST tools
> interoperability and research papers. As such, it most likely
> requires a separate PEP.
> --
> anatoly t.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>

-- 
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kn0m0n3 at gmail.com  Wed Sep 11 20:49:48 2013
From: kn0m0n3 at gmail.com (Jason Bursey)
Date: Wed, 11 Sep 2013 13:49:48 -0500
Subject: [Python-ideas] AST Hash
In-Reply-To:
References:
Message-ID:

On Wed, Sep 11, 2013 at 11:05 AM, anatoly techtonik wrote:

> Hi,
>
> We need a checksum for code pieces. The goal of the checksum is to
> reliably detect pieces of code with absolutely identical behaviour.
> Borders of such checksum can be functions, classes, modules, ...
> Practical application for such checksums are: > > - detecting usage of recipes and examples across PyPI packages > - detecting usage of standard stdlib calls > - creating execution safe serialization formats for data > - choosing class to deserialize data fields of the object based on its > hash > - enable consistent validation and testing of results across various AST > tools > > There can be two approaches to build such checksum: > 1. Code Section Hash > 2. AST Hash > > Code Section Hash is built from a substring of a source code, cut on > function or class boundaries. This hash is flaky - whitespace and > comment differences ruin it, even when behaviour (and bytecode) stays > the same. It is possible to reduce the effect of whitespace and > comment changes by normalizing the substring - dedenting, reindenting > with 4 spaces, stripping empty lines, comments and trailing > whitespace. And it still will be unreliable and affected by whitespace > changes in the middle of the string. Therefore a 2nd way of hashing is > more preferable. > > AST Hash is build on AST. This excludes any comments, whitespace etc. > and makes the hash strict and reliable. This is a canonical Default > AST Hash. > > There are cases when Default AST Hash may not be enough for > comparison. For example, if local variables are renamed, or docstrings > changed, the behaviour of a function may not change, but its AST hash > will. In these cases additional normalization rules apply. Such as > changing all local variable names to var1, var2, ... in order of > appearance, stripping docstrings etc. Every set of such normalization > rules should have a name. This will also be the name of resulting > custom AST Hash. > > Explicit naming of AST Hashes and hardlinking of names to rules that > are used to build them will settle common ground (base) for AST tools > interoperability and research papers. As such, it most likely require > a separate PEP. > -- > anatoly t. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed Sep 11 22:53:39 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 11 Sep 2013 22:53:39 +0200 Subject: [Python-ideas] AST Hash In-Reply-To: References: Message-ID: <5230D853.6090108@egenix.com> On 11.09.2013 18:05, anatoly techtonik wrote: > Hi, > > We need a checksum for code pieces. The goal of the checksum is to > reliably detect pieces of code with absolutely identical behaviour. > Borders of such checksum can be functions, classes, modules,. > Practical application for such checksums are: > > - detecting usage of recipes and examples across PyPI packages > - detecting usage of standard stdlib calls > - creating execution safe serialization formats for data > - choosing class to deserialize data fields of the object based on its hash > - enable consistent validation and testing of results across various AST tools > > There can be two approaches to build such checksum: > 1. Code Section Hash > 2. AST Hash > > Code Section Hash is built from a substring of a source code, cut on > function or class boundaries. This hash is flaky - whitespace and > comment differences ruin it, even when behaviour (and bytecode) stays > the same. It is possible to reduce the effect of whitespace and > comment changes by normalizing the substring - dedenting, reindenting > with 4 spaces, stripping empty lines, comments and trailing > whitespace. And it still will be unreliable and affected by whitespace > changes in the middle of the string. Therefore a 2nd way of hashing is > more preferable. > > AST Hash is build on AST. This excludes any comments, whitespace etc. > and makes the hash strict and reliable. This is a canonical Default > AST Hash. 
> > There are cases when Default AST Hash may not be enough for > comparison. For example, if local variables are renamed, or docstrings > changed, the behaviour of a function may not change, but its AST hash > will. In these cases additional normalization rules apply. Such as > changing all local variable names to var1, var2, ... in order of > appearance, stripping docstrings etc. Every set of such normalization > rules should have a name. This will also be the name of resulting > custom AST Hash. > > Explicit naming of AST Hashes and hardlinking of names to rules that > are used to build them will settle common ground (base) for AST tools > interoperability and research papers. As such, it most likely require > a separate PEP. You might want to have a look at this paper which discussed AST compression (for Java, but the ideas apply to Python just as well): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.135.5917&rep=rep1&type=pdf If you compress the AST into a string and take its hash, you should pretty much have what you want. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-04: Released eGenix pyOpenSSL 0.13.2 ... http://egenix.com/go48 2013-09-20: PyCon UK 2013, Coventry, UK ... 9 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tjreedy at udel.edu Thu Sep 12 00:07:10 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 11 Sep 2013 18:07:10 -0400 Subject: [Python-ideas] AST Hash In-Reply-To: <5230D853.6090108@egenix.com> References: <5230D853.6090108@egenix.com> Message-ID: On 9/11/2013 4:53 PM, M.-A. Lemburg wrote: > You might want to have a look at this paper which discussed > AST compression (for Java, but the ideas apply to Python just > as well): > > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.135.5917&rep=rep1&type=pdf The prototype implementation is written in Python! (p.3, right). -- Terry Jan Reedy From mistersheik at gmail.com Thu Sep 12 00:18:01 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 11 Sep 2013 15:18:01 -0700 (PDT) Subject: [Python-ideas] Replace option set/get methods through the standard library with a ChainMap; add a context manager to ChainMap Message-ID: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com> With numpy print options, for example, the usual pattern is to save some of the print options, set some of them, and then restore the old options. Why not expose the options as a ChainMap called numpy.printoptions? ChainMap could then expose a context manager that pushes a new dictionary on entry and pops it on exit via, say, child_context that accepts a dictionary. Now, instead of:

    saved_precision = np.get_printoptions()['precision']
    np.set_printoptions(precision=23)
    do_something()
    np.set_printoptions(precision=saved_precision)

You can do the same with a context manager, which I think is stylistically better (as it's impossible to forget to reset the option, and no explicit temporary invades the local variables):

    with np.printoptions.child_context({'precision': 23}):
        do_something()

Best, Neil -------------- next part -------------- An HTML attachment was scrubbed...
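Neil's `child_context` does not exist on `collections.ChainMap` today, and numpy does not expose its print options as a mapping; the sketch below only illustrates the proposed behaviour, with the `OptionMap` subclass name and the options dict invented for the example:

```python
from collections import ChainMap
from contextlib import contextmanager

class OptionMap(ChainMap):
    """ChainMap with the proposed push/pop context manager."""

    @contextmanager
    def child_context(self, overrides):
        # Push the overrides as a new front map on entry...
        self.maps.insert(0, dict(overrides))
        try:
            yield self
        finally:
            # ...and pop it on exit, restoring the previous options.
            del self.maps[0]

printoptions = OptionMap({'precision': 8, 'suppress': False})

with printoptions.child_context({'precision': 23}):
    print(printoptions['precision'])  # overridden inside the block

print(printoptions['precision'])      # back to the original value
```

Lookups fall through to the earlier maps for any key not overridden, which is exactly the save/restore behaviour the get/set pattern emulates by hand.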
URL: From mistersheik at gmail.com Wed Sep 11 23:44:11 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 11 Sep 2013 14:44:11 -0700 (PDT) Subject: [Python-ideas] Idea: Compressing the stack on the fly In-Reply-To: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> Message-ID: <34b72f2c-94bf-4452-9ec3-54336e6ed562@googlegroups.com> You can just use a memoize decorator to automatically convert your recursive solution to a fast linear time one. Search for "python memoization decorator". Best, Neil On Sunday, May 26, 2013 8:00:13 AM UTC-4, Ram Rachum wrote: > > Hi everybody, > > Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to > how programming languages are implemented, so this idea will most likely be > either (a) completely impossible or (b) trivial knowledge. > > I was thinking about the implementation of the factorial in Python. I was > comparing in my mind 2 different solutions: The recursive one, and the one > that uses a loop. Here are example implementations for them: > > def factorial_recursive(n): > if n == 1: > return 1 > return n * factorial_recursive(n - 1) > > def factorial_loop(n): > result = 1 > for i in range(1, n + 1): > result *= i > return result > > > I know that the recursive one is problematic, because it's putting a lot > of items on the stack. In fact it's using the stack as if it was a loop > variable. The stack wasn't meant to be used like that. > > Then the question came to me, why? Maybe the stack could be built to > handle this kind of (ab)use? > > I read about tail-call optimization on Wikipedia. If I understand > correctly, the gist of it is that the interpreter tries to recognize, on a > frame-by-frame basis, which frames could be completely eliminated, and then > it eliminates those. Then I read Guido's blog post explaining why he > doesn't want it in Python. 
In that post he outlined 4 different reasons why > TCO shouldn't be implemented in Python. > > But then I thought, maybe you could do something smarter than eliminating > individual stack frames. Maybe we could create something that is to the > current implementation of the stack what `xrange` is to the old-style > `range`. A smart object that allows access to any of a long list of items > in it, without actually having to store those items. This would solve the > first argument that Guido raises in his post, which I found to be the most > substantial one. > > What I'm saying is: Imagine the stack of the interpreter when it runs the > factorial example above for n=1000. It has around 1000 items in it and it's > just about to explode. But then, if you'd look at the contents of that > stack, you'd see it's embarrassingly regular, a compression algorithm's > wet dream. It's just the same code location over and over again, with a > different value for `n`. > > So what I'm suggesting is an algorithm to compress that stack on the fly. > An algorithm that would detect regularities in the stack and instead of > saving each individual frame, save just the pattern. Then, there wouldn't > be any problem with showing informative stack trace: Despite not storing > every individual frame, each individual frame could still be *accessed*, > similarly to how `xrange` allow access to each individual member without > having to store each of them. > > Then, the stack could store a lot more items, and tasks that currently > require recursion (like pickling using the standard library) will be able > to handle much deeper recursions. > > What do you think? > > > Ram. > -------------- next part -------------- An HTML attachment was scrubbed... 
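The memoize decorator Neil mentions is available in the stdlib as `functools.lru_cache`; a minimal sketch on the classic Fibonacci example, where caching turns the exponential call tree into a linear one (note that it does not reduce the recursion *depth*, only the number of distinct calls):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this is exponential; with it, each n is
    # computed once and later calls are dictionary lookups.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
```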
URL: From mistersheik at gmail.com Thu Sep 12 00:07:45 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 11 Sep 2013 15:07:45 -0700 (PDT) Subject: [Python-ideas] Replace numpy get_printoptions/set_printoptions, and similar patterns with a ChainMap; add a context manager to ChainMap In-Reply-To: <00094cfb-73b9-422c-b9d0-50e290813819@googlegroups.com> References: <00094cfb-73b9-422c-b9d0-50e290813819@googlegroups.com> Message-ID: Just to be clear, my proposal is to replace all such get/set options patterns throughout Python's standard library. On Wednesday, September 11, 2013 5:13:55 PM UTC-4, Neil Girdhar wrote: > > With numpy, the usual pattern is to get_printoptions, set some of them, > and then restore the old options. Why not expose the options in a ChainMap > as numpy.printoptions? ChainMap could then expose a context manager that > pushes a new dictionary on entry and pops it on exit via, say, > child_context that accepts a dictionary. Now, instead of: > > saved_precision = np.get_printoptions()['precision'] > np.set_printoptions(precision=23) > do_something() > np.set_printoptions(precision=saved_precision) > > You can do the same with a context manager, which I think is stylistically > better (as it's impossible to forget to reset the option, and no explicit > temporary invades the local variables): > > with np.printoptions.child_context({'precision', 23}): > do_something() > > Best, > > Neil > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 12 00:15:49 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 11 Sep 2013 15:15:49 -0700 (PDT) Subject: [Python-ideas] Idea: Compressing the stack on the fly In-Reply-To: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> Message-ID: You can just use a memoize decorator to automatically convert your recursive solution to a fast linear time one. 
Search for "python memoization decorator". This would make a much broader range of recursive solutions run in linear time. Best, Neil On Sunday, May 26, 2013 8:00:13 AM UTC-4, Ram Rachum wrote: > > Hi everybody, > > Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to > how programming languages are implemented, so this idea will most likely be > either (a) completely impossible or (b) trivial knowledge. > > I was thinking about the implementation of the factorial in Python. I was > comparing in my mind 2 different solutions: The recursive one, and the one > that uses a loop. Here are example implementations for them: > > def factorial_recursive(n): > if n == 1: > return 1 > return n * factorial_recursive(n - 1) > > def factorial_loop(n): > result = 1 > for i in range(1, n + 1): > result *= i > return result > > > I know that the recursive one is problematic, because it's putting a lot > of items on the stack. In fact it's using the stack as if it was a loop > variable. The stack wasn't meant to be used like that. > > Then the question came to me, why? Maybe the stack could be built to > handle this kind of (ab)use? > > I read about tail-call optimization on Wikipedia. If I understand > correctly, the gist of it is that the interpreter tries to recognize, on a > frame-by-frame basis, which frames could be completely eliminated, and then > it eliminates those. Then I read Guido's blog post explaining why he > doesn't want it in Python. In that post he outlined 4 different reasons why > TCO shouldn't be implemented in Python. > > But then I thought, maybe you could do something smarter than eliminating > individual stack frames. Maybe we could create something that is to the > current implementation of the stack what `xrange` is to the old-style > `range`. A smart object that allows access to any of a long list of items > in it, without actually having to store those items.
This would solve the > first argument that Guido raises in his post, which I found to be the most > substantial one. > > What I'm saying is: Imagine the stack of the interpreter when it runs the > factorial example above for n=1000. It has around 1000 items in it and it's > just about to explode. But then, if you'd look at the contents of that > stack, you'd see it's embarrassingly regular, a compression algorithm's > wet dream. It's just the same code location over and over again, with a > different value for `n`. > > So what I'm suggesting is an algorithm to compress that stack on the fly. > An algorithm that would detect regularities in the stack and instead of > saving each individual frame, save just the pattern. Then, there wouldn't > be any problem with showing informative stack trace: Despite not storing > every individual frame, each individual frame could still be *accessed*, > similarly to how `xrange` allow access to each individual member without > having to store each of them. > > Then, the stack could store a lot more items, and tasks that currently > require recursion (like pickling using the standard library) will be able > to handle much deeper recursions. > > What do you think? > > > Ram. > -------------- next part -------------- An HTML attachment was scrubbed... 
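Ram's two factorial variants make the stack pressure he describes easy to see on a stock CPython 3 (default recursion limit around 1000; in Python 2 the same overflow surfaced as a RuntimeError):

```python
import sys

def factorial_recursive(n):
    if n == 1:
        return 1
    return n * factorial_recursive(n - 1)

def factorial_loop(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

# The loop version is limited only by memory for the huge integer...
assert factorial_loop(5000) > 0

# ...while the recursive version exhausts the frame stack long before that.
try:
    factorial_recursive(5000)
except RecursionError:
    print("recursion limit hit; sys.getrecursionlimit() =", sys.getrecursionlimit())
```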
URL: From anikom15 at gmail.com Thu Sep 12 06:12:43 2013 From: anikom15 at gmail.com (=?UTF-8?Q?Westley_Mart=C3=ADnez?=) Date: Wed, 11 Sep 2013 21:12:43 -0700 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> Message-ID: <002a01ceaf6e$55629330$0027b990$@gmail.com> -----Original Message----- From: Westley Martínez [mailto:anikom15 at gmail.com] Sent: Wednesday, September 11, 2013 9:03 PM To: 'Ram Rachum'; 'python-ideas at googlegroups.com' Cc: 'Ram Rachum' Subject: RE: [Python-ideas] Idea: Compressing the stack on the fly > -----Original Message----- > From: Python-ideas [mailto:python-ideas- > bounces+anikom15=gmail.com at python.org] On Behalf Of Ram Rachum > Sent: Sunday, May 26, 2013 5:00 AM > To: python-ideas at googlegroups.com > Cc: Ram Rachum > Subject: [Python-ideas] Idea: Compressing the stack on the fly > > So what I'm suggesting is an algorithm to compress that stack on the > fly. An algorithm that would detect regularities in the stack and > instead of saving each individual frame, save just the pattern. Then, > there wouldn't be any problem with showing informative stack trace: > Despite not storing every individual frame, each individual frame > could still be accessed, similarly to how `xrange` allow access to > each individual member without having to store each of them. > > > Then, the stack could store a lot more items, and tasks that currently > require recursion (like pickling using the standard library) will be > able to handle much deeper recursions. > > > What do you think? I think this is an interesting idea. It sounds possible, but the question is whether or not it can be efficiently done with Python. I'd heed Guido's advice in first implementing this. It could probably be done effectively with a compiled language like C, but I'd imagine it'd be too difficult for Python. The other question is usability. What would this actually be used for?
I'm not a fan of recursion. I think anything that uses recursion could be restructured into something simpler. A lot of people find recursion to be elegant. For me it just hurts my brain. From joshua at landau.ws Thu Sep 12 06:29:12 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 12 Sep 2013 05:29:12 +0100 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: <002a01ceaf6e$55629330$0027b990$@gmail.com> References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> Message-ID: Does anyone actually write recursive Python code where the recursion is a significant bottleneck? The only such code I can think of is either for a tree, in which case stack depth is irrelevant, or bad code. Why would anyone care, basically? From clay.sweetser at gmail.com Thu Sep 12 06:46:42 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Thu, 12 Sep 2013 00:46:42 -0400 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> Message-ID: This sounds like something that the PyPy team might be interested in. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Sep 12 07:12:23 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 11 Sep 2013 22:12:23 -0700 (PDT) Subject: [Python-ideas] AST Hash In-Reply-To: References: Message-ID: <1378962743.33918.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: anatoly techtonik Sent: Wednesday, September 11, 2013 9:05 AM > We need a checksum for code pieces. The goal of the checksum is to > reliably detect pieces of code with absolutely identical behaviour. > Borders of such checksum can be functions, classes, modules,. Cool idea. But why not also fragments?
A single expression, statement, or suite could be useful in many contexts without having to artificially wrap it in a function, couldn't it? > Practical application for such checksums are: > > - detecting usage of recipes and examples across PyPI packages > - detecting usage of standard stdlib calls > - creating execution safe serialization formats for data > - choosing class to deserialize data fields of the object based on its hash > - enable consistent validation and testing of results across various AST tools > > There can be two approaches to build such checksum: > 1. Code Section Hash > 2. AST Hash I'm not sure either of these is right. You want to treat two functions as equal if they have renamed locals:

    def f1():
        i = 0
        return i

    def f2():
        j = 0
        return j

But I'm guessing you _don't_ want to treat them as equal if they reference different globals:

    i, j = 0, 1

    def f1():
        return i

    def f2():
        return j

If it's not obvious why you want those two to be different, consider this identical case:

    def f1():
        return os.open('foo')

    def f2():
        return zipfile.open('foo')

But the difference in the ASTs in this case looks identical to the difference in the local-renaming case (every Name node with id i/os has it changed to j/zipfile). Unless you implement the exact same logic as the compiler to distinguish local and global names, there's no way to allow local renaming but not global renaming. And it gets even worse if you consider closures; two functions could have identical ASTs but different meanings, and even applying the compiler's name-distinguishing logic doesn't help, unless you also look at the context around the definition. But there's an obvious answer here that takes care of all of this: Just hash the compiled code objects.
I'm not sure _exactly_ which attributes you want, but something like co_code, co_flags, co_consts, co_names, co_freevars, and maybe the ones related to the parameters. I don't think you need anything from the function, class, or module that owns the code. (Yes, two functions with identical __code__ (including co_names and co_freevars) but different __globals__ or __closure__ will act differently, but I don't think there's any reasonable rule you could apply except either (a) ignore them, or (b) raise an exception if f.__globals__ != globals() or f.__closure__. Besides, the functions will also act differently if you just change the values of global variables. So I think just ignore them.) Anyway, all of these attributes are easy to hash: one's a bytes, one's a fixed-size int, and the rest are tuples of strings. There are some obvious downsides to this, but I don't think any of them are too serious. For example, you can't hash anything that doesn't both parse and compile, while you can build an AST from code that just parses. But practically, there aren't too many good examples of things that parse but won't compile; if you really want to be able to hash invalid code, you really need to stick with source. But I'm sure someone will come up with something big and obvious that I'm missing. From abarnert at yahoo.com Thu Sep 12 07:15:15 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 11 Sep 2013 22:15:15 -0700 (PDT) Subject: [Python-ideas] AST Hash In-Reply-To: <1378962743.33918.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <1378962743.33918.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <1378962915.56806.YahooMailNeo@web184706.mail.ne1.yahoo.com> From: Andrew Barnert Sent: Wednesday, September 11, 2013 10:12 PM > But there's an obvious answer here that takes care of all of this: Just has > the compiled code objects. ... > There are some obvious downsides to this, but I don't think any of them are > too serious. ...
> But I'm sure someone will come up with something big and obvious that > I'm missing. Actually, I just thought of one. Let's say mod.py looks like this:

    def f():
        def g():
            pass

How do we get the code for the g function? With an AST, it's obvious: you may not get sufficient/correct context, but at least you can get to it. With a compiled module, whether we explicitly compile mod.py or import it or whatever... well, there's a code object for g compiled in there, and it even knows that it's part of module mod, and from line 1 of mod.py, and so on, but you can't access it as mod.g, or anything else obvious, without picking through the compiled code format.
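Andrew's two points can be sketched together: hash a code object's behaviour-relevant attributes (the attribute selection and the `code_hash` helper below are illustrative choices, not a settled design), recursing into `co_consts` both to avoid the address-bearing `repr` of nested code objects and to reach a nested function like `g`:

```python
import hashlib

def code_hash(code):
    """Hash the behaviour-relevant parts of a code object (illustrative set)."""
    h = hashlib.sha256()
    h.update(code.co_code)
    h.update(repr(code.co_names).encode())     # globals/attributes referenced
    h.update(repr(code.co_freevars).encode())  # closure variables
    for const in code.co_consts:
        if hasattr(const, "co_code"):          # nested function/class body
            h.update(code_hash(const).encode())
        else:
            h.update(repr(const).encode())
    return h.hexdigest()

def f1():
    i = 0
    return i

def f2():
    j = 0
    return j

def g1():
    return i   # global reference: recorded in co_names

def g2():
    return j

# Renamed locals leave co_code and co_names untouched; renamed globals don't.
assert code_hash(f1.__code__) == code_hash(f2.__code__)
assert code_hash(g1.__code__) != code_hash(g2.__code__)

# Nested code objects are reachable by walking co_consts of the module code.
module_code = compile("def f():\n    def g():\n        pass\n", "mod.py", "exec")
f_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
g_code = next(c for c in f_code.co_consts if hasattr(c, "co_code"))
print(g_code.co_name)  # 'g'
```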
----- Original Message ----- > From: Westley Martínez > To: python-ideas at python.org > Cc: > Sent: Wednesday, September 11, 2013 9:12 PM > Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly > > > > -----Original Message----- > From: Westley Martínez [mailto:anikom15 at gmail.com] > Sent: Wednesday, September 11, 2013 9:03 PM > To: 'Ram Rachum'; 'python-ideas at googlegroups.com' > Cc: 'Ram Rachum' > Subject: RE: [Python-ideas] Idea: Compressing the stack on the fly > >> -----Original Message----- >> From: Python-ideas [mailto:python-ideas- >> bounces+anikom15=gmail.com at python.org] On Behalf Of Ram Rachum >> Sent: Sunday, May 26, 2013 5:00 AM >> To: python-ideas at googlegroups.com >> Cc: Ram Rachum >> Subject: [Python-ideas] Idea: Compressing the stack on the fly >> >> So what I'm suggesting is an algorithm to compress that stack on the >> fly. An algorithm that would detect regularities in the stack and >> instead of saving each individual frame, save just the pattern. Then, >> there wouldn't be any problem with showing informative stack trace: >> Despite not storing every individual frame, each individual frame >> could still be accessed, similarly to how `xrange` allow access to >> each individual member without having to store each of them. >> >> >> Then, the stack could store a lot more items, and tasks that currently >> require recursion (like pickling using the standard library) will be >> able to handle much deeper recursions. >> >> >> What do you think? > > I think this is an interesting idea. It sounds possible, but the > question is whether or not it can be efficiently done with Python. > > I'd heed Guido's advice in first implementing this. It could probably > be done effectively with a compiled language like C, but I'd imagine > it'd be too difficult for Python. > > The other question is usability. What would this actually be used for? > I'm not a fan of recursion.
I think anything that uses recursion could > be restructured into something simpler.? A lot of people find recursion > to be elegant.? For me it just hurts my brain. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > From mal at egenix.com Thu Sep 12 08:59:40 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 12 Sep 2013 08:59:40 +0200 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> Message-ID: <5231665C.207@egenix.com> On 12.09.2013 06:29, Joshua Landau wrote: > Does anyone actually write recursive Python code where the recursion > in a significant bottleneck? The only such code I can think of is > either for a tree, in which case stack depth is irrelevant, or bad > code. Any kind of backtracking algorithm will need recursion or a separate stack data structure to keep track of the various decisions made up to a certain point on the path. The C stack is rather limited in size, so a recursive parser can easily blow up if it uses the C stack alone for managing backtracking. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-04: Released eGenix pyOpenSSL 0.13.2 ... http://egenix.com/go48 2013-09-20: PyCon UK 2013, Coventry, UK ... 8 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From joshua at landau.ws Thu Sep 12 09:03:08 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 12 Sep 2013 08:03:08 +0100 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: <5231665C.207@egenix.com> References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> <5231665C.207@egenix.com> Message-ID: On 12 September 2013 07:59, M.-A. Lemburg wrote: > On 12.09.2013 06:29, Joshua Landau wrote: >> Does anyone actually write recursive Python code where the recursion >> in a significant bottleneck? The only such code I can think of is >> either for a tree, in which case stack depth is irrelevant, or bad >> code. > > Any kind of backtracking algorithm will need recursion or a separate > stack data structure to keep track of the various decisions made > up to a certain point on the path. > > The C stack is rather limited in size, so a recursive parser can > easily blow up if it uses the C stack alone for managing > backtracking. What sort of algorithm would backtrack that many times? I doubt a parser would and I can't think of anything worse ATM. From rosuav at gmail.com Thu Sep 12 10:21:05 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 12 Sep 2013 18:21:05 +1000 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> <5231665C.207@egenix.com> Message-ID: On Thu, Sep 12, 2013 at 5:03 PM, Joshua Landau wrote: > On 12 September 2013 07:59, M.-A. Lemburg wrote: >> On 12.09.2013 06:29, Joshua Landau wrote: >>> Does anyone actually write recursive Python code where the recursion >>> in a significant bottleneck? The only such code I can think of is >>> either for a tree, in which case stack depth is irrelevant, or bad >>> code. 
>> >> Any kind of backtracking algorithm will need recursion or a separate >> stack data structure to keep track of the various decisions made >> up to a certain point on the path. >> >> The C stack is rather limited in size, so a recursive parser can >> easily blow up if it uses the C stack alone for managing >> backtracking. > > What sort of algorithm would backtrack that many times? I doubt a > parser would and I can't think of anything worse ATM. Solve chess. ChrisA From mal at egenix.com Thu Sep 12 10:34:42 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 12 Sep 2013 10:34:42 +0200 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> <5231665C.207@egenix.com> Message-ID: <52317CA2.50206@egenix.com> On 12.09.2013 09:03, Joshua Landau wrote: > On 12 September 2013 07:59, M.-A. Lemburg wrote: >> On 12.09.2013 06:29, Joshua Landau wrote: >>> Does anyone actually write recursive Python code where the recursion >>> in a significant bottleneck? The only such code I can think of is >>> either for a tree, in which case stack depth is irrelevant, or bad >>> code. >> >> Any kind of backtracking algorithm will need recursion or a separate >> stack data structure to keep track of the various decisions made >> up to a certain point on the path. >> >> The C stack is rather limited in size, so a recursive parser can >> easily blow up if it uses the C stack alone for managing >> backtracking. > > What sort of algorithm would backtrack that many times? I doubt a > parser would and I can't think of anything worse ATM. Oh, that's easy. It just depends on the given data set that you're working on and how often you have to branch when working on it. http://en.wikipedia.org/wiki/Backtracking lists a few problems. 
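A small sketch of the "separate stack data structure" alternative: a backtracking subset-sum search (the problem is chosen only for illustration) that keeps its own list of pending branches, so it can backtrack arbitrarily often without deepening the C or Python call stack:

```python
def subset_sum(numbers, target):
    """Find a subset of non-negative numbers summing to target, iteratively."""
    # Each stack entry is (next index to consider, choices made so far).
    stack = [(0, [])]
    while stack:
        i, chosen = stack.pop()
        total = sum(chosen)
        if total == target:
            return chosen
        if i == len(numbers) or total > target:
            continue  # dead end: backtracking is just popping the next entry
        stack.append((i + 1, chosen))                 # branch: skip numbers[i]
        stack.append((i + 1, chosen + [numbers[i]]))  # branch: take numbers[i]
    return None

print(subset_sum([3, 34, 4, 12, 5, 2], 9))  # prints a subset summing to 9
```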
Here's a regular expression example that would blow the stack, if the re module were still using it (it was fixed in 2003 so that it no longer does):

re.match('(.*a|.*b|x)+', 'x' * 100000)

The expression still uses exponential time, though.

With Python 2.3, you see the stack limit error:

Python 2.3.5 (#1, Aug 24 2011, 15:52:42)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.match('(.*a|.*b|x)+', 'x' * 100000)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/local/python-2.3-ucs2/lib/python2.3/sre.py", line 132, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 12 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-09-11: Released eGenix PyRun 1.3.0 ...       http://egenix.com/go49
2013-09-04: Released eGenix pyOpenSSL 0.13.2 ...  http://egenix.com/go48
2013-09-20: PyCon UK 2013, Coventry, UK ...                8 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From joshua at landau.ws Thu Sep 12 11:11:00 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 12 Sep 2013 10:11:00 +0100 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> <5231665C.207@egenix.com> Message-ID: On 12 September 2013 09:21, Chris Angelico wrote: > On Thu, Sep 12, 2013 at 5:03 PM, Joshua Landau wrote: >> On 12 September 2013 07:59, M.-A. Lemburg wrote: >>> On 12.09.2013 06:29, Joshua Landau wrote: >>>> Does anyone actually write recursive Python code where the recursion >>>> in a significant bottleneck? The only such code I can think of is >>>> either for a tree, in which case stack depth is irrelevant, or bad >>>> code. >>> >>> Any kind of backtracking algorithm will need recursion or a separate >>> stack data structure to keep track of the various decisions made >>> up to a certain point on the path. >>> >>> The C stack is rather limited in size, so a recursive parser can >>> easily blow up if it uses the C stack alone for managing >>> backtracking. >> >> What sort of algorithm would backtrack that many times? I doubt a >> parser would and I can't think of anything worse ATM. > > Solve chess. If you're managing to simulate more than 1000 moves ahead either you're doing depth first or you've got a *blisteringly* fast computer. From joshua at landau.ws Thu Sep 12 11:18:48 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 12 Sep 2013 10:18:48 +0100 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: <52317CA2.50206@egenix.com> References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> <5231665C.207@egenix.com> <52317CA2.50206@egenix.com> Message-ID: On 12 September 2013 09:34, M.-A. 
Lemburg wrote: > On 12.09.2013 09:03, Joshua Landau wrote: >> On 12 September 2013 07:59, M.-A. Lemburg wrote: >>> On 12.09.2013 06:29, Joshua Landau wrote: >>>> Does anyone actually write recursive Python code where the recursion >>>> in a significant bottleneck? The only such code I can think of is >>>> either for a tree, in which case stack depth is irrelevant, or bad >>>> code. >>> >>> Any kind of backtracking algorithm will need recursion or a separate >>> stack data structure to keep track of the various decisions made >>> up to a certain point on the path. >>> >>> The C stack is rather limited in size, so a recursive parser can >>> easily blow up if it uses the C stack alone for managing >>> backtracking. >> >> What sort of algorithm would backtrack that many times? I doubt a >> parser would and I can't think of anything worse ATM. > > Oh, that's easy. It just depends on the given data set that you're > working on and how often you have to branch when working on it. > > http://en.wikipedia.org/wiki/Backtracking lists a few problems. > > Here's a regular expression example that would blow the stack, > if the re module were still using it (it was fixed in 2003 to > no longer do): > > re.match('(.*a|.*b|x)+', 'x' * 100000) > > The expression still uses exponential time, though. Ah, Regex. 'Could'a guessed. I'll file that under "bad code". *wink* From oscar.j.benjamin at gmail.com Thu Sep 12 11:57:18 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 12 Sep 2013 10:57:18 +0100 Subject: [Python-ideas] FW: Idea: Compressing the stack on the fly In-Reply-To: References: <3daafc97-115a-4525-907c-fdbf356fd749@googlegroups.com> <002a01ceaf6e$55629330$0027b990$@gmail.com> Message-ID: On 12 September 2013 05:29, Joshua Landau wrote: > Does anyone actually write recursive Python code where the recursion > in a significant bottleneck? The only such code I can think of is > either for a tree, in which case stack depth is irrelevant, or bad > code. 
> > Why would anyone care, basically? I think you're asking this question the wrong way. Recursion isn't a bottleneck that slows down your program. When you hit the recursion limit your program just blows up. Since Python doesn't have the optimisations that make any particular kind of recursion scale well people do not generally use it unless they know that the depth is small enough. Currently code that is susceptible to hitting the recursion limit is "bad code" because it depends on optimisations that don't exist. However, if the optimisations did exist then people could choose to take advantage of them. As an example, I once implemented Tarjan's algorithm in Python using the recursive form shown here: http://en.wikipedia.org/wiki/Tarjan's_strongly_connected_components_algorithm#The_algorithm_in_pseudocode After implementing it and confirming that it worked I immediately found that it hit the recursion limit in my real problem. So I reimplemented it without the recursion. Had there been optimisations that would have made the reimplementation unnecessary I would have happily stuck with the first form since it was easier to understand than the explicit stack of iterators version that I ended up with. For the same reasons you won't see much code out there where recursion is a bottleneck unless, as you say, it is "bad code". Oscar From oscar.j.benjamin at gmail.com Thu Sep 12 12:04:29 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 12 Sep 2013 11:04:29 +0100 Subject: [Python-ideas] Replace option set/get methods through the standard library with a ChainMap; add a context manager to ChainMap In-Reply-To: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com> References: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com> Message-ID: On 11 September 2013 23:18, Neil Girdhar wrote: > > With numpy print options, for example, the usual pattern is to save some of > the print options, set some of them, and then restore the old options. 
> Why not expose the options as a ChainMap called numpy.printoptions?
> ChainMap could then expose a context manager that pushes a new dictionary
> on entry and pops it on exit via, say, child_context that accepts a
> dictionary. Now, instead of:
>
>     saved_precision = np.get_printoptions()['precision']
>     np.set_printoptions(precision=23)
>     do_something()
>     np.set_printoptions(precision=saved_precision)
>
> You can do the same with a context manager, which I think is stylistically
> better (as it's impossible to forget to reset the option, and no explicit
> temporary invades the local variables):
>
>     with np.printoptions.child_context({'precision': 23}):
>         do_something()

You can write this yourself if you like (untested):

    from contextlib import contextmanager

    @contextmanager
    def print_options(**opts):
        oldopts = np.get_printoptions()
        newopts = oldopts.copy()
        newopts.update(opts)
        try:
            np.set_printoptions(**newopts)
            yield
        finally:
            np.set_printoptions(**oldopts)

    with print_options(precision=23):
        do_something()

Generally speaking numpy doesn't use context managers much. You may be right that it should use them more but this isn't the right place to make that suggestion since numpy is not part of core Python or of the standard library. I suggest that you ask this on the scipy-users mailing list.

Oscar

From oscar.j.benjamin at gmail.com Thu Sep 12 12:12:27 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 12 Sep 2013 11:12:27 +0100
Subject: [Python-ideas] Replace option set/get methods through the standard library with a ChainMap; add a context manager to ChainMap
In-Reply-To: References: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com>
Message-ID:

On 12 September 2013 11:06, Neil Girdhar wrote:
>
> Exactly. I just assumed when I wrote my comment that this was a general
> problem in the standard library, but as was pointed out to me, it seems to
My suggestion is for printoptions to be > implemented as a ChainMap to facilitate things on the numpy end of things as > well. Your suggestion doesn't seem unreasonable to me. However you're asking on the wrong mailing list: http://www.scipy.org/scipylib/mailing-lists.html Oscar From mistersheik at gmail.com Thu Sep 12 12:14:54 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 12 Sep 2013 06:14:54 -0400 Subject: [Python-ideas] Replace option set/get methods through the standard library with a ChainMap; add a context manager to ChainMap In-Reply-To: References: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com> Message-ID: Thank you. I will ask there about adding numpy context managers. However, the extra member function to ChainMap to use it as a context manager would be a question for this mailing list, right? Best, Neil On Thu, Sep 12, 2013 at 6:12 AM, Oscar Benjamin wrote: > On -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Thu Sep 12 12:24:05 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 12 Sep 2013 11:24:05 +0100 Subject: [Python-ideas] Replace option set/get methods through the standard library with a ChainMap; add a context manager to ChainMap In-Reply-To: References: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com> Message-ID: On 12 September 2013 11:14, Neil Girdhar wrote: > > Thank you. I will ask there about adding numpy context managers. However, > the extra member function to ChainMap to use it as a context manager would > be a question for this mailing list, right? Perhaps you could spell out that part of the idea in more detail then. Why in particular would it need to be a ChainMap and not a regular dict? Does the method return a new ChainMap instance? What would be seen by other code that holds references to the same ChainMap? 
Oscar From g.rodola at gmail.com Thu Sep 12 20:59:46 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 12 Sep 2013 20:59:46 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count Message-ID: This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427 I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical). Given that multiple processes cannot take any advantage of hyper threading technology then maybe it makes sense for multiprocessing to expose a physical_cpu_count() function in order to preemptively figure out how many processes to spawn. Same thing is discussed here: https://groups.google.com/forum/#!msg/nzpug/_5sFW9BEMQ4/Y4laXRNlXkMJ Thoughts? --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Sep 12 21:10:04 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 12 Sep 2013 21:10:04 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count References: Message-ID: <20130912211004.74746942@fsol> On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" wrote: > This is a follow up of a feature request which recently appeared on psutil > bug tracker: > https://code.google.com/p/psutil/issues/detail?id=427 > > I don't know whether the proposal makes sense for psutil per-se but it > certainly made me think about multiprocessing.cpu_count() and the fact that > it currently returns the number of virtual CPUs (physical + logical). > > Given that multiple processes cannot take any advantage of hyper threading > technology Of course they can. 
The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones. Regards Antoine. From g.rodola at gmail.com Thu Sep 12 21:26:07 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 12 Sep 2013 21:26:07 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: <20130912211004.74746942@fsol> References: <20130912211004.74746942@fsol> Message-ID: On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou wrote: > On Thu, 12 Sep 2013 20:59:46 +0200 > "Giampaolo Rodola'" > wrote: > > This is a follow up of a feature request which recently appeared on > psutil > > bug tracker: > > https://code.google.com/p/psutil/issues/detail?id=427 > > > > I don't know whether the proposal makes sense for psutil per-se but it > > certainly made me think about multiprocessing.cpu_count() and the fact > that > > it currently returns the number of virtual CPUs (physical + logical). > > > > Given that multiple processes cannot take any advantage of hyper > threading > > technology > > Of course they can. The CPU doesn't distinguish between different > kinds of "threads", they can either belong to the same process or to > different ones. Of course you're right, I'm sorry. I should have phrased my statement more carefully before sending the email. Then the question is whether having physical CPU cores count can be useful. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shibturn at gmail.com Thu Sep 12 21:27:41 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Thu, 12 Sep 2013 20:27:41 +0100 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: References: Message-ID: On 12/09/2013 7:59pm, Giampaolo Rodola' wrote: > Given that multiple processes cannot take any advantage of hyper > threading technology then maybe it makes sense for multiprocessing to > expose a physical_cpu_count() function in order to preemptively figure > out how many processes to spawn. Do you have a reference? Wikipedia may not be reliable, but it seems to think otherwise: Hyper-threading works by duplicating certain sections of the processor -- those that store the architectural state -- but not duplicating the main execution resources. This allows a hyper-threading processor to appear as the usual "physical" processor and an extra "logical" processor to the host operating system (HTT-unaware operating systems see two "physical" processors), allowing the operating system to schedule two threads or processes simultaneously and appropriately.
^^^^^^^^^ -- Richard From solipsis at pitrou.net Thu Sep 12 21:32:51 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 12 Sep 2013 21:32:51 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count References: <20130912211004.74746942@fsol> Message-ID: <20130912213251.3d12b714@fsol> On Thu, 12 Sep 2013 21:26:07 +0200 "Giampaolo Rodola'" wrote: > On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou wrote: > > > On Thu, 12 Sep 2013 20:59:46 +0200 > > "Giampaolo Rodola'" > > wrote: > > > This is a follow up of a feature request which recently appeared on > > psutil > > > bug tracker: > > > https://code.google.com/p/psutil/issues/detail?id=427 > > > > > > I don't know whether the proposal makes sense for psutil per-se but it > > > certainly made me think about multiprocessing.cpu_count() and the fact > > that > > > it currently returns the number of virtual CPUs (physical + logical). > > > > > > Given that multiple processes cannot take any advantage of hyper > > threading > > > technology > > > > Of course they can. The CPU doesn't distinguish between different > > kinds of "threads", they can either belong to the same process or to > > different ones. > > > Of course you're right, I'm sorry. I should have phrased my statement more > carefully before sending the email. > Then the question is whether having physical CPU cores count can be useful. I suppose it doesn't hurt :-) I don't think it belongs specifically in multiprocessing, though. Perhaps in the platform module? (unless you want to contribute psutil to the stdlib?) Regards Antoine. 
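For concreteness, here is a rough sketch (not from the thread) of what such a physical-core count could do on Linux, counting unique (physical id, core id) pairs in /proc/cpuinfo; the fields are Linux-specific and the parsing here is a simplification, so real code should prefer psutil:

```python
# Sketch: count physical cores on Linux by parsing /proc/cpuinfo,
# falling back to the logical count everywhere else.
import multiprocessing

def physical_cpu_count():
    cores = set()
    try:
        with open("/proc/cpuinfo") as f:
            phys = core = None
            for line in f:
                if line.startswith("physical id"):
                    phys = line.split(":", 1)[1].strip()
                elif line.startswith("core id"):
                    core = line.split(":", 1)[1].strip()
                elif not line.strip():  # blank line ends a processor entry
                    if phys is not None and core is not None:
                        cores.add((phys, core))
                    phys = core = None
            if phys is not None and core is not None:
                cores.add((phys, core))  # flush a trailing entry
    except OSError:
        pass
    if cores:
        return len(cores)
    return multiprocessing.cpu_count()  # fall back to logical CPUs
```

On a hyper-threaded machine this returns half of multiprocessing.cpu_count(); on platforms without those /proc fields it simply falls back to the logical count.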
From g.rodola at gmail.com Thu Sep 12 21:35:13 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 12 Sep 2013 21:35:13 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: References: Message-ID: On Thu, Sep 12, 2013 at 9:27 PM, Richard Oudkerk wrote: > On 12/09/2013 7:59pm, Giampaolo Rodola' wrote: > >> Given that multiple processes cannot take any advantage of hyper >> threading technology then maybe it makes sense for multiprocessing to >> expose a physical_cpu_count() function in order to preemptively figure >> out how many processes to spawn. >> > > Do you have a reference? Wikipedia may not be reliable, but it seems to > think otherwise: > > Hyper-threading works by duplicating certain sections of the processor -- > those that store the architectural state -- but not duplicating the main > execution resources. This allows a hyper-threading processor to appear > as the usual "physical" processor and an extra "logical" processor to > the host operating system (HTT-unaware operating systems see two > "physical" processors), allowing the operating system to schedule two > threads or processes simultaneously and appropriately. > ^^^^^^^^^ > No, I was wrong. Please ignore that statement. I got confused by the name "hyper-threading" and erroneously thought it only affected threads. =) --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ckaynor at zindagigames.com Thu Sep 12 21:20:08 2013 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Thu, 12 Sep 2013 12:20:08 -0700 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: <20130912211004.74746942@fsol> References: <20130912211004.74746942@fsol> Message-ID: On Thu, Sep 12, 2013 at 12:10 PM, Antoine Pitrou wrote: > On Thu, 12 Sep 2013 20:59:46 +0200 > "Giampaolo Rodola'" > wrote: > > This is a follow up of a feature request which recently appeared on > psutil > > bug tracker: > > https://code.google.com/p/psutil/issues/detail?id=427 > > > > I don't know whether the proposal makes sense for psutil per-se but it > > certainly made me think about multiprocessing.cpu_count() and the fact > that > > it currently returns the number of virtual CPUs (physical + logical). > > > > Given that multiple processes cannot take any advantage of hyper > threading > > technology > > Of course they can. The CPU doesn't distinguish between different > kinds of "threads", they can either belong to the same process or to > different ones. > > Regards > > Antoine. > Antoine's claim is backed by a document written by Intel: http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/. Specifically, in the section "Software Use of Intel HT Technology". -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.rodola at gmail.com Thu Sep 12 21:51:17 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 12 Sep 2013 21:51:17 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: <20130912213251.3d12b714@fsol> References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> Message-ID: On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou wrote: > On Thu, 12 Sep 2013 21:26:07 +0200 > "Giampaolo Rodola'" > wrote: > > On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou > wrote: > > > > > On Thu, 12 Sep 2013 20:59:46 +0200 > > > "Giampaolo Rodola'" > > > wrote: > > > > This is a follow up of a feature request which recently appeared on > > > psutil > > > > bug tracker: > > > > https://code.google.com/p/psutil/issues/detail?id=427 > > > > > > > > I don't know whether the proposal makes sense for psutil per-se but > it > > > > certainly made me think about multiprocessing.cpu_count() and the > fact > > > that > > > > it currently returns the number of virtual CPUs (physical + logical). > > > > > > > > Given that multiple processes cannot take any advantage of hyper > > > threading > > > > technology > > > > > > Of course they can. The CPU doesn't distinguish between different > > > kinds of "threads", they can either belong to the same process or to > > > different ones. > > > > > > Of course you're right, I'm sorry. I should have phrased my statement > more > > carefully before sending the email. > > Then the question is whether having physical CPU cores count can be > useful. > > I suppose it doesn't hurt :-) I don't think it belongs specifically in > multiprocessing, though. Perhaps in the platform module? > I'd be +0.5 for multiprocessing because: - cpu_count() is already there - physical_cpu_count() will likely be used by multiprocessing users only ...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app. 
> (unless you want to contribute psutil to the stdlib?) That's something I'd be happy to do if there's general approval but I guess that's for another thread. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian at python.org Thu Sep 12 22:19:45 2013 From: christian at python.org (Christian Heimes) Date: Thu, 12 Sep 2013 22:19:45 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> Message-ID: Am 12.09.2013 21:51, schrieb Giampaolo Rodola': > I'd be +0.5 for multiprocessing because: > > - cpu_count() is already there > - physical_cpu_count() will likely be used by multiprocessing users only > > ...but my main concern was first figuring out whether it might actually > make sense to distinguish between virtual and physical CPUs in a real > world app. I would go one step further and expose the topology of the CPUs. It's much, much more complicated than just physical and logical CPUs. For example with Intel CPUs, two hyper-threading units have different registers but share the same L1 and L2 cache. All CPU core inside a physical processor share a common L3 cache. Multiple processor on machines with several processor slots have to communicate through QPI (QuickPath Interconnect). ccNUMA (cache coherent non-uniform memory access) ensures that memory barriers syncs these caches when a process uses multiple processors. Every processor has its own memory banks so 'remote' memory is more expensive to access. Other processors have a different internal structure. Some aren't ccNUMA ... 
Christian From victor.stinner at gmail.com Thu Sep 12 23:03:30 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Sep 2013 23:03:30 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> Message-ID: Python 3.4 has os.cpu_count(). Victor On 12 Sep 2013 21:52, "Giampaolo Rodola'" wrote: > > On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou wrote: >> >> On Thu, 12 Sep 2013 21:26:07 +0200 >> "Giampaolo Rodola'" >> wrote: >> > On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou wrote: >> > >> > > On Thu, 12 Sep 2013 20:59:46 +0200 >> > > "Giampaolo Rodola'" >> > > wrote: >> > > > This is a follow up of a feature request which recently appeared on >> > > psutil >> > > > bug tracker: >> > > > https://code.google.com/p/psutil/issues/detail?id=427 >> > > > >> > > > I don't know whether the proposal makes sense for psutil per-se but it >> > > > certainly made me think about multiprocessing.cpu_count() and the fact >> > > that >> > > > it currently returns the number of virtual CPUs (physical + logical). >> > > > >> > > > Given that multiple processes cannot take any advantage of hyper >> > > threading >> > > > technology >> > > >> > > Of course they can. The CPU doesn't distinguish between different >> > > kinds of "threads", they can either belong to the same process or to >> > > different ones. >> > >> > >> > Of course you're right, I'm sorry. I should have phrased my statement more >> > carefully before sending the email. >> > Then the question is whether having physical CPU cores count can be useful. >> >> I suppose it doesn't hurt :-) I don't think it belongs specifically in >> multiprocessing, though. Perhaps in the platform module?
> > > I'd be +0.5 for multiprocessing because: > > - cpu_count() is already there > - physical_cpu_count() will likely be used by multiprocessing users only > > ...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app. > >> >> (unless you want to contribute psutil to the stdlib?) > > > That's something I'd be happy to do if there's general approval but I guess that's for another thread. > > --- Giampaolo > https://code.google.com/p/pyftpdlib/ > https://code.google.com/p/psutil/ > https://code.google.com/p/pysendfile/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Sep 12 23:15:57 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 12 Sep 2013 23:15:57 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> Message-ID: <20130912231557.443806b3@fsol> On Thu, 12 Sep 2013 22:19:45 +0200 Christian Heimes wrote: > Am 12.09.2013 21:51, schrieb Giampaolo Rodola': > > I'd be +0.5 for multiprocessing because: > > > > - cpu_count() is already there > > - physical_cpu_count() will likely be used by multiprocessing users only > > > > ...but my main concern was first figuring out whether it might actually > > make sense to distinguish between virtual and physical CPUs in a real > > world app. > > I would go one step further and expose the topology of the CPUs. It's > much, much more complicated than just physical and logical CPUs. I'm not sure what the point would be. From the point of the view of an application programmer, the CPU topology is an almost esoteric detail. 
This would be appropriate for a third-party "system information" package, IMO (with memory speed, number of PCIe channels, cache associativity, etc.). Regards Antoine. From rymg19 at gmail.com Fri Sep 13 01:31:08 2013 From: rymg19 at gmail.com (Ryan) Date: Thu, 12 Sep 2013 18:31:08 -0500 Subject: [Python-ideas] AST Pretty Printer Message-ID: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> I always encounter one problem when dealing with Python ASTs: When I print it, it looks like Lisp(aka Lots of Irritated Superfluous Parenthesis). In short: it's a mess. My idea is an AST pretty printer built on ast.NodeVisitor. If anyone finds this interesting, I can probably have a prototype of the class between later today and sometime tomorrow. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Fri Sep 13 01:40:22 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 12 Sep 2013 16:40:22 -0700 Subject: [Python-ideas] AST Pretty Printer In-Reply-To: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> References: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> Message-ID: I would be interested in it; would have made developing macropy much easier if there was a way to nicely print large blobs of AST (i.e. nicer than ast.dump). On Thu, Sep 12, 2013 at 4:31 PM, Ryan wrote: > I always encounter one problem when dealing with Python ASTs: When I print > it, it looks like Lisp(aka Lots of Irritated Superfluous Parenthesis). In > short: it's a mess. > > My idea is an AST pretty printer built on ast.NodeVisitor. If anyone finds > this interesting, I can probably have a prototype of the class between > later today and sometime tomorrow. > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 13 02:01:46 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 12 Sep 2013 17:01:46 -0700 Subject: [Python-ideas] AST Pretty Printer In-Reply-To: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> References: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> Message-ID: <0759F11C-2B77-475D-9B2D-C71BD5A95582@yahoo.com> On Sep 12, 2013, at 16:31, Ryan wrote: > I always encounter one problem when dealing with Python ASTs: When I print it, it looks like Lisp(aka Lots of Irritated Superfluous Parenthesis). Why are the parentheses irritated? Have you been taunting them? :) > In short: it's a mess. > > My idea is an AST pretty printer built on ast.NodeVisitor. If anyone finds this interesting, I can probably have a prototype of the class between later today and sometime tomorrow. Yes please! I'll bet most people who play with ASTs want this, build something half-assed, never finish it, and lose it by the next time they look at ASTs again three years later... So if you finish something, that'll save effort for hundreds of people in the future (who have no idea they'll want it one day). 
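For anyone curious what such a pretty printer might look like, here is a minimal sketch (illustrative only, not Ryan's prototype; it uses plain recursion over ast.iter_fields rather than ast.NodeVisitor) that prints one indented field per line instead of ast.dump's single long s-expression:

```python
# Minimal AST pretty printer sketch: one field per indented line.
import ast

def pretty(node, indent=0):
    pad = "  " * indent
    if isinstance(node, ast.AST):
        lines = [pad + type(node).__name__]
        for name, value in ast.iter_fields(node):
            lines.append(pad + "  " + name + ":")
            lines.append(pretty(value, indent + 2))
        return "\n".join(lines)
    if isinstance(node, list):
        if not node:
            return pad + "[]"
        return "\n".join(pretty(item, indent) for item in node)
    return pad + repr(node)

print(pretty(ast.parse("x = 1 + 2")))
```

A fuller version would also track line numbers and elide empty fields, but even this short form makes a nested BinOp far easier to scan than the flat repr.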
From anikom15 at gmail.com Fri Sep 13 04:17:41 2013 From: anikom15 at gmail.com (=?iso-8859-1?Q?Westley_Mart=EDnez?=) Date: Thu, 12 Sep 2013 19:17:41 -0700 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: <20130912231557.443806b3@fsol> References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> <20130912231557.443806b3@fsol> Message-ID: <001801ceb027$6e811710$4b834530$@gmail.com> > From: Python-ideas [mailto:python-ideas- > bounces+anikom15=gmail.com at python.org] On Behalf Of Antoine Pitrou > Sent: Thursday, September 12, 2013 2:16 PM > To: python-ideas at python.org > Subject: Re: [Python-ideas] multiprocessing and physical CPU cores > count > > I'm not sure what the point would be. From the point of the view of an > application programmer, the CPU topology is an almost esoteric detail. > This would be appropriate for a third-party "system information" > package, IMO (with memory speed, number of PCIe channels, cache > associativity, etc.). > > Regards > > Antoine. Isn't the whole point of a high-level language to be able to not have to know about the hardware? From paultag at debian.org Fri Sep 13 04:44:14 2013 From: paultag at debian.org (Paul Tagliamonte) Date: Thu, 12 Sep 2013 22:44:14 -0400 Subject: [Python-ideas] AST Pretty Printer In-Reply-To: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> References: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> Message-ID: <20130913024414.GA18064@leliel> On Thu, Sep 12, 2013 at 06:31:08PM -0500, Ryan wrote: > I always encounter one problem when dealing with Python ASTs: When I print > it, it looks like Lisp(aka Lots of Irritated Superfluous Parenthesis). In > short: it's a mess. Bwahah; well, to each their own. As some might remember from PyCon this year[1], I actually wrote a lisp front-end (OK, not *really* lisp) to Python AST. 
Works pretty well (even smoothed out the 2.x and 3.x differences, so most code is valid between the two) https://github.com/hylang/hy Yes, it's hilarious. No, parens aren't ugly :) > My idea is an AST pretty printer built on ast.NodeVisitor. If anyone finds > this interesting, I can probably have a prototype of the class between > later today and sometime tomorrow. I'd enjoy such a thing! [1]: http://pyvideo.org/video/1853/friday-evening-lightning-talks http://hylang.org/ http://www.youtube.com/watch?v=ulekCWvDFVI -- .''`. Paul Tagliamonte : :' : Proud Debian Developer `. `'` 4096R / 8F04 9AD8 2C92 066C 7352 D28A 7B58 5B30 807C 2A87 `- http://people.debian.org/~paultag -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: Digital signature URL: From abarnert at yahoo.com Fri Sep 13 04:37:09 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 12 Sep 2013 19:37:09 -0700 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: <001801ceb027$6e811710$4b834530$@gmail.com> References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> <20130912231557.443806b3@fsol> <001801ceb027$6e811710$4b834530$@gmail.com> Message-ID: <63D98CDD-EFCD-4BD3-8ADE-589B8967822C@yahoo.com> On Sep 12, 2013, at 19:17, Westley Mart?nez wrote: >> From: Python-ideas [mailto:python-ideas- >> bounces+anikom15=gmail.com at python.org] On Behalf Of Antoine Pitrou >> Sent: Thursday, September 12, 2013 2:16 PM >> To: python-ideas at python.org >> Subject: Re: [Python-ideas] multiprocessing and physical CPU cores >> count >> >> I'm not sure what the point would be. From the point of the view of > an >> application programmer, the CPU topology is an almost esoteric detail. >> This would be appropriate for a third-party "system information" >> package, IMO (with memory speed, number of PCIe channels, cache >> associativity, etc.). >> >> Regards >> >> Antoine. 
> > Isn't the whole point of a high-level language to be able to not > have to know about the hardware? Most programmers won't care; they'll just use the default value for multiprocessing.Pool. But the implementation of multiprocessing, or any similar third-party module like pp, needs that information, so it can pick a good default value so the programmers don't have to. Also, very occasionally, you need to build a pool of processes manually. So if the module has the info, it might as well expose it. From mal at egenix.com Fri Sep 13 10:17:40 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 13 Sep 2013 10:17:40 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> Message-ID: <5232CA24.5070804@egenix.com> On 12.09.2013 21:51, Giampaolo Rodola' wrote: > On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou wrote: > >>> Then the question is whether having physical CPU cores count can be >> useful. >> >> I suppose it doesn't hurt :-) I don't think it belongs specifically in >> multiprocessing, though. Perhaps in the platform module? >> > > I'd be +0.5 for multiprocessing because: > > - cpu_count() is already there > - physical_cpu_count() will likely be used by multiprocessing users only > > ...but my main concern was first figuring out whether it might actually > make sense to distinguish between virtual and physical CPUs in a real world > app. I'm with Antoine here: both APIs would make more sense in the platform or os module. Victor mentioned that there already is an os.cpu_count() in Python 3.4, so perhaps add it there. Do you need C code for determining the physical count ? >> (unless you want to contribute psutil to the stdlib?) > > > That's something I'd be happy to do if there's general approval but I guess > that's for another thread. 
I'd love to see psutils in the stdlib, but also be warned: once the code lives in the stdlib, a) making changes is difficult and adding new features as well, b) you are bound by the Python release cycle. For a package such psutil, it may actually be better to keep it outside the stdlib, since the outside world changes regularly and doesn't adhere to the Python release cycle or feature for patch level releases ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 13 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-04: Released eGenix pyOpenSSL 0.13.2 ... http://egenix.com/go48 2013-09-20: PyCon UK 2013, Coventry, UK ... 7 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From g.rodola at gmail.com Fri Sep 13 14:54:31 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Fri, 13 Sep 2013 14:54:31 +0200 Subject: [Python-ideas] multiprocessing and physical CPU cores count In-Reply-To: <5232C94C.6090201@python.org> References: <20130912211004.74746942@fsol> <20130912213251.3d12b714@fsol> <5232C94C.6090201@python.org> Message-ID: On Fri, Sep 13, 2013 at 10:14 AM, M.-A. Lemburg wrote: > On 12.09.2013 21:51, Giampaolo Rodola' wrote: > > On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou > wrote: > > > >>> Then the question is whether having physical CPU cores count can be > >> useful. > >> > >> I suppose it doesn't hurt :-) I don't think it belongs specifically in > >> multiprocessing, though. Perhaps in the platform module? 
> >> > > > > I'd be +0.5 for multiprocessing because: > > > > - cpu_count() is already there > > - physical_cpu_count() will likely be used by multiprocessing users only > > > > ...but my main concern was first figuring out whether it might actually > > make sense to distinguish between virtual and physical CPUs in a real > world > > app. > > I'm with Antoine here: both APIs would make more sense in the > platform module. In the end it appears the os module would probably be better as cpu_count() already ended up there (http://bugs.python.org/issue17914) as pointed out by Victor a couple of emails ago. I have the impression no one is opposed so I can probably start working on a patch and submit it on the bug tracker. > Do you need C code for determining the physical count ? Yes, except on Linux where you'll just read /proc/cpuinfo. > >> (unless you want to contribute psutil to the stdlib?) > > > > > > That's something I'd be happy to do if there's general approval but I > guess > > that's for another thread. > > I'd love to see psutils in the stdlib, but also be warned: once > the code lives in the stdlib, > > a) making changes is difficult and adding new features as well, > > b) you are bound by the Python release cycle. > > For a package such psutil, it may actually be better to keep it > outside the stdlib, since the outside world changes regularly > and doesn't adhere to the Python release cycle or feature > for patch level releases ;-) > Yeah, you're probably right,and there's at least a couple of high priority functionalities I'd like to add first (to say one: dragonfly/open/net BSD support). --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rymg19 at gmail.com Fri Sep 13 18:14:19 2013 From: rymg19 at gmail.com (Ryan) Date: Fri, 13 Sep 2013 11:14:19 -0500 Subject: [Python-ideas] AST Pretty Printer In-Reply-To: <20130913024414.GA18064@leliel> References: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com> <20130913024414.GA18064@leliel> Message-ID: <1701a638-4ddc-492d-99ef-f7cc2042d7f4@email.android.com> Honestly, it wasn't really the parenthesis that made me hate Lisp; it was everything else. The lack of line numbers,... It made Perl look nice for a second there. Paul Tagliamonte wrote: >On Thu, Sep 12, 2013 at 06:31:08PM -0500, Ryan wrote: >> I always encounter one problem when dealing with Python ASTs: When >I print >> it, it looks like Lisp(aka Lots of Irritated Superfluous >Parenthesis). In >> short: it's a mess. > >Bwahah; well, to each their own. > >As some might remember from PyCon this year[1], I actually wrote a lisp >front-end (OK, not *really* lisp) to Python AST. Works pretty well >(even >smoothed out the 2.x and 3.x differences, so most code is valid between >the two) > > https://github.com/hylang/hy > >Yes, it's hilarious. No, parens aren't ugly :) > >> My idea is an AST pretty printer built on ast.NodeVisitor. If >anyone finds >> this interesting, I can probably have a prototype of the class >between >> later today and sometime tomorrow. > >I'd enjoy such a thing! > > >[1]: http://pyvideo.org/video/1853/friday-evening-lightning-talks > http://hylang.org/ > http://www.youtube.com/watch?v=ulekCWvDFVI > >-- > .''`. Paul Tagliamonte >: :' : Proud Debian Developer >`. `'` 4096R / 8F04 9AD8 2C92 066C 7352 D28A 7B58 5B30 807C 2A87 > `- http://people.debian.org/~paultag -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rymg19 at gmail.com Sat Sep 14 02:25:25 2013 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 13 Sep 2013 19:25:25 -0500 Subject: [Python-ideas] AST Pretty Printer Message-ID: Note: I didn't know who to reply to, so I just restarted the thread with the same subject. Here is the code:

class astpp(ast.NodeVisitor):
    def __init__(self, tree):
        super(ast.NodeVisitor, self).__init__()
        self.indent = 0
        self.visit(tree)

    def _print(self, text):
        print(' ' * self.indent + text)

    def generic_visit(self, node):
        self._print(node.__class__.__name__ + '(')
        self.indent += 1
        for name, item in node.__dict__.iteritems():
            if isinstance(item, ast.AST):
                self._print(name + '=')
                self.indent += 1
                self.generic_visit(item)
                self.indent -= 1
            elif isinstance(item, list):
                self._print(name + '=[')
                self.indent += 1
                [self.generic_visit(attr) for attr in item]
                self.indent -= 1
                self._print(']')
            else:
                self._print(name + '=' + str(item))
        self.indent -= 1
        self._print(')')

Sample usage:

astpp('''len('My friends are my power!')''')

Output:

Module(
 body=[
  Expr(
   lineno=1
   value=
    Call(
     col_offset=0
     starargs=None
     args=[
      Str(
       s=My friends are my power!
       lineno=1
       col_offset=4
      )
     ]
     lineno=1
     func=
      Name(
       ctx=
        Load(
        )
       id=len
       col_offset=0
       lineno=1
      )
     kwargs=None
     keywords=[
     ]
    )
   col_offset=0
  )
 ]
)

-- Ryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.sweetser at gmail.com Sat Sep 14 07:36:25 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Sat, 14 Sep 2013 01:36:25 -0400 Subject: [Python-ideas] Style for multi-line generator expressions Message-ID: PEP 8 currently lacks any suggestions for how multi-line generator expressions and list comprehensions should be formatted. In the absence of any official style suggestion (that I can find), I suggest the style used the most in the standard library.

[<expression>
    for <variable> in <iterable>
    if <condition>]

Note, lines could still be combined where it makes sense, eg, the first two lines could be combined if they aren't too long.
-- "Evil begins when you begin to treat people as things." - Terry Pratchett -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Sat Sep 14 07:49:46 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 14 Sep 2013 07:49:46 +0200 Subject: [Python-ideas] Style for multi-line generator expressions In-Reply-To: References: Message-ID: On 09/14/2013 07:36 AM, Clay Sweetser wrote: > PEP 8 currently lacks any suggestions for how multi-line generator expressions > and list comprehensions should be formatted. In the absence of any official > style suggestion (that I can find), I suggest the style used the most in the > standard library. > > [ > for in > if ] > > Note, lines could still be combined where it makes sense, eg, the first two > lines could be combined if they aren't too long. But that amounts to simply respecting the line length limit and using logical breakpoints. I don't think that requires a special mention in PEP 8. cheers, Georg From storchaka at gmail.com Sat Sep 14 17:09:15 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 14 Sep 2013 18:09:15 +0300 Subject: [Python-ideas] Add dict.getkey() and set.get() Message-ID: I propose to add two methods: dict.getkey(key) returns original key stored in the dict which is equal to specified key. E.g. >>> d = {2: 'a', 5.0: 'b'} >>> d.getkey(2.0) 2 >>> d.getkey(5) 5.0 >>> d.getkey(17) Traceback (most recent call last): File "", line 1, in KeyError: 17 set.get(value) returns original value stored in the set which is equal to specified value. E.g. >>> s = {2, 5.0} >>> s.get(2.0) 2 >>> s.get(5) 5.0 >>> s.get(17) Traceback (most recent call last): File "", line 1, in KeyError: 17 From victor.stinner at gmail.com Sat Sep 14 18:19:48 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 14 Sep 2013 18:19:48 +0200 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: What is the use case of such methods? 
Why not use dict.keys() and tuple(set)?

Victor

On 14 Sep 2013 17:09, "Serhiy Storchaka" wrote:

> I propose to add two methods:
>
> dict.getkey(key) returns original key stored in the dict which is equal to
> specified key. E.g.
>
> >>> d = {2: 'a', 5.0: 'b'}
> >>> d.getkey(2.0)
> 2
> >>> d.getkey(5)
> 5.0
> >>> d.getkey(17)
> Traceback (most recent call last):
>   File "", line 1, in
> KeyError: 17
>
> set.get(value) returns original value stored in the set which is equal to
> specified value. E.g.
>
> >>> s = {2, 5.0}
> >>> s.get(2.0)
> 2
> >>> s.get(5)
> 5.0
> >>> s.get(17)
> Traceback (most recent call last):
>   File "", line 1, in
> KeyError: 17
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>

-------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Sep 14 18:52:30 2013 From: mertz at gnosis.cx (David Mertz) Date: Sat, 14 Sep 2013 09:52:30 -0700 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: One way you can spell it currently is:

>>> getexact = lambda m, v: [x for x in m if x==v][0]
>>> d = {2: 'a', 5.0: 'b'}
>>> s = {2, 5.0}
>>> getexact(d, 2.0)
2
>>> getexact(d, 17)
Traceback (most recent call last):
  File "", line 1, in
  File "", line 1, in
IndexError: list index out of range
>>> getexact(s, 2.0)
2

It's true that the exception leaves a little to be desired here. If you actually want to use this function much, and worry about the extra equality comparisons in my one-line version, maybe:

def getexact(m, v):
    for x in m:
        if x==v: return x
    else:
        raise KeyError(v)

Like Victor, I'd want to see an actual use case before wanting these as extra methods of the actual data types. My function versions have the advantage also that they work on ANY iterable, not only dict and set, and also hence have one spelling for the same conceptual operation.
On Sat, Sep 14, 2013 at 8:09 AM, Serhiy Storchaka wrote: > I propose to add two methods: > > dict.getkey(key) returns original key stored in the dict which is equal to > specified key. E.g. > > >>> d = {2: 'a', 5.0: 'b'} > >>> d.getkey(2.0) > 2 > >>> d.getkey(5) > 5.0 > >>> d.getkey(17) > Traceback (most recent call last): > File "", line 1, in > KeyError: 17 > > set.get(value) returns original value stored in the set which is equal to > specified value. E.g. > > >>> s = {2, 5.0} > >>> s.get(2.0) > 2 > >>> s.get(5) > 5.0 > >>> s.get(17) > Traceback (most recent call last): > File "", line 1, in > KeyError: 17 > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Sep 14 18:59:38 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Sep 2013 02:59:38 +1000 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: I'll also note that in any case where I've needed to be able to determine the "canonical form" of a key, I've known the transform to use (e.g. str.lower, int, operator.index) rather than (or in addition to) having a container holding the canonical forms. If this is inspired by the transform dict PEP, then it may be better to expose the conversion function rather than a way to ask the container to do the conversion itself (yes, I'm aware I suggested the latter approach the other day - this thread is making me reconsider). Cheers, Nick. 
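Nick's alternative — expose the conversion function rather than grow the container API — can be sketched in a few lines. This is a hypothetical illustration; the class name and behaviour below are invented for the example, not taken from the TransformDict proposal:

```python
class CanonicalDict(dict):
    """Sketch: a dict that canonicalises keys through an explicit,
    user-visible transform. Callers who want the 'canonical form' of
    a key apply the exposed transform themselves, so no getkey()
    method is needed."""

    def __init__(self, transform):
        super().__init__()
        self.transform = transform  # deliberately public, per Nick's point

    def __setitem__(self, key, value):
        super().__setitem__(self.transform(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self.transform(key))


d = CanonicalDict(str.lower)
d['Foo'] = 1
print(d['FOO'])            # 1 -- looked up via the canonical form
print(d.transform('FOO'))  # 'foo' -- the canonical key, computed directly
```

The container stores only canonical keys; anything else a caller needs falls out of having the transform in hand.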
From mertz at gnosis.cx Sat Sep 14 19:04:03 2013 From: mertz at gnosis.cx (David Mertz) Date: Sat, 14 Sep 2013 10:04:03 -0700 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: Perhaps being pedantic, but there is not necessarily ONE key in the original collection which is equal to the search value:

>>> d = {2: 'a', 5.0: 'b', 7.2: 'low', 7.6: 'high'}
>>> class FuzzyNumber(float):
...     def __eq__(self, other):
...         return abs(self-other) < 0.5
...
>>> fn = FuzzyNumber(7.5)
>>> getexact(d, fn)  # What gets returned here?!
7.6

ANY implementation of this idea would either have to pick the arbitrary first match, or ... well, something else. Equality isn't actually transitive across all Python objects... and even my quick example isn't a completely absurd data type (it would need to be fleshed out better, but a FuzzyNumber could well have sensible purposes).

On Sat, Sep 14, 2013 at 9:52 AM, David Mertz wrote:

> One way you can spell it currently is:
>
> >>> getexact = lambda m, v: [x for x in m if x==v][0]
> >>> d = {2: 'a', 5.0: 'b'}
> >>> s = {2, 5.0}
> >>> getexact(d, 2.0)
> 2
> >>> getexact(d, 17)
> Traceback (most recent call last):
>   File "", line 1, in
>   File "", line 1, in
> IndexError: list index out of range
> >>> getexact(s, 2.0)
> 2
>
> It's true that the exception leaves a little to be desired here. If you
> actually want to use this function much, and worry about the extra equality
> comparisons in my one-line version, maybe:
>
> def getexact(m, v):
>     for x in m:
>         if x==v: return x
>     else:
>         raise KeyError(v)
>
> Like Victor, I'd want to see an actual use case before wanting these as
> extra methods of the actual data types. My function versions have the
> advantage also that they work on ANY iterable, not only dict and set, and
> also hence have one spelling for the same conceptual operation.
> > > On Sat, Sep 14, 2013 at 8:09 AM, Serhiy Storchaka wrote: > >> I propose to add two methods: >> >> dict.getkey(key) returns original key stored in the dict which is equal >> to specified key. E.g. >> >> >>> d = {2: 'a', 5.0: 'b'} >> >>> d.getkey(2.0) >> 2 >> >>> d.getkey(5) >> 5.0 >> >>> d.getkey(17) >> Traceback (most recent call last): >> File "", line 1, in >> KeyError: 17 >> >> set.get(value) returns original value stored in the set which is equal to >> specified value. E.g. >> >> >>> s = {2, 5.0} >> >>> s.get(2.0) >> 2 >> >>> s.get(5) >> 5.0 >> >>> s.get(17) >> Traceback (most recent call last): >> File "", line 1, in >> KeyError: 17 >> >> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/**mailman/listinfo/python-ideas >> > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Sep 14 19:27:00 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 14 Sep 2013 18:27:00 +0100 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: <52349C64.5050904@mrabarnett.plus.com> On 14/09/2013 16:09, Serhiy Storchaka wrote: > I propose to add two methods: > > dict.getkey(key) returns original key stored in the dict which is equal > to specified key. E.g. 
> > >>> d = {2: 'a', 5.0: 'b'} > >>> d.getkey(2.0) > 2 > >>> d.getkey(5) > 5.0 > >>> d.getkey(17) > Traceback (most recent call last): > File "", line 1, in > KeyError: 17 > > set.get(value) returns original value stored in the set which is equal > to specified value. E.g. > > >>> s = {2, 5.0} > >>> s.get(2.0) > 2 > >>> s.get(5) > 5.0 > >>> s.get(17) > Traceback (most recent call last): > File "", line 1, in > KeyError: 17 > There's discussion on python-dev about adding TransformDict with a .getitem method. Adding a .getkey method to dicts would not be consistent with that; really they should either both have .getitem or both have .getkey. From tjreedy at udel.edu Sat Sep 14 19:56:25 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 14 Sep 2013 13:56:25 -0400 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: On 9/14/2013 1:04 PM, David Mertz wrote: > Perhaps being pedantic, but there is not necessarily ONE key in the > original collection which is equal to the search value: Sets presume that == is an equivalence relation. When that is not true, as for such FuzzyNumbers (which break transitivity), they should not be used with sets at all, as many operations will be somewhat broken. For one thing, the composition of such 'sets' will depend on the order of addition. > >>> d = {2: 'a', 5.0: 'b', 7.2: 'low', 7.6: 'high'} > >>> class FuzzyNumber(float): > ... def __eq__(self, other): > ... return abs(self-other) < 0.5 Better to call this method 'similar' or 'is_similar', since similarity is not expected to be transitive. > >>> fn = FuzzyNumber(7.5) > >>> getexact(d, fn) # What gets returned here?! > 7.6 > > ANY implementation of this idea would either have to pick the arbitrary > first match, or ... well, something else. Equality isn't actually > transitive across all Python objects... 
Except for NaNs, the non-floats called floats for the benefit of languages with typed operations, I believe equality is (at least as far as possible) transitive for the built-in classes as delivered. We fixed 0.0 == 0 == Decimal(0) != 0.0 because of the problems caused by the non-transitivity. Avoiding breaking transitivity was one of the design constraints of the Enums. > and even my quick example isn't > a completely absurd data type (it would need to be fleshed out better, > but a FuzzyNumber could well have sensible purposes). The only absurd thing is calling similarity 'equality' ;=). -- Terry Jan Reedy From tjreedy at udel.edu Sat Sep 14 20:03:51 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 14 Sep 2013 14:03:51 -0400 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: On 9/14/2013 11:09 AM, Serhiy Storchaka wrote: > I propose to add two methods: > > dict.getkey(key) returns original key stored in the dict which is equal > to specified key. E.g. > > >>> d = {2: 'a', 5.0: 'b'} > >>> d.getkey(2.0) > 2 > >>> d.getkey(5) > 5.0 > >>> d.getkey(17) > Traceback (most recent call last): > File "", line 1, in > KeyError: 17 > > set.get(value) returns original value stored in the set which is equal > to specified value. E.g. > > >>> s = {2, 5.0} > >>> s.get(2.0) > 2 > >>> s.get(5) > 5.0 > >>> s.get(17) > Traceback (most recent call last): > File "", line 1, in > KeyError: 17 If sets had get() as you described, there would be no need for dict.getkey as long as the set-like key view had get(). This would be the appropriate place since get() has nothing to do with the values but only the set of keys. 
-- Terry Jan Reedy From mertz at gnosis.cx Sat Sep 14 20:20:53 2013 From: mertz at gnosis.cx (David Mertz) Date: Sat, 14 Sep 2013 11:20:53 -0700 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: On Sat, Sep 14, 2013 at 10:56 AM, Terry Reedy wrote: > Sets presume that == is an equivalence relation. When that is not true, as > for such FuzzyNumbers (which break transitivity), they should not be used > with sets at all, as many operations will be somewhat broken. For one > thing, the composition of such 'sets' will depend on the order of addition. > I'm not putting any huge weight in my toy class. But notice that it deliberately is NOT hashable, hence cannot make it into a set (or dict key): >>> fn = FuzzyNumber(7.5) >>> d = {fn: "fuzzy"} Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'FuzzyNumber' What the hypothetical class might be useful for, in my passing thought, is for e.g. an imprecise measurement of something. Any value that is "close" is equal within the error of measurement. Hence perhaps we want to be able to say: >>> [x for x in (1,2, 7.2, 7.3, 7.9, 8.0, 9) if x==fn] [7.2, 7.3, 7.9] I do know one *could* spell that like: >>> [x for x in (1,2, 7.2, 7.3, 7.9, 8.0, 9) if ref_val.close_to(x)] But anyway, whether or not my FuzzyNumber class is a *good* idea, it is something that end users *could* do as long as we give them an .__eq__() magic method to play with. Hence a 'getexact()' function or a dict.getkey() method would have to do SOMETHING when presented with such a transitivity-of-equality-breaking object. > > > > >>> d = {2: 'a', 5.0: 'b', 7.2: 'low', 7.6: 'high'} >> >>> class FuzzyNumber(float): >> ... def __eq__(self, other): >> ... return abs(self-other) < 0.5 >> > > Better to call this method 'similar' or 'is_similar', since similarity is > not expected to be transitive. > > > >>> fn = FuzzyNumber(7.5) >> >>> getexact(d, fn) # What gets returned here?! 
>> 7.6 >> >> ANY implementation of this idea would either have to pick the arbitrary >> first match, or ... well, something else. Equality isn't actually >> transitive across all Python objects... >> > > Except for NaNs, the non-floats called floats for the benefit of languages > with typed operations, I believe equality is (at least as far as possible) > transitive for the built-in classes as delivered. We fixed > 0.0 == 0 == Decimal(0) != 0.0 > because of the problems caused by the non-transitivity. Avoiding breaking > transitivity was one of the design constraints of the Enums. > > > > and even my quick example isn't > >> a completely absurd data type (it would need to be fleshed out better, >> but a FuzzyNumber could well have sensible purposes). >> > > The only absurd thing is calling similarity 'equality' ;=). > > -- > Terry Jan Reedy > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidhalter88 at gmail.com Sat Sep 14 21:15:14 2013 From: davidhalter88 at gmail.com (David Halter) Date: Sat, 14 Sep 2013 23:45:14 +0430 Subject: [Python-ideas] Should we improve `dir`? Message-ID: I recently stumbled over `dir()` not working correctly in the case of classes: http://jedidjah.ch/code/2013/9/8/wrong_dir_function/ In short: `dir` doesn't list the `type` methods, which it should in my opinion, because there are very important attributes in there like `__name__` or `__bases__`. This led to some confusion in the past, e.g. 
http://www.gossamer-threads.com/lists/python/python/507363. The long version is in the above link. After discussions, I realized that I should probably bring this up in python-ideas. I think the current implementation can be very confusing for people trying to introspect classes with `dir`, which is IMHO its typical use case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Sep 14 21:48:21 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 14 Sep 2013 22:48:21 +0300 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: 14.09.13 21:03, Terry Reedy wrote:

> If sets had get() as you described, there would be no need for
> dict.getkey as long as the set-like key view had get(). This would be
> the appropriate place since get() has nothing to do with the values but
> only the set of keys.

Agree.

From storchaka at gmail.com Sat Sep 14 22:10:35 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 14 Sep 2013 23:10:35 +0300 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: 14.09.13 19:19, Victor Stinner wrote:

> What is the use case of such methods? Why not use dict.keys() and
> tuple(set)?

Scanning dict.keys() or set has linear complexity. dict.getkey() and set.get() can be easily implemented with O(1). I have no good use cases. Perhaps every problem which requires dict.getkey() or set.get() can be solved with an additional synchronized dict which maps key to key. This is also true for TransformDict.
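Serhiy's closing remark — an "additional synchronized dict which maps key to key" — can be made concrete. The class below is a hypothetical sketch (its name and API are invented here), showing that the originally stored key is recoverable in O(1) without adding methods to dict itself:

```python
class GetKeyDict:
    """Sketch of the key -> key workaround: alongside the value
    mapping, keep a dict mapping each key to the exact object first
    stored, so the equivalent of the proposed dict.getkey() is an
    O(1) hash lookup rather than a linear scan."""

    def __init__(self):
        self._values = {}
        self._keys = {}

    def __setitem__(self, key, value):
        self._values[key] = value
        self._keys.setdefault(key, key)  # remember the first-stored key

    def __getitem__(self, key):
        return self._values[key]

    def getkey(self, key):
        return self._keys[key]  # raises KeyError if absent, as proposed


d = GetKeyDict()
d[2] = 'a'
d[5.0] = 'b'
print(d.getkey(2.0))  # 2 (the int originally stored, not 2.0)
print(d.getkey(5))    # 5.0
```

This works because equal keys hash equally (`hash(2) == hash(2.0)`), so the probe key finds the slot holding the originally stored object.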
From mistersheik at gmail.com Sat Sep 14 22:10:00 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 14 Sep 2013 16:10:00 -0400 Subject: [Python-ideas] Replace option set/get methods through the standard library with a ChainMap; add a context manager to ChainMap In-Reply-To: References: <8a0c3260-7c85-46ba-9dae-102e7fceb8f4@googlegroups.com> Message-ID: ChainMap supports the pattern of a "dictionary that supports temporarily overriding items". The method I'm suggesting is as follows:

class ChainMap:
    @contextmanager
    def child_context(self, **kwargs):
        self.add_child(**kwargs)
        try:
            yield
        finally:
            self = self.parents

Then, when updating numpy.printoptions:

with numpy.printoptions.child_context(precision=23):
    ...  # do something

With a regular dict, numpy would end up implementing the necessary context manager once for each set of options instead of factoring that code out into ChainMap.

On Thu, Sep 12, 2013 at 6:24 AM, Oscar Benjamin wrote:

> On 12 September 2013 11:14, Neil Girdhar wrote:
> >
> > Thank you. I will ask there about adding numpy context managers. However,
> > the extra member function to ChainMap to use it as a context manager would
> > be a question for this mailing list, right?
>
> Perhaps you could spell out that part of the idea in more detail then.
> Why in particular would it need to be a ChainMap and not a regular
> dict? Does the method return a new ChainMap instance? What would be
> seen by other code that holds references to the same ChainMap?
>
> Oscar

-------------- next part -------------- An HTML attachment was scrubbed... URL: From jbvsmo at gmail.com Sat Sep 14 22:21:52 2013 From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=) Date: Sat, 14 Sep 2013 17:21:52 -0300 Subject: [Python-ideas] Should we improve `dir`? In-Reply-To: References: Message-ID: 2013/9/14 David Halter

> I recently stumbled over `dir()` not working correctly in the case of
> classes:
>
> http://jedidjah.ch/code/2013/9/8/wrong_dir_function/
>
In-Reply-To:
References:
Message-ID:

2013/9/14 David Halter

> I recently stumbled over `dir()` not working correctly in the case of
> classes:
>
> http://jedidjah.ch/code/2013/9/8/wrong_dir_function/
>
That's expected and the right behavior IMHO. If you need your classes (or metaclasses) to behave differently, set the __dir__ method.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu  Sat Sep 14 22:22:09 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 14 Sep 2013 16:22:09 -0400
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To:
References:
Message-ID:

On 9/14/2013 2:20 PM, David Mertz wrote:
> I'm not putting any huge weight in my toy class. But notice that it
> deliberately is NOT hashable, hence cannot make it into a set (or dict key):
...
> >>> [x for x in (1,2, 7.2, 7.3, 7.9, 8.0, 9) if ref_val.close_to(x)]
...
> But anyway, whether or not my FuzzyNumber class is a *good* idea, it is
> something that end users *could* do as long as we give them an .__eq__()
> magic method to play with. Hence a 'getexact()' function or a
> dict.getkey() method would have to do SOMETHING when presented with such
> a transitivity-of-equality-breaking object.

I would expect the proposed set/dict methods to work by hashing the target. Otherwise, they would not be justified as *methods*; the generic operation should be, as you said, a function.

The generic 'iterate and return the first item matching the target' will either return the first match or do whatever the else: clause dictates. I do not see why you think there is a special problem with this. There is nothing special about equality versus any other match predicate. Transitivity is irrelevant here. On the other hand, symmetry is a concern, as 'item == target' and 'target == item' could be different and the result of the function would depend on which is used.
--
Terry Jan Reedy

From raymond.hettinger at gmail.com  Sat Sep 14 22:54:06 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 14 Sep 2013 13:54:06 -0700
Subject: [Python-ideas] Should we improve `dir`?
In-Reply-To:
References:
Message-ID:

On Sep 14, 2013, at 12:15 PM, David Halter wrote:

> After discussions, I realized that I should probably bring this up in python-ideas, I think the current implementation can be very confusing for people trying to introspect classes with `dir`, which is IMHO its typical use case.

The current behavior of dir() is a bit irritating when I am teaching how Python works.

That said, the irritation is minor and easily overcome.

I would not want to change the behavior and risk breaking existing introspection code (that code tends to be more fragile and implementation-dependent than most other code).

In other words, I just don't think it is worth changing something that has been in place for a very long time. The minor benefit doesn't warrant the downsides that go with API churn.

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From raymond.hettinger at gmail.com  Sat Sep 14 23:03:09 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 14 Sep 2013 14:03:09 -0700
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To:
References:
Message-ID:

On Sep 14, 2013, at 1:10 PM, Serhiy Storchaka wrote:

> I have no good use cases.

That should be the end of the story ;-)

Also, we need to have a strong preference to keep the core APIs small. Python is becoming harder and harder to teach -- it no longer "fits in your head". If you look at mapping and set APIs in other languages, you will see that this particular feature creep has usually been deemed unnecessary. The most important thing we can do for Python is to teach how to use the core objects to solve problems rather than trying to add a method for every single idea that has ever occurred to us.
Dictionaries and lists are very flexible tools. We need to teach people to use them to solve simple problems:

canonical = {}

def intern(obj):
    'Return a canonical member of an equivalence class'
    if obj in canonical:
        return canonical[obj]
    canonical[obj] = obj
    return obj

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info  Sun Sep 15 01:07:57 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 15 Sep 2013 09:07:57 +1000
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To:
References:
Message-ID: <20130914230756.GQ16820@ando>

On Sat, Sep 14, 2013 at 09:52:30AM -0700, David Mertz wrote:

> def getexact(m, v):
>     for x in m:
>         if x==v: return x
>     else:
>         raise KeyError(v)

This has the flaw that it is O(N) rather than O(1). It's really quite unfortunate to have dicts and sets able to access keys in (almost) constant time, but not be able to communicate that key back to the caller except by walking the entire dict/set.

Now O(N) is tolerable if all you want to do is retrieve the canonical version of a single key. But if you want to do so for *all* of the keys in the dict, the naive way to do it ends up walking the dict for each key, giving O(N**2) in total. Is there a non-naive way to speed this up? I haven't had breakfast yet so I can't think of one :-)

My feeling here is that for ordinary dicts, needing to retrieve the canonical key is rare enough that they don't need a dedicated method to do so. But for the TransformDict suggested on the python-dev list, it will be a common need, and deserves an O(1) lookup method.

--
Steven

From tjreedy at udel.edu  Sun Sep 15 01:12:08 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 14 Sep 2013 19:12:08 -0400
Subject: [Python-ideas] Should we improve `dir`?
In-Reply-To: References: Message-ID: On 9/14/2013 4:54 PM, Raymond Hettinger wrote: > On Sep 14, 2013, at 12:15 PM, David Halter > > wrote: > >> After discussions, I realized that I should probably bring this up in >> python-ideas, I think the current implementation can be very confusing >> for people trying to introspect classes with `dir`, which is IMHO its >> typical use case. > > The current behavior of dir() is a bit irritating when I am teaching how > Python works. > > That said, the irritation is minor and easily overcome. > > I would not want to change the behavior and risk breaking > existing introspection code (that code is tends to be more > fragile and implementation than most other code). This was the basis for rejecting http://bugs.python.org/issue19002 ``dir`` function does not work correctly with classes. The proposal obviously broke pydoc and inspect modules. > In other words, I just don't think it is worth changing something > that has been in-place for a very long long time. The minor > benefit doesn't want the downsides that goes with API churn. The other point is that people *usually* call dir(cls) in order to find the methods they can call in instances of cls: -- Terry Jan Reedy From steve at pearwood.info Sun Sep 15 01:26:54 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 15 Sep 2013 09:26:54 +1000 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: <20130914232654.GR16820@ando> On Sat, Sep 14, 2013 at 01:56:25PM -0400, Terry Reedy wrote: [...] > > and even my quick example isn't > >a completely absurd data type (it would need to be fleshed out better, > >but a FuzzyNumber could well have sensible purposes). > > The only absurd thing is calling similarity 'equality' ;=). While I agree with the general thrust of your post, I'd like to point out that the creator of APL, Ken Iverson did not agree with you. 
A couple of relevant quotes:

In an early talk Ken was explaining the advantages of tolerant comparison. A member of the audience asked incredulously, "Surely you don't mean that when A=B and B=C, A may not equal C?" Without skipping a beat, Ken replied, "Any carpenter knows that!" and went on to the next question. -- quoted by Paul Berry

The intransitivity of [tolerant] equality is well known in practical situations and can be easily demonstrated by sawing several pieces of wood of equal length. In one case, use the first piece to measure subsequent lengths; in the second case, use the last piece cut to measure the next. Compare the lengths of the two final pieces. -- Richard Lathwell, APL Comparison Tolerance, APL76, 1976

Mathematicians and programmers treat transitivity as far more fundamental than it actually is in real life :-)

--
Steven

From mertz at gnosis.cx  Sun Sep 15 01:49:04 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 14 Sep 2013 16:49:04 -0700
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: <20130914230756.GQ16820@ando>
References: <20130914230756.GQ16820@ando>
Message-ID:

> On Sat, Sep 14, 2013 at 09:52:30AM -0700, David Mertz wrote:
> > def getexact(m, v):
> >     for x in m:
> >         if x==v: return x
> >     else:
> >         raise KeyError(v)
>
> This has the flaw that it is O(N) rather than O(1).

It's true, it is relatively inefficient. But we also don't have a use case where we actually need to do this enough that it matters.

> Now O(N) is tolerable if all you want to do is retrieve the canonical
> version of a single key. But if you want to do so for *all* of the keys
> in the dict, the naive way to do it ends up walking the dict for
> each key,

I thought the naive way to retrieve ALL the keys (in canonical form) was 'mydict.keys()'. :-) I'm sure you can spin other variations on this.
Here's all the keys except one, removed using a non-canonical equivalent value: >>> {1:2, 3:4, 5:6, 7:8}.keys() - {Decimal(1.0)} {3, 5, 7} It's true though that I can't think of an efficient way to get the canonical form of a key from a dictionary in O(1). But also I think it is rare enough not to worry, and TransformDict is a good specialization that will do this in those unusual cases where we care. > giving O(N**2) in total. Is there a non-naive way to speed > this up? I haven't had breakfast yet so I can't think of one :-) > > My feeling here is that for ordinary dicts, needing to retrieve the > canonical key is rare enough that they don't need a dedicated method to > do so. But for the TransformDict suggested on the python-dev list, it > will be a common need, and deserves an O(1) lookup method. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Sun Sep 15 02:32:48 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 14 Sep 2013 17:32:48 -0700 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: Message-ID: <8D234C67-1B8E-4653-9AA9-86DE1E5A1EC5@gmail.com> On Sep 14, 2013, at 8:09 AM, Serhiy Storchaka wrote: > I propose to add two methods: > > dict.getkey(key) returns original key stored in the dict which is equal to specified key. E.g. 
For what it's worth:

* this idea was proposed and rejected at least once before

* the one obvious way to do it is an interning dictionary that maps values back to themselves

* the use cases for this are somewhat uncommon (i.e. most people don't need it most of the time)

* if you think you really need this functionality, it is possible to write a function that works with all containers as they are already implemented (i.e. you could use it today): http://code.activestate.com/recipes/499299-get_equivalentcontainer-item

* all the participants on this list would be well served to teach some Python classes to get an appreciation of the negative consequences of further expanding the APIs of the core containers. Bigger is not better. Learnability matters.

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dreamingforward at gmail.com  Sun Sep 15 03:37:05 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sat, 14 Sep 2013 18:37:05 -0700
Subject: [Python-ideas] Should we improve `dir`?
In-Reply-To:
References:
Message-ID:

> I recently stumbled over `dir()` not working correctly in the case of
> classes:

Not working correctly? That would imply an adequate definition of "correctness". Dir should divulge all method and attribute names of a class -- a "directory", as it were. In my opinion, it should not report __bases__, __name__, __doc__, or __class__ -- all of which are meta-things not meant for the user of a class. If a programmer wants to see more, then the inspect module would presumably be appropriate, or simply calling for help().

--mark

> > http://jedidjah.ch/code/2013/9/8/wrong_dir_function/
>
> In short:
>
> `dir` doesn't list the `type` methods, which it should in my opinion,
> because there are very important attributes in there like `__name__` or
> `__bases__`.
>
> This led to some confusion in the past, e.g.
> http://www.gossamer-threads.com/lists/python/python/507363.
>
> The long version is in the above link.
> > After discussions, I realized that I should probably bring this up in > python-ideas, I think the current implementation can be very confusing for > people trying to introspect classes with `dir`, which is IMHO its typical > use case. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -- MarkJ Tacoma, Washington From ncoghlan at gmail.com Sun Sep 15 03:35:37 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Sep 2013 11:35:37 +1000 Subject: [Python-ideas] Should we improve `dir`? In-Reply-To: References: Message-ID: On 15 Sep 2013 09:13, "Terry Reedy" wrote: > > On 9/14/2013 4:54 PM, Raymond Hettinger wrote: > >> On Sep 14, 2013, at 12:15 PM, David Halter >> > > wrote: >> >>> After discussions, I realized that I should probably bring this up in >>> python-ideas, I think the current implementation can be very confusing >>> for people trying to introspect classes with `dir`, which is IMHO its >>> typical use case. >> >> >> The current behavior of dir() is a bit irritating when I am teaching how >> Python works. >> >> That said, the irritation is minor and easily overcome. >> >> I would not want to change the behavior and risk breaking >> existing introspection code (that code is tends to be more >> fragile and implementation than most other code). > > > This was the basis for rejecting http://bugs.python.org/issue19002 > ``dir`` function does not work correctly with classes. > The proposal obviously broke pydoc and inspect modules. > > >> In other words, I just don't think it is worth changing something >> that has been in-place for a very long long time. The minor >> benefit doesn't want the downsides that goes with API churn. 
> > > The other point is that people *usually* call dir(cls) in order to find the methods they can call in instances of cls: Right, this is one of the behavioural differences between classes and instances, and, as far as I am aware, it isn't an accident. Not only that, but (as Raymond pointed out) even if it was originally an accident it's too late to change it now. If introspection tools want to show all the operations available *on the class*, then they need to include "dir(type(cls))" as well. So there may be a legitimate feature request for a new section in the pydoc output showing "class only" methods and attributes. Cheers, Nick. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Sun Sep 15 03:47:41 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Sat, 14 Sep 2013 18:47:41 -0700 Subject: [Python-ideas] Should we improve `dir`? In-Reply-To: References: Message-ID: > Right, this is one of the behavioural differences between classes and > instances, and, as far as I am aware, it isn't an accident. In fact, I'd argue that is a critical distinction. A language definition has to help enforce this distinction, otherwise confusion abounds. > If introspection tools want to show all the operations available *on the > class*, then they need to include "dir(type(cls))" as well. If users want to find the operations available "on the class", they should learn Python. --mark From rosuav at gmail.com Sun Sep 15 04:59:49 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 15 Sep 2013 12:59:49 +1000 Subject: [Python-ideas] Should we improve `dir`? 
In-Reply-To: References: Message-ID: On Sun, Sep 15, 2013 at 11:47 AM, Mark Janssen wrote: >> Right, this is one of the behavioural differences between classes and >> instances, and, as far as I am aware, it isn't an accident. > > In fact, I'd argue that is a critical distinction. A language > definition has to help enforce this distinction, otherwise confusion > abounds. > >> If introspection tools want to show all the operations available *on the >> class*, then they need to include "dir(type(cls))" as well. > > If users want to find the operations available "on the class", they > should learn Python. The other day I was looking for __bases__ but couldn't remember what it was called. I did the obvious thing and used IDLE's tab completion... and it wasn't there. There definitely is value in having those sorts of things be in dir(). ChrisA From anthonyfk at gmail.com Sun Sep 15 05:12:37 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Sat, 14 Sep 2013 21:12:37 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects Message-ID: We tend to do a lot of iterating over dictionaries in our product in some performance critical areas. It occurred to me that allocating a new iterator object every single time seems a little wasteful, especially considering that there's probably only a handful of them alive at any time. Doing a quick test with dictiterobject and 3 free lists (one for Keys, Values and Items) showed about a 4% speedup in this (best) case: python -m timeit -s "a = {'k%d' % i: i for i in xrange($2)}" "[_ for _ in a.iteritems()]" However, this seems like almost too simple of an idea. Has this been tried before? Would it be too little gain? Given that the extra memory used is negligible (3 of each iterator?), how much of a performance gain would be needed to justify it? Thanks, -Kyle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raymond.hettinger at gmail.com Sun Sep 15 05:28:50 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 14 Sep 2013 20:28:50 -0700 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: References: Message-ID: On Sep 14, 2013, at 8:12 PM, Kyle Fisher wrote: > We tend to do a lot of iterating over dictionaries in our product in some performance critical areas. It occurred to me that allocating a new iterator object every single time seems a little wasteful, especially considering that there's probably only a handful of them alive at any time. Doing a quick test with dictiterobject and 3 free lists (one for Keys, Values and Items) showed about a 4% speedup in this (best) case: It is surprising that you saw any performance gain at all. Python already has a default Python freelist scheme in the _PyObject_Malloc() function in Objects/obmalloc.c. Another thought is that this isn't an inner-loop optimization. The O(1) time for iterator creation is dominated by the O(n) time to actually iterate over the dict keys, values, and items. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthonyfk at gmail.com Sun Sep 15 07:04:53 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Sat, 14 Sep 2013 23:04:53 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: References: Message-ID: On Sat, Sep 14, 2013 at 9:28 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > It is surprising that you saw any performance gain at all. > > Python already has a default Python freelist scheme > in the _PyObject_Malloc() function in Objects/obmalloc.c. > > Another thought is that this isn't an inner-loop optimization. > The O(1) time for iterator creation is dominated by the O(n) > time to actually iterate over the dict keys, values, and items. 
>
> Raymond
>

Hi Raymond,

Taking a look at _PyObject_Malloc in Objects/obmalloc.c, I see that it needs to do some lock and unlock operations. Perhaps it's the avoidance of this overhead that I'm seeing? After all, there must be a reason that dict, tuple and others are keeping their own free lists, right?

I'm curious what the overhead in creating the iterator is compared to the time to iterate. Obviously there's an O(1) / O(n) difference, but perhaps the constant setup time dominates for smaller values of n? In our case, we are often doing something like the following (2.7):

def onNewData(datapoints):
    for dp in datapoints:
        for val in dp.outputs.itervalues():
            pass  # Do things with val
        for status in dp.statuses.itervalues():
            pass  # Do things with status

Where datapoints can have 100000 items and "outputs" and "statuses" tend to be small. So, while creating the iterator obviously isn't the slowest part of the code, it does have some impact.

Cheers,
-Kyle

P.S. - I'm a newbie to the mailing list, so if I'm replying "wrong" sorry about that!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: list_iterator_freelist.patch
Type: application/octet-stream
Size: 2540 bytes
Desc: not available
URL:

From mal at egenix.com  Sun Sep 15 12:56:15 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 15 Sep 2013 12:56:15 +0200
Subject: [Python-ideas] Keep free list of popular iterator objects
In-Reply-To:
References:
Message-ID: <5235924F.50907@egenix.com>

On 15.09.2013 08:26, Kyle Fisher wrote:
> I've realized that my original example is far too complex, so I've
> simplified it:
>
> Status quo:
> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
> 10000000 loops, best of 100: 0.0662 usec per loop
>
> With patch:
> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
> 10000000 loops, best of 100: 0.0557 usec per loop
> List iter allocations: 6
> List iter reuse through freelist: 1011111554
> 100.00% reuse rate
>
> Which seems to show a 15% speedup. I'd be curious what others get.

I'd suggest to open a ticket for this and then continue the discussion there.

Given how often iterators are used nowadays in Python, a separate free list may actually make sense (for the same reasons it makes sense to have them around for lists, tuples, etc.).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 15 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-09-11: Released eGenix PyRun 1.3.0 ...       http://egenix.com/go49
2013-09-04: Released eGenix pyOpenSSL 0.13.2 ...  http://egenix.com/go48
2013-09-20: PyCon UK 2013, Coventry, UK ...                 5 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Sun Sep 15 13:27:58 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 15 Sep 2013 13:27:58 +0200 Subject: [Python-ideas] Keep free list of popular iterator objects References: Message-ID: <20130915132758.4cd3e697@fsol> On Sat, 14 Sep 2013 23:04:53 -0600 Kyle Fisher wrote: > On Sat, Sep 14, 2013 at 9:28 PM, Raymond Hettinger < > raymond.hettinger at gmail.com> wrote: > > > > > It is surprising that you saw any performance gain at all. > > > > Python already has a default Python freelist scheme > > in the _PyObject_Malloc() function in Objects/obmalloc.c. > > > > Another thought is that this isn't an inner-loop optimization. > > The O(1) time for iterator creation is dominated by the O(n) > > time to actually iterate over the dict keys, values, and items. > > > > Raymond > > > > > Hi Raymond, > > Taking a look at _PyObject_Malloc in Objects/obmalloc.c, I see that it > needs to do some lock and unlock operations. Please read carefully. The lock and unlock "operations" are no-ops. Regards Antoine. From solipsis at pitrou.net Sun Sep 15 13:30:23 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 15 Sep 2013 13:30:23 +0200 Subject: [Python-ideas] Keep free list of popular iterator objects References: <5235924F.50907@egenix.com> Message-ID: <20130915133023.56566d75@fsol> On Sun, 15 Sep 2013 12:56:15 +0200 "M.-A. 
Lemburg" wrote: > On 15.09.2013 08:26, Kyle Fisher wrote: > > I've realized that my original example is far too complex, so I've > > simplified it: > > > > Status quo: > > ./python -m timeit -r 100 -s "a=[1]" "iter(a)" > > 10000000 loops, best of 100: 0.0662 usec per loop > > > > With patch: > > ./python -m timeit -r 100 -s "a=[1]" "iter(a)" > > 10000000 loops, best of 100: 0.0557 usec per loop > > List iter allocations: 6 > > List iter reuse through freelist: 1011111554 > > 100.00% reuse rate > > > > Which seems to show a 15% speedup. I'd be curious what others get. > > I'd suggest to open a ticket for this and then continue > the discussion there. > > Given how often iterators are used nowadays in Python, a separate > free list may actually make sense (for the same reasons it makes > sense to have around for lists, tuples, etc.). I'm -1 on adding freelists everywhere. A best-case 15% improvement on a trivial microbenchmark probably means a 0% improvement on real-world workloads. Furthermore, using specialized freelists will increase memory fragmentation and prevent the main allocator from returning memory to the system. Regards Antoine. From mal at egenix.com Sun Sep 15 13:52:39 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 15 Sep 2013 13:52:39 +0200 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: <20130915133023.56566d75@fsol> References: <5235924F.50907@egenix.com> <20130915133023.56566d75@fsol> Message-ID: <52359F87.2030409@egenix.com> On 15.09.2013 13:30, Antoine Pitrou wrote: > On Sun, 15 Sep 2013 12:56:15 +0200 > "M.-A. 
Lemburg" wrote: >> On 15.09.2013 08:26, Kyle Fisher wrote: >>> I've realized that my original example is far too complex, so I've >>> simplified it: >>> >>> Status quo: >>> ./python -m timeit -r 100 -s "a=[1]" "iter(a)" >>> 10000000 loops, best of 100: 0.0662 usec per loop >>> >>> With patch: >>> ./python -m timeit -r 100 -s "a=[1]" "iter(a)" >>> 10000000 loops, best of 100: 0.0557 usec per loop >>> List iter allocations: 6 >>> List iter reuse through freelist: 1011111554 >>> 100.00% reuse rate >>> >>> Which seems to show a 15% speedup. I'd be curious what others get. >> >> I'd suggest to open a ticket for this and then continue >> the discussion there. >> >> Given how often iterators are used nowadays in Python, a separate >> free list may actually make sense (for the same reasons it makes >> sense to have them around for lists, tuples, etc.). > > I'm -1 on adding freelists everywhere. Not everywhere :-) Just for objects that are often created and freed again. > A best-case 15% improvement on a > trivial microbenchmark probably means a 0% improvement on real-world > workloads. Furthermore, using specialized freelists will increase > memory fragmentation and prevent the main allocator from returning > memory to the system. Keeping e.g. a hundred such objects in a free list shouldn't really affect the memory load of the Python interpreter. A 15% improvement isn't a lot, but such small improvements add up if they are consistent and the net result is an overall performance improvement. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 15 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... 
http://egenix.com/go49
2013-09-04: Released eGenix pyOpenSSL 0.13.2 ...  http://egenix.com/go48
2013-09-20: PyCon UK 2013, Coventry, UK ...                 5 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From solipsis at pitrou.net  Sun Sep 15 14:09:53 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 15 Sep 2013 14:09:53 +0200
Subject: [Python-ideas] Keep free list of popular iterator objects
References: <5235924F.50907@egenix.com> <20130915133023.56566d75@fsol> <52359F87.2030409@egenix.com>
Message-ID: <20130915140953.2b2a751a@fsol>

On Sun, 15 Sep 2013 13:52:39 +0200
"M.-A. Lemburg" wrote:
> > A best-case 15% improvement on a
> > trivial microbenchmark probably means a 0% improvement on real-world
> > workloads. Furthermore, using specialized freelists will increase
> > memory fragmentation and prevent the main allocator from returning
> > memory to the system.
>
> Keeping e.g. a hundred such objects in a free list shouldn't
> really affect the memory load of the Python interpreter.

Well, it can. The object allocator uses 256KB arenas, so if each of the hundred objects in the free list keeps a different arena alive, we are talking about a 25 MB fragmentation overhead.

Yes, that's a worst-case (and unrealistic for common workloads) overhead, but the 15% improvement is a best-case (and very unrealistic for common workloads) performance gain :-)

> A 15% improvement isn't a lot, but such small improvements
> add up if they are consistent and the net result is an overall
> performance improvement.

I've grown skeptical that such small improvements actually "add up" to something significant.
Anyway, if there's a non-trivial benchmark that can measure the real-world potential of this optimization, it would help the discussion :-)

Regards

Antoine.

From mal at egenix.com  Sun Sep 15 15:50:12 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 15 Sep 2013 15:50:12 +0200
Subject: [Python-ideas] Keep free list of popular iterator objects
In-Reply-To: <20130915140953.2b2a751a@fsol>
References: <5235924F.50907@egenix.com> <20130915133023.56566d75@fsol> <52359F87.2030409@egenix.com> <20130915140953.2b2a751a@fsol>
Message-ID: <5235BB14.8080208@egenix.com>

On 15.09.2013 14:09, Antoine Pitrou wrote:
> On Sun, 15 Sep 2013 13:52:39 +0200
> "M.-A. Lemburg" wrote:
>>> A best-case 15% improvement on a
>>> trivial microbenchmark probably means a 0% improvement on real-world
>>> workloads. Furthermore, using specialized freelists will increase
>>> memory fragmentation and prevent the main allocator from returning
>>> memory to the system.
>>
>> Keeping e.g. a hundred such objects in a free list shouldn't
>> really affect the memory load of the Python interpreter.
>
> Well, it can. The object allocator uses 256KB arenas, so if each of
> the hundred objects in the free list keeps a different arena alive, we
> are talking about a 25 MB fragmentation overhead.
>
> Yes, that's a worse case (and irrealistic for common workloads)
> overhead, but the 15% improvement is a best case (and very irrealistic
> for common workloads) performance gain :-)

The trick here is to preallocate the pool of those 100 iterator objects, so you only use one such arena - hopefully the one that's also used for the other free lists :-)

>> A 15% improvement isn't a lot, but such small improvements
>> add up if they are consistent and the net result is an overall
>> performance improvement.
>
> I've grown skeptical that such small improvements actually "add up" to
> something significant.
> Performance differences between CPython versions
> can generally be attributed to one or two important changes (hopefully
> improvements :-)) such as e.g. PEP 393, the method lookup cache, or
> new-style classes.

For Python 1.5 I had done a whole series of such smaller improvements.
The net effect was a speedup of between 20-30%, so I wouldn't be too
skeptical :-)

What's important about such small enhancements is that they provide
consistent speedups and have a sane ratio between
complexity/maintenance overhead and performance improvement.

> Anyway, if there's a non-trivial benchmark that can measure the
> real-world potential of this optimization, it would help the
> discussion :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 15 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

2013-09-11: Released eGenix PyRun 1.3.0 ...       http://egenix.com/go49
2013-09-04: Released eGenix pyOpenSSL 0.13.2 ...  http://egenix.com/go48
2013-09-20: PyCon UK 2013, Coventry, UK ...                 5 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From ethan at stoneleaf.us  Sun Sep 15 16:43:40 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 15 Sep 2013 07:43:40 -0700
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: <8D234C67-1B8E-4653-9AA9-86DE1E5A1EC5@gmail.com>
References: <8D234C67-1B8E-4653-9AA9-86DE1E5A1EC5@gmail.com>
Message-ID: <5235C79C.6030907@stoneleaf.us>

On 09/14/2013 05:32 PM, Raymond Hettinger wrote:
>
> Bigger is not better. Learnability matters.
+1

Which may sound strange coming from someone who was/is a proponent of
adding Enum, TransformDict, and stats.

A key distinction is that Python the language is not the same as the
stdlib that ships with Python. Python the language should grow very
slowly. The stdlib should grow in order to provide a consistent,
stable, and sane user experience. This includes criteria such as:

- is it widely re-implemented? (indicating a common need, and probably
  multiple slightly different APIs)
- is it easy to get wrong? (indicating a complex subject)
- is it a quickly changing field? (indicating an often changing code base)

The first two are reasons why inclusion could be a good thing, the last
a reason why inclusion would be a bad thing.

-- 
~Ethan~

From raymond.hettinger at gmail.com  Sun Sep 15 18:38:39 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 15 Sep 2013 09:38:39 -0700
Subject: [Python-ideas] Keep free list of popular iterator objects
In-Reply-To: 
References: 
Message-ID: <0D2417A7-18FB-4520-B22B-7B2385D93603@gmail.com>

On Sep 14, 2013, at 11:26 PM, Kyle Fisher wrote:

> I've realized that my original example is far too complex, so I've
> simplified it:
>
> Status quo:
> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
> 10000000 loops, best of 100: 0.0662 usec per loop
>
> With patch:
> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
> 10000000 loops, best of 100: 0.0557 usec per loop
> List iter allocations: 6
> List iter reuse through freelist: 1011111554
> 100.00% reuse rate
>
> Which seems to show a 15% speedup. I'd be curious what others get.

This 15% claim is incredibly deceptive. You're looping over a list of
length one and the "benefits" fall away immediately for anything longer.
It seems like it is intentionally ignoring that you're optimizing an
O(1) setup step in an O(n) operation. And the timing loop does not
exercise the cases where the freelist misses.
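That amortization argument is easy to check with timeit — a quick sketch, where the absolute numbers will vary by machine and build:

```python
import timeit

# The cost of creating an iterator is fixed (O(1)), while the cost of
# consuming it grows with the list (O(n)).  So any speedup to iter()
# matters less and less as the list gets longer.
for n in (1, 10, 100, 1000):
    setup = "a = list(range(%d))" % n
    create = timeit.timeit("iter(a)", setup=setup, number=10000)
    consume = timeit.timeit("for x in a: pass", setup=setup, number=10000)
    print("n=%4d  iter(a): %.4fs  full loop: %.4fs" % (n, create, consume))
```

Already at n=100 the per-element work dwarfs the one-off iterator-creation cost that the patch shaves.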
More important effects are being masked by the tight timing loop that
only exercises the most favorable case. In real programs, your patch
may actually make performance worse. The default freelisting scheme is
heavily used and tends to always be in cache. In contrast, a freelist
for less frequently used objects tends not to be in cache when you need
it. Similar logic applies to branch prediction here as well. (In short,
I believe that the patch serves only to optimize an unrealistic
benchmark and would make actual programs worse-off).

I'm -1 on adding freelisting to iterators. I recently removed the
freelist scheme from Objects/setobject.c because it provided no
incremental benefit over the default freelisting scheme.

Please focus your optimization efforts elsewhere in the code. There are
real improvements to be had for operations that matter. The time to
create an iterator is one of the least important operations in Python.
If this had been a real win, I would have incorporated it into
itertools long ago.

Raymond

P.S. If you want to help benchmark the effects of aligned versus
unaligned memory allocations, that is an area that is likely to bear
fruit (for example, if integer objects were 32-byte aligned, it would
guarantee that the object head and body would be in the same cache
line).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From anthonyfk at gmail.com  Sun Sep 15 20:52:57 2013
From: anthonyfk at gmail.com (Kyle Fisher)
Date: Sun, 15 Sep 2013 12:52:57 -0600
Subject: [Python-ideas] Keep free list of popular iterator objects
In-Reply-To: 
References: <0D2417A7-18FB-4520-B22B-7B2385D93603@gmail.com>
Message-ID: 

"In the case not, we'll just check an iterator and fall back to the
default freelist."

Should of course be:

"In the case not, we'll just check an integer and fall back to the
default freelist."

On Sun, Sep 15, 2013 at 12:50 PM, Kyle Fisher wrote:
> Hi Raymond,
>
> Thanks for taking the time to respond, I appreciate that!
Please note > that I'm not attempting to be deceptive. I'm not looping over a list of > length one; I'm just timing the creation of the iterator object. In doing > this, I've shown that, in a micro-benchmark, creating an iterator object > can be made 15% faster with a freelist. > > Yes, in one loop this is focusing on an O(1) setup of an O(n) operation. > I think the main benefit of this would be for inner loops. (Which is what > my benchmark tested, no?) In our case, we tend to have a large list to > iterate over (n) where each item has a couple containers that we also need > to iterate over (size m). In this case, I'm focusing on the O(n) setup of > the O(n*m) operation where n is large. Surely this isn't completely > wasteful? > > I'm not completely sure what the best way to exercise the case where the > freelist misses. Create one more than the number of freelisted iterators, > perhaps? I'm not sure if that would be reflective of real-world uses > though. Ignoring threads or other spontaneous iterator creation, either a > particular loop is going to have its iterator in the freelist or not. In > the case not, we'll just check an iterator and fall back to the default > freelist. > > In regards to your second paragraph, would a more real-world benchmark > help? I don't want to put too many more resources into a bad idea, but I > know in our app we tend to iterate over things a lot, so my hunch is that > the iterator freelist would be in cache more often than not. Forgive my > ignorance, but is there a macro-benchmark suite I could try this against? > Even if the iterator freelist isn't in first-level cache, I'm almost > certain it would exist within last-level cache. Do you know what the cost > of fetching from this is compared to grabbing a lock like the current > _Py_Malloc does? > > Again, thanks for the time. 
This is literally the first thing I've tried > hacking into Python because it seemed like a cheap, easy (albeit, minor) > improvement to a very common operation. > > Best, > -Kyle > > P.S. I would like to put some effort into aligned memory allocations! > I've been casually browsing the issue on the bug tracker over the last week > or so; I'm somewhat surprised this isn't already the case for the numerical > types! > > > On Sun, Sep 15, 2013 at 10:38 AM, Raymond Hettinger < > raymond.hettinger at gmail.com> wrote: > >> >> On Sep 14, 2013, at 11:26 PM, Kyle Fisher wrote: >> >> I've realized that my original example is far too complex, so I've >> simplified it: >> >> Status quo: >> ./python -m timeit -r 100 -s "a=[1]" "iter(a)" >> 10000000 loops, best of 100: 0.0662 usec per loop >> >> With patch: >> ./python -m timeit -r 100 -s "a=[1]" "iter(a)" >> 10000000 loops, best of 100: 0.0557 usec per loop >> List iter allocations: 6 >> List iter reuse through freelist: 1011111554 >> 100.00% reuse rate >> >> Which seems to show a 15% speedup. I'd be curious what others get. >> >> >> This 15% claim is incredibly deceptive. You're looping over a list of >> length one and the "benefits" fall away immediately for anything longer. >> It seems like it is intentionally ignoring that you're optimizing an O(1) >> setup step in an O(n) operation. And the timing loop does not exercise the >> cases where the freelist misses. >> >> More important effects are being masked by the tight timing loop that >> only exercises the most favorable case. In real programs, your patch may >> actually make performance worse. The default freelisting scheme is heavily >> used and tends to always be in cache. In contrast, a freelist for less >> frequently used objects tend to not be in cache when you need them. >> Similar logic applies to branch prediction here as well. (In short, I >> believe that the patch serves only to optimize an unrealistic benchmark and >> would make actual programs worse-off). 
>> >> I'm -1 on adding freelisting to iterators. I recently removed the >> freelist scheme from Objects/setobject.c because it provided no incremental >> benefit over the default freelisting scheme. >> >> Please focus your optimization efforts elsewhere in the code. There are >> real improvements to be had for operations that matter. The time to create >> an iterator is one of the least important operations in Python. If this >> had been a real win, I would have incorporated it into itertools long ago. >> >> >> Raymond >> >> >> P.S. If you want to help benchmark the effects of aligned versus >> unaligned memory allocations, that is an area this likely to bear fruit >> (for example, if integer objects with 32 byte aligned, it would guarantee >> that the object head and body would be in the same cache line). >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthonyfk at gmail.com Sun Sep 15 20:50:42 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Sun, 15 Sep 2013 12:50:42 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: <0D2417A7-18FB-4520-B22B-7B2385D93603@gmail.com> References: <0D2417A7-18FB-4520-B22B-7B2385D93603@gmail.com> Message-ID: Hi Raymond, Thanks for taking the time to respond, I appreciate that! Please note that I'm not attempting to be deceptive. I'm not looping over a list of length one; I'm just timing the creation of the iterator object. In doing this, I've shown that, in a micro-benchmark, creating an iterator object can be made 15% faster with a freelist. Yes, in one loop this is focusing on an O(1) setup of an O(n) operation. I think the main benefit of this would be for inner loops. (Which is what my benchmark tested, no?) In our case, we tend to have a large list to iterate over (n) where each item has a couple containers that we also need to iterate over (size m). 
In this case, I'm focusing on the O(n) setup of the O(n*m) operation
where n is large. Surely this isn't completely wasteful?

I'm not completely sure what the best way is to exercise the case where
the freelist misses. Create one more than the number of freelisted
iterators, perhaps? I'm not sure if that would be reflective of
real-world uses though. Ignoring threads or other spontaneous iterator
creation, either a particular loop is going to have its iterator in the
freelist or not. In the case not, we'll just check an iterator and fall
back to the default freelist.

In regards to your second paragraph, would a more real-world benchmark
help? I don't want to put too many more resources into a bad idea, but
I know in our app we tend to iterate over things a lot, so my hunch is
that the iterator freelist would be in cache more often than not.
Forgive my ignorance, but is there a macro-benchmark suite I could try
this against? Even if the iterator freelist isn't in first-level cache,
I'm almost certain it would exist within last-level cache. Do you know
what the cost of fetching from this is compared to grabbing a lock like
the current _Py_Malloc does?

Again, thanks for the time. This is literally the first thing I've
tried hacking into Python because it seemed like a cheap, easy (albeit
minor) improvement to a very common operation.

Best,
-Kyle

P.S. I would like to put some effort into aligned memory allocations!
I've been casually browsing the issue on the bug tracker over the last
week or so; I'm somewhat surprised this isn't already the case for the
numerical types!
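The workload shape described here — a large outer list whose items each carry a couple of small containers — looks roughly like the following hypothetical sketch (the function and data are invented for illustration):

```python
# Hypothetical sketch of the n*m workload described above: one outer
# iterator plus a fresh inner iterator per item.  A cheaper iter()
# would be paid for n times per pass, once for each inner loop.
def total(items):
    s = 0
    for item in items:        # creates 1 outer iterator
        for part in item:     # creates a new inner iterator each time
            s += part
    return s

print(total([[1, 2], [3, 4], [5]]))  # -> 15
```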
On Sun, Sep 15, 2013 at 10:38 AM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Sep 14, 2013, at 11:26 PM, Kyle Fisher wrote: > > I've realized that my original example is far too complex, so I've > simplified it: > > Status quo: > ./python -m timeit -r 100 -s "a=[1]" "iter(a)" > 10000000 loops, best of 100: 0.0662 usec per loop > > With patch: > ./python -m timeit -r 100 -s "a=[1]" "iter(a)" > 10000000 loops, best of 100: 0.0557 usec per loop > List iter allocations: 6 > List iter reuse through freelist: 1011111554 > 100.00% reuse rate > > Which seems to show a 15% speedup. I'd be curious what others get. > > > This 15% claim is incredibly deceptive. You're looping over a list of > length one and the "benefits" fall away immediately for anything longer. > It seems like it is intentionally ignoring that you're optimizing an O(1) > setup step in an O(n) operation. And the timing loop does not exercise the > cases where the freelist misses. > > More important effects are being masked by the tight timing loop that only > exercises the most favorable case. In real programs, your patch may > actually make performance worse. The default freelisting scheme is heavily > used and tends to always be in cache. In contrast, a freelist for less > frequently used objects tend to not be in cache when you need them. > Similar logic applies to branch prediction here as well. (In short, I > believe that the patch serves only to optimize an unrealistic benchmark and > would make actual programs worse-off). > > I'm -1 on adding freelisting to iterators. I recently removed the > freelist scheme from Objects/setobject.c because it provided no incremental > benefit over the default freelisting scheme. > > Please focus your optimization efforts elsewhere in the code. There are > real improvements to be had for operations that matter. The time to create > an iterator is one of the least important operations in Python. 
If this > had been a real win, I would have incorporated it into itertools long ago. > > > Raymond > > > P.S. If you want to help benchmark the effects of aligned versus > unaligned memory allocations, that is an area this likely to bear fruit > (for example, if integer objects with 32 byte aligned, it would guarantee > that the object head and body would be in the same cache line). > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthonyfk at gmail.com Sun Sep 15 20:57:29 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Sun, 15 Sep 2013 12:57:29 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: <5235924F.50907@egenix.com> References: <5235924F.50907@egenix.com> Message-ID: Hello Marc-Andre, Thanks for the suggestion! I think I'd like to get a better handle on Raymond's concerns before opening a ticket, as he does bring up some good criticisms. Thanks, -Kyle -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 15 21:18:19 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 15 Sep 2013 14:18:19 -0500 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: References: <0D2417A7-18FB-4520-B22B-7B2385D93603@gmail.com> Message-ID: [ Kyle Fisher] > ... > Even if the iterator freelist isn't in first-level cache, I'm almost certain > it would exist within last-level cache. Do you know what the cost of > fetching from this is compared to grabbing a lock like the current > _Py_Malloc does? It's infinitely more expensive than grabbing a lock ;-) As Antoine noted earlier in this thread, while obmalloc.c is sprinkled with LOCK() and UNLOCK() macros, they all expand to "nothing" - obmalloc.c doesn't actually grab any locks (it relies on the GIL to serialize threads). 
For example, LOCK is defined thusly: #define LOCK() SIMPLELOCK_LOCK(_malloc_lock) and above that there's: #define SIMPLELOCK_LOCK(lock) /* acquire released lock */ Just FYI ;-) From anthonyfk at gmail.com Sun Sep 15 21:19:30 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Sun, 15 Sep 2013 13:19:30 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: References: <0D2417A7-18FB-4520-B22B-7B2385D93603@gmail.com> Message-ID: Well, don't I feel the fool. Thanks. :-) -Kyle On 2013-09-15 1:18 PM, "Tim Peters" wrote: > [ Kyle Fisher] > > ... > > Even if the iterator freelist isn't in first-level cache, I'm almost > certain > > it would exist within last-level cache. Do you know what the cost of > > fetching from this is compared to grabbing a lock like the current > > _Py_Malloc does? > > It's infinitely more expensive than grabbing a lock ;-) As Antoine > noted earlier in this thread, while obmalloc.c is sprinkled with > LOCK() and UNLOCK() macros, they all expand to "nothing" - obmalloc.c > doesn't actually grab any locks (it relies on the GIL to serialize > threads). > > For example, LOCK is defined thusly: > > #define LOCK() SIMPLELOCK_LOCK(_malloc_lock) > > and above that there's: > > #define SIMPLELOCK_LOCK(lock) /* acquire released lock */ > > Just FYI ;-) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Sun Sep 15 21:53:55 2013 From: barry at python.org (Barry Warsaw) Date: Sun, 15 Sep 2013 15:53:55 -0400 Subject: [Python-ideas] Style for multi-line generator expressions References: Message-ID: <20130915155355.2a25a1f1@anarchist> On Sep 14, 2013, at 07:49 AM, Georg Brandl wrote: >On 09/14/2013 07:36 AM, Clay Sweetser wrote: >> PEP 8 currently lacks any suggestions for how multi-line generator expressions >> and list comprehensions should be formatted. 
>> In the absence of any official
>> style suggestion (that I can find), I suggest the style used the most in the
>> standard library.
>>
>> [<expression>
>>  for <variable> in <iterable>
>>  if <condition>]
>>
>> Note, lines could still be combined where it makes sense, eg, the first two
>> lines could be combined if they aren't too long.
>
>But that amounts to simply respecting the line length limit and using
>logical breakpoints. I don't think that requires a special mention in
>PEP 8.

Agreed.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 

From oscar.j.benjamin at gmail.com  Sun Sep 15 22:02:33 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 15 Sep 2013 21:02:33 +0100
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: <20130914230756.GQ16820@ando>
References: <20130914230756.GQ16820@ando>
Message-ID: 

On 15 September 2013 00:07, Steven D'Aprano wrote:
> On Sat, Sep 14, 2013 at 09:52:30AM -0700, David Mertz wrote:
>
>> def getexact(m, v):
>>     for x in m:
>>         if x==v: return x
>>     else:
>>         raise KeyError(v)
>
> This has the flaw that it is O(N) rather than O(1). It's really quite
> unfortunate to have dicts and sets able to access keys in (almost)
> constant time, but not be able to communicate that key back to the
> caller except by walking the entire dict/set.

I don't know whether this is relying on undefined behaviour but the
following is O(1) and seems to work:

>>> def canonical_key(d, k):
...     k, = {k} & d.keys()
...     return k
...
>>> canonical_key({1:'q', 2.0:'w'}, 1.0)
1
>>> canonical_key({1:'q', 2.0:'w'}, 2)
2.0


Oscar

From techtonik at gmail.com  Sun Sep 15 22:34:24 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Sun, 15 Sep 2013 23:34:24 +0300
Subject: [Python-ideas] AST Pretty Printer
In-Reply-To: <0759F11C-2B77-475D-9B2D-C71BD5A95582@yahoo.com>
References: <70c75000-23cb-482d-b12c-4610de34e0b1@email.android.com>
	<0759F11C-2B77-475D-9B2D-C71BD5A95582@yahoo.com>
Message-ID: 

On Fri, Sep 13, 2013 at 3:01 AM, Andrew Barnert wrote:
> On Sep 12, 2013, at 16:31, Ryan wrote:
>
>> I always encounter one problem when dealing with Python ASTs: When I
>> print it, it looks like Lisp (aka Lots of Irritated Superfluous
>> Parenthesis).
>
> Why are the parentheses irritated? Have you been taunting them? :)

These look like smiley monsta to my eyes.

Module([ImportFrom('distutils.core', [alias('setup', None)], 0),
Expr(Call(Name('setup', Load()), [], [keyword('name', Str('astdump')),
keyword('version', Str('1.0')), keyword('author', Str('anatoly
techtonik')), keyword('author_email', Str('techtonik at gmail.com')),
keyword('description', Str('Extract information from Python module
without importing it.')), keyword('license', Str('Public Domain')),
keyword('py_modules', List([Str('astdump')], Load()))], None, None))])

>> In short: it's a mess.
>>
>> My idea is an AST pretty printer built on ast.NodeVisitor. If anyone
>> finds this interesting, I can probably have a prototype of the class
>> between later today and sometime tomorrow.
>
> Yes please!
>
> I'll bet most people who play with ASTs want this, build something
> half-assed, never finish it, and lose it by the next time they look at
> ASTs again three years later... So if you finish something, that'll
> save effort for hundreds of people in the future (who have no idea
> they'll want it one day).

My version of half-assed, semi-finished, only one year fresh and code
complete for what it does.
=)

$ hg clone https://bitbucket.org/techtonik/astdump
$ cd astdump
$ ./astdump.py --generate astdump.py > setup.py
$ ./astdump.py --dump setup.py
Module
  ImportFrom
    alias
  Expr
    Call
      Name
        Load
      keyword
        Str
      keyword
        Str
      keyword
        Str
      keyword
        Str
      keyword
        Str
      keyword
        Str
      keyword
        List
          Str
          Load

Source code is in public domain, latest version:
https://bitbucket.org/techtonik/astdump/src/tip/astdump.py?at=default

The API for dumping is:

    TreeDumper().dump(root)

    class TreeDumper(ast.NodeVisitor):
      def dump(self, node, types=[], level=None, callback=None):
        """pretty-print AST tree
           if `types` is set, process only types in the list
           if `level` is set, limit output to the given depth
           `callback` (if set) will be called to process filtered node
        """

To customize, just supply a callback. Example callbacks:

    def printcb(node, level):
      nodename = node.__class__.__name__
      print(' '*level*2 + nodename)

I played with it on Python 2, but it should be runnable on Python 3
with simple print replacements.
-- 
anatoly t.

From dreamingforward at gmail.com  Sun Sep 15 23:29:10 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sun, 15 Sep 2013 14:29:10 -0700
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: <8D234C67-1B8E-4653-9AA9-86DE1E5A1EC5@gmail.com>
References: <8D234C67-1B8E-4653-9AA9-86DE1E5A1EC5@gmail.com>
Message-ID: 

> * all the participants on this list would be well served to teach
> some python classes to get an appreciation of the negative
> consequences of further expanding the APIs of the core containers.
> Bigger is not better. Learnability matters.

+1 on that. --mark

From elazarg at gmail.com  Mon Sep 16 01:39:46 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Mon, 16 Sep 2013 01:39:46 +0200
Subject: [Python-ideas] Compressing excepthook output
Message-ID: 

I suggest adding an excepthook that prints out a compressed version of
the stack trace. The new excepthook should be the default at least for
interactive mode.
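A minimal sketch of the idea, collapsing only runs of a single repeated traceback entry — all names here are invented, and real cycle detection (multi-frame cycles) needs more work, as discussed below:

```python
import sys
import traceback

def compress_repeats(entries, threshold=3):
    # Collapse any run of identical entries longer than `threshold`
    # into the first entry plus a one-line summary.
    out = []
    i = 0
    while i < len(entries):
        j = i
        while j < len(entries) and entries[j] == entries[i]:
            j += 1
        run = j - i
        if run > threshold:
            out.append(entries[i])
            out.append('  [previous line repeated %d more times]\n' % (run - 1))
        else:
            out.extend(entries[i:j])
        i = j
    return out

def compressing_excepthook(exctype, value, tb):
    # Format the traceback, squeeze repeated entries, print as usual.
    entries = traceback.format_list(traceback.extract_tb(tb))
    sys.stderr.write('Traceback (most recent call last):\n')
    sys.stderr.writelines(compress_repeats(entries))
    sys.stderr.write('%s: %s\n' % (exctype.__name__, value))

sys.excepthook = compressing_excepthook
```

This only handles direct self-recursion (the same frame repeated back to back); the hard part, as the rest of the message explains, is a cycle spanning several distinct frames.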
The use case is this: you are using an interactive interpreter, or
perhaps eclipse's PyDev, experimenting with some code. The code happens
to have an infinite recursion - maybe an erroneous boundary condition,
maybe the recursion itself was an accident. You didn't catch the
RuntimeError so you get a print of the traceback. This is by default
2000 lines of a highly repetitive call chain. Most likely, a single
cycle repeating some 300 times.

The main problem is that in most environments, by default, you have
only a limited amount of lines kept in the window. So you can't just
scroll up and see what the error was in the first place - where the
entry point into the cycle is. You have to reproduce it, and catch
RuntimeError. You can't just use prints for debugging, either, because
you won't see them. And even if you can see it, you have lost much of
your "history" for nothing.

I have tried to implement an alternative for sys.excepthook (see
below), which compresses the last simple cycle in the call graph. Turns
out it's not trivial, since the traceback object is not well documented
(and maybe it shouldn't be, as it is an implementation detail), so it's
non-trivial (if at all possible) to change the trace list in an
existing traceback. I don't think it is reasonable to just send anyone
interested in such a feature to implement it themselves - especially
given that newcomers are its main target - and even if we do, there is
no simple way to make it a default.

Such a compression will not always help, since the call graph may be
arbitrarily complex, so there has to be some threshold below which
there won't be any compression. This threshold should be chosen after
considering the number of lines accessible by default in common
environments (Linux/Windows terminals, eclipse's console, etc.).
Needless to say, the output should be correct in all cases. I am not
sure that my example implementation is.
Another suggestion, related but distinct: `class
RecursionLimitError(RuntimeError)` should be raised instead of a plain
RuntimeError. One should be able to except this specific case, and
"Exception messages are not part of the Python API".

---
Example of the desired result (non-interactive):

Traceback (most recent call last):
  File "/workspace/compress.py", line 48, in <module>
    bar()
  File "/workspace/compress.py", line 46, in bar
    p()
  File "/workspace/compress.py", line 43, in p
    def p(): p0()
  File "/workspace/compress.py", line 41, in p0
    def p0(): p2()
  File "/workspace/compress.py", line 39, in p2
    def p2(): p()
RuntimeError: maximum recursion depth exceeded
332.67 occurrences of cycle of size 3 detected

Code:

import traceback
import sys

def print_exception(name, value, count, size, newtrace):
    # this is ugly and fragile
    sys.stderr.write('Traceback (most recent call last):\n')
    sys.stderr.writelines(traceback.format_list(newtrace))
    sys.stderr.write('{}: {}\n'.format(name, value))
    sys.stderr.write('{} occurrences of cycle of size {} detected\n'.format(count, size))

def analyze_cycles(tb):
    calls = set()
    size = 0
    for i, call in enumerate(reversed(tb)):
        if size == 0:
            calls.add(call)
            if call == tb[-1]:
                size = i
        elif call not in calls:
            length = i
            break
    return size, length

def cycle_detect_excepthook(exctype, value, trace):
    if exctype is RuntimeError:
        tb = traceback.extract_tb(trace)
        # Feels like a hack here
        if len(tb) >= sys.getrecursionlimit()-1:
            size, length = analyze_cycles(tb)
            count = round(length/size, 2)
            if count >= 2:
                print_exception(exctype.__name__, value, count, size,
                                tb[:-length+size])
                return
    # fall back to the default hook (with the original traceback object)
    sys.__excepthook__(exctype, value, trace)

sys.excepthook = cycle_detect_excepthook

if __name__ == '__main__':
    def p2(): p()
    def p0(): p2()
    def p(): p0()
    def bar(): p()
    bar()

Elazar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From anthonyfk at gmail.com Mon Sep 16 01:50:53 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Sun, 15 Sep 2013 17:50:53 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects Message-ID: Hi Antoine, Thanks for taking the time to respond. Sorry I didn't see your comments earlier, I have my mailing list settings to digest and for some reason they weren't showing up in my inbox. Anyway, I agree that a real-world test case would be best. Marc-Andre tossed out "100 objects" for the free list size, but I'd like to point out that it probably doesn't need to be anywhere near that large. How many iterators are active in the interpreter simultaneously? I think we could get away with only a dozen or so. Perhaps it's best for me at this point to try out the patch in our application and see what some real world results would be. It'd also be nice if there was some other macro-benchmark that I could run this against to verify that it doesn't make things worse, which seems to be Raymond's biggest concern. Is there something like this available? Maybe even just the unit test suite? Thanks, -Kyle -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Sep 16 01:56:25 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 15 Sep 2013 18:56:25 -0500 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: References: Message-ID: [Kyle Fisher] > I've realized that my original example is far too complex, so I've > simplified it: > > Status quo: > ./python -m timeit -r 100 -s "a=[1]" "iter(a)" > 10000000 loops, best of 100: 0.0662 usec per loop > > With patch: > ./python -m timeit -r 100 -s "a=[1]" "iter(a)" > 10000000 loops, best of 100: 0.0557 usec per loop > List iter allocations: 6 > List iter reuse through freelist: 1011111554 > 100.00% reuse rate > > Which seems to show a 15% speedup. Nope! More like 19%. 
Sometime early in my career, benchmarks universally changed from
reporting speedups via:

    (old - new) / old * 100

to:

    (old - new) / new * 100

and:

>>> (0.0662 - 0.0557) / 0.0557 * 100
18.850987432675037

Why did they make this change? So that slowdowns were never shown as
worse than -100%, and especially so that there was no upper bound on
reported speedups ;-)

What I'm surprised by is that you didn't get a larger speedup. You
don't just save cycles allocating when using a free list, you also save
cycles when free'ing the object. When free'ing, obmalloc's
Py_ADDRESS_IN_RANGE alone may consume as many cycles as the free list's
combined allocation and deallocation work. While I tried to make the
common paths in obmalloc.c as fast as possible, it's still a mostly
"general purpose" allocator so has to worry about silly things a
dedicated free list can ignore (like: are they asking for 0 bytes?
asking for something small enough that I _can_ use one of my internal
free lists? if so, which one? did I pass out the pointer they're asking
me to free, or do I have to hand it off to someone else?).

I'm mostly with MAL (Marc-Andre) on this one: it's worth doing if and
only if many "real world" programs would benefit. Unfortunately,
there's never been a clear way to decide that :-(

From ncoghlan at gmail.com  Mon Sep 16 02:00:58 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 16 Sep 2013 10:00:58 +1000
Subject: [Python-ideas] Compressing excepthook output
In-Reply-To: 
References: 
Message-ID: 

Better display of recursion errors sounds reasonable to me, as does
giving them a dedicated subclass.

Step one would be coming up with test cases and a solid implementation
and display format for cyclic call detection. Once that is available,
then using it in the default excepthook for the CPython REPL is a
separate question.

Cheers,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Mon Sep 16 02:07:55 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 16 Sep 2013 10:07:55 +1000
Subject: [Python-ideas] Compressing excepthook output
In-Reply-To: References: Message-ID: <20130916000754.GB7914@ando>

On Mon, Sep 16, 2013 at 01:39:46AM +0200, ????? wrote:
> I suggest adding an excepthook that prints out a compressed version of the
> stack trace. The new excepthook should be the default at least for
> interactive mode.
[...]
> I have tried to implement an alternative for sys.excepthook (see below),
> which compresses the last simple cycle in the call graph. Turns out it's
> not trivial, since the traceback object is not well documented (and maybe
> it shouldn't be, as it is an implementation detail) so it's non trivial (if
> at all possible) to change the trace list in an existing traceback. I don't
> think it is reasonable to just send anyone interested in such a feature to
> implement it themselves - especially given that newcomers are its main
> target - and even if we do, there is no simple way to make it a default.

I like where this is going. Tracebacks for recursive function calls are extremely noisy, with the extra lines rarely giving any useful information.

Have a look at the cgitb module in the standard library.

I think you should start off by cleaning up your traceback handler to be less "ugly and fragile" (your words), if possible, and then consider publishing it on ActiveState's website as a Python recipe. That would be the first step in gathering user feedback and experience in the real world, and if it turns out to be useful in practice, at a later date we can look at adding it to the standard library.
http://code.activestate.com/recipes/ -- Steven From ncoghlan at gmail.com Mon Sep 16 02:30:25 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 16 Sep 2013 10:30:25 +1000 Subject: [Python-ideas] Compressing excepthook output In-Reply-To: <20130916000754.GB7914@ando> References: <20130916000754.GB7914@ando> Message-ID: On 16 September 2013 10:07, Steven D'Aprano wrote: > On Mon, Sep 16, 2013 at 01:39:46AM +0200, ????? wrote: >> I suggest adding an excepthook that prints out a compressed version of the >> stack trace. The new excepthook should be the default at least for >> interactive mode. > [...] >> I have tried to implement an alternative for sys.excepthook (see below), >> which compresses the last simple cycle in the call graph. Turns out it's >> not trivial, since the traceback object is not well documented (and maybe >> it shouldn't be, as it is an implementation detail) so it's non trivial (if >> at all possible) to change the trace list in an existing traceback. I don't >> think it is reasonable to just send anyone interested in such a feature to >> implement it themselves - especially given that newcomers ate its main >> target - and even if we do, there is no simple way to make it a default. > > I like where this is going. Tracebacks for recursive function calls > are extremely noisy, with the extra lines rarely giving any useful > information. > > Have a look at the cgitb module in the standard library. > > I think you should start off by cleaning up your traceback handler to be > less "ugly and fragile" (your words), if possible, and then consider > publishing it on ActiveState's website as a Python recipe. That would be > the first step in gathering user feedback and experience in the real > world, and if it turns out to be useful in practice, at a later > date we can look at adding it to the standard library. 
> > http://code.activestate.com/recipes/ Another couple of potentially useful pointers: - the traceback.py source is a good place to get more details on how traceback objects work (http://hg.python.org/cpython/file/default/Lib/traceback.py) - you may want to try out the updated traceback extraction API proposed in http://bugs.python.org/issue17911 and see if that cleans up your code. If it helps, that would be good validation of the proposed new API, if it doesn't, it may provide hints for further improvement. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Mon Sep 16 02:30:39 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 16 Sep 2013 10:30:39 +1000 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: <20130914230756.GQ16820@ando> Message-ID: <20130916003039.GC7914@ando> On Sun, Sep 15, 2013 at 09:02:33PM +0100, Oscar Benjamin wrote: > I don't know whether this is relying on undefined behaviour but the > following is O(1) and seems to work: > > >>> def canonical_key(d, k): > ... k, = {k} & d.keys() > ... return k > ... I'm pretty sure that (1) it relies on implementation-specific behaviour, and (2) it's O(N), not O(1). The implementation-specific part is whether & takes the key from the left-hand or right-hand operand when the keys are equal. For example, in Python 3.3: py> {1} & {1.0} {1.0} py> {1} & {1.0, 2.0} {1} And surely it's O(N) -- to be precise, O(M+N) -- because & has to walk all the keys in both operands? I suppose technically & could special case "one of the operands has length 1" and optimize it, but that too would be an implementation detail. 
-- Steven

From abarnert at yahoo.com Mon Sep 16 03:24:59 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 15 Sep 2013 18:24:59 -0700 (PDT)
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: References: <20130914230756.GQ16820@ando>
Message-ID: <1379294699.25404.YahooMailNeo@web184703.mail.ne1.yahoo.com>

From: Oscar Benjamin
Sent: Sunday, September 15, 2013 1:02 PM

> I don't know whether this is relying on undefined behaviour but the
> following is O(1) and seems to work:
>
> >>> def canonical_key(d, k):
> ...     k, = {k} & d.keys()
> ...     return k

I'm pretty sure it's undefined behavior. It does seem to work with the CPython and PyPy 3.x versions I have around, with every test I throw at it, and if you look through the source you can see why - but there isn't any good reason it should.

set.intersection(other) and set & other don't appear to be documented beyond "Return a new set with elements common to the set and all others" (http://docs.python.org/3/library/stdtypes.html#set.intersection). dict_keys doesn't define what its methods do, beyond saying that the type is "set-like" and implements collections.abc.Set (http://docs.python.org/3/library/stdtypes.html#dictionary-view-objects). And in fact, here you're relying on the fact that dict_keys doesn't actually do the same thing as set. {1} & {1.0, 2.0} gives you {1}, but {1} & {1.0: 0, 2.0: 0}.keys() gives you {1.0}.

As a side note, that "k, = " bit is going to give you an ugly "ValueError: need more than 0 values to unpack" instead of a nice "KeyError: 3" if k isn't in d, so you might want to wrap it in a try to convert the exception.

Meanwhile, there is something that seems like it _should_ be guaranteed to work - but it doesn't. intersection_update says "Update the set, keeping only elements found in it and all others", which seems to say you'll keep the elements in the original set. So this ought to work:

    s = set(d.keys())
    s &= {k}
    s, = s
    return s

But it doesn't.
You have to do it the other way around, which seems to be incorrect:

    s = {k}
    s &= d.keys()
    s, = s
    return s

And in fact, that's the only reason your method works. Ultimately, what {k} & d.keys() does is to call dict_keys.nb_and({k}, d.keys()). If you look at the source (http://hg.python.org/cpython/file/7df61fa27f71/Objects/dictobject.c#l3320), this is basically the backward version that works (except that it makes a copy tmp = set(s), and calls intersection_update instead of using &=).

From abarnert at yahoo.com Mon Sep 16 03:34:53 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 15 Sep 2013 18:34:53 -0700 (PDT)
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: <20130916003039.GC7914@ando>
References: <20130914230756.GQ16820@ando> <20130916003039.GC7914@ando>
Message-ID: <1379295293.70475.YahooMailNeo@web184704.mail.ne1.yahoo.com>

From: Steven D'Aprano
Sent: Sunday, September 15, 2013 5:30 PM

> On Sun, Sep 15, 2013 at 09:02:33PM +0100, Oscar Benjamin wrote:
>
>> I don't know whether this is relying on undefined behaviour but the
>> following is O(1) and seems to work:
>>
>> >>> def canonical_key(d, k):
>> ...     k, = {k} & d.keys()
>> ...     return k
>> ...
>
> I'm pretty sure that (1) it relies on implementation-specific behaviour,
> and (2) it's O(N), not O(1).
>
> The implementation-specific part is whether & takes the key from the
> left-hand or right-hand operand when the keys are equal. For example, in
> Python 3.3:
>
> py> {1} & {1.0}
> {1.0}
> py> {1} & {1.0, 2.0}
> {1}

Actually, this isn't strictly relevant, because he's calling dict_keys.__rand__, not set.__and__. There's no reason they have to do the same thing - and, in fact, they don't, which is why it works in the first place. But the larger point is valid: both methods are implementation-specific - and the fact that they produce opposite results is a nice illustration of that.
> And surely it's O(N) -- to be precise, O(M+N) -- because & has to walk
> all the keys in both operands? I suppose technically & could special
> case "one of the operands has length 1" and optimize it, but that too
> would be an implementation detail.

No it doesn't. In fact, it _can't_ walk all the keys in both operands; unless they were sorted (and they aren't), there's no way to make that work. Instead, it does an O(N) or O(M) walk over one operand, calling the other operand's __contains__ method for each. But again, the larger point is valid: whichever one it returns the values from is obviously the one it's walking. So, in his case, it's calling {k}.__contains__(n) for each key in d.keys(), which is O(N).

From tjreedy at udel.edu Mon Sep 16 03:39:34 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 15 Sep 2013 21:39:34 -0400
Subject: [Python-ideas] Add dict.getkey() and set.get()
In-Reply-To: <20130916003039.GC7914@ando>
References: <20130914230756.GQ16820@ando> <20130916003039.GC7914@ando>
Message-ID:

On 9/15/2013 8:30 PM, Steven D'Aprano wrote:
> On Sun, Sep 15, 2013 at 09:02:33PM +0100, Oscar Benjamin wrote:
>
>> I don't know whether this is relying on undefined behaviour but the
>> following is O(1) and seems to work:
>>
>>>>> def canonical_key(d, k):
>> ... k, = {k} & d.keys()
>> ... return k
>> ...
>
> I'm pretty sure that (1) it relies on implementation-specific behaviour,
> and (2) it's O(N), not O(1).
>
> The implementation-specific part is whether & takes the key from the
> left-hand or right-hand operand when the keys are equal. For example, in
> Python 3.3:
>
> py> {1} & {1.0}
> {1.0}
> py> {1} & {1.0, 2.0}
> {1}
>
> And surely it's O(N) -- to be precise, O(M+N) -- because & has to walk
> all the keys in both operands?

No, just the keys in one of the operands. That can be chosen to be the smaller of the two, making it O(min(M,N)). I believe that is what is happening above: the operands are switched when the first is smaller.
I know there was a tracker issue that added this optimization. -- Terry Jan Reedy From abarnert at yahoo.com Mon Sep 16 09:25:57 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 16 Sep 2013 00:25:57 -0700 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: References: <20130914230756.GQ16820@ando> <20130916003039.GC7914@ando> Message-ID: <511EE65A-AE97-476E-8796-C0BB9805A6A9@yahoo.com> On Sep 15, 2013, at 18:39, Terry Reedy wrote: > On 9/15/2013 8:30 PM, Steven D'Aprano wrote: >> On Sun, Sep 15, 2013 at 09:02:33PM +0100, Oscar Benjamin wrote: >> >>> I don't know whether this is relying on undefined behaviour but the >>> following is O(1) and seems to work: >>> >>>>>> def canonical_key(d, k): >>> ... k, = {k} & d.keys() >>> ... return k >>> ... >> >> I'm pretty sure that (1) it relies on implementation-specific behaviour, >> and (2) it's O(N), not O(1). >> >> The implementation-specific part is whether & takes the key from the >> left-hand or right-hand operand when the keys are equal. For example, in >> Python 3.3: >> >> py> {1} & {1.0} >> {1.0} >> py> {1} & {1.0, 2.0} >> {1} >> >> And surely it's O(N) -- to be precise, O(M+N) -- because & has to walk >> all the keys in both operands? > > No, just the keys in one of the operands. That can be chosen to be the smaller of the two, making it O(min(M,N)). I believe that is what is happening above: the operands are switched when the first is smaller. I know there was a tracker issue that added this optimization. But it doesn't happen for dict_keys, because that ends up calling intersection_update on a copy of the left operand, which means it always walks the right operand (in this case the dict_keys itself), instead of calling set_intersection, as it would on two sets. At any rate, whenever this does what's desired, it does a linear walk of all keys in the dict; conversely, any variant that only walks the single key in {k} ends up returning {k} instead of what you were looking for. 
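None of the intersection tricks above are guaranteed behaviour, and all of them end up walking the whole dict. If you control the mapping yourself, a guaranteed-O(1) getkey() can be had today with an auxiliary key-to-key dict. A hypothetical sketch, not the dict.getkey() proposal itself and not code from this thread:

```python
class KeyedDict(dict):
    """dict that can return the *stored* key object for any equal key.

    Mirrors CPython's dict behaviour of keeping the first key object
    inserted, even when the value is later overwritten via an equal key.
    Sketch only: update(), setdefault(), pop(), etc. are not covered.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._canon = {k: k for k in self}  # key -> stored key object

    def __setitem__(self, key, value):
        if key not in self._canon:  # keep the first key object seen
            self._canon[key] = key
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        del self._canon[key]

    def getkey(self, key):
        """Return the stored key equal to `key`; KeyError if absent."""
        return self._canon[key]


d = KeyedDict({1.0: 'a', 2.0: 'b'})
d[1] = 'c'            # overwrites the value; 1.0 stays the stored key
print(d.getkey(1))    # -> 1.0
print(d[1])           # -> c
```

The lookup in getkey() is a single dict access, so it is O(1) with no reliance on set-intersection internals.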
From solipsis at pitrou.net Mon Sep 16 10:17:31 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 16 Sep 2013 10:17:31 +0200
Subject: [Python-ideas] Keep free list of popular iterator objects
References: Message-ID: <20130916101731.60bb20dd@pitrou.net>

Hi,

Le Sun, 15 Sep 2013 17:50:53 -0600, Kyle Fisher a écrit :
> Hi Antoine,
>
> Thanks for taking the time to respond. Sorry I didn't see your
> comments earlier, I have my mailing list settings to digest and for
> some reason they weren't showing up in my inbox. Anyway, I agree
> that a real-world test case would be best. Marc-Andre tossed out
> "100 objects" for the free list size, but I'd like to point out that
> it probably doesn't need to be anywhere near that large.

I agree. There can't be that many dict iterators in flight at a given time :-)

> Perhaps it's best for me at this point to try out the patch in our
> application and see what some real world results would be. It'd also
> be nice if there was some other macro-benchmark that I could run this
> against to verify that it doesn't make things worse, which seems to
> be Raymond's biggest concern. Is there something like this
> available? Maybe even just the unit test suite?

We have a benchmark suite here:

http://hg.python.org/benchmarks/

It spans the range between micro and macro, but not to the point of running wholesale applications.

Regards
Antoine.
From oscar.j.benjamin at gmail.com Mon Sep 16 11:16:10 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 16 Sep 2013 10:16:10 +0100 Subject: [Python-ideas] Add dict.getkey() and set.get() In-Reply-To: <511EE65A-AE97-476E-8796-C0BB9805A6A9@yahoo.com> References: <20130914230756.GQ16820@ando> <20130916003039.GC7914@ando> <511EE65A-AE97-476E-8796-C0BB9805A6A9@yahoo.com> Message-ID: On 16 September 2013 08:25, Andrew Barnert wrote: > On Sep 15, 2013, at 18:39, Terry Reedy wrote: > >> On 9/15/2013 8:30 PM, Steven D'Aprano wrote: >>> On Sun, Sep 15, 2013 at 09:02:33PM +0100, Oscar Benjamin wrote: >>> >>>> I don't know whether this is relying on undefined behaviour but the >>>> following is O(1) and seems to work: >>>> >>>>>>> def canonical_key(d, k): >>>> ... k, = {k} & d.keys() >>>> ... return k >>>> ... >>> >>> I'm pretty sure that (1) it relies on implementation-specific behaviour, >>> and (2) it's O(N), not O(1). >>> >>> The implementation-specific part is whether & takes the key from the >>> left-hand or right-hand operand when the keys are equal. For example, in >>> Python 3.3: >>> >>> py> {1} & {1.0} >>> {1.0} >>> py> {1} & {1.0, 2.0} >>> {1} >>> >>> And surely it's O(N) -- to be precise, O(M+N) -- because & has to walk >>> all the keys in both operands? >> >> No, just the keys in one of the operands. That can be chosen to be the smaller of the two, making it O(min(M,N)). I believe that is what is happening above: the operands are switched when the first is smaller. I know there was a tracker issue that added this optimization. > > But it doesn't happen for dict_keys, because that ends up calling intersection_update on a copy of the left operand, which means it always walks the right operand (in this case the dict_keys itself), instead of calling set_intersection, as it would on two sets. 
> > At any rate, whenever this does what's desired, it does a linear walk of all keys in the dict; conversely, any variant that only walks the single key in {k} ends up returning {k} instead of what you were looking for. Ah, right you are. That's unfortunate since it means that set intersection semantics are determined by an optimisation and dict_key intersection lacks an obvious optimisation (to iterate over the smaller set). Note that the optimisation is not incompatible with having a defined "take keys from the left (or from the right)" semantic since the set_contains_entry function that performs the lookup can access the matching key from the same lookup used for the containment test: http://hg.python.org/cpython/file/95b3efe3d7b7/Objects/setobject.c#l689 Oscar From anthonyfk at gmail.com Mon Sep 16 14:37:52 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Mon, 16 Sep 2013 06:37:52 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: <20130916101731.60bb20dd@pitrou.net> References: <20130916101731.60bb20dd@pitrou.net> Message-ID: Fantastic, thank you. -Kyle -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Sep 16 20:49:11 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 16 Sep 2013 14:49:11 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions Message-ID: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> The practice of using OS functions for time handling has its worst effects on Windows, where many functions are unable to process times from before 1970-01-01 even though there is no reason for Python to have such a limitation. It also results in uneven support for strftime specifiers. Some of these functions also suffer from the Year 2038 problem on OSes with a 32-bit time_t type. 
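The 1970 and 2038 limits are properties of the C library, not of the arithmetic: plain datetime arithmetic already models POSIX timestamps on either side of both limits, on any platform. A minimal sketch using only documented datetime APIs (the function names are illustrative, not from any patch):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def portable_timestamp(dt):
    """POSIX-style timestamp (no leap seconds) for an aware datetime;
    works before 1970 and after 2038, independent of the C time_t."""
    return (dt - EPOCH) / timedelta(seconds=1)

def portable_utc(ts):
    """Inverse: aware UTC datetime for any timestamp in datetime's range."""
    return EPOCH + timedelta(seconds=ts)

print(portable_timestamp(datetime(1969, 12, 31, 23, 59, tzinfo=timezone.utc)))  # -> -60.0
print(portable_utc(2**31))  # -> 2038-01-19 03:14:08+00:00
```

The round trip is exact for any date the datetime type can represent (years 1 through 9999), which is far wider than a 32-bit time_t.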
I propose supplying pure-python implementations (in accordance with PEP 399) for the entire datetime module, and additionally the asctime, strftime, strptime, and gmtime functions in the time module, and calendar.timegm. Unfortunately, functions dealing with local time stamps in the system's idea of local time are still dependent on the platform's C library functions (localtime, mktime, ctime) Or, if this is not practical, supplying alternate implementations of the relevant C functions, and calling these instead wherever these are used. If it is practical to do so, these functions should use python integers as the type for timestamps; if not, they should use 64-bit integers in preference to the platform time_t. Is it reasonable to expose the possibility of an epoch other than 1970 (or of timestamps that handle leap seconds in a different manner than POSIX) at a python level? Even if such a platform ever comes to be supported, it could be done so with a layer that hides these differences. From alexander.belopolsky at gmail.com Mon Sep 16 21:02:13 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 16 Sep 2013 15:02:13 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> Message-ID: On Mon, Sep 16, 2013 at 2:49 PM, wrote: > I propose supplying pure-python implementations (in accordance with PEP > 399) for the entire datetime module > We already have that in python 3.x: http://bugs.python.org/issue7989 I believe it still has some platform dependencies through the time module. The idea to provide pure python implementation of the time module was proposed and rejected: http://bugs.python.org/issue9528 If you would like to improve cross-platform compatibility in this area, I would start with re-implementation of strftime(). 
See http://bugs.python.org/issue3173 -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Mon Sep 16 21:01:18 2013 From: phd at phdru.name (Oleg Broytman) Date: Mon, 16 Sep 2013 23:01:18 +0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> Message-ID: <20130916190118.GA12268@iskra.aviel.ru> On Mon, Sep 16, 2013 at 02:49:11PM -0400, random832 at fastmail.us wrote: > I propose supplying pure-python implementations (in accordance with PEP > 399) for the entire datetime module [...] > Or, if this is not practical, supplying alternate implementations of the > relevant C functions There is a well-known module mx.DateTime. It is not a drop-in replacement for module datetime, but it's quite good for its task and has excellent documentation. eGenix provides binaries for all major OSes and Python versions under a liberal open source license. Take a look at: http://www.egenix.com/products/python/mxBase/mxDateTime/ Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From rob.cliffe at btinternet.com Tue Sep 17 00:48:24 2013 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 16 Sep 2013 23:48:24 +0100 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> Message-ID: <52378AB8.8070608@btinternet.com> From the sublime to the, er ... plebeian? Just an idea for Python 4: Is there any good reason to have separate time and datetime modules? I sometimes find myself spinning my wheels converting between a format supported by one and a format supported by the other. 
Rob Cliffe

On 16/09/2013 19:49, random832 at fastmail.us wrote:
> The practice of using OS functions for time handling has its worst
> effects on Windows, where many functions are unable to process times
> from before 1970-01-01 even though there is no reason for Python to have
> such a limitation. It also results in uneven support for strftime
> specifiers. Some of these functions also suffer from the Year 2038
> problem on OSes with a 32-bit time_t type.
>
> I propose supplying pure-python implementations (in accordance with PEP
> 399) for the entire datetime module, and additionally the asctime,
> strftime, strptime, and gmtime functions in the time module, and
> calendar.timegm. Unfortunately, functions dealing with local time stamps
> in the system's idea of local time are still dependent on the platform's
> C library functions (localtime, mktime, ctime)
>
> Or, if this is not practical, supplying alternate implementations of the
> relevant C functions, and calling these instead wherever these are used.
> If it is practical to do so, these functions should use python integers
> as the type for timestamps; if not, they should use 64-bit integers in
> preference to the platform time_t.
>
> Is it reasonable to expose the possibility of an epoch other than 1970
> (or of timestamps that handle leap seconds in a different manner than
> POSIX) at a python level? Even if such a platform ever comes to be
> supported, it could be done so with a layer that hides these
> differences.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
From ben+python at benfinney.id.au Tue Sep 17 01:14:11 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Tue, 17 Sep 2013 09:14:11 +1000
Subject: [Python-ideas] Continued support for ‘time’ and ‘datetime’ modules (was: Reduce platform dependence of date and time related functions)
References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <52378AB8.8070608@btinternet.com>
Message-ID: <7wa9jcz7ek.fsf_-_@benfinney.id.au>

Rob Cliffe writes:

> From the sublime to the, er ... plebeian?

When changing the subject of discussion, please change the Subject field accordingly.

> Just an idea for Python 4: Is there any good reason to have separate
> time and datetime modules?

That's how it's been for a long time. There is now a lot of existing Python code that uses those two modules as they are.

This would not be a good reason for *introducing* such a pair of modules with confusingly-different APIs. But that's not the decision we face today, many years after those modules entered the standard library.

Changes to the standard library API, especially for modules that are in long-established use, must be considered conservatively. And that *is* a good reason to continue having ‘time’ and ‘datetime’ modules which both support the existing behaviour.

> I sometimes find myself spinning my wheels converting between a format
> supported by one and a format supported by the other.

That's a different matter, and does not challenge the continued existence of separate ‘time’ and ‘datetime’ modules. The ‘datetime’ module has grown functionality for working with the data types of the ‘time’ module. What conversions are you lacking from the current ‘datetime’?

--
\ “Pinky, are you pondering what I'm pondering?”
“I think so, | `\ Brain, but if the plural of mouse is mice, wouldn't the plural | _o__) of spouse be spice?” -- _Pinky and The Brain_ | Ben Finney

From steve at pearwood.info Tue Sep 17 01:24:44 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 17 Sep 2013 09:24:44 +1000
Subject: [Python-ideas] Continued support for ‘time’ and ‘datetime’ modules (was: Reduce platform dependence of date and time related functions)
In-Reply-To: <7wa9jcz7ek.fsf_-_@benfinney.id.au>
References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <52378AB8.8070608@btinternet.com> <7wa9jcz7ek.fsf_-_@benfinney.id.au>
Message-ID: <20130916232443.GF19939@ando>

On Tue, Sep 17, 2013 at 09:14:11AM +1000, Ben Finney wrote:
> Rob Cliffe writes:
> > Just an idea for Python 4: Is there any good reason to have separate
> > time and datetime modules?
>
> That's how it's been for a long time. There is now a lot of existing
> Python code that uses those two modules as they are.
[...]
> Changes to the standard library API, especially for modules that are in
> long-established use, must be considered conservatively. And that *is* a
> good reason to continue having ‘time’ and ‘datetime’ modules which both
> support the existing behaviour.

Agreed. But I suggest to Rob, or anyone else who likes the idea of merging the two modules and is willing to do the work, to start off by creating an interface module that wraps the two. Call it (for lack of a better name) "mytime". When the "mytime" module is sufficiently mature, which may require publishing it on PyPI for the public to use, it could potentially be added to the standard library as a high level interface to the lower-level time and datetime modules. That doesn't need to wait for Python 4000.

I'm +0 on the general idea. I don't use either module enough to be annoyed by there being two of them. (Three if you include calendar.)
-- Steven From random832 at fastmail.us Tue Sep 17 15:22:14 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 09:22:14 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> Message-ID: <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> I have an addition to this proposal: struct_time should always provide tm_gmtoff and tm_zone, gmtime should populate them with 0 and GMT*, and if the platform does not provide values localtime should populate them with timezone or altzone and values from tzname depending on if isdst is true after calling the platform localtime function. *The current practice of the reference code of the "tz" project and of at least glibc is to use GMT. If anyone has an argument that it should be UTC or some other value on some platforms, please speak up. From victor.stinner at gmail.com Tue Sep 17 15:31:31 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 17 Sep 2013 15:31:31 +0200 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> Message-ID: 2013/9/17 : > I have an addition to this proposal: struct_time should always provide > tm_gmtoff and tm_zone, gmtime should populate them with 0 and GMT*, and > if the platform does not provide values localtime should populate them > with timezone or altzone and values from tzname depending on if isdst is > true after calling the platform localtime function. In Python, "unknown" is usually written None. It's safer than filling the structure with invalid values. 
Victor From random832 at fastmail.us Tue Sep 17 18:01:43 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 12:01:43 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> Message-ID: <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> On Mon, Sep 16, 2013, at 15:02, Alexander Belopolsky wrote: > On Mon, Sep 16, 2013 at 2:49 PM, wrote: > > I propose supplying pure-python implementations (in accordance with PEP > > 399) for the entire datetime module > > We already have that in python 3.x: > > http://bugs.python.org/issue7989 Sorry - it was unclear to me that simply clicking "browse" from http://hg.python.org/cpython/ did not result in browsing the latest source. (What branch is that? It's not "default") > The idea to provide pure python implementation of the time module was > proposed and rejected: > > http://bugs.python.org/issue9528 This is a much more limited scope than that. I was merely proposing a limited set of functions - this could be implemented in the same way as the posix module, with a small pure python module that imports everything from the larger C module. These could simply be implemented in C instead - are we guaranteed to have a 64-bit integer type available? My main concern (for pure python vs C) was whether or not it is possible to work with greater than 32 bit values on a 32 bit system. If necessary we could do some of the work in double - the input is double, anyway, so it won't be outside that range. 
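Whether pure Python is up to the job is easy to check: the broken-down-time arithmetic needs only Python's arbitrary-precision integers, with no time_t anywhere, so the 1970/2038 limits simply never arise. A sketch of a range-unlimited gmtime core, using the standard era-based civil-from-days algorithm (illustrative only, not code from any proposed patch):

```python
def portable_gmtime_fields(ts):
    """Broken-down UTC time from an integer POSIX timestamp, using only
    Python integer arithmetic -- no time_t, hence no 1970/2038 limits.
    Returns a plain (year, month, day, hour, minute, second) tuple,
    not a struct_time. Era-based civil-from-days algorithm."""
    days, rem = divmod(ts, 86400)          # floor division: works for ts < 0
    hour, rem = divmod(rem, 3600)
    minute, second = divmod(rem, 60)
    days += 719468                          # shift epoch from 1970-01-01 to 0000-03-01
    era, doe = divmod(days, 146097)         # 400-year eras of 146097 days
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    y = yoe + era * 400
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153               # month index, March-based
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    if month <= 2:                          # Jan/Feb belong to the next civil year
        y += 1
    return (y, month, day, hour, minute, second)

print(portable_gmtime_fields(0))      # -> (1970, 1, 1, 0, 0, 0)
print(portable_gmtime_fields(-1))     # -> (1969, 12, 31, 23, 59, 59)
print(portable_gmtime_fields(2**31))  # -> (2038, 1, 19, 3, 14, 8)
```

Because Python's divmod floors toward negative infinity, negative timestamps (pre-1970) fall out of the same code path with no special-casing.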
Do you have any thoughts on the rest of the proposal (that gmtime, timegm, and strftime should have unlimited - or at least not limited to low platform-specific limits like 1970 or 2038 - range, that python "epoch timestamps" should be defined as beginning in 1970 and not including leap seconds regardless of hypothetical [I don't believe any currently supported systems actually do, except to the extent that individual Unix sites can use so-called "right" tz data] systems that may have a time_t that behaves otherwise, that tm_gmtoff and tm_zone should always be provided)? One concern for strftime in particular is locale support. It may be difficult to query the relevant locale data in a portable manner. From alexander.belopolsky at gmail.com Tue Sep 17 18:11:46 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 17 Sep 2013 12:11:46 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 12:01 PM, wrote: > > We already have that in python 3.x: > > > > http://bugs.python.org/issue7989 > > Sorry - it was unclear to me that simply clicking "browse" from > http://hg.python.org/cpython/ did not result in browsing the latest > source. (What branch is that? It's not "default") http://hg.python.org/cpython/file/default/Lib/datetime.py -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Tue Sep 17 18:19:11 2013 From: brett at python.org (Brett Cannon) Date: Tue, 17 Sep 2013 12:19:11 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 12:01 PM, wrote: > On Mon, Sep 16, 2013, at 15:02, Alexander Belopolsky wrote: > > On Mon, Sep 16, 2013 at 2:49 PM, wrote: > > > I propose supplying pure-python implementations (in accordance with PEP > > > 399) for the entire datetime module > > > > We already have that in python 3.x: > > > > http://bugs.python.org/issue7989 > > Sorry - it was unclear to me that simply clicking "browse" from > http://hg.python.org/cpython/ did not result in browsing the latest > source. (What branch is that? It's not "default") > Depends on the last commit (it's an hgweb thing; always specify the branch). > > > The idea to provide pure python implementation of the time module was > > proposed and rejected: > > > > http://bugs.python.org/issue9528 > > This is a much more limited scope than that. I was merely proposing a > limited set of functions - this could be implemented in the same way as > the posix module, with a small pure python module that imports > everything from the larger C module. These could simply be implemented > in C instead - are we guaranteed to have a 64-bit integer type > available? My main concern (for pure python vs C) was whether or not it > is possible to work with greater than 32 bit values on a 32 bit system. > If necessary we could do some of the work in double - the input is > double, anyway, so it won't be outside that range. 
> > Do you have any thoughts on the rest of the proposal (that gmtime, > timegm, and strftime should have unlimited - or at least not limited to > low platform-specific limits like 1970 or 2038 - range, that python > "epoch timestamps" should be defined as beginning in 1970 and not > including leap seconds regardless of hypothetical [I don't believe any > currently supported systems actually do, except to the extent that > individual Unix sites can use so-called "right" tz data] systems that > may have a time_t that behaves otherwise, that tm_gmtoff and tm_zone > should always be provided)? > > One concern for strftime in particular is locale support. It may be > difficult to query the relevant locale data in a portable manner. You also have the issue that if you port strftime then you lose the pure Python port of strptime: http://hg.python.org/cpython/file/default/Lib/_strptime.py -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 17 18:23:39 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 17 Sep 2013 12:23:39 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 12:01 PM, wrote: > Do you have any thoughts on the rest of the proposal (that gmtime, > timegm, and strftime should have unlimited - or at least not limited to > low platform-specific limits like 1970 or 2038 - range, that python > "epoch timestamps" should be defined as beginning in 1970 and not > including leap seconds regardless of hypothetical [I don't believe any > currently supported systems actually do, except to the extent that > individual Unix sites can use so-called "right" tz data] systems that > may 
have a time_t that behaves otherwise, that tm_gmtoff and tm_zone > should always be provided)? > You should review what's new in 3.x documents. Many of the features that you ask for have already been implemented. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Tue Sep 17 18:27:40 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 12:27:40 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> Message-ID: <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> On Tue, Sep 17, 2013, at 12:19, Brett Cannon wrote: > You also have the issue that if you port strftime then you lose the pure > Python port of strptime: > http://hg.python.org/cpython/file/default/Lib/_strptime.py Why would that make you lose that? I'm not sure I understand. From random832 at fastmail.us Tue Sep 17 18:49:23 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 12:49:23 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> Message-ID: <1379436563.9158.23113441.7127505C@webmail.messagingengine.com> On Tue, Sep 17, 2013, at 12:23, Alexander Belopolsky wrote: > You should review what's new in 3.x documents. Many of the features that > you ask for have already been implemented. To what are you referring? 3.4 what's new mentions no changes related to the time module. 3.3 mentions only new functions unrelated to time conversions. The change mentioned in 3.2 does not fix limitations caused by the platform. 32-bit platforms are still limited by the range of time_t for gmtime [and e.g. 
datetime.fromtimestamp], and MSVC, while having a 64-bit time_t, is limited to positive values (and arbitrarily imposes the same limitation on functions that accept a struct tm, rejecting any time that would, interpreted as local time, result in a value before 1970-01-01 00:00:00 GMT) 3.1 and 3.0 mention no changes to the time module. All of the issues I mentioned apply to 3.3 (You may not have noticed the range issue as it may not apply to your platform, and by "should always be provided" i meant _always_, even if the platform doesn't provide them - they can be populated from timezone/altzone and tzname in that case), and the epoch/leap second thing is still clearly present in the 3.4 docs. I personally confirmed every single issue I mentioned except for the one about a pure-python implementation of datetime (which was because I was misled by the web hg browser), and except for the year 2038 limitation that does not apply on this system, on this version: Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32 From brett at python.org Tue Sep 17 19:02:20 2013 From: brett at python.org (Brett Cannon) Date: Tue, 17 Sep 2013 13:02:20 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 12:27 PM, wrote: > On Tue, Sep 17, 2013, at 12:19, Brett Cannon wrote: > > You also have the issue that if you port strftime then you lose the pure > > Python port of strptime: > > http://hg.python.org/cpython/file/default/Lib/_strptime.py > > Why would that make you lose that? I'm not sure I understand. > strptime is implemented using strftime to get the locale information. 
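For reference, the bootstrapping trick is that locale names can be recovered by formatting known dates with strftime and reading the results back - roughly like this sketch (simplified; the real `_strptime.LocaleTime` also handles month names, AM/PM strings, and ambiguous cases):

```python
import time

def discover_locale_day_names():
    # strftime's %a/%A expand tm_wday using the current locale, so
    # formatting each weekday number once recovers the locale's names.
    # (Simplified sketch of the technique _strptime relies on.)
    abbrev, full = [], []
    for wday in range(7):  # 0 = Monday in struct_time
        # 1999-03-15 was a Monday; step through one week so every field
        # in the struct_time stays self-consistent.
        t = time.struct_time((1999, 3, 15 + wday, 12, 0, 0, wday, 74 + wday, 0))
        abbrev.append(time.strftime('%a', t))
        full.append(time.strftime('%A', t))
    return abbrev, full

print(discover_locale_day_names()[0])
```

This is why the two functions are coupled: whichever one is reimplemented in Python still needs the other (or raw locale data) to know what "Tue" is called in the current locale.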
As you pointed out, getting the locale details is essentially not possible in a cross-platform way unless you use strptime or strftime, so you have to choose which is implemented in Python and relies on the other. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 17 19:29:43 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 17 Sep 2013 13:29:43 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379436563.9158.23113441.7127505C@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379436563.9158.23113441.7127505C@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 12:49 PM, wrote: > 32-bit platforms are still limited by the range of > time_t for gmtime [and e.g. datetime.fromtimestamp], > datetime.fromtimestamp() is not the same as gmtime. You should use datetime.utcfromtimestamp() which is only limited by supported date range (years 1-9999). -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 17 19:41:45 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 17 Sep 2013 13:41:45 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 1:02 PM, Brett Cannon wrote: > As you pointed out, getting the locale details is essentially not possible > in a cross-platform way unless you use strptime or strftime, so you have to > choose which is implemented in Python and relies on the other. 
What we can do is to implement "C" locale behavior. In fact, in many uses of strftime() its locale-dependence is a problem. I would much rather have strftime_l()-like function and "C" locale implemented in stdlib. This is somewhat similar to the situation we have with timezone support: include utc timezone and leave it to third parties to supply interfaces to platform tz databases. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Tue Sep 17 21:08:49 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 15:08:49 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379436563.9158.23113441.7127505C@webmail.messagingengine.com> Message-ID: <1379444929.17672.23175609.4503967D@webmail.messagingengine.com> On Tue, Sep 17, 2013, at 13:29, Alexander Belopolsky wrote: > On Tue, Sep 17, 2013 at 12:49 PM, wrote: > > > 32-bit platforms are still limited by the range of > > time_t for gmtime [and e.g. datetime.fromtimestamp], > > > > datetime.fromtimestamp() is not the same as gmtime. You should use > datetime.utcfromtimestamp() which is only limited by supported date > range > (years 1-9999). fromtimestamp(timestamp, timezone.utc). And anyway, I was listing it as _another example_ of a function in datetime which is limited by the range of time_t, not as one that is somehow "the same as" gmtime. And even if you want to play this game, you are WRONG WRONG WRONG about utcfromtimestamp: Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> from datetime import * >>> datetime.utcfromtimestamp(-100000) # should be 1969-12-30 20:13:20 Traceback (most recent call last): File "<stdin>", line 1, in <module> OSError: [Errno 22] Invalid argument >>> datetime.utcfromtimestamp(2**63) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: timestamp out of range for platform time_t (I don't care, per se, about 300 billion years from now, but I am 99% certain I'd get the same result for the latter with 2**31 on 32-bit Unix. This was to illustrate that it requires it to be in the range of the platform time_t type.) I feel like you're being deliberately obtuse at this point. From alexander.belopolsky at gmail.com Tue Sep 17 21:24:35 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 17 Sep 2013 15:24:35 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379444929.17672.23175609.4503967D@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379436563.9158.23113441.7127505C@webmail.messagingengine.com> <1379444929.17672.23175609.4503967D@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 3:08 PM, wrote: > fromtimestamp(timestamp, timezone.utc). > > And anyway, I was listing it as _another example_ of a function in > datetime which is limited by the range of time_t, not as one that is > somehow "the same as" gmtime. And even if you want to play this game, > you are WRONG WRONG WRONG about utcfromtimestamp: > I would say this is a bug. Is fromtimestamp(timestamp, timezone.utc) similarly affected? Please submit a bug report. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anthonyfk at gmail.com Tue Sep 17 22:18:32 2013 From: anthonyfk at gmail.com (Kyle Fisher) Date: Tue, 17 Sep 2013 14:18:32 -0600 Subject: [Python-ideas] Keep free list of popular iterator objects In-Reply-To: <20130916101731.60bb20dd@pitrou.net> References: <20130916101731.60bb20dd@pitrou.net> Message-ID: Story time. I was able to make a build at work with freelists enabled for iterators in dictobject.c, listobject.c and iterobject.c. When running this through our application I saw: 1) When loading several datapoints from database: 0.1% improvement (with a wider-but-forgotten standard deviation). So, no improvement but no ruined performance either. Makes sense since this was mostly an I/O bound task. 2) When parsing in-memory data files: 1.5% improvement. This is approximately what I was expecting, so far so good! At this point I decided to run the benchmark suite Antoine pointed me to. I also realized that I had been testing without some optimizations turned on. I made two new builds, both with "-O3 -DNDEBUG -march=native" and profile guided optimizations turned on. I then added a benchmark to explicitly test tight inner loops. I ran the benchmarks and saw... a 1.02x improvement on the benchmark I made and a 1.04x slow down on two others (nbody, slowunpickle). I then ran our application again and confirmed that all initial speed ups I saw were now lost in the noise. So, thank you everyone for letting me entertain this idea, but it looks like Raymond's hunch was right. :) Cheers, -Kyle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.us Tue Sep 17 22:58:17 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 16:58:17 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> Message-ID: <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> On Tue, Sep 17, 2013, at 9:31, Victor Stinner wrote: > 2013/9/17 : > > I have an addition to this proposal: struct_time should always provide > > tm_gmtoff and tm_zone, gmtime should populate them with 0 and GMT*, and > > if the platform does not provide values localtime should populate them > > with timezone or altzone and values from tzname depending on if isdst is > > true after calling the platform localtime function. > > In Python, "unknown" is usually written None. It's safer than filling > the structure with invalid values. They're not unknown. The values are provided by the system in global variables. If timezone, altzone, and tzname should not be used, then they should not be provided. You can also determine gmtoff empirically by calling timegm and subtracting the original timestamp from the result. Or you could look at the seconds, minutes, hours, year, and yday members after calling both gmtime and localtime in the first place. 
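That empirical computation fits in a couple of lines (a sketch; it assumes, as proposed above, that timestamps count seconds since 1970 with no leap seconds):

```python
import calendar
import time

def empirical_gmtoff(ts):
    # Render the instant as local broken-down time, then ask "what
    # timestamp would this be if it were UTC?"  The difference is the
    # local UTC offset at that instant, with no tm_gmtoff support needed.
    return calendar.timegm(time.localtime(ts)) - int(ts)

print(empirical_gmtoff(int(time.time())))
```

Because it uses the actual localtime result, this also gets the DST-adjusted offset right, unlike reading timezone/altzone directly.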
From alexander.belopolsky at gmail.com Tue Sep 17 23:21:27 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 17 Sep 2013 17:21:27 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> Message-ID: On Tue, Sep 17, 2013 at 4:58 PM, wrote: > > You can also determine gmtoff empirically by calling timegm and > subtracting the original timestamp from the result. Or you could look at > the seconds, minutes, hours, year, and yday members after calling both > gmtime and localtime in the first place. How is this different from what we do in datetime.astimezone()? # Compute UTC offset and compare with the value implied # by tm_isdst. If the values match, use the zone name # implied by tm_isdst. delta = local - datetime(*_time.gmtime(ts)[:6]) dst = _time.daylight and localtm.tm_isdst > 0 gmtoff = -(_time.altzone if dst else _time.timezone) if delta == timedelta(seconds=gmtoff): tz = timezone(delta, _time.tzname[dst]) else: tz = timezone(delta) http://hg.python.org/cpython/file/default/Lib/datetime.py#l1500 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Tue Sep 17 23:15:51 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 17 Sep 2013 14:15:51 -0700 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> Message-ID: <5238C687.7080207@stoneleaf.us> On 09/17/2013 01:58 PM, random832 wrote: > On Tue, Sep 17, 2013, at 9:31, Victor Stinner wrote: >> >> In Python, "unknown" is usually written None. It's safer than filling >> the structure with invalid values. > > You can also determine gmtoff empirically by calling timegm and > subtracting the original timestamp from the result. Or you could look at > the seconds, minutes, hours, year, and yday members after calling both > gmtime and localtime in the first place. Is timegm/gmtime provided and consistent across all Python platforms? -- ~Ethan~ From random832 at fastmail.us Wed Sep 18 00:30:39 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 18:30:39 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> Message-ID: <1379457039.10549.23252137.42075987@webmail.messagingengine.com> On Tue, Sep 17, 2013, at 17:21, Alexander Belopolsky wrote: > On Tue, Sep 17, 2013 at 4:58 PM, wrote: > > > > You can also determine gmtoff empirically by calling timegm and > > subtracting the original timestamp from the result. 
Or you could look at > > the seconds, minutes, hours, year, and yday members after calling both > > gmtime and localtime in the first place. > > > How is this different from what we do in datetime.astimezone()? Not very different at all, except for the fact where I want the functionality in struct_time to populate tm_gmtoff and tm_zone where it's not available. My goal is to normalize the functionality available on all platforms, to the extent that it's possible, so that people are less likely to write non-portable code and encounter example code that doesn't work. From random832 at fastmail.us Wed Sep 18 03:30:10 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 17 Sep 2013 21:30:10 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <5238C687.7080207@stoneleaf.us> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> Message-ID: <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> On Tue, Sep 17, 2013, at 17:15, Ethan Furman wrote: > Is timegm/gmtime provided and consistent across all Python platforms? Part of what I was proposing was _to_ provide a consistent implementation - there's no reason (if we define timestamps as being objectively based in 1970 and having no leap seconds) that it couldn't be provided in python itself instead of using the system's version. 
From ncoghlan at gmail.com Wed Sep 18 03:37:05 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Sep 2013 11:37:05 +1000 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> Message-ID: On 18 September 2013 11:30, wrote: > On Tue, Sep 17, 2013, at 17:15, Ethan Furman wrote: >> Is timegm/gmtime provided and consistent across all Python platforms? > > Part of what I was proposing was _to_ provide a consistent > implementation - there's no reason (if we define timestamps as being > objectively based in 1970 and having no leap seconds) that it couldn't > be provided in python itself instead of using the system's version. Yeah, this is a similar change to the one that was made for math.c years ago - stepping up from merely relying on the system libraries to ensuring a consistent cross-platform experience. It's just a concern with initial development and long term maintenance effort, rather than a fundamental desire to expose the raw platform behaviour (there are *some* modules where we want to let developers have access to the underlying platform specific behaviour, but the datetime APIs aren't really one of them) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Wed Sep 18 09:42:18 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 18 Sep 2013 09:42:18 +0200 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> Message-ID: <5239595A.7020208@egenix.com> On 18.09.2013 03:37, Nick Coghlan wrote: > On 18 September 2013 11:30, wrote: >> On Tue, Sep 17, 2013, at 17:15, Ethan Furman wrote: >>> Is timegm/gmtime provided and consistent across all Python platforms? >> >> Part of what I was proposing was _to_ provide a consistent >> implementation - there's no reason (if we define timestamps as being >> objectively based in 1970 and having no leap seconds) that it couldn't >> be provided in python itself instead of using the system's version. > > Yeah, this is a similar change to the one that was made for math.c > years ago - stepping up from merely relying on the system libraries to > ensuring a consistent cross-platform experience. It's just a concern > with initial development and long term maintenance effort, rather than > a fundamental desire to expose the raw platform behaviour (there are > *some* modules where we want to let developers have access to the > underlying platform specific behaviour, but the datetime APIs aren't > really one of them) I wonder why you'd want to use Unix ticks (what datetime calls a timestamp) as basis for cross-platform date/time calculations. If you really need a time_t representation of date/time values, you're stuck with the platform dependent limitations anyway. The time C functions are useful to tap into the OS's time zone library, but time zone data changes regularly, so predictions that go even only a few years into the future are bound to fail for some zones. 
You can only reliably use UTC/GMT for absolute future date/time values. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 18 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-20: PyCon UK 2013, Coventry, UK ... 2 days to go 2013-09-28: PyDDF Sprint ... 10 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From random832 at fastmail.us Wed Sep 18 15:25:08 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 18 Sep 2013 09:25:08 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <5239595A.7020208@egenix.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> <5239595A.7020208@egenix.com> Message-ID: <1379510708.3739.23500557.2C09DC9D@webmail.messagingengine.com> On Wed, Sep 18, 2013, at 3:42, M.-A. Lemburg wrote: > I wonder why you'd want to use Unix ticks (what datetime calls a > timestamp) as basis for cross-platform date/time calculations. Because we've already got half a dozen APIs that use them. And there's no particular reason to consider it _worse_ than any other scalar time representation. 
If we were defining the library from scratch today, we could argue the merits of using days vs seconds vs microseconds as the unit, of 1970 vs 1904 vs 1600 vs 0000 for the epoch, and whether leap seconds should be supported. But we've already got APIs that use time_t (and all supported platforms define time_t as seconds since 1970) From mal at egenix.com Wed Sep 18 15:34:31 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 18 Sep 2013 15:34:31 +0200 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379510708.3739.23500557.2C09DC9D@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> <5239595A.7020208@egenix.com> <1379510708.3739.23500557.2C09DC9D@webmail.messagingengine.com> Message-ID: <5239ABE7.1040005@egenix.com> On 18.09.2013 15:25, random832 at fastmail.us wrote: > On Wed, Sep 18, 2013, at 3:42, M.-A. Lemburg wrote: >> I wonder why you'd want to use Unix ticks (what datetime calls a >> timestamp) as basis for cross-platform date/time calculations. > > Because we've already got half a dozen APIs that use them. And there's > no particular reason to consider it _worse_ than any other scalar time > representation. > > If we were defining the library from scratch today, we could argue the > merits of using days vs seconds vs microseconds as the unit, of 1970 vs > 1904 vs 1600 vs 0000 for the epoch, and whether leap seconds should be > supported. But we've already got APIs that use time_t (and all supported > platforms define time_t as seconds since 1970) Right, but those APIs are all limited to what the platforms defines as t_time and like you say: those values are often limited to certain ranges. 
If you want platform independent representations, use one of the available conversion routines to turn the time_t values into e.g. datetime objects and ideally convert the values to UTC to avoid time zone issues. Then use those objects for date/time calculations. time_t values are really not a good basis for doing date/time calculations. Ideally, they should only be used and regarded as containers holding a platform dependent date/time value, nothing more. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 18 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-20: PyCon UK 2013, Coventry, UK ... 2 days to go 2013-09-28: PyDDF Sprint ... 10 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mistersheik at gmail.com Wed Sep 18 18:21:37 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 18 Sep 2013 09:21:37 -0700 (PDT) Subject: [Python-ideas] pickle does not work properly with cooperative multiple inheritance. Propose: "__getnewkwargs__". Message-ID: My understanding of cooperative multiple inheritance is that a class often doesn't know how your parent classes want to be constructed, pickled, etc. and so it delegates to its parents using super. In general, constructors can accept keyword arguments, and forward their unused arguments to parents effortlessly: class A(B): def __init__(self, x, **kwargs): super().__init__(**kwargs) will extract x and forward kwargs. 
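The forwarding pattern described there runs as-is today; a runnable version with hypothetical class names:

```python
class Base:
    def __init__(self, **kwargs):
        # End of the cooperative chain; forwards leftovers to object.
        super().__init__(**kwargs)

class B(Base):
    def __init__(self, y, **kwargs):
        super().__init__(**kwargs)  # forward everything B doesn't consume
        self.y = y

class A(B):
    def __init__(self, x, **kwargs):
        super().__init__(**kwargs)  # A extracts x, B extracts y
        self.x = x

a = A(x=1, y=2)
print(a.x, a.y)  # 1 2
```

Each class consumes only its own keyword arguments and passes the rest up the MRO, which is the behavior the pickling proposal wants to mirror.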
Unfortunately, the same mechanism is not easily available for pickling because __getnewargs__ returns only a tuple. If there were a __getnewkwargs__ method, then we could have class A: def __getnewkwargs__(self): return {**super().__getnewkwargs__(), 'a': self.a, 'b': self.b} # (new unpacking from PEP 448) Note how additional kwargs are added to the dict of kwargs specified by the parent objects. Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Wed Sep 18 19:20:43 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 18 Sep 2013 13:20:43 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <5239ABE7.1040005@egenix.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> <5239595A.7020208@egenix.com> <1379510708.3739.23500557.2C09DC9D@webmail.messagingengine.com> <5239ABE7.1040005@egenix.com> Message-ID: <1379524843.28775.23598689.7002A138@webmail.messagingengine.com> On Wed, Sep 18, 2013, at 9:34, M.-A. Lemburg wrote: > Right, but those APIs are all limited to what the platforms > defines as t_time and like you say: those values are often > limited to certain ranges. We're going around in circles. I'm proposing _removing_ those limitations, so that for example code written for Unix systems (that assumes it can use negative values before 1970) will work on Windows, and code written for 64-bit systems will work on systems whose native time_t is 32 bits. It occurs to me that you might have misunderstood me. 
By "APIs" I was not referring to the platform functions themselves (which, obviously, are limited to what the platform's type can represent, and sometimes impose arbitrary limits on top of that), I was talking about datetime.fromtimestamp, the various functions in the time module, calendar.timegm, os.stat, and so on. There's no reason _those_ should be limited to what the platform defines. Just because "seconds since 1970" was invented by a platform does not mean it should be considered to be a platform-dependent representation. There's nothing _wrong_ with it as a representation of UTC, except for the fact that it can't represent leap seconds, and I suspect a lot of other things break in the presence of leap seconds anyway. The fact that timedelta is defined as a days/seconds combination, for example. In the presence of leap seconds, it shouldn't be possible to normalize them any more than if there were a months or years field. > If you want platform independent representations, use one of the > available conversion routines to turn the time_t values into > e.g. datetime objects and ideally convert the values to UTC > to avoid time zone issues. Then use those objects for date/time > calculations. > > time_t values are really not a good basis for doing date/time > calculations. Ideally, they should only be used and regarded > as containers holding a platform dependent date/time value, > nothing more. That ship sailed long ago. This isn't a Python 4000 thread; we're talking about the API we have, not the one we want. 
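(To make the timedelta point concrete -- a small sketch; normalization into days only works because every day is assumed to be exactly 86400 seconds, which is exactly what leap seconds would break:)

```python
from datetime import timedelta

# 86461 seconds normalizes to 1 day + 61 seconds -- an assumption
# that would be unsound if leap seconds were representable.
d = timedelta(seconds=86461)
assert (d.days, d.seconds) == (1, 61)
```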
From alexander.belopolsky at gmail.com Wed Sep 18 19:37:53 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 18 Sep 2013 13:37:53 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379524843.28775.23598689.7002A138@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> <5239595A.7020208@egenix.com> <1379510708.3739.23500557.2C09DC9D@webmail.messagingengine.com> <5239ABE7.1040005@egenix.com> <1379524843.28775.23598689.7002A138@webmail.messagingengine.com> Message-ID: On Wed, Sep 18, 2013 at 1:20 PM, wrote: > We're going around in circles. I'm proposing _removing_ those > limitations, so that for example code written for Unix systems (that > assumes it can use negative values before 1970) will work on Windows, > and code written for 64-bit systems will work on systems whose native > time_t is 32 bits. > That's a sign that this discussion should move to the tracker where a concrete patch can be proposed and discussed. There is at least one proposal that seems to be controversial: remove platform-dependent code from datetime.utcfromtimestamp(). The change is trivial: def utcfromtimestamp(seconds): return datetime(1970, 1, 1) + timedelta(seconds=seconds) I will gladly apply such patch once it is complete with tests and C code. The case for changing time.gmtime() is weaker. We would have to add additional dependency of time module on datetime or move or duplicate a sizable chunk of C code. If someone wants to undertake this project, I would like to see an attempt to remove circular dependency between time and datetime modules rather than couple the two modules even more tightly. 
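(For reference, the proposed pure-Python definition sidesteps the platform time_t entirely, so out-of-range values just work everywhere; the sample timestamps below are my own:)

```python
from datetime import datetime, timedelta

def utcfromtimestamp(seconds):
    # No call into the C library: negative and post-2038 timestamps
    # are handled uniformly on every platform.
    return datetime(1970, 1, 1) + timedelta(seconds=seconds)

before_epoch = utcfromtimestamp(-86400)  # one day before the epoch
after_2038 = utcfromtimestamp(2**31)     # past the 32-bit time_t rollover
```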
-------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Sep 18 20:38:59 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 18 Sep 2013 11:38:59 -0700 (PDT) Subject: [Python-ideas] Add PassedArgSpec class to types and expose mapping given an ArgSpec. Message-ID: <3a434bfd-6136-424d-9bea-498b637d0cc8@googlegroups.com> As far as I know, the way that arguments are mapped to a parameter specification is not exposed to the programmer. I suggest adding a PassedArgSpec class having two members: args and kwargs. Then, inspect.ArgSpec can take an argument specification and decode the PassedArgSpec (putting the right things in the right places) and return a dictionary with everything in its right place. I can only think of one use for now, which is replacing "arguments" in the returned tuple of __reduce__ and maybe allowing it to be returned by "__getnewargs__". It might also be nice to store such argument specifications instead of the pair (args, kwargs) when storing them in lists. Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Sep 18 21:12:13 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 18 Sep 2013 12:12:13 -0700 (PDT) Subject: [Python-ideas] pickle does not work properly with cooperative multiple inheritance. Propose: "__getnewkwargs__". In-Reply-To: References: Message-ID: <849ba4fa-aa67-4beb-95c9-b2494c3d907d@googlegroups.com> An alternative is to allow __getnewargs__ to return a "PassedArgSpec" as I described in another idea. On Wednesday, September 18, 2013 12:21:37 PM UTC-4, Neil Girdhar wrote: > > My understanding of cooperative multiple inheritance is that a class often > doesn't know how its parent classes want to be constructed, pickled, etc. > and so it delegates to its parents using super.
> > In general, constructors can accept keyword arguments, and forward their > unused arguments to parents effortlessly:
>
> class A(B):
>     def __init__(self, x, **kwargs):
>         super().__init__(**kwargs)
>
> will extract x and forward kwargs.
>
> Unfortunately, the same mechanism is not easily available for pickling > because __getnewargs__ returns only a tuple. If there were a > __getnewkwargs__ method, then we could have
>
> class A:
>     def __getnewkwargs__(self):
>         # (new dict unpacking from PEP 448)
>         return {**super().__getnewkwargs__(), 'a': self.a, 'b': self.b}
>
> Note how additional kwargs are added to the dict of kwargs specified by > the parent objects.
>
> Best,
>
> Neil
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 19 02:23:09 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 10:23:09 +1000 Subject: [Python-ideas] Add PassedArgSpec class to types and expose mapping given an ArgSpec. In-Reply-To: <3a434bfd-6136-424d-9bea-498b637d0cc8@googlegroups.com> References: <3a434bfd-6136-424d-9bea-498b637d0cc8@googlegroups.com> Message-ID: (Extra copy to the list, since Google Groups breaks the recipient list :P) inspect.Signature.bind() supports this in Python 3.3+. For earlier versions, Aaron Iles backported the functionality on PyPI as "funcsigs". You can also just define an appropriate function, call it as f(*args, **kwds) and return the resulting locals() namespace. Cheers, Nick. On 19 Sep 2013 04:39, "Neil Girdhar" wrote: > As far as I know, the way that arguments are mapped to a parameter > specification is not exposed to the programmer. I suggest adding a > PassedArgSpec class having two members: args and kwargs. Then, > inspect.ArgSpec can take an argument specification and decode the > PassedArgSpec (putting the right things in the right places) and return a > dictionary with everything in its right place.
> > I can only think of one use for now, which is replacing "arguments" in the > returned tuple of __reduce__ and maybe allowing it to be returned by > "__getnewargs__". It might also be nice to store such argument > specifications instead of the pair args, kwargs when storing them in lists. > > Best, > > Neil > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 19 10:23:22 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 01:23:22 -0700 (PDT) Subject: [Python-ideas] Introduce collections.Reiterable Message-ID: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> This is an idea I've wanted for a while: When I call functions that accept an iterable, I often have to check whether the function iterates over the iterable once or more than once. If it iterates more than once, I must not pass a generator, but rather cast to a list. Otherwise, the second iteration through the generator will be empty as the first has exhausted it completely. It would be nice to introduce an abstract base class in collections (docs) between Iterable and Sequence. Right now, Sequence inherits from Iterable. I propose having Sequence inherit from Reiterable, which in turn, inherits from Iterable. All sequences are reiterable, whereas generators are not. However, views in sets and dictionaries, and numpy arrays are examples of Reiterables that are not Sequences. Having such an abstract base class would be useful for debugging in its own right. Also, functions that iterate twice over an iterable can check to make sure the iterable is "re-iterable" using isinstance (the standard approach as per pep 3119 ). But, better yet, itertools could add two functions: auto_tee, which takes an iterable "I" as its parameter, and an integer n. 
If it is not a reiterable, it calls tee and returns n iterables independently capable of iterating "I". If it is reiterable, it returns [I] * n. This way, the client code can do whatever is easiest and the target code can call auto_tee if necessary. auto_list could do the same sort of thing, but omitting the copy that would normally be incurred if a list were passed in. Maybe this is less useful than I once thought since I've gotten by without it, but I just wanted to throw the idea out there in case it clicks for someone else. Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Sep 19 10:28:15 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 19 Sep 2013 10:28:15 +0200 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379424134.25980.23022817.0FD01849@webmail.messagingengine.com> <1379451497.3210.23219769.5E1521E5@webmail.messagingengine.com> <5238C687.7080207@stoneleaf.us> <1379467810.8907.23301397.3E8EF78A@webmail.messagingengine.com> <5239595A.7020208@egenix.com> <1379510708.3739.23500557.2C09DC9D@webmail.messagingengine.com> <5239ABE7.1040005@egenix.com> <1379524843.28775.23598689.7002A138@webmail.messagingengine.com> Message-ID: <523AB59F.6050509@egenix.com> On 18.09.2013 19:37, Alexander Belopolsky wrote: > On Wed, Sep 18, 2013 at 1:20 PM, wrote: > >> We're going around in circles. I'm proposing _removing_ those >> limitations, so that for example code written for Unix systems (that >> assumes it can use negative values before 1970) will work on Windows, >> and code written for 64-bit systems will work on systems whose native >> time_t is 32 bits. >> > > That's a sign that this discussion should move to the tracker where a > concrete patch can be proposed and discussed. 
There is at least one > proposal that seems to be controversial: remove platform-dependent code > from datetime.utcfromtimestamp(). > > The change is trivial: > > def utcfromtimestamp(seconds): > return datetime(1970, 1, 1) + timedelta(seconds=seconds) > > I will gladly apply such patch once it is complete with tests and C code. If you do apply this change, you will have to clearly state that the datetime module's understanding of a timestamp may differ from the platform definition of Unix ticks. > The case for changing time.gmtime() is weaker. We would have to add > additional dependency of time module on datetime or move or duplicate a > sizable chunk of C code. If someone wants to undertake this project, I > would like to see an attempt to remove circular dependency between time and > datetime modules rather than couple the two modules even more tightly. -1 on changing the time module APIs. People expect those to be wrappers of the C APIs and thus also expect these APIs to implement the platform specific behavior, e.g. supporting leap seconds with gmtime(). POSIX called for not supporting leap seconds in e.g. gmtime(), but they are part of the definition of GMT/UTC and it's possible to enable support for them: http://en.wikipedia.org/wiki/Leap_second Platform comparison: http://k5wiki.kerberos.org/wiki/Leap_second_handling That said, it's very rare to find a system that actually does not implement POSIX gmtime(). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 19 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-20: PyCon UK 2013, Coventry, UK ... tomorrow 2013-09-28: PyDDF Sprint ... 
9 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Thu Sep 19 10:32:52 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 18:32:52 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: (Grr, why is Google Groups so broken? :P) My question would be, does the new class add anything that isn't already covered by: isinstance(c, Iterable) and not isinstance(c, Iterator) Cheers, Nick. On 19 September 2013 18:23, Neil Girdhar wrote: > This is an idea I've wanted for a while: > > When I call functions that accept an iterable, I often have to check whether > the function iterates over the iterable once or more than once. If it > iterates more than once, I must not pass a generator, but rather cast to a > list. Otherwise, the second iteration through the generator will be empty > as the first has exhausted it completely. > > It would be nice to introduce an abstract base class in collections (docs) > between Iterable and Sequence. Right now, Sequence inherits from Iterable. > I propose having Sequence inherit from Reiterable, which in turn, inherits > from Iterable. All sequences are reiterable, whereas generators are not. > However, views in sets and dictionaries, and numpy arrays are examples of > Reiterables that are not Sequences. Having such an abstract base class would > be useful for debugging in its own right. > > Also, functions that iterate twice over an iterable can check to make sure > the iterable is "re-iterable" using isinstance (the standard approach as per > pep 3119). 
But, better yet, itertools could add two functions: auto_tee, > which takes an iterable "I" as its parameter, and an integer n. If it is > not a reiterable, it calls tee and returns n iterables independently capable > of iterating "I". If it is reiterable, it returns [I] * n. This way, the > client code can do whatever is easiest and the target code can call auto_tee > if necessary. auto_list could do the same sort of thing, but omitting the > copy that would normally be incurred if a list were passed in. > > Maybe this is less useful than I once thought since I've gotten by without > it, but I just wanted to throw the idea out there in case it clicks for > someone else. > > Neil > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Thu Sep 19 10:59:35 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 04:59:35 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: Well, generators are iterable, but if you write a function like: def f(s): for x in s: do_something(x) for x in s: do_something_else(x) x should not be a generator. I am proposing adding a function to itertools like auto_reiterable that would take s and give you an reiterable in the most efficient way possible. On Thu, Sep 19, 2013 at 4:32 AM, Nick Coghlan wrote: > My question would be, does the new class add anything that isn't > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Thu Sep 19 11:12:26 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 19:12:26 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: On 19 Sep 2013 18:59, "Neil Girdhar" wrote: > > Well, generators are iterable, but if you write a function like: > > def f(s): > for x in s: > do_something(x) > for x in s: > do_something_else(x) > > x should not be a generator. I am proposing adding a function to itertools like auto_reiterable that would take s and give you an reiterable in the most efficient way possible. Generators *are* iterators, though, so they fail the second half of the check. Hence my question - is there any obvious case where "iterable but not an iterator" gives the wrong answer? Cheers, Nick. > > > On Thu, Sep 19, 2013 at 4:32 AM, Nick Coghlan wrote: >> >> My question would be, does the new class add anything that isn't > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 19 11:14:16 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 05:14:16 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: I am proposing a new class "Reiterable" that is a subclass of Iterable. For example, a dictionary view is a reiterable. It would be fine to pass such an object to the function f. Best, Neil On Thu, Sep 19, 2013 at 5:12 AM, Nick Coghlan wrote: > > On 19 Sep 2013 18:59, "Neil Girdhar" wrote: > > > > Well, generators are iterable, but if you write a function like: > > > > def f(s): > > for x in s: > > do_something(x) > > for x in s: > > do_something_else(x) > > > > x should not be a generator. 
I am proposing adding a function to > itertools like auto_reiterable that would take s and give you an reiterable > in the most efficient way possible. > > Generators *are* iterators, though, so they fail the second half of the > check. Hence my question - is there any obvious case where "iterable but > not an iterator" gives the wrong answer? > > Cheers, > Nick. > > > > > > > On Thu, Sep 19, 2013 at 4:32 AM, Nick Coghlan > wrote: > >> > >> My question would be, does the new class add anything that isn't > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 19 11:20:28 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 19:20:28 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: On 19 Sep 2013 19:14, "Neil Girdhar" wrote: > > I am proposing a new class "Reiterable" that is a subclass of Iterable. For example, a dictionary view is a reiterable. It would be fine to pass such an object to the function f. I'm afraid simply repeating your proposal still doesn't answer my question. You have indicated that you are trying to identify things that are iterable, but not iterators. That is already possible using a second isinstance check to exclude iterators. So, what is the value you see in adding a new ABC to further simplify an already simple check? Cheers, Nick. > > Best, > Neil > > > On Thu, Sep 19, 2013 at 5:12 AM, Nick Coghlan wrote: >> >> >> On 19 Sep 2013 18:59, "Neil Girdhar" wrote: >> > >> > Well, generators are iterable, but if you write a function like: >> > >> > def f(s): >> > for x in s: >> > do_something(x) >> > for x in s: >> > do_something_else(x) >> > >> > x should not be a generator. I am proposing adding a function to itertools like auto_reiterable that would take s and give you an reiterable in the most efficient way possible. 
>> >> Generators *are* iterators, though, so they fail the second half of the check. Hence my question - is there any obvious case where "iterable but not an iterator" gives the wrong answer? >> >> Cheers, >> Nick. >> >> > >> > >> > On Thu, Sep 19, 2013 at 4:32 AM, Nick Coghlan wrote: >> >> >> >> My question would be, does the new class add anything that isn't >> > >> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 19 11:23:04 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 05:23:04 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: No, not things which are iterable but not iterators, things which are *reiterable* rather than merely iterable. That is, things that can be iterated multiple times without the generated elements disappearing. On Thu, Sep 19, 2013 at 5:20 AM, Nick Coghlan wrote: > > On 19 Sep 2013 19:14, "Neil Girdhar" wrote: > > > > I am proposing a new class "Reiterable" that is a subclass of Iterable. > For example, a dictionary view is a reiterable. It would be fine to pass > such an object to the function f. > > I'm afraid simply repeating your proposal still doesn't answer my > question. You have indicated that you are trying to identify things that > are iterable, but not iterators. That is already possible using a second > isinstance check to exclude iterators. > > So, what is the value you see in adding a new ABC to further simplify an > already simple check? > > Cheers, > Nick. 
> > > > > Best, > > Neil > > > > > > On Thu, Sep 19, 2013 at 5:12 AM, Nick Coghlan > wrote: > >> > >> > >> On 19 Sep 2013 18:59, "Neil Girdhar" wrote: > >> > > >> > Well, generators are iterable, but if you write a function like: > >> > > >> > def f(s): > >> > for x in s: > >> > do_something(x) > >> > for x in s: > >> > do_something_else(x) > >> > > >> > x should not be a generator. I am proposing adding a function to > itertools like auto_reiterable that would take s and give you an reiterable > in the most efficient way possible. > >> > >> Generators *are* iterators, though, so they fail the second half of the > check. Hence my question - is there any obvious case where "iterable but > not an iterator" gives the wrong answer? > >> > >> Cheers, > >> Nick. > >> > >> > > >> > > >> > On Thu, Sep 19, 2013 at 4:32 AM, Nick Coghlan > wrote: > >> >> > >> >> My question would be, does the new class add anything that isn't > >> > > >> > > >> > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 19 11:25:08 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 05:25:08 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: Note that neither a generator expression, nor a dictionary view, nor a list are Iterators. They are all Iterable. The list is also a Sequence. The Reiterable category applies to the latter two, since they can be iterated over multiple times without being consumed. On Thu, Sep 19, 2013 at 5:23 AM, Neil Girdhar wrote: > No, not things which are iterable but not iterators, things which are > *reiterable* rather than merely iterable. That is, things that can be > iterated multiple times without the generated elements disappearing.
> > > On Thu, Sep 19, 2013 at 5:20 AM, Nick Coghlan wrote: > >> >> On 19 Sep 2013 19:14, "Neil Girdhar" wrote: >> > >> > I am proposing a new class "Reiterable" that is a subclass of Iterable. >> For example, a dictionary view is a reiterable. It would be fine to pass >> such an object to the function f. >> >> I'm afraid simply repeating your proposal still doesn't answer my >> question. You have indicated that you are trying to identify things that >> are iterable, but not iterators. That is already possible using a second >> isinstance check to exclude iterators. >> >> So, what is the value you see in adding a new ABC to further simplify an >> already simple check? >> >> Cheers, >> Nick. >> >> > >> > Best, >> > Neil >> > >> > >> > On Thu, Sep 19, 2013 at 5:12 AM, Nick Coghlan >> wrote: >> >> >> >> >> >> On 19 Sep 2013 18:59, "Neil Girdhar" wrote: >> >> > >> >> > Well, generators are iterable, but if you write a function like: >> >> > >> >> > def f(s): >> >> > for x in s: >> >> > do_something(x) >> >> > for x in s: >> >> > do_something_else(x) >> >> > >> >> > x should not be a generator. I am proposing adding a function to >> itertools like auto_reiterable that would take s and give you an reiterable >> in the most efficient way possible. >> >> >> >> Generators *are* iterators, though, so they fail the second half of >> the check. Hence my question - is there any obvious case where "iterable >> but not an iterator" gives the wrong answer? >> >> >> >> Cheers, >> >> Nick. >> >> >> >> > >> >> > >> >> > On Thu, Sep 19, 2013 at 4:32 AM, Nick Coghlan >> wrote: >> >> >> >> >> >> My question would be, does the new class add anything that isn't >> >> > >> >> > >> >> > >> > >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Thu Sep 19 11:30:47 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Sep 2013 11:30:47 +0200 Subject: [Python-ideas] Introduce collections.Reiterable References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: <20130919113047.325ca3d3@pitrou.net> Le Thu, 19 Sep 2013 04:59:35 -0400, Neil Girdhar a écrit : > Well, generators are iterable, but if you write a function like: > > def f(s): > for x in s: > do_something(x) > for x in s: > do_something_else(x) > > x should not be a generator. I am proposing adding a function to > itertools like auto_reiterable that would take s and give you an > reiterable in the most efficient way possible. Try the following:

import collections
import itertools


class Reiterable:

    def __init__(self, it):
        self.need_cloning = isinstance(it, collections.Iterator)
        assert self.need_cloning or isinstance(it, collections.Iterable)
        self.master = it

    def __iter__(self):
        if self.need_cloning:
            self.master, it = itertools.tee(self.master)
            return it
        else:
            return iter(self.master)

def gen():
    yield from "ghi"

for arg in ("abc", iter("def"), gen()):
    it = Reiterable(arg)
    print(list(it))
    print(list(it))
    print(list(it))

I don't know if that would be useful as part of the stdlib. Regards Antoine. From masklinn at masklinn.net Thu Sep 19 11:38:35 2013 From: masklinn at masklinn.net (Masklinn) Date: Thu, 19 Sep 2013 11:38:35 +0200 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: <84F931B7-80D2-4EC4-B748-892D0A1E4CE1@masklinn.net> On 2013-09-19, at 11:23 , Neil Girdhar wrote: > No, not things which are iterable but not iterators, things which are > *reiterable* rather than merely iterable. That is, things that can be > iterated multiple times without the generated elements disappearing.
The point Nick is trying to bring across is that "iterable but not an iterator" seems to do *exactly* what you ask for: you can get multiple independent iterators out of it, and thus you can iterate it multiple times without the generated elements disappearing. From mistersheik at gmail.com Thu Sep 19 11:38:30 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 05:38:30 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130919113047.325ca3d3@pitrou.net> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919113047.325ca3d3@pitrou.net> Message-ID: First of all, that's amazing and exactly what I was looking for. Second, sorry Nick, I guess we were talking past each other and I didn't understand what you were getting at. From the collections.abc documentation, I imagined that subclasses are more restricted and therefore can do more than their superclasses. However, as you were trying to tell me things that are "Iterators" (and thus also Iterable) can do *less* than things that are merely Iterable. The former cannot be iterated over twice. If I'm understanding this correctly, would it be nice if the documentation then made this promise (as I don't believe it does)? Best, Neil On Thu, Sep 19, 2013 at 5:30 AM, Antoine Pitrou wrote: > Le Thu, 19 Sep 2013 04:59:35 -0400, > Neil Girdhar a > ?crit : > > Well, generators are iterable, but if you write a function like: > > > > def f(s): > > for x in s: > > do_something(x) > > for x in s: > > do_something_else(x) > > > > x should not be a generator. I am proposing adding a function to > > itertools like auto_reiterable that would take s and give you an > > reiterable in the most efficient way possible. 
> > Try the following: > > > import collections > import itertools > > > class Reiterable: > > def __init__(self, it): > self.need_cloning = isinstance(it, collections.Iterator) > assert self.need_cloning or isinstance(it, collections.Iterable) > self.master = it > > def __iter__(self): > if self.need_cloning: > self.master, it = itertools.tee(self.master) > return it > else: > return iter(self.master) > > def gen(): > yield from "ghi" > > for arg in ("abc", iter("def"), gen()): > it = Reiterable(arg) > print(list(it)) > print(list(it)) > print(list(it)) > > > I don't know if that would be useful as part of the stdlib. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 19 11:40:26 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 05:40:26 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <84F931B7-80D2-4EC4-B748-892D0A1E4CE1@masklinn.net> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <84F931B7-80D2-4EC4-B748-892D0A1E4CE1@masklinn.net> Message-ID: Yes, I see that now (with Antoine's code as well). Sorry that it wasn't clear to me earlier. 
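(For anyone else who was confused by the same thing, the distinction is easy to check directly:)

```python
from collections.abc import Iterable, Iterator

# A list is iterable but not an iterator: each iter() call starts over.
assert isinstance([1, 2], Iterable) and not isinstance([1, 2], Iterator)

# A generator is an iterator, so a single pass consumes it.
g = (x for x in [1, 2])
assert isinstance(g, Iterator)
first, second = list(g), list(g)  # the second pass finds nothing left
```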
On Thu, Sep 19, 2013 at 5:38 AM, Masklinn wrote: > On 2013-09-19, at 11:23 , Neil Girdhar wrote: > > > No, not things which are iterable but not iterators, things which are > > *reiterable* rather than merely iterable. That is, things that can be > > iterated multiple times without the generated elements disappearing. > > The point Nick is trying to bring across is that "iterable but not an > iterator" seems to do *exactly* what you ask for: you can get multiple > independent iterators out of it, and thus you can iterate it multiple > times without the generated elements disappearing. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Sep 19 11:58:38 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 05:58:38 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919113047.325ca3d3@pitrou.net> Message-ID: Sorry, never mind, the documentation is clear. I got caught up with collections.abc page and didn't click through. At least I have Antoine's code now to use in my project whether it gets added to the standard library or not :) Best, Neil On Thu, Sep 19, 2013 at 5:38 AM, Neil Girdhar wrote: > First of all, that's amazing and exactly what I was looking for. 
> > Second, sorry Nick, I guess we were talking past each other and I didn't > understand what you were getting at. From the collections.abc > documentation, I imagined that subclasses are more restricted and therefore > can do more than their superclasses. However, as you were trying to tell > me things that are "Iterators" (and thus also Iterable) can do *less* than > things that are merely Iterable. The former cannot be iterated over twice. > If I'm understanding this correctly, would it be nice if the > documentation then made this promise (as I don't believe it does)? > > Best, > > Neil > > > On Thu, Sep 19, 2013 at 5:30 AM, Antoine Pitrou wrote: > >> Le Thu, 19 Sep 2013 04:59:35 -0400, >> Neil Girdhar a >> écrit : >> > Well, generators are iterable, but if you write a function like: >> > >> > def f(s): >> > for x in s: >> > do_something(x) >> > for x in s: >> > do_something_else(x) >> >> x should not be a generator. I am proposing adding a function to >> > itertools like auto_reiterable that would take s and give you an >> > reiterable in the most efficient way possible. >> >> Try the following: >> >> >> import collections >> import itertools >> >> >> class Reiterable: >> >> def __init__(self, it): >> self.need_cloning = isinstance(it, collections.Iterator) >> assert self.need_cloning or isinstance(it, collections.Iterable) >> self.master = it >> >> def __iter__(self): >> if self.need_cloning: >> self.master, it = itertools.tee(self.master) >> return it >> else: >> return iter(self.master) >> >> def gen(): >> yield from "ghi" >> >> for arg in ("abc", iter("def"), gen()): >> it = Reiterable(arg) >> print(list(it)) >> print(list(it)) >> print(list(it)) >> >> >> I don't know if that would be useful as part of the stdlib. >> >> Regards >> >> Antoine.
From tjreedy at udel.edu Thu Sep 19 12:21:25 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Sep 2013 06:21:25 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130919113047.325ca3d3@pitrou.net> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919113047.325ca3d3@pitrou.net> Message-ID: On 9/19/2013 5:30 AM, Antoine Pitrou wrote: >> x should not be a generator. I am proposing adding a function to >> itertools like auto_reiterable that would take s and give you an >> reiterable in the most efficient way possible. > > Try the following: > > > import collections > import itertools > > > class Reiterable: > > def __init__(self, it): > self.need_cloning = isinstance(it, collections.Iterator) > assert self.need_cloning or isinstance(it, collections.Iterable) > self.master = it > > def __iter__(self): > if self.need_cloning: > self.master, it = itertools.tee(self.master) > return it > else: > return iter(self.master) > > def gen(): > yield from "ghi" > > for arg in ("abc", iter("def"), gen()): > it = Reiterable(arg) > print(list(it)) > print(list(it)) > print(list(it)) > > > I don't know if that would be useful as part of the stdlib. A slight problem is that there is no guarantee that a non-iterator iterable is re-iterable.
-- Terry Jan Reedy From tjreedy at udel.edu Thu Sep 19 12:28:19 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Sep 2013 06:28:19 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: On 9/19/2013 4:32 AM, Nick Coghlan wrote: > (Grr, why is Google Groups so broken? :P) > > My question would be, does the new class add anything that isn't > already covered by: > > isinstance(c, Iterable) and not isinstance(c, Iterator) Not everything in that category is necessarily re-iterable. Or if it is serially reiterable, it may not be parallel iterable, as needed for nested loops. -- Terry Jan Reedy From solipsis at pitrou.net Thu Sep 19 12:26:30 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Sep 2013 12:26:30 +0200 Subject: [Python-ideas] Introduce collections.Reiterable References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919113047.325ca3d3@pitrou.net> Message-ID: <20130919122630.681b0a86@pitrou.net> Le Thu, 19 Sep 2013 06:21:25 -0400, Terry Reedy a écrit : > On 9/19/2013 5:30 AM, Antoine Pitrou wrote: > > >> x should not be a generator. I am proposing adding a function to > >> itertools like auto_reiterable that would take s and give you an > >> reiterable in the most efficient way possible.
> > > > Try the following: > > > > > > import collections > > import itertools > > > > > > class Reiterable: > > > > def __init__(self, it): > > self.need_cloning = isinstance(it, collections.Iterator) > > assert self.need_cloning or isinstance(it, > > collections.Iterable) self.master = it > > > > def __iter__(self): > > if self.need_cloning: > > self.master, it = itertools.tee(self.master) > > return it > > else: > > return iter(self.master) > > > > def gen(): > > yield from "ghi" > > > > for arg in ("abc", iter("def"), gen()): > > it = Reiterable(arg) > > print(list(it)) > > print(list(it)) > > print(list(it)) > > > > > > I don't know if that would be useful as part of the stdlib. > > A slight problem is that there is no guaranteed that a non-iterator > iterable is re-iterable. Any useful examples? Regards Antoine. From tjreedy at udel.edu Thu Sep 19 12:31:12 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Sep 2013 06:31:12 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: On 9/19/2013 4:59 AM, Neil Girdhar wrote: > Well, generators are iterable, but if you write a function like: > > def f(s): > for x in s: > do_something(x) > for x in s: > do_something_else(x) This strikes me as bad design. It should perhaps a) be two functions or b) take two iterable arguments or c) jam the two loops together. -- Terry Jan Reedy From mistersheik at gmail.com Thu Sep 19 12:39:35 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 06:39:35 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: That was just for illustration. 
Here's the code I just fixed to use Reiterable: class Network: def update(self, nodes): nodes = Reiterable(nodes) super().update(self, nodes) for node in nodes: node.setParent(self) node.propertyValuesChanged.connect(self.modelPropertiesChanged) self.modelNodesAddedRemoved.emit() On Thu, Sep 19, 2013 at 6:31 AM, Terry Reedy wrote: > On 9/19/2013 4:59 AM, Neil Girdhar wrote: >> Well, generators are iterable, but if you write a function like: >> >> def f(s): >> for x in s: >> do_something(x) >> for x in s: >> do_something_else(x) >> > > This strikes me as bad design. It should perhaps a) be two functions or b) > take two iterable arguments or c) jam the two loops together. > > -- > Terry Jan Reedy From joshua at landau.ws Thu Sep 19 13:37:17 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 19 Sep 2013 12:37:17 +0100 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: On 19 September 2013 11:28, Terry Reedy wrote: > On 9/19/2013 4:32 AM, Nick Coghlan wrote: >> >> (Grr, why is Google Groups so broken?
:P) >> >> My question would be, does the new class add anything that isn't >> already covered by: >> >> isinstance(c, Iterable) and not isinstance(c, Iterator) > > > Not everything in that category is necessarily re-iterable. I cannot think of a non-pathological case where it is not; if it is not re-iterable it should be changed to an iterator if it isn't already. > Or if it is serially reiterable, it may not be parallel iterable, as needed > for nested loops. What do you mean? From rosuav at gmail.com Thu Sep 19 13:52:02 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 19 Sep 2013 21:52:02 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: On Thu, Sep 19, 2013 at 8:39 PM, Neil Girdhar wrote: > That was just for illustration. Here's the code I just fixed to use > Reiterable: > > class Network: > def update(self, nodes): > nodes = Reiterable(nodes) > super().update(self, nodes) > for node in nodes: > node.setParent(self) > node.propertyValuesChanged.connect(self.modelPropertiesChanged) > self.modelNodesAddedRemoved.emit() Hmm. As an alternative to reiterable, can you rejig the design something like this? class Network(...): def update(self,nodes): for node in nodes: self._update(node) self.modelNodesAddedRemoved.emit() def _update(self,node): super()._update(self,node) node.setParent(self) node.propertyValuesChanged.connect(self.modelPropertiesChanged) You put update() into the highest appropriate place in the class hierarchy, and then each subclass simply overrides _update to do the work. That way, you iterate over nodes exactly once, and every point in the hierarchy gets to do its own _update. 
ChrisA From steve at pearwood.info Thu Sep 19 14:18:29 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 19 Sep 2013 22:18:29 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: <20130919121828.GK19939@ando> On Thu, Sep 19, 2013 at 07:12:26PM +1000, Nick Coghlan wrote: > is there any obvious case where "iterable but > not an iterator" gives the wrong answer? I'm not sure if it counts as "obvious", but one can write an iterator that is re-iterable. A trivial example: class Reiter: def __init__(self): self.i = 0 def __next__(self): i = self.i if i < 10: self.i += 1 return i self.i = 0 raise StopIteration def __iter__(self): return self I know that according to the iterator protocol, such a re-iterator counts as "broken": [quote] The intention of the protocol is that once an iterator's next() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken. (This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.) http://docs.python.org/2/library/stdtypes.html#iterator-types but clearly there is a use-case for re-iterable "things", such as dict views, which can be re-iterated over. We just don't call them iterators. So maybe there should be a way to distinguish between "oops this iterator is broken" and "yes, this object can be iterated over repeatedly, it's all good". At the moment, dict views aren't directly iterable (you can't call next() on them). But in principle they could have been designed as re-iterable iterators. Another example might be iterators with a reset or restart method, or similar. E.g. file objects and seek(0). File objects are officially "broken" iterators, since you can seek back to the beginning of the file. I don't think that's a bad thing.
But nor am I sure that it requires a special Reiterable class so we can test for it. -- Steven From steve at pearwood.info Thu Sep 19 14:28:30 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 19 Sep 2013 22:28:30 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> Message-ID: <20130919122829.GL19939@ando> On Thu, Sep 19, 2013 at 06:31:12AM -0400, Terry Reedy wrote: > On 9/19/2013 4:59 AM, Neil Girdhar wrote: > >Well, generators are iterable, but if you write a function like: > > > >def f(s): > > for x in s: > > do_something(x) > > for x in s: > > do_something_else(x) > > This strikes me as bad design. It should perhaps a) be two functions or > b) take two iterable arguments or c) jam the two loops together. Perhaps, but sometimes there are hidden loops. Here's an example near and dear to my heart... *wink* def variance(data): # Don't do this. sumx = sum(data) sumx2 = sum(x**2 for x in data) ss = sumx2 - (sumx**2)/n return ss/(n-1) Ignore the fact that this algorithm is numerically unstable. It fails for iterator arguments, because sum(data) consumes the iterator and leaves sumx2 always equal to zero. -- Steven From ncoghlan at gmail.com Thu Sep 19 15:02:57 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 23:02:57 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130919121828.GK19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: On 19 September 2013 22:18, Steven D'Aprano wrote: > The intention of the protocol is that once an iterator?s next() method > raises StopIteration, it will continue to do so on subsequent calls. > Implementations that do not obey this property are deemed broken. (This > constraint was added in Python 2.3; in Python 2.2, various iterators are > broken according to this rule.) 
> > http://docs.python.org/2/library/stdtypes.html#iterator-types > > > but clearly there is a use-case for re-iterable "things", such as dict > views, which can be re-iterated over. We just don't call them iterators. > So maybe there should be a way to distinguish between "oops this > iterator is broken" and "yes, this object can be iterated over > repeatedly, it's all good". > > At the moment, dict views aren't directly iterable (you can't call > next() on them). But in principle they could have been designed as > re-iterable iterators. That's not what iterable means. The iterable/iterator distinction is well defined and reflected in the collections ABCs: * iterables are objects that return iterators from __iter__. * iterators are the subset of iterables that return "self" from __iter__, and expose a next (2.x) or __next__ (3.x) method That "iterators return self from __iter__" is important, since almost everywhere Python iterates over something, it calls "_itr = iter(obj)" first. So, my question is a genuine one. While, *in theory*, an object can define a stateful __iter__ method that (e.g.) only works the first time it is called, or returns a separate object that still stores its "current position" information on the original container, I simply can't think of a non-pathological case where "isinstance(obj, Iterable) and not isinstance(obj, Iterator)" would give the wrong answer. In theory, yes, an object could obviously pass that test and still not be Reiterable, but I'm interested in what's true in *practice*. Cheers, Nick. P.S. Generator-iterators are a further subset of iterators that expose send and throw and are integrated with the interpreter eval loop in various ways that other objects can't yet match. Although I think Mark Shannon has some ideas about refactoring that API to let other objects plug into it.
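[Editor's sketch, not part of the original thread: the iterable/iterator distinction Nick describes, spelled with the collections.abc names available since Python 3.3.]

```python
# A plain iterable can be iterated repeatedly; an iterator is itself
# consumed by iteration and returns self from __iter__.
from collections.abc import Iterable, Iterator

data = [1, 2, 3]        # an iterable that is not an iterator
it = iter(data)         # an iterator over it

assert isinstance(data, Iterable) and not isinstance(data, Iterator)
assert isinstance(it, Iterator)   # every Iterator is also an Iterable
assert iter(it) is it             # iterators return self from __iter__

assert list(data) == [1, 2, 3] == list(data)  # reiterable
assert list(it) == [1, 2, 3]
assert list(it) == []             # the iterator is now exhausted
```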
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Thu Sep 19 18:07:40 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 19 Sep 2013 09:07:40 -0700 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> Message-ID: <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> On Sep 17, 2013, at 10:41, Alexander Belopolsky wrote: > On Tue, Sep 17, 2013 at 1:02 PM, Brett Cannon wrote: >> As you pointed out, getting the locale details is essentially not possible in a cross-platform way unless you use strptime or strftime, so you have to choose which is implemented in Python and relies on the other. > > What we can do is to implement "C" locale behavior. In fact, in many uses of strftime() its locale-dependence is a problem. But in many cases it's useful. And the platform doesn't give us any way to get enough information about the locale to implement it ourselves. It's the same reason we have naive local times--local times are useful, the platform doesn't give us enough information about the local timezone, so we have to use what it gives us. > I would much rather have strftime_l()-like function and "C" locale implemented in stdlib. I agree that having both would be useful. If you're suggesting renaming platform-dependent locale-handling strftime to strftime_l, and adding a new "C"-locale-only strftime, I don't like the naming. The function that acts just like the POSIX function strftime, and like the Python function in every version up to now, should be called strftime; give the new function a different name instead. Otherwise, I can't see a problem. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Thu Sep 19 18:26:53 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 19 Sep 2013 09:26:53 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130919121828.GK19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: <89A6482E-0700-4414-9E09-9AEB143F30E1@yahoo.com> On Sep 19, 2013, at 5:18, Steven D'Aprano wrote: > On Thu, Sep 19, 2013 at 07:12:26PM +1000, Nick Coghlan wrote: > >> is there any obvious case where "iterable but >> not an iterator" gives the wrong answer? > > I'm not sure if it counts as "obvious", but one can write an iterator > that is re-iterable. A trivial example: > > class Reiter: > def __init__(self): > self.i = 0 > def __next__(self): > i = self.i > if i < 10: > self.i += 1 > return i > self.i = 0 > raise StopIteration > def __iter__(self): > return self > > > I know that according to the iterator protocol, such a re-iterator > counts as "broken": It also wouldn't break the OP's code, or any other reasonable code that cares about the distinction; at worst, it would cause it to unnecessarily make an extra list copy. The only thing that would break the code is something that isn't an iterator, is an iterable, and can only be iterated once. Of course you could build that as well, but it would be even more pathological than your example. For example: class OneShot: def __init__(self, it): self.it = iter(it) def __iter__(self): return self.it Besides being something no one should ever write, it's also something no code could ever guard against. If we had the Reiterable ABC, I could just register OneShot as Reiterable. > Another example might be iterators with a reset or restart method, or > similar. E.g. file objects and seek(0). File objects are officially > "broken" iterators, since you can seek back to the beginning of the > file. I don't think that's a bad thing. 
They can't be reiterated if you just treat them as iterators. You have to treat them as files--e.g., call seek(0)--if you want to reiterate them. So there's no way this could be a problem in any real code. > But nor am I sure that it requires a special Reiterable class so we can > test for it. Unless you added a "__reiter__" method, or some other way to get a new, reset-to-the-start, iterator from the iterable, such a class wouldn't help anyway. And even if we had that method, for loops and yield from and so on would all have to try __reiter__ first and fall back to __iter__. Otherwise, you still wouldn't be able to pass a file to the OP's code, or any other code that distinguishes on Reiterable to decide whether to copy or tee or use a one-pass algorithm instead of multi-pass or whatever. From alexander.belopolsky at gmail.com Thu Sep 19 19:57:08 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 19 Sep 2013 13:57:08 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> Message-ID: On Thu, Sep 19, 2013 at 12:07 PM, Andrew Barnert wrote: > If you're suggesting renaming platform-dependent locale-handling strftime > to strftime_l, ... I was thinking of changing datetime.strftime(fmt) signature to strftime(fmt, locale=None) with default behavior being the same as now and d.strftime(fmt, "C") invoking new internal C-locale implementation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Thu Sep 19 23:25:25 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Sep 2013 17:25:25 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130919122829.GL19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919122829.GL19939@ando> Message-ID: On 9/19/2013 8:28 AM, Steven D'Aprano wrote: > On Thu, Sep 19, 2013 at 06:31:12AM -0400, Terry Reedy wrote: >> On 9/19/2013 4:59 AM, Neil Girdhar wrote: >>> Well, generators are iterable, but if you write a function like: >>> >>> def f(s): >>> for x in s: >>> do_something(x) >>> for x in s: >>> do_something_else(x) >> >> This strikes me as bad design. It should perhaps a) be two functions or >> b) take two iterable arguments or c) jam the two loops together. > > Perhaps, but sometimes there are hidden loops. Here's an example near > and dear to my heart... *wink* > > def variance(data): > # Don't do this. > sumx = sum(data) > sumx2 = sum(x**2 for x in data) > ss = sumx2 - (sumx**2)/n > return ss/(n-1) > > > Ignore the fact that this algorithm is numerically unstable. Lets not ;-) > It fails > for iterator arguments, because sum(data) consumes the iterator and > leaves sumx2 always equal to zero. This is doubly bad design because the two 'hidden' loops are trivially jammed together in one explicit loop, while use of Reiterable would not remove the numerical instability. While it may seem that a numerically stable solution needs two loops (the second to sum (x-sumx)**2), the two loops can still be jammed together with the Method of Provisional Means. 
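[Editor's sketch, not from the thread: the "jammed together" single pass Terry alludes to, via the Method of Provisional Means (Welford's online algorithm). Because the data is traversed exactly once, a one-shot iterator suffices and the stability problem goes away.]

```python
# Single-pass sample variance using running (provisional) means.
def variance(data):
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# Works even when `data` is an iterator that cannot be re-iterated.
assert variance(iter([1.0, 2.0, 3.0, 4.0, 5.0])) == 2.5
```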
http://www.stat.wisc.edu/~larget/math496/mean-var.html http://www.statistical-solutions-software.com/BMDP-documents/BMDP-Formula1.pdf Also called 'online algorithm' and 'Weighted incremental algorithm' in https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance This was invented and used back when re-iteration of large datasets (on cards or tape) was possible but very slow (1970s or before). (Restack or rewind and reread might triple the (expensive) run time.) -- Terry Jan Reedy From tjreedy at udel.edu Thu Sep 19 23:40:20 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Sep 2013 17:40:20 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130919121828.GK19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: On 9/19/2013 8:18 AM, Steven D'Aprano wrote: > On Thu, Sep 19, 2013 at 07:12:26PM +1000, Nick Coghlan wrote: > >> is there any obvious case where "iterable but >> not an iterator" gives the wrong answer? > > I'm not sure if it counts as "obvious", but one can write an iterator > that is re-iterable. A trivial example: > > class Reiter: > def __init__(self): > self.i = 0 > def __next__(self): > i = self.i > if i < 10: > self.i += 1 > return i > self.i = 0 This, I agree, is bad. > raise StopIteration > def __iter__(self): > return self > > > I know that according to the iterator protocol, such a re-iterator > counts as "broken": > > [quote] > The intention of the protocol is that once an iterator's next() method > raises StopIteration, it will continue to do so on subsequent calls. I would add 'unless and until iter() or another reset method is called'. Once one pokes at an iterator with another mutation method, all bets are off.
I would consider Reiter less broken or not at all if the reset in __next__ were removed, since then it would continue to raise until explicity reset with __iter__ > Implementations that do not obey this property are deemed broken. (This > constraint was added in Python 2.3; in Python 2.2, various iterators are > broken according to this rule.) > > http://docs.python.org/2/library/stdtypes.html#iterator-types > > but clearly there is a use-case for re-iterable "things", such as dict > views, which can be re-iterated over. We just don't call them iterators. > So maybe there should be a way to distinguish between "oops this > iterator is broken" and "yes, this object can be iterated over > repeatedly, it's all good". > > At the moment, dict views aren't directly iterable (you can't call > next() on them). But in principle they could have been designed as > re-iterable iterators. > > Another example might be iterators with a reset or restart method, or > similar. E.g. file objects and seek(0). File objects are officially > "broken" iterators, since you can seek back to the beginning of the > file. I don't think that's a bad thing. > > But nor am I sure that it requires a special Reiterable class so we can > test for it. > > -- Terry Jan Reedy From abarnert at yahoo.com Fri Sep 20 00:00:27 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 19 Sep 2013 15:00:27 -0700 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> Message-ID: On Sep 19, 2013, at 10:57, Alexander Belopolsky wrote: > > On Thu, Sep 19, 2013 at 12:07 PM, Andrew Barnert wrote: >> If you're suggesting renaming platform-dependent locale-handling strftime to strftime_l, ... 
> > I was thinking of changing datetime.strftime(fmt) signature to strftime(fmt, locale=None) with default behavior being the same as now and d.strftime(fmt, "C") invoking new internal C-locale implementation. But that API implies that you could call, e.g., d.strftime(fmt, "pt_BR"), which I assume isn't something anyone is planning on implementing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Sep 20 00:22:29 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 18:22:29 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: Why not do it the way Antoine suggested, but instead of self.need_cloning = isinstance(it, collections.Iterator) have self.need_cloning = isinstance(it, collections.Reiterable) Then, mark the appropriate classes as subclasses of collections.Reiterable where collections.Sequence < collections.Reiterable < collections.Iterable? Best, Neil On Thu, Sep 19, 2013 at 5:40 PM, Terry Reedy wrote: > On 9/19/2013 8:18 AM, Steven D'Aprano wrote: > >> On Thu, Sep 19, 2013 at 07:12:26PM +1000, Nick Coghlan wrote: >> >> is there any obvious case where "iterable but >>> not an iterator" gives the wrong answer? >>> >> >> I'm not sure if it counts as "obvious", but one can write an iterator >> that is re-iterable. A trivial example: >> >> class Reiter: >> def __init__(self): >> self.i = 0 >> def __next__(self): >> i = self.i >> if i < 10: >> self.i += 1 >> return i >> self.i = 0 >> > > This, I agree, is bad. > > > raise StopIteration >> def __iter__(self): >> return self >> >> >> I know that according to the iterator protocol, such a re-iterator >> counts as "broken": >> >> [quote] >> The intention of the protocol is that once an iterator?s next() method >> raises StopIteration, it will continue to do so on subsequent calls. 
>> > > I would add 'unless and until iter() or another reset method is called. > Once one pokes at a iterator with another mutation method, all bets are > off. I would consider Reiter less broken or not at all if the reset in > __next__ were removed, since then it would continue to raise until > explicity reset with __iter__ > > > Implementations that do not obey this property are deemed broken. (This >> constraint was added in Python 2.3; in Python 2.2, various iterators are >> broken according to this rule.) >> >> http://docs.python.org/2/**library/stdtypes.html#**iterator-types >> >> but clearly there is a use-case for re-iterable "things", such as dict >> views, which can be re-iterated over. We just don't call them iterators. >> So maybe there should be a way to distinguish between "oops this >> iterator is broken" and "yes, this object can be iterated over >> repeatedly, it's all good". >> >> At the moment, dict views aren't directly iterable (you can't call >> next() on them). But in principle they could have been designed as >> re-iterable iterators. >> >> Another example might be iterators with a reset or restart method, or >> similar. E.g. file objects and seek(0). File objects are officially >> "broken" iterators, since you can seek back to the beginning of the >> file. I don't think that's a bad thing. >> >> But nor am I sure that it requires a special Reiterable class so we can >> test for it. >> >> >> > > -- > Terry Jan Reedy From tjreedy at udel.edu Fri Sep 20 03:28:04 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Sep 2013 21:28:04 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: I am going to answer three people in one response. In no particular order... On 9/19/2013 9:02 AM, Nick Coghlan wrote: > So, my question is a genuine one. While, *in theory*, an object can > define a stateful __iter__ method that (e.g.) only works the first > time it is called, or returns a separate object that still stores it's > "current position" information on the original container, I simply > can't think of a non-pathological case where "isinstance(obj, > Iterable) and not isinstance(obj, Iterator)" would give the wrong > answer. > In theory, yes, an object could obviously pass that test and still not > be Reiterable, but I'm interested in what's true in *practice*.
To back up a bit: When dev writes a function, dev is responsible for specifying acceptable inputs. Neither the language nor custom requires dev to test that inputs meet the specification. Looking before leaping may not always work; I believe this to be true when inputs are iterables. When user calls a function, user is responsible for providing arguments that meet the specification, and accepts the consequences either way. When dev specifies an 'iterable' argument, he is (or should be) saying that the argument will be iterated at most once and probably will be iterated eventually. If user passes an iterator, user should (except possibly in rare cases) not use it otherwise.

The first problem, which impinges on both specification and reiteration, is that an iterable may be finite, or not, or 'in between', depending on the hardware and user needs. I think we should take 'iterable' to mean 'finite iterable' unless dev explicitly relaxes that by saying 'possibly infinite iterable'. (To be clear, infinite iterables are extremely useful.) An additional complication, including for reiteration, is that 'practically' finite may differ for time and space. For instance, 'for i in range(10000000000): pass  # 10 billion iterations' would take about 5 minutes on my machine, while list(range(10000000000)) would fail. (The opposite situation is possible, but less relevant to this issue.)

Currently, if dev needs to iterate an input more than once, the specification should say so. If the user wants to pass an iterator, the user can instead pass list(iter). The reason to have user rather than dev make this call is that user is in a better position than dev to know whether iter is effectively finite.

Now to the varieties of reiteration:

A. Serial: iterate the input (typically to exhaustion) and then reiterate (typically to exhaustion). In the typical case, the iterable must be finite. Given a finite iterator iter, list(iter) is probably more efficient than tee(iter).
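Terry's serial case can be sketched as follows; the generator and the names here are illustrative, not from the thread:

```python
from itertools import tee

def squares(n):
    # A one-shot iterator: a single pass exhausts it.
    for i in range(n):
        yield i * i

# Option 1: materialize once with list(), then iterate as often as needed.
items = list(squares(4))
first_pass = sum(items)   # 0 + 1 + 4 + 9
second_pass = sum(items)  # the list can be walked again

# Option 2: tee() yields independent iterators over one underlying source.
a, b = tee(squares(4))
assert sum(a) == sum(b) == first_pass == second_pass == 14
```

For a finite iterator consumed to exhaustion both ways work; list() pays the memory up front, tee() buffers only as far as the slower iterator lags.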
But let user decide if either is sensible.

B. Parallel: iterate the input with two iterators that march along more or less in parallel. The degenerate extreme 'for a,b in zip(iter,iter):' would be better written 'for a in iter: b = a'. If the two iterators are mostly in sync, then the second iterator is only really needed when they diverge. In any case, parallel iteration is best handled internally, invisible to the caller, with tee or with two or more indexes. (Indexes into a concrete collection are nice because it is so easy to sync one to the other -- 'i = j' or 'j = i'.) While re does this with finite strings, the underlying iterable for such functions does not, in general, need to be finite.

C. Crossed: iterate different dimensions in 'crossed' fashion: "for i in row: for j in column". For this to involve reiteration, case one is square arrays iterated by index. But then it is not an issue, as that will be done with a reiterable range. Case two is multiple iterator inputs, with cross products as one example:

def cross(itera, iterb):
    for a in itera:
        for b in iterb:
            yield a, b

The doc should specify that itera and iterb must be independent iterables. Note that the outermost iterator does not have to be finite.

Useful example and determinism: generator functions are callable but not iterable. For the simple iterate-once situation, one calls the function and passes the resulting generator. For reiteration, the following may work:

class GenfIt:
    def __init__(self, genf, *args):
        self.genf = genf
        self.args = args
    def __iter__(self):
        return self.genf(*self.args)  # self.args, not the bare name args

However, another hidden assumption in this thread has been that non-iterator iterables are deterministic, in the sense that re-calling iter(it) returns an iterator that yields the same sequence of items before raising StopIteration. Some very useful iterator-producing functions do not do that (ones returning iterators based on pseudo-random or external inputs). So we need to add 'deterministic' to the notion of 'reiterable'.
And that cannot be mechanically determined. (Other possible complications: a resource can only be accessed by one connection at a time. Or it limits the frequency of connections.) In summary: A. There are multiple iterable and iteration use cases. B. We cannot really get away from documenting the requirements for iterable inputs and keeping some responsibility for meeting them in the hands of callers. -- Terry Jan Reedy From abarnert at yahoo.com Fri Sep 20 04:19:02 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 19 Sep 2013 19:19:02 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> On Sep 19, 2013, at 15:22, Neil Girdhar wrote: > Why not do it the way Antoine suggested, but instead of > > self.need_cloning = isinstance(it, collections.Iterator) > > have > > self.need_cloning = isinstance(it, collections.Reiterable) Because we already have Iterator today, and we don't have Reiterable, and nobody has yet come up with a useful case where the latter would do the right thing and the former wouldn't. (Also because the second one is the exact opposite of what you meant... But I assume that's a simple typo.) > > Then, mark the appropriate classes as subclasses of collections.Reiterable where collections.Sequence < collections.Reiterable < collections.Iterable? > > Best, > > Neil > > > On Thu, Sep 19, 2013 at 5:40 PM, Terry Reedy wrote: >> On 9/19/2013 8:18 AM, Steven D'Aprano wrote: >>> On Thu, Sep 19, 2013 at 07:12:26PM +1000, Nick Coghlan wrote: >>> >>>> is there any obvious case where "iterable but >>>> not an iterator" gives the wrong answer? >>> >>> I'm not sure if it counts as "obvious", but one can write an iterator >>> that is re-iterable. 
A trivial example:

>>> class Reiter:
>>>     def __init__(self):
>>>         self.i = 0
>>>     def __next__(self):
>>>         i = self.i
>>>         if i < 10:
>>>             self.i += 1
>>>             return i
>>>         self.i = 0

>> This, I agree, is bad.

>>>         raise StopIteration
>>>     def __iter__(self):
>>>         return self

>>> I know that according to the iterator protocol, such a re-iterator >>> counts as "broken": >>> >>> [quote] >>> The intention of the protocol is that once an iterator's next() method >>> raises StopIteration, it will continue to do so on subsequent calls. >> >> I would add 'unless and until iter() or another reset method is called'. Once one pokes at an iterator with another mutation method, all bets are off. I would consider Reiter less broken, or not broken at all, if the reset in __next__ were removed, since then it would continue to raise until explicitly reset with __iter__. >> >> >>> Implementations that do not obey this property are deemed broken. (This >>> constraint was added in Python 2.3; in Python 2.2, various iterators are >>> broken according to this rule.) >>> >>> http://docs.python.org/2/library/stdtypes.html#iterator-types >>> >>> but clearly there is a use-case for re-iterable "things", such as dict >>> views, which can be re-iterated over. We just don't call them iterators. >>> So maybe there should be a way to distinguish between "oops this >>> iterator is broken" and "yes, this object can be iterated over >>> repeatedly, it's all good". >>> >>> At the moment, dict views aren't directly iterable (you can't call >>> next() on them). But in principle they could have been designed as >>> re-iterable iterators. >>> >>> Another example might be iterators with a reset or restart method, or >>> similar. E.g. file objects and seek(0). File objects are officially >>> "broken" iterators, since you can seek back to the beginning of the >>> file. I don't think that's a bad thing. >>> >>> But nor am I sure that it requires a special Reiterable class so we can >>> test for it.
>> >> >> -- >> Terry Jan Reedy >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 20 04:34:26 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 19 Sep 2013 19:34:26 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: On Sep 19, 2013, at 18:28, Terry Reedy wrote: > B. Parallel: iterate the input with two iterators that march along more or less in parallel. The degenerate extreme 'for a,b in zip(iter,iter):' would be better written 'for a in iter: b = a'. If the two iterators are mostly in sync, then the second iterator is only really needed when they diverge But this does something totally different for iterators and other iterables. For an iterator, zip(iter, iter) will get you items (0, 1), then (2, 3), then (4, 5), etc. For anything else, it'll get you items (0, 0), then (1, 1), etc. And the same basic difference holds for less extreme versions, except more obviously. 
So, even if there is any useful distinction between reiterable iterators and non-reiterable here (and I don't think I see one), it pales next to the distinction between iterables that return themselves on iter and those that don't. From mistersheik at gmail.com Fri Sep 20 05:18:54 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 19 Sep 2013 23:18:54 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> Message-ID: You're right, but first I'm not sure that I agree with Terry's comment that "if dev needs to iterate an input more than once, the specification should say so. If the user wants to pass an iterator, the user can instead pass list(iter). The reason to have user rather than dev make this call is that user is in a better position than dev to know whether iter is effectively finite." The problem with this is that it's exhausting to keep checking whether a function needs a reiterable or not, and it's noisy for the user of the function to have to cast things to list. I've had silent breakages when I changed the return value of a function from a list to a generator, not realizing that it was passed somewhere to another function that wanted a reiterable. Most importantly, there's no sure *and* easy way to assert that the input to a function is reiterable, and so the silent breakages are hard to discover. Even if the user should be the one deciding what to do, the dev has to be able to assert that the right thing was done. Therefore, I feel there should be a definitive test for reiterability. Either:

* The documentation should promise that "Iterable and not Iterator" implies reiterable, or
* Reiterable should be added to collections.abc, or
* some other definitive test that hasn't been brought up yet.
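As a rough sketch of what the second option might look like — this Reiterable ABC is hypothetical, it is not in collections.abc, and the subclass hook is just the "Iterable and not Iterator" heuristic from earlier in the thread dressed up as an ABC:

```python
from abc import ABCMeta
from collections.abc import Iterator

class Reiterable(metaclass=ABCMeta):
    """Hypothetical ABC: iterables whose __iter__ calls are independent."""
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Reiterable:
            # Crude heuristic: defines __iter__ somewhere in the MRO,
            # but is not itself an iterator (no __next__).
            has_iter = any("__iter__" in B.__dict__ for B in C.__mro__)
            if has_iter and not issubclass(C, Iterator):
                return True
        return NotImplemented

assert isinstance([1, 2, 3], Reiterable)            # lists reiterate fine
assert isinstance({}.keys(), Reiterable)            # so do dict views
assert not isinstance(iter([1, 2, 3]), Reiterable)  # iterators do not
```

Note the hook cannot catch a pathological one-shot iterable that merely looks reiterable, which is exactly the objection raised elsewhere in the thread.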
Best, Neil On Thu, Sep 19, 2013 at 10:19 PM, Andrew Barnert wrote: > On Sep 19, 2013, at 15:22, Neil Girdhar wrote: > > Why not do it the way Antoine suggested, but instead of > > self.need_cloning = isinstance(it, collections.Iterator) > > have > > self.need_cloning = isinstance(it, collections.Reiterable) > > > Because we already have Iterator today, and we don't have Reiterable, and > nobody has yet come up with a useful case where the latter would do the > right thing and the former wouldn't. > > (Also because the second one is the exact opposite of what you meant... > But I assume that's a simple typo.) > > > Then, mark the appropriate classes as subclasses of collections.Reiterable > where collections.Sequence < collections.Reiterable < collections.Iterable? > > Best, > > Neil > > > On Thu, Sep 19, 2013 at 5:40 PM, Terry Reedy wrote: > >> On 9/19/2013 8:18 AM, Steven D'Aprano wrote: >> >>> On Thu, Sep 19, 2013 at 07:12:26PM +1000, Nick Coghlan wrote: >>> >>>> is there any obvious case where "iterable but >>>> not an iterator" gives the wrong answer? >>> >>> I'm not sure if it counts as "obvious", but one can write an iterator >>> that is re-iterable. A trivial example:

>>> class Reiter:
>>>     def __init__(self):
>>>         self.i = 0
>>>     def __next__(self):
>>>         i = self.i
>>>         if i < 10:
>>>             self.i += 1
>>>             return i
>>>         self.i = 0

>> This, I agree, is bad.

>>>         raise StopIteration
>>>     def __iter__(self):
>>>         return self

>>> I know that according to the iterator protocol, such a re-iterator >>> counts as "broken": >>> >>> [quote] >>> The intention of the protocol is that once an iterator's next() method >>> raises StopIteration, it will continue to do so on subsequent calls. >> >> I would add 'unless and until iter() or another reset method is called. >> Once one pokes at an iterator with another mutation method, all bets are >> off.
I would consider Reiter less broken or not at all if the reset in >> __next__ were removed, since then it would continue to raise until >> explicitly reset with __iter__ >> >> >>> Implementations that do not obey this property are deemed broken. (This >>> constraint was added in Python 2.3; in Python 2.2, various iterators are >>> broken according to this rule.) >>> >>> http://docs.python.org/2/library/stdtypes.html#iterator-types >>> >>> but clearly there is a use-case for re-iterable "things", such as dict >>> views, which can be re-iterated over. We just don't call them iterators. >>> So maybe there should be a way to distinguish between "oops this >>> iterator is broken" and "yes, this object can be iterated over >>> repeatedly, it's all good". >>> >>> At the moment, dict views aren't directly iterable (you can't call >>> next() on them). But in principle they could have been designed as >>> re-iterable iterators. >>> >>> Another example might be iterators with a reset or restart method, or >>> similar. E.g. file objects and seek(0). File objects are officially >>> "broken" iterators, since you can seek back to the beginning of the >>> file. I don't think that's a bad thing. >>> >>> But nor am I sure that it requires a special Reiterable class so we can >>> test for it. >>> >>> >> >> -- >> Terry Jan Reedy >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- You received this message because you are subscribed to a topic in >> the Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe@googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out.
>> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri Sep 20 07:15:31 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 20 Sep 2013 14:15:31 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> Message-ID: <874n9g3wgc.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Girdhar writes: > Most importantly, there's no sure *and* easy way to assert that the > input a function is reiterable, and so the silent breakages are > hard to discover. But you haven't defined "reiterable" yet, except "it fixes the breakage I've experienced". "Reiterable" could mean that the same object can be passed to iteration contexts freely, and in each one it will start from the beginning and the context will receive the same sequence of objects (in the same order). Or order might not be guaranteed. It might mean that the same object once exhausted can be passed to another iteration context and it will restart. Or it might mean that the object supports a rewind method that must be explicitly called, but can be called even if the object hasn't been exhausted. Or it might mean that the object is clonable, and functions that iterate objects passed into them must clone them unless they know that the object will never be reiterated. All of the above also have concurrent variations: in a threading context, multiple threads have access to each object and might be iterating with arbitrary timing. (Eg, if a program is rewritten to use threads, a sequential reiteration could easily become a parallel/concurrent reiteration.) Oh, another: AFAIK even non-iterator iterables may change their content when iterated. 
Eg, weak containers: I forget if there are any iterables that allow > deletions and insertions in underlying containers, but in the case of > a weak ref deleting a ref elsewhere may cause the ref itself to > disappear. Should "reiterable" provide any guarantees there? > Even if the user should be the one deciding what to do, the dev has > to be able to assert that the right thing was done. But there's no such need *between user and dev*. Assertions protect a dev from *herself*. Users do what they do, and devs either protect themselves from user vagaries, or they don't. If the dev wants to protect herself from undesirable user choices, cloning an iterator in Python should be cheap. If it isn't, let's fix that. Assertions are useful, indeed. But in this case, where the assertion itself is based on an undefined term as far as I can tell (I suspect this is because different use cases actually want different definitions), rather than an assertion the dev should treat herself (as a writer of other modules) as a user. Ie, she should protect herself from herself in the same way by cloning the iterator (for efficiency converting the iterable to an iterator then cloning). Regards, From abarnert at yahoo.com Fri Sep 20 09:51:03 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 20 Sep 2013 00:51:03 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> Message-ID: <4BD119E4-2ED9-47B4-8AB5-3A7756C926E8@yahoo.com> On Sep 19, 2013, at 20:18, Neil Girdhar wrote: > Therefore, I feel there should be a definitive test for reiterability. Either: > * The documentation should promise that Iterable and not Iterator is reiterable, or > * Reiterable should be added to collections.abc, or > * some other definitive test that hasn't been brought up yet.
Everyone agrees that, in theory, someone could create a non-iterator Iterable that can only be iterated once. The question is: has anyone ever done such a thing? (Intentionally, that is. Anyone who accidentally created a broken iterable wouldn't be helped by a new ABC--they can't protect against unintentionally broken semantics--and even less so by a documentation change.) Can you think of any good reason anyone might ever want to do such a thing? If not, what are you hoping to protect against, and how do you hope this change to help you do so? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Sep 20 10:10:35 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 20 Sep 2013 04:10:35 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <874n9g3wgc.fsf@uwakimon.sk.tsukuba.ac.jp> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> <874n9g3wgc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Sep 20, 2013 at 1:15 AM, Stephen J. Turnbull wrote: > Neil Girdhar writes: > > > Most importantly, there's no sure *and* easy way to assert that the > > input a function is reiterable, and so the silent breakages are > > hard to discover. > > But you haven't defined "reiterable" yet, except "it fixes the > breakage I've experienced". > > "Reiterable" could mean that the same object can be passed to > iteration contexts freely, and in each one it will start from the > beginning and the context will receive the same sequence of objects > (in the same order). Or order might not be guaranteed. It might mean > that the same object once exhausted can be passed to another iteration > context and it will restart. Or it might mean that the object > supports a rewind method that must be explicitly called, but can be > called even if the object hasn't been exhausted. 
Or it might mean > that the object is clonable, and functions that iterate objects passed > into them must clone them unless they know that the object will never > be reiterated. > > Many different solutions would fix the problems I've seen. My suggestion is that Reiterable should be defined as an iterable for which calling the __iter__ method yields the same elements in the same order irrespective of whether __iter__ is called while a previously returned iterator is still iterating. That way Antoine's above code would turn any non-reiterable into a reiterable of this strong definition. Correct me if I'm wrong, but views on dicts are reiterable. All of the above also have concurrent variations: in a threading > context, multiple threads have access to each object and might be > iterating with arbitrary timing. (Eg, if a program is rewritten to > use threads, a sequential reiteration could easily become a > parallel/concurrent reiteration.) Oh, another: AFAIK even > non-iterator iterables may change their content when iterated. Eg, > weak containers: I forget if there are any iterables that allow > deletions and insertions in underlying containers, but in the case of > a weak ref deleting a ref elsewhere may cause the ref itself to > disappear. Should "reiterable" provide any guarantees there? >

I think it shouldn't, because Sequence doesn't guarantee that

x = len(a)
f(a)
a[x-1] = 5

won't throw if, e.g., f does something to a.
In my case, I'm both the dev and the user. I think a better way to word is that I personally want to encapsulate the double-iteration in the member function. I don't want to have to know how my iterable is going to be used as a caller. Your second point that the method should be able to cheaply clone an iterator cheaply is precisely what I'd like to achieve with a "Reiterator" class like Antoine's. Its problem is that it makes an assumption that non-iterator iterables are reiterable, which is not promised. For that class to work, that should either be promised or another mechanism should be provided to satisfy its initial check. > > Assertions are useful, indeed. But in this case, where the assertion > itself is based on an undefined term as far as I can tell (I suspect > this is because different use cases actually want different > definitions), rather than an assertion the dev should treat herself > (as a writer of other modules) as a user. Ie, she should protect > herself from herself in the same way by cloning the iterator (for > efficiency converting the iterable to an iterator then cloning). > It sounds like you're saying put the items into a list no matter what. That's what I was doing before this thread. I just thought it would be more efficient if the object were a view, a list, a tuple, or a numpy array, for the code to elide the list construction. This could be achieved as described above. Best, Neil > Regards, > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Fri Sep 20 10:33:05 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 20 Sep 2013 10:33:05 +0200 Subject: [Python-ideas] Introduce collections.Reiterable References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: <20130920103305.0d20539e@pitrou.net> Le Thu, 19 Sep 2013 21:28:04 -0400, Terry Reedy a écrit : > The first problem, which impinges on both specification and > reiteration, is that an iterable may be either finite, or not, or 'in > between' depending on the hardware and user needs. This isn't a problem. itertools.tee() will deal with it fine. > An additional complication, including for reiteration, is that > 'practically' finite may be different for time and space. This is a strawman, since the "complication" applies to all kinds of iterables. > Currently, if dev needs to iterate an input more than once, the > specification should say so. If the user wants to pass an iterator, > the user can instead pass list(iter). Not if the user really wants, or needs, the iterator to be consumed lazily. This can matter if the iterator is infinite, or if consuming it has resource-consuming side effects such as doing I/O, etc. list(iter) is a limited solution to the problem. And the thing is, using a Reiterable helper doesn't preclude the caller from calling list() as well, so it's a strawman here. > Now to the varieties of reiteration: > > A. Serial: [...] > > B. Parallel: [...] > > C: Crossed: [...] Nice discussion, but unrelated. If the iterable doesn't work in those situations, it is purely a bug in the iterable, and it's not related to "reiteration".
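Antoine's laziness point can be sketched with an infinite iterator, where list() can never be used but tee() works fine:

```python
from itertools import count, islice, tee

naturals = count()    # infinite iterator: list(naturals) would never return
a, b = tee(naturals)  # two independent, lazy views of the same stream

head_a = list(islice(a, 5))
head_b = list(islice(b, 5))  # b replays the items a already consumed
assert head_a == head_b == [0, 1, 2, 3, 4]
```

tee() only buffers the items that one branch has consumed and the other has not, so it stays lazy even over an unbounded source.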
In other words, if an API returns something that cannot be iterated an arbitrary number of times, it should return an iterator, not an iterable ;-) > However, another hidden assumption in this thread has been that > non-iterator iterables are deterministic, in the sense that > re-calling iter(it) returns an iterator that yields the same sequence > of items before raising StopIteration. Some very useful > iterator-producing functions do not do that (ones returning iterators > based on pseudo-random or external inputs). Well, I hardly ever use non-deterministic iterables, and I can't remember passing a "pseudo-random iterator" to a function expecting a generic iterable. YMMV. > So we need to add > 'deterministic' to the notion of 'reiterable'. And that cannot be > mechanically determined. Many things are not mechanically determined that still make sense to specify in an API. "Mechanically determined" is a rather silly criterion when designing APIs, especially in a dynamic language where nothing can ever be taken for granted. (in other words, if you want "mechanically determined" API guarantees, perhaps you should try Haskell or Rust :-)) > (Other possible complications: a resource can only be accessed by one > connection at a time. Or it limits the frequency of connections.) That's true, but the caller can still call list() regardless of how the callee is implemented. Regards Antoine. From steve at pearwood.info Fri Sep 20 11:48:58 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 Sep 2013 19:48:58 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> Message-ID: <20130920094854.GO19939@ando> On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote: > On 19 September 2013 22:18, Steven D'Aprano wrote: [...] > > At the moment, dict views aren't directly iterable (you can't call > > next() on them).
But in principle they could have been designed as > > re-iterable iterators. > > That's not what iterable means. The iterable/iterator distinction is > well defined and reflected in the collections ABCs: Actually, I think the collections ABC gets it wrong, according to both common practice and the definition given in the glossary: http://docs.python.org/3.4/glossary.html More on this below. As for my comment above, dict views don't obey the iterator protocol themselves, as they have no __next__ method, nor do they obey the sequence protocol, as they are not indexable. Hence they are not *directly* iterable, but they are *indirectly* iterable, since they have an __iter__ method which returns an iterator. I don't think this is a critical distinction. I think it is fine to call views "iterable", since they can be iterated over. On the rare occasion that it matters, we can just do what I did above, and talk about objects which are directly iterable (e.g. iterators, sequences, generator objects) and those which are indirectly iterable (e.g. dict views). > * iterables are objects that return iterators from __iter__. That definition is incomplete, because iterable objects include those that obey the sequence protocol. This is not only by long-standing tradition (pre-dating the introduction of iterators, if I remember correctly), but also as per the definition in the glossary. Alas, collections.Iterable gets this wrong:

py> class Seq:
...     def __getitem__(self, index):
...         if 0 <= index < 5: return index+1000
...         raise IndexError
...
py> s = Seq()
py> isinstance(s, Iterable)
False
py> list(s)  # definitely iterable
[1000, 1001, 1002, 1003, 1004]

(Note that although Seq obeys the sequence protocol, and can be iterated over, it is not a fully-fledged Sequence since it has no __len__.) I think this is a bug in the Iterable ABC, but I'm not sure how one might fix it.
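One conceivable way to "fix it", sketched purely as an illustration (this is not what collections.abc actually does, and the class name here is made up), is a subclass hook that also accepts the legacy __getitem__ protocol:

```python
from abc import ABCMeta

class IterableToo(metaclass=ABCMeta):
    """Hypothetical ABC: accepts __iter__ or the legacy sequence protocol."""
    @classmethod
    def __subclasshook__(cls, C):
        if cls is IterableToo:
            # Accept either hook anywhere in the MRO.
            if any("__iter__" in B.__dict__ or "__getitem__" in B.__dict__
                   for B in C.__mro__):
                return True
        return NotImplemented

class Seq:
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 1000
        raise IndexError

assert isinstance(Seq(), IterableToo)                # accepted via __getitem__
assert list(Seq()) == [1000, 1001, 1002, 1003, 1004]
assert not isinstance(42, IterableToo)               # int defines neither
```

The heuristic is over-broad — any class with a __getitem__ would pass, whether or not indexing from 0 actually works — which hints at why the real ABC does not attempt it.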
> * iterators are the subset of iterables that return "self" from > __iter__, and expose a next (2.x) or __next__ (3.x) method That is certainly correct. All iterators are iterables, but not all iterables are iterators. > That "iterators return self from __iter__" is important, since almost > everywhere Python iterates over something, it calls "_itr = iter(obj)" > first. And then falls back on the sequence protocol. > So, my question is a genuine one. While, *in theory*, an object can > define a stateful __iter__ method that (e.g.) only works the first > time it is called, or returns a separate object that still stores its > "current position" information on the original container, I simply > can't think of a non-pathological case where "isinstance(obj, > Iterable) and not isinstance(obj, Iterator)" would give the wrong > answer. > > In theory, yes, an object could obviously pass that test and still not > be Reiterable, but I'm interested in what's true in *practice*. I don't think you and I are actually in disagreement here. This is Python, and one could write an iterator class that is reiterable, or an iterable object (as determined by isinstance) which cannot be iterated over, but I think we can dismiss them as pathological cases. Even if such unusual objects are useful, it is the caller's responsibility, not the callee's, to use them safely and appropriately with functions that are expecting them.
-- Steven From p.f.moore at gmail.com Fri Sep 20 12:03:17 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 Sep 2013 11:03:17 +0100 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130920094854.GO19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: On 20 September 2013 10:48, Steven D'Aprano wrote: > Actually, I think the collections ABC gets it wrong, according to both > common practice and the definition given in the glossary: > > http://docs.python.org/3.4/glossary.html > > More on this below. > > As for my comment above, dict views don't obey the iterator protocol > themselves, as they have no __next__ method, nor do they obey the > sequence protocol, as they are not indexable. Hence they are not > *directly* iterable, but they are *indirectly* iterable, since they have > an __iter__ method which returns an iterator. > > I don't think this is a critical distinction. I think it is fine to call > views "iterable", since they can be iterated over. On the rare occasion > that it matters, we can just do what I did above, and talk about objects > which are directly iterable (e.g. iterators, sequences, generator > objects) and those which are indirectly iterable (e.g. dict views). An iterable is an object that returns an iterator when passed to iter(). It's *iterators* that have to have __next__, not iterables. 
An iterable has to have __iter__, which as far as I know dict views do: >>> {}.keys().__iter__ Paul From mistersheik at gmail.com Fri Sep 20 12:18:47 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 20 Sep 2013 06:18:47 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130920094854.GO19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: On Fri, Sep 20, 2013 at 5:48 AM, Steven D'Aprano wrote: > On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote: > > On 19 September 2013 22:18, Steven D'Aprano wrote: > [...] > > > At the moment, dict views aren't directly iterable (you can't call > > > next() on them). But in principle they could have been designed as > > > re-iterable iterators. > > > > That's not what iterable means. The iterable/iterator distinction is > > well defined and reflected in the collections ABCs: > > Actually, I think the collections ABC gets it wrong, according to both > common practice and the definition given in the glossary: > > http://docs.python.org/3.4/glossary.html Where does the glossary disagree with collections.abc? > > > More on this below. > > As for my comment above, dict views don't obey the iterator protocol > themselves, as they have no __next__ method, nor do they obey the > sequence protocol, as they are not indexable. Hence they are not > *directly* iterable, but they are *indirectly* iterable, since they have > an __iter__ method which returns an iterator. > What you're calling "indirectly iterable" is what the docs call "Iterable" > and what collections.abc calls Iterable, right? > > I don't think this is a critical distinction. I think it is fine to call > views "iterable", since they can be iterated over. On the rare occasion > that it matters, we can just do what I did above, and talk about objects > which are directly iterable (e.g.
iterators, sequences, generator > objects) and those which are indirectly iterable (e.g. dict views). > > > > * iterables are objects that return iterators from __iter__. > > That definition is incomplete, because iterable objects include those > that obey the sequence protocol. This is not only by long-standing > tradition (pre-dating the introduction of iterators, if I remember > correctly), but also as per the definition in the glossary. Alas, > collections.Iterable gets this wrong: > > py> class Seq: > ... def __getitem__(self, index): > ... if 0 <= index < 5: return index+1000 > ... raise IndexError > ... > py> s = Seq() > py> isinstance(s, Iterable) > False > py> list(s) # definitely iterable > [1000, 1001, 1002, 1003, 1004] > PEP 3119 makes it clear that isinstance( collections.Sequence) is the de facto way of checking whether something is a sequence. Casting to list is not the de facto way. Therefore, Seq is neither Iterable nor a Sequence according to collections.abc. If you inherit from collections.Sequence (you'll need to implement __len__) you'll get the Iterable stuff for free as desired: Sequence subclasses Iterable. > > > (Note that although Seq obeys the sequence protocol, and can be > iterated over, it is not a fully-fledged Sequence since it has no > __len__.) > I guess we disagree that Seq obeys the sequence protocol. > > I think this is a bug in the Iterable ABC, but I'm not sure how one > might fix it. > > > > > * iterators are the subset of iterables that return "self" from > > __iter__, and expose a next (2.x) or __next__ (3.x) method > > That is certainly correct. All iterators are iterables, but not all > iterables are iterators. > > > > That "iterators return self from __iter__" is important, since almost > everywhere Python iterates over something, it calls "_itr = iter(obj)" > first. > > And then falls back on the sequence protocol. > > > > So, my question is a genuine one.
While, *in theory*, an object can > > define a stateful __iter__ method that (e.g.) only works the first > > time it is called, or returns a separate object that still stores its > > "current position" information on the original container, I simply > > can't think of a non-pathological case where "isinstance(obj, > > Iterable) and not isinstance(obj, Iterator)" would give the wrong > > answer. > > > > In theory, yes, an object could obviously pass that test and still not > > be Reiterable, but I'm interested in what's true in *practice*. > > I don't think you and I are actually in disagreement here. This is > Python, and one could write an iterator class that is reiterable, or an > iterable object (as determined by isinstance) which cannot be iterated > over, but I think we can dismiss them as pathological cases. Even if > such unusual objects are useful, it is the caller's responsibility, not > the callee's, to use them safely and appropriately with functions that > are expecting them. > Is it possible to minimize the mental load on the caller by encapsulating the distinction between parameters that accept iterables and reiterables? One of the big problems with C++ for example is the great care that must be taken, e.g. to not write past the ends of arrays. A small mistake can take a week to track down. One does become more careful with years of experience, but it is much simpler if the language prevents such catastrophes. For me, Python has been this language in many ways. Reiterables would be another such defensively motivated distinction. Of course, you could just ask callers to "be more careful", but I don't see the problem with fixing the language specification so that Antoine's Reiterable adaptor works properly.
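(Antoine's actual adaptor is not reproduced in this thread; the following is a hypothetical sketch of what such a wrapper might look like, wrapping a zero-argument factory so that every iteration gets a fresh iterator. The class and its interface are illustrative assumptions, not an agreed design.)

```python
from collections.abc import Iterator

class Reiterable:
    """Hypothetical sketch: wrap a zero-argument factory so that each
    call to __iter__ produces a fresh, independent iterator."""

    def __init__(self, factory):
        self._factory = factory

    def __iter__(self):
        it = iter(self._factory())
        if not isinstance(it, Iterator):
            raise TypeError("factory did not produce an iterator")
        return it

# A generator expression is one-shot, but wrapping its *construction*
# makes the result safe to iterate any number of times.
squares = Reiterable(lambda: (x * x for x in range(4)))
assert list(squares) == [0, 1, 4, 9]
assert list(squares) == [0, 1, 4, 9]   # second pass works too
```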
Cheers, Neil > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas From oscar.j.benjamin at gmail.com Fri Sep 20 12:45:33 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 20 Sep 2013 11:45:33 +0100 Subject: [Python-ideas] Numerical instability was: Re: Introduce collections.Reiterable Message-ID: On 19 September 2013 22:25, Terry Reedy wrote: > On 9/19/2013 8:28 AM, Steven D'Aprano wrote: >> >> def variance(data): >> # Don't do this. >> sumx = sum(data) >> sumx2 = sum(x**2 for x in data) >> ss = sumx2 - (sumx**2)/n >> return ss/(n-1) >> >> Ignore the fact that this algorithm is numerically unstable. > > Lets not ;-) > >> It fails >> for iterator arguments, because sum(data) consumes the iterator and >> leaves sumx2 always equal to zero. > > This is doubly bad design because the two 'hidden' loops are trivially > jammed together in one explicit loop, while use of Reiterable would not > remove the numerical instability. While it may seem that a numerically > stable solution needs two loops (the second to sum (x-sumx)**2), the two > loops can still be jammed together with the Method of Provisional Means.
> > http://www.stat.wisc.edu/~larget/math496/mean-var.html > http://www.statistical-solutions-software.com/BMDP-documents/BMDP-Formula1.pdf > > Also called 'online algorithm' and 'Weighted incremental algorithm' in > https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance > > This was invented and used back when re-iteration of large datasets (on > cards or tape) was possible but very slow (1970s or before). (Restack or > rewind and reread might triple the (expensive) run time.) I'm never quite sure what exactly is meant by "numerical instability" in most contexts because I'm mainly familiar with the use of the term in ODE solvers where an unstable solver will literally diverge from the true solution as if it were an unstable equilibrium. However in that context the error is truncation error rather than rounding error and would occur even with infinite precision arithmetic: http://en.wikipedia.org/wiki/Stiff_equation I'm going to assume that numerical instability is just a way of saying that a method is inaccurate in some cases. Although the incremental algorithm is much better than the naive approach Steven (knowingly) showed above I don't think it's true that constraining yourself to a single pass doesn't limit the possible accuracy. Another point of relevance here is that the incremental formula cannot be as efficiently implemented in Python since you don't get to take advantage of the super fast math.fsum function which is also more accurate than a naive Kahan algorithm. The script at the bottom of this post tests a few methods on a deliberately awkward set of random numbers and typical output is: $ ./stats.py exact: 0.989661716301 naive -> error = -21476.0408922 incremental -> error = -1.0770901604e-07 two_pass -> error = 1.29118937764e-13 three_pass -> error = 0.0 For these numbers the three_pass method usually has an error of 0 but otherwise 1ulp (1e-16). (It can actually be collapsed into a two pass method but then we couldn't use fsum.) 
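(For reference, the "Method of Provisional Means" quoted above, also known as Welford's online algorithm, maintains a running mean and a running sum of squared deviations, so a single pass suffices. A minimal sketch; it is the same recurrence as variance_incremental in the script below.)

```python
def online_variance(data):
    # Welford's update: one pass, no cancellation of large sums.
    n = 0
    mean = 0.0
    M2 = 0.0   # running sum of squared deviations from the running mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n            # the "provisional mean"
        M2 += delta * (x - mean)     # uses both the old and new mean
    return M2 / (n - 1)              # sample variance

# Sample variance of [2, 4, 6] is 4: mean 4, squared deviations 4+0+4 = 8, 8/2.
assert abs(online_variance([2.0, 4.0, 6.0]) - 4.0) < 1e-12
```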
If you know of a one-pass algorithm (or a way to improve the implementation I showed) that is as accurate as either the two_pass or three_pass methods I'd be very interested to see it (I'm sure Steven would be as well). Oscar

$ cat stats.py
#!/usr/bin/env python

from __future__ import print_function

from random import gauss
from math import fsum
from fractions import Fraction

# Return the exact result as a Fraction. Nothing wrong
# with using the computational formula for variance here.
def variance_exact(data):
    data = [Fraction(x) for x in data]
    n = len(data)
    sumx = sum(data)
    sumx2 = sum(x**2 for x in data)
    ss = sumx2 - (sumx**2)/n
    return ss/(n-1)

# Although this is the most efficient formula when using
# exact computation it fails under fixed precision
# floating point since it ends up subtracting two large
# almost equal numbers leading to a catastrophic loss of
# precision.
def variance_naive(data):
    n = len(data)
    sumx = fsum(data)
    sumx2 = fsum(x**2 for x in data)
    ss = sumx2 - (sumx**2)/n
    return ss/(n-1)

# Incremental variance calculation from Wikipedia. If
# the above uses fsum then a fair comparison should
# use some compensated summation here also. However
# it's not clear (to me) how to incorporate compensated
# summation here.
# http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Compensated_variant
def variance_incremental(data):
    n = 0
    mean = 0
    M2 = 0
    for x in data:
        n = n + 1
        delta = x - mean
        mean = mean + delta/n
        M2 = M2 + delta*(x - mean)
    variance = M2/(n - 1)
    return variance

# This is to me the obvious formula since I think of
# this as the definition of the variance.
def variance_twopass(data):
    n = len(data)
    mean = fsum(data) / n
    sumdev2 = fsum((x - mean)**2 for x in data)
    variance = sumdev2 / (n - 1)
    return variance

# This is the three-pass algorithm used in Steven's
# statistics module. It's not one I had seen before but
# AFAICT it's very accurate. In fact the 2nd and 3rd passes
# can be merged as in variance_incremental but then we
# wouldn't be able to take advantage of fsum.
def variance_threepass(data):
    n = len(data)
    mean = fsum(data) / n
    sumdev2 = fsum((x-mean)**2 for x in data)
    # The following sum should mathematically equal zero, but due to rounding
    # error may not.
    sumdev2 -= fsum((x-mean) for x in data)**2 / n
    return sumdev2 / (n - 1)

methods = [
    ('naive', variance_naive),
    ('incremental', variance_incremental),
    ('two_pass', variance_twopass),
    ('three_pass', variance_threepass),
]

# Test numbers with large mean and small standard deviation.
# This is the case that causes trouble for the naive formula.
N = 100000
testnums = [gauss(mu=10**10, sigma=1) for n in range(N)]

# First compute the exact result
exact = variance_incremental([Fraction(num) for num in testnums])
print('exact:', float(exact))

# Compare each with the exact result
for name, var in methods:
    print(name, '-> error =', var(testnums) - exact)

From steve at pearwood.info Fri Sep 20 13:10:00 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 Sep 2013 21:10:00 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: <20130920111000.GQ19939@ando> On Fri, Sep 20, 2013 at 11:03:17AM +0100, Paul Moore wrote: > An iterable is an object that returns an iterator when passed to > iter(). It's *iterators* that have to have __next__, not iterables. An > iterable has to have __iter__, which as far as I know dict views do It is not correct to say that iterables have to have an __iter__ method, by both common usage of the term, and by the definition in the glossary. Sorry to repeat myself, but iterables can also be objects which obey the sequence protocol.
I already showed an example of a non-pathological object which can be iterated over but where isinstance(obj, Iterable) returns the wrong result. See my previous post, or the end of this one, for that example. Here's the glossary entry in full: iterable An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator. http://docs.python.org/3.4/glossary.html I think we can all agree that dict views are iterables. You can iterate over them, and they have an __iter__ method. We can also agree that views aren't sequences, nor are they iterators themselves: py> keys = {}.keys() py> iter(keys) is keys False and isinstance gives the correct result for views: py> isinstance(keys, Sequence) False py> isinstance(keys, Iterator) False py> isinstance(keys, Iterable) True None of this is in dispute! But (and this was really a very minor point in my original post, seemingly blown all out of proportion) you can't iterate over a view directly. Or perhaps, for the avoidance of doubt, I should say you can't iterate over a view *manually* without creating an intermediate iterator object.
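The "intermediate iterator object" point can be shown in script form: the view itself has no __next__, so next() on it fails, but the object its __iter__ returns can be stepped manually.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()

# The view is not an iterator: next() on it raises TypeError...
try:
    next(keys)
    stepped = True
except TypeError:
    stepped = False
assert not stepped

# ...but iter() returns a dict_keyiterator, which can be stepped manually.
it = iter(keys)
assert next(it) in d           # manual iteration works on the intermediate object
assert iter(it) is it          # the intermediate object IS an iterator
assert iter(keys) is not keys  # the view itself is not
```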
Iteration in Python is implemented by two protocols: 1) the iterator protocol, which repeatedly calls __next__ until StopIteration is raised; and 2) the sequence protocol, which repeatedly calls __getitem__(0), __getitem__(1), __getitem__(2), ... until IndexError is raised. Dict views don't obey either of these, as they have no __next__ or __getitem__ method. That is all I mean when I say that dict views aren't "directly [manually] iterable". Instead, they have an __iter__ method which returns an object which is directly iterable, a dict_keyiterator object. This really was a very minor point, I've already spent far more words on this than it deserves. But the important point seems to have been missed, namely that the Iterable ABC gives the wrong result for some objects which are iterable. Here it is again: py> class Seq: ... def __getitem__(self, index): ... if 0 <= index < 5: return index+1000 ... raise IndexError ... py> s = Seq() py> isinstance(s, Iterable) # The ABC claims Seq is not iterable. False py> for x in s: # But it actually is. ... print(x) ... 1000 1001 1002 1003 1004 Can anyone convince me this is not a bug in the Iterable ABC? -- Steven From steve at pearwood.info Fri Sep 20 14:45:05 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 Sep 2013 22:45:05 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: <20130920124505.GT19939@ando> On Fri, Sep 20, 2013 at 06:18:47AM -0400, Neil Girdhar wrote: > On Fri, Sep 20, 2013 at 5:48 AM, Steven D'Aprano wrote: > > > On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote: > > > On 19 September 2013 22:18, Steven D'Aprano wrote: > > [...] > > > > At the moment, dict views aren't directly iterable (you can't call > > > > next() on them). But in principle they could have been designed as > > > > re-iterable iterators.
> > > > > > That's not what iterable means. The iterable/iterator distinction is > > > well defined and reflected in the collections ABCs: > > > > Actually, I think the collections ABC gets it wrong, according to both > > common practice and the definition given in the glossary: > > > > http://docs.python.org/3.4/glossary.html > > > Where does the glossary disagree with collections.abc? I show below a class that is iterable, yet is not an instance of collections.Iterable. By the glossary definition it is iterable (it has a __getitem__ method that raises IndexError when there are no more items to be returned). [...] > What you're calling "indirectly iterable" is what the docs call "Iterable" > and what collections.abc call Iterable, right? I've explained this further in my reply to Paul Moore. What I should have said was *manually* iterable, in the sense of directly calling __next__ or __getitem__ on the view. Here's an example of an iterable class that collections.Iterable claims is not an iterable: > > py> class Seq: > > ... def __getitem__(self, index): > > ... if 0 <= index < 5: return index+1000 > > ... raise IndexError > > ... > > py> s = Seq() > > py> isinstance(s, Iterable) > > False > > py> list(s) # definitely iterable > > [1000, 1001, 1002, 1003, 1004] > > > > PEP 3119 makes it clear that isinstance( collections.Sequence) is the de > facto way of checking whether something is a sequence. I'm not testing whether it is a sequence. I explicitly stated it isn't a sequence, since it doesn't implement __len__. The Sequence ABC gets this right. > Casting to list is not the de facto way. No, but casting to list demonstrates that it can be iterated over. In my reply to Paul, I explicitly used it in a for-loop. > Therefore, Seq is neither Iterable nor a Sequence > according to collections.abc. I'm not concerned by Sequence. It's not a Sequence. No dispute there. But it is an iterable, since it obeys the sequence protocol and can be iterated over. 
(Which is not the same as being a sequence.) > > (Note that although Seq obeys the sequence protocol, and can be > iterated over, it is not a fully-fledged Sequence since it has no > __len__.) > > > > I guess we disagree that Seq obeys the sequence protocol. I'm not sure why you think it doesn't obey the sequence protocol. It is demonstrably true that it does. If it wasn't obvious from the source code, it should be obvious from a few seconds' experimentation at the interactive interpreter: py> s = Seq() py> s[0] 1000 py> s[1] 1001 [...cut s[2], s[3], s[4] for brevity...] py> s[5] Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 4, in __getitem__ IndexError That's all there is to the sequence protocol, and it's enough to make Seq objects iterable. -- Steven From p.f.moore at gmail.com Fri Sep 20 15:24:50 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 Sep 2013 14:24:50 +0100 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130920111000.GQ19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: On 20 September 2013 12:10, Steven D'Aprano wrote: > This really was a very minor point, I've already spent far more words > on this than it deserves. But the important point seems to have been > missed, namely that the Iterable ABC gives the wrong result for some > objects which are iterable. Here it is again: > > py> class Seq: > ... def __getitem__(self, index): > ... if 0 <= index < 5: return index+1000 > ... raise IndexError > ... > py> s = Seq() > py> isinstance(s, Iterable) # The ABC claims Seq is not iterable. > False > py> for x in s: # But it actually is. > ... print(x) > ... > 1000 > 1001 > 1002 > 1003 > 1004 > > > Can anyone convince me this is not a bug in the Iterable ABC? Ah, I see. I misread your point and got it backwards. My apologies.
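iter() itself performs the fallback Steven describes: given an object with only __getitem__, it manufactures an iterator that tries indexes 0, 1, 2, ... until IndexError. A sketch in script form, reusing the Seq class from above:

```python
class Seq:
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 1000
        raise IndexError

s = Seq()
it = iter(s)            # iter() falls back to the sequence protocol
assert next(it) == 1000 # the manufactured iterator steps through index 0, 1, ...
assert next(it) == 1001

# A fresh call to iter() restarts from index 0, so Seq is also reiterable.
assert list(iter(s)) == [1000, 1001, 1002, 1003, 1004]
assert list(iter(s)) == [1000, 1001, 1002, 1003, 1004]
```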
As regards whether it is a bug, the best I can do is to refer to the definition of collections.abc.Iterable: class collections.abc.Iterable ABC for classes that provide the __iter__() method. See also the definition of iterable. Clearly the behaviour is as defined (there is no __iter__). And quite possibly the full definition of iterable (... or it has a __getitem__ *that behaves correctly when passed the integers 1, 2, 3...*) is not computable, so it's not possible to define a completely accurate spec for "what an iterable is". The ABC appears therefore to be taking a conservative approach of accepting a few false negatives for the sake of avoiding false positives. I can accept that trade-off, although I concede that it's unfortunate. But the messages I take from this are: 1. There's no way of defining an iterable ABC that covers 100% of the things that are commonly referred to as "iterables". 2. ABCs and LBYL-style coding have their own set of risks, and once again "Easier to ask for forgiveness" appears to be the approach to take :-) Paul From random832 at fastmail.us Fri Sep 20 16:46:11 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 20 Sep 2013 10:46:11 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> Message-ID: <1379688371.25215.24411669.1642F9A6@webmail.messagingengine.com> On Thu, Sep 19, 2013, at 18:00, Andrew Barnert wrote: > But that API implies that you could call, e.g., d.strftime(fmt, "pt_BR"), > which I assume isn't something anyone is planning on implementing.
Well, you could implement it by acquiring the GIL, setting the locale (putenv + setlocale), calling the platform strftime, and then resetting the locale afterward - all while locked, to prevent exposing the temporary strftime change to other code. (This also suggests a way to implement a tzinfo object in terms of native timezones) Long-term it would be nice to have python ship its own locale data, and/or to acquire platform-specific locale data via GetLocaleInfo[Ex] on windows and nl_langinfo on POSIX OSes where it is provided. (Note that the latter still would require stopping everything and setting the global locale to acquire the data, but since you've got to translate a locale name to a handle to use GetLocaleInfo or xlocale, it'd make sense to encapsulate this in a locale object which does all this upon being created. With platform strftime as a fallback. The issue with using platform strftime to populate things in advance is that %O is difficult and %E may be intractable. From abarnert at yahoo.com Fri Sep 20 17:59:28 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 20 Sep 2013 08:59:28 -0700 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <1379688371.25215.24411669.1642F9A6@webmail.messagingengine.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> <1379688371.25215.24411669.1642F9A6@webmail.messagingengine.com> Message-ID: <6DFE1EDA-7BEC-4CEA-8BCB-585EEA49FB2A@yahoo.com> On Sep 20, 2013, at 7:46, random832 at fastmail.us wrote: > On Thu, Sep 19, 2013, at 18:00, Andrew Barnert wrote: >> But that API implies that you could call, e.g., d.strftime(fmt, "pt_BR"), >> which I assume isn't something anyone is planning on implementing. 
> > Well, you could implement it by acquiring the GIL, setting the locale > (putenv + setlocale), calling the platform strftime, and then resetting > the locale afterward - all while locked, to prevent exposing the > temporary strftime change to other code. (This also suggests a way to > implement a tzinfo object in terms of native timezones) OK, yes, you could do that, but are you actually proposing that the stdlib should do so? If not, it's a misleading API. If so, it's a much larger proposal than what we initially started with. And I think providing C-locale str[fp]time with very wide, platform-independent limits is a useful idea even without this much more radical idea. > Long-term it would be nice to have python ship its own locale data, > and/or to acquire platform-specific locale data via GetLocaleInfo[Ex] on > windows and nl_langinfo on POSIX OSes where it is provided. IIRC, OS X has a different set of (CoreFoundation-based?) APIs that take the system preferences into account as well as the locale setting, which might be worth using if you're designing the ultimate locale handling system; otherwise your apps won't act like native Cocoa apps. For that matter, both Windows and OS X have more than one notion of the local date format (long vs. short names, etc.); do you want to expose that as well, or just stick to the POSIX-like subset of each platform's capabilities? > (Note that > the latter still would require stopping everything and setting the > global locale to acquire the data, but since you've got to translate a > locale name to a handle to use GetLocaleInfo or xlocale, it'd make sense > to encapsulate this in a locale object which does all this upon being > created. With platform strftime as a fallback. The issue with using > platform strftime to populate things in advance is that %O is difficult > and %E may be intractable. 
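The set-and-restore dance described above might be sketched as a context manager. This is a hypothetical illustration (the helper name temporary_locale is invented); note that setlocale changes process-global state, so real code would also need the locking discussed above to avoid exposing the temporary locale to other threads.

```python
import locale
import time
from contextlib import contextmanager

@contextmanager
def temporary_locale(category, name):
    # Hypothetical helper: setlocale is process-global, so without a
    # lock other threads would briefly see the temporary locale.
    saved = locale.setlocale(category)   # query the current setting
    try:
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, saved)

# Usage: format a date under the C locale regardless of the current one.
with temporary_locale(locale.LC_TIME, 'C'):
    s = time.strftime('%a %d %b %Y', time.gmtime(0))
assert s == 'Thu 01 Jan 1970'
```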
From abarnert at yahoo.com Fri Sep 20 18:21:43 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 20 Sep 2013 09:21:43 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130920124505.GT19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920124505.GT19939@ando> Message-ID: On Sep 20, 2013, at 5:45, Steven D'Aprano wrote: > On Fri, Sep 20, 2013 at 06:18:47AM -0400, Neil Girdhar wrote: >> On Fri, Sep 20, 2013 at 5:48 AM, Steven D'Aprano wrote: >> >>> On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote: >>>> On 19 September 2013 22:18, Steven D'Aprano wrote: >>> [...] >>>>> At the moment, dict views aren't directly iterable (you can't call >>>>> next() on them). But in principle they could have been designed as >>>>> re-iterable iterators. >>>> >>>> That's not what iterable means. The iterable/iterator distinction is >>>> well defined and reflected in the collections ABCs: >>> >>> Actually, I think the collections ABC gets it wrong, according to both >>> common practice and the definition given in the glossary: >>> >>> http://docs.python.org/3.4/glossary.html >> >> >> Where does the glossary disagree with collections.abc? > > I show below a class that is iterable, yet is not an instance of > collections.Iterable. By the glossary definition it is iterable (it has > a __getitem__ method that raises IndexError when there are no more items > to be returned). > > > [...] >> What you're calling "indirectly iterable" is what the docs call "Iterable" >> and what collections.abc call Iterable, right? > > I've explained this further in my reply to Paul Moore. What I should > have said was *manually* iterable, in the sense of directly calling > __next__ or __getitem__ on the view. Being able to call __next__ on something is not a property of an iterable. It's only a property of an iterator. 
(In fact, I think python could have defined an iterator as "an iterable with __next__" just as profitably as "an iterable whose __iter__() returns itself", and ended up with the exact same categories as today. But that's not important.) So, I'm not sure what your "manually iterable" is supposed to represent. Iterators and sequences but not other iterables? What does this distinction buy you? It seems as useful as inventing a word for all wedge-headed cats plus tabby apple-headed cats but no other apple-headed cats: a perfectly definable category, but one of no value. And that makes me think you're still confusing iterables and iterators. Except that you've pointed out a valid distinction--making something indexable (by an initial sequence of natural numbers? or does a 1-based array or an otherwise-not-iterable mapping-like object count as an empty iterator?) makes it work with the iterable protocol, but not the Iterable ABC, so clearly you know what you're talking about. And that makes me think that I (and the people who have been responding to you before me) have missed something important in this "manually iterable" or "directly iterable" idea. So, maybe you should try explaining it a different way? From stephen at xemacs.org Fri Sep 20 18:53:02 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 21 Sep 2013 01:53:02 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com> <874n9g3wgc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8738oz4eq9.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Girdhar writes: > Many different solutions would fix the problems I've seen.
My > suggestion is that Reiterable should be defined as an iterable for > which calling the __iter__ method yields the same elements in the > same order irrespective of whether __iter__ is called while a > previously returned iterator is still iterating. > Correct me if I'm wrong, but views on dicts are reiterable. For the same reason that sequences are: a view is not an iterator, so every time you iterate it, it gets passed to iter, and you get a new iterator, which then iterates. This is *why* Nick says that "isinstance(x, Iterable) and not isinstance(x, Iterator)" is the test you want. I can't speak for Nick on Steven A's example of an object with a __getitem__ taking a numeric argument that isn't an Iterable but is iterable, but I think that falls under "consenting adults" aka "if you're afraid it will hurt, don't". > Your second point that the method should be able to cheaply clone > an iterator is precisely what I'd like to achieve with a > "Reiterator" class like Antoine's. Well, I've kinda convinced myself that it isn't going to be easy to do that, without changing the type. The problem is that __next__ is (abstractly) a closure, and there's no way I know of to copy a function (copy.copy just returns the function object unchanged). So you'd need to expose the hidden state in the closure, and that is a change of type. > It sounds like you're saying put the items into a list no matter > what.
From random832 at fastmail.us Fri Sep 20 18:55:29 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 20 Sep 2013 12:55:29 -0400 Subject: [Python-ideas] Reduce platform dependence of date and time related functions In-Reply-To: <6DFE1EDA-7BEC-4CEA-8BCB-585EEA49FB2A@yahoo.com> References: <1379357351.28759.22678441.285099D6@webmail.messagingengine.com> <1379433703.14501.23084249.72F8456E@webmail.messagingengine.com> <1379435260.28764.23112997.115F6763@webmail.messagingengine.com> <67B7C352-5251-46DE-BED3-7D1FF38A8A78@yahoo.com> <1379688371.25215.24411669.1642F9A6@webmail.messagingengine.com> <6DFE1EDA-7BEC-4CEA-8BCB-585EEA49FB2A@yahoo.com> Message-ID: <1379696129.22168.24469709.2BA72C5C@webmail.messagingengine.com> On Fri, Sep 20, 2013, at 11:59, Andrew Barnert wrote: > OK, yes, you could do that, but are you actually proposing that the > stdlib should do so? If not, it's a misleading API. If so, it's a much > larger proposal than what we initially started with. And I think > providing C-locale str[fp]time with very wide, platform-independent > limits is a useful idea even without this much more radical idea. We've basically got five "kinds" of locale we are talking about: "C" locale - this is the easiest one to implement in a platform-independent way, but probably the least useful (if you're not intending locale-specific display, you should probably be using numeric values) Current platform locale, including all the subtleties like user preferences you mentioned, when available. This is what we support now. Specified platform locale (e.g. pt_BR, and we may still want to translate from a single format rather than needing to specify 0x0416 or "PTB" on Windows) Platform-independent version of a specified locale, using e.g. CLDR. This is the second-easiest to implement in a platform-independent way. Platform-independent version of user's current locale. 
There are limits to what can be achieved with this, for example Windows (and maybe Mac OS - I know the pre-OSX versions did) lets you set certain things individually. For example, I have my short date format set to yyyy-MM-dd, but otherwise I'm in the en-US locale. Anyway, this should be separate from the discussion of removing the limitations of the platform code. Locale-specific data can be acquired by calling the platform's strftime for a platform-independent strftime just as it's done for strptime now - and we'd need it as a fallback anyway. You can reduce the impact of platform's range limitations and incompatible repertoire of format specifiers by doing them individually, with a "safe" value for the year if needed, rather than throwing the whole format string to the platform function. For local time on windows, incidentally, we could extend the usable range by calling SystemTimeToTzSpecificLocalTime, but that loses the ability to use MSVCRT's version of the POSIX TZ variable. From stephen at xemacs.org Fri Sep 20 19:34:58 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 21 Sep 2013 02:34:58 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920124505.GT19939@ando> Message-ID: <871u4j4csd.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > And that makes me think that I (and the people who have been > responding to you before me) have missed something important in > this "manually iterable" or "directly iterable" idea. So, maybe you > should try explaining it a different way? How about "'Iterable' is a terrible name for an ABC that excludes an important class of iterables"? 
You see, the library manual lies (section 4.5 "Iterator types"):

    One method needs to be defined for container objects to provide
    iteration support:

    container.__iter__()

But in fact this is contradicted (section 2 "Built-in functions"):

    iter(object[, sentinel])

    Return an iterator object. The first argument is interpreted very
    differently depending on the presence of the second argument.
    Without a second argument, object must be a collection object
    which supports the iteration protocol (the __iter__() method), or
    it must support the sequence protocol (the __getitem__() method
    with integer arguments starting at 0).

I suggest that section 4.5 be corrected to

    Container objects provide iteration support when either of the
    methods

    container.__iter__()     # iteration protocol
    container.__getitem__()  # sequence protocol

    is defined. In the latter case, __getitem__() must accept integer
    arguments starting at 0.

Curiously, all of the built-in sequences support both protocols. I
suppose this section ought to say which is preferred.

The net result is that I guess Nick's test needs to be refined to

    def isIterable(o):
        try:
            iter(o)
            return True
        except TypeError:
            return False

    def isReiterable(o):
        return isIterable(o) and not isinstance(o, collections.abc.Iterator)

From tjreedy at udel.edu  Fri Sep 20 22:01:28 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 20 Sep 2013 16:01:28 -0400
Subject: [Python-ideas] Numerical instability was: Re: Introduce
	collections.Reiterable
In-Reply-To: 
References: 
Message-ID: 

On 9/20/2013 6:45 AM, Oscar Benjamin wrote:

> I'm going to assume that numerical instability is just a way of saying
> that a method is inaccurate in some cases.

Good enough for me.

> Although the incremental algorithm is much better than the naive
> approach Steven (knowingly) showed above I don't think it's true that
> constraining yourself to a single pass doesn't limit the possible
> accuracy.

That may be one difference between integer and float arithmetic.
The order of operations makes a difference.

> Another point of relevance here is that the incremental
> formula cannot be as efficiently implemented in Python since you don't
> get to take advantage of the super fast math.fsum function which is
> also more accurate than a naive Kahan algorithm.

Yes. One of the differences between 'theoretical' algorithms and
practical algorithms coded in CPython is the bias toward using
functions already coded in C.

> The script at the bottom of this post tests a few methods on a
> deliberately awkward set of random numbers and typical output is:

Thanks for doing this.

> $ ./stats.py
> exact: 0.989661716301
> naive -> error = -21476.0408922
> incremental -> error = -1.0770901604e-07
> two_pass -> error = 1.29118937764e-13
> three_pass -> error = 0.0

The incremental method is good enough for data measured to 3
significant figures, as is typical in at least parts of some sciences,
and the data I worked with for a decade. But it is not good enough for
substantially more accurate data. The Python statistics module should
cater to the latter. The doc should just say that it requires a
serially re-iterable input. (A person with data too large to fit in
memory could write an iterable that opens a file and returns an
iterator that reads blocks of values and yields them one at a time.)

The incremental method is useful for returning running means and
deviations for data collected sporadically and indefinitely, without
needing to store the cumulative data. It is a nice, non-obvious
example of the principle that it is sometimes possible to summarize
cumulative data with a relatively small and fixed set of sufficient
statistics.

> For these numbers the three_pass method usually has an error of 0 but
> otherwise 1ulp (1e-16). (It can actually be collapsed into a two pass
> method but then we couldn't use fsum.)
>
> If you know of a one-pass algorithm (or a way to improve the
> implementation I showed) that is as accurate as either the two_pass or
> three_pass methods I'd be very interested to see it (I'm sure Steven
> would be as well).

If I were trying to improve the incremental variance algorithm, I
would study the fsum method until I really understood it and then see
if I could apply the same ideas.

>
> Oscar
>
> $ cat stats.py
> #!/usr/bin/env python
>
> from __future__ import print_function
>
> from random import gauss
> from math import fsum
> from fractions import Fraction
>
> # Return the exact result as a Fraction. Nothing wrong
> # with using the computational formula for variance here.
> def variance_exact(data):
>     data = [Fraction(x) for x in data]
>     n = len(data)
>     sumx = sum(data)
>     sumx2 = sum(x**2 for x in data)
>     ss = sumx2 - (sumx**2)/n
>     return ss/(n-1)
>
> # Although this is the most efficient formula when using
> # exact computation it fails under fixed precision
> # floating point since it ends up subtracting two large
> # almost equal numbers leading to a catastrophic loss of
> # precision.
> def variance_naive(data):
>     n = len(data)
>     sumx = fsum(data)
>     sumx2 = fsum(x**2 for x in data)
>     ss = sumx2 - (sumx**2)/n
>     return ss/(n-1)
>
> # Incremental variance calculation from Wikipedia. If
> # the above uses fsum then a fair comparison should
> # use some compensated summation here also. However
> # it's not clear (to me) how to incorporate compensated
> # summation here.
> # http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Compensated_variant
> def variance_incremental(data):
>     n = 0
>     mean = 0
>     M2 = 0
>
>     for x in data:
>         n = n + 1
>         delta = x - mean
>         mean = mean + delta/n
>         M2 = M2 + delta*(x - mean)
>
>     variance = M2/(n - 1)
>     return variance
>
> # This is to me the obvious formula since I think of
> # this as the definition of the variance.
> def variance_twopass(data):
>     n = len(data)
>     mean = fsum(data) / n
>     sumdev2 = fsum((x - mean)**2 for x in data)
>     variance = sumdev2 / (n - 1)
>     return variance
>
>
> # This is the three-pass algorithm used in Steven's
> # statistics module. It's not one I had seen before but
> # AFAICT it's very accurate. In fact the 2nd and 3rd passes
> # can be merged as in variance_incremental but then we
> # wouldn't be able to take advantage of fsum.
> def variance_threepass(data):
>     n = len(data)
>     mean = fsum(data) / n
>     sumdev2 = fsum((x-mean)**2 for x in data)
>     # The following sum should mathematically equal zero, but due to rounding
>     # error may not.
>     sumdev2 -= fsum((x-mean) for x in data)**2 / n
>     return sumdev2 / (n - 1)
>
> methods = [
>     ('naive', variance_naive),
>     ('incremental', variance_incremental),
>     ('two_pass', variance_twopass),
>     ('three_pass', variance_threepass),
> ]
>
> # Test numbers with large mean and small standard deviation.
> # This is the case that causes trouble for the naive formula.
> N = 100000
> testnums = [gauss(mu=10**10, sigma=1) for n in range(N)]
>
> # First compute the exact result
> exact = variance_incremental([Fraction(num) for num in testnums])
> print('exact:', float(exact))
>
> # Compare each with the exact result
> for name, var in methods:
>     print(name, '-> error =', var(testnums) - exact)

-- 
Terry Jan Reedy

From tim.peters at gmail.com  Fri Sep 20 22:25:07 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Fri, 20 Sep 2013 15:25:07 -0500
Subject: [Python-ideas] Numerical instability was: Re: Introduce
	collections.Reiterable
In-Reply-To: 
References: 
Message-ID: 

[Terry Reedy]
> ...
> If I were trying to improve the incremental variance algorithm, I would
> study the fsum method until I really understood it and then see if I could
> apply the same ideas.
There are a number of ways to do floating "as if with infinite
precision" addition, implemented in pure Python, here:

http://code.activestate.com/recipes/393090-binary-floating-point-summation-accurate-to-full-p/

Not saying they're applicable here, just saying that if anyone wants
to fully understand this, it's a lot easier to read Python code ;-)

`msum` there is closest to Python's math.fsum().

From tjreedy at udel.edu  Fri Sep 20 23:15:05 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 20 Sep 2013 17:15:05 -0400
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: <20130920094854.GO19939@ando>
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<20130920094854.GO19939@ando>
Message-ID: 

On 9/20/2013 5:48 AM, Steven D'Aprano wrote:

> py> class Seq:
> ...     def __getitem__(self, index):
> ...         if 0 <= index < 5: return index+1000
> ...         raise IndexError
> ...
> py> s = Seq()
> py> isinstance(s, Iterable)
> False
> py> list(s)  # definitely iterable
> [1000, 1001, 1002, 1003, 1004]

I tested and iter() recognizes Seqs as iterables:

for i in iter(Seq()): print(i)

It does, however, wrap them in an adaptor iterator class

>>> type(iter(Seq()))
<class 'iterator'>

(which I was not really aware of before ;-) with proper __iter__ and
__next__ methods

>>> si = iter(Seq())
>>> si is iter(si)
True
>>> next(si)
1000

So I agree that collections.Iterable is limited relative to the
glossary and Python definition. The glossary might say that the older
__getitem__ protocol is semi-deprecated (it is no longer used
directly) but is adapted for back compatibility.

The problem with the protocol is that an iteration __getitem__ may be
a fake __getitem__ in that it ignores *index* (because it calculates
the next item from stored data). A fake-getitem iterable, if it also
had __len__, would look like a Sequence even though it really is not,
because it cannot be properly indexed. Such iterables are likely to
not be reiterable.
-- 
Terry Jan Reedy

From raymond.hettinger at gmail.com  Fri Sep 20 23:48:45 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 20 Sep 2013 14:48:45 -0700
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: 
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<20130920094854.GO19939@ando>
Message-ID: <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>

On Sep 20, 2013, at 2:15 PM, Terry Reedy wrote:

> . The glossary might say that the older __getitem__ protocol is
> semi-deprecated (it is no longer used directly) but is adapted for
> back compatibility.

It is NOT deprecated. People use and rely on this behavior. It is a
guaranteed behavior. Please don't use the glossary as a place to
introduce changes to the language.

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com  Fri Sep 20 23:53:43 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 21 Sep 2013 00:53:43 +0300
Subject: [Python-ideas] Numerical instability was: Re: Introduce
	collections.Reiterable
In-Reply-To: 
References: 
Message-ID: 

20.09.13 13:45, Oscar Benjamin wrote:
> If you know of a one-pass algorithm (or a way to improve the
> implementation I showed) that is as accurate as either the two_pass or
> three_pass methods I'd be very interested to see it (I'm sure Steven
> would be as well).
import sys

fmin = sys.float_info.min
finvmin = int(1 / sys.float_info.min)

def i2f1(i):
    return i // finvmin + (i % finvmin) * fmin

def i2f2(i):
    return i2f1(i // finvmin) + (i % finvmin) * fmin * fmin

def variance_incremental_exact(data):
    n = 0
    sumx = 0
    sumx2 = 0
    for x in data:
        i = int(x)
        x = i * finvmin + int(round((x - i) * finvmin))
        n += 1
        sumx += x
        sumx2 += x * x
    ss = sumx2 * n - sumx * sumx
    d = n * (n - 1)
    return i2f2(ss // d) + i2f2(ss % d) / float(d)

From mistersheik at gmail.com  Fri Sep 20 23:56:01 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 20 Sep 2013 17:56:01 -0400
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<20130920094854.GO19939@ando>
	<84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>
Message-ID: 

On Fri, Sep 20, 2013 at 5:48 PM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

> On Sep 20, 2013, at 2:15 PM, Terry Reedy wrote:
>
> > . The glossary might say that the older __getitem__ protocol is
> > semi-deprecated (it is no longer used directly) but is adapted for back
> > compatibility.
>
> It is NOT deprecated. People use and rely on this behavior. It is a
> guaranteed behavior. Please don't use the glossary as a place to introduce
> changes to the language.
>

Just curious, but who uses __getitem__ to implement an iterable that's
not a sequence?

> Raymond
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>
> --
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "python-ideas" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From timothy.c.delaney at gmail.com  Sat Sep 21 00:00:30 2013
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Sat, 21 Sep 2013 08:00:30 +1000
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: <20130920111000.GQ19939@ando>
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<20130920094854.GO19939@ando>
	<20130920111000.GQ19939@ando>
Message-ID: 

On 20 September 2013 21:10, Steven D'Aprano wrote:

>
> py> class Seq:
> ...     def __getitem__(self, index):
> ...         if 0 <= index < 5: return index+1000
> ...         raise IndexError
> ...
> py> s = Seq()
> py> isinstance(s, Iterable)  # The ABC claims Seq is not iterable.
> False
> py> for x in s:  # But it actually is.
> ...     print(x)
> ...
> 1000
> 1001
> 1002
> 1003
> 1004
>
>
> Can anyone convince me this is not a bug in the Iterable ABC?
>

I think there is a distinction here between collections.Iterable (as a
defined ABC) and something that is "iterable" (lowercase "i"). As
you've noted, an "iterable" is "An object capable of returning its
members one at a time".

So I think a valid definition of reiterable (barring pathological
cases) is:

obj is not iter(obj)

(assuming of course that obj is iterable).

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600
64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class Seq:
...     def __getitem__(self, index):
...         if 0 <= index < 5:
...             return index+1000
...         raise IndexError
...
>>> s = Seq()
>>> s is iter(s)
False
>>> i = iter(s)
>>> i is iter(i)
True
>>> t = ()
>>> t is iter(t)
False
>>> i = iter(t)
>>> i is iter(i)
True
>>>

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mistersheik at gmail.com  Sat Sep 21 00:02:34 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 20 Sep 2013 18:02:34 -0400
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: <8738oz4eq9.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<023C2EA0-9716-4FE1-888D-C556B8917FA1@yahoo.com>
	<874n9g3wgc.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8738oz4eq9.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Fri, Sep 20, 2013 at 12:53 PM, Stephen J. Turnbull wrote:

> Neil Girdhar writes:
>
> > Many different solutions would fix the problems I've seen. My
> > suggestion is that Reiterable should be defined as an iterable for
> > which calling the __iter__ method yields the same elements in the
> > same order irrespective of whether __iter__ is called while a
> > previously returned iterator is still iterating.
>
> > Correct me if I'm wrong, but views on dicts are reiterable.
>
> For the same reason that sequences are: a view is not an iterator, so
> every time you iterate it, it gets passed to iter, and you get a new
> iterator, which then iterates.
>
> This is *why* Nick says that "isinstance(x, Iterable) and not
> isinstance(x, Iterator)" is the test you want. I can't speak for Nick
> on Steven A's example of an object with a __getitem__ taking a numeric
> argument that isn't an Iterable but is iterable, but I think that
> falls under "consenting adults" aka "if you're afraid it will hurt,
> don't".
>

I want that test if the documentation will promise that that test is
supposed to be right.
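The test under discussion can be exercised directly; a quick sketch
(the helper name here is invented for illustration, and in the Python
3.3 of this thread the ABCs were also reachable as
collections.Iterable and collections.Iterator):

```python
from collections.abc import Iterable, Iterator

def looks_reiterable(x):
    # Nick's test: an iterable that is not itself an iterator.
    return isinstance(x, Iterable) and not isinstance(x, Iterator)

print(looks_reiterable([1, 2, 3]))            # True  -- a list is reiterable
print(looks_reiterable(iter([1, 2, 3])))      # False -- its iterator is one-shot
print(looks_reiterable({'a': 1}.items()))     # True  -- dict views are reiterable
print(looks_reiterable(x for x in range(3)))  # False -- a generator is an iterator
```

As discussed above, a sequence-protocol-only object like Seq still
fails this test even though it is in fact reiterable, since the
Iterable ABC only checks for __iter__.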
> > Your second point that the method should be able to cheaply clone > > an iterator cheaply is precisely what I'd like to achieve with a > > "Reiterator" class like Antoine's. > > Well, I've kinda convinced myself that it isn't going to be easy to do > that, without changing the type. The problem is that __next__ is > (abstractly) a closure, and there's no way I know of to copy a > function (copy.copy just returns the function object unchanged). So > you'd need to expose the hidden state in the closure, and that is a > change of type. > > > It sounds like you're saying put the items into a list no matter > > what. > > No, I'm saying if you don't know if you may consume the iterable, you > should convert to iterator, clone the iterator, and iterate the > clone. But that probably requires a change of type, at which point > you may as well call it "Reiterable". > > > okay, so we're on the same page it sounds like. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Sep 21 01:48:48 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Sep 2013 19:48:48 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: On 9/20/2013 6:00 PM, Tim Delaney wrote: > I think there is a distinction here between collections.Iterable (as a > defined ABC) and something that is "iterable" (lowercase "i"). As you've > noted, an "iterable" is "An object capable of returning its members one > at a time". > > So I think a valid definition of reiterable (barring pathological cases) is: > > obj is not iter(obj) If obj has a fake __getitem__, that will not work. 
class Cnt:
    def __init__(self, maxn):
        self.n = 0
        self.maxn = maxn
    def __getitem__(self, dummy):
        n = self.n + 1
        if n <= self.maxn:
            self.n = n
            return n
        else:
            raise IndexError

c3 = Cnt(3)
print(c3 is not iter(c3), list(c3), list(c3))
>>>
True [1, 2, 3] []

Dismissing legal code as 'pathological', as more than one person has,
does not cut it as a design principle.

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Sat Sep 21 01:59:25 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 20 Sep 2013 19:59:25 -0400
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<20130920094854.GO19939@ando>
	<84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>
Message-ID: 

On 9/20/2013 5:48 PM, Raymond Hettinger wrote:
>
> On Sep 20, 2013, at 2:15 PM, Terry Reedy wrote:
>
>> . The glossary might say that the older __getitem__ protocol is
>> semi-deprecated (it is no longer used directly) but is adapted for
>> back compatibility.
>
> It is NOT deprecated.

And I did not suggest that it was. It is, however, not fully supported
in that collections.Iterable does not recognize __getitem__ iterables
and the same will be true of code that uses Iterable.

> People use and rely on this behavior.

Are people still writing fake __getitem__ methods? (that are really
next methods rather than random access methods). I believe that usage
of the protocol is informally deprecated in favor of __iter__ and
__next__.
-- Terry Jan Reedy From timothy.c.delaney at gmail.com Sat Sep 21 02:05:35 2013 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sat, 21 Sep 2013 10:05:35 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: On 21 September 2013 09:48, Terry Reedy wrote: > On 9/20/2013 6:00 PM, Tim Delaney wrote: > > I think there is a distinction here between collections.Iterable (as a >> defined ABC) and something that is "iterable" (lowercase "i"). As you've >> noted, an "iterable" is "An object capable of returning its members one >> at a time". >> >> So I think a valid definition of reiterable (barring pathological cases) >> is: >> >> obj is not iter(obj) >> > > If obj has a fake __getitem__, that will not work. > > class Cnt: > def __init__(self, maxn): > self.n = 0 > self.maxn = maxn > def __getitem__(self, dummy): > n = self.n + 1 > if n <= self.maxn: > self.n = n > return n > else: > raise IndexError > > c3 = Cnt(3) > print(c3 is not iter(c3), list(c3), list(c3)) > >>> > True [1, 2, 3] [] > > Dismissing legal code as 'pathological', as more than one person has, does > not cut it as a design principle. To me, that is a reiterable. It might not give the same results each time through, but you can iterate, it stops, then you can iterate over it again - it won't raise an exception trying to do so. So not what I would consider a pathological case - though definitely an unusual case and one that obviously wouldn't work in many situations that require reiterables to return the same values in the same order each time through. 
So we've got two classes of reiterables here - anything that can be iterated through, and then iterated through again, for which obj is not iter(obj) will work in all but what I consider to be pathological cases; - iterables that can be iterated through multiple times, returning the same objects in the same order each time through, for which I don't think a test is possible. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Sat Sep 21 02:18:26 2013 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sat, 21 Sep 2013 10:18:26 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: On 21 September 2013 10:05, Tim Delaney wrote: > >> Dismissing legal code as 'pathological', as more than one person has, >> does not cut it as a design principle. > > > To me, that is a reiterable. It might not give the same results each time > through, but you can iterate, it stops, then you can iterate over it again > - it won't raise an exception trying to do so. So not what I would consider > a pathological case - though definitely an unusual case and one that > obviously wouldn't work in many situations that require reiterables to > return the same values in the same order each time through. > > So we've got two classes of reiterables here > > - anything that can be iterated through, and then iterated through again, > for which obj is not iter(obj) will work in all but what I consider to be > pathological cases; > > - iterables that can be iterated through multiple times, returning the > same objects in the same order each time through, for which I don't think a > test is possible. > Also, pathological is probably not the best term to use. Instead, substitute "deliberately breaks a well-established protocol". 
It may make sense to do so in certain circumstances, but you can't
expect anyone else to play nice with you if you don't play nice with
them.

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From raymond.hettinger at gmail.com  Sat Sep 21 02:34:22 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 20 Sep 2013 17:34:22 -0700
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: 
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
	<20130919121828.GK19939@ando>
	<20130920094854.GO19939@ando>
	<84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>
Message-ID: <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com>

On Sep 20, 2013, at 4:59 PM, Terry Reedy wrote:

>>> . The glossary might say that the older __getitem__ protocol is
>>> semi-deprecated (it is no longer used directly) but is adapted for
>>> back compatibility.
>>
>> It is NOT deprecated.
>
> And I did not suggest that it was. It is, however, not fully supported
> in that collections.Iterable does not recognize __getitem__ iterables
> and the same will be true of code that uses Iterable.

The collections ABCs are all just a subset of things real collections
do. For example, there is no slicing support. This was intentional.

To some degree, the only test of whether something is iterable is to
call iter() on it and see what happens. With Python's __getattr__
method, the only way to test for many behaviors is to attempt to
invoke them to see what happens. That is why hasattr() has to invoke
getattr().

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mistersheik at gmail.com Sat Sep 21 02:40:23 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 20 Sep 2013 20:40:23 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> Message-ID: On Fri, Sep 20, 2013 at 8:34 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Sep 20, 2013, at 4:59 PM, Terry Reedy wrote: > > . The glossary might say that the older __getitem__ protocol is > semi-deprecated (it is no longer used directly) but is adapted for > back compatibility. > > > It is NOT deprecated. > > > And I did not suggest that is was. It is, however, not fully supported in > that collections. Iterable does not recognize __getitem__ iterables and the > same will be true of code that uses Iterable. > > > The collections ABCs are all just a subset of things real collections do. > For example, there is no slicing support. This was intentional. > > To some degree, the only test of whether something is iterable is to call > iter() on it and see what happens. With Python's __getattr__ method, the > only way to test for many behaviors is to attempt to call invoke them to > see what happens. That is why hasattr() has to invoke getattr(). > > Is that how you see PEP 3119? It states that the "standardized test" if something is iterable is precisely to use isinstance(x, collections.Iterable), which is how I read these paragraphs: On the other hand, one of the criticisms of inspection by classic OOP theorists is the lack of formalisms and the ad hoc nature of what is being inspected. 
In a language such as Python, in which almost any aspect of an object can be reflected and directly accessed by external code, there are many different ways to test whether an object conforms to a particular protocol or not. For example, if asking 'is this object a mutable sequence container?', one can look for a base class of 'list', or one can look for a method named '__getitem__'. But note that although these tests may seem obvious, neither of them are correct, as one generates false negatives, and the other false positives. The generally agreed-upon remedy is to standardize the tests, and group them into a formal arrangement. This is most easily done by associating with each class a set of standard testable properties, either via the inheritance mechanism or some other means. Each test carries with it a set of promises: it contains a promise about the general behavior of the class, and a promise as to what other class methods will be available. This PEP proposes a particular strategy for organizing these tests known as Abstract Base Classes, or ABC. ABCs are simply Python classes that are added into an object's inheritance tree to signal certain features of that object to an external inspector. Tests are done using isinstance(), and the presence of a particular ABC means that the test has passed. > > Raymond > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. 
> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 21 02:52:21 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Sep 2013 10:52:21 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: <20130921005221.GV19939@ando> On Fri, Sep 20, 2013 at 07:48:48PM -0400, Terry Reedy wrote: > On 9/20/2013 6:00 PM, Tim Delaney wrote: > > >I think there is a distinction here between collections.Iterable (as a > >defined ABC) and something that is "iterable" (lowercase "i"). As you've > >noted, an "iterable" is "An object capable of returning its members one > >at a time". > > > >So I think a valid definition of reiterable (barring pathological cases) > >is: > > > > obj is not iter(obj) > > If obj has a fake __getitem__, that will not work. I don't understand what is "fake" about the following example. It is a *calculated* __getitem__, but that is perfectly legitimate. I suspect it is a buggy calculation, since obj[0] == obj[0] returns False, but that's another story. To me, a "fake __getitem__" would be something like "__getitem__ = None", there only to fool hasattr() tests but not actually doing anything. So I'm not actually sure what you are getting at to call this "fake".

> class Cnt:
>     def __init__(self, maxn):
>         self.n = 0
>         self.maxn = maxn
>     def __getitem__(self, dummy):
>         n = self.n + 1
>         if n <= self.maxn:
>             self.n = n
>             return n
>         else:
>             raise IndexError
>
> c3 = Cnt(3)
> print(c3 is not iter(c3), list(c3), list(c3))
> >>>
> True [1, 2, 3] []
>
> Dismissing legal code as 'pathological', as more than one person has, > does not cut it as a design principle.

When I call something "pathological", I don't necessarily mean it is bad code.
I mean it in the mathematical sense of being *either* bad/harmful or unexpected/unintuitive: https://en.wikipedia.org/wiki/Pathological_%28mathematics%29 Perhaps I should use the term "exceptional" rather than "pathological", but that carries its own baggage. For instance, infinite iterators are (in my usage) pathological. You can't pass them to list(), but they are very useful in practice and shouldn't be dismissed as necessarily harmful. The point is, I don't expect general-purpose Python functions to *necessarily* deal with every pathological/exceptional case. It is no fault of list() that it cannot convert an infinite iterator to a list, nor should list() include special code to detect and avoid infinite iterators, even if it could, which it cannot. Cycles in lists are another example of pathology, but in this case, list repr *should* (and does) deal with it correctly:

py> L = []
py> L.append(L)
py> print(L)
[[...]]

-- Steven From stephen at xemacs.org Sat Sep 21 04:12:12 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 21 Sep 2013 11:12:12 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > Dismissing legal code as 'pathological', as more than one person has, > does not cut it as a design principle. But you don't even need to write a class with __getitem__() to get that behavior.

>>> l = [11, 12, 13]
>>> for i in l:
...     print(i)
...     if i%2 == 0:
...         l.remove(i)
...
11
12
>>> l
[11, 13]
>>>

Of course the iteration itself is probably buggy (i.e., the writer didn't mean to skip printing '13'), but in general iterables can change themselves. Neil himself seems to be of two minds about such cases. On the one hand, he said the above behavior is built in to list, so it's acceptable to him.
(I think that's inconsistent: I would say the property of being completely consumed is built in to iterator, so it should be acceptable, too.) On the other hand, he's defined a reiterable as a collection that when iterated produces the same objects in the same order. Maybe what we really want is for copy.deepcopy to do the right thing with iterables. Then code that doesn't want to consume consumable iterables can do a deepcopy (including replication of the closed-over state of __next__() for iterators) before iterating. Or perhaps the right thing is a copy.itercopy that creates a new composite object as a shallow copy of everything except that it clones the state of __next__() in case the object was an iterator to start with. From stephen at xemacs.org Sat Sep 21 04:41:10 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 21 Sep 2013 11:41:10 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> Message-ID: <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> Tim Delaney writes: > Also, pathological is probably not the best term to use. Instead, > substitute "deliberately breaks a well-established protocol". Note that in Neil's use case (the OP) it's not deliberate. His function receives an iterable, it naively iterates it and (if an iterator) consumes it, and then some other function loses. Silently. Also, as long as __getitem__(0) succeeds, this *is* the "sequence protocol". (A Sequence also has a __len__() method, but iterability doesn't depend on that.) I don't see why Python would deprecate this. For example, consider the sequence of factors of integers: [(1,2), (1,3), (1,2,2,4), (1,5), (1,2,3,6), ...]. Factorization being in general a fairly expensive operation, you might want to define this in terms of __getitem__() but __len__() is infinite. 
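A hedged sketch of such a __getitem__-only class (using plain divisors rather than the factor lists with multiplicity above; `Factors` is an invented name):

```python
import itertools

class Factors:
    # An "infinite sequence" defined via __getitem__ alone: item n is
    # the tuple of divisors of n + 2. There is no __iter__ and no
    # __len__; iteration works only through the legacy sequence protocol.
    def __getitem__(self, n):
        m = n + 2
        return tuple(d for d in range(1, m + 1) if m % d == 0)

f = Factors()
print(f[0], f[1], f[2])  # (1, 2) (1, 3) (1, 2, 4)
# iter() still accepts it, so lazy slicing works:
print(list(itertools.islice(f, 4)))
```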
I admit this is a somewhat artificial example (I don't know of non-academic applications for this sequence, although factorization itself is very useful in applications like crypto). From ethan at stoneleaf.us Sat Sep 21 05:01:08 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Sep 2013 20:01:08 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> Message-ID: <523D0BF4.5020404@stoneleaf.us> On 09/20/2013 05:40 PM, Neil Girdhar wrote: > > [...] Each test carries with it a set of promises: it contains a promise about the general behavior of > the class, and a promise as to what other class methods will be available. > > [...] Tests are done using isinstance(), and the presence of a particular ABC means that the test has passed. So if the test passes, you know you're good. Those paragraphs said nothing about the meaning of a failing test. -- ~Ethan~ From mistersheik at gmail.com Sat Sep 21 06:00:14 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 00:00:14 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <523D0BF4.5020404@stoneleaf.us> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> Message-ID: If someone allows their class to fail the standardized test for Iterable/Reiterable/Sequence, that class doesn't deserve to be treated as one. (Anyone can register their class as a subclass of the ABCs, or more simply inherit from one.) On Fri, Sep 20, 2013 at 11:01 PM, Ethan Furman wrote: > On 09/20/2013 05:40 PM, Neil Girdhar wrote: > >> >> [...] 
Each test carries with it a set of promises: it contains a promise >> about the general behavior of >> >> the class, and a promise as to what other class methods will be available. >> >> [...] Tests are done using isinstance(), and the presence of a >> particular ABC means that the test has passed. >> > > So if the test passes, you know you're good. Those paragraphs said > nothing about the meaning of a failing test. > > -- > ~Ethan~ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@googlegroups.com > . > For more options, visit https://groups.google.com/groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 21 06:09:04 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Sep 2013 14:09:04 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> Message-ID: <20130921040904.GW19939@ando> On Sat, Sep 21, 2013 at 12:00:14AM -0400, Neil Girdhar wrote: > If someone allows their class to fail the standardized test for > Iterable/Reiterable/Sequence, that class doesn't deserve to be treated as > one. (Anyone can register their class as a subclass of the ABCs, or more > simply inherit from one.) This is Python, and duck-typing rules, not Java-like type checking.
If you want a language with strict type checking designed by theorists, try Haskell. The ultimate test in Python of whether something is iterable or not is to try iterating over it, and see if it succeeds or not. If it iterates like a duck, that's good enough to be treated as a duck. -- Steven From mistersheik at gmail.com Sat Sep 21 06:16:41 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 00:16:41 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130921040904.GW19939@ando> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> Message-ID: You're right that you should go ahead and use something however you want to. However, there are plenty of times where you can't do that, e.g., you want to know if something is callable before calling it, and similarly if something is reiterable before iterating it and exhausting. That is the purpose of collections.abc, and that's what I thought we were discussing. Could you make mistakes trying to look ahead like this? Sure. An object could appear callable only to raise NotImplementedError on calling it. Looking ahead does not have to be foolproof. This is Python, and of course (almost) *any test* can be fooled. That doesn't just go for reiterability, it goes for callability as well. Best, Neil On Sat, Sep 21, 2013 at 12:09 AM, Steven D'Aprano wrote: > On Sat, Sep 21, 2013 at 12:00:14AM -0400, Neil Girdhar wrote: > > If someone allows their class to fail the standardized test for > > Iterable/Reiterable/Sequence, that class doesn't deserve to be treated as > > one. (Anyone can register their class as a subclass of the ABCs, or more > > simply inherit from one.) > > This is Python, and duck-typing rules, not Java-like type checking. 
If > you want a language with strict type checking designed by theorists, try > Haskell. > > The ultimate test in Python of whether something is iterable or not is > to try iterating over it, and see if it succeeds or not. If it iterates > like a duck, that's good enough to be treated as a duck. > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Sep 21 06:20:39 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 00:20:39 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I can humbly suggest why Python would deprecate the sequence protocol: there "should be one obvious way" to answer iter(), and in my opinion that's the __iter__() method. I considered infinite iterators, and if you happen to have __getitem__ written, you can trivially write an __iter__ function as follows: def __iter__(self): return (self.__getitem__(x) for x in itertools.count()) Now your class will be Iterable in the abc sense, and no longer relies on the sequence protocol Best, Neil On Fri, Sep 20, 2013 at 10:41 PM, Stephen J. 
Turnbull wrote: > Tim Delaney writes: > > > Also, pathological is probably not the best term to use. Instead, > > substitute "deliberately breaks a well-established protocol". > > Note that in Neil's use case (the OP) it's not deliberate. His > function receives an iterable, it naively iterates it and (if an > iterator) consumes it, and then some other function loses. Silently. > > Also, as long as __getitem__(0) succeeds, this *is* the "sequence > protocol". (A Sequence also has a __len__() method, but iterability > doesn't depend on that.) > > I don't see why Python would deprecate this. For example, consider > the sequence of factors of integers: [(1,2), (1,3), (1,2,2,4), (1,5), > (1,2,3,6), ...]. Factorization being in general a fairly expensive > operation, you might want to define this in terms of __getitem__() but > __len__() is infinite. I admit this is a somewhat artificial example > (I don't know of non-academic applications for this sequence, although > factorization itself is very useful in applications like crypto). > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Sat Sep 21 06:18:29 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 20 Sep 2013 21:18:29 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: <204C6E80-AF20-4D5C-B313-0B009E924859@yahoo.com> On Sep 20, 2013, at 14:15, Terry Reedy wrote: >> False >> py> list(s) # definitely iterable >> [1000, 1001, 1002, 1003, 1004] > > I tested and iter() recognizes Seqs as iterables: > > for i in iter(Seq()): print(i) > > > It does, however, wrap them in an adaptor iterator class What else did you expect? Sequences are iterables, but they aren't iterators. So calling iter on one can't return the sequence itself. >>>> type(iter(Seq())) > > (which I was not really aware of before ;-) with proper __iter__ and __next__ methods >>>> si is iter(si) > True >>>> next(si) > 1000 Having an __iter__ that returns itself and a __next__ is the definition of what an iterator is. And returning an iterator is the whole point of the iter function. So what else could it do in this case? Think about how you'd implement iter in pure python. You'd try to return its __iter__(), and on AttributeError, you'd return a generator. So the C implementation does the same thing, but, as usual, substitutes a custom C iterator for a generator. From mistersheik at gmail.com Sat Sep 21 06:23:54 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 00:23:54 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I appreciate the discussion illuminating various aspects of this I hadn't considered. 
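Andrew's description of how you'd implement iter() in pure Python can be sketched roughly like this (an approximation of, not a substitute for, CPython's actual logic, which works on type slots; `iter_sketch` is an invented name):

```python
def iter_sketch(obj):
    # Rough pure-Python sketch of one-argument iter(): prefer __iter__,
    # else fall back to the old __getitem__ sequence protocol via a
    # generator. Lookups go through the type, as the real iter() does.
    cls = type(obj)
    if hasattr(cls, "__iter__"):
        return cls.__iter__(obj)
    if hasattr(cls, "__getitem__"):
        def seq_gen():
            i = 0
            while True:
                try:
                    yield obj[i]
                except IndexError:
                    return  # end of the "sequence"
                i += 1
        return seq_gen()
    raise TypeError(f"{cls.__name__!r} object is not iterable")

print(list(iter_sketch([1, 2, 3])))  # [1, 2, 3]
```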
Finally, what I think I want is for * all sequences * all views * numpy arrays to answer yes to reiterable, and * all generators to answer no to reiterable. Best, Neil On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull wrote: > Terry Reedy writes: > > > Dismissing legal code as 'pathological', as more than one person has, > > does not cut it as a design principle. > > But you don't even need to write a class with __getitem__() to get > that behavior. > > >>> l = [11, 12, 13] > >>> for i in l: > ... print(i) > ... if i%2 == 0: > ... l.remove(i) > ... > 11 > 12 > >>> l > [11, 13] > >>> > > Of course the iteration itself is probably buggy (ie, the writer > didn't mean to skip printing '13'), but in general iterables can > change themselves. > > Neil himself seems to be of two minds about such cases. On the one > hand, he said the above behavior is built in to list, so it's > acceptable to him. (I think that's inconsistent: I would say the > property of being completely consumed is built in to iterator, so it > should be acceptable, too.) On the other hand, he's defined a > reiterable as a collection that when iterated produces the same > objects in the same order. > > Maybe what we really want is for copy.deepcopy to do the right thing > with iterables. Then code that doesn't want to consume consumable > iterables can do a deepcopy (including replication of the closed-over > state of __next__() for iterators) before iterating. > > Or perhaps the right thing is a copy.itercopy that creates a new > composite object as a shallow copy of everything except that it clones > the state of __next__() in case the object was an iterator to start > with. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. 
> To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Sep 21 06:43:26 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 20 Sep 2013 21:43:26 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> Message-ID: On Sep 20, 2013, at 21:16, Neil Girdhar wrote: > You're right that you should go ahead and use something however you want to. However, there are plenty of times where you can't do that, e.g., you want to know if something is callable before calling it, Why? What's the harm in just calling it and handling the exception? And surely, if you really need to LBYL here, you need to know that it's callable with the argument you plan to pass it. There are good uses for checking if something is callable, but this isn't a good example. And it's very different from your other example. > and similarly if something is reiterable before iterating it and exhausting. This one is different. You can't just handle failure, because (a) there's no unambiguous sign of failure, and (b) it's too late to deal with it if you've already exhausted the iterator. However, if you just turn the test around, it _is_ syntactically checkable: if "isinstance(it, Iterator)", or "iter(it) is it" or "hasattr(it, '__next__')" or "next(it)" doesn't raise... then you have to do
Either Reiterable is just Iterable and not Iterator (barring any flaws in the definition of Iterable, which is a separate problem), or it's not an abstract type. And if it's just Iterable and not Iterator, besides being complicated to implement (you can't inherit from the negation of a class), it's also more complicated to use. The obvious use case is: If you get an Iterator, you have to tee, make a list, use a one-pass algorithm instead of two-pass, whatever. Rewriting that instead as if you get an Iterable but it's not a Reiterable buys you nothing but verbosity. Turning it around so if you get a Reiterable you can skip the fallback just means a double negative that's harder to process. > That is the purpose of collections.abc, and that's what I thought we were discussing. Could you make mistakes trying to look ahead like this? Sure. An object could appear callable only to raise NotImplementedError on calling it. Looking ahead does not have to be foolproof. This is Python, and of course (almost) *any test* can be fooled. That doesn't just go for reiterability, it goes for callability as well. > > Best, > Neil > > > On Sat, Sep 21, 2013 at 12:09 AM, Steven D'Aprano wrote: >> On Sat, Sep 21, 2013 at 12:00:14AM -0400, Neil Girdhar wrote: >> > If someone allows their class to fail the standardized test for >> > Iterable/Reiterable/Sequence, that class doesn't deserve to be treated as >> > one. (Anyone can register their class as a subclass of the ABCs, or more >> > simply inherit from one.) >> >> This is Python, and duck-typing rules, not Java-like type checking. If >> you want a language with strict type checking designed by theorists, try >> Haskell. >> >> The ultimate test in Python of whether something is iterable or not is >> to try iterating over it, and see if it succeeds or not. If it iterates >> like a duck, that's good enough to be treated as a duck. 
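A minimal sketch of that turned-around test in practice (the function name is invented, and it assumes a list() fallback is acceptable):

```python
from collections.abc import Iterator

def stable_sum_and_max(it):
    # If we were handed a one-shot iterator, materialize it once;
    # anything else is assumed to be safely re-iterable.
    if isinstance(it, Iterator):
        it = list(it)
    return sum(it), max(it)  # two passes over `it`

print(stable_sum_and_max(iter([3, 1, 2])))  # (6, 3)
print(stable_sum_and_max([3, 1, 2]))        # (6, 3)
```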
>> >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Sep 21 06:52:29 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 00:52:29 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: We discussed this upthread: I only want "not iterator" if not iterator promises reiterability. Right now, we have what may be a happy accident that can easily be violated by someone else. Best, Neil On Sat, Sep 21, 2013 at 12:50 AM, Andrew Barnert wrote: > On Sep 20, 2013, at 21:23, Neil Girdhar wrote: > > I appreciate the discussion illuminating various aspects of this I hadn't > considered. Finally, what I think I want is for > * all sequences > * all views > * numpy arrays > to answer yes to reiterable, and > * all generators > to answer no to reiterable. 
> > > All sequences, views, and numpy arrays answer no to iterator (and so do > sets, mappings, etc.), and all generators answer yes (and so do the > iterators you get back from calling iter on a sequence, map, filter, your > favorite itertools function, etc.) > > So you just want "not iterator". Even Haskell doesn't attempt to provide > negative types like that. (And you can very easily show that it's iterator > that's the normal type: it's syntactically checkable in various ways--e.g., > it.hasattr('__next__'), but the only positive way to check reiterable is > not just semantic, but destructive.) > > Best, Neil > > On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull wrote: > >> Terry Reedy writes: >> >> > Dismissing legal code as 'pathological', as more than one person has, >> > does not cut it as a design principle. >> >> But you don't even need to write a class with __getitem__() to get >> that behavior. >> >> >>> l = [11, 12, 13] >> >>> for i in l: >> ... print(i) >> ... if i%2 == 0: >> ... l.remove(i) >> ... >> 11 >> 12 >> >>> l >> [11, 13] >> >>> >> >> Of course the iteration itself is probably buggy (ie, the writer >> didn't mean to skip printing '13'), but in general iterables can >> change themselves. >> >> Neil himself seems to be of two minds about such cases. On the one >> hand, he said the above behavior is built in to list, so it's >> acceptable to him. (I think that's inconsistent: I would say the >> property of being completely consumed is built in to iterator, so it >> should be acceptable, too.) On the other hand, he's defined a >> reiterable as a collection that when iterated produces the same >> objects in the same order. >> >> Maybe what we really want is for copy.deepcopy to do the right thing >> with iterables. Then code that doesn't want to consume consumable >> iterables can do a deepcopy (including replication of the closed-over >> state of __next__() for iterators) before iterating. 
>> >> Or perhaps the right thing is a copy.itercopy that creates a new >> composite object as a shallow copy of everything except that it clones >> the state of __next__() in case the object was an iterator to start >> with. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Sep 21 06:50:25 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 20 Sep 2013 21:50:25 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sep 20, 2013, at 21:23, Neil Girdhar wrote: > I appreciate the discussion illuminating various aspects of this I hadn't considered. Finally, what I think I want is for > * all sequences > * all views > * numpy arrays > to answer yes to reiterable, and > * all generators > to answer no to reiterable. 
All sequences, views, and numpy arrays answer no to iterator (and so do sets, mappings, etc.), and all generators answer yes (and so do the iterators you get back from calling iter on a sequence, map, filter, your favorite itertools function, etc.) So you just want "not iterator". Even Haskell doesn't attempt to provide negative types like that. (And you can very easily show that it's iterator that's the normal type: it's syntactically checkable in various ways--e.g., hasattr(it, '__next__'), but the only positive way to check reiterable is not just semantic, but destructive.) > Best, Neil > > On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull wrote: >> Terry Reedy writes: >> >> > Dismissing legal code as 'pathological', as more than one person has, >> > does not cut it as a design principle. >> >> But you don't even need to write a class with __getitem__() to get >> that behavior. >> >> >>> l = [11, 12, 13] >> >>> for i in l: >> ... print(i) >> ... if i%2 == 0: >> ... l.remove(i) >> ... >> 11 >> 12 >> >>> l >> [11, 13] >> >>> >> >> Of course the iteration itself is probably buggy (ie, the writer >> didn't mean to skip printing '13'), but in general iterables can >> change themselves. >> >> Neil himself seems to be of two minds about such cases. On the one >> hand, he said the above behavior is built in to list, so it's >> acceptable to him. (I think that's inconsistent: I would say the >> property of being completely consumed is built in to iterator, so it >> should be acceptable, too.) On the other hand, he's defined a >> reiterable as a collection that when iterated produces the same >> objects in the same order. >> >> Maybe what we really want is for copy.deepcopy to do the right thing >> with iterables. Then code that doesn't want to consume consumable >> iterables can do a deepcopy (including replication of the closed-over >> state of __next__() for iterators) before iterating.
>> >> Or perhaps the right thing is a copy.itercopy that creates a new >> composite object as a shallow copy of everything except that it clones >> the state of __next__() in case the object was an iterator to start >> with. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sat Sep 21 06:54:50 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 21 Sep 2013 14:54:50 +1000 Subject: [Python-ideas] Introduce collections.Reiterable References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> Message-ID: <7wpps2wz8l.fsf@benfinney.id.au> Neil Girdhar writes: > However, there are plenty of times where you can't do that, e.g., you > want to know if something is callable before calling it What is a concrete example of *needing* to know whether an object is callable? Why not just use the object *as if it is* callable, and the TypeError will propagate back to whoever fed you the object if it's not? > and similarly if something is reiterable before iterating it and > exhausting. 
I have somewhat more sympathy for this desire; duck typing doesn't
work so well for this, because by the time the iterable is exhausted
it's too late to deal with its inability to re-start.

Still, though, this is the kind of division of responsibility that
makes a good program: tell the user of your code (in the docstring of
your class or function) that you require a sequence or some other
re-iterable object. If you try something that fails on the object
you've been given, that's the responsibility of the code that gave it
to you. You can be nice by ensuring it'll fail in such a way that the
caller gets a meaningful exception.

-- 
 \     "The number of UNIX installations has grown to 10, with more |
  `\   expected." --Unix Programmer's Manual, 2nd Ed., 1972-06-12 |
_o__) |
Ben Finney

From mistersheik at gmail.com  Sat Sep 21 06:54:36 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 21 Sep 2013 00:54:36 -0400
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: 
References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando>
 <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com>
 <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com>
 <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando>
Message-ID: 

I check for callable when accepting callbacks because I will call them
much later, and raising the error then is harder to track down. Like I
said in the other mail, your alternative ways of checking
reiterability carry no corresponding guarantee that they should work.
Checking the other ABCs is supposed to work according to PEP 3119.

Neil

On Sat, Sep 21, 2013 at 12:43 AM, Andrew Barnert wrote:
> On Sep 20, 2013, at 21:16, Neil Girdhar wrote:
> > You're right that you should go ahead and use something however you want
> > to. However, there are plenty of times where you can't do that, e.g., you
> > want to know if something is callable before calling it,
>
> Why? What's the harm in just calling it and handling the exception?
And > surely, if you really need to LBYL here, you need to know that it's > callable with the argument you plan to pass it. > > There are good uses for checking if something is callable, but this isn't > a good example. And it's very different from your other example. > > and similarly if something is reiterable before iterating it and > exhausting. > > > This one is different. You can't just handle failure, because (a) there's > no unambiguous sign of failure, and (b) it's too late to deal with it if > you've already exhausted the iterator. > > However, if you just turn the test around, it _is_ syntactically > checkable: if "isinstance(it, Iterator)", or "iter(it) is it" or > "hasattr(it, __next__)" or "next(it)" doesn't raise... then you have to do > a single-pass algorithm or tee the values or make a list or whatever. > > Either Reiterable is just Iterable and not Iterator (barring any flaws in > the definition of Iterable, which is a separate problem), or it's not an > abstract type. > > And if it's just Iterable and not Iterator, besides being complicated to > implement (you can't inherit from the negation of a class), it's also more > complicated to use. The obvious use case is: If you get an Iterator, you > have to tee, make a list, use a one-pass algorithm instead of two-pass, > whatever. Rewriting that instead as if you get an Iterable but it's not a > Reiterable buys you nothing but verbosity. Turning it around so if you get > a Reiterable you can skip the fallback just means a double negative that's > harder to process. > > That is the purpose of collections.abc, and that's what I thought we were > discussing. Could you make mistakes trying to look ahead like this? Sure. > An object could appear callable only to raise NotImplementedError on > calling it. Looking ahead does not have to be foolproof. This is Python, > and of course (almost) *any test* can be fooled. That doesn't just go for > reiterability, it goes for callability as well. 
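Andrew's fallback pattern (materialize only when handed a one-shot iterator, then run the two-pass algorithm freely) can be sketched as follows; the function name is illustrative:

```python
from collections.abc import Iterator

def mean_and_deviations(data):
    """Two-pass algorithm that tolerates one-shot iterators."""
    if isinstance(data, Iterator):
        data = list(data)  # fall back: materialize so we can iterate twice
    total = 0
    count = 0
    for x in data:         # first pass
        total += x
        count += 1
    mean = total / count
    return [x - mean for x in data]  # second pass over the same data

print(mean_and_deviations([1, 2, 3]))        # [-1.0, 0.0, 1.0]
print(mean_and_deviations(iter([1, 2, 3])))  # same result from an iterator
```

Written this way, the common case (a list, a view, a range) pays nothing, and only genuine one-shot inputs get copied.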
> > Best, > Neil > > > On Sat, Sep 21, 2013 at 12:09 AM, Steven D'Aprano wrote: > >> On Sat, Sep 21, 2013 at 12:00:14AM -0400, Neil Girdhar wrote: >> > If someone allows their class to fail the standardized test for >> > Iterable/Reiterable/Sequence, that class doesn't deserve to be treated >> as >> > one. (Anyone can register their class as a subclass of the ABCs, or >> more >> > simply inherit from one.) >> >> This is Python, and duck-typing rules, not Java-like type checking. If >> you want a language with strict type checking designed by theorists, try >> Haskell. >> >> The ultimate test in Python of whether something is iterable or not is >> to try iterating over it, and see if it succeeds or not. If it iterates >> like a duck, that's good enough to be treated as a duck. >> >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
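Steven's "ultimate test", just try iterating and see, is ordinary EAFP; a minimal sketch (the function name is illustrative):

```python
def consume_quacks(obj):
    """Treat obj as iterable; let the failure speak for itself."""
    try:
        it = iter(obj)
    except TypeError:
        # Re-raise with a clearer message; the duck simply didn't quack.
        raise TypeError("expected an iterable, got %r"
                        % type(obj).__name__) from None
    return [item for item in it]

print(consume_quacks("abc"))     # ['a', 'b', 'c']
print(consume_quacks(range(3)))  # [0, 1, 2]
try:
    consume_quacks(42)
except TypeError as exc:
    print(exc)
```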
URL: From ben+python at benfinney.id.au Sat Sep 21 07:04:30 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 21 Sep 2013 15:04:30 +1000 Subject: [Python-ideas] Introduce collections.Reiterable References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> Message-ID: <7wli2qwysh.fsf@benfinney.id.au> Neil Girdhar writes: > I check for callable when accepting callbacks because I will call them > much later and raising the error then is harder to track down. Why is it harder to track down? That sounds like the problem to be fixed. -- \ ?Simplicity is prerequisite for reliability.? ?Edsger W. | `\ Dijkstra | _o__) | Ben Finney From mistersheik at gmail.com Sat Sep 21 07:08:28 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 01:08:28 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <7wli2qwysh.fsf@benfinney.id.au> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wli2qwysh.fsf@benfinney.id.au> Message-ID: Because the caller who sent the bad callable is no longer in the stack trace. On Sat, Sep 21, 2013 at 1:04 AM, Ben Finney wrote: > Neil Girdhar > writes: > > > I check for callable when accepting callbacks because I will call them > > much later and raising the error then is harder to track down. > > Why is it harder to track down? That sounds like the problem to be > fixed. > > -- > \ ?Simplicity is prerequisite for reliability.? ?Edsger W. 
| > `\ Dijkstra | > _o__) | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Sep 21 07:37:11 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 21 Sep 2013 07:37:11 +0200 Subject: [Python-ideas] Introduce collections.Reiterable References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wli2qwysh.fsf@benfinney.id.au> Message-ID: <20130921073711.6313f9e1@fsol> On Sat, 21 Sep 2013 15:04:30 +1000 Ben Finney wrote: > Neil Girdhar > writes: > > > I check for callable when accepting callbacks because I will call them > > much later and raising the error then is harder to track down. > > Why is it harder to track down? That sounds like the problem to be > fixed. Well, there is no need to try and rehash this mantra. callable() was revived for a reason. I will suggest anyone wanting a LBYL vs. EAFP discussion to go discuss it on python-list, really ;-) python-ideas is not the place for the same old platonic language design discussions that everyone's been having for 10+ years. Regards Antoine. 
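For reference, the registration-time check Neil is defending (fail while the offending caller is still on the stack, not when the event fires much later) might look like this; EventSource is a hypothetical name:

```python
class EventSource:
    def __init__(self):
        self._callbacks = []

    def register(self, fn):
        # LBYL on purpose: if we waited until fire(), the caller that
        # handed us the bad object would be long gone from the traceback.
        if not callable(fn):
            raise TypeError("callback must be callable, got %r"
                            % type(fn).__name__)
        self._callbacks.append(fn)

    def fire(self):
        return [fn() for fn in self._callbacks]

src = EventSource()
src.register(lambda: "pong")
print(src.fire())  # ['pong']
```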
From ethan at stoneleaf.us Sat Sep 21 07:54:30 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Sep 2013 22:54:30 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <523D3496.9050602@stoneleaf.us> On 09/20/2013 09:20 PM, Neil Girdhar wrote: > I can humbly suggest why Python would deprecate the sequence protocol: there "should be one obvious way" to answer > iter(), and in my opinion that's the __iter__() method. I considered infinite iterators, and if you happen to have > __getitem__ written, you can trivially write an __iter__ function as follows: 1) One Obvious Way != Only One Way (we can have both) 2) Deprecating (and removing) __getitem__ will break lots of code. It's not going to happen. -- ~Ethan~ From mistersheik at gmail.com Sat Sep 21 08:21:21 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 02:21:21 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <523D3496.9050602@stoneleaf.us> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> <523D3496.9050602@stoneleaf.us> Message-ID: No one suggested removing __getitem__. Some people have suggested deprecating (without removing) the sequence protocol. Do you know of any object that relies on the sequence protocol? That is, that implements __getitem__ without implementing __iter__ (or using a mixin like collections.Sequence to provide __iter__)? 
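An answer to Neil's question: an object that relies on the sequence protocol is easy to construct, and iter() accepts it even though it defines no __iter__. Squares is an illustrative class:

```python
from collections.abc import Iterable

class Squares:
    """Iterable only via the legacy sequence protocol."""
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError  # how iter() learns the sequence has ended
        return i * i

print(list(Squares()))                     # [0, 1, 4, 9, 16]
print(hasattr(Squares, '__iter__'))        # False, yet iteration worked
print(isinstance(Squares(), Iterable))     # also False
```

The last line shows part of the awkwardness under discussion: the Iterable ABC's hook only looks for __iter__, so an object that iterates perfectly well via the fallback protocol is invisible to ABC-based checks.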
On Sat, Sep 21, 2013 at 1:54 AM, Ethan Furman wrote: > On 09/20/2013 09:20 PM, Neil Girdhar wrote: > >> I can humbly suggest why Python would deprecate the sequence protocol: >> there "should be one obvious way" to answer >> iter(), and in my opinion that's the __iter__() method. I considered >> infinite iterators, and if you happen to have >> __getitem__ written, you can trivially write an __iter__ function as >> follows: >> > > 1) One Obvious Way != Only One Way (we can have both) > > 2) Deprecating (and removing) __getitem__ will break lots of code. It's > not going to happen. > > -- > ~Ethan~ > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/** > topic/python-ideas/**OumiLGDwRWA/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@**googlegroups.com > . > For more options, visit https://groups.google.com/**groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Sep 21 08:25:08 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 21 Sep 2013 15:25:08 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> Message-ID: <87siwy3d4r.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Girdhar writes: > You're right that you should go ahead and use something however you > want to. 
> However, there are plenty of times where you can't do that,
> e.g., you want to know if something is callable before calling it,
> and similarly if something is reiterable before iterating it and
> exhausting. That is the purpose of collections.abc,

I don't think so. It's documented that way:

    This module provides abstract base classes that can be used to
    test whether a class provides a particular interface; for example,
    whether it is hashable or whether it is a mapping.

But I wouldn't do explicit testing with isinstance, but rather use
implicit assertions (at instantiation time) by deriving from the ABC.
I don't see how Reiterable could be adapted to this style of
programming because the API of iterables is basically fixed (support
__iter__ or __getitem__).

> and that's what I thought we were discussing.

You were, I agree. But you proposed a new API, which pretty well
guarantees many discussants will take a more global view, like "do the
use cases justify this addition?"

Another such question is "what exactly is the specification?" Tim
Delany, for example, AIUI doesn't have a problem with saying that any
iterable is reiterable, because it won't raise an exception if the
program iterates it after exhaustion. It simply does nothing, but in
some cases that's perfectly acceptable. I know you disagree, and I
don't think that's a useful definition. Still it demonstrates the
wide range of opinions on what "reiterable" can or should guarantee.
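Stephen's "implicit assertion at instantiation time" refers to ABCMeta refusing to instantiate a class that leaves an abstract method unfilled; a small sketch:

```python
from collections import abc

class MyPairs(abc.Iterable):
    def __iter__(self):           # satisfies the ABC's requirement
        yield (1, 'a')
        yield (2, 'b')

class Broken(abc.Iterable):
    pass                          # forgot to define __iter__

print(list(MyPairs()))            # [(1, 'a'), (2, 'b')]
try:
    Broken()
except TypeError as exc:
    print(exc)                    # can't instantiate abstract class Broken...
```

Deriving from the ABC thus moves the check from "somewhere deep in iteration" to the point where the object is created.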
From mistersheik at gmail.com Sat Sep 21 08:56:29 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 02:56:29 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87siwy3d4r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <87siwy3d4r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Sep 21, 2013 at 2:25 AM, Stephen J. Turnbull wrote: > Neil Girdhar writes: > > > You're right that you should go ahead and use something however you > > want to. However, there are plenty of times where you can't do that, > > e.g., you want to know if something is callable before calling it, > > and similarly if something is reiterable before iterating it and > > exhausting. That is the purpose of collections.abc, > > I don't think so. It's documented that way: > > This module provides abstract base classes that can be used to > test whether a class provides a particular interface; for example, > whether it is hashable or whether it is a mapping. > > But I wouldn't do explicit testing with isinstance, but rather use > implicit assertions (at instantiation time) by deriving from the ABC. > I don't see how Reiterable could be adapted to this style of > programming because the API of iterables is basically fixed (support > __iter__ or __getitem__). > Wouldn't you need to define a new ABC to do this? 
Here's one possibility with Tim's "iterable and not iterator":

class Reiterable(collections.Iterable, metaclass=collections.abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, subclass):
        # "is True" matters: Iterable's hook can return NotImplemented,
        # which is truthy and would wrongly pass a plain "if".
        if (collections.Iterable.__subclasshook__(subclass) is True
                and not issubclass(subclass, collections.Iterator)):
            return True
        return NotImplemented

for obj in [list(), tuple(), dict(), set(), range(4),
            (x * x for x in range(4))]:
    print(type(obj), isinstance(obj, Reiterable))

Another possibility would be to explicitly register Views and so on
using Reiterable.register(...)

> > and that's what I thought we were discussing.
>
> You were, I agree. But you proposed a new API, which pretty well
> guarantees many discussants will take a more global view, like "do the
> use cases justify this addition?"

It's a good point. I think if I'm the only one with this problem, then
the answer is clearly no. I will just cast to list, and so what if
it's a little bit slower in some cases. How could I know that I was
the only one with this problem?

> Another such question is "what exactly is the specification?" Tim
> Delany, for example, AIUI doesn't have a problem with saying that any
> iterable is reiterable, because it won't raise an exception if the
> program iterates it after exhaustion. It simply does nothing, but in
> some cases that's perfectly acceptable. I know you disagree, and I
> don't think that's a useful definition. Still it demonstrates the
> wide range of opinions on what "reiterable" can or should guarantee.

Yes, agreed that there are a wide range of reasonable opinions.

Best,
Neil
-------------- next part --------------
An HTML attachment was scrubbed...
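For what it's worth, the dividing line this hook tries to draw can be checked against today's builtins directly: everything on Neil's "reiterable" list fails the Iterator test, and every one-shot passes it:

```python
from collections.abc import Iterator

reiterables = [[1, 2], (1, 2), {1: 2}, {1, 2}, range(3), {1: 2}.keys()]
one_shots = [iter([1, 2]), (x for x in 'ab'), map(str, [1]), filter(None, [1])]

assert not any(isinstance(obj, Iterator) for obj in reiterables)
assert all(isinstance(obj, Iterator) for obj in one_shots)

# Equivalent spelling of the same test, without the ABC: an iterator's
# __iter__ must return the iterator itself.
assert all(iter(obj) is obj for obj in one_shots)
print("all checks pass")
```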
URL: From ethan at stoneleaf.us Sat Sep 21 08:36:57 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Sep 2013 23:36:57 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> <523D3496.9050602@stoneleaf.us> Message-ID: <523D3E89.7090406@stoneleaf.us> On 09/20/2013 11:21 PM, Neil Girdhar wrote: > No one suggested removing __getitem__. Some people have suggested deprecating (without removing) the sequence protocol. > Do you know of any object that relies on the sequence protocol? That is, that implements __getitem__ without > implementing __iter__ (or using a mixin like collections.Sequence to provide __iter__)? The goal of deprecation is removal. Any item that supports index access, such as lists, tuples, and dictionaries, needs __getitem__. Iteration is not the only way to access an iterable object. -- ~Ethan~ From stephen at xemacs.org Sat Sep 21 09:02:35 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 21 Sep 2013 16:02:35 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r4ci3bec.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Girdhar writes: > I can humbly suggest why Python would deprecate the sequence > protocol: there "should be one obvious way" to answer iter(), and > in my opinion that's the ?__iter__() ?method. ?I considered > infinite iterators, and if you happen to have ?__getitem__ written, > you can trivially write an __iter__ function Better yet, Python can do it for me. That's *why* it makes sense for iter() to accept an object with a __getitem__ method. 
I wonder if it would be possible for Iterable to provide an __iter__
method at instantiation if and only if __iter__ is not defined in the
derived class and __getitem__ is. Then

>>> class GoodIterable1(Iterable):
...     def __iter__(self):
...         return iter([])
...
>>> gi1 = GoodIterable1()
>>> dir(gi1)
[..., '__iter__', ...]
>>> class GoodIterable2(Iterable):
...     def __getitem__(self, i):
...         return [][0]
...
>>> gi2 = GoodIterable2()
>>> dir(gi2)
[..., '__getitem__', '__iter__', ...]   # it's magic!
>>> class BadIterable(Iterable):
...     pass
...
>>> bi = BadIterable()   # ordinary mixin __iter__ wouldn't raise,
...                      # but the magic one does
TypeError: can't instantiate abstract class BadIterable with abstract methods __iter__
>>>

Although I guess the ordinary mixin will raise anyway when it tries to
call __getitem__.

From mistersheik at gmail.com  Sat Sep 21 09:03:35 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 21 Sep 2013 03:03:35 -0400
Subject: [Python-ideas] Introduce collections.Reiterable
In-Reply-To: <523D3E89.7090406@stoneleaf.us>
References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com>
 <20130919121828.GK19939@ando> <20130920094854.GO19939@ando>
 <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp>
 <523D3496.9050602@stoneleaf.us> <523D3E89.7090406@stoneleaf.us>
Message-ID: 

We're not talking about deprecating __getitem__. We're talking about
deprecating the "sequence protocol" whereby iter(obj) falls back to
calling __getitem__ when an object doesn't have __iter__. No one is
talking about removing __getitem__!

Neil

On Sat, Sep 21, 2013 at 2:36 AM, Ethan Furman wrote:
> On 09/20/2013 11:21 PM, Neil Girdhar wrote:
>> No one suggested removing __getitem__. Some people have suggested
>> deprecating (without removing) the sequence protocol.
>> Do you know of any object that relies on the sequence protocol? That
>> is, that implements __getitem__ without implementing __iter__ (or
>> using a mixin like collections.Sequence to provide __iter__)?
>> > > The goal of deprecation is removal. > > Any item that supports index access, such as lists, tuples, and > dictionaries, needs __getitem__. Iteration is not the only way to access > an iterable object. > > > -- > ~Ethan~ > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/** > topic/python-ideas/**OumiLGDwRWA/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@**googlegroups.com > . > For more options, visit https://groups.google.com/**groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Sep 21 09:04:50 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 21 Sep 2013 00:04:50 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sep 20, 2013, at 21:52, Neil Girdhar wrote: > We discussed this upthread: I only want "not iterator" if not iterator promises reiterability. Right now, we have what may be a happy accident that can easily be violated by someone else. And if you define your new ABC, it can be just as easily violated by someone else. In fact, it will be violated in the exact _same_ cases. There's no check you can do besides the reverse of the checks done by iterator. More importantly, it's not just "a happy accident". 
I've asked repeatedly if anyone can come up with a single example of a non-iterator, non-reiterable iterator, or even imagine what one would look like, and nobody's come up with one. And it's not like iterators are some new feature nobody's had time to explore yet. So, in order to solve a problem that doesn't exist, you want to add a new feature that wouldn't solve it any better than what we have today. > Best, > Neil > > > On Sat, Sep 21, 2013 at 12:50 AM, Andrew Barnert wrote: >> On Sep 20, 2013, at 21:23, Neil Girdhar wrote: >> >>> I appreciate the discussion illuminating various aspects of this I hadn't considered. Finally, what I think I want is for >>> * all sequences >>> * all views >>> * numpy arrays >>> to answer yes to reiterable, and >>> * all generators >>> to answer no to reiterable. >> >> All sequences, views, and numpy arrays answer no to iterator (and so do sets, mappings, etc.), and all generators answer yes (and so do the iterators you get back from calling iter on a sequence, map, filter, your favorite itertools function, etc.) >> >> So you just want "not iterator". Even Haskell doesn't attempt to provide negative types like that. (And you can very easily show that it's iterator that's the normal type: it's syntactically checkable in various ways--e.g., it.hasattr('__next__'), but the only positive way to check reiterable is not just semantic, but destructive.) >> >>> Best, Neil >>> >>> On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull wrote: >>>> Terry Reedy writes: >>>> >>>> > Dismissing legal code as 'pathological', as more than one person has, >>>> > does not cut it as a design principle. >>>> >>>> But you don't even need to write a class with __getitem__() to get >>>> that behavior. >>>> >>>> >>> l = [11, 12, 13] >>>> >>> for i in l: >>>> ... print(i) >>>> ... if i%2 == 0: >>>> ... l.remove(i) >>>> ... 
>>>> 11 >>>> 12 >>>> >>> l >>>> [11, 13] >>>> >>> >>>> >>>> Of course the iteration itself is probably buggy (ie, the writer >>>> didn't mean to skip printing '13'), but in general iterables can >>>> change themselves. >>>> >>>> Neil himself seems to be of two minds about such cases. On the one >>>> hand, he said the above behavior is built in to list, so it's >>>> acceptable to him. (I think that's inconsistent: I would say the >>>> property of being completely consumed is built in to iterator, so it >>>> should be acceptable, too.) On the other hand, he's defined a >>>> reiterable as a collection that when iterated produces the same >>>> objects in the same order. >>>> >>>> Maybe what we really want is for copy.deepcopy to do the right thing >>>> with iterables. Then code that doesn't want to consume consumable >>>> iterables can do a deepcopy (including replication of the closed-over >>>> state of __next__() for iterators) before iterating. >>>> >>>> Or perhaps the right thing is a copy.itercopy that creates a new >>>> composite object as a shallow copy of everything except that it clones >>>> the state of __next__() in case the object was an iterator to start >>>> with. >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >>>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >>>> For more options, visit https://groups.google.com/groups/opt_out. 
>>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Sep 21 09:08:05 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 21 Sep 2013 00:08:05 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <87siwy3d4r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <9997B519-F903-4E16-A3FD-E830AEB0BCE0@yahoo.com> On Sep 20, 2013, at 23:56, Neil Girdhar wrote: >> You were, I agree. But you proposed a new API, which pretty well >> guarantees many discussants will take a more global view, like "do the >> use cases justify this addition?" > > It's a good point. I think if I'm the only with this problem, then the answer is clearly no. I will just cast to list and so what if it's a little bit slower in some cases. How could I know that I was the only one with this problem? That seems more than a little stubborn. Today, you can create a list iff you're given an iterator. You'd prefer to write that in terms of creating a list iff you're given a non-reiterable iterable. And, if you can't have that, screw all your users, you'll just always make a list? And again, if you have an actual problem that iterator doesn't solve, I'd love to see it. -------------- next part -------------- An HTML attachment was scrubbed... 
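Andrew's "create a list iff you're given an iterator" recipe, with itertools.tee as a lazier alternative in the spirit of the itercopy idea quoted earlier, might be sketched as:

```python
import itertools
from collections.abc import Iterator

def ensure_reiterable(obj):
    """Return something safe to iterate more than once."""
    return list(obj) if isinstance(obj, Iterator) else obj

data = ensure_reiterable(x * x for x in range(4))
print(sum(data), max(data))  # 14 9 -- two passes, no values lost

# tee clones the remaining stream instead of materializing everything
# up front (it buffers items until both clones have consumed them):
a, b = itertools.tee(iter([1, 2, 3]))
print(list(a), list(b))  # [1, 2, 3] [1, 2, 3]
```

Note that ensure_reiterable passes lists, views, ranges, and so on straight through unchanged, so only genuine one-shot inputs pay the copying cost.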
URL: From mistersheik at gmail.com Sat Sep 21 09:21:24 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 03:21:24 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I'm happy with iterable and not iterator if it comes with a promise. Then my first ABC is what what I probably want. If not, then I think it's better to do something lke class Reiterable(collections.Iterable, metaclass=collections.abc.ABCMeta): @classmethod def __subclasshook__(cls, subclass): if (issubclass(subclass, collections.MappingView) or issubclass(subclass, collections.Sequence) or issubclass(subclass, collections.Set) or issubclass(subclass, collections.Mapping)): return True return NotImplemented Other classes can be added with register. On Sat, Sep 21, 2013 at 3:04 AM, Andrew Barnert wrote: > On Sep 20, 2013, at 21:52, Neil Girdhar wrote: > > We discussed this upthread: I only want "not iterator" if not iterator > promises reiterability. Right now, we have what may be a happy accident > that can easily be violated by someone else. > > > And if you define your new ABC, it can be just as easily violated by > someone else. In fact, it will be violated in the exact _same_ > cases. There's no check you can do besides the reverse of the checks done > by iterator. > > More importantly, it's not just "a happy accident". I've asked repeatedly > if anyone can come up with a single example of a non-iterator, > non-reiterable iterator, or even imagine what one would look like, and > nobody's come up with one. And it's not like iterators are some new feature > nobody's had time to explore yet. > > So, in order to solve a problem that doesn't exist, you want to add a new > feature that wouldn't solve it any better than what we have today. 
> > Best, > Neil > > > On Sat, Sep 21, 2013 at 12:50 AM, Andrew Barnert wrote: > >> On Sep 20, 2013, at 21:23, Neil Girdhar wrote: >> >> I appreciate the discussion illuminating various aspects of this I hadn't >> considered. Finally, what I think I want is for >> * all sequences >> * all views >> * numpy arrays >> to answer yes to reiterable, and >> * all generators >> to answer no to reiterable. >> >> >> All sequences, views, and numpy arrays answer no to iterator (and so do >> sets, mappings, etc.), and all generators answer yes (and so do the >> iterators you get back from calling iter on a sequence, map, filter, your >> favorite itertools function, etc.) >> >> So you just want "not iterator". Even Haskell doesn't attempt to provide >> negative types like that. (And you can very easily show that it's iterator >> that's the normal type: it's syntactically checkable in various ways--e.g., >> it.hasattr('__next__'), but the only positive way to check reiterable is >> not just semantic, but destructive.) >> >> Best, Neil >> >> On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull > > wrote: >> >>> Terry Reedy writes: >>> >>> > Dismissing legal code as 'pathological', as more than one person has, >>> > does not cut it as a design principle. >>> >>> But you don't even need to write a class with __getitem__() to get >>> that behavior. >>> >>> >>> l = [11, 12, 13] >>> >>> for i in l: >>> ... print(i) >>> ... if i%2 == 0: >>> ... l.remove(i) >>> ... >>> 11 >>> 12 >>> >>> l >>> [11, 13] >>> >>> >>> >>> Of course the iteration itself is probably buggy (ie, the writer >>> didn't mean to skip printing '13'), but in general iterables can >>> change themselves. >>> >>> Neil himself seems to be of two minds about such cases. On the one >>> hand, he said the above behavior is built in to list, so it's >>> acceptable to him. (I think that's inconsistent: I would say the >>> property of being completely consumed is built in to iterator, so it >>> should be acceptable, too.) 
On the other hand, he's defined a >>> reiterable as a collection that when iterated produces the same >>> objects in the same order. >>> >>> Maybe what we really want is for copy.deepcopy to do the right thing >>> with iterables. Then code that doesn't want to consume consumable >>> iterables can do a deepcopy (including replication of the closed-over >>> state of __next__() for iterators) before iterating. >>> >>> Or perhaps the right thing is a copy.itercopy that creates a new >>> composite object as a shallow copy of everything except that it clones >>> the state of __next__() in case the object was an iterator to start >>> with. >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> -- >>> >>> --- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "python-ideas" group. >>> To unsubscribe from this topic, visit >>> https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to >>> python-ideas+unsubscribe at googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Sep 21 09:21:55 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 03:21:55 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Whoa! I'm not trying to be stubborn!! 
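[Archive editor's note: the `copy.itercopy` idea quoted above never landed, but `itertools.tee` — in the standard library since 2.4 — is the closest existing tool: it buffers a one-shot iterator so that two independent passes see the same items.]

```python
import itertools

consumable = iter([1, 2, 3])            # a one-shot iterator
first, second = itertools.tee(consumable)

# Each tee'd iterator replays the same items independently.
print(list(first))   # [1, 2, 3]
print(list(second))  # [1, 2, 3]

# Unlike a true itercopy, though, the original iterator must not be
# advanced once tee() has wrapped it, and the replay buffer costs memory.
```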
I'm just suggesting that what we have today is fine if the problem is isolated. On Sat, Sep 21, 2013 at 3:21 AM, Neil Girdhar wrote: > I'm happy with iterable and not iterator if it comes with a promise. Then > my first ABC is what what I probably want. If not, then I think it's > better to do something lke > > class Reiterable(collections.Iterable, > metaclass=collections.abc.ABCMeta): > @classmethod > def __subclasshook__(cls, subclass): > if (issubclass(subclass, collections.MappingView) > or issubclass(subclass, collections.Sequence) > or issubclass(subclass, collections.Set) > or issubclass(subclass, collections.Mapping)): > return True > return NotImplemented > > Other classes can be added with register. > > > On Sat, Sep 21, 2013 at 3:04 AM, Andrew Barnert wrote: > >> On Sep 20, 2013, at 21:52, Neil Girdhar wrote: >> >> We discussed this upthread: I only want "not iterator" if not iterator >> promises reiterability. Right now, we have what may be a happy accident >> that can easily be violated by someone else. >> >> >> And if you define your new ABC, it can be just as easily violated by >> someone else. In fact, it will be violated in the exact _same_ >> cases. There's no check you can do besides the reverse of the checks done >> by iterator. >> >> More importantly, it's not just "a happy accident". I've asked repeatedly >> if anyone can come up with a single example of a non-iterator, >> non-reiterable iterator, or even imagine what one would look like, and >> nobody's come up with one. And it's not like iterators are some new feature >> nobody's had time to explore yet. >> >> So, in order to solve a problem that doesn't exist, you want to add a new >> feature that wouldn't solve it any better than what we have today. >> >> Best, >> Neil >> >> >> On Sat, Sep 21, 2013 at 12:50 AM, Andrew Barnert wrote: >> >>> On Sep 20, 2013, at 21:23, Neil Girdhar wrote: >>> >>> I appreciate the discussion illuminating various aspects of this I >>> hadn't considered. 
Finally, what I think I want is for >>> * all sequences >>> * all views >>> * numpy arrays >>> to answer yes to reiterable, and >>> * all generators >>> to answer no to reiterable. >>> >>> >>> All sequences, views, and numpy arrays answer no to iterator (and so do >>> sets, mappings, etc.), and all generators answer yes (and so do the >>> iterators you get back from calling iter on a sequence, map, filter, your >>> favorite itertools function, etc.) >>> >>> So you just want "not iterator". Even Haskell doesn't attempt to provide >>> negative types like that. (And you can very easily show that it's iterator >>> that's the normal type: it's syntactically checkable in various ways--e.g., >>> it.hasattr('__next__'), but the only positive way to check reiterable is >>> not just semantic, but destructive.) >>> >>> Best, Neil >>> >>> On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull < >>> stephen at xemacs.org> wrote: >>> >>>> Terry Reedy writes: >>>> >>>> > Dismissing legal code as 'pathological', as more than one person has, >>>> > does not cut it as a design principle. >>>> >>>> But you don't even need to write a class with __getitem__() to get >>>> that behavior. >>>> >>>> >>> l = [11, 12, 13] >>>> >>> for i in l: >>>> ... print(i) >>>> ... if i%2 == 0: >>>> ... l.remove(i) >>>> ... >>>> 11 >>>> 12 >>>> >>> l >>>> [11, 13] >>>> >>> >>>> >>>> Of course the iteration itself is probably buggy (ie, the writer >>>> didn't mean to skip printing '13'), but in general iterables can >>>> change themselves. >>>> >>>> Neil himself seems to be of two minds about such cases. On the one >>>> hand, he said the above behavior is built in to list, so it's >>>> acceptable to him. (I think that's inconsistent: I would say the >>>> property of being completely consumed is built in to iterator, so it >>>> should be acceptable, too.) On the other hand, he's defined a >>>> reiterable as a collection that when iterated produces the same >>>> objects in the same order. 
>>>> >>>> Maybe what we really want is for copy.deepcopy to do the right thing >>>> with iterables. Then code that doesn't want to consume consumable >>>> iterables can do a deepcopy (including replication of the closed-over >>>> state of __next__() for iterators) before iterating. >>>> >>>> Or perhaps the right thing is a copy.itercopy that creates a new >>>> composite object as a shallow copy of everything except that it clones >>>> the state of __next__() in case the object was an iterator to start >>>> with. >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to a topic in the >>>> Google Groups "python-ideas" group. >>>> To unsubscribe from this topic, visit >>>> https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to >>>> python-ideas+unsubscribe at googlegroups.com. >>>> For more options, visit https://groups.google.com/groups/opt_out. >>>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 21 10:02:17 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Sep 2013 18:02:17 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130920111000.GQ19939@ando> <87y56q3ni1.fsf@uwakimon.sk.tsukuba.ac.jp> <523D3496.9050602@stoneleaf.us> Message-ID: <20130921080217.GX19939@ando> On Sat, Sep 21, 2013 at 02:21:21AM -0400, Neil Girdhar wrote: > No one suggested removing __getitem__. Some people have suggested > deprecating (without removing) the sequence protocol. 
Some people -- that would be you, I believe. What's the point of deprecating something if you have no intention of removing it? What's the point of deprecating something which works? It isn't like it's doing any harm. You'll just cause unnecessary code-churn in other people's working code. Removing broken, unfixable code -- sure. Removing working code just because it annoys some people's idea of purity? I'm against that. > Do you know of any > object that relies on the sequence protocol? That is, that implements > __getitem__ without implementing __iter__ (or using a mixin like > collections.Sequence to provide __iter__)? Not off the top of my head, but that doesn't mean there aren't masses of code that does so. Off the top of my head, I don't know of any code that relies on exception tracebacks being printed to stderr rather than stdout, but that doesn't mean we should feel free to change that on a whim. -- Steven From steve at pearwood.info Sat Sep 21 10:04:06 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Sep 2013 18:04:06 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130921080406.GY19939@ando> On Sat, Sep 21, 2013 at 12:23:54AM -0400, Neil Girdhar wrote: > I appreciate the discussion illuminating various aspects of this I hadn't > considered. Finally, what I think I want is for > * all sequences > * all views > * numpy arrays > to answer yes to reiterable, and > * all generators > to answer no to reiterable. Which brings us full circle to:

if isinstance(obj, Iterable) and not isinstance(obj, Iterator):
    print("Is re-iterable")
else:
    print("Is not re-iterable")

which I believe satisfies your requirement. Can you show any standard, non-pathological type where this test fails to give the correct answer? 
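[Archive editor's note: for concreteness, here is Steven's one-line test wrapped in a helper and run against the types from Neil's list — `range` and a dict view standing in for arbitrary re-iterable views.]

```python
from collections.abc import Iterable, Iterator

def is_reiterable(obj):
    # Iterable, but not a one-shot iterator.
    return isinstance(obj, Iterable) and not isinstance(obj, Iterator)

print(is_reiterable([1, 2, 3]))          # True  (sequence)
print(is_reiterable({1: "a"}.items()))   # True  (view)
print(is_reiterable(range(10)))          # True
print(is_reiterable(x for x in "abc"))   # False (generator)
print(is_reiterable(iter([1, 2, 3])))    # False (iterator)
```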
If not, what exactly is the problem with just using that test? As far as I am concerned, not every one-line test needs an ABC. -- Steven From rymg19 at gmail.com Sat Sep 21 20:15:58 2013 From: rymg19 at gmail.com (Ryan) Date: Sat, 21 Sep 2013 13:15:58 -0500 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <7wpps2wz8l.fsf@benfinney.id.au> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wpps2wz8l.fsf@benfinney.id.au> Message-ID: <56a4d514-2f52-4dbe-a861-5298f4970101@email.android.com> I can still see why checking if it's callable is a good idea in some cases. Say you call the callback in line 1024 of module mymod:

self.call[item]()

And someone hands over a string:

TypeError: 'str' object is not callable

And this is what's probably going through the person's head: Stupid Python! What did I do wrong now? Checking if it's callable works better:

if not callable(self.call[item]):
    raise CallbackError('given callback %s must be callable' % str(item))

Now the user says: Ohhhhh....so that's what I did wrong!!! Ben Finney wrote: >Neil Girdhar >writes: > >> However, there are plenty of times where you can't do that, e.g., you >> want to know if something is callable before calling it > >What is a concrete example of *needing* to know whether an object is >callable? Why not just use the object *as if it is* callable, and the >TypeError will propagate back to whoever fed you the object if it's >not? > >> and similarly if something is reiterable before iterating it and >> exhausting. > >I have somewhat more sympathy for this desire; duck typing doesn't work >so well for this, because by the time the iterable is exhausted it's >too >late to deal with its inability to re-start. 
> >Still, though, this is the kind of division of responsibility that >makes >a good program: tell the user of your code (in the docstring of your >class or function) that you require a sequence or some other >re-iterable >object. If you try something that fails on what object you've been >given, that's the responsibility of the code that gave it to you. You >can be nice by ensuring it'll fail in such a way the caller gets a >meaningful exception. > >-- >\ "The number of UNIX installations has grown to 10, with more | > `\ expected." --Unix Programmer's Manual, 2nd Ed., 1972-06-12 | >_o__) >| >Ben Finney > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Sep 21 21:04:39 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 22 Sep 2013 04:04:39 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <56a4d514-2f52-4dbe-a861-5298f4970101@email.android.com> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wpps2wz8l.fsf@benfinney.id.au> <56a4d514-2f52-4dbe-a861-5298f4970101@email.android.com> Message-ID: <87mwn62dyw.fsf@uwakimon.sk.tsukuba.ac.jp> Ryan writes: > I can still see why checking if its callable is a good idea in some > cases. You can always rewrite the LBYL in EAFP form:

try:
    self.call[item]()
except TypeError as e:
    raise CallbackError(...) from e

This is definitely preferred if you can enclose a whole suite in a try and expect the CallbackError to be infrequent, eg:

try:
    while item in input:
        self.call[item]()
except TypeError as e:
    raise CallbackError(...) 
from e

However, what you probably really want to do (which is a better argument for LBYL, anyway) is

if callable(callback):
    self.call[item] = callback
else:
    what_would_jruser_do()

From rymg19 at gmail.com Sat Sep 21 22:02:36 2013 From: rymg19 at gmail.com (Ryan) Date: Sat, 21 Sep 2013 15:02:36 -0500 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87mwn62dyw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wpps2wz8l.fsf@benfinney.id.au> <56a4d514-2f52-4dbe-a861-5298f4970101@email.android.com> <87mwn62dyw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: The problem with the try...except statement is that it'll catch errors that occur inside the function. Say the function calls zip and accidentally gives an int instead of a list. It'll raise a TypeError, which will be caught and a CallbackError will be raised. But, in some programs, that behavior doesn't work. "Stephen J. Turnbull" wrote: >Ryan writes: > > > I can still see why checking if its callable is a good idea in some > > cases. > >You can always rewrite the LBYL in EAFP form: > > try: > self.call[item]() > except TypeError as e: > raise CallbackError(...) from e > >This is definitely preferred if you can enclose a whole suite in a try >and expect the CallbackError to be infrequent, eg: > > try: > while item in input: > self.call[item]() > except TypeError as e: > raise CallbackError(...) from e > >However, what you probably really want to do (which is a better >argument for LBYL, anyway) is > > if callable(callback): > self.call[item] = callback > else: > what_would_jruser_do() -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
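[Archive editor's note: Stephen's check-at-registration variant sidesteps Ryan's objection entirely — `callable()` is tested once, when the callback is stored, so nothing raised inside the callback later can be mistaken for a bad registration. A self-contained sketch; the `Dispatcher` and `CallbackError` names are illustrative, not from any library.]

```python
class CallbackError(Exception):
    pass

class Dispatcher:
    def __init__(self):
        self.call = {}

    def register(self, item, callback):
        # LBYL once, up front: reject non-callables at registration
        # time, long before invocation time.
        if not callable(callback):
            raise CallbackError(
                'given callback %s must be callable' % str(item))
        self.call[item] = callback

d = Dispatcher()
d.register('greet', lambda: 'hello')
print(d.call['greet']())  # hello

try:
    d.register('oops', 42)
except CallbackError as e:
    print(e)  # given callback oops must be callable
```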
URL: From abarnert at yahoo.com Sat Sep 21 23:08:06 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 21 Sep 2013 14:08:06 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> On Sep 21, 2013, at 0:21, Neil Girdhar wrote: > I'm happy with iterable and not iterator if it comes with a promise. Then my first ABC is what what I probably want. If not, then I think it's better to do something lke > > class Reiterable(collections.Iterable, > metaclass=collections.abc.ABCMeta): > @classmethod > def __subclasshook__(cls, subclass): > if (issubclass(subclass, collections.MappingView) > or issubclass(subclass, collections.Sequence) > or issubclass(subclass, collections.Set) > or issubclass(subclass, collections.Mapping)): > return True > return NotImplemented Which leaves out numpy arrays, most sorted list and dict classes from PyPI, ElementTree and similar element/node/etc. types, ScriptingBridge/appscript collections, win32com IWhateverCollections, and all kinds of other types that can be reiterated, which are correctly diagnosed by Iterable and not Iterator. I haven't tested all of them, so some could fail to register as Iterable (especially given the possibility that Iterable may be incorrect, as mentioned elsewhere on this thread). But getting false negatives on a few types and having to deal with them by fixing a bug is surely better than getting false negatives on all types and having to deal with them by adding new, otherwise-unnecessary code. > Other classes can be added with register. 
So anyone who wants to use your module with numpy or appscript or ElementTree has to find all of the iterable types the class exposes (some of which aren't part of the public API--in some the case of appscript or win32com the may even be built dynamically as needed) and register all of them? You're putting the burden in the wrong place. Because you're worried that some class could theoretically be a non-reiterable non-iterator iterable, even though neither you nor anyone else can think of a sensible example of such a thing, you're requiring the user to certify that every iterable single class he uses is not pathological. That's not LBYL, that's perform a comprehensive survey and environmental impact report on the entire region and file papers in triplicate before you leap. If you're really worried about this unlikely possibility making it hard to debug the use of your code with some as-yet-unknown type, there are easier ways to verify things. For example, if the iterable works the first time, but is empty the second, the user has given you a non-reiterable, and you can assert or raise appropriately, which will make the code error just as easy to debug as having forgotten to register with Reiterable--and far easier to debug than having mistakenly registered with Reiterable when they shouldn't have. Plus, this lets you test for exactly what you want, not just a rough approximation. You could just as easily verify that the first element of each iteration matches, to ensure that it's not a random-reiterable type like Terry discussed that would ruin your particular two-pass algorithm. Or whatever is appropriate. > On Sat, Sep 21, 2013 at 3:04 AM, Andrew Barnert wrote: >> On Sep 20, 2013, at 21:52, Neil Girdhar wrote: >> >>> We discussed this upthread: I only want "not iterator" if not iterator promises reiterability. Right now, we have what may be a happy accident that can easily be violated by someone else. 
>> >> And if you define your new ABC, it can be just as easily violated by someone else. In fact, it will be violated in the exact _same_ cases. There's no check you can do besides the reverse of the checks done by iterator. >> >> More importantly, it's not just "a happy accident". I've asked repeatedly if anyone can come up with a single example of a non-iterator, non-reiterable iterator, or even imagine what one would look like, and nobody's come up with one. And it's not like iterators are some new feature nobody's had time to explore yet. >> >> So, in order to solve a problem that doesn't exist, you want to add a new feature that wouldn't solve it any better than what we have today. >> >>> Best, >>> Neil >>> >>> >>> On Sat, Sep 21, 2013 at 12:50 AM, Andrew Barnert wrote: >>>> On Sep 20, 2013, at 21:23, Neil Girdhar wrote: >>>> >>>>> I appreciate the discussion illuminating various aspects of this I hadn't considered. Finally, what I think I want is for >>>>> * all sequences >>>>> * all views >>>>> * numpy arrays >>>>> to answer yes to reiterable, and >>>>> * all generators >>>>> to answer no to reiterable. >>>> >>>> All sequences, views, and numpy arrays answer no to iterator (and so do sets, mappings, etc.), and all generators answer yes (and so do the iterators you get back from calling iter on a sequence, map, filter, your favorite itertools function, etc.) >>>> >>>> So you just want "not iterator". Even Haskell doesn't attempt to provide negative types like that. (And you can very easily show that it's iterator that's the normal type: it's syntactically checkable in various ways--e.g., it.hasattr('__next__'), but the only positive way to check reiterable is not just semantic, but destructive.) >>>> >>>>> Best, Neil >>>>> >>>>> On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull wrote: >>>>>> Terry Reedy writes: >>>>>> >>>>>> > Dismissing legal code as 'pathological', as more than one person has, >>>>>> > does not cut it as a design principle. 
>>>>>> >>>>>> But you don't even need to write a class with __getitem__() to get >>>>>> that behavior. >>>>>> >>>>>> >>> l = [11, 12, 13] >>>>>> >>> for i in l: >>>>>> ... print(i) >>>>>> ... if i%2 == 0: >>>>>> ... l.remove(i) >>>>>> ... >>>>>> 11 >>>>>> 12 >>>>>> >>> l >>>>>> [11, 13] >>>>>> >>> >>>>>> >>>>>> Of course the iteration itself is probably buggy (ie, the writer >>>>>> didn't mean to skip printing '13'), but in general iterables can >>>>>> change themselves. >>>>>> >>>>>> Neil himself seems to be of two minds about such cases. On the one >>>>>> hand, he said the above behavior is built in to list, so it's >>>>>> acceptable to him. (I think that's inconsistent: I would say the >>>>>> property of being completely consumed is built in to iterator, so it >>>>>> should be acceptable, too.) On the other hand, he's defined a >>>>>> reiterable as a collection that when iterated produces the same >>>>>> objects in the same order. >>>>>> >>>>>> Maybe what we really want is for copy.deepcopy to do the right thing >>>>>> with iterables. Then code that doesn't want to consume consumable >>>>>> iterables can do a deepcopy (including replication of the closed-over >>>>>> state of __next__() for iterators) before iterating. >>>>>> >>>>>> Or perhaps the right thing is a copy.itercopy that creates a new >>>>>> composite object as a shallow copy of everything except that it clones >>>>>> the state of __next__() in case the object was an iterator to start >>>>>> with. >>>>>> _______________________________________________ >>>>>> Python-ideas mailing list >>>>>> Python-ideas at python.org >>>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>>> >>>>>> -- >>>>>> >>>>>> --- >>>>>> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. 
>>>>>> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >>>>>> For more options, visit https://groups.google.com/groups/opt_out. >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Sep 21 23:14:56 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 17:14:56 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> Message-ID: On Sat, Sep 21, 2013 at 5:08 PM, Andrew Barnert wrote: > On Sep 21, 2013, at 0:21, Neil Girdhar wrote: > > I'm happy with iterable and not iterator if it comes with a promise. Then > my first ABC is what what I probably want. If not, then I think it's > better to do something lke > > class Reiterable(collections.Iterable, > metaclass=collections.abc.ABCMeta): > @classmethod > def __subclasshook__(cls, subclass): > if (issubclass(subclass, collections.MappingView) > or issubclass(subclass, collections.Sequence) > or issubclass(subclass, collections.Set) > or issubclass(subclass, collections.Mapping)): > return True > return NotImplemented > > > Which leaves out numpy arrays, most sorted list and dict classes from > PyPI, ElementTree and similar element/node/etc. types, > ScriptingBridge/appscript collections, win32com IWhateverCollections, and > all kinds of other types that can be reiterated, which are correctly > diagnosed by Iterable and not Iterator. 
> > I haven't tested all of them, so some could fail to register as Iterable > (especially given the possibility that Iterable may be incorrect, as > mentioned elsewhere on this thread). But getting false negatives on a few > types and having to deal with them by fixing a bug is surely better than > getting false negatives on all types and having to deal with them by adding > new, otherwise-unnecessary code. > > Other classes can be added with register. > > > So anyone who wants to use your module with numpy or appscript or > ElementTree has to find all of the iterable types the class exposes (some > of which aren't part of the public API--in some the case of appscript or > win32com the may even be built dynamically as needed) and register all of > them? > > You're putting the burden in the wrong place. Because you're worried that > some class could theoretically be a non-reiterable non-iterator iterable, > even though neither you nor anyone else can think of a sensible example of > such a thing, you're requiring the user to certify that every iterable > single class he uses is not pathological. That's not LBYL, that's perform a > comprehensive survey and environmental impact report on the entire region > and file papers in triplicate before you leap. > If you really think that there will never be a non-reiterable non-iterator iterable, then the standard should promise that and we're in total agreement. > > If you're really worried about this unlikely possibility making it hard to > debug the use of your code with some as-yet-unknown type, there are easier > ways to verify things. For example, if the iterable works the first time, > but is empty the second, the user has given you a non-reiterable, and you > can assert or raise appropriately, which will make the code error just as > easy to debug as having forgotten to register with Reiterable--and far > easier to debug than having mistakenly registered with Reiterable when they > shouldn't have. 
Plus, this lets you test for exactly what you want, not > just a rough approximation. You could just as easily verify that the first > element of each iteration matches, to ensure that it's not a > random-reiterable type like Terry discussed that would ruin your particular > two-pass algorithm. Or whatever is appropriate. > I think "asserting on" the iterator that was passed in is a much worse solution than "casting it to iterable". Don't annoy the user with implementation details is a good rule to follow. Best, Neil > > On Sat, Sep 21, 2013 at 3:04 AM, Andrew Barnert wrote: > >> On Sep 20, 2013, at 21:52, Neil Girdhar wrote: >> >> We discussed this upthread: I only want "not iterator" if not iterator >> promises reiterability. Right now, we have what may be a happy accident >> that can easily be violated by someone else. >> >> >> And if you define your new ABC, it can be just as easily violated by >> someone else. In fact, it will be violated in the exact _same_ >> cases. There's no check you can do besides the reverse of the checks done >> by iterator. >> >> More importantly, it's not just "a happy accident". I've asked repeatedly >> if anyone can come up with a single example of a non-iterator, >> non-reiterable iterator, or even imagine what one would look like, and >> nobody's come up with one. And it's not like iterators are some new feature >> nobody's had time to explore yet. >> >> So, in order to solve a problem that doesn't exist, you want to add a new >> feature that wouldn't solve it any better than what we have today. >> >> Best, >> Neil >> >> >> On Sat, Sep 21, 2013 at 12:50 AM, Andrew Barnert wrote: >> >>> On Sep 20, 2013, at 21:23, Neil Girdhar wrote: >>> >>> I appreciate the discussion illuminating various aspects of this I >>> hadn't considered. Finally, what I think I want is for >>> * all sequences >>> * all views >>> * numpy arrays >>> to answer yes to reiterable, and >>> * all generators >>> to answer no to reiterable. 
>>> >>> >>> All sequences, views, and numpy arrays answer no to iterator (and so do >>> sets, mappings, etc.), and all generators answer yes (and so do the >>> iterators you get back from calling iter on a sequence, map, filter, your >>> favorite itertools function, etc.) >>> >>> So you just want "not iterator". Even Haskell doesn't attempt to provide >>> negative types like that. (And you can very easily show that it's iterator >>> that's the normal type: it's syntactically checkable in various ways--e.g., >>> it.hasattr('__next__'), but the only positive way to check reiterable is >>> not just semantic, but destructive.) >>> >>> Best, Neil >>> >>> On Fri, Sep 20, 2013 at 10:12 PM, Stephen J. Turnbull < >>> stephen at xemacs.org> wrote: >>> >>>> Terry Reedy writes: >>>> >>>> > Dismissing legal code as 'pathological', as more than one person has, >>>> > does not cut it as a design principle. >>>> >>>> But you don't even need to write a class with __getitem__() to get >>>> that behavior. >>>> >>>> >>> l = [11, 12, 13] >>>> >>> for i in l: >>>> ... print(i) >>>> ... if i%2 == 0: >>>> ... l.remove(i) >>>> ... >>>> 11 >>>> 12 >>>> >>> l >>>> [11, 13] >>>> >>> >>>> >>>> Of course the iteration itself is probably buggy (ie, the writer >>>> didn't mean to skip printing '13'), but in general iterables can >>>> change themselves. >>>> >>>> Neil himself seems to be of two minds about such cases. On the one >>>> hand, he said the above behavior is built in to list, so it's >>>> acceptable to him. (I think that's inconsistent: I would say the >>>> property of being completely consumed is built in to iterator, so it >>>> should be acceptable, too.) On the other hand, he's defined a >>>> reiterable as a collection that when iterated produces the same >>>> objects in the same order. >>>> >>>> Maybe what we really want is for copy.deepcopy to do the right thing >>>> with iterables. 
Then code that doesn't want to consume consumable >>>> iterables can do a deepcopy (including replication of the closed-over >>>> state of __next__() for iterators) before iterating. >>>> >>>> Or perhaps the right thing is a copy.itercopy that creates a new >>>> composite object as a shallow copy of everything except that it clones >>>> the state of __next__() in case the object was an iterator to start >>>> with. >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to a topic in the >>>> Google Groups "python-ideas" group. >>>> To unsubscribe from this topic, visit >>>> https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to >>>> python-ideas+unsubscribe at googlegroups.com. >>>> For more options, visit https://groups.google.com/groups/opt_out. >>>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
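[Archive editor's note: the destructive runtime check Andrew alludes to — "works the first time, but is empty the second" — can be written down, though as Neil objects it consumes a one-shot iterator in the act of rejecting it. A sketch, not a recommendation; the helper name is illustrative.]

```python
def check_reiterable(iterable):
    """Raise TypeError if two passes over `iterable` disagree.
    Destructive for one-shot iterators -- that is the point.
    (An already-empty iterator slips through: 0 == 0.)"""
    first = sum(1 for _ in iterable)
    second = sum(1 for _ in iterable)
    if first != second:
        raise TypeError("not reiterable: first pass yielded %d items, "
                        "second pass %d" % (first, second))

check_reiterable([1, 2, 3])                # fine: lists replay
try:
    check_reiterable(iter([1, 2, 3]))      # consumed on the first pass
except TypeError as e:
    print(e)  # not reiterable: first pass yielded 3 items, second pass 0
```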
URL: From abarnert at yahoo.com Sat Sep 21 23:17:19 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 21 Sep 2013 14:17:19 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wpps2wz8l.fsf@benfinney.id.au> <56a4d514-2f52-4dbe-a861-5298f4970101@email.android.com> <87mwn62dyw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sep 21, 2013, at 13:02, Ryan wrote: > The problem with the try...except statement is that it'll catch errors that occur inside the function. > Say the function calls zip and accidentally gives an int instead of their list. It'll raise a TypeError, which will be caught and a CallbackError will be raised. You can always use EAFP with a bit of "look back after you leaped" to help diagnose the error: try: self.call[item]() except TypeError as e: if not callable(self.call[item]): raise CallbackError(...) from e raise I'm not sure that would be appropriate in this case, but similar code is very common when dealing with, e.g., the filesystem (especially in 2.x, where you had to distinguish errors on errno... but even in 3.x it's often worth telling the user that his error was because the folder he specified doesn't exist, as opposed to just the filename being wrong). But anyway, I think this is way off topic for this thread. We already have both EAFP and LBYL mechanisms for dealing with iteration; the argument isn't which one you should use, but whether the existing ABCs are sufficient or leave an important gap if you choose to use them for LBYL. > But, in some programs, that behavior doesn't work. > > "Stephen J. Turnbull" wrote: >> >> Ryan writes: >> >>> I can still see why checking if its callable is a good idea in some >>> cases. 
>> >> You can always rewrite the LBYL in EAFP form: >> >> try: >> self.call[item]() >> except TypeError as e: >> raise CallbackError(...) from e >> >> This is definitely preferred if you can enclose a whole suite in a try >> and expect the CallbackError to be infrequent, eg: >> >> try: >> while item in input: >> self.call[item]() >> except TypeError as e: >> raise CallbackError(...) from e >> >> However, what you probably really want to do (which is a better >> argument for LBYL, anyway) is >> >> if callable(callback): >> self.call[item] = callback >> else: >> what_would_jruser_do() > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sun Sep 22 00:31:50 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 Sep 2013 18:31:50 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> Message-ID: On 9/21/2013 5:14 PM, Neil Girdhar wrote: > If you really think that there will never be a non-reiterable > non-iterator iterable, I already posted a sensible non-iterator iterable that is no more reiterable than an iterator. I expect that there are examples in the wild. If nothing else, there are probably some written before the new iterator protocol was added. These are explicitly supported. 
-- Terry Jan Reedy From mistersheik at gmail.com Sun Sep 22 00:33:36 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 21 Sep 2013 18:33:36 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> Message-ID: "new iterator protocol" :) Is it still new? On Sat, Sep 21, 2013 at 6:31 PM, Terry Reedy wrote: > On 9/21/2013 5:14 PM, Neil Girdhar wrote: > > If you really think that there will never be a non-reiterable >> non-iterator iterable, >> > > I already posted a sensible non-iterator iterable that is no more > reiterable than an iterator. I expect that there are examples in the wild. > If nothing else, there are probably some written before the new iterator > protocol was added. These are explicitly supported. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@googlegroups.com > . > For more options, visit https://groups.google.com/groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rymg19 at gmail.com Sun Sep 22 03:11:14 2013 From: rymg19 at gmail.com (Ryan) Date: Sat, 21 Sep 2013 20:11:14 -0500 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <84A21A05-BA4E-4324-98DB-6ADE7EB98D1D@gmail.com> <5354F5F4-8052-451E-BAFB-BED214484AF8@gmail.com> <523D0BF4.5020404@stoneleaf.us> <20130921040904.GW19939@ando> <7wpps2wz8l.fsf@benfinney.id.au> <56a4d514-2f52-4dbe-a861-5298f4970101@email.android.com> <87mwn62dyw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: At that rate, why not just check for callability(?) in the first place? Andrew Barnert wrote: >On Sep 21, 2013, at 13:02, Ryan wrote: > >> The problem with the try...except statement is that it'll catch >errors that occur inside the function. >> Say the function calls zip and accidentally gives an int instead of >their list. It'll raise a TypeError, which will be caught and a >CallbackError will be raised. > >You can always use EAFP with a bit of "look back after you leaped" to >help diagnose the error: > >try: > self.call[item]() >except TypeError as e: > if not callable(self.call[item]): > raise CallbackError(...) from e > raise > >I'm not sure that would be appropriate in this case, but similar code >is very common when dealing with, e.g., the filesystem (especially in >2.x, where you had to distinguish errors on errno... but even in 3.x >it's often worth telling the user that his error was because the folder >be specified doesn't exist, as opposed to just the filename being >wrong). > >But anyway, I think this is way off topic for this thread. We already >have both EAFP and LBYL mechanisms for dealing with iteration; the >argument isn't which one you should use, but whether the existing ABCs >are sufficient or leave an important gap if you choose to use them for >LBYL. > >> But, in some programs, that behavior doesn't work. >> >> "Stephen J. 
Turnbull" wrote: >>> >>> Ryan writes: >>> >>>> I can still see why checking if its callable is a good idea in some >>>> cases. >>> >>> You can always rewrite the LBYL in EAFP form: >>> >>> try: >>> self.call[item]() >>> except TypeError as e: >>> raise CallbackError(...) from e >>> >>> This is definitely preferred if you can enclose a whole suite in a >try >>> and expect the CallbackError to be infrequent, eg: >>> >>> try: >>> while item in input: >>> self.call[item]() >>> except TypeError as e: >>> raise CallbackError(...) from e >>> >>> However, what you probably really want to do (which is a better >>> argument for LBYL, anyway) is >>> >>> if callable(callback): >>> self.call[item] = callback >>> else: >>> what_would_jruser_do() >> >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 22 03:33:19 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 21 Sep 2013 20:33:19 -0500 Subject: [Python-ideas] Numerical instability was: Re: Introduce collections.Reiterable In-Reply-To: References: Message-ID: [Oscar Benjamin ] > ... > If you know of a one-pass algorithm (or a way to improve the > implementation I showed) that is as accurate as either the two_pass or > three_pass methods I'd be very interested to see it (I'm sure Steven > would be as well). This looks interesting: ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf It give detailed error analyses of the methods already on the table (although without use of `fsum()`), and invents some new ones. 
It gets hairy ;-) From abarnert at yahoo.com Sun Sep 22 04:05:06 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 21 Sep 2013 19:05:06 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> Message-ID: <05C362B4-4491-4745-94AE-4EBFC0AA5DDC@yahoo.com> On Sep 21, 2013, at 15:31, Terry Reedy wrote: > On 9/21/2013 5:14 PM, Neil Girdhar wrote: > >> If you really think that there will never be a non-reiterable >> non-iterator iterable, > > I already posted a sensible non-iterator iterable that is no more reiterable than an iterator. You posted a long discussion of different ways in which "reiterable" could be defined, and gave vague examples of things that are reiterable in one sense but not in another. Accepting all of that at face value, there's no way Neil's Reiterable ABC would help that problem, because it would obviously only cover one of the possible senses. Beyond that, I've looked through your posts on that thread, and I can't find anything that looks like a sensible non-iterator non-reiterable (in Neil's intended sense) iterable. Did I miss something? > I expect that there are examples in the wild. If nothing else, there are probably some written before the new iterator protocol was added. These are explicitly supported. 
> > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas From steve at pearwood.info Sun Sep 22 04:56:43 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 22 Sep 2013 12:56:43 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> Message-ID: <20130922025643.GD19939@ando> On Sat, Sep 21, 2013 at 05:14:56PM -0400, Neil Girdhar wrote: > If you really think that there will never be a non-reiterable non-iterator > iterable, then the standard should promise that and we're in total > agreement. Which standard are you referring to? It would help if you specified a concrete place in the documentation that you would like to see changed, and a concrete suggestion for what change you would like to see. You should start with a definition of what precisely you mean by a Reiterable, and an example of what does, and what doesn't, count under that defintion. Even if you've already done so, this thread has become too big and too lumbering for me to keep track of everything discussed in it, and I'm sure I'm not the only one. -- Steven From ncoghlan at gmail.com Sun Sep 22 06:56:52 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 22 Sep 2013 14:56:52 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130920094854.GO19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: On 20 Sep 2013 19:49, "Steven D'Aprano" wrote: > > On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote: > > On 19 September 2013 22:18, Steven D'Aprano wrote: > [...] > > > At the moment, dict views aren't directly iterable (you can't call > > > next() on them). 
But in principle they could have been designed as > > > re-iterable iterators. > > > > That's not what iterable means. The iterable/iterator distinction is > > well defined and reflected in the collections ABCs: > > Actually, I think the collections ABC gets it wrong, according to both > common practice and the definition given in the glossary: > > http://docs.python.org/3.4/glossary.html > > More on this below. > > As for my comment above, dict views don't obey the iterator protocol > themselves, as they have no __next__ method, nor do they obey the > sequence protocol, as they are not indexable. Hence they are not > *directly* iterable, but they are *indirectly* iterable, since they have > an __iter__ method which returns an iterator. Um, no. Everywhere Python iterates over anything, we call iter(obj) first. If there is anywhere we don't do that, it's a bug. > I don't think this is a critical distinction. I think it is fine to call > views "iterable", since they can be iterated over. On the rare occasion > that it matters, we can just do what I did above, and talk about objects > which are directly iterable (e.g. iterators, sequences, generator > objects) and those which are indirectly iterable (e.g. dict views). Or you could just use the existing terminology and talk about iterables vs iterators instead of inventing your own terms. > > * iterables are objects that return iterators from __iter__. > > That definition is incomplete, because iterable objects include those > that obey the sequence protocol. This is not only by long-standing > tradition (pre-dating the introduction of iterators, if I remember > correctly), but also as per the definition in the glossary. Alas, > collections.Iterable gets this wrong: > > py> class Seq: > ... def __getitem__(self, index): > ... if 0 <= index < 5: return index+1000 > ... raise IndexError > ... 
> py> s = Seq() > py> isinstance(s, Iterable) > False > py> list(s) # definitely iterable > [1000, 1001, 1002, 1003, 1004] > > > (Note that although Seq obeys the sequence protocol, and can be > iterated over, it is not a fully-fledged Sequence since it has no > __len__.) > > I think this is a bug in the Iterable ABC, but I'm not sure how one > might fix it. The ducktyping check could technically be expanded to use the same fallback iter() does (i.e. __len__ and __getitem__). However, that would reintroduce the Sequence/Mapping ambiguity that ABCs were expressly designed to eliminate, so we don't want to do that: >>> class BadFallback: ... def __len__(self): ... return 1 ... def __getitem__(self, key): ... if key != "the_one": raise KeyError(key) ... return "the_value" ... >>> c = BadFallback() >>> c["the_one"] 'the_value' >>> iter(c) <iterator object at 0x...> >>> next(iter(c)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 5, in __getitem__ KeyError: 0 In cases like this, the default behaviour is actually correct. Since the fallback iterator only supports sequences rather than arbitrary mappings, merely implementing __len__ and __getitem__ isn't considered a reliable enough indication that an object is actually iterable. Fortunately, we also designed the ABC system to make it trivial for people to notify Python that their container is an iterable sequence when the automatic ducktyping fails: they can just call register on Iterable or one of its subclasses, and the interpreter will believe them. >>> from collections.abc import Iterable, Mapping >>> isinstance(c, Iterable) False >>> isinstance(c, Mapping) False >>> Mapping.register(BadFallback) <class '__main__.BadFallback'> >>> isinstance(c, Iterable) True >>> isinstance(c, Mapping) True In this case, it's a bad registration, since the object in question *doesn't* implement those interfaces properly, but it's easy to define a type where it's more accurate: >>> from collections import Sequence >>> @Sequence.register ... class GoodFallback: ... 
def __len__(self): ... return 1 ... def __getitem__(self, idx): ... if idx != 0: raise IndexError(idx) ... return "the_entry" ... >>> c2 = GoodFallback() >>> list(c2) ['the_entry'] >>> isinstance(c2, Iterable) True Even "GoodFallback" doesn't implement the full Sequence API, but it's likely to provide enough of it for many use cases. This is why type checks on ABCs are vastly different to those on concrete classes - ABCs still leave full control in the hands of the application integrator (through explicit registrations), whereas strict interface checks in a language like Java demand *full* interface compliance to pass the check, even if you really only need a fraction of it. > > That "iterators return self from __iter__" is important, since almost > > everywhere Python iterates over something, it call "_itr = iter(obj)" > > first. > > And then falls back on the sequence protocol. And that final fallback *won't work properly* if the object in question isn't actually a sequence. Cheers, Nick. From g.brandl at gmx.net Sun Sep 22 10:58:40 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 22 Sep 2013 10:58:40 +0200 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: On 09/22/2013 06:56 AM, Nick Coghlan wrote: > On 20 Sep 2013 19:49, "Steven D'Aprano" wrote: >> >> On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote: >> > On 19 September 2013 22:18, Steven D'Aprano wrote: >> [...] >> > > At the moment, dict views aren't directly iterable (you can't call >> > > next() on them). But in principle they could have been designed as >> > > re-iterable iterators. >> > >> > That's not what iterable means. 
The iterable/iterator distinction is >> > well defined and reflected in the collections ABCs: >> >> Actually, I think the collections ABC gets it wrong, according to both >> common practice and the definition given in the glossary: >> >> http://docs.python.org/3.4/glossary.html >> >> More on this below. >> >> As for my comment above, dict views don't obey the iterator protocol >> themselves, as they have no __next__ method, nor do they obey the >> sequence protocol, as they are not indexable. Hence they are not >> *directly* iterable, but they are *indirectly* iterable, since they have >> an __iter__ method which returns an iterator. > > Um, no. Everywhere Python iterates over anything, we call iter(obj) > first. If there is anywhere we don't do that, it's a bug. > >> I don't think this is a critical distinction. I think it is fine to call >> views "iterable", since they can be iterated over. On the rare occasion >> that it matters, we can just do what I did above, and talk about objects >> which are directly iterable (e.g. iterators, sequences, generator >> objects) and those which are indirectly iterable (e.g. dict views). > > Or you could just use the existing terminology and talk about > iterables vs iterators instead of inventing your own terms. Ack. Please don't create new terms, rather suggest an improvement to the glossary definition if you think it's inadequate. Georg From ncoghlan at gmail.com Sun Sep 22 12:30:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 22 Sep 2013 20:30:43 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: On 22 September 2013 18:58, Georg Brandl wrote: > On 09/22/2013 06:56 AM, Nick Coghlan wrote: >> On 20 Sep 2013 19:49, "Steven D'Aprano" wrote: >>> I don't think this is a critical distinction. 
I think it is fine to call >>> views "iterable", since they can be iterated over. On the rare occasion >>> that it matters, we can just do what I did above, and talk about objects >>> which are directly iterable (e.g. iterators, sequences, generator >>> objects) and those which are indirectly iterable (e.g. dict views). >> >> Or you could just use the existing terminology and talk about >> iterables vs iterators instead of inventing your own terms. > > Ack. Please don't create new terms, rather suggest an improvement to the > glossary definition if you think it's inadequate. As near as I can tell, Steven's observation is that, for backwards compatibility reasons, iter() tolerates sequences that define __len__ and __getitem__ without defining __iter__, whereas the collections ABCs require an __iter__ method for their ducktyping to trigger. This means that there are a small number of legacy cases where "isinstance(c, collections.abc.Iterable)" can be False, while calling "iter(c)" would still give you a working iterator. My take on it is that when Guido formalised the container model in PEP 3119, he was *deliberately* relegating those "iterable without defining __iter__" cases to be purely a backwards compatibility hack without forming part of the formal object model. The class definitions that aren't defining the full Sequence ABC (including __iter__) aren't really proper sequences in Python 3, even though they'll still mostly work (thanks to the prevalence of ducktyping). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Sun Sep 22 12:55:58 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 22 Sep 2013 20:55:58 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: <20130922105558.GF19939@ando> On Sun, Sep 22, 2013 at 10:58:40AM +0200, Georg Brandl wrote: > > Or you could just use the existing terminology and talk about > > iterables vs iterators instead of inventing your own terms. > > Ack. Please don't create new terms, rather suggest an improvement to the > glossary definition if you think it's inadequate. I'm not inventing new terminology. I'm using the plain English meanings of "directly" and "indirectly", and the standard meaning of "iterate", "iterator", "iterable" as used by Python and described in the glossary. As the glossary says, "The for statement [calls iter] for you, creating a TEMPORARY UNNAMED VARIABLE to hold the iterator for the duration of the loop." [emphasis added] All I am doing is distinguishing between the iterable object that the for-loop calls iter() on, which need not have a __next__ method, and the iterable object that the for-loop calls __next__ on. They're not always the same object. But as I've already said, the distinction usually doesn't matter. I've already forgotten the context of why I thought it mattered when I first raised it *wink* -- Steven From techtonik at gmail.com Sun Sep 22 13:21:07 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 22 Sep 2013 14:21:07 +0300 Subject: [Python-ideas] +1 button/counter for bugs.python.org Message-ID: Does anybody think it is a good idea to personally approve good issues and messages on bugs.python.org? If yes, should it be a Google's +1 (easier to add), or a pythonic solution for Roundup? -- anatoly t. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun Sep 22 13:57:03 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 22 Sep 2013 04:57:03 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130922105558.GF19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922105558.GF19939@ando> Message-ID: <1EEA7DC3-1D24-4295-8BB8-B4099DE86A8A@yahoo.com> On Sep 22, 2013, at 3:55, Steven D'Aprano wrote: > On Sun, Sep 22, 2013 at 10:58:40AM +0200, Georg Brandl wrote: > >>> Or you could just use the existing terminology and talk about >>> iterables vs iterators instead of inventing your own terms. >> >> Ack. Please don't create new terms, rather suggest an improvement to the >> glossary definition if you think it's inadequate. > > I'm not inventing new terminology. I'm using the plain English meanings > of "directly" and "indirectly", and the standard meaning of "iterate", > "iterator", "iterable" as used by Python and described in the glossary. No you aren't. You're using iterable as a synonym for iterator, which is not how it's used by Python or described in the glossary. > As the glossary says, "The for statement [calls iter] for you, creating > a TEMPORARY UNNAMED VARIABLE to hold the iterator for the duration of > the loop." [emphasis added] All I am doing is distinguishing between the > iterable object that the for-loop calls iter() on, which need not have a > __next__ method, and the iterable object that the for-loop calls > __next__ on. They're not always the same object. This is precisely the distinction between iterables and iterators. The object that the for loop calls iter on is an iterable. The object that the for loop gets back from that iter call, binds a temporary unnamed variable to, and calls __next__ on is an iterator. 
> But as I've already said, the distinction usually doesn't matter. Yes it does. This is a very important distinction. That's why Python already has separate terminology. And separate ABCs. From solipsis at pitrou.net Sun Sep 22 14:03:47 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 22 Sep 2013 14:03:47 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org References: Message-ID: <20130922140347.2c54318d@fsol> On Sun, 22 Sep 2013 14:21:07 +0300 anatoly techtonik wrote: > Does anybody think it is a good idea to personally approve good issues and > messages on bugs.python.org? Bug voting would be ok (not for "good issues", but "issues people care about"), but -1 on voting on messages. This isn't a popularity contest. From steve at pearwood.info Sun Sep 22 14:21:49 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 22 Sep 2013 22:21:49 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> Message-ID: <20130922122149.GH19939@ando> On Sun, Sep 22, 2013 at 08:30:43PM +1000, Nick Coghlan wrote: > As near as I can tell, Steven's observation is that, for backwards > compatibility reasons, iter() tolerates sequences that define __len__ > and __getitem__ without defining __iter__, whereas the collections > ABCs require an __iter__ method for their ducktyping to trigger. The sequence protocol doesn't require a __len__ method, it only requires a __getitem__ method that takes consecutive ints 0, 1, 2, ... and raises IndexError when there are no more items to get. But apart from that, yes, that's correct. There are iterables that fail the Iterable ABC test. > This > means that there are a small number of legacy cases where > "isinstance(c, collections.abc.Iterable)" can be False, while calling > "iter(c)" would still give you a working iterator. 
I'm sure you realise this, but just to be clear, there's no need to explicitly call iter(c). More to my point, you can simply iterate over c using a for-loop: for element in c: ... thus proving that c is iterable, since you've just iterated over it. -- Steven From g.brandl at gmx.net Sun Sep 22 14:43:05 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 22 Sep 2013 14:43:05 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: <20130922140347.2c54318d@fsol> References: <20130922140347.2c54318d@fsol> Message-ID: On 09/22/2013 02:03 PM, Antoine Pitrou wrote: > On Sun, 22 Sep 2013 14:21:07 +0300 > anatoly techtonik > wrote: >> Does anybody think it is a good idea to personally approve good issues and >> messages on bugs.python.org? > > Bug voting would be ok (not for "good issues", but "issues people care > about"), but -1 on voting on messages. This isn't a popularity contest. As long as you can also vote -1 on messages :) Seriously, I agree; if somebody implements it, voting "I want to see this fixed/implemented" seems fine to me. Georg From mbuttu at oa-cagliari.inaf.it Sun Sep 22 15:33:48 2013 From: mbuttu at oa-cagliari.inaf.it (Marco Buttu) Date: Sun, 22 Sep 2013 15:33:48 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: <20130922140347.2c54318d@fsol> References: <20130922140347.2c54318d@fsol> Message-ID: <523EF1BC.7050306@oa-cagliari.inaf.it> On 09/22/2013 02:03 PM, Antoine Pitrou wrote: > On Sun, 22 Sep 2013 14:21:07 +0300 > anatoly techtonik > wrote: >> >Does anybody think it is a good idea to personally approve good issues and >> >messages on bugs.python.org? > Bug voting would be ok (not for "good issues", but "issues people care > about"), + 1 for bug voting :) -- Marco Buttu INAF Osservatorio Astronomico di Cagliari Loc. 
Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy Phone: +39 070 71180255 Email: mbuttu at oa-cagliari.inaf.it From ncoghlan at gmail.com Sun Sep 22 16:22:19 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 Sep 2013 00:22:19 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130922122149.GH19939@ando> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> Message-ID: On 22 September 2013 22:21, Steven D'Aprano wrote: >> This >> means that there are a small number of legacy cases where >> "isinstance(c, collections.abc.Iterable)" can be False, while calling >> "iter(c)" would still give you a working iterator. > > I'm sure you realise this, but just to be clear, there's no need to > explicitly call iter(c). More to my point, you can simply iterate over c > using a for-loop: > > for element in c: > ... > > > thus proving that c is iterable, since you've just iterated over it. It's still the implicit call to iter() inside the for loop that converts the iterable to an iterator though. And these are exactly the cases that I am saying *deliberately* fail the more formal check instituted in PEP 3119. The __getitem__ fallback is a backwards compatibility hack, not part of the formal definition of an iterable. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tshepang at gmail.com Sun Sep 22 17:52:58 2013 From: tshepang at gmail.com (Tshepang Lekhonkhobe) Date: Sun, 22 Sep 2013 17:52:58 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: References: Message-ID: On Sun, Sep 22, 2013 at 1:21 PM, anatoly techtonik wrote: > Does anybody think it is a good idea to personally approve good issues and > messages on bugs.python.org? > > If yes, should it be a Google's +1 (easier to add), or a pythonic solution > for Roundup? 
Is it not enough that one can subscribe to the bug? It's very easy (click the '+' button, then hit subscribe). That way, one can also keep track of where the conversation is going, instead of a mere vote-n-forget. From stephen at xemacs.org Sun Sep 22 18:25:32 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 23 Sep 2013 01:25:32 +0900 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: References: Message-ID: <87li2o3jsz.fsf@uwakimon.sk.tsukuba.ac.jp> Tshepang Lekhonkhobe writes: > Is it not enough that one can subscribe to the bug? It's very easy > (click the '+' button, then hit subscribe). That way, one can also > keep track of where the conversation is going, instead of a mere > vote-n-forget. More important in the context of this thread, it says you care enough to accept mail, which is a much stronger endorsement than clicking a +1 button. I think it might be useful to add a "most subscribed open issues" table in the weekly report, but I hope that Guido, Antoine, Barry, Benjamin, Brett, Georg, Nick, Raymond, Tim, ... *ignore* any "cheap talk" voting mechanism and go on picking issues on their intuitions about what's important to make Python beautiful. After all, don't *all* issues deserve enough attention to close them (if only as "wontfix")? Put me down for +1 on everything! 
From tjreedy at udel.edu Sun Sep 22 18:28:01 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 22 Sep 2013 12:28:01 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <05C362B4-4491-4745-94AE-4EBFC0AA5DDC@yahoo.com> References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> <05C362B4-4491-4745-94AE-4EBFC0AA5DDC@yahoo.com> Message-ID: On 9/21/2013 10:05 PM, Andrew Barnert wrote: > On Sep 21, 2013, at 15:31, Terry Reedy > wrote: > >> On 9/21/2013 5:14 PM, Neil Girdhar wrote: >> >>> If you really think that there will never be a non-reiterable >>> non-iterator iterable, >> >> I already posted a sensible non-iterator iterable that is no more >> reiterable than an iterator.

class Cnt:
    def __init__(self, maxn):
        self.n = 0
        self.maxn = maxn
    def __getitem__(self, dummy):
        n = self.n + 1
        if n <= self.maxn:
            self.n = n
            return n
        else:
            raise IndexError

c3 = Cnt(3)
print(c3 is not iter(c3), list(c3), list(c3))
>>>
True [1, 2, 3] []

The only difference between this and an equivalent iterator is that True would instead be False. I would not call this reiterable, unless one says that all iterables, including exhausted iterators, are reiterable, because you can always call iter on them again and do a null iteration. While I sympathize with the desire to classify, there is a reason why the inventors of the newer protocol left boundedness and reiterability to negotiation between writers and users of functions taking iterable args. They were not unaware of the issues and problems that have been discussed in this thread. > You posted a long discussion of different ways in which "reiterable" > could be defined, and gave vague examples of things that are > reiterable in one sense but not in another.
Accepting all of that at > face value, there's no way Neil's Reiterable ABC would help that > problem, because it would obviously only cover one of the possible > senses. We agree on the last sentence. > Beyond that, I've looked through your posts on that thread, and I > can't find anything that looks like a sensible non-iterator > non-reiterable (in Neil's intended sense) iterable. Did I miss > something? See above, though I do not know what Neil's intended sense is, or if indeed he has exactly one intended sense. >> I expect that there are examples in the wild. If nothing else, >> there are probably some written before the new iterator protocol >> was added. These are explicitly supported. /new/newer/ -- Terry Jan Reedy From stephen at xemacs.org Sun Sep 22 18:30:40 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 23 Sep 2013 01:30:40 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> Message-ID: <87k3i83jkf.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > And these are exactly the cases that I am saying *deliberately* > fail the more formal check instituted in PEP 3119. The __getitem__ > fallback is a backwards compatibility hack, not part of the formal > definition of an iterable. I think that resolves my issue that __getitem__ is polymorphic, too. That is, item access by integer index doesn't care about order (could be a table of Goedel numbers) any more than item access by arbitrary hashable does, and __iter__ takes care of the cases where the programmer does care about order.
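Stephen's observation that __getitem__ is polymorphic is easy to demonstrate: the legacy fallback blindly requests obj[0], obj[1], ..., which only makes sense for integer-indexed containers. A hypothetical mapping-style class (the name nods to Stephen's Goedel-number table; it is not from the thread) passes the fallback's structural requirements but fails on the very first step:

```python
class GoedelTable:
    """Mapping-style __getitem__: keys are names, not positions."""
    def __init__(self):
        self._data = {"zero": 0, "succ": 1}
    def __getitem__(self, key):
        return self._data[key]

t = GoedelTable()
it = iter(t)   # the legacy fallback happily wraps t in an iterator...
try:
    next(it)   # ...but the first step does t[0], which raises KeyError
except KeyError as exc:
    print("iteration failed with KeyError:", exc)
```

The fallback only terminates cleanly on IndexError, so a mapping's KeyError propagates out of the loop — one reason defining __iter__ explicitly is the safer signal of intent.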
From tjreedy at udel.edu Sun Sep 22 18:37:52 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 22 Sep 2013 12:37:52 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> Message-ID: On 9/22/2013 10:22 AM, Nick Coghlan wrote: > The __getitem__ fallback is a backwards > compatibility hack, not part of the formal definition of an iterable. When I suggested that, by suggesting that the fallback *perhaps* could be called 'semi-deprecated, but kept for back compatibility' in the glossary entry, Raymond screamed at me and accused me of trying to change the language. He considers it an intended language feature that one can write a sequence class and not bother with __iter__. I guess we do not all agree ;-). -- Terry Jan Reedy From mistersheik at gmail.com Sun Sep 22 21:04:46 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 22 Sep 2013 15:04:46 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> Message-ID: I'm with you on this. If you want an Iterable and you wrote __getitem__, then it's not too much to ask that you either write a trivial __iter__: def __iter__(self): return (self.__getitem__(i) for i in itertools.count()) or you write __len__ and inherit from collections.Sequence. We should deprecate the sequence protocol. Neil On Sun, Sep 22, 2013 at 12:37 PM, Terry Reedy wrote: > On 9/22/2013 10:22 AM, Nick Coghlan wrote: > > The __getitem__ fallback is a backwards >> compatibility hack, not part of the formal definition of an iterable. 
>> > When I suggested that, by suggesting that the fallback *perhaps* could be > called 'semi-deprecated, but kept for back compatibility' in the glossary > entry, Raymond screamed at me and accused me of trying to change the > language. He considers it an intended language feature that one can write a > sequence class and not bother with __iter__. I guess we do not all agree > ;-). > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Sep 22 21:23:26 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 22 Sep 2013 21:23:26 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org References: <87li2o3jsz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130922212326.1402d7b2@fsol> On Mon, 23 Sep 2013 01:25:32 +0900 "Stephen J. Turnbull" wrote: > Tshepang Lekhonkhobe writes: > > > Is it not enough that one can subscribe to the bug? It's very easy > > (click the '+' button, then hit subscribe). That way, one can also > > keep track of where the conversation is going, instead of a mere > > vote-n-forget. > > More important in the context of this thread, it says you care enough > to accept mail, which is a much stronger endorsement than clicking a > +1 button.
> > I think it might be useful to add a "most subscribed open issues" > table in the weekly report, but I hope that Guido, Antoine, Barry, > Benjamin, Brett, Georg, Nick, Raymond, Tim, ... *ignore* any "cheap > talk" voting mechanism and go on picking issues on their intuitions > about what's important to make Python beautiful. Well, intuition and personal taste are of course a major factor, but sometimes it can be useful to know that a particular problem affects a lot of people (especially when it's the kind of very un-sexy problem, e.g. distutils). Of course any request that core developers tackle the most voted issues in strict order would be silly. Regards Antoine. From abarnert at yahoo.com Sun Sep 22 21:23:43 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 22 Sep 2013 12:23:43 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130920111000.GQ19939@ando> <87zjr63oub.fsf@uwakimon.sk.tsukuba.ac.jp> <1060C40C-4255-456C-9B2F-49DAC7B9FB03@yahoo.com> <05C362B4-4491-4745-94AE-4EBFC0AA5DDC@yahoo.com> Message-ID: On Sep 22, 2013, at 9:28, Terry Reedy wrote: > On 9/21/2013 10:05 PM, Andrew Barnert wrote: >> On Sep 21, 2013, at 15:31, Terry Reedy >> wrote: >> >>> On 9/21/2013 5:14 PM, Neil Girdhar wrote: >>> >>>> If you really think that there will never be a non-reiterable >>>> non-iterator iterable, >>> >>> I already posted a sensible non-iterator iterable that is no more >>> reiterable than an iterator. > > class Cnt: > def __init__(self, maxn): > self.n = 0 > self.maxn = maxn > def __getitem__(self, dummy): > n = self.n + 1 > if n <= self.maxn: > self.n = n > return n > else: > raise IndexError But this is a silly class, not a reasonable one. Why would you ever write this class, except to deceive users of it? 
It's more complicated and more verbose than an equivalent iterator, or the equivalent sequence (which you'd spell "range(n)"), and the only "benefit" is that it pretends not to be an iterator. You can just as easily write something that claims to be a sequence but iterates its elements in random order; that wouldn't prove that sequences are unordered, just that the ABCs don't test for all possible incorrect semantics.

> c3 = Cnt(3)
> print(c3 is not iter(c3), list(c3), list(c3))
> >>>
> True [1, 2, 3] []

> > The only difference between this and an equivalent iterator is that True would instead be False. I would not call this reiterable, unless one says that all iterables, including exhausted iterators, are reiterable, because you can always call iter on them again and do a null iteration. > > While I sympathize with the desire to classify, there is a reason why the inventors of the newer protocol left boundedness and reiterability to negotiation between writers and users of functions taking iterable args. They were not unaware of the issues and problems that have been discussed in this thread. > >> You posted a long discussion of different ways in which "reiterable" >> could be defined, and gave vague examples of things that are >> reiterable in one sense but not in another. Accepting all of that at >> face value, there's no way Neil's Reiterable ABC would help that >> problem, because it would obviously only cover one of the possible >> senses. > > We agree on the last sentence. > >> Beyond that, I've looked through your posts on that thread, and I >> can't find anything that looks like a sensible non-iterator >> non-reiterable (in Neil's intended sense) iterable. Did I miss >> something? > > See above, though I do not know what Neil's intended sense is, or if indeed he has exactly one intended sense. > >>> I expect that there are examples in the wild. If nothing else, >>> there are probably some written before the new iterator protocol >>> was added.
These are explicitly supported. > > /new/newer/ > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas From brian at python.org Mon Sep 23 00:57:53 2013 From: brian at python.org (Brian Curtin) Date: Sun, 22 Sep 2013 17:57:53 -0500 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: <20130922140347.2c54318d@fsol> References: <20130922140347.2c54318d@fsol> Message-ID: On Sun, Sep 22, 2013 at 7:03 AM, Antoine Pitrou wrote: > On Sun, 22 Sep 2013 14:21:07 +0300 > anatoly techtonik > wrote: >> Does anybody think it is a good idea to personally approve good issues and >> messages on bugs.python.org? > > Bug voting would be ok (not for "good issues", but "issues people care > about"), but -1 on voting on messages. This isn't a popularity contest. Adding a +1 on issues seems fine, but I doubt it'll change anything. I think, for the most part, we're pretty aware of issues people care about that need to be fixed. We'll probably need to document the feature to set expectations for what those votes actually mean, which is probably close to nothing. I can mostly just see this being abused. "Remove the GIL" will get submitted, then posted to reddit, then we'll have 5,000 votes to remove the GIL and zero attempts to do it. From timothy.c.delaney at gmail.com Mon Sep 23 01:43:17 2013 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 23 Sep 2013 09:43:17 +1000 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: References: <20130922140347.2c54318d@fsol> Message-ID: On 23 September 2013 08:57, Brian Curtin wrote: > On Sun, Sep 22, 2013 at 7:03 AM, Antoine Pitrou > wrote: > > On Sun, 22 Sep 2013 14:21:07 +0300 > > anatoly techtonik > > wrote: > >> Does anybody think it is a good idea to personally approve good issues > and > >> messages on bugs.python.org? 
> > > > Bug voting would be ok (not for "good issues", but "issues people care > > about"), but -1 on voting on messages. This isn't a popularity contest. > > Adding a +1 on issues seems fine, but I doubt it'll change anything. I > think, for the most part, we're pretty aware of issues people care > about that need to be fixed. We'll probably need to document the > feature to set expectations for what those votes actually mean, which > is probably close to nothing. > > I can mostly just see this being abused. "Remove the GIL" will get > submitted, then posted to reddit, then we'll have 5,000 votes to > remove the GIL and zero attempts to do it. That's why I agree that the number of subscribers to the bug is a more useful figure. If someone cares enough about a bug to be notified when it's modified, that's someone who's really interested in the bug being fixed. Unfortunately, the only subscription option (that I'm aware of) is nosy. I think people might be willing to subscribe to be notified when a bug is closed (or possibly even any time its status changes), but not want to receive all the notifications you get when you're on the nosy list. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Sep 23 01:46:37 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 23 Sep 2013 09:46:37 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> Message-ID: <20130922234637.GK19939@ando> On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote: > On 9/22/2013 10:22 AM, Nick Coghlan wrote: > > >The __getitem__ fallback is a backwards > >compatibility hack, not part of the formal definition of an iterable. 
> > When I suggested that, by suggesting that the fallback *perhaps* could > be called 'semi-deprecated, but kept for back compatibility' in the > glossary entry, Raymond screamed at me and accused me of trying to > change the language. He considers it an intended language feature that > one can write a sequence class and not bother with __iter__. I guess we > do not all agree ;-). Raymond did not "scream", he wrote *one* word in uppercase for emphasis. I quote:

    It is NOT deprecated. People use and rely on this behavior. It is
    a guaranteed behavior. Please don't use the glossary as a place to
    introduce changes to the language.

I agree, and I disagree with Nick's characterization of the sequence protocol as a "backwards-compatibility hack". It is an elegant protocol for implementing iteration of sequences, an old and venerable one that predates iterators, and just as much of Python's defined iterable behaviour as the business with calling next with no argument until it raises StopIteration. If it were considered *merely* for backward compatibility with Python 1.5 code, there was plenty of opportunity to drop it when Python 3 came out. The sequence protocol allows one to write a lazily generated, potentially infinite sequence that still allows random access to items. Here's a toy example:

py> class Squares:
...     def __getitem__(self, index):
...         return index**2
...
py> for sq in Squares():
...     if sq > 9: break
...     print(sq)
...
0
1
4
9

Because it's infinite, there's no value that __len__ can return, and no need for a __len__. Because it supports random access to items, writing this as an iterator with __next__ is inappropriate. Writing *both* is unnecessary, and complicates the class for no benefit. As written, Squares is naturally thread-safe -- two threads can iterate over the same Squares object without interfering.
-- Steven From steve at pearwood.info Mon Sep 23 02:12:39 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 23 Sep 2013 10:12:39 +1000 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: References: Message-ID: <20130923001239.GN19939@ando> On Sun, Sep 22, 2013 at 05:52:58PM +0200, Tshepang Lekhonkhobe wrote: > On Sun, Sep 22, 2013 at 1:21 PM, anatoly techtonik wrote: > > Does anybody think it is a good idea to personally approve good issues and > > messages on bugs.python.org? > > > > If yes, should it be a Google's +1 (easier to add), or a pythonic solution > > for Roundup? > > Is it not enough that one can subscribe to the bug? It's very easy > (click the '+' button, then hit subscribe). That way, one can also > keep track of where the conversation is going, instead of a mere > vote-n-forget. Exactly. I think that masses of +1 votes from people who care so little about an issue that they can't be bothered to add themselves to the Nosy list is next to worthless. -- Steven From mistersheik at gmail.com Mon Sep 23 02:24:05 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 22 Sep 2013 20:24:05 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130922234637.GK19939@ando> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> Message-ID: Why not just add one line? def __iter__(self): return (self.__getitem__(i) for i in itertools.count()) On Sun, Sep 22, 2013 at 7:46 PM, Steven D'Aprano wrote: > On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote: > > On 9/22/2013 10:22 AM, Nick Coghlan wrote: > > > > >The __getitem__ fallback is a backwards > > >compatibility hack, not part of the formal definition of an iterable. 
> > > > When I suggested that, by suggesting that the fallback *perhaps* could > > be called 'semi-deprecated, but kept for back compatibility' in the > > glossary entry, Raymond screamed at me and accused me of trying to > > change the language. He considers it an intended language feature that > > one can write a sequence class and not bother with __iter__. I guess we > > do not all agree ;-). > > Raymond did not "scream", he wrote *one* word in uppercase for emphasis. > I quote: > > It is NOT deprecated. People use and rely on this behavior. It is > a guaranteed behavior. Please don't use the glossary as a place to > introduce changes to the language. > > > I agree, and I disagree with Nick's characterization of the sequence > protocol as a "backwards-compatibility hack". It is an elegant protocol > for implementing iteration of sequences, an old and venerable one that > predates iterators, and just as much of Python's defined iterable > behaviour as the business with calling next with no argument until it > raises StopIteration. If it were considered *merely* for backward > compatibility with Python 1.5 code, there was plenty of opportunity to > drop it when Python 3 came out. > > The sequence protocol allows one to write a lazily generated, > potentially infinite sequence that still allows random access to items. > Here's a toy example: > > > py> class Squares: > ... def __getitem__(self, index): > ... return index**2 > ... > py> for sq in Squares(): > ... if sq > 9: break > ... print(sq) > ... > 0 > 1 > 4 > 9 > > > Because it's infinite, there's no value that __len__ can return, and no > need for a __len__. Because it supports random access to items, writing > this as an iterator with __next__ is inappropriate. Writing *both* is > unnecessary, and complicates the class for no benefit. As written, > Squares is naturally thread-safe -- two threads can iterate over the > same Squares object without interfering. 
> > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Sep 23 01:55:49 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 22 Sep 2013 16:55:49 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130922234637.GK19939@ando> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> Message-ID: <523F8385.9040903@stoneleaf.us> On 09/22/2013 04:46 PM, Steven D'Aprano wrote: > > The sequence protocol allows one to write a lazily generated, > potentially infinite sequence that still allows random access to items. > Here's a toy example: > > > py> class Squares: > ... def __getitem__(self, index): > ... return index**2 > ... > py> for sq in Squares(): > ... if sq > 9: break > ... print(sq) > ... > 0 > 1 > 4 > 9 > > > Because it's infinite, there's no value that __len__ can return, and no > need for a __len__. Because it supports random access to items, writing > this as an iterator with __next__ is inappropriate. Writing *both* is > unnecessary, and complicates the class for no benefit. As written, > Squares is naturally thread-safe -- two threads can iterate over the > same Squares object without interfering. Nice example. 
:) -- ~Ethan~ From ethan at stoneleaf.us Mon Sep 23 02:48:52 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 22 Sep 2013 17:48:52 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> Message-ID: <523F8FF4.8070505@stoneleaf.us> On 09/22/2013 05:24 PM, Neil Girdhar wrote: > Why not just add one line? > > def __iter__(self): return (self.__getitem__(i) for i in itertools.count()) Why should he? Python treats his class just fine the way it is. -- ~Ethan~ From ethan at stoneleaf.us Mon Sep 23 03:41:53 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 22 Sep 2013 18:41:53 -0700 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <575f4071-1b5c-4a16-b36c-b5f925cdd2f7@googlegroups.com> <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> Message-ID: <523F9C61.2020506@stoneleaf.us> On 09/22/2013 12:04 PM, Neil Girdhar wrote: > I'm with you on this. > > If you want an Iterable and you wrote __getitem__, then it's not too much to ask that you either write a trivial __iter__: If you want to be in full compliance, sure. But one of the nice things about Python is you aren't forced to write more than you need. If I have objects that I want to have be equal to each other I can just write __eq__ -- I don't have to write __lt__, __gt__, __le__, nor __ge__. And if I want to not equal to be the opposite of equal (as opposed to something weird) I don't even need to write __ne__ any more. > or you write __len__ and inherit from collections.Sequence. I don't like inheriting from any of the abc's (maybe I just haven't written a large enough framework yet). And I'm not writing __len__ unless I plan on supporting len(). > We should deprecate the sequence protocol. No, we shouldn't. 
-- ~Ethan~ From anikom15 at gmail.com Mon Sep 23 05:55:58 2013 From: anikom15 at gmail.com (Westley Martínez) Date: Sun, 22 Sep 2013 20:55:58 -0700 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: References: <20130922140347.2c54318d@fsol> Message-ID: <001801ceb810$d0e63b60$72b2b220$@gmail.com> > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+anikom15=gmail.com at python.org] > On Behalf Of Brian Curtin > Sent: Sunday, September 22, 2013 3:58 PM > To: Antoine Pitrou > Cc: python-ideas > Subject: Re: [Python-ideas] +1 button/counter for bugs.python.org > > On Sun, Sep 22, 2013 at 7:03 AM, Antoine Pitrou wrote: > > On Sun, 22 Sep 2013 14:21:07 +0300 > > anatoly techtonik > > wrote: > >> Does anybody think it is a good idea to personally approve good issues and > >> messages on bugs.python.org? > > > > Bug voting would be ok (not for "good issues", but "issues people care > > about"), but -1 on voting on messages. This isn't a popularity contest. > > Adding a +1 on issues seems fine, but I doubt it'll change anything. I > think, for the most part, we're pretty aware of issues people care > about that need to be fixed. We'll probably need to document the > feature to set expectations for what those votes actually mean, which > is probably close to nothing. > > I can mostly just see this being abused. "Remove the GIL" will get > submitted, then posted to reddit, then we'll have 5,000 votes to > remove the GIL and zero attempts to do it. +1 I don't think this can work without having some sort of karma system which does not need to happen. Python is not a democracy. It's a tyrannical dictatorship.
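Ethan's aside earlier about comparison methods holds in Python 3: defining only __eq__ is enough, because != falls back to negating == unless __ne__ is overridden. A minimal sketch (the Point class is illustrative, not from the thread):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

print(Point(1, 2) == Point(1, 2))  # True
print(Point(1, 2) != Point(1, 2))  # False: __ne__ is derived from __eq__
print(Point(1, 2) != Point(3, 4))  # True
```

(In Python 2 this fallback did not exist, which is why older code routinely defined both methods.)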
From ncoghlan at gmail.com Mon Sep 23 06:58:08 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 Sep 2013 14:58:08 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130922234637.GK19939@ando> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> Message-ID: On 23 Sep 2013 09:47, "Steven D'Aprano" wrote: > > On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote: > > On 9/22/2013 10:22 AM, Nick Coghlan wrote: > > > > >The __getitem__ fallback is a backwards > > >compatibility hack, not part of the formal definition of an iterable. > > > > When I suggested that, by suggesting that the fallback *perhaps* could > > be called 'semi-deprecated, but kept for back compatibility' in the > > glossary entry, Raymond screamed at me and accused me of trying to > > change the language. He considers it an intended language feature that > > one can write a sequence class and not bother with __iter__. I guess we > > do not all agree ;-). > > Raymond did not "scream", he wrote *one* word in uppercase for emphasis. > I quote: > > It is NOT deprecated. People use and rely on this behavior. It is > a guaranteed behavior. Please don't use the glossary as a place to > introduce changes to the language. > > > I agree, and I disagree with Nick's characterization of the sequence > protocol as a "backwards-compatibility hack". It is an elegant protocol > for implementing iteration of sequences, an old and venerable one that > predates iterators, and just as much of Python's defined iterable > behaviour as the business with calling next with no argument until it > raises StopIteration. If it were considered *merely* for backward > compatibility with Python 1.5 code, there was plenty of opportunity to > drop it when Python 3 came out. > > The sequence protocol allows one to write a lazily generated, > potentially infinite sequence that still allows random access to items. 
> Here's a toy example: > > > py> class Squares: > ... def __getitem__(self, index): > ... return index**2 > ... > py> for sq in Squares(): > ... if sq > 9: break > ... print(sq) > ... > 0 > 1 > 4 > 9 > > > Because it's infinite, there's no value that __len__ can return, and no > need for a __len__. Because it supports random access to items, writing > this as an iterator with __next__ is inappropriate. Writing *both* is > unnecessary, and complicates the class for no benefit. As written, > Squares is naturally thread-safe -- two threads can iterate over the > same Squares object without interfering. And PEP 3119 means you have to decorate it with "@Iterable.register" for Python to *formally* consider it an iterable (or a third party can do the registration later). Merely defining __getitem__ is considered insufficient, since it is possible to define that *without* intending to create an iterable. Cheers, Nick. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Sep 23 10:04:10 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 23 Sep 2013 17:04:10 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130922234637.GK19939@ando> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> Message-ID: <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Executive summary: The ability to create a quick iterable with just a simple __getitem__ is cool and not a "hack" (ie, no need whatsoever to deprecate it), but it is clearly a "consenting adults" construction (which includes "knowing where your children are at 10pm"). 
Steven D'Aprano writes: > I agree, and I disagree with Nick's characterization of the > sequence protocol as a "backwards-compatibility hack". It is an > elegant protocol Gotta disagree with you there (except I agree there's no need for a word like "hack"). Because __getitem__ is polymorphic (at the abstract level of duck-typing), this protocol is ugly. The "must accept 0" clause is a wart. > The sequence protocol allows one to write a lazily generated, > potentially infinite sequence that still allows random access to items. Sure, but it's not fully general. One may not *want* to write __next__ using __getitem__. A somewhat pathological example is the case of Goedel numbering of syntactically correct programs. programs.__getitem__ can be implemented directly by arithmetic, while programs.__next__ is best implemented by "unrolling" the grammar. Of course it makes sense to use an already written __getitem__ to implement __next__ when the numerical indices provide a semantically useful order. But that's already done by the Sequence ABC:

class Squares(Sequence):   # implies mixin Iterable
    def __getitem__(self, n):
        return n*n
    # __iter__ is provided as a mixin method using __getitem__
    # by Iterable

The problem is that Sequence requires a __len__ method. OK, so

# put this in your toolbox
class UndefinedLengthError(TypeError):
    pass

class InfiniteSequence(Sequence):
    def __len__(self):
        raise UndefinedLengthError

# in programs
from toolbox import InfiniteSequence

class Squares(InfiniteSequence):
    def __getitem__(self, i):
        return i*i

> Because it's infinite, there's no value that __len__ can return, > and no need for a __len__. Well, it *could* return an infinite value or None, but list() isn't prepared for that. list() isn't even prepared for

class Squares(object):
    def __init__(self, n):
        self.listsize = n
    def __getitem__(self, i):
        return i*i
    def __len__(self):
        return self.listsize

(It doesn't return in a sane amount of time.
I guess it goes ahead and attempts to construct an infinite list with l = [] for x in squares: l.append(x) Perhaps it's a shame it doesn't detect that there's a __len__ and use it to truncate the sequence, but most of the time it would just be overhead, I guess.) A lot of other functions are also going to be upset when they get a Squares object. This discussion is relevant because these are the kinds of things that bothered the OP. > Because it supports random access to items, writing this as an > iterator with __next__ is inappropriate. Writing *both* is > unnecessary, Incorrect, as written. In order to iterate over a sequence (small "s"), "somebody" has to write __next__. It's just that the function is generic, already written, and the compiler automatically binds it (actually, a closure using it) to the __next__ attribute of the automatically created iterator. This makes it unnecessary for the application programmer to write it. That is indeed elegant. > and complicates the class for no benefit. As written, Squares is > naturally thread-safe -- two threads can iterate over the same > Squares object without interfering. The obvious way of writing this as a generator would also be naturally thread-safe: class Squares(object): def __iter__(self): n = 0 while True: yield n*n n = n + 1 AFAICS this is faster (less function-call overhead). In this application it doesn't matter, but it could. And anything where a bit of state is useful (eg, the Fibonacci sequence) would be a lot faster with a hand-written __iter__. 
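The fallback protocol both messages rely on is easy to see in isolation: iter() accepts an object that defines only __getitem__ and drives it with 0, 1, 2, ... until IndexError. A minimal runnable sketch (the class name is illustrative, not from the thread):

```python
# iter() falls back to the legacy sequence protocol: an object with only
# __getitem__ is iterated by calling it with 0, 1, 2, ... until IndexError.
class FirstSquares:
    """Finite squares 0..n-1; defines neither __iter__ nor __len__."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # terminates the fallback iteration
        return index ** 2

print(list(FirstSquares(5)))  # -> [0, 1, 4, 9, 16]
```

Raising IndexError is what makes the finite case terminate; Steven's infinite Squares simply never raises it, which is why list() on it never returns.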
From solipsis at pitrou.net Mon Sep 23 10:14:51 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 23 Sep 2013 10:14:51 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org References: <20130923001239.GN19939@ando> Message-ID: <20130923101451.691dfa7b@pitrou.net> Le Mon, 23 Sep 2013 10:12:39 +1000, Steven D'Aprano a écrit : > On Sun, Sep 22, 2013 at 05:52:58PM +0200, Tshepang Lekhonkhobe wrote: > > On Sun, Sep 22, 2013 at 1:21 PM, anatoly techtonik > > wrote: > > > Does anybody think it is a good idea to personally approve good > > > issues and messages on bugs.python.org? > > > > > > If yes, should it be a Google's +1 (easier to add), or a pythonic > > > solution for Roundup? > > > > Is it not enough that one can subscribe to the bug? It's very easy > > (click the '+' button, then hit subscribe). That way, one can also > > keep track of where the conversation is going, instead of a mere > > vote-n-forget. > > Exactly. > > I think that masses of +1 votes from people who care so little about > an issue that they can't be bothered to add themselves to the Nosy > list is next to worthless. I don't know about you, but I don't add myself to the Nosy list of every bug that irks me on third-party software. There's no reason to subscribe to an issue's messages when you are a mere end-user. That doesn't mean the bug isn't affecting you. Regards Antoine. From ncoghlan at gmail.com Mon Sep 23 10:44:12 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 Sep 2013 18:44:12 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 23 September 2013 18:04, Stephen J. 
Turnbull wrote: > Executive summary: > > The ability to create a quick iterable with just a simple __getitem__ > is cool and not a "hack" (ie, no need whatsoever to deprecate it), but > it is clearly a "consenting adults" construction (which includes > "knowing where your children are at 10pm"). > > Steven D'Aprano writes: > > > I agree, and I disagree with Nick's characterization of the > > sequence protocol as a "backwards-compatibility hack". It is an > > elegant protocol > > Gotta disagree with you there (except I agree there's no need for a > word like "hack"). Because __getitem__ is polymorphic (at the > abstract level of duck-typing), this protocol is ugly. The "must > accept 0" clause is a wart. I think others object to the word "hack" more than I do (or give it additional implications like "in danger of being deprecated"). To me it's just a shorthand for saying "this is a case where practicality beat purity". Just because something is a hack doesn't mean it isn't useful and isn't a good idea. I consider functools.wraps to be a hack that managed to preserve introspectability of most decorated functions with minimal development effort. runpy and the -m switch took quite a while to evolve into something that wasn't a hack (although they still have some hacky parts due to limitations of the import protocol). The code that makes objects that override __eq__ without overriding __hash__ non-hashable (and the associated "__hash__ = None") trick is a hack. Python 3's new super is incredibly nice and easy to use, but there's also a lot of hackery lurking behind it. 
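The "__hash__ = None" trick Nick lists among his examples is easy to observe directly. A minimal sketch (the class is illustrative): in Python 3, defining __eq__ without __hash__ sets __hash__ to None, making instances unhashable.

```python
# Python 3 sets __hash__ to None when a class defines __eq__ without
# also defining __hash__, so instances cannot be used as dict keys.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

print(Point.__hash__)    # -> None
try:
    hash(Point(1, 2))
except TypeError as exc:
    print(exc)           # -> unhashable type: 'Point'
```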
The fact Python 3 lets you create ranges you can't directly take the length of is a bit of a hack, too (because of the pain involved in defining an alternative __len__ protocol that didn't funnel everything through an ssize_t value): >>> x = range(10**100) >>> len(x) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t >>> (x.stop - x.start) // x.step 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 To create a properly defined iterable in modern Python, you must either implement __iter__ or implement an iter() compatible __getitem__ and explicitly register with Iterable (to indicate that your __getitem__ *is* compatible with the fallback protocol in iter()). Steven's right that I left out that second alternative when stating what it takes for an item to be considered an iterable, but I still consider the __getitem__ fallback to be just a neat backwards compatibility hack for sequences that were defined before the iterator protocol existed and before the Iterable ABC provided a way to explicitly declare that your __getitem__ implementation was compatible with the sequence-iterator protocol. I was also wrong about iter() checking for __len__ - that's part of the sequence API fallback in reversed(), rather than the one in iter(): >>> class InfiniteIter: ... def __getitem__(self, idx): ... return idx ... >>> reversed(InfiniteIter()) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: object of type 'InfiniteIter' has no len() Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From oscar.j.benjamin at gmail.com Mon Sep 23 10:55:19 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 23 Sep 2013 09:55:19 +0100 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 23 September 2013 09:44, Nick Coghlan wrote: > > The fact Python 3 lets you create ranges you can't directly take the > length of is a bit of a hack, too (because of the pain involved in > defining an alternative __len__ protocol that didn't funnel everything > through an ssize_t value): > >>>> x = range(10**100) >>>> len(x) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > OverflowError: Python int too large to convert to C ssize_t >>>> (x.stop - x.start) // x.step > 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 I wouldn't call that a "hack". It's clearly a bug. 
Oscar From ncoghlan at gmail.com Mon Sep 23 12:43:22 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 Sep 2013 20:43:22 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 23 September 2013 18:55, Oscar Benjamin wrote: > On 23 September 2013 09:44, Nick Coghlan wrote: >> >> The fact Python 3 lets you create ranges you can't directly take the >> length of is a bit of a hack, too (because of the pain involved in >> defining an alternative __len__ protocol that didn't funnel everything >> through an ssize_t value): >> >>>>> x = range(10**100) >>>>> len(x) >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> OverflowError: Python int too large to convert to C ssize_t >>>>> (x.stop - x.start) // x.step >> 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > > I wouldn't call that a "hack". It's clearly a bug. The hack is that we were able to make those large ranges possible *without* fixing the limitation that len() (or, more accurately, the CPython tp_len slot) only supports 64-bit containers. Solving the latter is a *much* harder problem that would require a PEP to add a new type slot, and it's hard to justify doing all that work for such a niche use case, especially when the workaround is relatively simple. If we'd taken the purist approach, then the result would more likely have been that ranges would have remained limited to lengths that fit in 64 bits rather than that the 64-bit limitation would have been removed. Cheers, Nick. 
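The workaround Nick describes can be wrapped in a small helper. This is a sketch: the helper name is my own, and the ceiling-division arithmetic for arbitrary steps is my generalisation — the thread only shows the exact-division case `(x.stop - x.start) // x.step`.

```python
def range_length(r):
    """len() of a range, tolerating lengths beyond C ssize_t.

    Falls back to pure-Python arithmetic when len() overflows,
    using ceiling division so non-unit steps come out right.
    """
    try:
        return len(r)
    except OverflowError:
        adjust = 1 if r.step > 0 else -1
        return max(0, (r.stop - r.start + r.step - adjust) // r.step)

print(range_length(range(10)))                      # -> 10
print(range_length(range(10 ** 100)) == 10 ** 100)  # -> True
```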
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From oscar.j.benjamin at gmail.com Mon Sep 23 12:53:44 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 23 Sep 2013 11:53:44 +0100 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 23 September 2013 11:43, Nick Coghlan wrote: > On 23 September 2013 18:55, Oscar Benjamin wrote: >> On 23 September 2013 09:44, Nick Coghlan wrote: >>> >>> The fact Python 3 lets you create ranges you can't directly take the >>> length of is a bit of a hack, too (because of the pain involved in >>> defining an alternative __len__ protocol that didn't funnel everything >>> through an ssize_t value): >>> >>>>>> x = range(10**100) >>>>>> len(x) >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> OverflowError: Python int too large to convert to C ssize_t >>>>>> (x.stop - x.start) // x.step >>> 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 >> >> I wouldn't call that a "hack". It's clearly a bug. > The hack is that we were able to make those large ranges possible > *without* fixing the limitation that len() (or, more accurately, the > CPython tp_len slot) only supports 64-bit containers. Solving the > latter is a *much* harder problem that would require a PEP to add a > new type slot, and it's hard to justify doing all that work for such a > niche use case, especially when the workaround is relatively simple. > > If we'd taken the purist approach, then the result would more likely > have been that ranges would have remained limited to lengths that fit > in 64 bits rather than that the 64-bit limitation would have been > removed. It may not be worth fixing but I still consider it a bug. 
It also doesn't work for Python classes: $ python3 Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> class myrange: ... def __len__(self): return 10 ** 1000 ... >>> len(myrange()) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: cannot fit 'int' into an index-sized integer Oscar From ncoghlan at gmail.com Mon Sep 23 12:57:19 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 Sep 2013 20:57:19 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130919121828.GK19939@ando> <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 23 September 2013 20:53, Oscar Benjamin wrote: > It may not be worth fixing but I still consider it a bug. len() being limited to 64-bit values is indeed a bug in CPython. That's not what I was citing as an example of what I consider a neat hack, though - the neat hack is the fact ranges that don't fit in 64-bits are still mostly supported *despite* that bug. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Mon Sep 23 16:23:37 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 Sep 2013 00:23:37 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130923142336.GA7989@ando> On Mon, Sep 23, 2013 at 05:04:10PM +0900, Stephen J. 
Turnbull wrote: > Executive summary: > > The ability to create a quick iterable with just a simple __getitem__ > is cool and not a "hack" (ie, no need whatsoever to deprecate it), but > it is clearly a "consenting adults" construction (which includes > "knowing where your children are at 10pm"). When I first raised the issue that some iterables are not recognised by collections.Iterable as iterable, I asked to be convinced that it is not a bug. Now I'm convinced. 1) Objects which inherit from the Iterable ABC are not merely iterables in the sense of "can be iterated over", but also iterables in the ABC sense. Obviously. These can be considered "official" iterables, or perhaps Iterables with a capital I. 2) Objects with a __getitem__ method that obey the sequence protocol are also iterable in the sense of "can be iterated over", but if they don't inherit from Iterable they don't pass the ABC isinstance test. Since "objects with a __getitem__ method that can be iterated over but that don't inherit from collections.Iterable" is a bit of a mouthful, for brevity I'm going to call them "de facto iterables", at the risk of being told off for inventing my own terminology *wink* 3) While it may appear strange to have something that can be iterated over not be recognised as an iterable, this is not very different from what can happen with duck-typing in general. We might write a class that duck-types as (say) a string, and have it not be recognised as such by isinstance(obj, string). Such is life. 4) If you want your de facto iterable to pass isinstance(obj, Iterable) tests, then you have to register it to make it official. [...] > This discussion is relevant because these are the kinds of things that > bothered the OP. Yes, we've certainly covered a lot of ground from the question of "Reiterable". 
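Points 2 and 4 of the summary above can be sketched in a few lines (a minimal illustration; the class is mine, and on Python 3.3+ the ABC also lives in collections.abc):

```python
from collections.abc import Iterable

class Squares:
    """A "de facto" iterable: __getitem__ only, no __iter__."""
    def __getitem__(self, index):
        return index ** 2

sq = Squares()
print(isinstance(sq, Iterable))  # -> False: iterable, but not "official"
next(iter(sq))                   # ...yet iter() happily accepts it

Iterable.register(Squares)       # point 4: explicit registration
print(isinstance(sq, Iterable))  # -> True
```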
-- Steven From mistersheik at gmail.com Tue Sep 24 02:19:40 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 23 Sep 2013 20:19:40 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130923142336.GA7989@ando> References: <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> Message-ID: If infinite sequences are so common, it might be better to add a collections.abc for them. Then, besides a default __iter__ being generated, you could automatically generate advanced slicing into the original infinite sequence that returns a Sequence. This slice would then support the automatically-generated .index method, etc. Also, the big advantage to inheriting from an abc rather than defining __getitem__and counting on Python silently allowing __iter__ to work is that the former is an explicit declaration of intent. The latter counts on people knowing a weird feature of python. Best Neil On Mon, Sep 23, 2013 at 10:23 AM, Steven D'Aprano wrote: > On Mon, Sep 23, 2013 at 05:04:10PM +0900, Stephen J. Turnbull wrote: > > Executive summary: > > > > The ability to create a quick iterable with just a simple __getitem__ > > is cool and not a "hack" (ie, no need whatsoever to deprecate it), but > > it is clearly a "consenting adults" construction (which includes > > "knowing where your children are at 10pm"). > > When I first raised the issue that some iterables are not recognised by > collections.Iterable as iterable, I asked to be convinced that it is not > a bug. Now I'm convinced. > > > 1) Objects which inherit from the Iterable ABC are not merely iterables > in the sense of "can be iterated over", but also iterables in the ABC > sense. Obviously. These can be considered "official" iterables, or > perhaps Iterables with a capital I. 
> > 2) Objects with a __getitem__ method that obey the sequence protocol are > also iterable in the sense of "can be iterated over", but if they don't > inherit from Iterable they don't pass the ABC isinstance test. > > Since "objects with a __getitem__ method that can be iterated over but > that don't inherit from collections.Iterable" is a bit of a mouthful, > for brevity I'm going to call them "de facto iterables", at the risk of > being told off for inventing my own terminology *wink* > > 3) While it may appear strange to have something that can be iterated > over not be recognised as an iterable, this is not very different from > what can happen with duck-typing in general. We might write a class that > duck-types as (say) a string, and have it not be recognised as such by > isinstance(obj, string). Such is life. > > 4) If you want your de facto iterable to pass isinstance(obj, Iterable) > tests, then you have to register it to make it official. > > > [...] > > This discussion is relevant because these are the kinds of things that > > bothered the OP. > > Yes, we've certainly covered a lot of ground from the question of > "Reiterable". > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OumiLGDwRWA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Sep 24 03:04:03 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 24 Sep 2013 10:04:03 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: References: <20130920094854.GO19939@ando> <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> Message-ID: <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Girdhar writes: > If infinite sequences are so common, it might be better to add a > collections.abc for them. I suspect this falls under the "not every 3-line function" clause, because it would really require a PEP to get right (changes to builtins like list and dict would be needed, IIUC). Just inherit from Sequence and add a __len__ which returns a unique object (probably could be None, actually), and check for that private protocol yourself. P.S. Please don't post via Google Groups. It results in spam for those of us who don't subscribe to the Google Group. From steve at pearwood.info Tue Sep 24 03:37:04 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 Sep 2013 11:37:04 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130924013704.GE7989@ando> On Tue, Sep 24, 2013 at 10:04:03AM +0900, Stephen J. Turnbull wrote: > Neil Girdhar writes: > > > If infinite sequences are so common, it might be better to add a > > collections.abc for them. > > I suspect this falls under the "not every 3-line function" clause, > because it would really require a PEP to get right (changes to > builtins like list and dict would be needed, IIUC). A lot of work for virtually no benefit. Besides, who said that infinite iterators are common? 
> Just inherit from Sequence and add a __len__ which returns a unique > object (probably could be None, actually), and check for that private > protocol yourself. Alas, that doesn't work. py> class X: ... def __len__(self): ... return None ... py> x = X() py> len(x) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'NoneType' object cannot be interpreted as an integer If you care about infinite iterators, you can add your own "isinfinite" flag on them. Personally, I wouldn't bother. I just consider this a case for programming by contract: unless the function you are calling promises to be safe with infinite iterators, you should not use them. -- Steven From stephen at xemacs.org Tue Sep 24 05:42:18 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 24 Sep 2013 12:42:18 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130924013704.GE7989@ando> References: <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> <20130924013704.GE7989@ando> Message-ID: <87vc1q28dh.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > A lot of work for virtually no benefit. Besides, who said that infinite > iterators are common? Infinite, no. Don't know the length until you're done, common. Length nondeterministic and in principle unbounded, common. > If you care about infinite iterators, you can add your own "isinfinite" > flag on them. Personally, I wouldn't bother. I just consider this a case > for programming by contract: unless the function you are calling > promises to be safe with infinite iterators, you should not use them. But finite iterators can cause problems too (eg, Nick's length=1google range -- even with an attosecond processor, that will take a while to exhaust :-). 
It would be nice if a program could choose its own value of "too big", and process "large finite" and "infinite" lists in the same way by taking "as much as possible". That's what frustrates the OP -- it's *hard* to write a function that makes a valid promise to be safe with all iterables. (Of course his definition of "safe" is much stricter, he requires "reiterable", not just "finite and of 'reasonable' size". But the principle is the same -- Python should make it easy to write safe functions. Of course Nick is right: "Although practicality beats purity.") From steve at pearwood.info Tue Sep 24 06:21:05 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 Sep 2013 14:21:05 +1000 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <87vc1q28dh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> <20130924013704.GE7989@ando> <87vc1q28dh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130924042105.GH7989@ando> On Tue, Sep 24, 2013 at 12:42:18PM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > A lot of work for virtually no benefit. Besides, who said that infinite > > iterators are common? > > Infinite, no. Don't know the length until you're done, common. Which is why iterators don't require a length. Another way of spelling "length is unknown" is "object has no __len__". > Length nondeterministic and in principle unbounded, common. Maybe it's the mathematician in me speaking, but I don't think very many unbounded iterators are found outside of maths sequences. After all, even if you were to iterate over every atom in the universe, that would be bounded, and quite small compared to some of the numbers mathematicians deal with... :-) > > If you care about infinite iterators, you can add your own "isinfinite" > > flag on them. Personally, I wouldn't bother. 
I just consider this a case > > for programming by contract: unless the function you are calling > > promises to be safe with infinite iterators, you should not use them. > > But finite iterators can cause problems too (eg, Nick's length=1google > range -- even with an attosecond processor, that will take a while to > exhaust :-). It would be nice if a program could choose its own value > of "too big", and process "large finite" and "infinite" lists in the > same way by taking "as much as possible". You can already do that, although it requires a bit of manual work and preparation. Within Python, you can use itertools.islice, and take slices of everything to limit the number of items processed: process(islice(some_iterator, MAXIMUM)) Or you can use your operating system to manage resource limits, e.g. on Linux systems ulimit -v seems to work for me: py> def big(): ... while True: ... yield 1 ... py> list(big()) Traceback (most recent call last): File "<stdin>", line 1, in <module> MemoryError It would be nice if Python allowed you to tune memory consumption within Python itself, but failing that, that's what the OS is for. Mind you, I have repeatedly been bitten by accidentally calling list() on a too large iterator. So I'm sympathetic to the view that this is a hard problem to solve and Python should help solve it. -- Steven From stephen at xemacs.org Tue Sep 24 07:59:03 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 24 Sep 2013 14:59:03 +0900 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130924042105.GH7989@ando> References: <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> <20130924013704.GE7989@ando> <87vc1q28dh.fsf@uwakimon.sk.tsukuba.ac.jp> <20130924042105.GH7989@ando> Message-ID: <87siwu221k.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Maybe it's the mathematician in me speaking, but I don't think very many > unbounded iterators are found outside of maths sequences. Nothing *in* math is ever found *outside* of math. Even the number 0 is unreliable in quantum physics. :-) In the real world, the use of unbounded math models is intended to remind us of the fact that the Tokyo Electric Power Company learned the hard way on March 11, 2011: if you put a practical bound on the size of tsunamis, soon enough one of size BOUND + 1 comes along. > > It would be nice if a program could choose its own value of "too > > big", and process "large finite" and "infinite" lists in the same > > way by taking "as much as possible". > > You can already do that, Of course we can. Are we not Men? No, we are HACKERS.[1] :-) > although it requires a bit of manual work and preperation. Aye, and there's the rub. But I'll grant it "never is often better than right now". And the actual gripe ("re-iteration") hasn't really been given a math definition yet, and is therefore much harder to diagnose syntactically. Footnotes: [1] In Nick's sense, of course. From ram.rachum at gmail.com Tue Sep 24 13:49:20 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Tue, 24 Sep 2013 04:49:20 -0700 (PDT) Subject: [Python-ideas] `OrderedDict.sort` Message-ID: What do you think about providing an `OrderedDict.sort` method? 
I've been using my own `OrderedDict` subclass that defines `sort` for years, and I always wondered why the stdlib one doesn't provide `sort`. I can write the patch if needed. Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Sep 24 14:13:15 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 Sep 2013 22:13:15 +1000 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: Message-ID: <20130924121315.GI7989@ando> On Tue, Sep 24, 2013 at 04:49:20AM -0700, Ram Rachum wrote: > What do you think about providing an `OrderedDict.sort` method? I've been > using my own `OrderedDict` subclass that defines `sort` for years, and I > always wondered why the stdlib one doesn't provide `sort`. > > I can write the patch if needed. I'm not entirely sure why anyone would need an OrderedDict sort method. Ordered Dicts store keys by insertion order. Sorting the keys goes against the purpose of an OrderedDict. I can understand a request for a SortedDict, that keeps the keys in sorted order as they are deleted or inserted. I personally don't have any need for one, since when I need the keys in sorted order I just sort them on the fly: for key in sorted(dict): ... but in any case, that's a separate issue from sorting an OrderedDict. Can you explain the use-case for why somebody might want to throw away the insertion order and replace with sorted order? -- Steven From ram.rachum at gmail.com Tue Sep 24 14:27:08 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Tue, 24 Sep 2013 05:27:08 -0700 (PDT) Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <20130924121315.GI7989@ando> References: <20130924121315.GI7989@ando> Message-ID: <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> I think that your mistake is defining OrderedDict as a dict sorting by insertion order. 
I see no reason to define it that way, and the fact that insertion order is the default is not a reason in my opinion. It's just a dict with an order, and I see no reason to not let users move elements about as they wish. Yes, I'm aware that the documentation defined OrderedDict your way too; I still think it's a pointless restriction. Regarding examples: I've used my `OrderedDict.sort` at least 10 times. Just today I've used it again. I was putting three items in an ordered dict, with keys 'low', 'medium' and 'high'. I wanted to have them sorted as 'low', 'medium' and 'high' but the insertion order was different because of the algorithm that calculated them. (Also not all 3 items were guaranteed to exist, I wanted to sort those that existed.) So I created an OrderedDict of my subclass and called `.sort`. I'm sure you can think of a bunch more examples, if not I can give them to you. On Tuesday, September 24, 2013 3:13:15 PM UTC+3, Steven D'Aprano wrote: > > On Tue, Sep 24, 2013 at 04:49:20AM -0700, Ram Rachum wrote: > > What do you think about providing an `OrderedDict.sort` method? I've > been > > using my own `OrderedDict` subclass that defines `sort` for years, and I > > always wondered why the stdlib one doesn't provide `sort`. > > > > I can write the patch if needed. > > I'm not entirely sure why anyone would need an OrderedDict sort method. > Ordered Dicts store keys by insertion order. Sorting the keys goes > against the purpose of an OrderedDict. > > I can understand a request for a SortedDict, that keeps the keys in > sorted order as they are deleted or inserted. I personally don't have > any need for one, since when I need the keys in sorted order I just > sort them on the fly: > > for key in sorted(dict): > ... > > > but in any case, that's a separate issue from sorting an OrderedDict. > Can you explain the use-case for why somebody might want to throw away > the insertion order and replace with sorted order? 
> > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Sep 24 14:29:55 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 24 Sep 2013 14:29:55 +0200 Subject: [Python-ideas] `OrderedDict.sort` References: <20130924121315.GI7989@ando> Message-ID: <20130924142955.33f17503@fsol> On Tue, 24 Sep 2013 22:13:15 +1000 Steven D'Aprano wrote: > On Tue, Sep 24, 2013 at 04:49:20AM -0700, Ram Rachum wrote: > > What do you think about providing an `OrderedDict.sort` method? I've been > > using my own `OrderedDict` subclass that defines `sort` for years, and I > > always wondered why the stdlib one doesn't provide `sort`. > > > > I can write the patch if needed. > > I'm not entirely sure why anyone would need an OrderedDict sort method. > Ordered Dicts store keys by insertion order. Sorting the keys goes > against the purpose of an OrderedDict. An OrderedDict is basically an associative container with a well-defined ordering. It's not only "insertion order", because you can use move_to_end() to reorder it piecewise. (at some point I also filed a feature request to rotate an OrderedDict: http://bugs.python.org/issue17100) However, sorting would be difficult to implement efficiently with the natural implementation of an OrderedDict, which uses linked lists. Basically, you're probably as good sorting the items separately and reinitializing the OrderedDict with them. Regards Antoine. From ram at rachum.com Tue Sep 24 14:50:08 2013 From: ram at rachum.com (Ram Rachum) Date: Tue, 24 Sep 2013 15:50:08 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <20130924142955.33f17503@fsol> References: <20130924121315.GI7989@ando> <20130924142955.33f17503@fsol> Message-ID: Antoine, my concern is not efficiency but convenience. 
Whoever has high efficiency requirements and wants to sort an ordered dict will have to find their own solution anyway, and the other 99% of people who just want to sort a 20-items-long ordered dict in their small web app could happily use `OrderedDict.sort`. And while we're on that subject, can we also add `OrderedDict.index`? On Tue, Sep 24, 2013 at 3:29 PM, Antoine Pitrou wrote: > On Tue, 24 Sep 2013 22:13:15 +1000 > Steven D'Aprano wrote: > > On Tue, Sep 24, 2013 at 04:49:20AM -0700, Ram Rachum wrote: > > > What do you think about providing an `OrderedDict.sort` method? I've > been > > > using my own `OrderedDict` subclass that defines `sort` for years, and > I > > > always wondered why the stdlib one doesn't provide `sort`. > > > > > > I can write the patch if needed. > > > > I'm not entirely sure why anyone would need an OrderedDict sort method. > > Ordered Dicts store keys by insertion order. Sorting the keys goes > > against the purpose of an OrderedDict. > > An OrderedDict is basically an associative container with a > well-defined ordering. It's not only "insertion order", because you can > use move_to_end() to reorder it piecewise. > (at some point I also filed a feature request to rotate an OrderedDict: > http://bugs.python.org/issue17100) > > However, sorting would be difficult to implement efficiently with the > natural implementation of an OrderedDict, which uses linked lists. > Basically, you're probably as good sorting the items separately and > reinitializing the OrderedDict with them. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/-RFTqV8_aS0/unsubscribe. 
> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at python.org Tue Sep 24 15:50:31 2013 From: brian at python.org (Brian Curtin) Date: Tue, 24 Sep 2013 08:50:31 -0500 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> Message-ID: On Tue, Sep 24, 2013 at 7:27 AM, Ram Rachum wrote: > I think that your mistake is defining OrderedDict as a dict sorting by > insertion order. That's the definition straight out of the documentation. "An OrderedDict is a dict that remembers the order that keys were first inserted." From ram.rachum at gmail.com Tue Sep 24 15:52:34 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Tue, 24 Sep 2013 16:52:34 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> Message-ID: Then how do you explain `move_to_end`? Sent from my phone. On Sep 24, 2013 3:50 PM, "Brian Curtin" wrote: > On Tue, Sep 24, 2013 at 7:27 AM, Ram Rachum wrote: > > I think that your mistake is defining OrderedDict as a dict sorting by > > insertion order. > > That's the definition straight out of the documentation. "An > OrderedDict is a dict that remembers the order that keys were first > inserted." > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Tue Sep 24 16:17:25 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 24 Sep 2013 07:17:25 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> Message-ID: <52419EF5.1070305@stoneleaf.us> On 09/24/2013 05:27 AM, Ram Rachum wrote: > > I think that your mistake is defining OrderedDict as a dict sorting by insertion order. I see no reason to define it > that way [...] How would you like it sorted? - ascending? you can write an algorithm for that - descending? you can write an algorithm for that - cyclic? you can write an algorithm for that - insertion order? you can *not* write an algorithm for that Insertion order is the one that you either remember, or is lost. As for a practical example, think of classes that want to know which order their attributes were created in -- OrderedDict to the rescue! :) -- ~Ethan~ From ram at rachum.com Tue Sep 24 17:23:58 2013 From: ram at rachum.com (Ram Rachum) Date: Tue, 24 Sep 2013 18:23:58 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <52419EF5.1070305@stoneleaf.us> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> Message-ID: Ethan, you've misunderstood my message and given a correct objection to an argument I did not make. I did not argue against ordering by insertion order on init. I agree with that decision. I disagree with defining the entire class as an insertion ordering class and refusing to allow users to reorder it as they wish after it's created. Sent from my phone. On Sep 24, 2013 4:42 PM, "Ethan Furman" wrote: > On 09/24/2013 05:27 AM, Ram Rachum wrote: > >> >> I think that your mistake is defining OrderedDict as a dict sorting by >> insertion order. I see no reason to define it >> that way [...] 
>> > > How would you like it sorted? > > - ascending? you can write an algorithm for that > > - descending? you can write an algorithm for that > > - cyclic? you can write an algorithm for that > > - insertion order? you can *not* write an algorithm for that > > Insertion order is the one that you either remember, or is lost. > > As for a practical example, think of classes that want to know which order > their attributes were created in -- OrderedDict to the rescue! :) > > -- > ~Ethan~ > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/** > topic/python-ideas/-RFTqV8_**aS0/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@**googlegroups.com > . > For more options, visit https://groups.google.com/**groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Sep 24 17:49:12 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 24 Sep 2013 17:49:12 +0200 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> Message-ID: <5241B478.5060605@egenix.com> On 24.09.2013 17:23, Ram Rachum wrote: > Ethan, you've misunderstood my message and given a correct objection to an > argument I did not make. > > I did not argue against ordering by insertion order on init. I agree with > that decision. I disagree with defining the entire class as an insertion > ordering class and refusing to allow users to reorder it as they wish after > it's created. 
The overhead introduced by completely recreating the internal
data structure after the sort is just as high as creating a
new OrderedDict, so I don't understand what you don't like about:

from collections import OrderedDict
o = OrderedDict(((3,4), (5,4), (1,2)))
p = OrderedDict(sorted(o.iteritems()))

This even allows you to keep the original insert order should
you need it again. If you don't need this, you can just use:

o = dict(((3,4), (5,4), (1,2)))
p = OrderedDict(sorted(o.iteritems()))

which is also faster than first creating an OrderedDict and
then recreating it with sorted entries.

Put those two lines into a function and you have:

def SortedOrderedDict(*args, **kws):
    o = dict(*args, **kws)
    return OrderedDict(sorted(o.iteritems()))

p = SortedOrderedDict(((3,4), (5,4), (1,2)))

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 24 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-09-11: Released eGenix PyRun 1.3.0 ...       http://egenix.com/go49
2013-09-28: PyDDF Sprint ...                                4 days to go
2013-10-14: PyCon DE 2013, Cologne, Germany ...            20 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From ram at rachum.com Tue Sep 24 17:51:43 2013
From: ram at rachum.com (Ram Rachum)
Date: Tue, 24 Sep 2013 18:51:43 +0300
Subject: [Python-ideas] `OrderedDict.sort`
In-Reply-To: <5241B478.5060605@egenix.com>
References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com>
Message-ID:

I get your point. It's a nice idea.
But I think it's slightly less elegant to create another dict. So I think it's almost as good as having a `.sort` method, but not quite as nice. (By the way, couldn't you make the same argument about `list.sort`?) On Tue, Sep 24, 2013 at 6:49 PM, M.-A. Lemburg wrote: > On 24.09.2013 17:23, Ram Rachum wrote: > > Ethan, you've misunderstood my message and given a correct objection to > an > > argument I did not make. > > > > I did not argue against ordering by insertion order on init. I agree with > > that decision. I disagree with defining the entire class as an insertion > > ordering class and refusing to allow users to reorder it as they wish > after > > it's created. > > The overhead introduced by completely recreating the internal > data structure after the sort is just as high as creating a > new OrderedDict, so I don't understand why you don't like about: > > from collections import OrderedDict > o = OrderedDict(((3,4), (5,4), (1,2))) > p = OrderedDict(sorted(o.iteritems())) > > This even allows you to keep the original insert order should > you need it again. If you don't need this, you can just use: > > o = dict(((3,4), (5,4), (1,2))) > p = OrderedDict(sorted(o.iteritems())) > > which is also faster than first creating an OrderedDict and > then recreating it with sorted entries. > > Put those two lines into a function and you have: > > def SortedOrderedDict(*args, **kws): > o = dict(*args, **kws) > return OrderedDict(sorted(o.iteritems())) > > p = SortedOrderedDict(((3,4), (5,4), (1,2))) > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Sep 24 2013) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2013-09-11: Released eGenix PyRun 1.3.0 ... 
http://egenix.com/go49 > 2013-09-28: PyDDF Sprint ... 4 days to go > 2013-10-14: PyCon DE 2013, Cologne, Germany ... 20 days to go > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Sep 24 17:36:17 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 24 Sep 2013 08:36:17 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> Message-ID: <5241B171.80501@stoneleaf.us> On 09/24/2013 08:23 AM, Ram Rachum wrote: > On Sep 24, 2013 4:42 PM, Ethan Furman wrote: >> On 09/24/2013 05:27 AM, Ram Rachum wrote: >>> >>> I think that your mistake is defining OrderedDict as a dict sorting by insertion order. I see no reason to define it >>> that way [...] >> >> Insertion order is the one that you either remember, or is lost. > > Ethan, you've misunderstood my message and given a correct objection to an argument I did not make. > > I did not argue against ordering by insertion order on init. I agree with that decision. I disagree with defining the > entire class as an insertion ordering class and refusing to allow users to reorder it as they wish after it's created. Two points: - What happens when a new element is added to the OrderedDict after the user sorts it? 
- If by 'init' you mean something like `d = OrderedDict(a=1, b=2, c=3)` -- this does not preserve an insertion order as the keywords end up in a regular, unsorted dict that is passed to OrderedDict.__init__ -- ~Ethan~ From ram at rachum.com Tue Sep 24 18:02:12 2013 From: ram at rachum.com (Ram Rachum) Date: Tue, 24 Sep 2013 19:02:12 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <5241B171.80501@stoneleaf.us> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B171.80501@stoneleaf.us> Message-ID: On Tue, Sep 24, 2013 at 6:36 PM, Ethan Furman wrote: > On 09/24/2013 08:23 AM, Ram Rachum wrote: > >> On Sep 24, 2013 4:42 PM, Ethan Furman wrote: >> >>> On 09/24/2013 05:27 AM, Ram Rachum wrote: >>> >>>> >>>> I think that your mistake is defining OrderedDict as a dict sorting by >>>> insertion order. I see no reason to define it >>>> that way [...] >>>> >>> >>> Insertion order is the one that you either remember, or is lost. >>> >> >> Ethan, you've misunderstood my message and given a correct objection to >> an argument I did not make. >> >> I did not argue against ordering by insertion order on init. I agree with >> that decision. I disagree with defining the >> entire class as an insertion ordering class and refusing to allow users >> to reorder it as they wish after it's created. >> > > Two points: > > - What happens when a new element is added to the OrderedDict after the > user sorts it? > The exact same thing that happens if the user does `.move_to_end` and then adds a new element, and the exact same thing that happens when a user does `list.sort` and adds a new element, and the exact same thing that happens when a user does `sorted(whatever)` and adds a new element. It just gets put in the end. 
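The claim above is easy to check: after a reordering (simulated here with `move_to_end`, since the proposed `.sort` doesn't exist), a freshly inserted key appends at the end just as it would under plain insertion order. Example data invented for illustration:

```python
from collections import OrderedDict

d = OrderedDict([('b', 2), ('c', 3), ('a', 1)])

# Simulate the proposed sort with the existing move_to_end method:
for k in sorted(d):
    d.move_to_end(k)
print(list(d))  # ['a', 'b', 'c']

# A key inserted after the reordering simply goes to the end:
d['aa'] = 0
print(list(d))  # ['a', 'b', 'c', 'aa']
```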
> > - If by 'init' you mean something like `d = OrderedDict(a=1, b=2, c=3)` > -- > this does not preserve an insertion order as the keywords end up in a > regular, unsorted dict that is passed to OrderedDict.__init__ Does this relate to my proposal in any way? I don't see how. (I meant __init__, I was typing from a phone.) > > > -- > ~Ethan~ > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/** > topic/python-ideas/-RFTqV8_**aS0/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@**googlegroups.com > . > For more options, visit https://groups.google.com/**groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Sep 24 18:02:33 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 25 Sep 2013 01:02:33 +0900 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> Message-ID: <87ppry1a3q.fsf@uwakimon.sk.tsukuba.ac.jp> Ram Rachum writes: > I disagree with defining the entire class as an insertion ordering > class and refusing There's no refusal. It's just not in the battery pack. > to allow users to reorder it as they wish after it's created. You can put your inefficient but useful implementation on PyPI. You can write a PEP in which you define the API. You can provide an efficient implementation suitable for the stdlib, or you can convince the gatekeepers that it doesn't need to be efficient. You can promise to maintain it for 5 years.[1] Why don't you? 
Four or five hackers do it every cycle (although sometimes it takes more than a cycle to actually get approval). Recent successes include Ethan and Steven, who are giving you the benefit of their experience. OTOH, the barrier for mere suggestions (even backed up by proof of concept implementations) these days is quite high. You need to convince somebody to do all of the above, which usually requires an argument that it's at least tricky to do right, and perhaps hard to do at all. Footnotes: [1] Or whatever the going rate is these days. From ericsnowcurrently at gmail.com Tue Sep 24 18:15:37 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 24 Sep 2013 10:15:37 -0600 Subject: [Python-ideas] Indicate if an iterable is ordered or not Message-ID: Iterables are not necessarily ordered (e.g. dict vs. OrderedDict). Sequences are but Sets aren't. I'm not aware of any good way currently to know if an arbitrary iterable is ordered. Without an explicit indicator of ordered-ness, you must know in advance for each specific type. One possible solution is an __isordered__ attribute (on the class), set to a boolean. The absence of the attribute would imply False. Such an attribute would be added to existing types: * collections.abc.Iterable (default: False) * list (True) * tuple (True) * set (False) * dict (False) * collections.OrderedDict (True) * ... Thoughts? -eric From ram at rachum.com Tue Sep 24 18:19:56 2013 From: ram at rachum.com (Ram Rachum) Date: Tue, 24 Sep 2013 19:19:56 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <87ppry1a3q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <87ppry1a3q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Sep 24, 2013 at 7:02 PM, Stephen J. Turnbull wrote: > Ram Rachum writes: > > > I disagree with defining the entire class as an insertion ordering > > class and refusing > > There's no refusal. 
It's just not in the battery pack. > > > to allow users to reorder it as they wish after it's created. > > You can put your inefficient but useful implementation on PyPI. > You can write a PEP in which > you define the API. > You can provide an efficient implementation suitable for the stdlib, or > you can convince the gatekeepers that it doesn't need to be efficient. > You can promise to maintain it for 5 years.[1] > I can do an inefficient implementation and put it on PyPI. I don't see the need for writing a PEP for a simple method. ("Define the API"? Anything I'm missing beyond a call signature `def sort(self, key=None)`?) If people here are opposed to allowing an implementation of `OrderedDict.sort` in the stdlib, I don't see a reason to waste my time putting an implementation on PyPI. What's that implementation going to help if you won't allow it anyway? Here's a simple inefficient implementation you can use: def sort(self, key=None): ''' Sort the items according to their keys, changing the order in-place. The optional `key` argument, (not to be confused with the dictionary keys,) will be passed to the `sorted` function as a key function. ''' sorted_keys = sorted(self.keys(), key=key) for key_ in sorted_keys[1:]: self.move_to_end(key_) Regarding committing to maintain it for N years: Sorry, that's beyond what I'm willing to do. If that's a requirement for contributing a minor feature to Python, I'll have to withdraw my suggestion. > > Why don't you? Four or five hackers do it every cycle (although > sometimes it takes more than a cycle to actually get approval). > Recent successes include Ethan and Steven, who are giving you the > benefit of their experience. > > OTOH, the barrier for mere suggestions (even backed up by proof of > concept implementations) these days is quite high. You need to > convince somebody to do all of the above, which usually requires an > argument that it's at least tricky to do right, and perhaps hard to do > at all. 
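For completeness, the sketch above drops straight into a subclass (the class name here is invented). The `[1:]` is safe because every key except the first sorted one is moved to the end in sorted order, which leaves the single unmoved key at the front:

```python
from collections import OrderedDict

class SortableOrderedDict(OrderedDict):
    """Hypothetical subclass carrying the sort() sketch from this thread."""

    def sort(self, key=None):
        # Move all but the first sorted key to the end, in sorted order.
        for k in sorted(self.keys(), key=key)[1:]:
            self.move_to_end(k)

d = SortableOrderedDict([('medium', 2), ('high', 3), ('low', 1)])

d.sort()
print(list(d))  # ['high', 'low', 'medium'] -- plain lexicographic order

d.sort(key=['low', 'medium', 'high'].index)
print(list(d))  # ['low', 'medium', 'high']
```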
> > Footnotes: > [1] Or whatever the going rate is these days. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/-RFTqV8_aS0/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 24 18:22:49 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 24 Sep 2013 09:22:49 -0700 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: What do you want to do with this knowledge? On Tue, Sep 24, 2013 at 9:15 AM, Eric Snow wrote: > Iterables are not necessarily ordered (e.g. dict vs. OrderedDict). > Sequences are but Sets aren't. I'm not aware of any good way > currently to know if an arbitrary iterable is ordered. Without an > explicit indicator of ordered-ness, you must know in advance for each > specific type. > > One possible solution is an __isordered__ attribute (on the class), > set to a boolean. The absence of the attribute would imply False. > > Such an attribute would be added to existing types: > > * collections.abc.Iterable (default: False) > * list (True) > * tuple (True) > * set (False) > * dict (False) > * collections.OrderedDict (True) > * ... > > Thoughts? 
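One way the proposed `__isordered__` flag could be consumed: the attribute is only a suggestion in this thread, not a real Python feature, so this sketch falls back to a hand-maintained table mirroring the one proposed above (in 2013, plain dicts did not guarantee any ordering):

```python
from collections import OrderedDict

# Table mirroring the proposal above.
_KNOWN_ORDERED = {list: True, tuple: True, str: True,
                  set: False, frozenset: False, dict: False,
                  OrderedDict: True}

def is_ordered(obj):
    """Return whether iterating obj yields a meaningful order.

    Honors the *proposed* __isordered__ class attribute if present,
    else falls back to the table of known types.
    """
    flag = getattr(type(obj), '__isordered__', None)
    if flag is not None:
        return bool(flag)
    return _KNOWN_ORDERED.get(type(obj), False)

print(is_ordered([1, 2, 3]))      # True
print(is_ordered({1, 2, 3}))      # False
print(is_ordered(OrderedDict()))  # True
```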
> > -eric > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From abarnert at yahoo.com Tue Sep 24 18:27:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 24 Sep 2013 09:27:08 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> Message-ID: <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> On Sep 24, 2013, at 8:51, Ram Rachum wrote: > I get your point. It's a nice idea. But I think it's slightly less elegant to create another dict. So I think it's almost as good as having a `.sort` method, but not quite as nice. Honestly, I think having a sorted mapping in the stdlib would be even nicer in almost any situation where this might be nice. But, given that we don't have such a thing, and getting one into the stdlib is harder than it appears, maybe that's not an argument against your (obviously simpler) idea. Of course in most cases, you just want to iterate once in sorted order, and it's hard to beat this: for k, v in sorted(o.items()): > (By the way, couldn't you make the same argument about `list.sort`?) You could. Except that list.sort predates sorted. And it's faster and saves memory, which isn't true of your suggestion. I don't know if that would be enough to add it today, but it's more than enough to keep it around. > On Tue, Sep 24, 2013 at 6:49 PM, M.-A. Lemburg wrote: >> On 24.09.2013 17:23, Ram Rachum wrote: >> > Ethan, you've misunderstood my message and given a correct objection to an >> > argument I did not make. >> > >> > I did not argue against ordering by insertion order on init. I agree with >> > that decision. 
I disagree with defining the entire class as an insertion >> > ordering class and refusing to allow users to reorder it as they wish after >> > it's created. >> >> The overhead introduced by completely recreating the internal >> data structure after the sort is just as high as creating a >> new OrderedDict, so I don't understand why you don't like about: >> >> from collections import OrderedDict >> o = OrderedDict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> This even allows you to keep the original insert order should >> you need it again. If you don't need this, you can just use: >> >> o = dict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> which is also faster than first creating an OrderedDict and >> then recreating it with sorted entries. >> >> Put those two lines into a function and you have: >> >> def SortedOrderedDict(*args, **kws): >> o = dict(*args, **kws) >> return OrderedDict(sorted(o.iteritems())) >> >> p = SortedOrderedDict(((3,4), (5,4), (1,2))) >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, Sep 24 2013) >> >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >> >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >> >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >> ________________________________________________________________________ >> 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 >> 2013-09-28: PyDDF Sprint ... 4 days to go >> 2013-10-14: PyCon DE 2013, Cologne, Germany ... 20 days to go >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Tue Sep 24 18:33:15 2013 From: ram at rachum.com (Ram Rachum) Date: Tue, 24 Sep 2013 19:33:15 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> Message-ID: On Tue, Sep 24, 2013 at 7:27 PM, Andrew Barnert wrote: > On Sep 24, 2013, at 8:51, Ram Rachum wrote: > > I get your point. It's a nice idea. But I think it's slightly less elegant > to create another dict. So I think it's almost as good as having a `.sort` > method, but not quite as nice. > > > Honestly, I think having a sorted mapping in the stdlib would be even > nicer in almost any situation where this might be nice. But, given that we > don't have such a thing, and getting one into the stdlib is harder than it > appears, maybe that's not an argument against your (obviously simpler) idea. > For the record, I think that having a SortedDict in the stdlib would be awesome. > > Of course in most cases, you just want to iterate once in sorted order, > and it's hard to beat this: > > for k, v in sorted(o.items()): > I think that in most of my cases it won't work. Either because I iterate in Django templates, or I iterate several times which would make this cumbersome and wasteful. > > (By the way, couldn't you make the same argument about `list.sort`?) > > > You could. Except that list.sort predates sorted. 
And it's faster and > saves memory, which isn't true of your suggestion. I don't know if that > would be enough to add it today, but it's more than enough to keep it > around. > > On Tue, Sep 24, 2013 at 6:49 PM, M.-A. Lemburg wrote: > >> On 24.09.2013 17:23, Ram Rachum wrote: >> > Ethan, you've misunderstood my message and given a correct objection to >> an >> > argument I did not make. >> > >> > I did not argue against ordering by insertion order on init. I agree >> with >> > that decision. I disagree with defining the entire class as an insertion >> > ordering class and refusing to allow users to reorder it as they wish >> after >> > it's created. >> >> The overhead introduced by completely recreating the internal >> data structure after the sort is just as high as creating a >> new OrderedDict, so I don't understand why you don't like about: >> >> from collections import OrderedDict >> o = OrderedDict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> This even allows you to keep the original insert order should >> you need it again. If you don't need this, you can just use: >> >> o = dict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> which is also faster than first creating an OrderedDict and >> then recreating it with sorted entries. >> >> Put those two lines into a function and you have: >> >> def SortedOrderedDict(*args, **kws): >> o = dict(*args, **kws) >> return OrderedDict(sorted(o.iteritems())) >> >> p = SortedOrderedDict(((3,4), (5,4), (1,2))) >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, Sep 24 2013) >> >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >> >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >> >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >> ________________________________________________________________________ >> 2013-09-11: Released eGenix PyRun 1.3.0 ... 
http://egenix.com/go49 >> 2013-09-28 : PyDDF Sprint ... >> 4 days to go >> 2013-10-14: PyCon DE 2013, Cologne, Germany ... 20 days to go >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Sep 24 18:37:26 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 24 Sep 2013 09:37:26 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <87ppry1a3q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <24238F83-89EA-4746-819C-D821CA427284@yahoo.com> On Sep 24, 2013, at 9:19, Ram Rachum wrote: > If people here are opposed to allowing an implementation of `OrderedDict.sort` in the stdlib, I don't see a reason to waste my time putting an implementation on PyPI. What's that implementation going to help if you won't allow it anyway? Do you not see the benefit to ipython, numpy, requests, the various popular web frameworks, fancy collections like blist, tools like scrapy, etc. being a simple pip away? Why wouldn't the same be true for your module? A useful module on PyPI helps thousands of people who otherwise would have had to reproduce all the work themselves or settled for not having it. It also leads to de facto standard ways to do things, which makes it easier to communicate with devs on other projects. (Imagine trying to get help with "my custom multidimensional array class" or "a web scraper that I built from scratch" vs. numpy or scrapy.) 
Do you think your idea is so trivial that there really is no benefit in any of that? From ram at rachum.com Tue Sep 24 18:48:35 2013 From: ram at rachum.com (Ram Rachum) Date: Tue, 24 Sep 2013 19:48:35 +0300 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <24238F83-89EA-4746-819C-D821CA427284@yahoo.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <87ppry1a3q.fsf@uwakimon.sk.tsukuba.ac.jp> <24238F83-89EA-4746-819C-D821CA427284@yahoo.com> Message-ID: My code *is* on PyPI, just not isolated to the OrderedDict improvements. My OrderedDict improvements are here: http://pypi.python.org/pypi/python_toolbox This is a big package with all my stuff. On Tue, Sep 24, 2013 at 7:37 PM, Andrew Barnert wrote: > On Sep 24, 2013, at 9:19, Ram Rachum wrote: > > > If people here are opposed to allowing an implementation of > `OrderedDict.sort` in the stdlib, I don't see a reason to waste my time > putting an implementation on PyPI. What's that implementation going to help > if you won't allow it anyway? > > Do you not see the benefit to ipython, numpy, requests, the various > popular web frameworks, fancy collections like blist, tools like scrapy, > etc. being a simple pip away? Why wouldn't the same be true for your module? > > A useful module on PyPI helps thousands of people who otherwise would have > had to reproduce all the work themselves or settled for not having it. It > also leads to de facto standard ways to do things, which makes it easier to > communicate with devs on other projects. (Imagine trying to get help with > "my custom multidimensional array class" or "a web scraper that I built > from scratch" vs. numpy or scrapy.) > > Do you think your idea is so trivial that there really is no benefit in > any of that?
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/-RFTqV8_aS0/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Tue Sep 24 19:24:55 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 24 Sep 2013 13:24:55 -0400 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> Message-ID: <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> On Tue, Sep 24, 2013, at 12:33, Ram Rachum wrote: > For the record, I think that having a SortedDict in the stdlib would be > awesome. There are two issues with that. First of all, this demands that every element be orderable with every other element. Since not every element is going to be compared with every other element on insertion, it's easy to imagine a case where this won't be caught until it's sorted again later on. And this is ignoring the pathological behavior of floating-point NaN values, which already silently break list sorting. (Can someone explain to me how nan works as a dict key, by the way?) Secondly, a SortedDict (or SortedSet) implies that the sorting is used _instead of_ hashing, for lookup. This raises the question as to whether keys/elements should be required to be hashable. 
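[Archive note: both NaN points raised above — NaN silently breaking list sorting, and NaN working as a dict key — come down to the same facts. A small sketch of the CPython behaviour, where containers check identity before equality during key lookup:]

```python
nan = float('nan')

# NaN is incomparable: every ordering comparison returns False, which
# violates the strict weak ordering that list.sort() assumes -- so a
# list containing NaN can come out of sort() in a surprising order.
print(nan < 1.0, nan > 1.0, nan == 1.0)  # False False False

# As a dict key, lookup with the *same* NaN object still works, because
# CPython compares keys by identity before falling back to equality.
d = {nan: 'found'}
print(d[nan])  # found

# A *different* NaN object is neither identical nor equal, so it misses.
try:
    d[float('nan')]
except KeyError:
    print('distinct NaN object raises KeyError')
```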
On the one hand, requiring them to be hashable gives you the implied guarantee of an immutable equality relationship, which is _likely_ to also imply (on orderable types) an immutable ordering, whereas there is nothing else that can be used that directly implies an immutable ordering. From solipsis at pitrou.net Tue Sep 24 19:36:59 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 24 Sep 2013 19:36:59 +0200 Subject: [Python-ideas] `OrderedDict.sort` References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> Message-ID: <20130924193659.63fca9fc@fsol> On Tue, 24 Sep 2013 18:51:43 +0300 Ram Rachum wrote: > I get your point. It's a nice idea. But I think it's slightly less elegant > to create another dict. So I think it's almost as good as having a `.sort` > method, but not quite as nice. > > (By the way, couldn't you make the same argument about `list.sort`?) list.sort() sorts the list in-place, it doesn't reallocate a new vector to replace the old one. (AFAIR anyway, but I trust Tim and Raymond here (or was it Tim, Tim, Raymond and Tim? :-)). Regards Antoine. From mal at egenix.com Tue Sep 24 20:10:36 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 24 Sep 2013 20:10:36 +0200 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> Message-ID: <5241D59C.9020009@egenix.com> On 24.09.2013 17:51, Ram Rachum wrote: > I get your point. It's a nice idea. But I think it's slightly less elegant > to create another dict. So I think it's almost as good as having a `.sort` > method, but not quite as nice. You can avoid the temp dict by doing some introspection of the arguments and using iterators instead. > (By the way, couldn't you make the same argument about `list.sort`?) The use case is different. 

With list.sort() you don't want to create a copy of the list, but instead have the list sort itself, since you're not interested in the original order. You'd only use an OrderedDict to begin with if you're interested in the insert order, otherwise you'd start out with a plain dict(). > On Tue, Sep 24, 2013 at 6:49 PM, M.-A. Lemburg wrote: > >> On 24.09.2013 17:23, Ram Rachum wrote: >>> Ethan, you've misunderstood my message and given a correct objection to >> an >>> argument I did not make. >>> >>> I did not argue against ordering by insertion order on init. I agree with >>> that decision. I disagree with defining the entire class as an insertion >>> ordering class and refusing to allow users to reorder it as they wish >> after >>> it's created. >> >> The overhead introduced by completely recreating the internal >> data structure after the sort is just as high as creating a >> new OrderedDict, so I don't understand why you don't like about: >> >> from collections import OrderedDict >> o = OrderedDict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> This even allows you to keep the original insert order should >> you need it again. If you don't need this, you can just use: >> >> o = dict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> which is also faster than first creating an OrderedDict and >> then recreating it with sorted entries. >> >> Put those two lines into a function and you have: >> >> def SortedOrderedDict(*args, **kws): >> o = dict(*args, **kws) >> return OrderedDict(sorted(o.iteritems())) >> >> p = SortedOrderedDict(((3,4), (5,4), (1,2))) >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, Sep 24 2013) >>>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ >> ________________________________________________________________________ >> 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 >> 2013-09-28: PyDDF Sprint ... 4 days to go >> 2013-10-14: PyCon DE 2013, Cologne, Germany ... 20 days to go >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 24 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 2013-09-28: PyDDF Sprint ... 4 days to go 2013-10-14: PyCon DE 2013, Cologne, Germany ... 20 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From kim.grasman at gmail.com Tue Sep 24 20:41:32 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Tue, 24 Sep 2013 20:41:32 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Hi all, On Sun, Aug 25, 2013 at 8:26 PM, Kim Gräsman wrote: > Ping? > > Can I clarify something to move this forward? It seems like a good > idea to me, but I don't have the history of Py_DeleteFileW -- maybe > somebody tried this already?
Is there a better place to look for opinions? I'm happy to see Python getting more link-aware on Windows, and I think this could help getting further in that direction. Thanks, - Kim From tjreedy at udel.edu Tue Sep 24 22:39:28 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 24 Sep 2013 16:39:28 -0400 Subject: [Python-ideas] Introduce collections.Reiterable In-Reply-To: <20130924042105.GH7989@ando> References: <20130922122149.GH19939@ando> <20130922234637.GK19939@ando> <87bo3k2ccl.fsf@uwakimon.sk.tsukuba.ac.jp> <20130923142336.GA7989@ando> <87wqm7114s.fsf@uwakimon.sk.tsukuba.ac.jp> <20130924013704.GE7989@ando> <87vc1q28dh.fsf@uwakimon.sk.tsukuba.ac.jp> <20130924042105.GH7989@ando> Message-ID: On 9/24/2013 12:21 AM, Steven D'Aprano wrote: > Maybe it's the mathematician in me speaking, but I don't think very many > unbounded iterators are found outside of maths sequences. Perhaps you are confusing the 'actual infinity' of mathematics with the potential infinity of iterators. Unbounded, or more exactly, potentially unbounded iterators are quite common. First, many source iterators based on external sources are or are potentially unbounded. For example, text-mode files are text line iterators. Files based on finite disk files are bounded, but others (based on keyboard, socket, or other input channels) may not be. Consider the following example (simplified, like all examples, for illustrative purposes). def source(prompt): "Yield user responses to prompt." while True: yield input(prompt) # Even if 'quit' were recognized and turned into StopIteration, # it still might never happen. or def measures(read_instrument): "Yield values returned by read_instrument." while True: yield read_instrument() A queue can yield an unbounded sequence even if it is always finite and even if it has a maximum size, perhaps because the pool of potential queue members is finite. Second, many transform iterators are unbounded if the input iterable is unbounded.
def transform(func, iterable): for item in iterable: try: yield func(item) except ValueError: pass for i in transform(int, source('Enter an integer: ')): # process unbounded stream of ints. Filter, map, and some itertools potentially produce infinite iterators. Itertools.islice turns infinite iterables finite. Itertools.cycle turns finite iterables infinite. At the highest level, interactive apps, including OSes, usually process indefinite streams of user-generated events. > After all, > even if you were to iterate over every atom in the universe, that would > be bounded, and quite small compared to some of the numbers > mathematicians deal with... :-) The atoms of the universe can be reused over and over again in the same or different combinations to keep the iteration going indefinitely. -- Terry Jan Reedy From timothy.c.delaney at gmail.com Tue Sep 24 22:42:19 2013 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Wed, 25 Sep 2013 06:42:19 +1000 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <5241D59C.9020009@egenix.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <5241D59C.9020009@egenix.com> Message-ID: On 25 September 2013 04:10, M.-A. Lemburg wrote: > On 24.09.2013 17:51, Ram Rachum wrote: > > I get your point. It's a nice idea. But I think it's slightly less > elegant > > to create another dict. So I think it's almost as good as having a > `.sort` > > method, but not quite as nice. > > You can avoid the temp dict by doing some introspection of > the arguments and using iterators instead. > > > (By the way, couldn't you make the same argument about `list.sort`?) > > The use case is different. With list.sort() you don't want to create > a copy of the list, but instead have the list sort itself, since > you're not interested in the original order. 
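[Archive note: Terry Reedy's itertools observations above — `islice` turns infinite iterables finite, `cycle` turns finite iterables infinite — can be exercised with a self-contained sketch; the instrument here is a hypothetical stand-in, not a real device API:]

```python
import itertools

def measures(read_instrument):
    # Potentially unbounded source iterator: yields readings forever.
    while True:
        yield read_instrument()

# cycle() turns a finite list into an infinite stream of readings.
readings = itertools.cycle([1.0, 2.5, 0.7])
stream = measures(lambda: next(readings))

# islice() turns the potentially infinite stream back into a finite one.
print(list(itertools.islice(stream, 5)))  # [1.0, 2.5, 0.7, 1.0, 2.5]
```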
> > You'd only use an OrderedDict to begin with if you're interested in > the insert order, otherwise you'd start out with a plain dict(). Not quite. As Ram showed, it's perfectly possible to sort an OrderedDict in-place, which you couldn't do with a normal dict. In which case you're looking at equivalent semantics as for a list (where items are just added using append) - using Ram's implementation above: >>> import collections >>> >>> class SortableOrderedDict(collections.OrderedDict): ... def sort(self, key=None): ... sorted_keys = sorted(self.keys(), key=key) ... for key_ in sorted_keys[1:]: ... self.move_to_end(key_) ... >>> x = [] >>> x.append('c') >>> x.append('b') >>> x.sort() >>> x.append('a') >>> >>> y = SortableOrderedDict() >>> y['c'] = 1 >>> y['b'] = 2 >>> y.sort() >>> y['a'] = 3 >>> >>> print(x) ['b', 'c', 'a'] >>> print(y) SortableOrderedDict([('b', 2), ('c', 1), ('a', 3)]) >>> print(x == list(y.keys())) True >>> FWIW Ram I think you should put the implementation up on PyPI. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Sep 24 23:16:40 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 24 Sep 2013 17:16:40 -0400 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: On 9/24/2013 12:15 PM, Eric Snow wrote: > Iterables are not necessarily ordered (e.g. dict vs. OrderedDict). > Sequences are but Sets aren't. I'm not aware of any good way > currently to know if an arbitrary iterable is ordered. Without an > explicit indicator of ordered-ness, you must know in advance for each > specific type. > > One possible solution is an __isordered__ attribute (on the class), > set to a boolean. The absence of the attribute would imply False. 
> > Such an attribute would be added to existing types: > > * collections.abc.Iterable (default: False) > * list (True) > * tuple (True) > * set (False) > * dict (False) > * collections.OrderedDict (True) > * ... > > Thoughts? The iterator protocol is intentionally simple. It only requires an __iter__ method or a __next__ method with a standard __iter__ method. This makes iterables -- and generator functions that produce iterators -- easy to write. A generator instance may and may not produce items in an intented order, so a class attribute is not possible. The same is generally true of transform iterators, like map and filter instances, and most itertools classes. It is also not true that lists (and tuples) always have a significant order. list(set) has the artificial order of set iteration. Both are reiterable with the same order. Why would you call one True and the other False? In general, list(iterable) has as much order as the iterable. The __isordered__ attribute would have to be an instance attribute, properly propagated. How would you do that with generator functions? or generator expression? Anyone is free to privately extend the protocol for special purposes and restrict their universe to object that follow. Builtins can be extended, wrapped, or mapped, or their internal iterator classes mapped, to make them conform. The following helps with the last idea. >>> for cls in list, tuple, set, frozenset, dict: type(iter(cls())) -- Terry Jan Reedy From ncoghlan at gmail.com Wed Sep 25 00:41:00 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 25 Sep 2013 08:41:00 +1000 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> Message-ID: On 25 Sep 2013 01:52, "Ram Rachum" wrote: > > I get your point. It's a nice idea. But I think it's slightly less elegant to create another dict. 
So I think it's almost as good as having a `.sort` method, but not quite as nice. > > (By the way, couldn't you make the same argument about `list.sort`?) No, because list.sort() both predates the sorted builtin and is optimised to be blazingly fast with reasonable memory overhead by directly interacting with internal details of the list object. It's actually the pre-existing list sorting machinery that powers the builtin. The situation is different now: the sorted builtin provides a generic API to get a sorted version of any iterable. This means a proposed in-place sort() method on a container has to demonstrate a few things to overcome the "default deny" that is applied to any proposal to add more methods to an object interface: - there are common use cases that can't be handled by sorting the input when creating the container in the first place - there are significant speed gains from an in-place sorting operation - there are significant memory gains from an in-place sorting operation Now, in the case of OrderedDict it *may* be possible to back up one or more of those assertions (especially the latter two if you talk to Eric Snow about an in-place sort method for his C implementation of the API). However, in the absence of such evidence, the default reaction will always be to avoid expanding APIs with functionality that can be provided by applying external algorithms to the existing API. Cheers, Nick. > > > On Tue, Sep 24, 2013 at 6:49 PM, M.-A. Lemburg wrote: >> >> On 24.09.2013 17:23, Ram Rachum wrote: >> > Ethan, you've misunderstood my message and given a correct objection to an >> > argument I did not make. >> > >> > I did not argue against ordering by insertion order on init. I agree with >> > that decision. I disagree with defining the entire class as an insertion >> > ordering class and refusing to allow users to reorder it as they wish after >> > it's created. 
>> >> The overhead introduced by completely recreating the internal >> data structure after the sort is just as high as creating a >> new OrderedDict, so I don't understand why you don't like about: >> >> from collections import OrderedDict >> o = OrderedDict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> This even allows you to keep the original insert order should >> you need it again. If you don't need this, you can just use: >> >> o = dict(((3,4), (5,4), (1,2))) >> p = OrderedDict(sorted(o.iteritems())) >> >> which is also faster than first creating an OrderedDict and >> then recreating it with sorted entries. >> >> Put those two lines into a function and you have: >> >> def SortedOrderedDict(*args, **kws): >> o = dict(*args, **kws) >> return OrderedDict(sorted(o.iteritems())) >> >> p = SortedOrderedDict(((3,4), (5,4), (1,2))) >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, Sep 24 2013) >> >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >> >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >> >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >> ________________________________________________________________________ >> 2013-09-11: Released eGenix PyRun 1.3.0 ... http://egenix.com/go49 >> 2013-09-28: PyDDF Sprint ... 4 days to go >> 2013-10-14: PyCon DE 2013, Cologne, Germany ... 20 days to go >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
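[Archive note: Nick's contrast between an in-place `sort()` method and the generic `sorted()` API is easy to see with lists, where both spellings exist — a trivial sketch:]

```python
data = [3, 1, 2]
result = data.sort()   # in-place: mutates data and returns None
print(data, result)    # [1, 2, 3] None

orig = [3, 1, 2]
copy = sorted(orig)    # generic API: builds a new list, original untouched
print(orig, copy)      # [3, 1, 2] [1, 2, 3]
```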
URL: From ericsnowcurrently at gmail.com Wed Sep 25 01:00:18 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 24 Sep 2013 17:00:18 -0600 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: On Tue, Sep 24, 2013 at 10:22 AM, Guido van Rossum wrote: > What do you want to do with this knowledge? At this point, nothing. :) I realized while writing the message that my use case was not helped by knowing whether or not the iterable is ordered. I sent the message anyway because it does seem like there's a gap--just not one that perhaps anyone cares about. -eric From ncoghlan at gmail.com Wed Sep 25 01:01:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 25 Sep 2013 09:01:47 +1000 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: On 25 Sep 2013 02:24, "Guido van Rossum" wrote: > > What do you want to do with this knowledge? My reaction is the same as Guido's. There's already an implicit expectation that iterables will be *consistent* in the absence of mutation (i.e. arbitrarily ordered rather than unordered), but I don't see how "ordered based on container internal details" is meaningfully different from "ordered by some external criterion". Cheers, Nick. > > On Tue, Sep 24, 2013 at 9:15 AM, Eric Snow wrote: > > Iterables are not necessarily ordered (e.g. dict vs. OrderedDict). > > Sequences are but Sets aren't. I'm not aware of any good way > > currently to know if an arbitrary iterable is ordered. Without an > > explicit indicator of ordered-ness, you must know in advance for each > > specific type. > > > > One possible solution is an __isordered__ attribute (on the class), > > set to a boolean. The absence of the attribute would imply False. 
> > > > Such an attribute would be added to existing types: > > > > * collections.abc.Iterable (default: False) > > * list (True) > > * tuple (True) > > * set (False) > > * dict (False) > > * collections.OrderedDict (True) > > * ... > > > > Thoughts? > > > > -eric > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Sep 25 01:01:40 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 24 Sep 2013 16:01:40 -0700 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: On Tue, Sep 24, 2013 at 4:00 PM, Eric Snow wrote: > On Tue, Sep 24, 2013 at 10:22 AM, Guido van Rossum wrote: >> What do you want to do with this knowledge? > > At this point, nothing. :) I realized while writing the message that > my use case was not helped by knowing whether or not the iterable is > ordered. I sent the message anyway because it does seem like there's > a gap--just not one that perhaps anyone cares about. > > -eric -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Sep 25 01:02:06 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 24 Sep 2013 16:02:06 -0700 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: On Tue, Sep 24, 2013 at 4:01 PM, Guido van Rossum wrote: > On Tue, Sep 24, 2013 at 4:00 PM, Eric Snow wrote: >> On Tue, Sep 24, 2013 at 10:22 AM, Guido van Rossum wrote: >>> What do you want to do with this knowledge? >> >> At this point, nothing. 
:) I realized while writing the message that >> my use case was not helped by knowing whether or not the iterable is >> ordered. I sent the message anyway because it does seem like there's >> a gap--just not one that perhaps anyone cares about. To the contrary, I say there is no gap and there is nothing to gain by adding the proposed API. [Sorry for the blank reply earlier.] -- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Wed Sep 25 01:10:38 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 24 Sep 2013 17:10:38 -0600 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: FYI, at this point I no longer have a use case for this feature, and I'm not in favor of this idea without one. On Tue, Sep 24, 2013 at 3:16 PM, Terry Reedy wrote: > The iterator protocol is intentionally simple. It only requires an __iter__ > method or a __next__ method with a standard __iter__ method. This makes > iterables -- and generator functions that produce iterators -- easy to > write. This is not a proposal for an addition to the iterator protocol. It is about indicating (without iterating) that the iteration order of instances of a particular class will be consistent. > A generator instance may and may not produce items in an intented order, so > a class attribute is not possible. The same is generally true of transform > iterators, like map and filter instances, and most itertools classes. It is > also not true that lists (and tuples) always have a significant order. > list(set) has the artificial order of set iteration. Both are reiterable > with the same order. Why would you call one True and the other False? In > general, list(iterable) has as much order as the iterable. However, once values are added to the list, that order is consistent.
-eric From ericsnowcurrently at gmail.com Wed Sep 25 01:44:04 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 24 Sep 2013 17:44:04 -0600 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: On Tue, Sep 24, 2013 at 5:01 PM, Nick Coghlan wrote: > There's already an implicit expectation that iterables will be *consistent* > in the absence of mutation (i.e. arbitrarily ordered rather than > unordered), but I don't see how "ordered based on container internal > details" is meaningfully different from "ordered by some external > criterion". "container internal details" is a good way to put it. "ordered" is a little too vague, isn't it. :) -eric From dreamingforward at gmail.com Wed Sep 25 03:50:44 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Tue, 24 Sep 2013 18:50:44 -0700 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: > Iterables are not necessarily ordered (e.g. dict vs. OrderedDict). > Sequences are but Sets aren't. I'm not aware of any good way > currently to know if an arbitrary iterable is ordered. Without an > explicit indicator of ordered-ness, you must know in advance for each > specific type. > > One possible solution is an __isordered__ attribute (on the class), > set to a boolean. The absence of the attribute would imply False. Isn't the traditional way to do this via "inheritance"? Then you call issubclass(list, OrderedContainer), etc. But, then, no Python hasn't completely ordered its data structures yet. Mark From shane at umbrellacode.com Wed Sep 25 04:09:56 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 24 Sep 2013 19:09:56 -0700 Subject: [Python-ideas] Indicate if an iterable is ordered or not In-Reply-To: References: Message-ID: <54726930-A1B8-4874-B4D4-D06B8AFAB0E0@umbrellacode.com> I suppose you could support some subset of slice/index operations, with some serious limitations? 
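[Archive note: neither spelling discussed in this subthread exists in Python. Purely as a hypothetical sketch, Eric's proposed `__isordered__` flag and Mark's `issubclass(list, OrderedContainer)` idea might look like this — `OrderedContainer` and `__isordered__` are invented names from the thread, not real APIs:]

```python
from abc import ABC
from collections import OrderedDict

# Hypothetical marker ABC in the spirit of Mark's issubclass() idea:
# ordered containers are registered, and callers test with isinstance().
class OrderedContainer(ABC):
    pass

for ordered_type in (list, tuple, OrderedDict):
    OrderedContainer.register(ordered_type)

print(issubclass(list, OrderedContainer))   # True
print(isinstance(set(), OrderedContainer))  # False

# Eric's proposed class attribute, emulated with getattr(); absence of
# the flag defaults to False, as in the proposal.
def is_ordered(iterable):
    return getattr(type(iterable), '__isordered__', False)

class Recorder(list):
    __isordered__ = True  # hypothetical flag

print(is_ordered(Recorder()), is_ordered(set()))  # True False
```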
On Sep 24, 2013, at 4:01 PM, Guido van Rossum wrote: > On Tue, Sep 24, 2013 at 4:00 PM, Eric Snow wrote: >> On Tue, Sep 24, 2013 at 10:22 AM, Guido van Rossum wrote: >>> What do you want to do with this knowledge? >> >> At this point, nothing. :) I realized while writing the message that >> my use case was not helped by knowing whether or not the iterable is >> ordered. I sent the message anyway because it does seem like there's >> a gap--just not one that perhaps anyone cares about. >> >> -eric > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas From abarnert at yahoo.com Wed Sep 25 05:27:36 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 24 Sep 2013 20:27:36 -0700 (PDT) Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> Message-ID: <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> From: "random832 at fastmail.us" Sent: Tuesday, September 24, 2013 10:24 AM > On Tue, Sep 24, 2013, at 12:33, Ram Rachum wrote: >> For the record, I think that having a SortedDict in the stdlib would be >> awesome. > > There are two issues with that. This discussion comes up at least once every two months, and I don't think anyone wants to have the whole discussion all over again. See http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html, which I wrote one or two iterations ago to collect all of the issues, and please let me know if I missed any or you have anything to add.
Your two issues aren't really problems, just choices to be made, and I think everyone who's interested in this who has an opinion is unanimous. (There _is_ a problem, however: there are multiple good implementations out there, but none of them comes with someone who's willing to stdlibify it and maintain it for a few years?) But briefly: Yes, every key must be comparable with every other key, and the comparison must define a strict weak order, and the keys must be comparison-immutable, and there's no way to test either of those automatically. By comparison, a dict needs hashable keys, which can be tested automatically, and equality-immutable and hash-immutable keys, which can't really be tested but in practice hash is an acceptable test. But it's no worse than many other requirements in the stdlib that can't be tested automatically. And yes, NaN is a problem, but it's exactly the same problem it is everywhere else in Python. From stephen at xemacs.org Wed Sep 25 06:27:08 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 25 Sep 2013 13:27:08 +0900 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: <87k3i51q77.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > See http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html, > which I wrote one or two iterations ago to collect all of the > issues, and please let me know if I missed any or you have anything > to add. A small nit: SortedSequence and SortedDicts should be mappings, guaranteeing "fast" (preferably O(1)) access for any key (integral and arbitrary, respectively).
Therefore, in the case of a SortedDict the user should be no more surprised at a complaint about hashability than they should be in the case of a dict (especially considering the name!) I'll grant that some users might be perfectly happy with O(log N) "reasonably fast" access, but others would not be pleased. From g.brandl at gmx.net Wed Sep 25 08:59:05 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 25 Sep 2013 08:59:05 +0200 Subject: [Python-ideas] +1 button/counter for bugs.python.org In-Reply-To: <20130923101451.691dfa7b@pitrou.net> References: <20130923001239.GN19939@ando> <20130923101451.691dfa7b@pitrou.net> Message-ID: Am 23.09.2013 10:14, schrieb Antoine Pitrou: > Le Mon, 23 Sep 2013 10:12:39 +1000, > Steven D'Aprano a > ?crit : >> On Sun, Sep 22, 2013 at 05:52:58PM +0200, Tshepang Lekhonkhobe wrote: >> > On Sun, Sep 22, 2013 at 1:21 PM, anatoly techtonik >> > wrote: >> > > Does anybody think it is a good idea to personally approve good >> > > issues and messages on bugs.python.org? >> > > >> > > If yes, should it be a Google's +1 (easier to add), or a pythonic >> > > solution for Roundup? >> > >> > Is it not enough that one can subscribe to the bug? It's very easy >> > (click the '+' button, then hit subscribe). That way, one can also >> > keep track of where the conversation is going, instead of a mere >> > vote-n-forget. >> >> Exactly. >> >> I think that masses of +1 votes from people who care so little about >> an issue that they can't be bothered to add themselves to the Nosy >> list is next to worthless. > > I don't know about you, but I don't add myself to the Nosy list of > every bug that irks me on third-party software. There's no reason to > subscribe to an issue's messages when you are a mere end-user. That > doesn't mean the bug isn't affecting you. I agree. 
We don't have to call it "vote"; a second-tier nosy list would probably be the most useful thing for both sides: a button [This affects me] meaning "please count me among those who would like to see it fixed and send me an email when the issue is closed" But I think that effort may be better spent on some of the existing 98 open issues in the meta-tracker here: http://psf.upfronthosting.co.za/ unless this feature is contributed. cheers, Georg From oscar.j.benjamin at gmail.com Wed Sep 25 12:06:58 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 25 Sep 2013 11:06:58 +0100 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: On 24 September 2013 19:41, Kim Gräsman wrote: > On Sun, Aug 25, 2013 at 8:26 PM, Kim Gräsman wrote: >> Ping? >> >> Can I clarify something to move this forward? It seems like a good >> idea to me, but I don't have the history of Py_DeleteFileW -- maybe >> somebody tried this already? > > Is there a better place to look for opinions? > > I'm happy to see Python getting more link-aware on Windows, and I > think this could help getting further in that direction. Since no one has responded to this for some time I would estimate that not many people particularly dislike your idea. So feel free to open an issue about it on the tracker (after checking that there isn't already an open issue and that your problem is not already solved in the most recent release): http://bugs.python.org/ On the other hand evidently not many people are very enthusiastic about this idea so it's possible that the tracker issue will not go anywhere unless you write the patch yourself. 
Oscar From kim.grasman at gmail.com Wed Sep 25 13:01:59 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Wed, 25 Sep 2013 13:01:59 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Hi Oscar, On Wed, Sep 25, 2013 at 12:06 PM, Oscar Benjamin wrote: > > Since no one has responded to this for some time I would estimate that > not many people particularly dislike your idea. So feel free to open > an issue about it on the tracker (after checking that there isn't > already an open issue and that your problem is not already solved in > the most recent release): > http://bugs.python.org/ > > On the other hand evidently not many people are very enthusiastic > about this idea so it's possible that the tracker issue will not go > anywhere unless you write the patch yourself. Thanks for responding! I opened an issue before posting here: http://bugs.python.org/issue18314 I'd be happy to provide a patch, but I only want to put time into it if there's a reasonable chance it gets committed. That's why I wanted to hear if there were any objections, so I don't end up writing, testing and posting a patch only to end up in quibbles around the general idea. I'm new to Python development; would a concrete patch help move this forward? Thanks, - Kim From oscar.j.benjamin at gmail.com Wed Sep 25 13:50:14 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 25 Sep 2013 12:50:14 +0100 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: On 25 September 2013 12:01, Kim Gr?sman wrote: > Hi Oscar, > > On Wed, Sep 25, 2013 at 12:06 PM, Oscar Benjamin > wrote: >> >> Since no one has responded to this for some time I would estimate that >> not many people particularly dislike your idea. 
So feel free to open >> an issue about it on the tracker (after checking that there isn't >> already an open issue and that your problem is not already solved in >> the most recent release): >> http://bugs.python.org/ >> >> On the other hand evidently not many people are very enthusiastic >> about this idea so it's possible that the tracker issue will not go >> anywhere unless you write the patch yourself. > > Thanks for responding! > > I opened an issue before posting here: http://bugs.python.org/issue18314 Sorry, I've just looked back over this thread and I see that now. > > I'd be happy to provide a patch, but I only want to put time into it > if there's a reasonable chance it gets committed. That's why I wanted > to hear if there were any objections, so I don't end up writing, > testing and posting a patch only to end up in quibbles around the > general idea. > > I'm new to Python development; would a concrete patch help move this forward? It doesn't look like anyone else will write a patch so I don't think much will happen if you don't either. I don't know anything about junction points though so I have no idea how likely it is that a patch would be accepted. Oscar From ncoghlan at gmail.com Wed Sep 25 14:04:50 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 25 Sep 2013 22:04:50 +1000 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: On 25 Sep 2013 21:51, "Oscar Benjamin" wrote: > > On 25 September 2013 12:01, Kim Gr?sman wrote: > > Hi Oscar, > > > > On Wed, Sep 25, 2013 at 12:06 PM, Oscar Benjamin > > wrote: > >> > >> Since no one has responded to this for some time I would estimate that > >> not many people particularly dislike your idea. 
So feel free to open > >> an issue about it on the tracker (after checking that there isn't > >> already an open issue and that your problem is not already solved in > >> the most recent release): > >> http://bugs.python.org/ > >> > >> On the other hand evidently not many people are very enthusiastic > >> about this idea so it's possible that the tracker issue will not go > >> anywhere unless you write the patch yourself. > > > > Thanks for responding! > > > > I opened an issue before posting here: http://bugs.python.org/issue18314 > > Sorry, I've just looked back over this thread and I see that now. > > > > > I'd be happy to provide a patch, but I only want to put time into it > > if there's a reasonable chance it gets committed. That's why I wanted > > to hear if there were any objections, so I don't end up writing, > > testing and posting a patch only to end up in quibbles around the > > general idea. > > > > I'm new to Python development; would a concrete patch help move this forward? > > It doesn't look like anyone else will write a patch so I don't think > much will happen if you don't either. I don't know anything about > junction points though so I have no idea how likely it is that a patch > would be accepted. My recollection is that permissions around junction points are a little weird at the Windows OS level (so the access denied might be genuine for a regular user account), but if a patch can make os.unlink handle them more like *nix symlinks, that sounds reasonable to me. Cheers, Nick. > > > Oscar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.us Wed Sep 25 15:47:18 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 25 Sep 2013 09:47:18 -0400 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: <1380116838.26707.26290665.7C90A335@webmail.messagingengine.com> On Tue, Sep 24, 2013, at 23:27, Andrew Barnert wrote: > Yes, every key must be comparable with every other key, and the > comparison must define a strict weak order, and the keys must be > comparison-immutable, and there's no way to test either of those > automatically. By comparison, a dict needs hashable keys, which can be > tested automatically, and equality-immutable and hash-immutable keys, > which can't really be tested but in practice hash is an acceptable test. I think of this as part of the hashable protocol, whereas we know that lists are orderable despite being mutable. > But it's no worse?than many other requirements in the stdlib that can't > be tested automatically. > > And yes, NaN is a problem, but it's exactly the same problem it is > everywhere else in Python. I was serious about wanting to know how dictionaries handle NaN as a key. Is it a special case? The obvious way of implementing it would conclude it is a hash collision but not a match. I notice that Decimal('NaN') and float nan don't match each other (as do any other float/Decimal with the same value) but they do both work as dictionary keys. 
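[The behaviour being asked about here is easy to check directly. A minimal sketch (plain Python 3; in CPython and PyPy, dict lookup tries identity before falling back to equality, which is what makes a NaN key usable at all):]

```python
nan = float('nan')

# NaN compares unequal to everything, including itself.
assert nan != nan

# But dict lookup checks identity before equality, so the very same
# NaN object can still be stored and retrieved as a key.
d = {nan: 2}
assert d[nan] == 2

# A *different* NaN object is neither identical nor equal to the
# stored key, so the lookup misses with KeyError.
try:
    d[float('nan')]
    hit = True
except KeyError:
    hit = False
assert hit is False
```

[This is the same identity-first behaviour behind Oscar's set example in the next message, where `{float('nan'), float('nan')}` keeps two elements but `{a, a}` keeps one.]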
From oscar.j.benjamin at gmail.com Wed Sep 25 15:53:43 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 25 Sep 2013 14:53:43 +0100 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <1380116838.26707.26290665.7C90A335@webmail.messagingengine.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1380116838.26707.26290665.7C90A335@webmail.messagingengine.com> Message-ID: On 25 September 2013 14:47, wrote: >> And yes, NaN is a problem, but it's exactly the same problem it is >> everywhere else in Python. > > I was serious about wanting to know how dictionaries handle NaN as a > key. Is it a special case? The obvious way of implementing it would > conclude it is a hash collision but not a match. I notice that > Decimal('NaN') and float nan don't match each other (as do any other > float/Decimal with the same value) but they do both work as dictionary > keys. They're effectively compared by identity: >>> {float('nan'), float('nan')} set([nan, nan]) >>> a = float('nan') >>> {a, a} set([nan]) Oscar From vernondcole at gmail.com Wed Sep 25 16:11:41 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Wed, 25 Sep 2013 15:11:41 +0100 Subject: [Python-ideas] Subject: Re: `OrderedDict.sort` Message-ID: > > > > And yes, NaN is a problem, but it's exactly the same problem it is > > everywhere else in Python. > > I was serious about wanting to know how dictionaries handle NaN as a > key. Is it a special case? The obvious way of implementing it would > conclude it is a hash collision but not a match. I notice that > Decimal('NaN') and float nan don't match each other (as do any other > float/Decimal with the same value) but they do both work as dictionary > keys. 
> NaN is, by definition, never equal to another NaN, which is why the following happens: >>> nan = float('NAN') >>> n2 = nan >>> n2 == nan False >>> n2 is nan True It turns out that many other things which I never thought about before can be dictionary keys... >>> d = {'a':1, nan: 2} >>> d[n2] 2 >>> d[NotImplemented] = 3 >>> d[...] = 4 >>> d[None] = 5 >>> d[True] = 6 >>> d[False] = 7 >>> d {'a': 1, nan: 2, False: 7, True: 6, NotImplemented: 3, Ellipsis: 4, None: 5} -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Sep 25 18:02:21 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 25 Sep 2013 09:02:21 -0700 Subject: [Python-ideas] Subject: Re: `OrderedDict.sort` In-Reply-To: References: Message-ID: On Sep 25, 2013, at 7:11, "Vernon D. Cole" wrote: >> >> > And yes, NaN is a problem, but it's exactly the same problem it is >> > everywhere else in Python. >> >> I was serious about wanting to know how dictionaries handle NaN as a >> key. Is it a special case? The obvious way of implementing it would >> conclude it is a hash collision but not a match. I notice that >> Decimal('NaN') and float nan don't match each other (as do any other >> float/Decimal with the same value) but they do both work as dictionary >> keys. > > NaN is, by definition, never equal to another NaN, which is why the following happens: > > >>> nan = float('NAN') > >>> n2 = nan > >>> n2 == nan > False > >>> n2 is nan > True > > It turns out that many other things which I never thought about before can be dictionary keys... > > >>> d = {'a':1, nan: 2} > >>> d[n2] > 2 > >>> d[NotImplemented] = 3 > >>> d[...] = 4 > >>> d[None] = 5 > >>> d[True] = 6 > >>> d[False] = 7 > >>> d > {'a': 1, nan: 2, False: 7, True: 6, NotImplemented: 3, Ellipsis: 4, None: 5} While some of these are odd things to use as keys, they don't have any odd behavior with equality except nan. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Wed Sep 25 17:59:20 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 25 Sep 2013 08:59:20 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <87k3i51q77.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> <87k3i51q77.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <09D64CCF-A19D-489D-BDF6-045D5BA16C8A@yahoo.com> On Sep 24, 2013, at 21:27, "Stephen J. Turnbull" wrote: > Andrew Barnert writes: > >> See http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html, >> which I wrote one or two iterations ago to collect all of the >> issues, and please let me know if I missed any or you have anything >> to add. > > A small nit: SortedSequence and SortedDicts should be mappings, > guaranteeing "fast" (preferably O(1)) access for any key (integral and > arbitrary, respectively). Therefore, in the case of a SortedDict the > user should be no more surprised at a complaint about hashability than > they should be in the case of a dict (especially considering the name!) > > I'll grant that some users might be perfectly happy with O(log N) > "reasonably fast" access, but others would not be pleased. O(log N) is fast enough for the standard mappings in C++, Java, etc., are python users more demanding of performance than C++? I don't know of any language that has a SortedAndHashedDict in it's stdlib, but there are many that have a SortedDict based on a tree. I don't know of any modules on PyPI that offer the former, but multiple popular modules offer the latter. 
Also, given a SortedSequence and a dict, you can trivially build a SortedAndHashedDict if you really want it for something; without SortedSequence, you can't. The other way around isn't true; if you want a SortedDict, without the time and space and requirements burden, a SortedAndHashedSequence is no help. If you think the name SortedDict is misleading, we could call it something different, with fewer implications. But, given that libraries like blist generally offer the type under a name like SortedDict, and in other languages that offer both tree-based and hash-based collections the names are always parallel (like map and unordered_map in C++), I don't think this is a problem. From abarnert at yahoo.com Wed Sep 25 18:35:14 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 25 Sep 2013 09:35:14 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <1380116838.26707.26290665.7C90A335@webmail.messagingengine.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1380116838.26707.26290665.7C90A335@webmail.messagingengine.com> Message-ID: On Sep 25, 2013, at 6:47, random832 at fastmail.us wrote: > On Tue, Sep 24, 2013, at 23:27, Andrew Barnert wrote: >> Yes, every key must be comparable with every other key, and the >> comparison must define a strict weak order, and the keys must be >> comparison-immutable, and there's no way to test either of those >> automatically. By comparison, a dict needs hashable keys, which can be >> tested automatically, and equality-immutable and hash-immutable keys, >> which can't really be tested but in practice hash is an acceptable test. 
> > I think of this as part of the hashable protocol, whereas we know that > lists are orderable despite being mutable. Please read the blog post rather than the one-line summary if you want to discuss the contents. >> But it's no worse than many other requirements in the stdlib that can't >> be tested automatically. >> >> And yes, NaN is a problem, but it's exactly the same problem it is >> everywhere else in Python. > > I was serious about wanting to know how dictionaries handle NaN as a > key. Is it a special case? The obvious way of implementing it would > conclude it is a hash collision but not a match. I believe that, at least in CPython and PyPy, a hash collision is a match if they're identical or equal, which is why NaN values work, and why float("nan") and Decimal("nan") aren't matches, and so on. But is there anything in the documentation that requires this, or is it just a side effect of implementation specifics? I don't know. > I notice that > Decimal('NaN') and float nan don't match each other (as do any other > float/Decimal with the same value) but they do both work as dictionary > keys. From ethan at stoneleaf.us Wed Sep 25 18:53:47 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 25 Sep 2013 09:53:47 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <09D64CCF-A19D-489D-BDF6-045D5BA16C8A@yahoo.com> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> <87k3i51q77.fsf@uwakimon.sk.tsukuba.ac.jp> <09D64CCF-A19D-489D-BDF6-045D5BA16C8A@yahoo.com> Message-ID: <5243151B.5030904@stoneleaf.us> On 09/25/2013 08:59 AM, Andrew Barnert wrote: > On Sep 24, 2013, at 21:27, "Stephen J. 
Turnbull" wrote: >> >> I'll grant that some users might be perfectly happy with O(log N) >> "reasonably fast" access, but others would not be pleased. > > O(log N) is fast enough for the standard mappings in C++, Java, etc., are python users more demanding of performance than C++? I admit I know next to nothing about C++ and Java, but in Python the dict is ubiquitous: modules have them, classes have them, nearly every user defined instance has them, they're passed into functions, they're used for dispatch tables, etc., etc.. So I suspect that Python is more demanding of its mapping than the others are. -- ~Ethan~ From abarnert at yahoo.com Wed Sep 25 21:29:51 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 25 Sep 2013 12:29:51 -0700 Subject: [Python-ideas] `OrderedDict.sort` In-Reply-To: <5243151B.5030904@stoneleaf.us> References: <20130924121315.GI7989@ando> <993eee00-84f4-4219-9637-797fd995055a@googlegroups.com> <52419EF5.1070305@stoneleaf.us> <5241B478.5060605@egenix.com> <35E8DF33-71BC-4831-8B3B-4A9695613A75@yahoo.com> <1380043495.6261.25918509.61E22EA3@webmail.messagingengine.com> <1380079656.50137.YahooMailNeo@web184704.mail.ne1.yahoo.com> <87k3i51q77.fsf@uwakimon.sk.tsukuba.ac.jp> <09D64CCF-A19D-489D-BDF6-045D5BA16C8A@yahoo.com> <5243151B.5030904@stoneleaf.us> Message-ID: <14468CAA-FD00-42BE-9C99-7D417409FAE7@yahoo.com> On Sep 25, 2013, at 9:53, Ethan Furman wrote: > On 09/25/2013 08:59 AM, Andrew Barnert wrote: >> On Sep 24, 2013, at 21:27, "Stephen J. Turnbull" wrote: >>> >>> I'll grant that some users might be perfectly happy with O(log N) >>> "reasonably fast" access, but others would not be pleased. >> >> O(log N) is fast enough for the standard mappings in C++, Java, etc., are python users more demanding of performance than C++? 
> > I admit I know next to nothing about C++ and Java, but in Python the dict is ubiquitous: modules have them, classes have them, nearly every user defined instance has them, they're passed into functions, they're used for dispatch tables, etc., etc.. > > So I suspect that Python is more demanding of its mapping than the others are. Nobody is suggesting replacing dict with a tree-based mapping, just adding one in the collections module for the use cases where it's what you want. From tjreedy at udel.edu Wed Sep 25 22:51:24 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 25 Sep 2013 16:51:24 -0400 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: On 9/25/2013 7:01 AM, Kim Gr?sman wrote: > I opened an issue before posting here: http://bugs.python.org/issue18314 I added as nosy the two Windows experts listed in http://docs.python.org/devguide/experts.html#experts I suspect that at least one of them knows enough about junction points to review a patch *were you to write one*. > I'd be happy to provide a patch, but I only want to put time into it > if there's a reasonable chance it gets committed. That's why I wanted > to hear if there were any objections, so I don't end up writing, > testing and posting a patch only to end up in quibbles around the > general idea. It is possible that one of the two might have an opinion to the contrary, but after reading the Wikipedia article, https://en.wikipedia.org/wiki/NTFS_junction_point It seems that you ought to be able to delete junction points from Python. That is no guarantee that any particular patch will be accepted. > I'm new to Python development; would a concrete patch help move this forward? Definitely. I added a note to the issue about testing and Windows versions. 
-- Terry Jan Reedy From kim.grasman at gmail.com Thu Sep 26 07:37:25 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Thu, 26 Sep 2013 07:37:25 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Hi Nick, On Wed, Sep 25, 2013 at 2:04 PM, Nick Coghlan wrote: > > My recollection is that permissions around junction points are a little > weird at the Windows OS level (so the access denied might be genuine for a > regular user account), but if a patch can make os.unlink handle them more > like *nix symlinks, that sounds reasonable to me. Thanks for the heads-up! I haven't observed any differences on XP or Windows 7. Now I'm stuck in an organization where they force all command prompts to be elevated, so it's been a while since I was able to test the more normal cases. - Kim From kim.grasman at gmail.com Thu Sep 26 07:38:49 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Thu, 26 Sep 2013 07:38:49 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Hi Oscar, On Wed, Sep 25, 2013 at 1:50 PM, Oscar Benjamin wrote: > >> I'm new to Python development; would a concrete patch help move this forward? > > It doesn't look like anyone else will write a patch so I don't think > much will happen if you don't either. I don't know anything about > junction points though so I have no idea how likely it is that a patch > would be accepted. I'll have to cook it up and see. I figured I'd run it by the community to see if anyone had considered it before, at least. Thanks! 
- Kim From kim.grasman at gmail.com Thu Sep 26 07:40:47 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Thu, 26 Sep 2013 07:40:47 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Hi Terry, On Wed, Sep 25, 2013 at 10:51 PM, Terry Reedy wrote: > >> I opened an issue before posting here: http://bugs.python.org/issue18314 > > > I added as nosy the two Windows experts listed in > http://docs.python.org/devguide/experts.html#experts > > I suspect that at least one of them knows enough about junction points to > review a patch *were you to write one*. OK, I'll get to it when I find time. Last time I looked at it, it seemed pretty trivial, but I need to get the development environment for Python up. > Definitely. I added a note to the issue about testing and Windows versions. Thanks for your help! - Kim From p.f.moore at gmail.com Thu Sep 26 09:23:20 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 26 Sep 2013 08:23:20 +0100 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: On 26 September 2013 06:37, Kim Gr?sman wrote: > I haven't observed any differences on XP or Windows 7. Now I'm stuck > in an organization where they force all command prompts to be > elevated, so it's been a while since I was able to test the more > normal cases. Er, does this not already work? >From an elevated Powershell prompt: PS 08:20 C:\Work\Scratch >new-symlink symps .\ps.vim Mode LastWriteTime Length Name ---- ------------- ------ ---- -a--- 26/09/2013 08:20 symps [C:\Work\Scratch\ps.vim] PS 08:20 C:\Work\Scratch >type symps set shell=powershell set shellcmdflag=-c set shellquote=\" set shellxquote= >From a non-elevated prompt: PS 08:20 C:\Work\Scratch >py Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> import os >>> os.unlink('symps') >>> ^Z PS 08:21 C:\Work\Scratch >type symps type : Cannot find path 'C:\Work\Scratch\symps' because it does not exist. At line:1 char:1 + type symps + ~~~~~~~~~~ + CategoryInfo : ObjectNotFound: (C:\Work\Scratch\symps:String) [Get-Content], ItemNotFoundException + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand Paul From random832 at fastmail.us Thu Sep 26 14:51:55 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 26 Sep 2013 08:51:55 -0400 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: <1380199915.23284.26724813.20340F46@webmail.messagingengine.com> On Thu, Sep 26, 2013, at 3:23, Paul Moore wrote: > On 26 September 2013 06:37, Kim Gr?sman wrote: > > I haven't observed any differences on XP or Windows 7. Now I'm stuck > > in an organization where they force all command prompts to be > > elevated, so it's been a while since I was able to test the more > > normal cases. > > Er, does this not already work? Symlinks and junction points are not actually the same thing. From p.f.moore at gmail.com Thu Sep 26 18:04:31 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 26 Sep 2013 17:04:31 +0100 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: <1380199915.23284.26724813.20340F46@webmail.messagingengine.com> References: <1380199915.23284.26724813.20340F46@webmail.messagingengine.com> Message-ID: On 26 September 2013 13:51, wrote: >> Er, does this not already work? > > Symlinks and junction points are not actually the same thing. Sorry, I misread an earlier comment. You're right. Sorry for the noise. Paul From davidhalter88 at gmail.com Thu Sep 26 20:11:35 2013 From: davidhalter88 at gmail.com (David Halter) Date: Thu, 26 Sep 2013 22:41:35 +0430 Subject: [Python-ideas] Should we improve `dir`? 
In-Reply-To: References: Message-ID: Sorry for answering so late, but I've stayed in a very rural area of Afghanistan and enjoyed my life :-) I also realized that this discussion has been removed from python-ideas, sorry! 2013/9/15 Nick Coghlan > On 15 September 2013 16:06, David Halter wrote: > > > > 2013/9/15 Nick Coghlan > >> > >> If introspection tools want to show all the operations available *on the > >> class*, then they need to include "dir(type(cls))" as well. So there > may be > >> a legitimate feature request for a new section in the pydoc output > showing > >> "class only" methods and attributes. > > > > > > How about adding a keyword argument to `dir`: ``dir(object, > > with_class_methods=False)``? > > > > I get that there are compatibility issues with changing the default `dir` > > functionality. But at the same time adding such an option could make it > > easier for beginners, why type attributes are not being listed (because > one > > could read that in the `dir` docstring). > > It's actually the metaclass methods/attributes that are missing. The > trick with dir is it *stops at the class*, and thus always leaves out > the metaclass. While in a important sense "classes are just objects", > attribute access is a critical area where they're *different* from > most other objects, because they play different roles in the > descriptor protocol. > > That means the question is whether it is worth adding an appropriate > flag to dir(), over updating introspection tools (like IDLE's tab > completion as Chris points out) to consider "dir(type(cls))" when > appropriate. > > "dir" currently works roughly as follows for instances: > > - check the instance > - check the class MRO > > And for classes: > > - check the class MRO > > If an "include_metaclass" flag is added, then setting it to True has > an obvious meaning for classes: > > - check the class MRO > - check the metaclass MRO > > But what does "include_metaclass=True" mean for instances? 
You can't > access metaclass attributes and methods from an instance - the > attribute lookup only traverses one step. So, it could be reasonable > to have "include_metaclass=True" do nothing for instances, and only > change dir() behaviour for classes. > Good point. I haven't thought about this, but an "include_metaclass" option for dir can also be quite confusing (in the case of instances). > On the other hand, if the flag was called "include_class", then it > would need to be tri-valued: > > None: use appropriate default based on the kind of object > True: default for instances, forces inclusion of the metaclass MRO for > classes > False: default for classes, forces omission of the class MRO (and > thus all descriptors) for instances > > Alternatively, if we don't change dir() at all and just document that > getting a complete list of attributes means doing "sorted(set(dir(obj) > + dir(type(obj))))", we'd have something that works for all versions of > Python, rather than something that was only available in 3.4+: > Yes, IMHO that's the least we should do. But I would strongly suggest to adjust the `dir` method docstrings (not only the online docs). I think that the current documentation really needs improvement (it is quite confusing now). >>> def full_dir(obj): > ... return sorted(set(dir(obj) + dir(type(obj)))) > ... > >>> len(set(full_dir(1)) - set(dir(1))) > 0 > >>> len(set(full_dir(int)) - set(dir(int))) > 19 > >>> len(set(dir(type)) - set(dir(object))) > 19 > > That's why my preference is for the latter approach - this isn't new > behaviour, and it's introspection tools that don't handle metaclasses > properly that need updating, rather than changing the dir() builtin. Well, I would still opt for changing dir, but I can understand that that would cause serious backwards-compatibility issues. If you would do that, it would be something for a Python 4 (and we're not even close to that). So for now that really leaves us with documenting it better. 
I don't really like the "include_metaclass" option. Maybe your "include_class" might make a little bit more sense. But even that one would complicate things. Cheers! Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Sep 27 12:07:18 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Sep 2013 13:07:18 +0300 Subject: [Python-ideas] pprint in displayhook Message-ID: What do you think about using pprint.pprint() to output the result of evaluating an expression entered in an interactive Python session (and in IDLE)? From solipsis at pitrou.net Fri Sep 27 12:15:18 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Sep 2013 12:15:18 +0200 Subject: [Python-ideas] pprint in displayhook References: Message-ID: <20130927121518.6a2d7bcb@pitrou.net> Le Fri, 27 Sep 2013 13:07:18 +0300, Serhiy Storchaka a écrit : > What do you think about using pprint.pprint() to output the result > of evaluating an expression entered in an interactive Python session > (and in IDLE)? I'm not sure I like this idea. AFAICT pprint() isn't bullet-proof, and it would be a pain to debug if it failed to display some objects properly. Regards Antoine. From storchaka at gmail.com Fri Sep 27 13:15:13 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Sep 2013 14:15:13 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20130927121518.6a2d7bcb@pitrou.net> References: <20130927121518.6a2d7bcb@pitrou.net> Message-ID: 27.09.13 13:15, Antoine Pitrou написав(ла): > Le Fri, 27 Sep 2013 13:07:18 +0300, > Serhiy Storchaka a > écrit : >> What do you think about using pprint.pprint() to output the result >> of evaluating an expression entered in an interactive Python session >> (and in IDLE)? > > I'm not sure I like this idea. AFAICT pprint() isn't bullet-proof, > and it would be a pain to debug if it failed to display some objects > properly.
We can set displayhook in site.py and for debug restore it from sys.__displayhook__. This is no more painful than using readline and enabling completion by default. From solipsis at pitrou.net Fri Sep 27 13:57:13 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Sep 2013 13:57:13 +0200 Subject: [Python-ideas] pprint in displayhook References: <20130927121518.6a2d7bcb@pitrou.net> Message-ID: <20130927135713.552f9e84@pitrou.net> Le Fri, 27 Sep 2013 14:15:13 +0300, Serhiy Storchaka a écrit : > 27.09.13 13:15, Antoine Pitrou написав(ла): > > Le Fri, 27 Sep 2013 13:07:18 +0300, > > Serhiy Storchaka a > > écrit : > >> What do you think about using pprint.pprint() to output the result > >> of evaluating an expression entered in an interactive Python > >> session (and in IDLE)? > > > > I'm not sure I like this idea. AFAICT pprint() isn't bullet-proof, > > and it would be a pain to debug if it failed to display some objects > > properly. > > We can set displayhook in site.py and for debug restore it from > sys.__displayhook__. This is no more painful than using readline and > enabling completion by default. :-) I don't know, I'll let other people experiment with it. Regards Antoine. From ncoghlan at gmail.com Fri Sep 27 14:40:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 Sep 2013 22:40:13 +1000 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20130927135713.552f9e84@pitrou.net> References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> Message-ID: On 27 Sep 2013 21:58, "Antoine Pitrou" wrote: > > Le Fri, 27 Sep 2013 14:15:13 +0300, > Serhiy Storchaka a > écrit : > > 27.09.13 13:15, Antoine Pitrou написав(ла): > > > Le Fri, 27 Sep 2013 13:07:18 +0300, > > > Serhiy Storchaka a > > > écrit : > > >> What do you think about using pprint.pprint() to output the result > > >> of evaluating an expression entered in an interactive Python > > >> session (and in IDLE)? > > > > > > I'm not sure I like this idea.
AFAICT pprint() isn't bullet-proof, > > > and it would be a pain to debug if it failed to display some objects > > > properly. > > > > We can set displayhook in site.py and for debug restore it from > > sys.__displayhook__. This is no more painful than using readline and > > enabling completion by default. > > :-) I don't know, I'll let other people experiment with it. displayhook uses repr by default. Even normal print would make numbers and numeric string output ambiguous. Cheers, Nick. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Fri Sep 27 14:47:42 2013 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 27 Sep 2013 08:47:42 -0400 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> Message-ID: <52457E6E.9040809@trueblade.com> On 9/27/2013 8:40 AM, Nick Coghlan wrote: > > On 27 Sep 2013 21:58, "Antoine Pitrou" > wrote: >> >> Le Fri, 27 Sep 2013 14:15:13 +0300, >> Serhiy Storchaka > a >> écrit : >> > 27.09.13 13:15, Antoine Pitrou написав(ла): >> > > Le Fri, 27 Sep 2013 13:07:18 +0300, >> > > Serhiy Storchaka > a >> > > écrit : >> > >> What do you think about using pprint.pprint() to output the result >> > >> of evaluating an expression entered in an interactive Python >> > >> session (and in IDLE)? >> > > >> > > I'm not sure I like this idea. AFAICT pprint() isn't bullet-proof, >> > > and it would be a pain to debug if it failed to display some objects >> > > properly. >> > >> > We can set displayhook in site.py and for debug restore it from >> > sys.__displayhook__. This is no more painful than using readline and >> > enabling completion by default. >> >> :-) I don't know, I'll let other people experiment with it.
> > displayhook uses repr by default. Even normal print would make numbers > and numeric string output ambiguous. Wouldn't this also invalidate the millions (I'm guessing) of examples in blogs, how-tos, etc. that show interactive command line examples? I'm sympathetic, but I don't think it's worth it. -- Eric. From solipsis at pitrou.net Fri Sep 27 15:05:50 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Sep 2013 15:05:50 +0200 Subject: [Python-ideas] pprint in displayhook References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> <52457E6E.9040809@trueblade.com> Message-ID: <20130927150550.2a60cf96@pitrou.net> Le Fri, 27 Sep 2013 08:47:42 -0400, "Eric V. Smith" a écrit : > On 9/27/2013 8:40 AM, Nick Coghlan wrote: > > > > On 27 Sep 2013 21:58, "Antoine Pitrou" > > wrote: > >> > >> Le Fri, 27 Sep 2013 14:15:13 +0300, > >> Serhiy Storchaka >> > a écrit : > >> > 27.09.13 13:15, Antoine Pitrou написав(ла): > >> > > Le Fri, 27 Sep 2013 13:07:18 +0300, > >> > > Serhiy Storchaka >> > > > a écrit : > >> > >> What do you think about using pprint.pprint() to output the > >> > >> result of evaluating an expression entered in an interactive > >> > >> Python session (and in IDLE)? > >> > > > >> > > I'm not sure I like this idea. AFAICT pprint() isn't > >> > > bullet-proof, and it would be a pain to debug if it failed to > >> > > display some objects properly. > >> > > >> > We can set displayhook in site.py and for debug restore it from > >> > sys.__displayhook__. This is no more painful than using readline > >> > and enabling completion by default. > >> > >> :-) I don't know, I'll let other people experiment with it. > > > > displayhook uses repr by default. Even normal print would make > > numbers and numeric string output ambiguous. > > Wouldn't this also invalidate the millions (I'm guessing) of examples > in blogs, how-tos, etc. that show interactive command line examples? > I'm sympathetic, but I don't think it's worth it.
Oh and how about... doctest? :-) From storchaka at gmail.com Fri Sep 27 15:52:09 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Sep 2013 16:52:09 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20130927135713.552f9e84@pitrou.net> References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> Message-ID: 27.09.13 14:57, Antoine Pitrou написав(ла): > Le Fri, 27 Sep 2013 14:15:13 +0300, > Serhiy Storchaka a > écrit : >> 27.09.13 13:15, Antoine Pitrou написав(ла): >>> Le Fri, 27 Sep 2013 13:07:18 +0300, >>> Serhiy Storchaka a >>> écrit : >>>> What do you think about using pprint.pprint() to output the result >>>> of evaluating an expression entered in an interactive Python >>>> session (and in IDLE)? >>> >>> I'm not sure I like this idea. AFAICT pprint() isn't bullet-proof, >>> and it would be a pain to debug if it failed to display some objects >>> properly. >> >> We can set displayhook in site.py and for debug restore it from >> sys.__displayhook__. This is no more painful than using readline and >> enabling completion by default. > > :-) I don't know, I'll let other people experiment with it. http://bugs.python.org/issue19103 Of course we should first resolve some other pprint-related issues (i.e. #19100, #19104). From storchaka at gmail.com Fri Sep 27 15:55:12 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Sep 2013 16:55:12 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20130927150550.2a60cf96@pitrou.net> References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> <52457E6E.9040809@trueblade.com> <20130927150550.2a60cf96@pitrou.net> Message-ID: 27.09.13 16:05, Antoine Pitrou написав(ла): >> Wouldn't this also invalidate the millions (I'm guessing) of examples >> in blogs, how-tos, etc. that show interactive command line examples? >> I'm sympathetic, but I don't think it's worth it. > > Oh and how about... doctest?
:-) Doctest restores sys.__displayhook__. From solipsis at pitrou.net Fri Sep 27 15:59:51 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Sep 2013 15:59:51 +0200 Subject: [Python-ideas] pprint in displayhook References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> <52457E6E.9040809@trueblade.com> <20130927150550.2a60cf96@pitrou.net> Message-ID: <20130927155951.63623b10@pitrou.net> Le Fri, 27 Sep 2013 16:55:12 +0300, Serhiy Storchaka a écrit : > 27.09.13 16:05, Antoine Pitrou написав(ла): > >> Wouldn't this also invalidate the millions (I'm guessing) of > >> examples in blogs, how-tos, etc. that show interactive command > >> line examples? I'm sympathetic, but I don't think it's worth it. > > > > Oh and how about... doctest? :-) > > Doctest restores sys.__displayhook__. I'm thinking more about the consistency of doctest output with actual interpreter output. One of the selling points of doctest is that it helps showcase API behaviour by showing interactive interpreter snippets. If the actual output starts being different, it might confuse people. Regards Antoine. From storchaka at gmail.com Fri Sep 27 15:57:29 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Sep 2013 16:57:29 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <52457E6E.9040809@trueblade.com> References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> <52457E6E.9040809@trueblade.com> Message-ID: 27.09.13 15:47, Eric V. Smith написав(ла): > Wouldn't this also invalidate the millions (I'm guessing) of examples in > blogs, how-tos, etc. that show interactive command line examples? I'm > sympathetic, but I don't think it's worth it. Only if they are too long.
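Nick's point a few messages up — that the default displayhook uses repr(), and plain print would make numbers and numeric strings ambiguous — is easy to demonstrate:

```python
# str() collapses the number 42 and the string "42" to the same text,
# which is what print() would show at the interactive prompt:
assert str(42) == str("42") == "42"

# repr() preserves the quotes, so the default displayhook stays unambiguous:
assert repr(42) == "42"
assert repr("42") == "'42'"
```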
From storchaka at gmail.com Fri Sep 27 16:19:55 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Sep 2013 17:19:55 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20130927155951.63623b10@pitrou.net> References: <20130927121518.6a2d7bcb@pitrou.net> <20130927135713.552f9e84@pitrou.net> <52457E6E.9040809@trueblade.com> <20130927150550.2a60cf96@pitrou.net> <20130927155951.63623b10@pitrou.net> Message-ID: 27.09.13 16:59, Antoine Pitrou написав(ла): > Le Fri, 27 Sep 2013 16:55:12 +0300, > Serhiy Storchaka a > écrit : >> 27.09.13 16:05, Antoine Pitrou написав(ла): >>>> Wouldn't this also invalidate the millions (I'm guessing) of >>>> examples in blogs, how-tos, etc. that show interactive command >>>> line examples? I'm sympathetic, but I don't think it's worth it. >>> >>> Oh and how about... doctest? :-) >> >> Doctest restores sys.__displayhook__. > > I'm thinking more about the consistency of doctest output with actual > interpreter output. One of the selling points of doctest is that it > helps showcase API behaviour by showing interactive interpreter > snippets. If the actual output starts being different, it might confuse > people. Most doctests are significantly shorter than 80 columns. Actually a doctest can't be longer without violating PEP 8. From g.brandl at gmx.net Fri Sep 27 18:55:14 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 27 Sep 2013 18:55:14 +0200 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: Message-ID: Am 27.09.2013 12:07, schrieb Serhiy Storchaka: > What do you think about using pprint.pprint() to output the result of > evaluating an expression entered in an interactive Python session (and > in IDLE)? > This is something users can set in their sitecustomize.py; for various reasons people have already mentioned it is not a sensible choice for default interactive interpreters.
It might be different for IDLE; I don't know how faithfully it follows the interactive interpreter in other regards. Georg From bruce at leapyear.org Fri Sep 27 19:02:43 2013 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 27 Sep 2013 10:02:43 -0700 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: Message-ID: It's a great idea to be able to do this. Fortunately, you already can. Changing the Python default is a terrible idea for all the other reasons people mentioned, not least of which is the fact that pprint doesn't work for all inputs. I suggest writing a recipe that provides a pprint-ish replacement for repr. I suggest producing results that are identical to repr with only two exceptions: (1) adding whitespace, (2) adding line breaks to strings. While I would never add this by default, there are certainly times where I would prefer prettier output. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Sep 27 21:25:15 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 27 Sep 2013 15:25:15 -0400 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: Message-ID: On 9/27/2013 12:55 PM, Georg Brandl wrote: > Am 27.09.2013 12:07, schrieb Serhiy Storchaka: >> What are you think about using pprint.pprint() to output the result of >> evaluating an expression entered in an interactive Python session (and >> in IDLE)? >> > > This is something users can set in their sitecustomize.py; for various > reasons people have already mentioned it is not a sensible choice for > default interactive interpreters. I agree. The default interpreter cannot be configured on the fly. > It might be different for IDLE; It has a menu for both one-time actions and changing defaults. I think a menu item and hot-key to re-display the last output object with pprint would be a nice little feature. 
The object remains bound to '_' in the user process, so executing "pprint.pprint(_)" should be possible. (The minor problem is that even if pprint is loaded in sys.modules on startup, it might not be in the user global namespace.) Without seeing a bug-fixed pprint in action, I could not be sure about turning on pprint for all output. It is not needed often. The sitecustomize option would work for a permanent change. > I don't know how faithfully it follows > the interactive interpreter in other regards. We try to have it act the same except where there is a good reason not to. -- Terry Jan Reedy From steve at pearwood.info Sat Sep 28 02:16:13 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Sep 2013 10:16:13 +1000 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: Message-ID: <20130928001613.GP7989@ando> On Fri, Sep 27, 2013 at 01:07:18PM +0300, Serhiy Storchaka wrote: > What do you think about using pprint.pprint() to output the result of > evaluating an expression entered in an interactive Python session (and > in IDLE)? Well, let's try it and see... py> L = list(range(50)) py> print(L) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] py> pprint.pprint(L) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] That would be an Absolutely Not. However, if somebody wanted to give pprint some attention to make it actually pretty print, that would be very welcome.
-- Steven From raymond.hettinger at gmail.com Sat Sep 28 06:17:14 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 27 Sep 2013 21:17:14 -0700 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: Message-ID: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> On Sep 27, 2013, at 3:07 AM, Serhiy Storchaka wrote: > What are you think about using pprint.pprint() to output the result of evaluating an expression entered in an interactive Python session (and in IDLE)? This might be a reasonable idea if pprint were in better shape. I think substantial work needs to be done on it, before it would be worthy of becoming the default method of display. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sat Sep 28 06:44:46 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 28 Sep 2013 07:44:46 +0300 Subject: [Python-ideas] Python 3.4 should include docopt as-is Message-ID: This - http://docopt.org/ - should be included with Python 3.4 distribution. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sat Sep 28 07:19:44 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 28 Sep 2013 08:19:44 +0300 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames Message-ID: FILE = os.path.abspath(__file__) DIR = os.path.abspath(os.path.dirname(__file__)) ? Repeated pattern for referencing resources relative to your scripts. Ideas about alternative names / locations are welcome. In PHP these are __FILE__ and __DIR__. For Python 3 adding __dir__ is impossible, because the name clashes with __dir__ method (which is not implemented for module object, but should be [ ] for consistency). Also current __file__ is rarely absolute path, because it is never normalized [ ]. 
So it will be nice to see normalization of Python file name after the import to reduce mess and make its behaviour predictable - http://stackoverflow.com/questions/7116889/python-file-attribute-absolute-or-relative ----[ possible spec. draft for a beautiful internal structure ]-- The Python interpreter should provide run-time information about: 1. order of import sequence 2. names of imported modules 3. unique location for each imported module which unambiguously identifies it 4. run-time import dependency tree (not sure about this, but it can help with debugging) 5. information about sys.path entry where this module was imported from 6. information about who and when added this sys.path entry -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Sep 28 09:24:43 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 28 Sep 2013 03:24:43 -0400 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20130928001613.GP7989@ando> References: <20130928001613.GP7989@ando> Message-ID: On 9/27/2013 8:16 PM, Steven D'Aprano wrote: > On Fri, Sep 27, 2013 at 01:07:18PM +0300, Serhiy Storchaka wrote: >> What are you think about using pprint.pprint() to output the result of >> evaluating an expression entered in an interactive Python session (and >> in IDLE)? > > Well, let's try it and see... > > > py> L = list(range(50)) > py> print(L) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, > 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] > py> pprint.pprint(L) > [0, > 1, > 2, > 3, > 4, > 5, > 6, > 7, > 8, > 9, > 10, > 11, > 12, > 13, > 14, > 15, > 16, > 17, > 18, > 19, > 20, > 21, > 22, > 23, > 24, > 25, > 26, > 27, > 28, > 29, > 30, > 31, > 32, > 33, > 34, > 35, > 36, > 37, > 38, > 39, > 40, > 41, > 42, > 43, > 44, > 45, > 46, > 47, > 48, > 49] > > > That would be an Absolutely Not. 
This is why I suggested that I would consider making it available in Idle on a per-object basis, for things like this >>> L = ['This is the first sentence.', 'This is the second, lets make it longer', 'and this is the third, but do not stop yet'] >>> L ['This is the first sentence.', 'This is the second, lets make it longer', 'and this is the third, but do not stop yet'] >>> import pprint >>> pprint.pprint(L) ['This is the first sentence.', 'This is the second, lets make it longer', 'and this is the third, but do not stop yet'] But your example is more typical of my usage. > However, if somebody wanted to give pprint some attention to make it > actually pretty print, that would be very welcome. -- Terry Jan Reedy From steve at pearwood.info Sat Sep 28 10:10:09 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Sep 2013 18:10:09 +1000 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: References: Message-ID: <20130928081009.GU7989@ando> On Sat, Sep 28, 2013 at 07:44:46AM +0300, anatoly techtonik wrote: > This - http://docopt.org/ - should be included with Python 3.4 distribution. Are you the developer or maintainer of docopt? If so, you'll probably need to write a PEP. Otherwise, you'll need to ask the maintainer of docopt to write a PEP. Some questions that will need to be asked: - does the maintainer agree to distribute the software under the same licence as Python? - does the maintainer agree to stick to Python's release schedule? - is the maintainer happy with keeping the API frozen for the next ten or fifteen years? I see that docopt is now up to version 0.6.1. To me, that indicates that the API should not be considered stable, it's under version 1. Perhaps the maintainer disagrees, and would be happy to freeze the API now.
-- Steven From kwpolska at gmail.com Sat Sep 28 11:02:52 2013 From: kwpolska at gmail.com (Chris “Kwpolska” Warrick) Date: Sat, 28 Sep 2013 11:02:52 +0200 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: <20130928081009.GU7989@ando> References: <20130928081009.GU7989@ando> Message-ID: On Sat, Sep 28, 2013 at 10:10 AM, Steven D'Aprano wrote: > On Sat, Sep 28, 2013 at 07:44:46AM +0300, anatoly techtonik wrote: >> This - http://docopt.org/ - should be included with Python 3.4 distribution. > > Are you the developer or maintainer of docopt? He is not. I CC'd the developer, Vladimir Keleshev. > If so, you'll probably need to write a PEP. Otherwise, you'll need to > ask the maintainer of docopt to write a PEP. Some questions that will > need to be asked: > > - does the maintainer agree to distribute the software under the same > licence as Python? > > - does the maintainer agree to stick to Python's release schedule? > > - is the maintainer happy with keeping the API frozen for the next ten > or fifteen years? > > I see that docopt is now up to version 0.6.1. To me, that indicates that > the API should not be considered stable, it's under version 1. Perhaps > the maintainer disagrees, and would be happy to freeze the API now. > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -- Chris “Kwpolska” Warrick PGP: 5EAAEA16 stop html mail | always bottom-post | only UTF-8 makes sense From breamoreboy at yahoo.co.uk Sat Sep 28 11:22:52 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 28 Sep 2013 10:22:52 +0100 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: References: Message-ID: On 28/09/2013 05:44, anatoly techtonik wrote: > This - http://docopt.org/ - should be included with Python 3.4 distribution. > -- > anatoly t.
> Have you had the courtesy to ask the maintainer of this library their opinions prior to placing this? -- Cheers. Mark Lawrence From techtonik at gmail.com Sat Sep 28 11:59:03 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 28 Sep 2013 12:59:03 +0300 Subject: [Python-ideas] AST Hash In-Reply-To: References: Message-ID: On Wed, Sep 11, 2013 at 8:05 PM, Amaury Forgeot d'Arc wrote: > 2013/9/11 anatoly techtonik >> >> Hi, >> >> We need a checksum for code pieces. The goal of the checksum is to >> reliably detect pieces of code with absolutely identical behaviour. >> Borders of such checksums can be functions, classes, modules. > > This looks like a nice project; I think this should first take the form of > an external package. > I'm sure there are many details to iron out before this kind of technique can be > widely adopted. Yes, it is just an idea. > For example: > - Is there only one kind of hash? you suggested to erase the differences in > variable names, are there other possible customizations? Yes. There are different kinds of hashes depending on purpose, that's why I explicitly mentioned that AST hashes are named. Every name corresponds to a single purpose and a single set of filtering rules. I can see at least these possible customizations: -- 1 comments, docstrings and whitespace handling -- 1. preserve all whitespace including comments 2. preserve comments 3. standard: erase comments, preserve docstrings 4. erase comments in addition to docstrings -- 2 variable names handling -- 1. preserve all 2. preserve external 3. preserve stdlib names (stdlib needs to be described to detect a namespace is from stdlib) 4. preserve thirdparty module names 5. preserve classes, rename variables 6. rename everything (abstract pattern matching) Are stdlib detection ideas welcome? > - To detect common patterns, is it interesting to hash and index all the > nodes of an AST tree?
I am not sure, I need these hashes for sharing and detecting updates to code snippets contained in various .py files across various Python projects. I like to think that snippets are constrained on function or class boundary, or else the management is rather tiresome. > - Is there a central repository to store hashes of recipes? Is Google Search > enough? Google search indexes hashes of each revision for Mercurial repositories. Sure it can do this too. Maintaining and downloading files and snippets by hash from PyPI would be interesting. It seems that most cloud storage solutions use hashes for storage, so implementing this should be even easier than installing PyPI mirror. > I don't need answers, only a reference implementation that people can > discuss! Reference implementation will take some time for sure. It may never be done even, because things like https://bitbucket.org/techtonik/python-stdlib/ have higher priority and don't have sponsors. -- anatoly t. From techtonik at gmail.com Sat Sep 28 12:30:35 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 28 Sep 2013 13:30:35 +0300 Subject: [Python-ideas] AST Hash In-Reply-To: <5230D853.6090108@egenix.com> References: <5230D853.6090108@egenix.com> Message-ID: On Wed, Sep 11, 2013 at 11:53 PM, M.-A. Lemburg wrote: > On 11.09.2013 18:05, anatoly techtonik wrote: >> Hi, >> >> We need a checksum for code pieces. The goal of the checksum is to >> reliably detect pieces of code with absolutely identical behaviour. >> Borders of such checksum can be functions, classes, modules,. 
>> Practical application for such checksums are: >> >> - detecting usage of recipes and examples across PyPI packages >> - detecting usage of standard stdlib calls >> - creating execution safe serialization formats for data >> - choosing class to deserialize data fields of the object based on its hash >> - enable consistent validation and testing of results across various AST tools >> >> There can be two approaches to build such checksum: >> 1. Code Section Hash >> 2. AST Hash >> >> Code Section Hash is built from a substring of a source code, cut on >> function or class boundaries. This hash is flaky - whitespace and >> comment differences ruin it, even when behaviour (and bytecode) stays >> the same. It is possible to reduce the effect of whitespace and >> comment changes by normalizing the substring - dedenting, reindenting >> with 4 spaces, stripping empty lines, comments and trailing >> whitespace. And it still will be unreliable and affected by whitespace >> changes in the middle of the string. Therefore a 2nd way of hashing is >> more preferable. >> >> AST Hash is build on AST. This excludes any comments, whitespace etc. >> and makes the hash strict and reliable. This is a canonical Default >> AST Hash. >> >> There are cases when Default AST Hash may not be enough for >> comparison. For example, if local variables are renamed, or docstrings >> changed, the behaviour of a function may not change, but its AST hash >> will. In these cases additional normalization rules apply. Such as >> changing all local variable names to var1, var2, ... in order of >> appearance, stripping docstrings etc. Every set of such normalization >> rules should have a name. This will also be the name of resulting >> custom AST Hash. >> >> Explicit naming of AST Hashes and hardlinking of names to rules that >> are used to build them will settle common ground (base) for AST tools >> interoperability and research papers. As such, it most likely require >> a separate PEP. 
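A minimal prototype of the "Default AST Hash" quoted above is possible with only the stdlib: ast.dump() omits comments, whitespace and, by default, line/column attributes, so formatting changes do not affect the digest. This is only a sketch of the idea, not the proposed named-rule-set machinery:

```python
import ast
import hashlib

def ast_hash(source: str) -> str:
    """Hex digest of the canonical AST dump of *source*."""
    tree = ast.parse(source)
    # ast.dump() excludes lineno/col_offset by default, so layout is ignored.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()

same_a = ast_hash("def f(x):\n    return x + 1\n")
same_b = ast_hash("def f(x):  # add one\n    return x + 1\n")  # comment only
other = ast_hash("def f(x):\n    return x + 2\n")              # behaviour differs

assert same_a == same_b
assert same_a != other
```

Custom hashes (renaming local variables, stripping docstrings) would need an ast.NodeTransformer normalization pass before dumping; pinning down those passes is exactly what the named rule sets in the proposal would standardize.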
> You might want to have a look at this paper which discussed > AST compression (for Java, but the ideas apply to Python just > as well): > > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.135.5917&rep=rep1&type=pdf > > If you compress the AST into a string and take its hash, > you should pretty much have what you want. Thanks for the link. The idea to transform the AST to a string is obvious - I don't know any other way to build a hash than to feed some kind of binary string to a function. But those guys addressed a different problem - bytecode is harder to compress than AST, and I agree, because it is easier to analyse common patterns in AST and tune compression algorithms accordingly. Structure and dependency in binary data matter, and binary compression algorithms are usually dumb. Google improved bsdiff compression a lot for executables by making it aware of binary structure. "..compressed AST provide ..., platform independence" - I thought that Java byte code was platform independent. If I could write a paper for every idea that I have (or at least draw a diagram), I could be a president of an academy of sciences already. =) Anyway, an interesting read. Unfortunately, not much time for that. I am not sure their implementation can be adopted as a prototype for a hash implementation; it seems that a simple tree walker should do the job. -- anatoly t. From ned at nedbatchelder.com Sat Sep 28 12:42:42 2013 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sat, 28 Sep 2013 06:42:42 -0400 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: References: Message-ID: <5246B2A2.4080508@nedbatchelder.com> On 9/28/13 12:44 AM, anatoly techtonik wrote: > This - http://docopt.org/ - should be included with Python 3.4 > distribution. In addition to the other questions already asked, you haven't answered the fundamental one: Why should docopt be included in the stdlib? It's right there on PyPI where anyone can get it. Why is it better in the stdlib than on PyPI?
--Ned. > -- > anatoly t. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Sat Sep 28 13:34:59 2013 From: flying-sheep at web.de (Philipp A.) Date: Sat, 28 Sep 2013 13:34:59 +0200 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: Message-ID: as much as i would like the convenience, python has very few magic globals, and they all have names encased in 4 underscores. if we really add more globals, why not __abs_file__ and __abs_dir__ or sth. like that? 2013/9/28 anatoly techtonik > FILE = os.path.abspath(__file__) > DIR = os.path.abspath(os.path.dirname(__file__)) > ? > > Repeated pattern for referencing resources relative to your scripts. Ideas > about alternative names / locations are welcome. > > In PHP these are __FILE__ and __DIR__. For Python 3 adding __dir__ is > impossible, because the name clashes with __dir__ method (which is not > implemented for module object, but should be [ ] for consistency). Also > current __file__ is rarely absolute path, because it is never normalized [ > ]. > > So it will be nice to see normalization of Python file name after the > import to reduce mess and make its behaviour predictable - > http://stackoverflow.com/questions/7116889/python-file-attribute-absolute-or-relative > > > ----[ possible spec. draft for a beautiful internal structure ]-- > The Python interpreter should provide run-time information about: > 1. order of import sequence > 2. names of imported modules > 3. unique location for each imported module which unambiguously identifies > it > 4. run-time import dependency tree (not sure about this, but it can help > with debugging) > 5. information about sys.path entry where this module was imported from > 6. 
information about who and when added this sys.path entry > -- > anatoly t. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.rodola at gmail.com Sat Sep 28 13:48:02 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sat, 28 Sep 2013 13:48:02 +0200 Subject: [Python-ideas] PyPi per-file download counters Message-ID: I recently moved psutil .tar.gz and .exe files from Google Code to PyPi and noticed it doesn't show total per-file download counters: https://pypi.python.org/pypi?:action=display&name=psutil#downloads Why don't we add them? Thoughts? --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram.rachum at gmail.com Sat Sep 28 10:53:37 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Sat, 28 Sep 2013 01:53:37 -0700 (PDT) Subject: [Python-ideas] String formatting Message-ID: Any reason why string formatting using % doesn't work when the list of arguments is in a list rather than a tuple? -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.rodola at gmail.com Sat Sep 28 13:57:00 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sat, 28 Sep 2013 13:57:00 +0200 Subject: [Python-ideas] PyPi per-file download counters In-Reply-To: References: Message-ID: ...also, it seems the current counters are broken. I uploaded those files this morning and the page says there were over 5000 downloads in the last month. 
--- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ On Sat, Sep 28, 2013 at 1:48 PM, Giampaolo Rodola' wrote: > I recently moved psutil .tar.gz and .exe files from Google Code to PyPi > and noticed it doesn't show total per-file download counters: > https://pypi.python.org/pypi?:action=display&name=psutil#downloads > Why don't we add them? Thoughts? > > --- Giampaolo > http://code.google.com/p/pyftpdlib/ > http://code.google.com/p/psutil/ > http://code.google.com/p/pysendfile/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 28 14:32:46 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Sep 2013 22:32:46 +1000 Subject: [Python-ideas] String formatting In-Reply-To: References: Message-ID: <20130928123246.GW7989@ando> On Sat, Sep 28, 2013 at 01:53:37AM -0700, Ram Rachum wrote: > Any reason why string formatting using % doesn't work when the list of > arguments is in a list rather than a tuple? Because it's not supposed to. It is part of the design of % that arbitrary objects, including lists, require only a single % target: py> L = list(range(8)) py> "Values: %s" % L 'Values: [0, 1, 2, 3, 4, 5, 6, 7]' The single deliberate exception to that is tuples. This is unavoidable, since there's otherwise no way for a binary operator like % to take arbitrary numbers of arguments. Besides, even if this were a good idea, 20+ years of code that expects lists to be treated as a single object for the purposes of % formatting says we can't change it now. -- Steven From python at mrabarnett.plus.com Sat Sep 28 18:51:29 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 28 Sep 2013 17:51:29 +0100 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: Message-ID: <52470911.5020809@mrabarnett.plus.com> On 28/09/2013 12:34, Philipp A.
wrote: > as much as i would like the convenience, python has very few magic > globals, and they all have names encased in 4 underscores. > > if we really add more globals, why not __abs_file__ and __abs_dir__ or > sth. like that? > +1 1. Do we need them? 2. If we do, then I agree with __abs_file__ and __abs_dir__. > > 2013/9/28 anatoly techtonik > > > FILE = os.path.abspath(__file__) > DIR = os.path.abspath(os.path.dirname(__file__)) > ? > > Repeated pattern for referencing resources relative to your scripts. > Ideas about alternative names / locations are welcome. > > In PHP these are __FILE__ and __DIR__. For Python 3 adding __dir__ > is impossible, because the name clashes with __dir__ method (which > is not implemented for module object, but should be [ ] for > consistency). Also current __file__ is rarely absolute path, because > it is never normalized [ ]. > > So it will be nice to see normalization of Python file name after > the import to reduce mess and make its behaviour predictable - > http://stackoverflow.com/questions/7116889/python-file-attribute-absolute-or-relative > > > ----[ possible spec. draft for a beautiful internal structure ]-- > The Python interpreter should provide run-time information about: > 1. order of import sequence > 2. names of imported modules > 3. unique location for each imported module which unambiguously > identifies it > 4. run-time import dependency tree (not sure about this, but it can > help with debugging) > 5. information about sys.path entry where this module was imported from > 6. 
information about who and when added this sys.path entry > From tjreedy at udel.edu Sat Sep 28 22:28:12 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 28 Sep 2013 16:28:12 -0400 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: <5246B2A2.4080508@nedbatchelder.com> References: <5246B2A2.4080508@nedbatchelder.com> Message-ID: On 9/28/2013 6:42 AM, Ned Batchelder wrote: > On 9/28/13 12:44 AM, anatoly techtonik wrote: >> This - http://docopt.org/ - should be included with Python 3.4 >> distribution. > > In addition to the other questions already asked, you haven't answered > the fundamental one: Why should docopt be included in the stdlib? It's > right there in PyPI where anyone can get it. Why is it better in the > stdlib than in PyPI? The stdlib has mostly switched from using optparse to argparse. The next question is what relation docopt has to either? What is its backend? Anyway, it strikes me as a wrapper module best kept as third party, similar to re and urllib wrappers. -- Terry Jan Reedy From tjreedy at udel.edu Sat Sep 28 22:34:03 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 28 Sep 2013 16:34:03 -0400 Subject: [Python-ideas] String formatting In-Reply-To: References: Message-ID: On 9/28/2013 4:53 AM, Ram Rachum wrote: > Any reason why string formatting using % doesn't work when the list of > arguments is in a list rather than a tuple? Because that would double the troublesome anomaly of a tuple being treated as a sequence of objects and not just an object itself. >>> 'object %s' % [1,2] 'object [1, 2]' >>> 'object %s' % (1,2) Traceback (most recent call last): File "", line 1, in 'object %s' % (1,2) TypeError: not all arguments converted during string formatting One of the reasons for .format() is to eliminate that anomaly.
>>> 'object {}'.format([1,2]) 'object [1, 2]' >>> 'object {}'.format((1,2)) 'object (1, 2)' -- Terry Jan Reedy From cs at zip.com.au Sun Sep 29 01:26:41 2013 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 29 Sep 2013 09:26:41 +1000 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: Message-ID: <20130928232641.GA15985@cskk.homeip.net> On 28Sep2013 08:19, anatoly techtonik wrote: | FILE = os.path.abspath(__file__) | DIR = os.path.abspath(os.path.dirname(__file__)) | ? | | Repeated pattern for referencing resources relative to your scripts. Ideas | about alternative names / locations are welcome. | | In PHP these are __FILE__ and __DIR__. For Python 3 adding __dir__ is | impossible, because the name clashes with __dir__ method (which is not | implemented for module object, but should be [ ] for consistency). Also | current __file__ is rarely absolute path, because it is never normalized [ | ]. Maybe I'm grumpy this morning (though I felt the same reading this yesterday). -1 for any names commencing with __ (or even _). -1 for new globals. -1 because I can imagine wanting different nuances on the definitions above; in particular for DIR I can well imagine wanting bare dirname(abspath(FILE)) - semantically different to your construction. There's lots of scope for bikeshedding here. -1 because this is trivial code. -1 because you can do all this with relative paths anyway, no need for abspath -1 because I can imagine being unable to compute abspath in certain circumstances (certainly on older UNIX systems you could be inside a directory without sufficient privileges to walk back up the tree for getcwd and its equivalents) -0 for adding some kind of convenience functions to importlib(?) for this (+0 except that I can see heaps of bikeshedding) Cheers, -- Cameron Simpson In an insane society, the sane man must appear insane. - Keith A.
Schauer From ncoghlan at gmail.com Sun Sep 29 02:21:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Sep 2013 10:21:43 +1000 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: <20130928232641.GA15985@cskk.homeip.net> References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: Note that any remaining occurrences of non-absolute values in __file__ are generally considered bugs in the import system. However, we tend not to fix them in maintenance releases, since converting relative paths to absolute paths runs a risk of breaking user code. We're definitely *not* going to further pollute the module namespace with values that can be trivially and reliably derived from existing values. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Sun Sep 29 05:28:45 2013 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 29 Sep 2013 12:28:45 +0900 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: os.path.abspath(__file__) returns wrong path after chdir. So I don't think abspath of module can be trivially and reliably derived from existing values. $ cat foo.py import os print(os.path.abspath(__file__)) os.chdir('work') print(os.path.abspath(__file__)) $ python foo.py /home/inada-n/foo.py /home/inada-n/work/foo.py On Sun, Sep 29, 2013 at 9:21 AM, Nick Coghlan wrote: > Note that any remaining occurrences of non-absolute values in __file__ are > generally considered bugs in the import system. However, we tend not to fix > them in maintenance releases, since converting relative paths to absolute > paths runs a risk of breaking user code. > > We're definitely *not* going to further pollute the module namespace with > values that can be trivially and reliably derived from existing values. > > Cheers, > Nick. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- INADA Naoki -------------- next part -------------- An HTML attachment was scrubbed... URL: From clay.sweetser at gmail.com Sun Sep 29 06:06:47 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Sun, 29 Sep 2013 00:06:47 -0400 Subject: [Python-ideas] An exhaust() function for iterators Message-ID: Currently, several strategies exist for exhausting an iterable when one does not care about what the iterable returns (such as when one merely wants a side effect of the iteration process). One can either use an empty for loop: for x in side_effect_iterable: pass A throwaway list comprehension: [x for x in side_effect_iterable] A try/except and a while: next = side_effect_iterable.next try: while True: next() except StopIteration: pass Or a number of other methods. The question is, which one is the fastest? Which one is the most memory efficient? Though these are all obvious methods, none of them are both the fastest and the most memory efficient (though the for/pass method comes close). As it turns out, the fastest and most efficient method available in the standard library is collections.deque's __init__ and extend methods. from collections import deque exhaust_iterable = deque(maxlen=0).extend exhaust_iterable(side_effect_iterable) When a deque object is initialized with a max length of zero or less, a special function, consume_iterator, is used instead of the regular element insertion calls. This function, found at http://hg.python.org/cpython/file/tip/Modules/_collectionsmodule.c#l278, merely iterates through the iterator, without doing any work allocating the object to the deque's internal structure. I would like to propose that this function, or one very similar to it, be added to the standard library, either in the itertools module, or the standard namespace. 
If nothing else, doing so would at least give a single *obvious* way to exhaust an iterator, instead of the several miscellaneous methods available. -- "Evil begins when you begin to treat people as things." - Terry Pratchett From g.brandl at gmx.net Sun Sep 29 07:47:38 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 29 Sep 2013 07:47:38 +0200 Subject: [Python-ideas] An exhaust() function for iterators In-Reply-To: References: Message-ID: On 29.09.2013 06:06, Clay Sweetser wrote: > I would like to propose that this function, or one very similar to it, > be added to the standard library, either in the itertools module, or > the standard namespace. > If nothing else, doing so would at least give a single *obvious* way > to exhaust an iterator, instead of the several miscellaneous methods > available. YAGNI. This is not a very common operation. On the point of obvious ways, the first one you gave for _ in iterable: pass is perfectly obvious and simple enough AFAICS. cheers, Georg From g.brandl at gmx.net Sun Sep 29 07:50:35 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 29 Sep 2013 07:50:35 +0200 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: References: <5246B2A2.4080508@nedbatchelder.com> Message-ID: On 28.09.2013 22:28, Terry Reedy wrote: > On 9/28/2013 6:42 AM, Ned Batchelder wrote: >> On 9/28/13 12:44 AM, anatoly techtonik wrote: >>> This - http://docopt.org/ - should be included with Python 3.4 >>> distribution. >> >> In addition to the other questions already asked, you haven't answered >> the fundamental one: Why should docopt be included in the stdlib? It's >> right there in PyPI where anyone can get it. Why is it better in the >> stdlib than in PyPI? > > The stdlib has mostly switched from using optparse to argparse. The next > question is what relation docopt has to either? What is its backend? > Anyway, it strikes me as a wrapper module best kept as third party, > similar to re and urllib wrappers.
Especially since it's one of the more "magical" argument parsers, which is fine as a library but not something we like to put in the standard library. cheers, Georg From ncoghlan at gmail.com Sun Sep 29 08:15:17 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Sep 2013 16:15:17 +1000 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On 29 September 2013 13:28, INADA Naoki wrote: > os.path.abspath(__file__) returns wrong path after chdir. > So I don't think abspath of module can be trivially and reliably derived > from existing values. Hence the part about any remaining instances of non-absolute __file__ values being considered a bug in the import system. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From kim.grasman at gmail.com Sun Sep 29 11:52:52 2013 From: kim.grasman at gmail.com (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Sun, 29 Sep 2013 11:52:52 +0200 Subject: [Python-ideas] Have os.unlink remove junction points on Windows In-Reply-To: References: Message-ID: Nick, On Wed, Sep 25, 2013 at 2:04 PM, Nick Coghlan wrote: > > My recollection is that permissions around junction points are a little > weird at the Windows OS level (so the access denied might be genuine for a > regular user account), but if a patch can make os.unlink handle them more > like *nix symlinks, that sounds reasonable to me. Just to follow up on this: I found a lot of Win32-specific code in Lib/test/symlink_support.py to detect whether the current user has the SeCreateSymbolicLink privilege (privileges in Windows are permissions not bound to a resource, rather like global access flags). So I think the "little weird" applies more to symbolic links than to junction points. At least I can't find any privileges that apply to junction points as such. 
There is a blurb on MSDN about the system-provided junction points from C:\Documents and Settings\... -> C:\Users\..., but that seems to concern actual file system object permissions for those specific paths rather than something general around junction points. http://msdn.microsoft.com/en-us/library/windows/desktop/bb968829(v=vs.85).aspx Cheers, - Kim From techtonik at gmail.com Sun Sep 29 19:36:06 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 29 Sep 2013 20:36:06 +0300 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On Sun, Sep 29, 2013 at 9:15 AM, Nick Coghlan wrote: > On 29 September 2013 13:28, INADA Naoki wrote: >> os.path.abspath(__file__) returns wrong path after chdir. >> So I don't think abspath of module can be trivially and reliably derived >> from existing values. > > Hence the part about any remaining instances of non-absolute __file__ > values being considered a bug in the import system. Bug that will not be fixed, i.e. a wart. And as a result we don't have a way to reliably reference filename of the current script and its directory. Hence the proposal. -- anatoly t. From techtonik at gmail.com Sun Sep 29 19:39:44 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 29 Sep 2013 20:39:44 +0300 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: <20130928232641.GA15985@cskk.homeip.net> References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On Sun, Sep 29, 2013 at 2:26 AM, Cameron Simpson wrote: > On 28Sep2013 08:19, anatoly techtonik wrote: > | FILE = os.path.abspath(__file__) > | DIR = os.path.abspath(os.path.dirname(__file__)) > | ? > | > | Repeated pattern for referencing resources relative to your scripts. Ideas > | about alternative names / locations are welcome. > | > | In PHP these are __FILE__ and __DIR__. 
For Python 3 adding __dir__ is > | impossible, because the name clashes with __dir__ method (which is not > | implemented for module object, but should be [ ] for consistency). Also > | current __file__ is rarely absolute path, because it is never normalized [ > | ]. > > Maybe I'm grumpy this morning (though I felt the same reading this yesterday). > > -1 for any names commencing with __ (or even _). > > -1 for new globals. > > -1 because I can imagine wanting different nuances on the definitions > above; in particular for DIR I can well imagine wanting bare > dirname(abspath(FILE)) - semantically different to your construction. > There's lots of scope for bikeshedding here. > > -1 because this is trivial code. > > -1 because you can do all this with relative paths anyway, no need for abspath > > -1 because I can imagine being unable to compute abspath in certain > circumstances (certainly on older UNIX systems you could be > inside a directory without sufficient privileges to walk back > up the tree for getcwd and its equivalents) > > -0 for adding some kind of convenience functions to importlib(?) for this > (+0 except that I can see heaps of bikeshedding) With all the -1s above, what is your preferred way to refer to resources that are placed into subdirectories of your script directory? -- anatoly t. From p.f.moore at gmail.com Sun Sep 29 20:16:59 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Sep 2013 19:16:59 +0100 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On 29 September 2013 18:39, anatoly techtonik wrote: > With all the -1s above, what is your preferred way to refer to resources > that are placed into subdirectories of your script directory? If you are an imported module, pkgutil.get_data (because that handles modules in zipfiles, etc).
Otherwise, if you are running a single-file script: with open(os.path.join(os.path.dirname(__file__), 'path', 'to', 'my', 'data')) as f: data = f.read() Why is this a problem? Paul From tjreedy at udel.edu Sun Sep 29 21:13:59 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 29 Sep 2013 15:13:59 -0400 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On 9/29/2013 1:36 PM, anatoly techtonik wrote: > On Sun, Sep 29, 2013 at 9:15 AM, Nick Coghlan wrote: >> On 29 September 2013 13:28, INADA Naoki wrote: >>> os.path.abspath(__file__) returns wrong path after chdir. >>> So I don't think abspath of module can be trivially and reliably derived >>> from existing values. >> >> Hence the part about any remaining instances of non-absolute __file__ >> values being considered a bug in the import system. > > Bug that will not be fixed, i.e. a wart. Nick said "we tend not to fix them in maintenance releases", which I take to mean we can fix in new versions. > And as a result we don't have a way to reliably reference filename > of the current script and its directory. Hence the proposal. The proposed addition would not happen in maintenance releases either. -- Terry Jan Reedy From tjreedy at udel.edu Sun Sep 29 21:19:38 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 29 Sep 2013 15:19:38 -0400 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On 9/28/2013 11:28 PM, INADA Naoki wrote: > os.path.abspath(__file__) returns wrong path after chdir. So grab the path before chdir (which most programs do not do). > So I don't think abspath of module can be trivially and reliably derived > from existing values. It apparently can if you do so in a timely fashion. Grabbing it as soon as possible is the obvious time to do it. 
> $ cat foo.py > import os foopath = os.path.abspath(__file__) Now print it or do whatever you want with it. -- Terry Jan Reedy From storchaka at gmail.com Sun Sep 29 22:38:30 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 29 Sep 2013 23:38:30 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> Message-ID: On 28.09.13 07:17, Raymond Hettinger wrote: > This might be a reasonable idea if pprint were in better shape. > I think substantial work needs to be done on it, before it would > be worthy of becoming the default method of display. What should be changed in pprint? From storchaka at gmail.com Sun Sep 29 22:42:20 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 29 Sep 2013 23:42:20 +0300 Subject: [Python-ideas] An exhaust() function for iterators In-Reply-To: References: Message-ID: On 29.09.13 07:06, Clay Sweetser wrote: > I would like to propose that this function, or one very similar to it, > be added to the standard library, either in the itertools module, or > the standard namespace. > If nothing else, doing so would at least give a single *obvious* way > to exhaust an iterator, instead of the several miscellaneous methods > available. I would prefer to optimize the for loop so that it is also the most efficient way (it is already the most obvious way).
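For anyone comparing the idioms in this thread side by side, here is a minimal runnable sketch; `exhaust` is an illustrative name only, not an existing stdlib function:

```python
from collections import deque

def exhaust(iterable):
    """Run an iterable to completion for its side effects, discarding
    every value -- the deque(maxlen=0) trick discussed in this thread."""
    deque(iterable, maxlen=0)

seen = []

def side_effect_iterable():
    # A generator whose side effect (appending to `seen`) is the point;
    # the yielded values themselves are irrelevant.
    for i in range(5):
        seen.append(i)
        yield i

exhaust(side_effect_iterable())
print(seen)  # all five side effects ran; every yielded value was discarded
```

A plain `for _ in iterable: pass` behaves identically; the deque form simply moves the loop into C.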
From ncoghlan at gmail.com Mon Sep 30 00:18:41 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Sep 2013 08:18:41 +1000 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: <20130928232641.GA15985@cskk.homeip.net> Message-ID: On 30 Sep 2013 05:14, "Terry Reedy" wrote: > > On 9/29/2013 1:36 PM, anatoly techtonik wrote: >> >> On Sun, Sep 29, 2013 at 9:15 AM, Nick Coghlan wrote: >>> >>> On 29 September 2013 13:28, INADA Naoki wrote: >>>> >>>> os.path.abspath(__file__) returns wrong path after chdir. >>>> So I don't think abspath of module can be trivially and reliably derived >>>> from existing values. >>> >>> >>> Hence the part about any remaining instances of non-absolute __file__ >>> values being considered a bug in the import system. >> >> >> Bug that will not be fixed, i.e. a wart. > > > Nick said "we tend not to fix them in maintenance releases", which I take to mean we can fix in new versions. Correct, it's the kind of arguably backwards incompatible bug fix that users will generally tolerate in a feature release but would be justifiably upset about in a maintenance release. Cheers, Nick. > > >> And as a result we don't have a way to reliably reference filename >> of the current script and its directory. Hence the proposal. > > > The proposed addition would not happen in maintenance releases either. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cs at zip.com.au Mon Sep 30 00:17:46 2013 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 30 Sep 2013 08:17:46 +1000 Subject: [Python-ideas] 'from os.path import FILE, DIR' or internal structure of filenames In-Reply-To: References: Message-ID: <20130929221746.GA11746@cskk.homeip.net> On 29Sep2013 20:39, anatoly techtonik wrote: | On Sun, Sep 29, 2013 at 2:26 AM, Cameron Simpson wrote: | > Maybe I'm grumpy this morning (though I felt the same reading this yesterday). [...] | > -1 because this is trivial trival code. | > -1 because you can do all this with relative paths anyway, no need for abspath [...] | With all -1 above, what is your preferred way to refer to resources | that are places into subdirectories of your script directory? Probably os.path.join(os.path.dirname(__file__), "datafile-here"). I've got some unit tests that want that. No need for abspath at all. Of course, chdir and passing paths to programs-which-are-not-my-children present scope for wanting abspath, but in the very common simple case: unnecessary and therefore undesirable. And I'm aware that modules-inside-zip-files don't work with this; let us ignore that; they won't work with abspath either:-) It is so trite that I can't imagine wanting to bolt it into the stdlib. Cheers, -- Cameron Simpson Thousands of years ago the Egyptians worshipped cats as gods. Cats have never forgotten this. 
- David Wren-Hardin From solipsis at pitrou.net Mon Sep 30 10:17:59 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 30 Sep 2013 10:17:59 +0200 Subject: [Python-ideas] Python 3.4 should include docopt as-is References: <5246B2A2.4080508@nedbatchelder.com> Message-ID: <20130930101759.0645c0b9@pitrou.net> On Sun, 29 Sep 2013 07:50:35 +0200, Georg Brandl wrote: > On 28.09.2013 22:28, Terry Reedy wrote: > > On 9/28/2013 6:42 AM, Ned Batchelder wrote: > >> On 9/28/13 12:44 AM, anatoly techtonik wrote: > >>> This - http://docopt.org/ - should be included with Python 3.4 > >>> distribution. > >> > >> In addition to the other questions already asked, you haven't > >> answered the fundamental one: Why should docopt be included in the > >> stdlib? It's right there in PyPI where anyone can get it. Why > >> is it better in the stdlib than in PyPI? > > > > The stdlib has mostly switched from using optparse to argparse. The > > next question is what relation docopt has to either? What is its > > backend? Anyway, it strikes me as a wrapper module best kept as > > third party, similar to re and urllib wrappers. > > Especially since it's one of the more "magical" argument parsers, > which is fine as a library but not something we like to put in the > standard library. Agreed. It's also not the most appealing API IMO. Regards Antoine. From vladimir at keleshev.com Mon Sep 30 20:58:53 2013 From: vladimir at keleshev.com (Vladimir Keleshev) Date: Mon, 30 Sep 2013 20:58:53 +0200 Subject: [Python-ideas] Python 3.4 should include docopt as-is In-Reply-To: References: <20130928081009.GU7989@ando> Message-ID: <53381380567533@web12m.yandex.ru> Thanks for notifying me, and sorry for the late reply. I think it would be awesome if docopt became part of the standard library. However, it's not ready yet. I expect 1.0.0 to be released not earlier than 2014. When it's ready I will definitely write a PEP.
According to the schedule the "feature freeze" will occur on Nov 24, 2013 together with the 3.4.0 beta 1 release. If "feature freeze" means no new things in the standard library, then neither docopt nor the PEP will be ready by that time. It seems docopt will need to wait another 2 years. But don't lose heart: docopt 1.0.0 will be much better, the language will be much more predictable and simple, the error messages will be much clearer, and it will be more parseable and portable. That's why I think it is worth the wait. Getopt is now about 33 years old and still widely used; so I want docopt to be ready for the year 2046. Cheers, Vladimir Keleshev 28.09.2013, 11:02, "Chris “Kwpolska” Warrick" : > On Sat, Sep 28, 2013 at 10:10 AM, Steven D'Aprano wrote: > >> On Sat, Sep 28, 2013 at 07:44:46AM +0300, anatoly techtonik wrote: >>> This - http://docopt.org/ - should be included with Python 3.4 distribution. >> Are you the developer or maintainer of docopt? > > He is not. I CC'd the developer, Vladimir Keleshev. > >> If so, you'll probably need to write a PEP. Otherwise, you'll need to >> ask the maintainer of docopt to write a PEP. Some questions that will >> need to be asked: >> >> - does the maintainer agree to distribute the software under the same >> licence as Python? >> >> - does the maintainer agree to stick to Python's release schedule? >> >> - is the maintainer happy with keeping the API frozen for the next ten >> or fifteen years? >> >> I see that docopt is now up to version 0.6.1. To me, that indicates that >> the API should not be considered stable, it's under version 1. Perhaps >> the maintainer disagrees, and would be happy to freeze the API now. >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas > > -- > Chris “Kwpolska” Warrick > PGP: 5EAAEA16 > stop html mail | always bottom-post | only UTF-8 makes sense
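For readers who haven't used docopt: the contrast drawn in this thread is that argparse (already in the stdlib) builds a parser imperatively, while docopt derives one from the usage string in a module's docstring. A rough argparse sketch of a tiny docopt-style interface - the program name and options below are invented for illustration:

```python
import argparse

# docopt would instead derive the whole parser from a usage string
# along the lines of:
#     Usage: naval_fate <name> [--speed=<kn>]
# With argparse, the equivalent parser is built up call by call:
parser = argparse.ArgumentParser(prog="naval_fate")
parser.add_argument("name", help="name of the ship")
parser.add_argument("--speed", type=int, default=10,
                    help="speed in knots [default: 10]")

# Parse an explicit argv for demonstration (normally sys.argv is used).
args = parser.parse_args(["Titanic", "--speed", "20"])
print(args.name, args.speed)
```

Either way the result is the same parsed values; the difference the thread debates is whether deriving them from a usage string is too "magical" for the stdlib.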