From mark at qtrac.eu Wed Sep 8 18:50:29 2010
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 8 Sep 2010 17:50:29 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
Message-ID: <20100908175029.6617ae3b@dino>

Hi,

I can't see a _nice_ way of splitting a with statement over multiple
lines:

class FakeContext:
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        print("enter", self.name)
    def __exit__(self, *args):
        print("exit", self.name)

with FakeContext("a") as a, FakeContext("b") as b:
    pass # works fine


with FakeContext("a") as a,
     FakeContext("b") as b:
    pass # syntax error


with (FakeContext("a") as a,
      FakeContext("b") as b):
    pass # syntax error

The use case where this mattered to me was this:

    with open(args.actual, encoding="utf-8") as afh,
    open(args.expected, encoding="utf-8") as efh: actual =
    [line.rstrip("\n\r") for line in afh.readlines()] expected =
    [line.rstrip("\n\r") for line in efh.readlines()]

Naturally, I could split the line in an ugly place:

    with open(args.actual, encoding="utf-8") as afh, open(args.expected,
            encoding="utf-8") as efh:

but it seems a shame to do so. Or am I missing something?

I'm using Python 3.1.2.

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Rapid GUI Programming with Python and Qt" - ISBN 0132354187
            http://www.qtrac.eu/pyqtbook.html

From nathan at cmu.edu Wed Sep 8 19:00:25 2010
From: nathan at cmu.edu (Nathan Schneider)
Date: Wed, 8 Sep 2010 13:00:25 -0400
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <20100908175029.6617ae3b@dino>
References: <20100908175029.6617ae3b@dino>
Message-ID:

Mark,

I have approached these cases by using the backslash line-continuation
operator:

with FakeContext("a") as a, \
     FakeContext("b") as b:
    pass

Nathan

On Wed, Sep 8, 2010 at 12:50 PM, Mark Summerfield wrote:
> Hi,
>
> I can't see a _nice_ way of splitting a with statement over multiple
> lines:
>
> class FakeContext:
>     def __init__(self, name):
>         self.name = name
>     def __enter__(self):
>         print("enter", self.name)
>     def __exit__(self, *args):
>         print("exit", self.name)
>
> with FakeContext("a") as a, FakeContext("b") as b:
>     pass # works fine
>
>
> with FakeContext("a") as a,
>      FakeContext("b") as b:
>     pass # syntax error
>
>
> with (FakeContext("a") as a,
>       FakeContext("b") as b):
>     pass # syntax error
>
> The use case where this mattered to me was this:
>
>     with open(args.actual, encoding="utf-8") as afh,
>     open(args.expected, encoding="utf-8") as efh: actual =
>     [line.rstrip("\n\r") for line in afh.readlines()] expected =
>     [line.rstrip("\n\r") for line in efh.readlines()]
>
> Naturally, I could split the line in an ugly place:
>
>     with open(args.actual, encoding="utf-8") as afh, open(args.expected,
>             encoding="utf-8") as efh:
>
> but it seems a shame to do so. Or am I missing something?
>
> I'm using Python 3.1.2.
>
> --
> Mark Summerfield, Qtrac Ltd, www.qtrac.eu
>     C++, Python, Qt, PyQt - training and consultancy
>         "Rapid GUI Programming with Python and Qt" - ISBN 0132354187
>
?http://www.qtrac.eu/pyqtbook.html > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From mwm-keyword-python.b4bdba at mired.org Wed Sep 8 19:04:00 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Wed, 8 Sep 2010 13:04:00 -0400 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <20100908175029.6617ae3b@dino> References: <20100908175029.6617ae3b@dino> Message-ID: <20100908130400.75ec0a60@bhuda.mired.org> On Wed, 8 Sep 2010 17:50:29 +0100 Mark Summerfield wrote: > Hi, > > I can't see a _nice_ way of splitting a with statement over mulitple > lines: > > class FakeContext: > def __init__(self, name): > self.name = name > def __enter__(self): > print("enter", self.name) > def __exit__(self, *args): > print("exit", self.name) > > with FakeContext("a") as a, FakeContext("b") as b: > pass # works fine > > > with FakeContext("a") as a, > FakeContext("b") as b: > pass # synax error > > > with (FakeContext("a") as a, > FakeContext("b") as b): > pass # synax error How about: with FakeContext("a") as a: with FakeContext("B") as b: If the double-indent bothers you, using two two-space indents might be acceptable in this case. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From g.brandl at gmx.net Wed Sep 8 20:07:56 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 08 Sep 2010 20:07:56 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <20100908175029.6617ae3b@dino> References: <20100908175029.6617ae3b@dino> Message-ID: Am 08.09.2010 18:50, schrieb Mark Summerfield: > Hi, > > I can't see a _nice_ way of splitting a with statement over mulitple > lines: > > class FakeContext: > def __init__(self, name): > self.name = name > def __enter__(self): > print("enter", self.name) > def __exit__(self, *args): > print("exit", self.name) > > with FakeContext("a") as a, FakeContext("b") as b: > pass # works fine > > > with FakeContext("a") as a, > FakeContext("b") as b: > pass # synax error > > > with (FakeContext("a") as a, > FakeContext("b") as b): > pass # synax error In addition to the backslash hint already given, I'd like to explain why this version isn't allowed: the parser couldn't distinguish between a multi-context with and an expression in parentheses. (In the case of import, where parens can be used around the import list, this is different, no arbitrary expression is allowed.) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From ncoghlan at gmail.com Wed Sep 8 23:30:26 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Sep 2010 07:30:26 +1000 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> Message-ID: On Thu, Sep 9, 2010 at 4:07 AM, Georg Brandl wrote: > In addition to the backslash hint already given, I'd like to explain why > this version isn't allowed: the parser couldn't distinguish between a > multi-context with and an expression in parentheses. 
> > (In the case of import, where parens can be used around the import list, > this is different, no arbitrary expression is allowed.) I've sometimes wondered if we should consider the idea of making line continuation implicit between keywords and their associated colons. I've never seriously investigated the implications for the parser, though. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From python at mrabarnett.plus.com Thu Sep 9 00:17:11 2010 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 08 Sep 2010 23:17:11 +0100 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> Message-ID: <4C880B67.5070607@mrabarnett.plus.com> On 08/09/2010 22:30, Nick Coghlan wrote: > On Thu, Sep 9, 2010 at 4:07 AM, Georg Brandl wrote: >> In addition to the backslash hint already given, I'd like to explain why >> this version isn't allowed: the parser couldn't distinguish between a >> multi-context with and an expression in parentheses. >> >> (In the case of import, where parens can be used around the import list, >> this is different, no arbitrary expression is allowed.) > > I've sometimes wondered if we should consider the idea of making line > continuation implicit between keywords and their associated colons. > I've never seriously investigated the implications for the parser, > though. > If a colon was omitted by mistake, how much later would the parser report a syntax error? From greg.ewing at canterbury.ac.nz Thu Sep 9 01:19:47 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 09 Sep 2010 11:19:47 +1200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <4C880B67.5070607@mrabarnett.plus.com> References: <20100908175029.6617ae3b@dino> <4C880B67.5070607@mrabarnett.plus.com> Message-ID: <4C881A13.4060709@canterbury.ac.nz> MRAB wrote: > On 08/09/2010 22:30, Nick Coghlan wrote: > >> I've sometimes wondered if we should consider the idea of making line >> continuation implicit between keywords and their associated colons. >> > If a colon was omitted by mistake, how much later would the parser > report a syntax error? It might be best to allow this only if the continuation lines are indented at least as far as the starting line. -- Greg From mikegraham at gmail.com Thu Sep 9 01:47:50 2010 From: mikegraham at gmail.com (Mike Graham) Date: Wed, 8 Sep 2010 19:47:50 -0400 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> Message-ID: On Wed, Sep 8, 2010 at 5:30 PM, Nick Coghlan wrote: > I've sometimes wondered if we should consider the idea of making line > continuation implicit between keywords and their associated colons. This would also have the nice aesthetic quality of making colons serve a purpose. From greg at krypto.org Thu Sep 9 07:05:35 2010 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 8 Sep 2010 22:05:35 -0700 Subject: [Python-ideas] with statement syntax forces ugly line breaks? 
In-Reply-To: References: <20100908175029.6617ae3b@dino> Message-ID: On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider wrote: > Mark, > > I have approached these cases by using the backslash line-continuation > operator: > > with FakeContext("a") as a, \ > FakeContext("b") as b: > pass > > Nathan > I'm in the "\ is evil" at all costs camp so I'd suggest either the nested with statements or alternatively do this: fc = FakeContext with fc("a") as a, fc("b") as b: pass > On Wed, Sep 8, 2010 at 12:50 PM, Mark Summerfield wrote: > > Hi, > > > > I can't see a _nice_ way of splitting a with statement over mulitple > > lines: > > > > class FakeContext: > > def __init__(self, name): > > self.name = name > > def __enter__(self): > > print("enter", self.name) > > def __exit__(self, *args): > > print("exit", self.name) > > > > with FakeContext("a") as a, FakeContext("b") as b: > > pass # works fine > > > > > > with FakeContext("a") as a, > > FakeContext("b") as b: > > pass # synax error > > > > > > with (FakeContext("a") as a, > > FakeContext("b") as b): > > pass # synax error > > > > The use case where this mattered to me was this: > > > > with open(args.actual, encoding="utf-8") as afh, > > open(args.expected, encoding="utf-8") as efh: actual = > > [line.rstrip("\n\r") for line in afh.readlines()] expected = > > [line.rstrip("\n\r") for line in efh.readlines()] > > > > Naturally, I could split the line in an ugly place: > > > > with open(args.actual, encoding="utf-8") as afh, open(args.expected, > > encoding="utf-8") as efh: > > > > but it seems a shame to do so. Or am I missing something? > > > > I'm using Python 3.1.2. > > > > -- > > Mark Summerfield, Qtrac Ltd, www.qtrac.eu > > C++, Python, Qt, PyQt - training and consultancy > > "Rapid GUI Programming with Python and Qt" - ISBN 0132354187 > > http://www.qtrac.eu/pyqtbook.html > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at qtrac.eu Thu Sep 9 07:49:51 2010 From: mark at qtrac.eu (Mark Summerfield) Date: Thu, 9 Sep 2010 06:49:51 +0100 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> Message-ID: <20100909064951.1e1b4df3@dino> Hi Nathan, On Wed, 8 Sep 2010 13:00:25 -0400 Nathan Schneider wrote: > Mark, > > I have approached these cases by using the backslash > line-continuation operator: > > with FakeContext("a") as a, \ > FakeContext("b") as b: > pass Yes, of course, and that's the way I've done it. But it seems a pity to do it this way when the documentation explicitly discourages the use of the backslash for line continuation: http://docs.python.org/py3k/howto/doanddont.html (look at the very last item) > > Nathan > > On Wed, Sep 8, 2010 at 12:50 PM, Mark Summerfield > wrote: > > Hi, > > > > I can't see a _nice_ way of splitting a with statement over mulitple > > lines: > > > > class FakeContext: > > ? ?def __init__(self, name): > > ? ? ? ?self.name = name > > ? ?def __enter__(self): > > ? ? ? ?print("enter", self.name) > > ? ?def __exit__(self, *args): > > ? ? ? ?print("exit", self.name) > > > > with FakeContext("a") as a, FakeContext("b") as b: > > ? 
?pass # works fine > > > > > > with FakeContext("a") as a, > > ? ? FakeContext("b") as b: > > ? ?pass # synax error > > > > > > with (FakeContext("a") as a, > > ? ? ?FakeContext("b") as b): > > ? ?pass # synax error > > > > The use case where this mattered to me was this: > > > > ? ?with open(args.actual, encoding="utf-8") as afh, > > ? ?open(args.expected, encoding="utf-8") as efh: actual = > > ? ?[line.rstrip("\n\r") for line in afh.readlines()] expected = > > ? ?[line.rstrip("\n\r") for line in efh.readlines()] > > > > Naturally, I could split the line in an ugly place: > > > > ? ?with open(args.actual, encoding="utf-8") as afh, > > open(args.expected, encoding="utf-8") as efh: > > > > but it seems a shame to do so. Or am I missing something? > > > > I'm using Python 3.1.2. > > > > -- > > Mark Summerfield, Qtrac Ltd, www.qtrac.eu > > ? ?C++, Python, Qt, PyQt - training and consultancy > > ? ? ? ?"Rapid GUI Programming with Python and Qt" - ISBN 0132354187 > > ? ? ? ? ? ?http://www.qtrac.eu/pyqtbook.html > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > -- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "Programming in Python 3" - ISBN 0321680561 http://www.qtrac.eu/py3book.html From ben+python at benfinney.id.au Thu Sep 9 09:55:38 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 09 Sep 2010 17:55:38 +1000 Subject: [Python-ideas] with statement syntax forces ugly line breaks? References: <20100908175029.6617ae3b@dino> Message-ID: <87k4mv9wqt.fsf@benfinney.id.au> "Gregory P. Smith" writes: > On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider wrote: > > I have approached these cases by using the backslash line-continuation > > operator: > > > > with FakeContext("a") as a, \ > > FakeContext("b") as b: > > pass > > I'm in the "\ is evil" at all costs camp [?] I agree, especially when we have a much neater continuation mechanism that could work just fine here:: with (FakeContext("a") as a, FakeContext("b") as b): pass -- \ ?[Entrenched media corporations will] maintain the status quo, | `\ or die trying. Either is better than actually WORKING for a | _o__) living.? ?ringsnake.livejournal.com, 2007-11-12 | Ben Finney From andy at insectnation.org Thu Sep 9 11:06:25 2010 From: andy at insectnation.org (Andy Buckley) Date: Thu, 09 Sep 2010 10:06:25 +0100 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> Message-ID: <4C88A391.5070209@insectnation.org> On 09/09/10 00:47, Mike Graham wrote: > On Wed, Sep 8, 2010 at 5:30 PM, Nick Coghlan wrote: >> I've sometimes wondered if we should consider the idea of making line >> continuation implicit between keywords and their associated colons. > > This would also have the nice aesthetic quality of making colons serve > a purpose. Good point! I'm regularly niggled that backslash continuations are needed for long conditional statements where parentheses are not logically necessary (and look disturbingly unpythonic.) There's no ambiguity in allowing statements to extend until the colon, particularly if Greg's "at least as far" indentation rule is applied. +1 from me. Andy From g.brandl at gmx.net Thu Sep 9 14:08:25 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 09 Sep 2010 14:08:25 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? 
In-Reply-To: <4C881A13.4060709@canterbury.ac.nz> References: <20100908175029.6617ae3b@dino> <4C880B67.5070607@mrabarnett.plus.com> <4C881A13.4060709@canterbury.ac.nz> Message-ID: Am 09.09.2010 01:19, schrieb Greg Ewing: > MRAB wrote: >> On 08/09/2010 22:30, Nick Coghlan wrote: >> >>> I've sometimes wondered if we should consider the idea of making line >>> continuation implicit between keywords and their associated colons. >>> >> If a colon was omitted by mistake, how much later would the parser >> report a syntax error? > > It might be best to allow this only if the continuation > lines are indented at least as far as the starting line. That is dangerous, it makes the whitespace rules more complicated. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Thu Sep 9 14:14:50 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 09 Sep 2010 14:14:50 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <20100909064951.1e1b4df3@dino> References: <20100908175029.6617ae3b@dino> <20100909064951.1e1b4df3@dino> Message-ID: Am 09.09.2010 07:49, schrieb Mark Summerfield: > Hi Nathan, > > On Wed, 8 Sep 2010 13:00:25 -0400 > Nathan Schneider wrote: >> Mark, >> >> I have approached these cases by using the backslash >> line-continuation operator: >> >> with FakeContext("a") as a, \ >> FakeContext("b") as b: >> pass > > Yes, of course, and that's the way I've done it. But it seems a pity to > do it this way when the documentation explicitly discourages the use of > the backslash for line continuation: > http://docs.python.org/py3k/howto/doanddont.html > (look at the very last item) Which is actually factually incorrect and should be rewritten. The only situation where stray whitespace after a backslash is valid syntax is within a string literal (and there, there is no alternative). So at least the "stray whitespace leads to silently buggy code" reason not to use backslashes is wrong. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Thu Sep 9 14:17:37 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 09 Sep 2010 14:17:37 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <87k4mv9wqt.fsf@benfinney.id.au> References: <20100908175029.6617ae3b@dino> <87k4mv9wqt.fsf@benfinney.id.au> Message-ID: Am 09.09.2010 09:55, schrieb Ben Finney: > "Gregory P. Smith" > writes: > >> On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider wrote: >> > I have approached these cases by using the backslash line-continuation >> > operator: >> > >> > with FakeContext("a") as a, \ >> > FakeContext("b") as b: >> > pass >> >> I'm in the "\ is evil" at all costs camp [?] > > I agree, especially when we have a much neater continuation mechanism > that could work just fine here:: > > with (FakeContext("a") as a, > FakeContext("b") as b): > pass No, it could not work just fine. You are basically banning tuples from the context expression (remember that the "as" clause is optional). 
Maybe one could argue that this is not a problem because tuples are not context managers anyway, but how would this work then: i = 0 or 1 with (a, b)[i]: Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Thu Sep 9 14:16:49 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 09 Sep 2010 14:16:49 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <87k4mv9wqt.fsf@benfinney.id.au> References: <20100908175029.6617ae3b@dino> <87k4mv9wqt.fsf@benfinney.id.au> Message-ID: Am 09.09.2010 09:55, schrieb Ben Finney: > "Gregory P. Smith" > writes: > >> On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider wrote: >> > I have approached these cases by using the backslash line-continuation >> > operator: >> > >> > with FakeContext("a") as a, \ >> > FakeContext("b") as b: >> > pass >> >> I'm in the "\ is evil" at all costs camp [?] > > I agree, especially when we have a much neater continuation mechanism > that could work just fine here:: > > with (FakeContext("a") as a, > FakeContext("b") as b): > pass No, it could not work just fine. You are basically banning tuples from the context expression (remember that the "as" clause is optional). You would argue that this is not a problem because tuples are not context -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From ncoghlan at gmail.com Thu Sep 9 14:53:37 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Sep 2010 22:53:37 +1000 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> <4C880B67.5070607@mrabarnett.plus.com> <4C881A13.4060709@canterbury.ac.nz> Message-ID: On Thu, Sep 9, 2010 at 10:08 PM, Georg Brandl wrote: > Am 09.09.2010 01:19, schrieb Greg Ewing: >> MRAB wrote: >>> On 08/09/2010 22:30, Nick Coghlan wrote: >>> >>>> I've sometimes wondered if we should consider the idea of making line >>>> continuation implicit between keywords and their associated colons. >>>> >>> If a colon was omitted by mistake, how much later would the parser >>> report a syntax error? >> >> It might be best to allow this only if the continuation >> lines are indented at least as far as the starting line. > > That is dangerous, it makes the whitespace rules more complicated. I'm actually not sure it is even *possible* in general to implement my suggestion given the deliberate limitations of Python's parser. Parentheses normally work their indentation-ignoring magic by dropping down into expression evaluation scope where indentation isn't significant (import is a special case where this doesn't quite happen, but it's a rather constrained one). This is definitely a wart in the with statement syntax, but it really isn't clear how best to resolve it. You can at least use parentheses in the individual context expressions, even though you can't wrap the whole thing: .>> from contextlib import contextmanager .>> @contextmanager ... def FakeContext(a): ... yield a ... .>> with FakeContext(1) as x, ( ... FakeContext(2)) as y: ... 
print(x, y) ... 1 2 Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From grosser.meister.morti at gmx.net Thu Sep 9 15:02:24 2010 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Thu, 09 Sep 2010 15:02:24 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> <87k4mv9wqt.fsf@benfinney.id.au> Message-ID: <4C88DAE0.9070607@gmx.net> On 09/09/2010 02:17 PM, Georg Brandl wrote: > Am 09.09.2010 09:55, schrieb Ben Finney: >> "Gregory P. Smith" >> writes: >> >>> On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider wrote: >>>> I have approached these cases by using the backslash line-continuation >>>> operator: >>>> >>>> with FakeContext("a") as a, \ >>>> FakeContext("b") as b: >>>> pass >>> >>> I'm in the "\ is evil" at all costs camp [?] >> >> I agree, especially when we have a much neater continuation mechanism >> that could work just fine here:: >> >> with (FakeContext("a") as a, >> FakeContext("b") as b): >> pass > > No, it could not work just fine. You are basically banning tuples from the > context expression (remember that the "as" clause is optional). > > Maybe one could argue that this is not a problem because tuples are not > context managers anyway, but how would this work then: > > i = 0 or 1 > with (a, b)[i]: > > Georg > Just write: with ((a, b)[i]): It's ugly but it would work. ;) -panzi From mal at egenix.com Thu Sep 9 15:32:15 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 09 Sep 2010 15:32:15 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <20100908175029.6617ae3b@dino> References: <20100908175029.6617ae3b@dino> Message-ID: <4C88E1DF.3090502@egenix.com> Mark Summerfield wrote: > Hi, > > I can't see a _nice_ way of splitting a with statement over mulitple > lines: > > class FakeContext: > def __init__(self, name): > self.name = name > def __enter__(self): > print("enter", self.name) > def __exit__(self, *args): > print("exit", self.name) > > with FakeContext("a") as a, FakeContext("b") as b: > pass # works fine > > > with FakeContext("a") as a, > FakeContext("b") as b: > pass # synax error > > > with (FakeContext("a") as a, > FakeContext("b") as b): > pass # synax error > > The use case where this mattered to me was this: > > with open(args.actual, encoding="utf-8") as afh, > open(args.expected, encoding="utf-8") as efh: actual = > [line.rstrip("\n\r") for line in afh.readlines()] expected = > [line.rstrip("\n\r") for line in efh.readlines()] > > Naturally, I could split the line in an ugly place: > > with open(args.actual, encoding="utf-8") as afh, open(args.expected, > encoding="utf-8") as efh: > > but it seems a shame to do so. Or am I missing something? Why do you need to put everything on one line ? afh = open(args.actual, encoding="utf-8") efh = open(args.expected, encoding="utf-8") with afh, efh: ... In the context of files, the only purpose of the with statement is to close them when leaving the block. >>> a = open('/etc/passwd') >>> b = open('/etc/group') >>> with a,b: print a.readline(), b.readline() ... at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash at:!:25: >>> a >>> b -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 09 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2010-08-19: Released mxODBC 3.1.0 http://python.egenix.com/ 2010-09-15: DZUG Tagung, Dresden, Germany 6 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From fuzzyman at voidspace.org.uk Thu Sep 9 15:41:52 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 9 Sep 2010 14:41:52 +0100 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <4C88E1DF.3090502@egenix.com> References: <20100908175029.6617ae3b@dino> <4C88E1DF.3090502@egenix.com> Message-ID: On 9 September 2010 14:32, M.-A. Lemburg wrote: > [snip...] > Why do you need to put everything on one line ? > > afh = open(args.actual, encoding="utf-8") > efh = open(args.expected, encoding="utf-8") > > with afh, efh: > ... > > In the context of files, the only purpose of the with statement > is to close them when leaving the block. > > >>> a = open('/etc/passwd') > >>> b = open('/etc/group') > If my understanding is correct (which is perhaps unlikely...), using a single line will close a if opening b fails. Whereas doing them separately before the with statement risks leaving the first un-exited if creating the second fails. Michael > >>> with a,b: print a.readline(), b.readline() > ... > at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash > at:!:25: > > >>> a > > >>> b > > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Sep 09 2010) > >>> Python/Zope Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2010-08-19: Released mxODBC 3.1.0 http://python.egenix.com/ > 2010-09-15 : DZUG Tagung, Dresden, > Germany 6 days to go > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Sep 9 15:53:49 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 09 Sep 2010 15:53:49 +0200 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: References: <20100908175029.6617ae3b@dino> <4C88E1DF.3090502@egenix.com> Message-ID: <4C88E6ED.8000807@egenix.com> Michael Foord wrote: > On 9 September 2010 14:32, M.-A. Lemburg wrote: > >> [snip...] >> Why do you need to put everything on one line ? >> >> afh = open(args.actual, encoding="utf-8") >> efh = open(args.expected, encoding="utf-8") >> >> with afh, efh: >> ... >> >> In the context of files, the only purpose of the with statement >> is to close them when leaving the block. 
>> >>>>> a = open('/etc/passwd') >>>>> b = open('/etc/group') >> > > If my understanding is correct (which is perhaps unlikely...), using a > single line will close a if opening b fails. Whereas doing them separately > before the with statement risks leaving the first un-exited if creating the > second fails. Right, but if you stuff everything on a single line, your error handling will have a hard time figuring out which of the two failed to open. I was under the impression that Mark wanted to "protect" the inner block of the with statement, not the context manager creation itself. As usual: hiding away too much stuff in your closet makes things look tidy, but causes a hell of a mess if you ever need to open it again :-) > Michael > > >>>>> with a,b: print a.readline(), b.readline() >> ... >> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash >> at:!:25: >> >>>>> a >> >>>>> b >> >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, Sep 09 2010) >>>>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >> ________________________________________________________________________ >> 2010-08-19: Released mxODBC 3.1.0 http://python.egenix.com/ >> 2010-09-15 : DZUG Tagung, Dresden, >> Germany 6 days to go >> >> ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: >> >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 09 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-08-19: Released mxODBC 3.1.0 http://python.egenix.com/ 2010-09-15: DZUG Tagung, Dresden, Germany 6 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mark at qtrac.eu Thu Sep 9 16:13:54 2010 From: mark at qtrac.eu (Mark Summerfield) Date: Thu, 9 Sep 2010 15:13:54 +0100 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <4C88E6ED.8000807@egenix.com> References: <20100908175029.6617ae3b@dino> <4C88E1DF.3090502@egenix.com> <4C88E6ED.8000807@egenix.com> Message-ID: <20100909151354.6d0ce7a8@dino> On Thu, 09 Sep 2010 15:53:49 +0200 "M.-A. Lemburg" wrote: > Michael Foord wrote: > > On 9 September 2010 14:32, M.-A. Lemburg wrote: > > > >> [snip...] > >> Why do you need to put everything on one line ? > >> > >> afh = open(args.actual, encoding="utf-8") > >> efh = open(args.expected, encoding="utf-8") > >> > >> with afh, efh: > >> ... > >> > >> In the context of files, the only purpose of the with statement > >> is to close them when leaving the block. 
> >> > >>>>> a = open('/etc/passwd') > >>>>> b = open('/etc/group') > >> > > > > If my understanding is correct (which is perhaps unlikely...), > > using a single line will close a if opening b fails. Whereas doing > > them separately before the with statement risks leaving the first > > un-exited if creating the second fails. > > Right, but if you stuff everything on a single line, your > error handling will have a hard time figuring out which of > the two failed to open. > > I was under the impression that Mark wanted to "protect" the > inner block of the with statement, not the context manager > creation itself. Actually, I was more interested in the aesthetics. I've become habituated to _never_ using \ continuations and found it unsightly to need one here. > As usual: hiding away too much stuff in your closet makes things > look tidy, but causes a hell of a mess if you ever need to open > it again :-) :-) > > > Michael > > > > > >>>>> with a,b: print a.readline(), b.readline() > >> ... > >> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash > >> at:!:25: > >> > >>>>> a > >> > >>>>> b > >> > >> > >> -- > >> Marc-Andre Lemburg > >> eGenix.com > >> > >> Professional Python Services directly from the Source (#1, Sep 09 > >> 2010) > >>>>> Python/Zope Consulting and Support ... > >>>>> http://www.egenix.com/ > >>>>> mxODBC.Zope.Database.Adapter ... > >>>>> http://zope.egenix.com/ mxODBC, mxDateTime, > >>>>> mxTextTools ... http://python.egenix.com/ > >> ________________________________________________________________________ > >> 2010-08-19: Released mxODBC 3.1.0 > >> http://python.egenix.com/ 2010-09-15 > >> : DZUG Tagung, Dresden, > >> Germany 6 days to go > >> > >> ::: Try our new mxODBC.Connect Python Database Interface for > >> free ! :::: > >> > >> > >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > >> Registered at Amtsgericht Duesseldorf: HRB 46611 > >> http://www.egenix.com/company/contact/ > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > >> > > > > > > > -- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "Programming in Python 3" - ISBN 0321680561 http://www.qtrac.eu/py3book.html From fuzzyman at voidspace.org.uk Thu Sep 9 16:34:25 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 9 Sep 2010 15:34:25 +0100 Subject: [Python-ideas] with statement syntax forces ugly line breaks? In-Reply-To: <4C88E6ED.8000807@egenix.com> References: <20100908175029.6617ae3b@dino> <4C88E1DF.3090502@egenix.com> <4C88E6ED.8000807@egenix.com> Message-ID: On 9 September 2010 14:53, M.-A. Lemburg wrote: > Michael Foord wrote: > > On 9 September 2010 14:32, M.-A. Lemburg wrote: > > > >> [snip...] > >> Why do you need to put everything on one line ? > >> > >> afh = open(args.actual, encoding="utf-8") > >> efh = open(args.expected, encoding="utf-8") > >> > >> with afh, efh: > >> ... > >> > >> In the context of files, the only purpose of the with statement > >> is to close them when leaving the block. > >> > >>>>> a = open('/etc/passwd') > >>>>> b = open('/etc/group') > >> > > > > If my understanding is correct (which is perhaps unlikely...), using a > > single line will close a if opening b fails. Whereas doing them > separately > > before the with statement risks leaving the first un-exited if creating > the > > second fails. 
> > Right, but if you stuff everything on a single line, your > error handling will have a hard time figuring out which of > the two failed to open. > If you *need* to distinguish at a higher level then you have no choice. I was really just pointing out that there are *semantic* differences as well, and in fact the code you posted is less safe than the one line version. You lose some of the error handling built-in to context manager creation. Michael > > I was under the impression that Mark wanted to "protect" the > inner block of the with statement, not the context manager > creation itself. > > As usual: hiding away too much stuff in your closet makes things > look tidy, but causes a hell of a mess if you ever need to open > it again :-) > > > Michael > > > > > >>>>> with a,b: print a.readline(), b.readline() > >> ... > >> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash > >> at:!:25: > >> > >>>>> a > >> > >>>>> b > >> > >> > >> -- > >> Marc-Andre Lemburg > >> eGenix.com > >> > >> Professional Python Services directly from the Source (#1, Sep 09 2010) > >>>>> Python/Zope Consulting and Support ... http://www.egenix.com/ > >>>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ > >>>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > >> ________________________________________________________________________ > >> 2010-08-19: Released mxODBC 3.1.0 > http://python.egenix.com/ > >> 2010-09-15 : DZUG Tagung, > Dresden, > >> Germany 6 days to go > >> > >> ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > >> > >> > >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > >> Registered at Amtsgericht Duesseldorf: HRB 46611 > >> http://www.egenix.com/company/contact/ > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > >> > > > > > > > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Sep 09 2010) > >>> Python/Zope Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2010-08-19: Released mxODBC 3.1.0 http://python.egenix.com/ > 2010-09-15 : DZUG Tagung, Dresden, > Germany 6 days to go > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu Sep 9 22:55:15 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Sep 2010 16:55:15 -0400 Subject: [Python-ideas] with statement syntax forces ugly line breaks? 
In-Reply-To: References: <20100908175029.6617ae3b@dino> <20100909064951.1e1b4df3@dino> Message-ID: On 9/9/2010 8:14 AM, Georg Brandl wrote: > Am 09.09.2010 07:49, schrieb Mark Summerfield: >> Hi Nathan, >> >> On Wed, 8 Sep 2010 13:00:25 -0400 >> Nathan Schneider wrote: >>> Mark, >>> >>> I have approached these cases by using the backslash >>> line-continuation operator: >>> >>> with FakeContext("a") as a, \ Adding a space makes the following a SyntaxError. No silent error here. >>> FakeContext("b") as b: >>> pass >> >> Yes, of course, and that's the way I've done it. But it seems a pity to >> do it this way when the documentation explicitly discourages the use of >> the backslash for line continuation: >> http://docs.python.org/py3k/howto/doanddont.html >> (look at the very last item) If no one uses \ for end of line escape, it should be removed ... But I am not suggesting that. > Which is actually factually incorrect and should be rewritten. The only > situation where stray whitespace after a backslash is valid syntax is > within a string literal (and there, there is no alternative). > > So at least the "stray whitespace leads to silently buggy code" reason > not to use backslashes is wrong. > > Georg > -- Terry Jan Reedy From cool-rr at cool-rr.com Fri Sep 10 18:37:44 2010 From: cool-rr at cool-rr.com (cool-RR) Date: Fri, 10 Sep 2010 18:37:44 +0200 Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ? Message-ID: I noticed that it's impossible to call a Python function with two starred argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone wants to feed two lists of arguments into a function, why not? I understand why you can't have two stars in a function definition; But why can't you have two (or more) stars in a function call? Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Sep 10 18:54:33 2010 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 10 Sep 2010 17:54:33 +0100 Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ? In-Reply-To: References: Message-ID: <4C8A62C9.1040206@mrabarnett.plus.com> On 10/09/2010 17:37, cool-RR wrote: > I noticed that it's impossible to call a Python function with two > starred argument lists, like this: `f(*my_list, *my_other_list)`. I > mean, if someone wants to feed two lists of arguments into a function, > why not? > > I understand why you can't have two stars in a function definition; But > why can't you have two (or more) stars in a function call? > Would there be any advantage over `f(*(my_list + my_other_list))`? (Send to wrong list originally :-() From benjamin at python.org Fri Sep 10 19:03:20 2010 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 10 Sep 2010 17:03:20 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Why_not_f=28*my=5Flist=2C*my=5Fother=5Fl?= =?utf-8?b?aXN0KSA/?= References: Message-ID: cool-RR writes: > > I noticed that it's impossible to call a Python function with two starred argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone wants to feed two lists of arguments into a function, why not? Okay, so why would you want to? From phd at phd.pp.ru Fri Sep 10 18:57:13 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Fri, 10 Sep 2010 20:57:13 +0400 Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ? 
In-Reply-To: References: Message-ID: <20100910165713.GA24612@phd.pp.ru> On Fri, Sep 10, 2010 at 06:37:44PM +0200, cool-RR wrote: > f(*my_list, *my_other_list) Not every one-lined should be a syntax. Just call f(*(my_list + my_other_list)) Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From stefan_ml at behnel.de Fri Sep 10 19:16:52 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Sep 2010 19:16:52 +0200 Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ? In-Reply-To: References: Message-ID: Benjamin Peterson, 10.09.2010 19:03: > cool-RR writes: > >> >> I noticed that it's impossible to call a Python function with two starred > argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone > wants to feed two lists of arguments into a function, why not? > > Okay, so why would you want to? Well, it can happen. It doesn't merit a syntax extension, though. You can just do args_for_f = tuple(my_list) + tuple(my_other_list) f(*args_for_f) (using tuple() here in case both are not really lists) Stefan From daniel at stutzbachenterprises.com Fri Sep 10 19:34:42 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 10 Sep 2010 12:34:42 -0500 Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ? In-Reply-To: References: Message-ID: On Fri, Sep 10, 2010 at 12:16 PM, Stefan Behnel wrote: > args_for_f = tuple(my_list) + tuple(my_other_list) > f(*args_for_f) > An alternative with better performance is: from itertools import chain f(*chain(my_list, my_other_list)) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergio at gruposinternet.com.br Fri Sep 10 19:43:30 2010 From: sergio at gruposinternet.com.br (=?ISO-8859-1?Q?S=E9rgio?= Surkamp) Date: Fri, 10 Sep 2010 14:43:30 -0300 Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ? In-Reply-To: References: Message-ID: <20100910144330.640866f2@icedearth.corp.grupos.com.br> Em Fri, 10 Sep 2010 18:37:44 +0200 cool-RR escreveu: > I noticed that it's impossible to call a Python function with two > starred argument lists, like this: `f(*my_list, *my_other_list)`. I > mean, if someone wants to feed two lists of arguments into a > function, why not? > > I understand why you can't have two stars in a function definition; > But why can't you have two (or more) stars in a function call? > > > Ram. How the compiler should treat that? Put half of the arguments in the first list and the other half on the second list? Regards, -- .:''''':. .:' ` S?rgio Surkamp | Gerente de Rede :: ........ sergio at gruposinternet.com.br `:. .:' `:, ,.:' *Grupos Internet S.A.* `: :' R. Lauro Linhares, 2123 Torre B - Sala 201 : : Trindade - Florian?polis - SC :.' :: +55 48 3234-4109 : ' http://www.gruposinternet.com.br From mikegraham at gmail.com Fri Sep 10 21:28:09 2010 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 10 Sep 2010 15:28:09 -0400 Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ? In-Reply-To: References: Message-ID: On Fri, Sep 10, 2010 at 1:34 PM, Daniel Stutzbach wrote: > An alternative with better performance is: > > from itertools import chain > f(*chain(my_list, my_other_list)) Maybe. From tjreedy at udel.edu Fri Sep 10 23:25:35 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Sep 2010 17:25:35 -0400 Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ? 
In-Reply-To: References: Message-ID: On 9/10/2010 12:37 PM, cool-RR wrote: > I noticed that it's impossible to call a Python function with two > starred argument lists, like this: `f(*my_list, *my_other_list)`. I > mean, if someone wants to feed two lists of arguments into a function, > why not? > > I understand why you can't have two stars in a function definition; But > why can't you have two (or more) stars in a function call? Beyond 0. Not needed as others explained, some speculations: 1. Calls are designed to mirror definition. No multiple stars in definition means no multiple stars in calls. 2. Multiple stars begin to look like typing errors. 3. No one ever thought to support such. 4. It would make the call process even more complex, and it is slow enough already. 5. It might conflict with the current implementation. -- Terry Jan Reedy From guido at python.org Sat Sep 11 01:25:04 2010 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Sep 2010 16:25:04 -0700 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: <4C8AB874.9010703@openvpn.net> References: <4C8AB874.9010703@openvpn.net> Message-ID: Moving to python-ideas. Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems exactly what you want. --Guido On Fri, Sep 10, 2010 at 4:00 PM, James Yonan wrote: > I'd like to propose that the Python community standardize on a "deferred" > object for asynchronous return values, modeled after the well-thought-out > Twisted Deferred class. > > With more and more Python libraries implementing asynchronicity (for example > Futures -- PEP 3148), it's crucial to have a standard deferred object in > place so that code using a single asynchronous reactor can interoperate with > different asynchronous libraries. > > I think a lot of people don't realize how much cooler and more elegant it is > to return a deferred object from an asynchronous function rather than using > a generic callback approach (where you pass a function argument to the > asynchronous function telling it where to call when the asynchronous > operation completes). > > While asynchronous systems have been shown to have excellent scalability > properties, the callback-based programming style often used in asynchronous > programming has been criticized for breaking up the sequential readability > of program logic. > > This problem is elegantly addressed by using Deferred Generators. ?Since > Python 2.5 added enhanced generators (i.e. the capability for "yield" to > return a value), the infrastructure is now in place to allow an asynchronous > function to be written in a sequential style, without the use of explicit > callbacks. > > See the following blog article for a nice write-up on the capability: > > http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html > > Mekk's Twisted Deferred example: > > @defer.inlineCallbacks > def someFunction(): > ? ?a = 1 > ? ?b = yield deferredReturningFunction(a) > ? ?c = yield anotherDeferredReturningFunction(a, b) > ? ?defer.returnValue(c) > > What's cool about this is that between the two yield statements, the Twisted > reactor is in control meaning that other pending asynchronous tasks can be > attended to or the thread's remaining time slice can be yielded to the > kernel, yet this is all accomplished without the use of multi-threading. 
> ?Another interesting aspect of this approach is that since it leverages on > Python's enhanced generators, an exception thrown inside either of the > deferred-returning functions will be propagated through to someFunction() > where it can be handled with try/except. > > Think about what this means -- this sort of emulates the "stackless" design > pattern you would expect in Erlang or Stackless Python without leaving > standard Python. ?And it's made possible under the hood by Python Enhanced > Generators. > > Needless to say, it would be great to see this coolness be part of the > standard Python library, instead of having every Python asynchronous library > implement its own ad-hoc callback system. > > James Yonan > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sat Sep 11 02:07:19 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Sep 2010 10:07:19 +1000 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: References: <4C8AB874.9010703@openvpn.net> Message-ID: On Sat, Sep 11, 2010 at 9:25 AM, Guido van Rossum wrote: > Moving to python-ideas. > > Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems > exactly what you want. James did mention that in the post, although he didn't say what deferreds really added beyond what futures provide, and why the "add_done_callback" method isn't adequate to provide interoperability between futures and deferreds (which would be odd, since Brian made changes to that part of PEP 3148 to help with that interoperability after discussions with Glyph). Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope for standardisation in this space though. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From jnoller at gmail.com Sat Sep 11 18:03:12 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 11 Sep 2010 09:03:12 -0700 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: References: <4C8AB874.9010703@openvpn.net> Message-ID: On Fri, Sep 10, 2010 at 5:07 PM, Nick Coghlan wrote: > On Sat, Sep 11, 2010 at 9:25 AM, Guido van Rossum wrote: >> Moving to python-ideas. >> >> Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems >> exactly what you want. > > James did mention that in the post, although he didn't say what > deferreds really added beyond what futures provide, and why the > "add_done_callback" method isn't adequate to provide interoperability > between futures and deferreds (which would be odd, since Brian made > changes to that part of PEP 3148 to help with that interoperability > after discussions with Glyph). > > Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope > for standardisation in this space though. > > Cheers, > Nick. That was my initial reaction as well, but I'm more than open to hearing from Jean Paul/Glyph and the other twisted folks on this. 
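[A minimal illustrative sketch, not taken from the thread: the interoperability hook referenced above is Future.add_done_callback() from PEP 3148. Assuming only that API (as it later shipped in concurrent.futures), a deferred-style library could route a future's outcome into its own success/error callbacks roughly as follows; the helper name chain_into is made up for this example.]

    from concurrent.futures import ThreadPoolExecutor

    def chain_into(future, on_result, on_error):
        # Route the future's outcome into separate success/error callbacks,
        # mirroring the callback/errback split of a Twisted-style deferred.
        def done(f):
            error = f.exception()      # None if the call succeeded
            if error is not None:
                on_error(error)
            else:
                on_result(f.result())
        future.add_done_callback(done)

    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(lambda: 21 * 2)
        chain_into(fut, print, print)  # prints 42 once the future completes
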
From guido at python.org Sun Sep 12 04:26:50 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Sep 2010 19:26:50 -0700 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: References: <4C8AB874.9010703@openvpn.net> Message-ID: (Summary: I want to make an apology, and reopen the debate. Possibly relevant: PEP 342, PEP 380, PEP 3148, PEP 3152.) On Sat, Sep 11, 2010 at 9:03 AM, Jesse Noller wrote: > On Fri, Sep 10, 2010 at 5:07 PM, Nick Coghlan wrote: >> On Sat, Sep 11, 2010 at 9:25 AM, Guido van Rossum wrote: >>> Moving to python-ideas. >>> >>> Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems >>> exactly what you want. >> >> James did mention that in the post, Whoops. I was a bit quick at the trigger there. >> although he didn't say what >> deferreds really added beyond what futures provide, and why the >> "add_done_callback" method isn't adequate to provide interoperability >> between futures and deferreds (which would be odd, since Brian made >> changes to that part of PEP 3148 to help with that interoperability >> after discussions with Glyph). >> >> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope >> for standardisation in this space though. >> >> Cheers, >> Nick. > > That was my initial reaction as well, but I'm more than open to > hearing from Jean Paul/Glyph and the other twisted folks on this. Re-reading the OP's post[0] and the blog[1] he references, I notice that he did not mention PEP 380 (which for the blog's example doesn't actually add much except adding a nicer way to return a value from a generator) but he did mention the awesomeness of not needing threads when using deferreds. He sounds as if the python-dev community had never heard of that style of handling concurrency, which seems backwards: the generator-based style of doing it was introduced in PEP 342 which enabled Twisted's inline callbacks. (Though he does mention Python Enhanced Generators which could be an implicit reference to PEP 342 -- "Coroutines via Enhanced Generators".) But thinking about this more I don't know that it will be easy to mix PEP 3148, which is solidly thread-based, with a PEP 342 style scheduler (whether or not the PEP 380 enhancements are applied, or even PEP 3152). And if we take the OP's message at face value, his point isn't so much that Twisted is great, but that in order to benefit maximally from PEP 342 there needs to be a standard way of using callbacks. I think that's probably true. And comparing the blog's examples to PEP 3148, I find Twisted's terminology rather confusing compared to the PEP's clean Futures API (where IMO you can ignore almost everything except result()). Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward. So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based. Or maybe PEP 3152. So, yes, there may be something here, and let's reopen the discussion. And I apologize for shooting first and asking questions second. 
[0] http://mail.python.org/pipermail/python-dev/2010-September/103576.html [1] http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sun Sep 12 13:03:38 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 12 Sep 2010 13:03:38 +0200 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object References: <4C8AB874.9010703@openvpn.net> Message-ID: <20100912130338.714643f8@pitrou.net> On Sat, 11 Sep 2010 19:26:50 -0700 Guido van Rossum wrote: > > But thinking about this more I don't know that it will be easy to mix > PEP 3148, which is solidly thread-based, with a PEP 342 style > scheduler (whether or not the PEP 380 enhancements are applied, or > even PEP 3152). I'm not sure why. The implementation is certainly thread-based, but functions such as `wait(fs, timeout=None, return_when=ALL_COMPLETED)` could be implemented in termes of a single-threaded event loop / job scheduler. Actually, Twisted has a similar primitive in DeferredList, although more powerful since the DeferredList itself is a Deferred, and can therefore be further combined, etc.: http://twistedmatrix.com/documents/10.0.0/api/twisted.internet.defer.DeferredList.html > And comparing the > blog's examples to PEP 3148, I find Twisted's terminology rather > confusing compared to the PEP's clean Futures API (where IMO you can > ignore almost everything except result()). Well, apart from the API which may be considered a taste issue (I have used Deferreds long before I heard about Futures, so perhaps I'm a bit biased), the following API doc in PEP 3148 shows that the Future model of callbacks is less rich than Twisted's: ?add_done_callback(fn) Attaches a callable fn to the future that will be called when the future is cancelled or finishes running. fn will be called with the future as its only argument. Added callables are called in the order that they were added and are always called in a thread belonging to the process that added them. If the callable raises an Exception then it will be logged and ignored. If the callable raises another BaseException then behavior is not defined.? With Twisted Deferreds, when a callback or errback raises an error, its exception isn't ?logged and ignored?, it is passed to the remaining errback chain attached to the Deferred. This is part of what makes Deferreds more complicated to understand, but it also makes them more powerful. Another key point is that a callback can itself return another Deferred object, in which case the next callback (or errback, in case of error) will be called only once the other Deferred produces a result. This is all handled transparently and you can freely mix callbacks that immediately return a value, and callbacks that return a Deferred whose final value will be available later. And the other Deferred can have its own callback/errback chain, etc. (just for the record, the ?final value? of a Deferred is the value returned by the last callback in the chain) I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (well, almost: you can of course hack yourself some code which blocks until a Deferred has a value, but it's extremely discouraged). They would like to have officially supported methods like `result(timeout=None)` which make simple things (like quick scripts to fetch a bunch of URLs) simpler. 
Twisted is generally used for server applications where such code is out of question (in an async model, that is). Regards Antoine. From guido at python.org Sun Sep 12 17:49:56 2010 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Sep 2010 08:49:56 -0700 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: <20100912130338.714643f8@pitrou.net> References: <4C8AB874.9010703@openvpn.net> <20100912130338.714643f8@pitrou.net> Message-ID: On Sun, Sep 12, 2010 at 4:03 AM, Antoine Pitrou wrote: > On Sat, 11 Sep 2010 19:26:50 -0700 > Guido van Rossum wrote: >> >> But thinking about this more I don't know that it will be easy to mix >> PEP 3148, which is solidly thread-based, with a PEP 342 style >> scheduler (whether or not the PEP 380 enhancements are applied, or >> even PEP 3152). > > I'm not sure why. The implementation is certainly thread-based, but > functions such as `wait(fs, timeout=None, return_when=ALL_COMPLETED)` > could be implemented in termes of a single-threaded event loop / job > scheduler. Sure, but the tricky thing is to make it pluggable so that PEP 3148 and Twisted and other frameworks can use it all together, and a single call will accept a mixture of Futures. I also worry that "impure" code will have a hard time -- e.g. when mixing generator-based coroutines and thread-based futures, it would be quite bad if a coroutine called .result() on a Future or the .wait() function instead of yielding to the scheduler. > Actually, Twisted has a similar primitive in DeferredList, although > more powerful since the DeferredList itself is a Deferred, and can > therefore be further combined, etc.: > > http://twistedmatrix.com/documents/10.0.0/api/twisted.internet.defer.DeferredList.html This sounds similar to the way you can create derived futures in Java. >> And comparing the >> blog's examples to PEP 3148, I find Twisted's terminology rather >> confusing compared to the PEP's clean Futures API (where IMO you can >> ignore almost everything except result()). > > Well, apart from the API which may be considered a taste issue (I have > used Deferreds long before I heard about Futures, so perhaps I'm a bit > biased), I heard of Deferred long before PEP 3148 was even conceived, but I find Twisted's terminology terribly confusing while I find the PEP's names easy to understand. > the following API doc in PEP 3148 shows that the Future model > of callbacks is less rich than Twisted's: > > ?add_done_callback(fn) > > ? ?Attaches a callable fn to the future that will be called when the > ? ?future is cancelled or finishes running. fn will be called with the > ? ?future as its only argument. > > ? ?Added callables are called in the order that they were added and > ? ?are always called in a thread belonging to the process that added > ? ?them. If the callable raises an Exception then it will be logged > ? ?and ignored. If the callable raises another BaseException then > ? ?behavior is not defined.? > > With Twisted Deferreds, when a callback or errback raises an error, its > exception isn't ?logged and ignored?, it is passed to the remaining > errback chain attached to the Deferred. This is part of what makes > Deferreds more complicated to understand, but it also makes them more > powerful. Yeah, please do explain why Twisted has so much machinery to handle exceptions? ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value. So then the exceptions that might be raised are kind of "out of band". 
For any API that returns a value I agree that raising an exception should be handled -- but in the PEP 342 world we can do that by passing exceptions back into coroutine using throw(), so no separate "success" and "failure" callbacks are needed. > Another key point is that a callback can itself return another Deferred > object, in which case the next callback (or errback, in case of error) > will be called only once the other Deferred produces a result. This is > all handled transparently and you can freely mix callbacks that > immediately return a value, and callbacks that return a Deferred whose > final value will be available later. And the other Deferred can have > its own callback/errback chain, etc. Yeah, that is part of what makes it so utterly confusing. PEP 380 supports a similar thing but much cleaner, without ever using callbacks. > (just for the record, the ?final value? of a Deferred is the value > returned by the last callback in the chain) > > > I think the main reason, though, that people find Deferreds > inconvenient is that they force you to think in terms of > asynchronicity (well, almost: you can of course hack yourself > some code which blocks until a Deferred has a value, but it's > extremely discouraged). They would like to have officially > supported methods like `result(timeout=None)` which make simple things > (like quick scripts to fetch a bunch of URLs) simpler. Twisted is > generally used for server applications where such code is out of > question (in an async model, that is). Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out. That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sun Sep 12 18:17:51 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 12 Sep 2010 18:17:51 +0200 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object References: <4C8AB874.9010703@openvpn.net> <20100912130338.714643f8@pitrou.net> Message-ID: <20100912181751.2aa5bb32@pitrou.net> On Sun, 12 Sep 2010 08:49:56 -0700 Guido van Rossum wrote: > > Sure, but the tricky thing is to make it pluggable so that PEP 3148 > and Twisted and other frameworks can use it all together, and a single > call will accept a mixture of Futures. Having a common abstraction (Future or Deferred) allows for scheduling-agnostic libraries which consume and/or produce these abstractions (*). I'm not sure it is desireable to mix scheduling models in a single process (let alone a single thread), though. (*) Of course, the abstraction is somehow leaky since being called from different threads, depending on the scheduling model, could have adverse consequences > ISTM that the main difference is that add_done_callback() isn't meant > for callbacks that return a value. So then the exceptions that might > be raised are kind of "out of band". It implies that it's mostly useful for simple callbacks (which would e.g. print out a success report, or set an Event to wake up another thread). 
The Twisted model allows the major part of processing to occur in the callbacks themselves, in which case proper error handling and propagation is mandatory. Regards Antoine. From guido at python.org Sun Sep 12 18:48:20 2010 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Sep 2010 09:48:20 -0700 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: <20100912181751.2aa5bb32@pitrou.net> References: <4C8AB874.9010703@openvpn.net> <20100912130338.714643f8@pitrou.net> <20100912181751.2aa5bb32@pitrou.net> Message-ID: On Sun, Sep 12, 2010 at 9:17 AM, Antoine Pitrou wrote: > On Sun, 12 Sep 2010 08:49:56 -0700 > Guido van Rossum wrote: >> >> Sure, but the tricky thing is to make it pluggable so that PEP 3148 >> and Twisted and other frameworks can use it all together, and a single >> call will accept a mixture of Futures. > > Having a common abstraction (Future or Deferred) allows for > scheduling-agnostic libraries which consume and/or produce these > abstractions (*). I'm not sure it is desireable to mix scheduling models > in a single process (let alone a single thread), though. IIRC even Twisted supports putting stuff in a thread if you really need it. And have you looked at Go's Goroutines? They are a hybrid -- they don't map 1:1 to OS threads, but they aren't pure coroutines either, so that if a goroutine blocks on I/O the others will still make progress. > (*) Of course, the abstraction is somehow leaky since being called from > different threads, depending on the scheduling model, could have adverse > consequences Yeah, this is always a problem with pure async frameworks -- if one callback or coroutine blocks by mistake, the whole world is blocked. (So Goroutines attempt to fix this; I have no idea how successful they are.) >> ISTM that the main difference is that add_done_callback() isn't meant >> for callbacks that return a value. So then the exceptions that might >> be raised are kind of "out of band". > > It implies that it's mostly useful for simple callbacks (which would > e.g. print out a success report, or set an Event to wake up another > thread). The Twisted model allows the major part of processing to occur > in the callbacks themselves, in which case proper error handling and > propagation is mandatory. A generator-based coroutines approach can do this too (just put the work between the yields in the generator) and has all the proper exception-propagation stuff built in since PEP 342 (PEP 380 will just make it easier). And a Futures-based approach can do it too -- it's not described in PEP 3148, but you can easily design an API for wrappable Futures. -- --Guido van Rossum (python.org/~guido) From yoavglazner at gmail.com Mon Sep 13 14:09:23 2010 From: yoavglazner at gmail.com (yoav glazner) Date: Mon, 13 Sep 2010 14:09:23 +0200 Subject: [Python-ideas] Why not break cycles with one __del__? Message-ID: Hi! I was thinking, why not let python gc break cycles with only one object.__del__ ? I don't see a problem with calling the __del__ method and then proceed as usual (break the cycle if it wasn't already broken by __del__) Many Thanks, Yoav Glazner -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Mon Sep 13 18:16:36 2010 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 13 Sep 2010 12:16:36 -0400 Subject: [Python-ideas] Why not break cycles with one __del__? 
In-Reply-To:
References:
Message-ID:

On Mon, Sep 13, 2010 at 8:09 AM, yoav glazner wrote:
> why not let python gc break cycles with only one
> object.__del__ ?

If you can point to the code that prevents this, please report a bug.

The last time I checked, there were proposals to either add a
__close__ or weaken __del__ to handle multi-__del__ cycles -- but
single-__del__ cycles were already handled OK.

-jJ

From solipsis at pitrou.net  Mon Sep 13 19:05:49 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Sep 2010 19:05:49 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
References:
Message-ID: <20100913190549.15f218ce@pitrou.net>

On Mon, 13 Sep 2010 12:16:36 -0400
Jim Jewett wrote:
>
> The last time I checked, there were proposals to either add a
> __close__ or weaken __del__ to handle multi-__del__ cycles -- but
> single-__del__ cycles were already handled OK.

They aren't:

>>> class C(list):
...     def __del__(self): pass
...
>>> c = C()
>>> c.append(c)
>>> del c
>>> import gc
>>> gc.collect()
1
>>> gc.garbage
[[[...]]]
>>> type(gc.garbage[0])
<class '__main__.C'>

From tim.peters at gmail.com  Mon Sep 13 19:25:54 2010
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 13 Sep 2010 13:25:54 -0400
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <20100913190549.15f218ce@pitrou.net>
References: <20100913190549.15f218ce@pitrou.net>
Message-ID:

[Jim Jewett]
>> The last time I checked ...
>> single-__del__ cycles were already handled OK.

[Antoine Pitrou]
> They aren't: ...

Antoine's right, unless things have changed dramatically since last
time I was intimate with that code.  CPython's "cyclic garbage
detection" makes no attempt to analyze cycle structure.  It infers
that all trash it sees must be in cycles simply because the trash
hasn't already been collected by the regular refcount-based gc.  The
presence of __del__ on a trash object then disqualifies it from
further analysis, but there's no analysis of cycle structure
regardless.

Of course it doesn't _have_ to be that way.  Nobody cared enough yet
to add a pile of new code to special-case cycles with a single
__del__.

From benjamin at python.org  Mon Sep 13 21:22:02 2010
From: benjamin at python.org (Benjamin)
Date: Mon, 13 Sep 2010 19:22:02 +0000 (UTC)
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <20100913190549.15f218ce@pitrou.net>
Message-ID:

Tim Peters writes:
> Of course it doesn't _have_ to be that way. Nobody cared enough yet
> to add a pile of new code to special-case cycles with a single
> __del__.

And hopefully no one will. That would be very brittle.

From solipsis at pitrou.net  Mon Sep 13 22:28:08 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Sep 2010 22:28:08 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <20100913190549.15f218ce@pitrou.net>
Message-ID: <20100913222808.2459784a@pitrou.net>

On Mon, 13 Sep 2010 19:22:02 +0000 (UTC)
Benjamin wrote:
> Tim Peters writes:
> > Of course it doesn't _have_ to be that way. Nobody cared enough yet
> > to add a pile of new code to special-case cycles with a single
> > __del__.
>
> And hopefully no one will. That would be very brittle.

Why would it be?

From fuzzyman at voidspace.org.uk  Mon Sep 13 22:36:35 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 13 Sep 2010 21:36:35 +0100
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To:
References: <20100913190549.15f218ce@pitrou.net>
Message-ID:

On 13 September 2010 20:22, Benjamin wrote:

> Tim Peters writes:
> > Of course it doesn't _have_ to be that way. Nobody cared enough yet
> > to add a pile of new code to special-case cycles with a single
> > __del__.
>
> And hopefully no one will. That would be very brittle.

More brittle than what PyPy, IronPython (and presumably) Jython do?
(Which is make cycles collectable by arbitrarily breaking them IIUC.)

Michael

--
http://www.voidspace.org.uk

From yoavglazner at gmail.com  Mon Sep 13 22:56:09 2010
From: yoavglazner at gmail.com (yoav glazner)
Date: Mon, 13 Sep 2010 22:56:09 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To:
References: <20100913190549.15f218ce@pitrou.net>
Message-ID:

> And hopefully no one will. That would be very brittle

Why do you hope for that? that is the "one obvious way to do it"

From benjamin at python.org  Mon Sep 13 23:31:45 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 13 Sep 2010 21:31:45 +0000 (UTC)
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <20100913190549.15f218ce@pitrou.net>
 <20100913222808.2459784a@pitrou.net>
Message-ID:

Antoine Pitrou writes:
>
> On Mon, 13 Sep 2010 19:22:02 +0000 (UTC)
> Benjamin wrote:
> > Tim Peters writes:
> > > Of course it doesn't _have_ to be that way. Nobody cared enough yet
> > > to add a pile of new code to special-case cycles with a single
> > > __del__.
> >
> > And hopefully no one will. That would be very brittle.
>
> Why would it be?

Because if your cycle suddenly had more than one __del__, it would stop
being collected.

From ncoghlan at gmail.com  Mon Sep 13 23:39:00 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 14 Sep 2010 07:39:00 +1000
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To:
References: <20100913190549.15f218ce@pitrou.net>
Message-ID:

On Tue, Sep 14, 2010 at 3:25 AM, Tim Peters wrote:
> [Jim Jewett]
>>> The last time I checked ...
>>> single-__del__ cycles were already handled OK.
>
> [Antoine Pitrou]
>> They aren't: ...
>
> Antoine's right, unless things have changed dramatically since last
> time I was intimate with that code.  CPython's "cyclic garbage
> detection" makes no attempt to analyze cycle structure.  It infers
> that all trash it sees must be in cycles simply because the trash
> hasn't already been collected by the regular refcount-based gc.  The
> presence of __del__ on a trash object then disqualifies it from
> further analysis, but there's no analysis of cycle structure
> regardless.

I had a skim through that code last night, and as far as I can tell it
still works that way. However, it should be noted that the cyclic GC
actually does release everything *else* in the cycle - it's solely the
objects with __del__ methods that remain alive.

There does appear to be a *little* bit of structural analysis going on -
it looks like the "finalizers" list ends up containing both objects with
__del__ methods, as well as all other objects in the cyclic trash that
are reachable from the objects with __del__ methods.
> Of course it doesn't _have_ to be that way. ?Nobody cared enough yet > to add a pile of new code to special-case cycles with a single > __del__. Just from skimming the code, I wonder if, once finalizers has been figured out, the GC could further partition that list into "to_delete" (no __del__ method), "to_finalize" (__del__ method, but all referrers in cycle have no __del__ method) and "uncollectable" (multiple __del__ methods in cycle). Alternatively, when building finalizers, build two lists: one for objects with __del__ methods and one for objects that are reachable from objects with __del__ methods. Objects that appear only in the first list could safely have their finalisers invoked, while those that also in the latter could not. This is definitely a case of "code talks" though - there's no fundamental problem with the idea, but also no great incentive for anyone to code it when __del__ is comparatively easy to avoid (although not trivial, see Raymond's recent modifications to OrderedDictionary to avoid exactly this issue). Or, accept that __del__ is evil, and try to come up with a workable proposal for that better weakref callback based scheme Jim mentioned. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From greg.ewing at canterbury.ac.nz Tue Sep 14 04:44:25 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Sep 2010 14:44:25 +1200 Subject: [Python-ideas] Why not break cycles with one __del__? In-Reply-To: References: <20100913190549.15f218ce@pitrou.net> Message-ID: <4C8EE189.40408@canterbury.ac.nz> Nick Coghlan wrote: > Alternatively, when building finalizers, build two > lists: one for objects with __del__ methods and one for objects that > are reachable from objects with __del__ methods. But since it's a cycle, isn't *everything* in the cycle going to be reachable from everything else? -- Greg From tim.peters at gmail.com Tue Sep 14 05:04:08 2010 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 13 Sep 2010 23:04:08 -0400 Subject: [Python-ideas] Why not break cycles with one __del__? In-Reply-To: <4C8EE189.40408@canterbury.ac.nz> References: <20100913190549.15f218ce@pitrou.net> <4C8EE189.40408@canterbury.ac.nz> Message-ID: [Nick Coghlan] >> Alternatively, when building finalizers, build two >> lists: one for objects with __del__ methods and one for objects that >> are reachable from objects with __del__ methods. [Greg Ewing] > But since it's a cycle, isn't *everything* in the cycle > going to be reachable from everything else? Note that I was sloppy in saying that CPython's cyclic gc only sees trash objects in cycles. More accurately, it sees trash objects in cycles, and objects (which may or may not be in cycles) reachable only from trash objects in cycles. For example, if objects A and B point to each other, that's a cycle. If A also happens to point to D, where D has a __del__ method, and nothing else points to D, then that's a case where D is not in a cycle, but is nevertheless trash if A and B are trash. And if A and B lack finalizers, then CPython's cyclic gc will reclaim D, despite that it does have a __del__. That pattern is exploitable too. If, e.g., you have some resource R that needs to be cleaned up, owned by an object A that may participate in cycles, it's often possible to put R in a different, very simple object with a __del__ method, and have A point to that latter object instead. 
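A minimal sketch of that wrapper pattern (hypothetical names; the point is only that the object carrying __del__ never itself belongs to a cycle):

    class _ResourceReleaser:
        # Owns nothing except the resource, so it can never be part of
        # the cycles its owner gets into.
        def __init__(self, resource):
            self.resource = resource
        def __del__(self):
            self.resource.close()

    class Owner:
        # Owner may end up in reference cycles, but it has no __del__ of
        # its own; when a trash cycle containing it is collected, the
        # releaser is reclaimed too and its __del__ closes the resource.
        def __init__(self, resource):
            self._releaser = _ResourceReleaser(resource)

The wrapper deliberately stores nothing but the resource; giving it a back-reference to its owner would pull it into the cycle and defeat the point.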
From guido at python.org Tue Sep 14 05:07:10 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Sep 2010 20:07:10 -0700 Subject: [Python-ideas] Why not break cycles with one __del__? In-Reply-To: References: <20100913190549.15f218ce@pitrou.net> <4C8EE189.40408@canterbury.ac.nz> Message-ID: On Mon, Sep 13, 2010 at 8:04 PM, Tim Peters wrote: > [Nick Coghlan] >>> Alternatively, when building finalizers, build two >>> lists: one for objects with __del__ methods and one for objects that >>> are reachable from objects with __del__ methods. > > [Greg Ewing] >> But since it's a cycle, isn't *everything* in the cycle >> going to be reachable from everything else? > > Note that I was sloppy in saying that CPython's cyclic gc only sees > trash objects in cycles. ?More accurately, it sees trash objects in > cycles, and objects (which may or may not be in cycles) reachable only > from trash objects in cycles. ?For example, if objects A and B point > to each other, that's a cycle. ?If A also happens to point to D, where > D has a __del__ method, and nothing else points to D, then that's a > case where D is not in a cycle, but is nevertheless trash if A and B > are trash. ?And if A and B lack finalizers, then CPython's cyclic gc > will reclaim D, despite that it does have a __del__. > > That pattern is exploitable too. ?If, e.g., you have some resource R > that needs to be cleaned up, owned by an object A that may participate > in cycles, it's often possible to put R in a different, very simple > object with a __del__ method, and have A point to that latter object > instead. Yeah, I think we even recommended this pattern at some point. ISTR we designed the new io library to exploit it. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Sep 14 06:16:37 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Sep 2010 16:16:37 +1200 Subject: [Python-ideas] Using * in indexes Message-ID: <4C8EF725.3050807@canterbury.ac.nz> I just found myself writing a method like this: def __getitem__(self, index): return self.data[(Ellipsis,) + index + (slice(),)] I would have liked to write it like this: self.data[..., index, :] because that would make it much easier to see what's being done. However, that won't work if index is itself a tuple of index elements. So I'd like to be able to do this: self.data[..., *index, :] -- Greg From scott+python-ideas at scottdial.com Tue Sep 14 07:12:37 2010 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Tue, 14 Sep 2010 01:12:37 -0400 Subject: [Python-ideas] Why not break cycles with one __del__? In-Reply-To: References: <20100913190549.15f218ce@pitrou.net> <4C8EE189.40408@canterbury.ac.nz> Message-ID: <4C8F0445.2000905@scottdial.com> On 9/13/2010 11:07 PM, Guido van Rossum wrote: > On Mon, Sep 13, 2010 at 8:04 PM, Tim Peters wrote: >> [Nick Coghlan] >>>> Alternatively, when building finalizers, build two >>>> lists: one for objects with __del__ methods and one for objects that >>>> are reachable from objects with __del__ methods. >> >> [Greg Ewing] >>> But since it's a cycle, isn't *everything* in the cycle >>> going to be reachable from everything else? >> >> That pattern is exploitable too. If, e.g., you have some resource R >> that needs to be cleaned up, owned by an object A that may participate >> in cycles, it's often possible to put R in a different, very simple >> object with a __del__ method, and have A point to that latter object >> instead. 
> > Yeah, I think we even recommended this pattern at some point. ISTR we > designed the new io library to exploit it. > Yes, this topic came up some while back on this list and Tim's solution is exactly the design pattern I suggested then: http://mail.python.org/pipermail/python-ideas/2009-October/006222.html -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From ncoghlan at gmail.com Tue Sep 14 11:51:19 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Sep 2010 19:51:19 +1000 Subject: [Python-ideas] Why not break cycles with one __del__? In-Reply-To: <4C8EE189.40408@canterbury.ac.nz> References: <20100913190549.15f218ce@pitrou.net> <4C8EE189.40408@canterbury.ac.nz> Message-ID: On Tue, Sep 14, 2010 at 12:44 PM, Greg Ewing wrote: > Nick Coghlan wrote: >> >> Alternatively, when building finalizers, build two >> lists: one for objects with __del__ methods and one for objects that >> are reachable from objects with __del__ methods. > > But since it's a cycle, isn't *everything* in the cycle > going to be reachable from everything else? In addition to what Tim said, there may be more than one cycle being collected. So you can have situations like objects, A, B C in one cycle and D, E, F in a different cycle. Suppose A, B and D all have __del__ methods. Then your two lists would be: __del__ method: A, B, D Reachable from objects with __del__ method: A, B, C, E, F It's just another way of viewing what the OP described: cycles containing only a single object with __del__ don't actually have an ordering problem, so you can just call it before you destroy any of the objects. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From mikegraham at gmail.com Tue Sep 14 15:54:49 2010 From: mikegraham at gmail.com (Mike Graham) Date: Tue, 14 Sep 2010 09:54:49 -0400 Subject: [Python-ideas] Using * in indexes In-Reply-To: <4C8EF725.3050807@canterbury.ac.nz> References: <4C8EF725.3050807@canterbury.ac.nz> Message-ID: On Tue, Sep 14, 2010 at 12:16 AM, Greg Ewing wrote: > I just found myself writing a method like this: > > ?def __getitem__(self, index): > ? ?return self.data[(Ellipsis,) + index + (slice(),)] > > I would have liked to write it like this: > > ? self.data[..., index, :] > > because that would make it much easier to see what's > being done. However, that won't work if index is itself > a tuple of index elements. > > So I'd like to be able to do this: > > ? self.data[..., *index, :] If in indexes, why not when making other tuples? Mike From alexander.belopolsky at gmail.com Tue Sep 14 16:09:05 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 14 Sep 2010 10:09:05 -0400 Subject: [Python-ideas] Using * in indexes In-Reply-To: References: <4C8EF725.3050807@canterbury.ac.nz> Message-ID: On Tue, Sep 14, 2010 at 9:54 AM, Mike Graham wrote: .. >> So I'd like to be able to do this: >> >> ? self.data[..., *index, :] > > If in indexes, why not when making other tuples? 
I believe this and other unpacking generalizations are implemented in issue #2292: http://bugs.python.org/issue2292 From greg.ewing at canterbury.ac.nz Wed Sep 15 00:15:10 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Sep 2010 10:15:10 +1200 Subject: [Python-ideas] Using * in indexes In-Reply-To: References: <4C8EF725.3050807@canterbury.ac.nz> Message-ID: <4C8FF3EE.9020209@canterbury.ac.nz> Mike Graham wrote: > On Tue, Sep 14, 2010 at 12:16 AM, Greg Ewing > wrote: > >> self.data[..., *index, :] > > If in indexes, why not when making other tuples? It would be handy to be able to use it when making other tuples, yes. There's a particularly strong motivation for it in relation to indexes, though, because otherwise you not only end up having to use ugly (foo,) constructs, but you lose the ability to use any of the special indexing syntax. There's also a performance penalty if you end up having to look up 'slice' a bunch of times. -- Greg From greg.ewing at canterbury.ac.nz Wed Sep 15 00:16:22 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Sep 2010 10:16:22 +1200 Subject: [Python-ideas] Using * in indexes In-Reply-To: References: <4C8EF725.3050807@canterbury.ac.nz> Message-ID: <4C8FF436.40305@canterbury.ac.nz> Alexander Belopolsky wrote: > I believe this and other unpacking generalizations are implemented in > issue #2292: http://bugs.python.org/issue2292 Yes, it appears so. Did a PEP for that ever materialise, or is everyone waiting until after the moratorium? -- Greg From tjreedy at udel.edu Wed Sep 15 06:23:18 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 15 Sep 2010 00:23:18 -0400 Subject: [Python-ideas] Using * in indexes In-Reply-To: <4C8FF436.40305@canterbury.ac.nz> References: <4C8EF725.3050807@canterbury.ac.nz> <4C8FF436.40305@canterbury.ac.nz> Message-ID: On 9/14/2010 6:16 PM, Greg Ewing wrote: > Alexander Belopolsky wrote: > >> I believe this and other unpacking generalizations are implemented in >> issue #2292: http://bugs.python.org/issue2292 > > Yes, it appears so. Did a PEP for that ever materialise, > or is everyone waiting until after the moratorium? The only PEP I know of is the one for what has been done: http://www.python.org/dev/peps/pep-3132/ Extended Iterable Unpacking -- Terry Jan Reedy From glyph at twistedmatrix.com Wed Sep 15 23:56:52 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 15 Sep 2010 17:56:52 -0400 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: References: <4C8AB874.9010703@openvpn.net> Message-ID: <9AF93392-544C-4539-98B2-19DB2563172D@twistedmatrix.com> Thanks for the ping about this (I don't think I subscribe to python-ideas, so someone may have to moderate my post in). Sorry for the delay in responding, but I've been kinda busy and cooking up these examples took a bit of thinking. And thanks, James, for restarting this discussion. I obviously find it interesting :). I'm going to mix in some other stuff I found on the web archives, since it's easiest just to reply in one message. I'm sorry that this response is a bit sprawling and doesn't have a single clear narrative, the thread thus far didn't seem to lend it to one. For those of you who don't want to read my usual novel-length post, you can probably stop shortly after the end of the first block of code examples. 
On Sep 11, 2010, at 10:26 PM, Guido van Rossum wrote: >>> although he didn't say what >>> deferreds really added beyond what futures provide, and why the >>> "add_done_callback" method isn't adequate to provide interoperability >>> between futures and deferreds (which would be odd, since Brian made >>> changes to that part of PEP 3148 to help with that interoperability >>> after discussions with Glyph). >>> >>> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope >>> for standardisation in this space though. >>> >>> Cheers, >>> Nick. >> >> That was my initial reaction as well, but I'm more than open to >> hearing from Jean Paul/Glyph and the other twisted folks on this. > But thinking about this more I don't know that it will be easy to mix > PEP 3148, which is solidly thread-based, with a PEP 342 style > scheduler (whether or not the PEP 380 enhancements are applied, or > even PEP 3152). And if we take the OP's message at face value, his > point isn't so much that Twisted is great, but that in order to > benefit maximally from PEP 342 there needs to be a standard way of > using callbacks. I think that's probably true. And comparing the > blog's examples to PEP 3148, I find Twisted's terminology rather > confusing compared to the PEP's clean Futures API (where IMO you can > ignore almost everything except result()). That blog post was written to demonstrate why programs using generators are "... far easier to read and write ..." than ones using Deferreds, so it stands to reason it would choose an example where that helps :). When you want to write systems that manage varying levels of parallelism within a single computation, generators can start to get pretty hairy and the "normal" Deferred way of doing things looks more straightforward. Thinking in terms of asynchronicity is tricky, and generators can be a useful tool for promoting that understanding, but they only make it superficially easier. For example: >>> def serial(): >>> results = set() >>> for x in ...: >>> results.add((yield do_something_async(x))) >>> return results If you're writing an application whose parallelism calls for an asynchronous approach, after all, you presumably don't want to be standing around waiting for each network round trip to complete. How do you re-write this so that there are always at least N outstanding do_something_async calls running in parallel? You can sorta do it like this: >>> def parallel(N): >>> results = set() >>> outstanding = [] >>> for x in ...: >>> if len(outstanding) > N: >>> results.add((yield outstanding.pop(0))) >>> else: >>> outstanding.append(do_something_async(x)) but that will always block on one particular do_something_async, when you really want to say "let me know when any outstanding call is complete". So I could handwave about 'yield any_completed(outstanding)'... >>> def parallel(N): >>> results = set() >>> outstanding = set() >>> for x in ...: >>> if len(outstanding) > N: >>> results.add((yield any_completed(outstanding))) >>> else: >>> outstanding.add(do_something_async(x)) but that just begs the question of how you implement any_completed(), and I can't think of a way to do that with generators, without getting into the specifics of some Deferred-or-Future-like asynchronous result object. 
You could implement such a function with such primitives, and here's what it looks like with Deferreds: >>> def any_completed(setOfDeferreds): >>> d = Deferred() >>> called = [] >>> def fireme(result, whichDeferred): >>> if not called: >>> called.append(True) >>> setOfDeferreds.remove(whichDeferred) >>> d.callback(result) >>> return result >>> for subd in setOfDeferreds: >>> subd.addBoth(fireme, subd) >>> return d Here's how you do the top-level task in Twisted, without generators, in the truly-parallel fashion (keep in mind this combines the functionality of 'any_completed' and 'parallel', so it's a bit shorter): >>> def parallel(N): >>> ds = DeferredSemaphore(N) >>> l = [] >>> def release(result): >>> ds.release() >>> return result >>> def after(sem, it): >>> return do_something_async(it) >>> for x in ...: >>> l.append(ds.acquire().addCallback(after_acquire, x).addBoth(release)) >>> return gatherResults(l).addCallback(set) Some informal benchmarking has shown this method to be considerably faster (on the order of 1/2 to 1/3 as much CPU time) than at least our own inlineCallbacks generator-scheduling method. Take this with the usual fist-sized grain of salt that you do any 'informal' benchmarks, but the difference is significant enough that I do try to refactor into this style in my own code, and I have seen performance benefits from doing this on more specific benchmarks. This is all untested, and that's far too many lines of code to expect to work without testing, but hopefully it gives a pretty good impression of the differences in flavor between the different styles. > Yeah, please do explain why Twisted has so much machinery to handle exceptions? There are a lot of different implied questions here, so I'll answer a few of those. Why does twisted.python.failure exist? The answer to that is that we wanted an object that represented an exception as raised at a particular point, associated with a particular stack, that could live on without necessarily capturing all the state in that stack. If you're going to report failures asynchronously, you don't necessarily want to hold a reference to every single thing in a potentially giant stack while you're waiting to send it to some network endpoint. Also, in 1.5.2 we had no way of chaining exceptions, and this code is that old. Finally, even if you can chain exceptions, it's a serious performance hit to have to re-raise and re-catch the same exception 4 or 5 times in order to translate it or handle it at many different layers of the stack, so a Failure is intended to encapsulate that state such that it can just be returned, in performance-sensitive areas. (This is sort of a weak point though, since the performance of Failure itself is so terrible, for unrelated reasons.) Why is twisted.python.failure such a god damned mess? The answer to that is ... uh, sorry. Yes, it is. We should clean it up. It was written a long time ago and the equivalent module now could be _much_ shorter, simpler, and less of a performance problem. It just never seems to be the highest priority. Maybe after we're done porting to py3 :). My one defense here is that still a slight improvement over the stdlib 'traceback' module ;-). Why do Deferreds have an errback chain rather than just handing you an exception object in the callback chain? Basically, this is for the same reason that Python has exceptions instead of just making you check return codes. We wanted it to be easy to say: >>> d = getPage("http://...") >>> def ok(page): >>> doSomething(...) 
>>> d.addCallback(ok) and know that the argument to 'ok' would always be what getPage promised (you don't need to typecheck it for exception-ness) and the default error behavior would be to simply bail out with a traceback, not to barrel through your success-path code wreaking havoc. > ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value. add_done_callback works fine with callbacks that return a value. If it didn't, I'd be concerned, because then it would have the barrel-through-the-success-path flaw. But, I assume the idiomatic asynchronous-code-using-Futures would look like this: >>> f = some_future_thing(...) >>> def my_callback(future): >>> result = future.result() >>> do_something(result) >>> f.add_done_callback(my_callback) This is one extra line of code as compared to the Twisted version, and chaining involves a bit more gymnastics (somehow creating more futures to return further up the stack, I guess, I haven't thought about it too hard), but it does allow you to handle exceptions with a simple 'except:', rather than calling some exception-handling methods, so I can see why some people would prefer it. > Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward. You've already addressed the main point that I really wanted to mention here, but I'd like to emphasize it. Blocking and not-blocking are fundamentally different programming styles, and if you sometimes allow blocking on asynchronous results, that means you are effectively always programming in the blocking-and-threaded style and not getting much benefit from the code which does choose to be politely non-blocking. I was somewhat pleased with the changes made to the Futures PEP because you could use them as an asynchronous result, and have things that implemented the Future API but raised an exception if you tried to wait on them. That would at least allow some layer of stdlib compatibility. If you are disciplined and careful, this would let you write async code which used a common interoperability mechanism, and if you weren't careful, it would blow up when you tried to use it the wrong way. But - and I am guessing that this is the main thrust of this discussion - I do think that having Deferred in the standard library would be much, much better if we can do that. > So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based. Having something "generator-based" is, in my opinion, an abstraction inversion. The things which you are yielding from these generators are asynchronous results. There should be a specific type for asynchronous results which can be easily interacted with. Generators are syntactic sugar for doing that interaction in a way which doesn't involve defining tons of little functions. This is useful, and it makes the concept more accessible, so I don't say "just" syntactic sugar: but nevertheless, the generators need to be 'yield'ing something, and the type of thing that they're yielding is a Deferred-or-something-like-it. I don't think that this is really two 'Future-like APIs'. At least, they're not redundant, any more than having both socket.makefile() and socket.recv() is redundant. 
If Future had a deferred() method rather than an add_done_callback() method, then it would always be very clear whether you had a synchronous-but-possibly-not-ready or a purely-asynchronous result. Although it would be equally easy to just have a function that turned a Future into a Deferred by calling add_done_callback(). You can go from any arbitrary Future to a full-featured Deferred, but not the other way around. > Or maybe PEP 3152. I don't like PEP 3152 aesthetically on many levels, but I can't deny that it would do the job. 'cocall', though, really? It would be nice if it read like an actual word, i.e. "yield to" or "invoke" or even just "call" or something. In another message, where Guido is replying to Antoine: >> I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (...) > > Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out. That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342. I really have to go with Antoine on this one: people were confused about Deferreds long before PEP 342 came along :). Given that Javascript environments have mostly adopted the Twisted terminology (oddly, Node.js doesn't, but Dojo and MochiKit both have pretty literal-minded Deferred translations), there are plenty of people who are familiar with the terminology but still get confused. See the beginning of the message for why we're not deprecating our own APIs. Once again, sorry for not compressing this down further! If you got this far, you win a prize :). -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Thu Sep 16 00:13:23 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 15 Sep 2010 18:13:23 -0400 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: <20100915220952.2058.14020740.divmod.xquotient.544@localhost.localdomain> References: <4C8AB874.9010703@openvpn.net> <9AF93392-544C-4539-98B2-19DB2563172D@twistedmatrix.com> <20100915220952.2058.14020740.divmod.xquotient.544@localhost.localdomain> Message-ID: On Sep 15, 2010, at 6:09 PM, exarkun at twistedmatrix.com wrote: > > Glyph meant this: > > def parallel(N): > ds = DeferredSemaphore(N) > l = [] > for x in ...: > l.append(ds.run(do_something_async, it)) > return gatherResults(l).addCallback(set) > > Jean-Paul I knew it should have looked shorter and sweeter. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel at stutzbachenterprises.com Thu Sep 16 17:35:14 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 16 Sep 2010 10:35:14 -0500 Subject: [Python-ideas] list.sort with a int or str key Message-ID: list.sort, sorted, and similar methods currently have a "key" argument that accepts a callable. 
Often, that leads to code looking like this: mylist.sort(key=lambda x: x[1]) myotherlist.sort(key=lambda x: x.length) I would like to propose that the "key" parameter be generalized to accept str and int types, so the above code could be rewritten as follows: mylist.sort(key=1) myotherlist.sort(key='length') I find the latter to be much more readable. As a bonus, performance for those cases would also improve. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwm-keyword-python.b4bdba at mired.org Thu Sep 16 17:41:37 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Thu, 16 Sep 2010 11:41:37 -0400 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: Message-ID: <20100916114137.51f6f90e@bhuda.mired.org> On Thu, 16 Sep 2010 10:35:14 -0500 Daniel Stutzbach wrote: > list.sort, sorted, and similar methods currently have a "key" argument that > accepts a callable. Often, that leads to code looking like this: > > mylist.sort(key=lambda x: x[1]) > myotherlist.sort(key=lambda x: x.length) > > I would like to propose that the "key" parameter be generalized to accept > str and int types, so the above code could be rewritten as follows: > > mylist.sort(key=1) > myotherlist.sort(key='length') -1 I think the idiom using the operator module tools: mylist.sort(key=itemgetter(1)) mylist.sort(key=attrgetter('length')) is more readable than your proposal - it makes what's going on explicit. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From guido at python.org Thu Sep 16 17:44:15 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Sep 2010 08:44:15 -0700 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: Message-ID: On Thu, Sep 16, 2010 at 8:35 AM, Daniel Stutzbach wrote: > list.sort, sorted, and similar methods currently have a "key" argument that > accepts a callable.? Often, that leads to code looking like this: > > mylist.sort(key=lambda x: x[1]) > myotherlist.sort(key=lambda x: x.length) > > I would like to propose that the "key" parameter be generalized to accept > str and int types, so the above code could be rewritten as follows: > > mylist.sort(key=1) > myotherlist.sort(key='length') > > I find the latter to be much more readable. -1. I think this is too cryptic. > As a bonus, performance for those cases would also improve. Have you measured this? Remember that the key function is only called N times while the number of comparisons (using the values returned from the key function) is O(N log N). -- --Guido van Rossum (python.org/~guido) From robert.kern at gmail.com Thu Sep 16 17:51:55 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 16 Sep 2010 10:51:55 -0500 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: Message-ID: On 9/16/10 10:35 AM, Daniel Stutzbach wrote: > list.sort, sorted, and similar methods currently have a "key" argument that > accepts a callable. Often, that leads to code looking like this: > > mylist.sort(key=lambda x: x[1]) > myotherlist.sort(key=lambda x: x.length) > > I would like to propose that the "key" parameter be generalized to accept str > and int types, so the above code could be rewritten as follows: > > mylist.sort(key=1) > myotherlist.sort(key='length') > > I find the latter to be much more readable. 
As a bonus, performance for those > cases would also improve. I find the latter significantly less readable because they are special cases that I need to remember. Right now, you can achieve the performance and arguably better readability using operator.itemgetter() and operator.attrgetter(): from operator import attrgetter, itemgetter mylist.sort(key=itemgetter(1)) myotherlist.sort(key=attrgetter('length')) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From bruce at leapyear.org Thu Sep 16 18:05:53 2010 From: bruce at leapyear.org (Bruce Leban) Date: Thu, 16 Sep 2010 09:05:53 -0700 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: Message-ID: -1 key='length' could reasonably mean lambda a:a.length or lambda a:a['length'] an explicit lambda or itemgetter/attrgetter is clearer. --- Bruce http://www.vroospeak.com http://j.mp/gruyere-security On Thu, Sep 16, 2010 at 8:35 AM, Daniel Stutzbach < daniel at stutzbachenterprises.com> wrote: > list.sort, sorted, and similar methods currently have a "key" argument that > accepts a callable. Often, that leads to code looking like this: > > mylist.sort(key=lambda x: x[1]) > myotherlist.sort(key=lambda x: x.length) > > I would like to propose that the "key" parameter be generalized to accept > str and int types, so the above code could be rewritten as follows: > > mylist.sort(key=1) > myotherlist.sort(key='length') > > I find the latter to be much more readable. As a bonus, performance for > those cases would also improve. > -- > Daniel Stutzbach > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Sep 16 18:11:29 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 16 Sep 2010 18:11:29 +0200 Subject: [Python-ideas] list.sort with a int or str key References: Message-ID: <20100916181129.5e39c6d4@pitrou.net> On Thu, 16 Sep 2010 10:35:14 -0500 Daniel Stutzbach wrote: > list.sort, sorted, and similar methods currently have a "key" argument that > accepts a callable. Often, that leads to code looking like this: > > mylist.sort(key=lambda x: x[1]) > myotherlist.sort(key=lambda x: x.length) > > I would like to propose that the "key" parameter be generalized to accept > str and int types, so the above code could be rewritten as follows: > > mylist.sort(key=1) > myotherlist.sort(key='length') It is not obvious whether key='length' should use __getitem__ or __getattr__. Your example claims attribute lookup but an indexed lookup would be more consistent with key=1. I'm quite skeptical towards this. Special cases make things harder to remember, and foreign code more difficult to read. Regards Antoine. From daniel at stutzbachenterprises.com Thu Sep 16 18:12:37 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 16 Sep 2010 11:12:37 -0500 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: Message-ID: Since most everyone else finds it less readable, I withdraw the proposal. Thanks for the feedback, -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raymond.hettinger at gmail.com Thu Sep 16 20:28:32 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 16 Sep 2010 11:28:32 -0700 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: Message-ID: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com> On Sep 16, 2010, at 8:35 AM, Daniel Stutzbach wrote: > list.sort, sorted, and similar methods currently have a "key" argument that accepts a callable. Often, that leads to code looking like this: > > mylist.sort(key=lambda x: x[1]) > myotherlist.sort(key=lambda x: x.length) > > I would like to propose that the "key" parameter be generalized to accept str and int types, so the above code could be rewritten as follows: > > mylist.sort(key=1) > myotherlist.sort(key='length') -1 The key= parameter is a protocol that is used across multiple tools min(). max(), groupby(), nmallest(), nlargest(), etc. All of those would need to change to stay in-sync. > I find the latter to be much more readable. It also becomes harder to learn. Multiple signatures (int or str or other callable) create more problems that they solve. > As a bonus, performance for those cases would also improve. ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller(). Also, those three tools are already more flexible than the proposal, for example: attrgetter('lastname', 'firstname') # key = lambda r: (r.lastname, r.firstname) itemgetter(0, 7) # key = lambda r: (r[0], r[7]) methodcaller('get_stats', 'size') # key = lambda r: r.get_stats('size') We've already got a way to do it, so the proposal is basically about saving a few characters in exchange for complexifying the protocol with a form of multiple dispatch. Raymond From tjreedy at udel.edu Fri Sep 17 05:11:22 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 16 Sep 2010 23:11:22 -0400 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com> References: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com> Message-ID: On 9/16/2010 2:28 PM, Raymond Hettinger wrote: > The key= parameter is a protocol that is used across multiple tools min(). max(), groupby(), nmallest(), nlargest(), etc. All of those would need to change to stay in-sync. ... > ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller(). Also, those three tools are already more flexible than the proposal, for example: > > attrgetter('lastname', 'firstname') # key = lambda r: (r.lastname, r.firstname) > itemgetter(0, 7) # key = lambda r: (r[0], r[7]) > methodcaller('get_stats', 'size') # key = lambda r: r.get_stats('size') It is easy to not know about these. I think the doc set could usefully use an expanded entry on *key functions* (that would be a cross-reference link) that includes examples like the above. Currently, for example, the min entry has "The optional keyword-only key argument specifies a one-argument ordering function like that used for list.sort()." but there is no link and going to list.sort only adds "that is used to extract a comparison key from each list element: key=str.lower. The default value is None." Perhaps we could expand that and make the existing cross-references into links. 
-- Terry Jan Reedy From masklinn at masklinn.net Fri Sep 17 06:49:21 2010 From: masklinn at masklinn.net (Masklinn) Date: Fri, 17 Sep 2010 10:19:21 +0530 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com> Message-ID: On 2010-09-17, at 08:41 , Terry Reedy wrote: > On 9/16/2010 2:28 PM, Raymond Hettinger wrote: >> The key= parameter is a protocol that is used across multiple tools min(). max(), groupby(), nmallest(), nlargest(), etc. All of those would need to change to stay in-sync. > ... > >> ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller(). Also, those three tools are already more flexible than the proposal, for example: >> >> attrgetter('lastname', 'firstname') # key = lambda r: (r.lastname, r.firstname) >> itemgetter(0, 7) # key = lambda r: (r[0], r[7]) >> methodcaller('get_stats', 'size') # key = lambda r: r.get_stats('size') > > It is easy to not know about these. I think the doc set could usefully use an expanded entry on *key functions* (that would be a cross-reference link) that includes examples like the above. +1, in my experience, the operator module in general is fairly unknown and the attrgetter/itemgetter/methodcaller family criminally so. It doesn't help that they're kind-of lost in a big bunch of text at the very bottom of the module. From raymond.hettinger at gmail.com Fri Sep 17 11:04:04 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 17 Sep 2010 02:04:04 -0700 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com> Message-ID: <98438F80-5D4F-48D1-B7E3-37E991F65ED1@gmail.com> >> ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller(). Also, those three tools are already more flexible than the proposal, for example: >> >> attrgetter('lastname', 'firstname') # key = lambda r: (r.lastname, r.firstname) >> itemgetter(0, 7) # key = lambda r: (r[0], r[7]) >> methodcaller('get_stats', 'size') # key = lambda r: r.get_stats('size') > > It is easy to not know about these. FWIW, those and other sorting related topics are covered in the sorting-howto: http://wiki.python.org/moin/HowTo/Sorting/ We link to that from the main docs for sorted(): http://docs.python.org/library/functions.html#sorted > I think the doc set could usefully use an expanded entry on *key functions* That might also make a useful entry to the glossary. Raymond P.S. I don't know that it applies here but one limitation of the docs is that they can get too voluminous. Already, it is a significant time investment just to read the doc page on builtin functions. You can kill a whole afternoon just reading the docs for unittest and logging. The gestalt of the language gets lost when the docs get too fat. Instead, I like the howto write-ups because they bring together many thoughts on a single topic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Sep 17 14:14:23 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Sep 2010 22:14:23 +1000 Subject: [Python-ideas] list.sort with a int or str key In-Reply-To: References: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com> Message-ID: On Fri, Sep 17, 2010 at 1:11 PM, Terry Reedy wrote: > It is easy to not know about these. 
I think the doc set could usefully use > an expanded entry on *key functions* (that would be a cross-reference link) > that includes examples like the above. Currently, for example, the min entry > has "The optional keyword-only key argument specifies a one-argument > ordering function like that used for list.sort()." but there is no link and > going to list.sort only adds "that is used to extract a comparison key from > each list element: key=str.lower. The default value is None." Perhaps we > could expand that and make the existing cross-references into links. Tracker issue to capture this idea: http://bugs.python.org/issue9886 Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From lie.1296 at gmail.com Fri Sep 17 16:40:48 2010 From: lie.1296 at gmail.com (Lie Ryan) Date: Sat, 18 Sep 2010 00:40:48 +1000 Subject: [Python-ideas] Cofunctions: It's alive! Its alive! In-Reply-To: References: <4C5D0759.30606@canterbury.ac.nz> <4C60FE37.2020303@canterbury.ac.nz> Message-ID: On 08/11/10 01:57, Guido van Rossum wrote: > - Would it be sufficient if codef was a decorator instead of a > keyword? (This new keyword in particular chafes me, since we've been > so successful at overloading 'def' for so many meanings -- functions, > methods, class methods, static methods, properties...) +1. I'd like to see this implemented as decorator (perhaps with special casing by the VM if necessary), and see how this cofunction will be used in wider practice before deciding whether the syntax sugar is necessary. The decorator could live as a built-in function or as stdlib module (from cofunction import cofunction), and be clearly marked as experimental. From raymond.hettinger at gmail.com Fri Sep 17 21:44:53 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 17 Sep 2010 12:44:53 -0700 Subject: [Python-ideas] New 3.x restriction in list comprehensions Message-ID: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> In Python2, you can transform: r = [] for x in 2, 4, 6: r.append(x*x+1) into: r = [x*x+1 for x in 2, 4, 6] In Python3, the first still works but the second gives a SyntaxError. It wants the 2, 4, 6 to have parentheses. The good parts of the change: + it matches what genexps do + that simplifies the grammar a bit (listcomps bodies and genexp bodies) + a listcomp can be reliably transformed to a genexp The bad parts: + The restriction wasn't necessary (we could undo it) + It makes 2-to-3 conversion a bit harder + It no longer parallels other paren-free tuple constructions: return x, y yield x, y t = x, y ... + It particular, it no longer parallels regular for-loop syntax The last part is the one that seems the most problematic. If you write for-loops day in and day out with the unrestricted syntax, you (or least me) will tend to do the wrong thing when writing a list comprehension. It is a bit jarring to get the SyntaxError when the code looks correct -- it took me a bit of fiddling to figure-out what was going on. My question for the group is whether it would be a good idea to drop the new restriction. Raymond From raymond.hettinger at gmail.com Fri Sep 17 22:00:08 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 17 Sep 2010 13:00:08 -0700 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments Message-ID: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> One of the use cases for named tuples is to have them be automatically created from a SQL query or CSV header. 
Sometimes (but not often), those can have a huge number of columns. In Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields. In Python 3.x, there is a SyntaxError when there are more than 255 fields. The origin of the change was a hack to fit positional argument counts and keyword-only argument counts in a single oparg in the Python opcode encoding. ISTM, this is an implementation-specific hack and there is no reason that other implementations would have the same restriction (unless their starting point is Python's bytecode). The good news is that long argument lists are uncommon. They probably only arise in cases with dynamically created functions and classes. Most people are unaffected. The bad news is that an implementation detail has become visible and added a language restriction. The 255 limit seems weird to me in a version of Python that has gone to lengths to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users. Is there any support here for trying to get smarter about the keyword-only argument implementation? The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more than 640k of RAM. If the new restriction isn't necessary, it would be great to remove it. Raymond
From matthew.russell at ovi.com Fri Sep 17 22:03:34 2010 From: matthew.russell at ovi.com (Matthew Russell) Date: Fri, 17 Sep 2010 21:03:34 +0100 Subject: [Python-ideas] New 3.x restriction in list comprehensions In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> Message-ID: <1284753814.365.142.camel@stone> Personally, I tend to always add parens to tuple expressions since it removes any and all ambiguity about when they're required or not. I'd actually prefer it if parens were always required, but can appreciate that might/would offend those who prefer otherwise. >>> for (a, b) in d.items(): ... process(a, b) >>> def items(t): ... return (a, b) Always using parens means that when refactoring one can avoid the extra mental step of 'are the parens required in use with <python feature>' Additionally, in some language features, the use of parens has become required to squash warts: >>> try: ... a = b[k] >>> except (KeyError, IndexError), no_item: ... a = handle(no_item) Regards, Matt On Fri, 2010-09-17 at 12:44 -0700, Raymond Hettinger wrote: > In Python2, you can transform: > r = [] > for x in 2, 4, 6: > r.append(x*x+1) > > into: > > r = [x*x+1 for x in 2, 4, 6] > > In Python3, the first still works but the second gives a SyntaxError. > It wants the 2, 4, 6 to have parentheses. > > The good parts of the change: > + it matches what genexps do > + that simplifies the grammar a bit (listcomps bodies and genexp bodies) > + a listcomp can be reliably transformed to a genexp > > The bad parts: > + The restriction wasn't necessary (we could undo it) > + It makes 2-to-3 conversion a bit harder > + It no longer parallels other paren-free tuple constructions: > return x, y > yield x, y > t = x, y > ... > + It particular, it no longer parallels regular for-loop syntax > > The last part is the one that seems the most problematic. > If you write for-loops day in and day out with the unrestricted > syntax, you (or least me) will tend to do the wrong thing when > writing a list comprehension. It is a bit jarring to get the SyntaxError > when the code looks correct -- it took me a bit of fiddling to figure-out > what was going on.
> > My question for the group is whether it would be a good > idea to drop the new restriction. > > > Raymond > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------------------------------------------------------- Ovi Mail: Making email access easy http://mail.ovi.com From python at mrabarnett.plus.com Fri Sep 17 22:23:49 2010 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 17 Sep 2010 21:23:49 +0100 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: <4C93CE55.1030308@mrabarnett.plus.com> On 17/09/2010 21:00, Raymond Hettinger wrote: > One of the use cases for named tuples is to have them be > automatically created from a SQL query or CSV header. Sometimes (but > not often), those can have a huge number of columns. In Python 2.x, > it worked just fine -- we had a test for a named tuple with 5000 > fields. In Python 3.x, there is a SyntaxError when there are more > than 255 fields. > > The origin of the change was a hack to fit positional argument counts > and keyword-only argument counts in a single oparg in the python > opcode encoding. > > ISTM, this is an implementation specific hack and there is no reason > that other implementations would have the same restriction (unless > their starting point is Python's bytecode). > > The good news is that long argument lists are uncommon. They > probably only arise in cases with dynamically created functions and > classes. Most people are unaffected. > > The bad news is that an implementation detail has become visible and > added a language restriction. The 255 limit seems weird to me in a > version of Python that has gone to lengths to unify ints and longs so > that char/short/long boundaries stop manifesting themselves to > users. > > Is there any support here for trying to get smarter about the > keyword-only argument implementation? The 255 limit does not seem > unreasonably low, but then it was once thought that no one would ever > need more that 640k of ram. If the new restriction isn't necessary, > it would be great to remove it. > Strings can be any length, lists can be any length, even the humble int can be any length! It does seem unPythonic to have a low limit like that. I think that the implementation hack needs a bit of a rethink if that's what it's causing, IMHO. From python at mrabarnett.plus.com Fri Sep 17 22:27:37 2010 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 17 Sep 2010 21:27:37 +0100 Subject: [Python-ideas] New 3.x restriction in list comprehensions In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> Message-ID: <4C93CF39.5090406@mrabarnett.plus.com> On 17/09/2010 20:44, Raymond Hettinger wrote: > In Python2, you can transform: > > r = [] > for x in 2, 4, 6: > r.append(x*x+1) > > into: > > r = [x*x+1 for x in 2, 4, 6] > > In Python3, the first still works but the second gives a SyntaxError. > It wants the 2, 4, 6 to have parentheses. 
> > The good parts of the change: > + it matches what genexps do > + that simplifies the grammar a bit (listcomps bodies and genexp bodies) > + a listcomp can be reliably transformed to a genexp > > The bad parts: > + The restriction wasn't necessary (we could undo it) > + It makes 2-to-3 conversion a bit harder > + It no longer parallels other paren-free tuple constructions: > return x, y > yield x, y > t = x, y > ... > + It particular, it no longer parallels regular for-loop syntax > > The last part is the one that seems the most problematic. > If you write for-loops day in and day out with the unrestricted > syntax, you (or least me) will tend to do the wrong thing when > writing a list comprehension. It is a bit jarring to get the SyntaxError > when the code looks correct -- it took me a bit of fiddling to figure-out > what was going on. > > My question for the group is whether it would be a good > idea to drop the new restriction. > Listcomps look more like genexps than for loops, so they should probably have the same syntax retrictions (or lack of), IMHO. From solipsis at pitrou.net Fri Sep 17 23:11:46 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 17 Sep 2010 23:11:46 +0200 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: <20100917231146.23f0cef1@pitrou.net> On Fri, 17 Sep 2010 13:00:08 -0700 Raymond Hettinger wrote: > One of the use cases for named tuples is to have them be automatically created from a SQL > query or CSV header. Sometimes (but not often), those can have a huge number of columns. In > Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields. In > Python 3.x, there is a SyntaxError when there are more than 255 fields. I don't understand your explanation. You can't pass a namedtuple using the **kw convention: >>> import collections >>> T = collections.namedtuple('a', 'b c d') >>> t = T(1,2,3) >>> def f(**a): pass ... >>> f(**t) Traceback (most recent call last): File "", line 1, in TypeError: f() argument after ** must be a mapping, not a Besides, even if that worked, you are doing an intermediate conversion to a dict, which is wasteful. Why not simply pass the namedtuple as a regular parameter? > The bad news is that an implementation detail has become visible and added a language > restriction. The 255 limit seems weird to me in a version of Python that has gone to lengths > to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users. Well, it sounds like a theoretical worry of no practical value to me. The **kw notation is meant to marshal passing of actual keyword args, which are going to be explicitly typed in either at the call site or at the function definition site (ignoring any proxies in-between). Nobody is going to type more than 255 keyword arguments by hand. And there's generated code, but since it's generated they can easily find a workaround anyway. > If the new restriction isn't necessary, it would be great to remove it. I assume the restriction is useful since, according to your explanation, it improves the encoding of opcodes. Of course, we could switch bytecode to use a standard 32-bit word size, but someone has to propose a patch. Regards Antoine. 
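For concreteness, the place where the limit actually bites for named tuples is class creation rather than any **kw call site: the class template that namedtuple() generates gives __new__ one positional parameter per field, so a wide table trips the compiler's argument limit as soon as the class is defined (a minimal sketch, assuming the CPython 3.x behaviour described earlier in the thread):

    from collections import namedtuple

    fields = ['f%d' % i for i in range(300)]   # e.g. columns from a wide CSV header
    Row = namedtuple('Row', fields)            # fine in 2.x; SyntaxError in 3.x when the
                                               # generated "def __new__(...)" is compiled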
From cs at zip.com.au Fri Sep 17 23:05:46 2010 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 18 Sep 2010 07:05:46 +1000 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <4C93CE55.1030308@mrabarnett.plus.com> References: <4C93CE55.1030308@mrabarnett.plus.com> Message-ID: <20100917210546.GA32088@cskk.homeip.net> On 17Sep2010 21:23, MRAB wrote: | On 17/09/2010 21:00, Raymond Hettinger wrote: | >One of the use cases for named tuples is to have them be | >automatically created from a SQL query or CSV header. Sometimes (but | >not often), those can have a huge number of columns. In Python 2.x, | >it worked just fine -- we had a test for a named tuple with 5000 | >fields. In Python 3.x, there is a SyntaxError when there are more | >than 255 fields. | > | >The origin of the change was a hack to fit positional argument counts | >and keyword-only argument counts in a single oparg in the python | >opcode encoding. [...] | >Is there any support here for trying to get smarter about the | >keyword-only argument implementation? [...] | | Strings can be any length, lists can be any length, even the humble int | can be any length! | It does seem unPythonic to have a low limit like that. A big +10 from me. Implementation internals should not cause language level limitations. If there's a (entirely reasonable IMHO) desire to get the opcode small, the count should be encoded in a compact be extendable form. (I speak here with no idea how inflexible the opcode readers are.) As an example, I use a personal encoding for natural numbers scheme where values below 128 fit in one byte, 128 or more set the top bit on leading bytes to indicate followon bytes, so values up to 16383 fit in two bytes and so on arbitrarily. Compact and simple but unbounded. Is something like that tractable for the Python opcodes? Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ I am returning this otherwise good typing paper to you because someone has printed gibberish all over it and put your name at the top. - English Professor, Ohio University From solipsis at pitrou.net Fri Sep 17 23:21:33 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 17 Sep 2010 23:21:33 +0200 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments References: <4C93CE55.1030308@mrabarnett.plus.com> <20100917210546.GA32088@cskk.homeip.net> Message-ID: <20100917232133.6088424a@pitrou.net> On Sat, 18 Sep 2010 07:05:46 +1000 Cameron Simpson wrote: > > As an example, I use a personal encoding for natural numbers scheme > where values below 128 fit in one byte, 128 or more set the top bit on > leading bytes to indicate followon bytes, so values up to 16383 fit in > two bytes and so on arbitrarily. Compact and simple but unbounded. Well, you are proposing that we (Python core maintainers) live with additional complication in one of the most central and critical parts of the interpreter, just so that we satisfy some theoretical impulse for "consistency". That doesn't sound reasonable. (and, sure, the variable-length encoding wouldn't be very complicated; it would still be more complicated than it needs to be, and that's already a problem) For the record, have you been hit by this problem, or do you even think you might be hit by it in the near future? Thank you Antoine. 
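For reference, the scheme Cameron describes is essentially the standard base-128 "varint" encoding; a minimal pure-Python sketch (purely illustrative -- this has nothing to do with the actual opcode machinery or any proposed patch):

    def encode_uint(n):
        # 7 payload bits per byte; a set high bit means "more bytes follow",
        # so values below 128 take one byte and values up to 16383 take two
        out = bytearray()
        while True:
            byte = n & 0x7f
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    def decode_uint(data):
        n = shift = 0
        for i, byte in enumerate(bytearray(data)):
            n |= (byte & 0x7f) << shift
            if not byte & 0x80:
                return n, i + 1        # (value, bytes consumed)
            shift += 7

    assert decode_uint(encode_uint(16383)) == (16383, 2)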
From tjreedy at udel.edu Fri Sep 17 23:32:04 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 17 Sep 2010 17:32:04 -0400 Subject: [Python-ideas] New 3.x restriction in list comprehensions In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> Message-ID: On 9/17/2010 3:44 PM, Raymond Hettinger wrote: > In Python2, you can transform: > > r = [] > for x in 2, 4, 6: > r.append(x*x+1) for x in 2,4,6: yield x*x+1 also works in 2/3.x > > into: > > r = [x*x+1 for x in 2, 4, 6] > > In Python3, the first still works but the second gives a SyntaxError. > It wants the 2, 4, 6 to have parentheses. > > The good parts of the change: > + it matches what genexps do Is the restriction necessary for genexps? If the parser could handle [x*x+1 for x in 2, 4, 6] is (x*x+1 for x in 2, 4, 6) impossible, perhaps due to paren confusion? > + that simplifies the grammar a bit (listcomps bodies and genexp bodies) > + a listcomp can be reliably transformed to a genexp > > The bad parts: > + The restriction wasn't necessary (we could undo it) > + It makes 2-to-3 conversion a bit harder > + It no longer parallels other paren-free tuple constructions: > return x, y > yield x, y > t = x, y > ... > + It particular, it no longer parallels regular for-loop syntax > > The last part is the one that seems the most problematic. > If you write for-loops day in and day out with the unrestricted > syntax, you (or least me) will tend to do the wrong thing when > writing a list comprehension. It is a bit jarring to get the SyntaxError > when the code looks correct -- it took me a bit of fiddling to figure-out > what was going on. > > My question for the group is whether it would be a good > idea to drop the new restriction. 3.x is in a sense more consistent than 2.x in that converting a for loop with a bare tuple always requires addition of parentheses rather than just sometimes. Never requiring parens would be even better to me if it did not make the implementation too messy. -- Terry Jan Reedy From tjreedy at udel.edu Fri Sep 17 23:50:00 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 17 Sep 2010 17:50:00 -0400 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: On 9/17/2010 4:00 PM, Raymond Hettinger wrote: > One of the use cases for named tuples is to have them be > automatically created from a SQL query or CSV header. Sometimes (but > not often), those can have a huge number of columns. In Python 2.x, > it worked just fine -- we had a test for a named tuple with 5000 > fields. In Python 3.x, there is a SyntaxError when there are more > than 255 fields. So, when the test failed due to the code change, the test was simply removed? > The origin of the change was a hack to fit positional argument counts > and keyword-only argument counts in a single oparg in the python > opcode encoding. I do not remember any discussion of adding such a language restriction, though I could have forgotten or missed it. As near as I can tell, it is undocumented. While there are undocumented limits to the interpreter, like nesting depth, this one is so low that I would consider the discrepancy between doc and behavior a bug. 
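To pin down what is and is not restricted (a small sketch, again assuming the CPython 3.1 behaviour discussed in this thread): the limit is enforced by the compiler on arguments spelled out explicitly in the source, not on what arrives at runtime through ** unpacking:

    def f(**kw):
        return len(kw)

    # Fine at runtime: the dict is built first and passed via **, so no call
    # site ever names more than a handful of arguments explicitly.
    big = dict(('k%d' % i, i) for i in range(300))
    assert f(**big) == 300

    # By contrast, a def with 300 named parameters, or a call that literally
    # spells out 300 arguments, is rejected with a SyntaxError at compile time.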
-- Terry Jan Reedy From alexander.belopolsky at gmail.com Fri Sep 17 23:50:15 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 17 Sep 2010 17:50:15 -0400 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: <3F05AB9C-2353-429F-8343-9777C4F2F874@gmail.com> On Sep 17, 2010, at 4:00 PM, Raymond Hettinger wrote: .. > > Is there any support here for trying to get smarter about the keyword-only argument implementation? The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more that 640k of ram. If the new restriction isn't necessary, it would be great to remove This has been requested before, but rejected for the lack of a valid use case. See issue 1636. I think supporting huge named tuples for the benefit of database applications is a valid use case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Fri Sep 17 23:56:55 2010 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 18 Sep 2010 07:56:55 +1000 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <20100917232133.6088424a@pitrou.net> References: <20100917232133.6088424a@pitrou.net> Message-ID: <20100917215655.GA7813@cskk.homeip.net> On 17Sep2010 23:21, Antoine Pitrou wrote: | On Sat, 18 Sep 2010 07:05:46 +1000 | Cameron Simpson wrote: | > As an example, I use a personal encoding for natural numbers scheme | > where values below 128 fit in one byte, 128 or more set the top bit on | > leading bytes to indicate followon bytes, so values up to 16383 fit in | > two bytes and so on arbitrarily. Compact and simple but unbounded. | | Well, you are proposing that we (Python core maintainers) live with | additional complication in one of the most central and critical parts of | the interpreter, just so that we satisfy some theoretical impulse for | "consistency". That doesn't sound reasonable. [...] | For the record, have you been hit by this problem, or do you even think | you might be hit by it in the near future? Me, no. But arbitrary _syntactic_ constraints in an otherwise flexible language grate. I was only suggesting a compactness-supporting approach, not lobbying very hard for making the devs use it. I'm +10 on removing the syntactic constraint, not on hacking the opcode definitons. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Withdrawing in disgust is not the same as conceding. - Jon Adams From dirkjan at ochtman.nl Sat Sep 18 00:00:57 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sat, 18 Sep 2010 00:00:57 +0200 Subject: [Python-ideas] New 3.x restriction in list comprehensions In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> Message-ID: On Fri, Sep 17, 2010 at 21:44, Raymond Hettinger wrote: > My question for the group is whether it would be a good > idea to drop the new restriction. I like the restriction and would actually advocate having it for regular for-loops too (though that would be a big no-no, I guess). 
Here's why I never use them without parentheses, in Python 2: >>> (1 if True else 3, 4) (1, 4) >>> (lambda x: x * x, 6) (<function <lambda> at 0x100475ed8>, 6) >>> [i for i in 2, 3] [2, 3] >>> (i for i in 2, 3) File "<stdin>", line 1 (i for i in 2, 3) ^ SyntaxError: invalid syntax And in Python 3: >>> (1 if True else 3, 4) (1, 4) >>> (lambda x: x * x, 6) (<function <lambda> at 0x7f4ef41785a0>, 6) >>> [i for i in 2, 3] File "<stdin>", line 1 [i for i in 2, 3] ^ SyntaxError: invalid syntax >>> (i for i in 2, 3) File "<stdin>", line 1 (i for i in 2, 3) ^ SyntaxError: invalid syntax Cheers, Dirkjan
From guido at python.org Sat Sep 18 02:16:39 2010 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Sep 2010 17:16:39 -0700 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: On Fri, Sep 17, 2010 at 1:00 PM, Raymond Hettinger wrote: > One of the use cases for named tuples is to have them be automatically created from a SQL query or CSV header. Sometimes (but not often), those can have a huge number of columns. In Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields. In Python 3.x, there is a SyntaxError when there are more than 255 fields. > > The origin of the change was a hack to fit positional argument counts and keyword-only argument counts in a single oparg in the Python opcode encoding. > > ISTM, this is an implementation-specific hack and there is no reason that other implementations would have the same restriction (unless their starting point is Python's bytecode). > > The good news is that long argument lists are uncommon. They probably only arise in cases with dynamically created functions and classes. Most people are unaffected. > > The bad news is that an implementation detail has become visible and added a language restriction. The 255 limit seems weird to me in a version of Python that has gone to lengths to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users. > > Is there any support here for trying to get smarter about the keyword-only argument implementation? The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more than 640k of RAM. If the new restriction isn't necessary, it would be great to remove it. +256 on removing this limit from the language. I've come across code generators that produced quite insane-looking code that worked perfectly fine because Python's grammar has no (or very large) limits, and I consider this a language feature. I've also written code where there was a good reason to use **kwds in the function definition and another good reason to pass **kwds to the call where the kwds passed could be huge. -- --Guido van Rossum (python.org/~guido)
From guido at python.org Sat Sep 18 02:18:21 2010 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Sep 2010 17:18:21 -0700 Subject: [Python-ideas] New 3.x restriction in list comprehensions In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> Message-ID: On Fri, Sep 17, 2010 at 12:44 PM, Raymond Hettinger wrote: > In Python2, you can transform: > > r = [] > for x in 2, 4, 6: >     r.append(x*x+1) > > into: > > r = [x*x+1 for x in 2, 4, 6] > > In Python3, the first still works but the second gives a SyntaxError. > It wants the 2, 4, 6 to have parentheses.
> > The good parts of the change: > ?+ it matches what genexps do > ?+ that simplifies the grammar a bit (listcomps bodies and genexp bodies) > ?+ a listcomp can be reliably transformed to a genexp > > The bad parts: > ?+ The restriction wasn't necessary (we could undo it) > ?+ It makes 2-to-3 conversion a bit harder > ?+ It no longer parallels other paren-free tuple constructions: > ? ? ? ?return x, y > ? ? ? ?yield x, y > ? ? ? ?t = x, y > ? ? ? ? ? ... > ?+ It particular, it no longer parallels regular for-loop syntax > > The last part is the one that seems the most problematic. > If you write for-loops day in and day out with the unrestricted > syntax, you (or least me) will tend to do the wrong thing when > writing a list comprehension. ?It is a bit jarring to get the SyntaxError > when the code looks correct -- it took me a bit of fiddling to figure-out > what was going on. > > My question for the group is whether it would be a good > idea to drop the new restriction. This was intentional. It parallels genexps and it avoids an ambiguity (for the human reader -- I know the parser has no problem with it :-). Please don't change this back. (It would violate the moratorium too...) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sat Sep 18 09:28:42 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Sep 2010 17:28:42 +1000 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <20100917231146.23f0cef1@pitrou.net> References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> <20100917231146.23f0cef1@pitrou.net> Message-ID: On Sat, Sep 18, 2010 at 7:11 AM, Antoine Pitrou wrote: > On Fri, 17 Sep 2010 13:00:08 -0700 > Raymond Hettinger > wrote: >> One of the use cases for named tuples is to have them be automatically created from a SQL >> query or CSV header. ?Sometimes (but not often), those can have a huge number of columns. ?In >> Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields. ?In >> Python 3.x, there is a SyntaxError when there are more than 255 fields. > > I don't understand your explanation. You can't pass a namedtuple using > the **kw convention: But you do need to *initialise* the named tuple after you create it. If it's a big tuple, then all of those field values need to be passed in either as positional arguments or as keyword arguments. A restriction to 255 parameters means that named tuples with more than 255 fields become a lot less useful. Merging the parameter count into the opcode as an optimisation when the number of parameters is < 256 is fine. *Disallowing* parameter counts >= 255 is not. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Sat Sep 18 09:39:11 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Sep 2010 17:39:11 +1000 Subject: [Python-ideas] New 3.x restriction in list comprehensions In-Reply-To: References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com> Message-ID: On Sat, Sep 18, 2010 at 8:00 AM, Dirkjan Ochtman wrote: > On Fri, Sep 17, 2010 at 21:44, Raymond Hettinger > wrote: >> My question for the group is whether it would be a good >> idea to drop the new restriction. > > I like the restriction and would actually advocate having it for > regular for-loops too (though that would be a big no-no, I guess). Yep, I tend to parenthesise tuples even when it isn't strictly necessary as well. 
Even if the parser doesn't care, it makes it a lot easier for human readers (including myself when I have to go back and read that code). (I have similar objections to people that rely on precedence ordering too heavily in complicated expressions - even if the compiler understands them correctly, many readers won't know the precedence table off by heart. Judicious use of parentheses turns code those readers would otherwise have to think about into something which is obviously correct even at a glance). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From greg.ewing at canterbury.ac.nz Sat Sep 18 10:29:02 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 18 Sep 2010 20:29:02 +1200 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <20100917210546.GA32088@cskk.homeip.net> References: <4C93CE55.1030308@mrabarnett.plus.com> <20100917210546.GA32088@cskk.homeip.net> Message-ID: <4C94784E.1040702@canterbury.ac.nz> Cameron Simpson wrote: > If there's a (entirely reasonable IMHO) desire to get > the opcode small, the count should be encoded in a compact be extendable > form. I suspect it's more because it was easier to do it that way than to track down all the places that assume a bytecode never has more than one 16-bit operand. -- Greg From lie.1296 at gmail.com Sat Sep 18 16:23:59 2010 From: lie.1296 at gmail.com (Lie Ryan) Date: Sun, 19 Sep 2010 00:23:59 +1000 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: On 09/18/10 06:00, Raymond Hettinger wrote: > The good news is that long argument lists are uncommon. They > probably only arise in cases with dynamically created functions and > classes. Most people are unaffected. How about showing a Warning when trying to create a large namedtuple? The Warning contains a reference to a bug issue, and should describe that if they really, really need to have this limitation removed, then they should ask in the bug report. Just so that we don't complicate the code unnecessarily without a real usage. In Python, classes are largely syntax sugar for a dictionary anyway, if they needed such a large namedtuple, they should probably reconsider using dictionary or list or real classes instead. From taleinat at gmail.com Sun Sep 19 11:08:28 2010 From: taleinat at gmail.com (Tal Einat) Date: Sun, 19 Sep 2010 11:08:28 +0200 Subject: [Python-ideas] New 3.x restriction on number of keyword arguments In-Reply-To: References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com> Message-ID: Lie Ryan wrote: > On 09/18/10 06:00, Raymond Hettinger wrote: > > The good news is that long argument lists are uncommon. They > > probably only arise in cases with dynamically created functions and > > classes. Most people are unaffected. > > How about showing a Warning when trying to create a large namedtuple? > The Warning contains a reference to a bug issue, and should describe > that if they really, really need to have this limitation removed, then > they should ask in the bug report. Just so that we don't complicate the > code unnecessarily without a real usage. > > In Python, classes are largely syntax sugar for a dictionary anyway, if > they needed such a large namedtuple, they should probably reconsider > using dictionary or list or real classes instead. > +1 on removing the restriction, just because I find large namedtuples useful. 
I work with large tables of data and often use namedtuples for their compactness. Python dictionaries have a large memory overhead compared to tuples. This restriction could seriously hamper my future efforts to migrate to Python 3. - Tal Einat -------------- next part -------------- An HTML attachment was scrubbed... URL: From james at openvpn.net Mon Sep 20 23:41:35 2010 From: james at openvpn.net (James Yonan) Date: Mon, 20 Sep 2010 15:41:35 -0600 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object Message-ID: <4C97D50F.1000908@openvpn.net> I think that Glyph hit the nail on the head when he said that "you can go from any arbitrary Future to a full-featured Deferred, but not the other way around." This is exactly my concern, and the reason why I think it's important for Python to standardize on an async result type that is sufficiently general that it can accommodate the different kinds of async semantics in common use in the Python world today. If you don't think this is a problem, just Google for "twisted vs. tornado". While the debate is sometimes passionate and rude, it points to the fragmentation that has occured in the Python async space due to the lack of direction from the standard library. And there's a real cost to this fragmentation -- it's not easy to build an application that uses different async frameworks when there's no standardized result object or reactor model. My concern is that PEP 3148 was really designed for the purpose of thread and process pooling, and that the Future object is designed with the minimum functionality required to achieve this end. The problem is that the Future object starts to look like a stripped-down version of a Twisted Deferred. And that begs the question of why are we standardizing on the special case and not the general case? Wouldn't it be better to break this into two problems: * Develop a full-featured standard async result type and reactor model to facilitate interoperability of different async libraries. This would consist of a standard async result type and an abstract base class for a reactor model. * Let PEP 3148 focus on the problem of thread and process pooling and leverage on the above async result type. The semantics that a general async type should support include: 1. Semantics that allow you to define a callback channel for results and and optionally a separate channel for exceptions as well. 2. Semantics that offer the flexibility of working with async results at the callback level or at the generator level (having a separate channel for exceptions makes it easy for the generator decorator implementation (that facilitates "yield function_returning_async_object()") to dispatch exceptions into the caller). 3. Semantics that can easily be used to pass results and exceptions back from thread or process pools. 4. Semantics that allow for aggregate processing of parallel asynchronous results, such as "fire async result when all of the async results in an async set have fired" or "fire async result when the first result from an async set has fired." Deferreds presently support all of the above. My point here is not so much that Deferreds should be the standard, but that whatever standard is chosen, that the semantics be general enough that different async Python libraries/platforms can interoperate. James > Thanks for the ping about this (I don't think I subscribe to python-ideas, so someone may have to moderate my post in). 
Sorry for the delay in responding, but I've been kinda busy and cooking up these examples took a bit of thinking. > > And thanks, James, for restarting this discussion. I obviously find it interesting :). > > I'm going to mix in some other stuff I found on the web archives, since it's easiest just to reply in one message. I'm sorry that this response is a bit sprawling and doesn't have a single clear narrative, the thread thus far didn't seem to lend it to one. > > For those of you who don't want to read my usual novel-length post, you can probably stop shortly after the end of the first block of code examples. > > On Sep 11, 2010, at 10:26 PM, Guido van Rossum wrote: > >>>> although he didn't say what >>>> deferreds really added beyond what futures provide, and why the >>>> "add_done_callback" method isn't adequate to provide interoperability >>>> between futures and deferreds (which would be odd, since Brian made >>>> changes to that part of PEP 3148 to help with that interoperability >>>> after discussions with Glyph). >>>> >>>> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope >>>> for standardisation in this space though. >>>> >>>> Cheers, >>>> Nick. >>> >>> That was my initial reaction as well, but I'm more than open to >>> hearing from Jean Paul/Glyph and the other twisted folks on this. > >> But thinking about this more I don't know that it will be easy to mix >> PEP 3148, which is solidly thread-based, with a PEP 342 style >> scheduler (whether or not the PEP 380 enhancements are applied, or >> even PEP 3152). And if we take the OP's message at face value, his >> point isn't so much that Twisted is great, but that in order to >> benefit maximally from PEP 342 there needs to be a standard way of >> using callbacks. I think that's probably true. And comparing the >> blog's examples to PEP 3148, I find Twisted's terminology rather >> confusing compared to the PEP's clean Futures API (where IMO you can >> ignore almost everything except result()). > > That blog post was written to demonstrate why programs using generators are "... far easier to read and write ..." than ones using Deferreds, so it stands to reason it would choose an example where that helps :). > > When you want to write systems that manage varying levels of parallelism within a single computation, generators can start to get pretty hairy and the "normal" Deferred way of doing things looks more straightforward. > > Thinking in terms of asynchronicity is tricky, and generators can be a useful tool for promoting that understanding, but they only make it superficially easier. For example: > >>>> def serial(): >>>> results = set() >>>> for x in ...: >>>> results.add((yield do_something_async(x))) >>>> return results > > If you're writing an application whose parallelism calls for an asynchronous approach, after all, you presumably don't want to be standing around waiting for each network round trip to complete. How do you re-write this so that there are always at least N outstanding do_something_async calls running in parallel? > > You can sorta do it like this: > >>>> def parallel(N): >>>> results = set() >>>> outstanding = [] >>>> for x in ...: >>>> if len(outstanding) > N: >>>> results.add((yield outstanding.pop(0))) >>>> else: >>>> outstanding.append(do_something_async(x)) > > but that will always block on one particular do_something_async, when you really want to say "let me know when any outstanding call is complete". So I could handwave about 'yield any_completed(outstanding)'... 
> >>>> def parallel(N): > >>>> results = set() > >>>> outstanding = set() > >>>> for x in ...: > >>>> if len(outstanding) > N: > >>>> results.add((yield any_completed(outstanding))) > >>>> else: > >>>> outstanding.add(do_something_async(x)) > > but that just begs the question of how you implement any_completed(), and I can't think of a way to do that with generators, without getting into the specifics of some Deferred-or-Future-like asynchronous result object. You could implement such a function with such primitives, and here's what it looks like with Deferreds: > >>>> def any_completed(setOfDeferreds): >>>> d = Deferred() >>>> called = [] >>>> def fireme(result, whichDeferred): >>>> if not called: >>>> called.append(True) >>>> setOfDeferreds.remove(whichDeferred) >>>> d.callback(result) >>>> return result >>>> for subd in setOfDeferreds: >>>> subd.addBoth(fireme, subd) >>>> return d > > Here's how you do the top-level task in Twisted, without generators, in the truly-parallel fashion (keep in mind this combines the functionality of 'any_completed' and 'parallel', so it's a bit shorter): > >>>> def parallel(N): >>>> ds = DeferredSemaphore(N) >>>> l = [] >>>> def release(result): >>>> ds.release() >>>> return result >>>> def after_acquire(sem, it): >>>> return do_something_async(it) >>>> for x in ...: >>>> l.append(ds.acquire().addCallback(after_acquire, x).addBoth(release)) >>>> return gatherResults(l).addCallback(set) > > Some informal benchmarking has shown this method to be considerably faster (on the order of 1/2 to 1/3 as much CPU time) than at least our own inlineCallbacks generator-scheduling method. Take this with the usual fist-sized grain of salt that you do any 'informal' benchmarks, but the difference is significant enough that I do try to refactor into this style in my own code, and I have seen performance benefits from doing this on more specific benchmarks. > > This is all untested, and that's far too many lines of code to expect to work without testing, but hopefully it gives a pretty good impression of the differences in flavor between the different styles. > >> Yeah, please do explain why Twisted has so much machinery to handle exceptions? > > There are a lot of different implied questions here, so I'll answer a few of those. > > Why does twisted.python.failure exist? The answer to that is that we wanted an object that represented an exception as raised at a particular point, associated with a particular stack, that could live on without necessarily capturing all the state in that stack. If you're going to report failures asynchronously, you don't necessarily want to hold a reference to every single thing in a potentially giant stack while you're waiting to send it to some network endpoint. Also, in 1.5.2 we had no way of chaining exceptions, and this code is that old. Finally, even if you can chain exceptions, it's a serious performance hit to have to re-raise and re-catch the same exception 4 or 5 times in order to translate it or handle it at many different layers of the stack, so a Failure is intended to encapsulate that state such that it can just be returned, in performance-sensitive areas. (This is sort of a weak point though, since the performance of Failure itself is so terrible, for unrelated reasons.) > > Why is twisted.python.failure such a god damned mess? The answer to that is ... uh, sorry. Yes, it is. We should clean it up. It was written a long time ago and the equivalent module now could be _much_ shorter, simpler, and less of a performance problem.
It just never seems to be the highest priority. Maybe after we're done porting to py3 :). My one defense here is that still a slight improvement over the stdlib 'traceback' module ;-). > > Why do Deferreds have an errback chain rather than just handing you an exception object in the callback chain? Basically, this is for the same reason that Python has exceptions instead of just making you check return codes. We wanted it to be easy to say: > >>>> d = getPage("http://...") >>>> def ok(page): >>>> doSomething(...) >>>> d.addCallback(ok) > > and know that the argument to 'ok' would always be what getPage promised (you don't need to typecheck it for exception-ness) and the default error behavior would be to simply bail out with a traceback, not to barrel through your success-path code wreaking havoc. > >> ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value. > > > add_done_callback works fine with callbacks that return a value. If it didn't, I'd be concerned, because then it would have the barrel-through-the-success-path flaw. But, I assume the idiomatic asynchronous-code-using-Futures would look like this: > >>>> f = some_future_thing(...) >>>> def my_callback(future): >>>> result = future.result() >>>> do_something(result) >>>> f.add_done_callback(my_callback) > > This is one extra line of code as compared to the Twisted version, and chaining involves a bit more gymnastics (somehow creating more futures to return further up the stack, I guess, I haven't thought about it too hard), but it does allow you to handle exceptions with a simple 'except:', rather than calling some exception-handling methods, so I can see why some people would prefer it. > >> Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward. > > You've already addressed the main point that I really wanted to mention here, but I'd like to emphasize it. Blocking and not-blocking are fundamentally different programming styles, and if you sometimes allow blocking on asynchronous results, that means you are effectively always programming in the blocking-and-threaded style and not getting much benefit from the code which does choose to be politely non-blocking. > > I was somewhat pleased with the changes made to the Futures PEP because you could use them as an asynchronous result, and have things that implemented the Future API but raised an exception if you tried to wait on them. That would at least allow some layer of stdlib compatibility. If you are disciplined and careful, this would let you write async code which used a common interoperability mechanism, and if you weren't careful, it would blow up when you tried to use it the wrong way. > > But - and I am guessing that this is the main thrust of this discussion - I do think that having Deferred in the standard library would be much, much better if we can do that. > >> So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based. > > Having something "generator-based" is, in my opinion, an abstraction inversion. The things which you are yielding from these generators are asynchronous results. There should be a specific type for asynchronous results which can be easily interacted with. 
Generators are syntactic sugar for doing that interaction in a way which doesn't involve defining tons of little functions. This is useful, and it makes the concept more accessible, so I don't say "just" syntactic sugar: but nevertheless, the generators need to be 'yield'ing something, and the type of thing that they're yielding is a Deferred-or-something-like-it. > > I don't think that this is really two 'Future-like APIs'. At least, they're not redundant, any more than having both socket.makefile() and socket.recv() is redundant. > > If Future had a deferred() method rather than an add_done_callback() method, then it would always be very clear whether you had a synchronous-but-possibly-not-ready or a purely-asynchronous result. Although it would be equally easy to just have a function that turned a Future into a Deferred by calling add_done_callback(). You can go from any arbitrary Future to a full-featured Deferred, but not the other way around. > >> Or maybe PEP 3152. > > > I don't like PEP 3152 aesthetically on many levels, but I can't deny that it would do the job. 'cocall', though, really? It would be nice if it read like an actual word, i.e. "yield to" or "invoke" or even just "call" or something. > > In another message, where Guido is replying to Antoine: > >>> I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (...) >> >> Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out. That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342. > > I really have to go with Antoine on this one: people were confused about Deferreds long before PEP 342 came along :). Given that Javascript environments have mostly adopted the Twisted terminology (oddly, Node.js doesn't, but Dojo and MochiKit both have pretty literal-minded Deferred translations), there are plenty of people who are familiar with the terminology but still get confused. > > See the beginning of the message for why we're not deprecating our own APIs. > > Once again, sorry for not compressing this down further! If you got this far, you win a prize :). From guido at python.org Tue Sep 21 01:49:04 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Sep 2010 16:49:04 -0700 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: <4C97D50F.1000908@openvpn.net> References: <4C97D50F.1000908@openvpn.net> Message-ID: On Mon, Sep 20, 2010 at 2:41 PM, James Yonan wrote: > I think that Glyph hit the nail on the head when he said that "you can go > from any arbitrary Future to a full-featured Deferred, but not the other way > around." Where by "go from X to Y" you mean "take a program written using X and change it to use Y", right? > This is exactly my concern, and the reason why I think it's important for > Python to standardize on an async result type that is sufficiently general > that it can accommodate the different kinds of async semantics in common use > in the Python world today. I think I get your gist. 
Unfortunately there's only a small number of people who know enough about async semantics in order to write the PEP that is needed. > If you don't think this is a problem, just Google for "twisted vs. tornado". > ?While the debate is sometimes passionate and rude, Is it ever distanced and polite? :-) > it points to the > fragmentation that has occured in the Python async space due to the lack of > direction from the standard library. ?And there's a real cost to this > fragmentation -- it's not easy to build an application that uses different > async frameworks when there's no standardized result object or reactor > model. But, circularly, the lack of direction from the standard library is that nobody has contributed an async framework to the standard library since asyncore was added in, oh, 1999. > My concern is that PEP 3148 was really designed for the purpose of thread > and process pooling, and that the Future object is designed with the minimum > functionality required to achieve this end. ?The problem is that the Future > object starts to look like a stripped-down version of a Twisted Deferred. > ?And that begs the question of why are we standardizing on the special case > and not the general case? Because we could reach agreement fairly quickly on PEP 3148. There are some core contributors who know threads and processes inside out, and after several rounds of comments (a lot, really) they were satisfied. At this point it is probably best to forget about PEP 3148 if you want to improve the async situation in the stdlib, and start thinking about that async PEP instead. > Wouldn't it be better to break this into two problems: > > * Develop a full-featured standard async result type and reactor model to > facilitate interoperability of different async libraries. ?This would > consist of a standard async result type and an abstract base class for a > reactor model. Unless you want to propose to include Twisted into the stdlib, this is not going to be ready for inclusion into Python 3.2. > * Let PEP 3148 focus on the problem of thread and process pooling and > leverage on the above async result type. But PEP 3148 *is* ready for inclusion in Python 3.2. So you've got the ordering wrong. It doesn't make sense to hold up PEP 3148, waiting for the perfect solution to appear. In fact, the changes that were made to PEP 3148 at Glyph's suggestion are probably all you are going to get regarding PEP 3148. > The semantics that a general async type should support include: > > 1. Semantics that allow you to define a callback channel for results and and > optionally a separate channel for exceptions as well. > > 2. Semantics that offer the flexibility of working with async results at the > callback level or at the generator level (having a separate channel for > exceptions makes it easy for the generator decorator implementation (that > facilitates "yield function_returning_async_object()") to dispatch > exceptions into the caller). > > 3. Semantics that can easily be used to pass results and exceptions back > from thread or process pools. > > 4. Semantics that allow for aggregate processing of parallel asynchronous > results, such as "fire async result when all of the async results in an > async set have fired" or "fire async result when the first result from an > async set has fired." > > Deferreds presently support all of the above. 
?My point here is not so much > that Deferreds should be the standard, but that whatever standard is chosen, > that the semantics be general enough that different async Python > libraries/platforms can interoperate. Do you want to champion a PEP? I hope you do -- it will be a long march but rewarding, especially if you get the Tornado folks to participate and contribute. -- --Guido van Rossum (python.org/~guido) From andrew at bemusement.org Tue Sep 21 07:39:11 2010 From: andrew at bemusement.org (Andrew Bennetts) Date: Tue, 21 Sep 2010 15:39:11 +1000 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: References: <4C97D50F.1000908@openvpn.net> Message-ID: <20100921053911.GD18831@aihal.home.puzzling.org> Guido van Rossum wrote: [...] > > Unless you want to propose to include Twisted into the stdlib, this is > not going to be ready for inclusion into Python 3.2. I don't think anyone has suggested "include Twisted". What is being suggested is "include twisted.internet.defer, or something about as useful." Let's consider just how hard it would be to just adding twisted/internet/defer.py to the stdlib (possibly as 'deferred.py'). It's already almost a standalone module, especially if pared back to just the Deferred class and maybe one or two of the most useful helpers (e.g. gatherResults, to take a list of Deferreds and turn them into a single Deferred that fires when they have all fired). The two most problematic dependencies would be: 1) twisted.python.log, which for these purposes could be replaced with a call to a user-replaceable hook whenever an unhandled error occurs (similiar to sys.excepthook). 2) twisted.python.failure... this one is harder. As glyph said, it provides "an object that represent[s] an exception as raised at a particular point, associated with a particular stack". But also, as he said, it's a mess and could use a clean up. Cleaning it up or thinking of a simpler replacement is not insurmountable, but probably too ambitious for Python 3.2's schedule. My point is that adding the Deferred abstraction to the stdlib is a *much* smaller and more reasonable proposition than "include Twisted." -Andrew. From jnoller at gmail.com Tue Sep 21 15:25:13 2010 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 21 Sep 2010 09:25:13 -0400 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: <20100921053911.GD18831@aihal.home.puzzling.org> References: <4C97D50F.1000908@openvpn.net> <20100921053911.GD18831@aihal.home.puzzling.org> Message-ID: On Tue, Sep 21, 2010 at 1:39 AM, Andrew Bennetts wrote: > Guido van Rossum wrote: > [...] >> >> Unless you want to propose to include Twisted into the stdlib, this is >> not going to be ready for inclusion into Python 3.2. > > I don't think anyone has suggested "include Twisted". ?What is being suggested > is "include twisted.internet.defer, or something about as useful." > > Let's consider just how hard it would be to just adding > twisted/internet/defer.py to the stdlib (possibly as 'deferred.py'). ?It's > already almost a standalone module, especially if pared back to just the > Deferred class and maybe one or two of the most useful helpers (e.g. > gatherResults, to take a list of Deferreds and turn them into a single Deferred > that fires when they have all fired). > > The two most problematic dependencies would be: > > ?1) twisted.python.log, which for these purposes could be replaced with a call > ? 
?to a user-replaceable hook whenever an unhandled error occurs (similiar to > ? ?sys.excepthook). > ?2) twisted.python.failure... this one is harder. ?As glyph said, it provides > ? ?"an object that represent[s] an exception as raised at a particular point, > ? ?associated with a particular stack". ?But also, as he said, it's a mess and > ? ?could use a clean up. ?Cleaning it up or thinking of a simpler replacement > ? ?is not insurmountable, but probably too ambitious for Python 3.2's schedule. > > My point is that adding the Deferred abstraction to the stdlib is a *much* > smaller and more reasonable proposition than "include Twisted." > > -Andrew. No on was seriously proposing including twisted wholesale. There has been discussion, off and on *for years* about doing including a stripped down deferred object; and yet no one has stepped up to *do it*, so it might be hilariously easy, it might be a 40 line module, but it doesn't matter if no one steps up to do the pep, and commit the code, and commit to maintaining it. jesse From ncoghlan at gmail.com Tue Sep 21 15:40:28 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Sep 2010 23:40:28 +1000 Subject: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object In-Reply-To: References: <4C97D50F.1000908@openvpn.net> <20100921053911.GD18831@aihal.home.puzzling.org> Message-ID: On Tue, Sep 21, 2010 at 11:25 PM, Jesse Noller wrote: > There has > been discussion, off and on *for years* about doing including a > stripped down deferred object; and yet no one has stepped up to *do > it*, so it might be hilariously easy, it might be a 40 line module, > but it doesn't matter if no one steps up to do the pep, and commit the > code, and commit to maintaining it. Indeed. Thread and process pools had similarly been talked about for quite some time before Brian stepped up to actually do the work of writing and championing PEP 3148. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From michael.s.gilbert at gmail.com Tue Sep 21 20:44:52 2010 From: michael.s.gilbert at gmail.com (Michael Gilbert) Date: Tue, 21 Sep 2010 14:44:52 -0400 Subject: [Python-ideas] Including elementary mathematical functions in the python data model Message-ID: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Hi, It would be really nice if elementary mathematical operations such as sin/cosine (via __sin__ and __cos__) were available as base parts of the python data model [0]. This would make it easier to write new math classes, and it would eliminate the ugliness of things like self.exp(). This would also eliminate the need for separate math and cmath libraries since those could be built into the default float and complex types. Of course if those libs were removed, that would be a potential backwards compatibility issue. It would also help new users who just want to do math and don't know that they need to import separate classes just for elementary math functionality. I think full coverage of the elementary function set would be the goal (i.e. exp, sqrt, ln, trig, and hyperbolic functions). This would not include special functions since that would be overkill, and they are already handled well by scipy and numpy. Anyway, just a thought. 
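A rough sketch of the kind of dispatch I have in mind, at the library
level (purely hypothetical: nothing in the current data model looks up
__sin__, so a helper has to do the dispatch itself):

import math

def sin(x):
    # Prefer the object's own __sin__ if it defines one (hypothetical
    # protocol), otherwise fall back to the float version in math.
    try:
        method = type(x).__sin__
    except AttributeError:
        return math.sin(x)
    return method(x)

class Angle:
    # Example user type opting in to the hypothetical protocol.
    def __init__(self, radians):
        self.radians = radians
    def __sin__(self):
        return math.sin(self.radians)

print(sin(0.5))         # falls back to math.sin
print(sin(Angle(0.5)))  # uses Angle.__sin__

The point is only to show the shape of the protocol; doing this
properly for the whole elementary function set is exactly the part
that would need a real proposal.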
Best wishes, Mike [0] http://docs.python.org/reference/datamodel.html From ncoghlan at gmail.com Tue Sep 21 23:53:09 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Sep 2010 07:53:09 +1000 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: On Wed, Sep 22, 2010 at 4:44 AM, Michael Gilbert wrote: > Hi, > > It would be really nice if elementary mathematical operations such as > sin/cosine (via __sin__ and __cos__) were available as base parts of > the python data model [0]. ?This would make it easier to write new math > classes, and it would eliminate the ugliness of things like self.exp(). > > This would also eliminate the need for separate math and cmath > libraries since those could be built into the default float and complex > types. ?Of course if those libs were removed, that would be a potential > backwards compatibility issue. > > It would also help new users who just want to do math and don't know > that they need to import separate classes just for elementary math > functionality. > > I think full coverage of the elementary function set would be the goal > (i.e. exp, sqrt, ln, trig, and hyperbolic functions). ?This would not > include special functions since that would be overkill, and they are > already handled well by scipy and numpy. I think the basic problem here is that, by comparison to the basic syntax-driven options, the additional functionality covered by the math, cmath and decimal modules is much harder to implement both correctly and efficiently. It's hard enough making good algorithms that work on a single data type with a known representation, let alone ones which work on arbitrary data types. Also, needing exp, sqrt, ln, trig and hyperbolic functions is *significantly* less common than the core mathematical options, so telling people to do "from math import *" if they want to do a lot of mathematical operations at the interactive prompt isn't much of a hurdle. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From cs at zip.com.au Thu Sep 23 00:31:36 2010 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 23 Sep 2010 08:31:36 +1000 Subject: [Python-ideas] Python needs a standard asynchronous return object In-Reply-To: <4C97D50F.1000908@openvpn.net> References: <4C97D50F.1000908@openvpn.net> Message-ID: <20100922223136.GA23975@cskk.homeip.net> On 20Sep2010 15:41, James Yonan wrote: [...] | * Develop a full-featured standard async result type and reactor | model to facilitate interoperability of different async libraries. | This would consist of a standard async result type and an abstract | base class for a reactor model. | | * Let PEP 3148 focus on the problem of thread and process pooling | and leverage on the above async result type. | | The semantics that a general async type should support include: | | 1. Semantics that allow you to define a callback channel for results | and and optionally a separate channel for exceptions as well. | | 2. Semantics that offer the flexibility of working with async | results at the callback level or at the generator level (having a | separate channel for exceptions makes it easy for the generator | decorator implementation (that facilitates "yield | function_returning_async_object()") to dispatch exceptions into the | caller). | | 3. 
Semantics that can easily be used to pass results and exceptions
| back from thread or process pools.
[...]

Just to address this particular aspect (return types and notification),
I have my own futures-like module, where the equivalent of a Future is
called a LateFunction.

There are only 3 basic types of return in my model:

  there's a .report() method in the main (Executor equivalent) class
  that yields LateFunctions as they complete.

  A LateFunction has two basic get-the-result methods. Having made a
  LateFunction:
    LF = Later.defer(func)

  You can either go:
    result = LF()
  This waits for func's completion and returns func's return value.
  If func raises an exception, this raises that exception.

  Or you can go:
    result, exc_info = LF.wait()
  which returns:
    result, None
  if func completed without exception and
    None, exc_info
  if an exception was raised, where exc_info is a 3-tuple as from
  sys.exc_info().

At any rate, when looking for completion you can either get
LateFunctions as they complete via .report(), or function results plain
(that may raise exceptions) or function (results xor exceptions).

This makes implementing the separate streams (results vs exceptions) models
trivial if it is desired while keeping the LateFunction interface simple
(few interface methods).

Yes, I know there's no timeout stuff in there :-(

Cheers,
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/

By God, Mr. Chairman, at this moment I stand astonished at my own moderation!
        - Baron Robert Clive of Plassey

From tristanz at gmail.com  Thu Sep 23 06:41:19 2010
From: tristanz at gmail.com (Tristan Zajonc)
Date: Thu, 23 Sep 2010 00:41:19 -0400
Subject: [Python-ideas] Python needs a standard asynchronous return object
In-Reply-To: <20100922223136.GA23975@cskk.homeip.net>
References: <4C97D50F.1000908@openvpn.net>
	<20100922223136.GA23975@cskk.homeip.net>
Message-ID:

I'm not an expert on this subject by any stretch, but have been
following the discussion with interest.

One of the more interesting ideas out of Microsoft in the last few
years is their Reactive Framework
(http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx), which
implements IObserver and IObservable as the dual to IEnumerator and
IEnumerable.  This makes operators on events just as composable as
operators on enumerables.  It also comes after several other attempts
to formalize a standard async programming pattern.  The ideas seem
pretty generic, since they've released a javascript version of the
approach as well.

The basic interface is very simple, consisting of a subscribe method
on IObservable and on_next, on_completed, and on_error methods for
IObserver.  The power comes from the extension methods, similar to
itertools, defined in the Observable class (http://bit.ly/acBhbP).
These methods provide a huge range of composable functionality.

For instance, using a chaining style, consider an async webclient
module that takes a bunch of urls:

responses = webclient.get(['http://www1.cnn.com', 'http://www2.cnn.com'])
responses.filter(lambda x: x.status == 200).first().do(lambda x: print(x.body))

The filter is nonblocking and returns another observable.  The first()
blocks and returns after the first document is received.  The do calls
a method. Multiple async streams can be composed together in all sorts
of ways.
For instance, http = webclient.get(['http://www.cnn.com', 'http://www.nyt.com']) https = webclient.get(['https://www.cnn.com', 'https://www.nyt.com']) http.zip(https).filter(lambda x, y: x.status == 200 and y.status == 200).start(lambda x, y: slow_save(x, y)) This never blocks. It downloads both the https and http versions of web pages, zips them into a new observable, filters sites with both http and https, and then saves asynchronously the remaining sites. I personally find this easy to reason about, and much easier than manually specifying a callback chain. Errors and completed events propagate through these chains intuitively. "Marble diagrams" help with intuition here (http://bit.ly/cl7Oad). All you need to do is implement the observable interface and you get all the composibility for free. Or you can just use any number of simple methods to convert things to observables (http://bit.ly/7VMnKv), such as observable.start(lambda: print("hi")). Or use decorators. If the observable interface became standard, all future async libraries would be composable, and their would also be a growing collection of observabletools. As somebody who is new to async programming, I quite quickly grasped this reactive approach even though I was otherwise completely unfamiliar with C#. While it may be due to my lack of experience, I still get confused when thinking about callback chains and error channels. For instance, I have no idea how to zip an async http call and a mongodb call into a simple observable that returns a tuple when both respond and then alerts the user. This would be as simple as webclient.get().zip(mongodb.get()).start(flash_completed_message) or maybe it's more pythonic to write obstools.start(obstools.zip(mongodb.get(), webclient.get), flash_completed_message) although I've never like this inside out style. But perhaps I missed the point of this thread? Tristan On Wed, Sep 22, 2010 at 6:31 PM, Cameron Simpson wrote: > On 20Sep2010 15:41, James Yonan wrote: > [...] > | * Develop a full-featured standard async result type and reactor > | model to facilitate interoperability of different async libraries. > | This would consist of a standard async result type and an abstract > | base class for a reactor model. > | > | * Let PEP 3148 focus on the problem of thread and process pooling > | and leverage on the above async result type. > | > | The semantics that a general async type should support include: > | > | 1. Semantics that allow you to define a callback channel for results > | and and optionally a separate channel for exceptions as well. > | > | 2. Semantics that offer the flexibility of working with async > | results at the callback level or at the generator level (having a > | separate channel for exceptions makes it easy for the generator > | decorator implementation (that facilitates "yield > | function_returning_async_object()") to dispatch exceptions into the > | caller). > | > | 3. Semantics that can easily be used to pass results and exceptions > | back from thread or process pools. > [...] > > Just to address this particular aspect (return types and notification), > I have my own futures-like module, where the equivalent of a Future is > called a LateFunction. > > There are only 3 basic types of return in my model: > > ?there's a .report() method in the main (Executor equivalent) class > ?that yields LateFunctions as they complete. > > ?A LateFunction has two basic get-the result methods. Having made a > ?LateFunction: > ? ?LF = Later.defer(func) > > ?You can either go: > ? 
?result = LF() > ?This waits for func's ompletion and returns func's return value. > ?If func raises an exception, this raises that exception. > > ?Or you can go: > ? ?result, exc_info = LF.wait() > ?which returns: > ? ?result, None > ?if func completed without exception and > ? ?None, exc_info > ?if an exception was raised, where exc_info is a 3-tuple as from > ?sys.exc_info(). > > At any rate, when looking for completion you can either get > LateFunctions as they complete via .report(), or function results plain > (that may raise exceptions) or function (results xor exceptions). > > This makes implementing the separate streams (results vs exceptions) models > trivial if it is desired while keeping the LateFunction interface simple > (few interface methods). > > Yes, I know there's no timeout stuff in there :-( > > Cheers, > -- > Cameron Simpson DoD#743 > http://www.cskk.ezoshosting.com/cs/ > > By God, Mr. Chairman, at this moment I stand astonished at my own moderation! > ? ? ? ?- Baron Robert Clive of Plassey > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From tristanz at gmail.com Thu Sep 23 07:04:34 2010 From: tristanz at gmail.com (Tristan Zajonc) Date: Thu, 23 Sep 2010 01:04:34 -0400 Subject: [Python-ideas] Python needs a standard asynchronous return object In-Reply-To: References: <4C97D50F.1000908@openvpn.net> <20100922223136.GA23975@cskk.homeip.net> Message-ID: I should note that it should be possible to convert the twisted, twisted, eventlet, monocle, and other existing async libraries to observables pretty easily. The Javascript Rx library, for instance, already wraps the events from dojo, extjs, google maps, jquery, google translate, microsoft translate, mootools, prototype, raphael, virtualearth, and yui3, and keeps adding others to enable composability between different event driven widgets/frameworks. Tristan On Thu, Sep 23, 2010 at 12:41 AM, Tristan Zajonc wrote: > I'm not an expert on this subject by any stretch, but have been > following the discussion with interest. > > One of the more interesting ideas out of Microsoft in the last few > years is their Reactive Framework > (http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx), which > implements IObserver and IObservable as the dual to IEnumerator and > IEnumerable. ?This makes operators on events just as composable as > operators on enumerables. ?It also comes after several other attempts > to formalize a standard async programming pattern. ?The ideas seam > pretty generic, since they've released a javascript version of the > approach as well. > > The basic interface is very simple, consisting of a subscribe method > on IObservable and on_next, on_completed, and on_error methods for > IObserver. ?The power comes from the extension methods, similar to > itertools, defined in the Observable class (http://bit.ly/acBhbP). > These methods provide a huge range of composable functionality. > > For instance, using a chaining style, consider a async webclient > module that takes a bunch of urls: > > responses = webclient.get(['http://www1.cnn.com', 'http://www2.cnn.com']) > responses.filter(lambda x: x.status == 200).first().do(lambda x: print(x.body)) > > The filter is nonblocking and returns another observable. ?The first() > blocks and returns after the first document is received. ?The do calls > a method. Multiple async streams can be composed together in all sorts > of ways. 
?For instance, > > http = webclient.get(['http://www.cnn.com', 'http://www.nyt.com']) > https = webclient.get(['https://www.cnn.com', 'https://www.nyt.com']) > http.zip(https).filter(lambda x, y: x.status == 200 and y.status == > 200).start(lambda x, y: slow_save(x, y)) > > This never blocks. ?It downloads both the https and http versions of > web pages, zips them into a new observable, filters sites with both > http and https, and then saves asynchronously the remaining sites. ?I > personally find this easy to reason about, and much easier than > manually specifying a callback chain. ?Errors and completed events > propagate through these chains intuitively. "Marble diagrams" help > with intuition here (http://bit.ly/cl7Oad). > > All you need to do is implement the observable interface and you get > all the composibility for free. Or you can just use any number of > simple methods to convert things to observables > (http://bit.ly/7VMnKv), such as observable.start(lambda: print("hi")). > ?Or use decorators. ?If the observable interface became standard, all > future async libraries would be composable, and their would also be a > growing collection of observabletools. > > As somebody who is new to async programming, I quite quickly grasped > this reactive approach even though I was otherwise completely > unfamiliar with C#. ? While it may be due to my lack of experience, I > still get confused when thinking about callback chains and error > channels. ?For instance, I have no idea how to zip an async http call > and a mongodb call into a simple observable that returns a tuple when > both respond and then alerts the user. ?This would be as simple as > > webclient.get().zip(mongodb.get()).start(flash_completed_message) > > or maybe it's more pythonic to write > > obstools.start(obstools.zip(mongodb.get(), webclient.get), > flash_completed_message) > > although I've never like this inside out style. > > But perhaps I missed the point of this thread? > > Tristan > > On Wed, Sep 22, 2010 at 6:31 PM, Cameron Simpson wrote: >> On 20Sep2010 15:41, James Yonan wrote: >> [...] >> | * Develop a full-featured standard async result type and reactor >> | model to facilitate interoperability of different async libraries. >> | This would consist of a standard async result type and an abstract >> | base class for a reactor model. >> | >> | * Let PEP 3148 focus on the problem of thread and process pooling >> | and leverage on the above async result type. >> | >> | The semantics that a general async type should support include: >> | >> | 1. Semantics that allow you to define a callback channel for results >> | and and optionally a separate channel for exceptions as well. >> | >> | 2. Semantics that offer the flexibility of working with async >> | results at the callback level or at the generator level (having a >> | separate channel for exceptions makes it easy for the generator >> | decorator implementation (that facilitates "yield >> | function_returning_async_object()") to dispatch exceptions into the >> | caller). >> | >> | 3. Semantics that can easily be used to pass results and exceptions >> | back from thread or process pools. >> [...] >> >> Just to address this particular aspect (return types and notification), >> I have my own futures-like module, where the equivalent of a Future is >> called a LateFunction. >> >> There are only 3 basic types of return in my model: >> >> ?there's a .report() method in the main (Executor equivalent) class >> ?that yields LateFunctions as they complete. 
>> >> ?A LateFunction has two basic get-the result methods. Having made a >> ?LateFunction: >> ? ?LF = Later.defer(func) >> >> ?You can either go: >> ? ?result = LF() >> ?This waits for func's ompletion and returns func's return value. >> ?If func raises an exception, this raises that exception. >> >> ?Or you can go: >> ? ?result, exc_info = LF.wait() >> ?which returns: >> ? ?result, None >> ?if func completed without exception and >> ? ?None, exc_info >> ?if an exception was raised, where exc_info is a 3-tuple as from >> ?sys.exc_info(). >> >> At any rate, when looking for completion you can either get >> LateFunctions as they complete via .report(), or function results plain >> (that may raise exceptions) or function (results xor exceptions). >> >> This makes implementing the separate streams (results vs exceptions) models >> trivial if it is desired while keeping the LateFunction interface simple >> (few interface methods). >> >> Yes, I know there's no timeout stuff in there :-( >> >> Cheers, >> -- >> Cameron Simpson DoD#743 >> http://www.cskk.ezoshosting.com/cs/ >> >> By God, Mr. Chairman, at this moment I stand astonished at my own moderation! >> ? ? ? ?- Baron Robert Clive of Plassey >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > From ziade.tarek at gmail.com Thu Sep 23 16:37:21 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 16:37:21 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? Message-ID: Hello, ABC __subclasshook__ implementations will only check that the method is present in the class. That's the case for example in collections.Container. It will check that the __contains__ method is present but that's it. It won't check that the method has only one argument. e.g. __contains__(self, x) The problem is that the implemented method could have a different list of arguments and will eventually fail. Using inspect, we could check in __subclasshook__ that the arguments defined are the same than the ones defined in the abstractmethod.-- the name and the ordering. I can even think of a small function in ABC for that: same_signature(method1, method2) => True or False Regards Tarek -- Tarek Ziad? | http://ziade.org From guido at python.org Thu Sep 23 16:53:37 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Sep 2010 07:53:37 -0700 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: Message-ID: That is not a new idea. So far I have always rejected it because I worry about both false positives and false negatives. Trying to enforce that the method *behaves* as it should (or even its return type) is hopeless; there can be a variety of reasons to modify the argument list while still conforming to (the intent of) the interface. I also worry that it will slow everything down. That said, if you want to provide a standard mechanism that can *optionally* be turned on to check argument conformance, e.g. by using a class or method decorator on the subclass, I would be fine with that (as long as it runs purely at class-definition time; it shouldn't slow down class instantiation or method calls). It will probably even find some bugs. It will also surely have to be tuned to avoid certain classes false positives. --Guido On Thu, Sep 23, 2010 at 7:37 AM, Tarek Ziad? 
wrote: > Hello, > > ABC __subclasshook__ implementations will only check that the method > is present in the class. That's the case for example in > collections.Container. It will check that the __contains__ method is > present but that's it. It won't check that the method has only one > argument. e.g. __contains__(self, x) > > The problem is that the implemented method could have a different list > of arguments and will eventually fail. > > Using inspect, we could check in __subclasshook__ that the arguments > defined are the same than the ones defined in the abstractmethod.-- > the name and the ordering. > > I can even think of a small function in ABC for that: > same_signature(method1, method2) => True or False > > Regards > Tarek > > -- > Tarek Ziad? | http://ziade.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From daniel at stutzbachenterprises.com Thu Sep 23 16:54:55 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 23 Sep 2010 09:54:55 -0500 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: Message-ID: On Thu, Sep 23, 2010 at 9:37 AM, Tarek Ziad? wrote: > The problem is that the implemented method could have a different list > of arguments and will eventually fail. A slightly different argument list is okay if it is more permissive. For example, the collections.Sequence ABC defines a count method with one parameter. However, the list implementation's count method takes one mandatory parameter plus two optional parameters. I'm not sure how easy it would be to detect a valid but more general signature. You might be interested in the related Issue 9731 ("Add ABCMeta.has_methods and tests that use it"). -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From ziade.tarek at gmail.com Thu Sep 23 17:01:29 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 17:01:29 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: Message-ID: On Thu, Sep 23, 2010 at 4:53 PM, Guido van Rossum wrote: > That is not a new idea. So far I have always rejected it because I > worry about both false positives and false negatives. Trying to > enforce that the method *behaves* as it should (or even its return > type) is hopeless; there can be a variety of reasons to modify the > argument list while still conforming to (the intent of) the interface. > I also worry that it will slow everything down. Right > > That said, if you want to provide a standard mechanism that can > *optionally* be turned on to check argument conformance, e.g. by using > a class or method decorator on the subclass, I would be fine with that > (as long as it runs purely at class-definition time; it shouldn't slow > down class instantiation or method calls). It will probably even find > some bugs. It will also surely have to be tuned to avoid certain > classes false positives. I'll experiment on this and come back :) Regards Tarek From ziade.tarek at gmail.com Thu Sep 23 17:08:03 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 17:08:03 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? 
In-Reply-To: References: Message-ID: On Thu, Sep 23, 2010 at 4:54 PM, Daniel Stutzbach wrote: > On Thu, Sep 23, 2010 at 9:37 AM, Tarek Ziad? wrote: >> >> The problem is that the implemented method could have a different list >> of arguments and will eventually fail. > > A slightly different argument list is okay if it is more permissive. ?For > example, the collections.Sequence ABC defines a count method with one > parameter. ?However, the list implementation's count method takes one > mandatory parameter plus two optional parameters. ?I'm not sure how easy it > would be to detect a valid but more general signature. Well, with inspect it's possible to see if the extra parameters have defaults values, thus making calls without them still working. > You might be interested in the related Issue 9731 ("Add ABCMeta.has_methods > and tests that use it"). Ah... interesting.. has_methods could possibly have an option to check for the signature --will hack on that when I find some time-- > -- > Daniel Stutzbach, Ph.D. > President, Stutzbach Enterprises, LLC > -- Tarek Ziad? | http://ziade.org From solipsis at pitrou.net Thu Sep 23 17:39:55 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Sep 2010 17:39:55 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? References: Message-ID: <20100923173955.4fc0bb03@pitrou.net> On Thu, 23 Sep 2010 16:37:21 +0200 Tarek Ziad? wrote: > > The problem is that the implemented method could have a different list > of arguments and will eventually fail. > > Using inspect, we could check in __subclasshook__ that the arguments > defined are the same than the ones defined in the abstractmethod.-- > the name and the ordering. I don't think we should steer in the type checking direction. After all, the Python philosophy of dynamicity (dynamism?) is articulated around the idea that checking types "ahead of time" is useless. IMO, ABCs should be used more as a convention for documenting what capabilities a class claims to expose, than for type checking. (also, you'll have a hard time checking methods with *args or **kwargs parameters) Regards Antoine. From ziade.tarek at gmail.com Thu Sep 23 18:18:49 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 18:18:49 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <20100923173955.4fc0bb03@pitrou.net> References: <20100923173955.4fc0bb03@pitrou.net> Message-ID: On Thu, Sep 23, 2010 at 5:39 PM, Antoine Pitrou wrote: > On Thu, 23 Sep 2010 16:37:21 +0200 > Tarek Ziad? wrote: >> >> The problem is that the implemented method could have a different list >> of arguments and will eventually fail. >> >> Using inspect, we could check in __subclasshook__ that the arguments >> defined are the same than the ones defined in the abstractmethod.-- >> the name and the ordering. > > I don't think we should steer in the type checking direction. > After all, the Python philosophy of dynamicity (dynamism?) is > articulated around the idea that checking types "ahead of time" is > useless. IMO, ABCs should be used more as a convention for documenting > what capabilities a class claims to expose, than for type checking. I think it goes further than documentation at this point. ABC is present and used in the stdlib, not the doc. So asking a class about its capabilities is a feature we provide for third-party code. Also, not sure what you mean about the "ahead of time", but ABCs can be used with issubclass() to check that an object quacks like it should. 
This is not opposed to dynamicity. > > (also, you'll have a hard time checking methods with *args or **kwargs > parameters) True, but I don't expect the ABC to define abstract methods with vague arguments. And if it is so, there's no point checking them in that case. So it should definitely be something optional. Regards, Tarek > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Tarek Ziad? | http://ziade.org From solipsis at pitrou.net Thu Sep 23 18:32:49 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Sep 2010 18:32:49 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> Message-ID: <1285259569.3178.9.camel@localhost.localdomain> Le jeudi 23 septembre 2010 ? 18:18 +0200, Tarek Ziad? a ?crit : > >> Using inspect, we could check in __subclasshook__ that the arguments > >> defined are the same than the ones defined in the abstractmethod.-- > >> the name and the ordering. > > > > I don't think we should steer in the type checking direction. > > After all, the Python philosophy of dynamicity (dynamism?) is > > articulated around the idea that checking types "ahead of time" is > > useless. IMO, ABCs should be used more as a convention for documenting > > what capabilities a class claims to expose, than for type checking. > > I think it goes further than documentation at this point. ABC is > present and used in the stdlib, not the doc. > So asking a class about its capabilities is a feature we provide for > third-party code. This feature already exists, as you mention, using issubclass() or isinstance(). What you are asking for is a different feature: check that a class has an appropriate implementation of the advertised capabilities. Traditionally, this is best left to unit testing (or other forms of test-based checking). Do you have an use case where unit testing would not be appropriate for this? > > (also, you'll have a hard time checking methods with *args or **kwargs > > parameters) > > True, but I don't expect the ABC to define abstract methods with vague > arguments. It depends on the arguments. And the implementation could definitely use *args or **kwargs arguments, especially if it acts as a proxy. Regards Antoine. From ziade.tarek at gmail.com Thu Sep 23 19:51:35 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 19:51:35 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <1285259569.3178.9.camel@localhost.localdomain> References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> Message-ID: On Thu, Sep 23, 2010 at 6:32 PM, Antoine Pitrou wrote: ... > This feature already exists, as you mention, using issubclass() or > isinstance(). What you are asking for is a different feature: check that > a class has an appropriate implementation of the advertised > capabilities. Traditionally, this is best left to unit testing (or other > forms of test-based checking). > > Do you have an use case where unit testing would not be appropriate for > this? Why are you thinking about unit tests ? Don't you ever use issubclass/isinstance in your programs ? Checking signatures using ABC when you create a plugin system is one use case for instance. 
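To be concrete, a rough sketch of the kind of helper I mean, based on
inspect.getfullargspec (an illustration only, not a worked-out API; it
deliberately ignores defaults, keyword-only arguments and
*args/**kwargs, and it only works when both sides are plain Python
functions):

import inspect

def same_signature(abstract_method, concrete_method):
    # Compare positional argument names and ordering only.
    abstract = inspect.getfullargspec(abstract_method)
    concrete = inspect.getfullargspec(concrete_method)
    return abstract.args == concrete.args

def signature_mismatches(cls, abc):
    # Return the names of abstract methods of `abc` that `cls` is
    # missing or implements with a different positional signature.
    problems = []
    for name in getattr(abc, '__abstractmethods__', ()):
        implementation = getattr(cls, name, None)
        if implementation is None or not same_signature(
                getattr(abc, name), implementation):
            problems.append(name)
    return problems

A plugin loader could then refuse, or warn about, any plugin class for
which signature_mismatches(plugin, SomePluginABC) is non-empty.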
> >> > (also, you'll have a hard time checking methods with *args or **kwargs >> > parameters) >> >> True, but I don't expect the ABC to define abstract methods with vague >> arguments. > > It depends on the arguments. And the implementation could definitely use > *args or **kwargs arguments, especially if it acts as a proxy. Sure but ISTM that most of the time signatures are well defined, and proxies lives in an upper layer. Regards Tarek From solipsis at pitrou.net Thu Sep 23 20:01:33 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Sep 2010 20:01:33 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> Message-ID: <1285264893.3178.14.camel@localhost.localdomain> Le jeudi 23 septembre 2010 ? 19:51 +0200, Tarek Ziad? a ?crit : > On Thu, Sep 23, 2010 at 6:32 PM, Antoine Pitrou wrote: > ... > > This feature already exists, as you mention, using issubclass() or > > isinstance(). What you are asking for is a different feature: check that > > a class has an appropriate implementation of the advertised > > capabilities. Traditionally, this is best left to unit testing (or other > > forms of test-based checking). > > > > Do you have an use case where unit testing would not be appropriate for > > this? > > Why are you thinking about unit tests ? Don't you ever use > issubclass/isinstance in your programs ? Sorry, you don't seem to be answering the question. Why wouldn't the implementor of the class use unit tests to check that his/her class implements the desired ABC? > Checking signatures using ABC when you create a plugin system is one > use case for instance. Again, why do you want to check signatures? Do you not trust plugin authors to write plugins? Also, why do you think checking signatures is actually useful? It only checks that the signature is right, not that the expected semantics are observed. The argument for checking method signature in advance is as weak as the argument for checking types at compile time. > > It depends on the arguments. And the implementation could definitely use > > *args or **kwargs arguments, especially if it acts as a proxy. > > Sure but ISTM that most of the time signatures are well defined, and > proxies lives in an upper layer. Not really. If I write a file object wrapper that proxies some methods to an other file object, I don't want to re-type all method signatures (including default args) by hand. Regards Antoine. From tjreedy at udel.edu Thu Sep 23 20:39:01 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 23 Sep 2010 14:39:01 -0400 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> Message-ID: If I were writing a class intended to implement an particular ABC, I would be happy to have an automated check function that might catch errors. 100% testing is hard to achieve. -- Terry Jan Reedy From solipsis at pitrou.net Thu Sep 23 20:52:24 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Sep 2010 20:52:24 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? 
References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> Message-ID: <20100923205224.3fc27060@pitrou.net> On Thu, 23 Sep 2010 14:39:01 -0400 Terry Reedy wrote: > If I were writing a class intended to implement an particular ABC, I > would be happy to have an automated check function that might catch > errors. 100% testing is hard to achieve. How would an automatic check function solve anything, if you don't test that the class does what is expected? Again, this is exactly the argument for compile-time type checking, and it is routinely pointed out that it is mostly useless. From ziade.tarek at gmail.com Thu Sep 23 20:59:07 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 20:59:07 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <1285264893.3178.14.camel@localhost.localdomain> References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <1285264893.3178.14.camel@localhost.localdomain> Message-ID: On Thu, Sep 23, 2010 at 8:01 PM, Antoine Pitrou wrote: > Le jeudi 23 septembre 2010 ? 19:51 +0200, Tarek Ziad? a ?crit : >> On Thu, Sep 23, 2010 at 6:32 PM, Antoine Pitrou wrote: >> ... >> > This feature already exists, as you mention, using issubclass() or >> > isinstance(). What you are asking for is a different feature: check that >> > a class has an appropriate implementation of the advertised >> > capabilities. Traditionally, this is best left to unit testing (or other >> > forms of test-based checking). >> > >> > Do you have an use case where unit testing would not be appropriate for >> > this? >> >> Why are you thinking about unit tests ?? Don't you ever use >> issubclass/isinstance in your programs ? > > Sorry, you don't seem to be answering the question. > Why wouldn't the implementor of the class use unit tests to check that > his/her class implements the desired ABC? That's fine indeed. Now, why wouldn't the implementor of an application use ABC to check that the third party class he's about to load in his app implements the desired ABC? > >> Checking signatures using ABC when you create a plugin system is one >> use case for instance. > > Again, why do you want to check signatures? Do you not trust plugin > authors to write plugins? > > Also, why do you think checking signatures is actually useful? It only > checks that the signature is right, not that the expected semantics are > observed. The argument for checking method signature in advance is as > weak as the argument for checking types at compile time. Sorry but it seems that you are now advocating against ABC altogether. Checking the methods, and optionally their attributes is just a deeper operation on something that already exists. It's fine to use those only in your tests, but why do you object that someone would want to use them in their app. This is completely orthogonal to the discussion which is: extend a method checker to check attributes. > >> > It depends on the arguments. And the implementation could definitely use >> > *args or **kwargs arguments, especially if it acts as a proxy. >> >> Sure but ISTM that most of the time signatures are well defined, and >> proxies lives in an upper layer. > > Not really. If I write a file object wrapper that proxies some methods > to an other file object, I don't want to re-type all method signatures > (including default args) by hand. In that case I am curious to see why you would have file I/O method with extra *args/**kwargs. 
You should handle this kind of set up in the constructor and keep the methods similar. (and avoid extra re-type actually) Regards Tarek > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Tarek Ziad? | http://ziade.org From ziade.tarek at gmail.com Thu Sep 23 21:00:12 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 23 Sep 2010 21:00:12 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <20100923205224.3fc27060@pitrou.net> References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On Thu, Sep 23, 2010 at 8:52 PM, Antoine Pitrou wrote: > On Thu, 23 Sep 2010 14:39:01 -0400 > Terry Reedy wrote: >> If I were writing a class intended to implement an particular ABC, I >> would be happy to have an automated check function that might catch >> errors. 100% testing is hard to achieve. > > How would an automatic check function solve anything, if you don't test > that the class does what is expected? > > Again, this is exactly the argument for compile-time type checking, and > it is routinely pointed out that it is mostly useless. So are you in favor of the removal of all kind of type checking mechanism in Python ? > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Tarek Ziad? | http://ziade.org From daniel at stutzbachenterprises.com Thu Sep 23 21:03:52 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 23 Sep 2010 14:03:52 -0500 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <20100923205224.3fc27060@pitrou.net> References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On Thu, Sep 23, 2010 at 1:52 PM, Antoine Pitrou wrote: > How would an automatic check function solve anything, if you don't test > that the class does what is expected? > Automated checks are a good way to help ensure that your test coverage is good. If the automated check fails and all the other tests pass, it means there's been an oversight in both functionality and tests. This isn't a purely theoretical concern. See Issues 9212 and 9213 for cases where a class purported to support an ABC but wasn't actually supplying all the methods. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Sep 23 21:26:23 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Sep 2010 21:26:23 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <1285264893.3178.14.camel@localhost.localdomain> Message-ID: <1285269983.3178.46.camel@localhost.localdomain> Le jeudi 23 septembre 2010 ? 20:59 +0200, Tarek Ziad? a ?crit : > > That's fine indeed. Now, why wouldn't the implementor of an > application use ABC to check that the third party class he's about to > load in his app implements the desired ABC? Why would he? What does it provide him exactly? A false sense of security / robustness? > > Also, why do you think checking signatures is actually useful? 
It only > > checks that the signature is right, not that the expected semantics are > > observed. The argument for checking method signature in advance is as > > weak as the argument for checking types at compile time. > > Sorry but it seems that you are now advocating against ABC altogether. As I said, I believe ABCs are useful mainly for documentation purposes; that is, for conveying an /intent/. Thinking that ABCs guarantee anything about quality or conformity of the implementation sounds wrong to me. (the other reason for using ABCs is to provide default implementations of some methods, like the io ABCs do) > This is completely orthogonal to the discussion which is: extend a > method checker to check attributes. It's not really orthogonal. I'm opposing the idea that programmatically checking the conformity of method signatures is useful; I also think it's *not* a good thing to advocate to Python programmers coming from other languages. > In that case I am curious to see why you would have file I/O method > with extra *args/**kwargs. def seek(self, *args): return self.realfileobj.seek(*args) > So are you in favor of the removal of all kind of type checking > mechanism in Python ? "Type" checking is simply done when necessary. It is duck typing. Even in the case of ABCs, method calls are still duck-typed. For example, if you look at the io ABCs and concrete classes, a BufferedReader won't check that you are giving it a RawIOBase to wrap access to. Regards Antoine. From guido at python.org Thu Sep 23 21:26:48 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Sep 2010 12:26:48 -0700 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <20100923205224.3fc27060@pitrou.net> References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On Thu, Sep 23, 2010 at 11:52 AM, Antoine Pitrou wrote: > On Thu, 23 Sep 2010 14:39:01 -0400 > Terry Reedy wrote: >> If I were writing a class intended to implement an particular ABC, I >> would be happy to have an automated check function that might catch >> errors. 100% testing is hard to achieve. > > How would an automatic check function solve anything, if you don't test > that the class does what is expected? > > Again, this is exactly the argument for compile-time type checking, and > it is routinely pointed out that it is mostly useless. That may be the party line of dynamic-language diehards, but that doesn't make it true. There are plenty of times when compile-time checking can save the day, and typically, the larger a system, the more useful it becomes. Antoine, can you back off your attempts to prove that the proposed feature is useless and instead help designing the details of the feature (or if you can't or don't want to help there, just stay out of the discussion)? -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Thu Sep 23 23:42:12 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Sep 2010 07:42:12 +1000 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> Message-ID: On Fri, Sep 24, 2010 at 2:18 AM, Tarek Ziad? wrote: > I think it goes further than documentation at this point. ABC is > present and used in the stdlib, not the doc. > So asking a class about its capabilities is a feature we provide for > third-party code. Minor nit - we can only ask a fairly limited subset of questions along these lines (i.e. 
does *this* class/instance implement *this* ABC?). More interesting questions like "which ABCs does this class/instance explicitly implement?" are currently impossible (see http://bugs.python.org/issue5405). Back on topic - I like Guido's approach. While we can debate the merits of LBYL signature checking forever without reaching agreement (for the record, my opinion is that static checks should be thought of as a bunch of implicit unit tests that you get "for free"), providing a way to explicitly request ABC signature checks in the abc module probably isn't a bad idea. If nothing else, invoking that check can become a recommended part of the unit test suite for classes that claim to implement ABCs. Is getting the method signatures right *sufficient* for ABC compliance? No. Is it *necessary*? Yes. It's the latter point that makes this feature potentially worth standardising. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From tjreedy at udel.edu Fri Sep 24 01:15:08 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 23 Sep 2010 19:15:08 -0400 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: <20100923205224.3fc27060@pitrou.net> References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On 9/23/2010 2:52 PM, Antoine Pitrou wrote: > On Thu, 23 Sep 2010 14:39:01 -0400 > Terry Reedy wrote: >> If I were writing a class intended to implement an particular ABC, I >> would be happy to have an automated check function that might catch >> errors. 100% testing is hard to achieve. > > How would an automatic check function solve anything, if you don't test > that the class does what is expected? If all tests are written with calls by position, as is my habit and general preference, they will not catch argument name mismatches that would trip up someone who prefers call by keyword or any introspection-by-name process. -- Terry Jan Reedy From tjreedy at udel.edu Fri Sep 24 02:24:13 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 23 Sep 2010 20:24:13 -0400 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On 9/23/2010 3:26 PM, Guido van Rossum wrote: > On Thu, Sep 23, 2010 at 11:52 AM, Antoine Pitrou wrote: >> On Thu, 23 Sep 2010 14:39:01 -0400 >> Terry Reedy wrote: >>> If I were writing a class intended to implement an particular ABC, I >>> would be happy to have an automated check function that might catch >>> errors. 100% testing is hard to achieve. >> >> How would an automatic check function solve anything, if you don't test >> that the class does what is expected? >> >> Again, this is exactly the argument for compile-time type checking, and >> it is routinely pointed out that it is mostly useless. > > That may be the party line of dynamic-language diehards, but that > doesn't make it true. There are plenty of times when compile-time > checking can save the day, and typically, the larger a system, the > more useful it becomes. Sometimes you surprise me with your non-dogmatic practicality. I do hope, though, that you continue to reject C-like braces {;-}. 
> Antoine, can you back off your attempts to > prove that the proposed feature is useless and instead help designing > the details of the feature (or if you can't or don't want to help > there, just stay out of the discussion)? Yes, let the cat scratch his itch and see what he produces. Since unittests have been brought up, I have a idea and question. Can this work? Split the current test suite for a concrete class that implements one of the ABCs into concrete-specific and ABC-general portions, with the abstract part parameterized by concrete class. For instance, split test/test_dict.py into test_dict.py and test_Mapping.py, where the latter has all tests that test compliance with the Mapping ABC (or whatever it is called) and the former keeps all the dict-specific extension tests. Rewrite test_Mapping so it is not dict specific, so one could write something like class MyMapping(): "Implement exactly the Mapping ABC with no extras." ... if __name__ == '__main__': from test import test_Mapping as tM tM.concrete = MyMapping tM.runtests() This is similar to but not the same as splitting tests into generic and CPython parts, the latter for reuse by other implementations of the interpreter. (For dicts, test_dict.py could still be so split, or a portion of it made conditional on the platform.) This idea is for reuse of tests by other implementations of ABCs, whatever interpreter implementation they run under. The underlying question is whether ABCs are intended to be an integral part of Python3 or just an optional extra tucked away in a corner (which is how many, including me, still tend to view them)? If the former, then to me they should, if possible, be supported by a semantic validation test suite. In a way, I am agreeing with Antoine's objection that signature validation is not enough, but with the opposite suggestion of extend rather than reject Tarek's idea of providing auto test tools that make writing and using ABCs easier. -- Terry Jan Reedy From digitalxero at gmail.com Fri Sep 24 05:42:41 2010 From: digitalxero at gmail.com (Dj Gilcrease) Date: Thu, 23 Sep 2010 23:42:41 -0400 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> Message-ID: On Thu, Sep 23, 2010 at 1:51 PM, Tarek Ziad? wrote: > Why are you thinking about unit tests ?? Don't you ever use > issubclass/isinstance in your programs ? > > Checking signatures using ABC when you create a plugin system is one > use case for instance. This is something that I have implemented (before ABCs) in plugin systems I use. When loading the plugin I validate all methods exist and that each method has the correct number of required arguments, I generally dont check argument name as my plugin systems all pass by position instead of keyword. If the signature I am checking contains *args it automatically passes the check. If the plugin fails the check I dont load it. On Thu, Sep 23, 2010 at 2:01 PM, Antoine Pitrou wrote: > Again, why do you want to check signatures? Do you not trust plugin > authors to write plugins? No, no I dont. I have had several plugin authors come to me complaining that the plugin system is broken because it wont load their plugin (even with a fairly detailed error message). Dj Gilcrease ?____ ( | ? ? \ ?o ? ?() ? | ?o ?|`| ? | ? ? ?| ? ? ?/`\_/| ? ? ?| | ? ,__ ? ,_, ? ,_, ? __, ? ?, ? ,_, _| ? ? ?| | ? ?/ ? ? ?| ?| ? |/ ? / ? ? ?/ ? | ? |_/ ?/ ? ?| ? / \_|_/ (/\___/ ?|/ ?/(__,/ ?|_/|__/\___/ ? 
?|_/|__/\__/|_/\,/ ?|__/ ? ? ? ? ?/| ? ? ? ? ?\| From andrew at bemusement.org Fri Sep 24 07:58:00 2010 From: andrew at bemusement.org (Andrew Bennetts) Date: Fri, 24 Sep 2010 15:58:00 +1000 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: <20100924055800.GA2633@aihal.home.puzzling.org> Terry Reedy wrote: [...] > Since unittests have been brought up, I have a idea and question. > Can this work? Split the current test suite for a concrete class > that implements one of the ABCs into concrete-specific and > ABC-general portions, with the abstract part parameterized by > concrete class. FWIW, bzr's test suite has this facility, and bzr plugins that implement various bzr interfaces will have tests for those interfaces automatically applied. (Being a Python 2.4+ project, bzr doesn't actually use the ABCs feature, but we certainly use the concept of "interface with many implemenations".) E.g. if you define a new Transport (in bzr terms, a thing like FTP, HTTP, etc) you probably want to make sure it complies with bzrlib's expectations for Transports. So you can include a get_test_permutations function in your module that returns a list of (transport_class, server_class) pairs. [Unsurprisingly you need a test server to run against, although for transports like LocalTransport (local filesystem access) they can be very simple.] It works very well, and is very useful both for bzrlib itself and plugins. We have ?per-implementation? tests for: branch, bzrdir, repository, interrepository, merger, transport, tree, workingtree, uifactory, and more. Look for bzrlib/tests/per_*. It's not necessarily easy to write all those tests. The more complex an interface, the more likely it is you'll have many tests for that interface that don't really apply to all implementations ? for instance some Transports are read-only, or don't support list_dir, etc. So tests that involve those need to specifically check for that capability and raise NotApplicable, and finding the exact right way to do that can be tricky. It's often easier to say ?if isinstance(thing, ParticularImplementation): ...?, but that quickly erodes the applicability of those tests for new implementations. Also tricky is when the setup or even assertions for some tests needs to vary considerably by implementation: how complicated is your parameterisation interface going to have to be? bzr has found it worthwhile, so I do encourage trying it. I'd use Robert Collins' http://launchpad.net/testscenarios library if I were providing this infrastructure in a suite that doesn't already have this approach; it's basically a distillation of the infrastructure developed in bzrlib.tests. -Andrew. From g.brandl at gmx.net Fri Sep 24 09:15:00 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 24 Sep 2010 09:15:00 +0200 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: Message-ID: Am 23.09.2010 16:37, schrieb Tarek Ziad?: > Hello, > > ABC __subclasshook__ implementations will only check that the method > is present in the class. That's the case for example in > collections.Container. It will check that the __contains__ method is > present but that's it. It won't check that the method has only one > argument. e.g. __contains__(self, x) > > The problem is that the implemented method could have a different list > of arguments and will eventually fail. 
I'm not concerned about this in the least. Whoever implements a special method with the wrong signature has more pressing problems than a false- positive ABC subclass check. And AFAIK, our ABCs only check for special methods. > Using inspect, we could check in __subclasshook__ that the arguments > defined are the same than the ones defined in the abstractmethod.-- > the name and the ordering. "ordering"? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From daniel at stutzbachenterprises.com Fri Sep 24 16:17:19 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 24 Sep 2010 09:17:19 -0500 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On Thu, Sep 23, 2010 at 7:24 PM, Terry Reedy wrote: > Can this work? Split the current test suite for a concrete class that > implements one of the ABCs into concrete-specific and ABC-general portions, > with the abstract part parameterized by concrete class. > > For instance, split test/test_dict.py into test_dict.py and > test_Mapping.py, where the latter has all tests that test compliance with > the Mapping ABC (or whatever it is called) and the former keeps all the > dict-specific extension tests. Rewrite test_Mapping so it is not dict > specific, so one could write something like > As a heavy user of the ABCs in the collections module, that would be awesome. :-) It would make my life a lot easier when I'm writing tests to go along with an ABC-derived class. I have 8 such classes on PyPi (heapdict.heapdict and blist.*), plus more in private repositories. There is some code vaguely along those lines in the existing unit tests. For example, Lib/test/seq_tests.py contains tests common to sequences. However, that was written before collections.Sequence came along and the pre-2.6 definition of "sequence" only loosely correlates with a collections.Sequence. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Sep 24 18:20:49 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Sep 2010 12:20:49 -0400 Subject: [Python-ideas] ABC: what about the method arguments ? In-Reply-To: References: <20100923173955.4fc0bb03@pitrou.net> <1285259569.3178.9.camel@localhost.localdomain> <20100923205224.3fc27060@pitrou.net> Message-ID: On 9/24/2010 10:17 AM, Daniel Stutzbach wrote: > On Thu, Sep 23, 2010 at 7:24 PM, Terry Reedy > > wrote: > > Can this work? Split the current test suite for a concrete class > that implements one of the ABCs into concrete-specific and > ABC-general portions, with the abstract part parameterized by > concrete class. > > For instance, split test/test_dict.py into test_dict.py and > test_Mapping.py, where the latter has all tests that test compliance > with the Mapping ABC (or whatever it is called) and the former keeps > all the dict-specific extension tests. Rewrite test_Mapping so it is > not dict specific, so one could write something like Reading the responses, I realized that I am already doing a simplified version of my suggestion for functions rather than classes. 
For didactic purposes, I am writing multiple implementations of multiple abstract functions. I embody a test for a particular function in an iterable of input-output pairs (where the 'output' can also be an exception class). I use that with a custom super test function that tests one or more callables against the pairs. It works great and it is easy to add another implementation or more pairs. > As a heavy user of the ABCs in the collections module, that would be > awesome. :-) It would make my life a lot easier when I'm writing tests > to go along with an ABC-derived class. I have 8 such classes on PyPi > (heapdict.heapdict and blist.*), plus more in private repositories. > > There is some code vaguely along those lines in the existing unit tests. > For example, Lib/test/seq_tests.py contains tests common to sequences. > However, that was written before collections.Sequence came along and > the pre-2.6 definition of "sequence" only loosely correlates with a > collections.Sequence. Well, pick one existing test file, revise and extend and perhaps split, start a tracker issue with proposed patch, get comments, and perhaps commit it. If you do, add terry.reedy as nosy. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Sat Sep 25 03:55:55 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Sep 2010 13:55:55 +1200 Subject: [Python-ideas] =?utf-8?q?=5BPython-Dev=5D_os=2Epath_function_for_?= =?utf-8?b?4oCcZ2V0IHRoZSByZWFsIGZpbGVuYW1l4oCd?= In-Reply-To: <877hia4tte.fsf_-_@benfinney.id.au> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <877hia4tte.fsf_-_@benfinney.id.au> Message-ID: <4C9D56AB.2060602@canterbury.ac.nz> Ben Finney wrote: > Your heuristics seem to assume there will only ever be a maximum of one > match, which is false. I present the following example: > > $ ls foo/ > bAr.dat BaR.dat bar.DAT There should perhaps be an extra step at the beginning: 0) Test whether the specified path refers to an existing file. If not, raise an exception. If that passes, and the file system is case-sensitive, then there must be a directory entry that is an exact match, so it will be returned by step 1. If the file system is case-insensitive, then there can be at most one entry that matches except for case, and it must be the one we're looking for, so there is no need for the extra test in step 2. So the revised algorithm is: 0) Test whether the specified path refers to an existing file. If not, raise an exception. 1) Search the directory for an exact match, return it if found. 2) Search for a match ignoring case, and return one if found. 3) Otherwise, raise an exception. There's also some prior art that might be worth looking at: On Windows, Python checks to see whether the file name of an imported module has the same case as the name being imported, which is a similar problem in some ways. > It seems to me this whole thing should be hashed out on ?python-ideas?. Good point -- I've redirected the discussion there. -- Greg From greg.ewing at canterbury.ac.nz Sat Sep 25 03:56:06 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Sep 2010 13:56:06 +1200 Subject: [Python-ideas] [Python-Dev] os.path.normcase rationale? 
In-Reply-To: References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> Message-ID: <4C9D56B6.9050908@canterbury.ac.nz> Guido van Rossum wrote: > Maybe the API could be called os.path.unnormpath(), since it is in a > sense the opposite of normpath() (which removes case) ? Cute, but not very intuitive. Something like actualpath() might be better -- although that's somewhat arbitrarily different from realpath(). -- Greg From python at mrabarnett.plus.com Sat Sep 25 04:14:51 2010 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 25 Sep 2010 03:14:51 +0100 Subject: [Python-ideas] [Python-Dev] os.path.normcase rationale? In-Reply-To: <4C9D56B6.9050908@canterbury.ac.nz> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> Message-ID: <4C9D5B1B.3020709@mrabarnett.plus.com> On 25/09/2010 02:56, Greg Ewing wrote: > Guido van Rossum wrote: > >> Maybe the API could be called os.path.unnormpath(), since it is in a >> sense the opposite of normpath() (which removes case) ? > > Cute, but not very intuitive. Something like actualpath() > might be better -- although that's somewhat arbitrarily > different from realpath(). > 'actualcase' perhaps? Does it need to end in 'path'? From solipsis at pitrou.net Sat Sep 25 12:11:42 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Sep 2010 12:11:42 +0200 Subject: [Python-ideas] [Python-Dev] os.path.normcase rationale? References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> Message-ID: <20100925121142.74fe35e1@pitrou.net> On Sat, 25 Sep 2010 13:56:06 +1200 Greg Ewing wrote: > Guido van Rossum wrote: > > > Maybe the API could be called os.path.unnormpath(), since it is in a > > sense the opposite of normpath() (which removes case) ? > > Cute, but not very intuitive. Something like actualpath() > might be better -- although that's somewhat arbitrarily > different from realpath(). Again, why not simply improve realpath()? From ben+python at benfinney.id.au Sat Sep 25 16:00:57 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 26 Sep 2010 00:00:57 +1000 Subject: [Python-ideas] =?utf-8?b?4oCYb3MucGF0aC5mb2/igJkgZnVuY3Rpb24gdG8g?= =?utf-8?q?get_the_name_of_a_filesystem_entry_=28was=3A_=5BPython-Dev=5D_o?= =?utf-8?q?s=2Epath=2Enormcase_rationale=3F=29?= References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> Message-ID: <87vd5t3oti.fsf_-_@benfinney.id.au> Antoine Pitrou writes: > Again, why not simply improve realpath()? Because that already does what it says it does. The behaviour being asked for is distinct from what ?os.path.normcase? and ?os.path.realpath? are meant to do, so that behaviour belongs in a different place from those two. -- \ ?Value your freedom or you will lose it, teaches history. | `\ ?Don't bother us with politics,? respond those who don't want | _o__) to learn.? 
?Richard Stallman, 2002 | Ben Finney From solipsis at pitrou.net Sat Sep 25 16:11:57 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Sep 2010 16:11:57 +0200 Subject: [Python-ideas] reusing realpath() References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> Message-ID: <20100925161157.17059398@pitrou.net> On Sun, 26 Sep 2010 00:00:57 +1000 Ben Finney wrote: > Antoine Pitrou > writes: > > > Again, why not simply improve realpath()? > > Because that already does what it says it does. So what? The behaviour of fetching the canonical name can be added to the behaviour of resolving symlinks. It wouldn't be incompatible with the current behaviour AFAICT. And it would be better than adding yet another function to our m?nagerie of path-normalizing functions. We already have abspath(), normpath(), normcase(), realpath() -- all with very descriptive names as you might notice. We don't need another function. Regards Antoine. From guido at python.org Sat Sep 25 22:55:30 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Sep 2010 13:55:30 -0700 Subject: [Python-ideas] reusing realpath() In-Reply-To: <20100925161157.17059398@pitrou.net> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> Message-ID: On Sat, Sep 25, 2010 at 7:11 AM, Antoine Pitrou wrote: > On Sun, 26 Sep 2010 00:00:57 +1000 > Ben Finney wrote: >> Antoine Pitrou >> writes: >> >> > Again, why not simply improve realpath()? >> >> Because that already does what it says it does. > > So what? The behaviour of fetching the canonical name can be added to > the behaviour of resolving symlinks. It wouldn't be incompatible with > the current behaviour AFAICT. And it would be better than adding yet > another function to our m?nagerie of path-normalizing functions. > We already have abspath(), normpath(), normcase(), realpath() -- all > with very descriptive names as you might notice. We don't need another > function. There's no need to get all emotional or sarcastic about it. You might have noticed the risks of sarcasm on this list recently. Instead, it should be possibly to analyze how realpath() is currently used and see if changing it as desired is likely to break any code. TBH, I am personally on the fence and would like to see an analysis including the current and desired behavior in the following cases: - Windows - OS X - Other Unixoid systems Also take into account: - Filesystems whose case behavior is the opposite of the platform default (all three support such filesystems through system configuration and/or mounting) - Relative paths - Paths containing symlinks In any case it is much easier to design and implement the best possible functionality if you don't also have to be backward compatible with an existing function. 
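To anchor that analysis in something concrete: the per-component lookup Greg sketched earlier might look roughly like this in pure Python (borrowing his "actualpath" name; this is purely illustrative -- it ignores symlinks, Unicode normalisation and Windows drive quirks, and is not a proposed implementation):

    import os

    def _stored_name(directory, name):
        # Exact match first; otherwise fall back to a case-insensitive
        # match, which is unambiguous for an existing entry on a
        # case-preserving filesystem.
        entries = os.listdir(directory or os.curdir)
        if name in entries:
            return name
        lowered = name.lower()
        for entry in entries:
            if entry.lower() == lowered:
                return entry
        raise OSError("no entry matching %r in %r" % (name, directory))

    def actualpath(path):
        # Resolve each component to the case actually stored on disk.
        if not os.path.exists(path):
            raise OSError("no such file: %r" % path)
        parts = []
        head = path
        while True:
            head, tail = os.path.split(head)
            if not tail:
                parts.append(head)  # root, drive, or '' for a relative path
                break
            parts.append(tail)
        parts.reverse()
        result = parts[0]
        for part in parts[1:]:
            result = os.path.join(result, _stored_name(result, part))
        return result

Note that written this way it keeps relative paths relative.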
I think it might be useful to call this new API (let's call it "casefulpath" while we wait for a better name to come to us :-) on a relative path without having the answer turned into an absolute path -- if that's desired it's easy enough to call abspath() or realpath() on the result. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sat Sep 25 23:04:03 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Sep 2010 23:04:03 +0200 Subject: [Python-ideas] reusing realpath() In-Reply-To: References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> Message-ID: <1285448643.17320.1.camel@localhost.localdomain> Le samedi 25 septembre 2010 ? 13:55 -0700, Guido van Rossum a ?crit : > > There's no need to get all emotional or sarcastic about it. You might > have noticed the risks of sarcasm on this list recently. Ironic considering the naming of the language :) Anyway: > I think it might be useful to > call this new API (let's call it "casefulpath" while we wait for a > better name to come to us :-) realcase() ? From pjenvey at underboss.org Sat Sep 25 23:57:42 2010 From: pjenvey at underboss.org (Philip Jenvey) Date: Sat, 25 Sep 2010 14:57:42 -0700 Subject: [Python-ideas] reusing realpath() In-Reply-To: <20100925161157.17059398@pitrou.net> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> Message-ID: On Sep 25, 2010, at 7:11 AM, Antoine Pitrou wrote: > On Sun, 26 Sep 2010 00:00:57 +1000 > Ben Finney wrote: >> Antoine Pitrou >> writes: >> >>> Again, why not simply improve realpath()? >> >> Because that already does what it says it does. > > So what? The behaviour of fetching the canonical name can be added to > the behaviour of resolving symlinks. It wouldn't be incompatible with > the current behaviour AFAICT. And it would be better than adding yet > another function to our m?nagerie of path-normalizing functions. > We already have abspath(), normpath(), normcase(), realpath() -- all > with very descriptive names as you might notice. We don't need another > function. realpath's docs describe its result as "the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system)". "Canonical" should describe the behavior we're after, with the correct case of the filename as it is actually stored on disk. But isn't realpath modeled after POSIX realpath(3)? realpath(3) doesn't seem to clearly guarantee the original name as stored on disk either. However realpath(3) on OSX 10.6 with case-insensitive HFS+ does return the original name as it was stored. Do any other platforms do this and do we care about maintaining parity with realpath(3)? 
-- Philip Jenvey From greg.ewing at canterbury.ac.nz Sun Sep 26 01:02:00 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 26 Sep 2010 11:02:00 +1200 Subject: [Python-ideas] reusing realpath() In-Reply-To: <20100925161157.17059398@pitrou.net> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> Message-ID: <4C9E7F68.9030308@canterbury.ac.nz> Antoine Pitrou wrote: > So what? The behaviour of fetching the canonical name can be added to > the behaviour of resolving symlinks. Finding the actual name (I wouldn't call it "canonical", since that term could be ambiguous) requires reading the contents of entire directories at each step, which could be noticeably less efficient than what realpath() currently does. Users who only want symlinks expanded might object to that. An option could be added to realpath(), but then we're into constant-parameter territory. -- Greg From ncoghlan at gmail.com Sun Sep 26 10:17:49 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Sep 2010 18:17:49 +1000 Subject: [Python-ideas] reusing realpath() In-Reply-To: <4C9E7F68.9030308@canterbury.ac.nz> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> <4C9E7F68.9030308@canterbury.ac.nz> Message-ID: On Sun, Sep 26, 2010 at 9:02 AM, Greg Ewing wrote: > Antoine Pitrou wrote: > >> So what? The behaviour of fetching the canonical name can be added to >> the behaviour of resolving symlinks. > > Finding the actual name (I wouldn't call it "canonical", > since that term could be ambiguous) requires reading the > contents of entire directories at each step, which could > be noticeably less efficient than what realpath() currently > does. Users who only want symlinks expanded might object > to that. > > An option could be added to realpath(), but then we're > into constant-parameter territory. Constant parameter territory isn't *necessarily* a bad thing if the number of parameters is sufficiently high. In particular, if you have one basic command (say, "give me the canonical path for this possibly-non-canonical path I already have") with a gazillion different variants (*ahem*), then a single function with well-named boolean parameters (to explain "this is what I really mean by 'canonical path'") is likely to be much easier for people to remember than trying to create a concise-yet-meaningful mnemonic for each variant. So we shouldn't dismiss out of hand the idea of a keyword-only "swiss-army" path normalisation function that can at least be queried via help() if you forget the exact spelling for the various parameters. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From dickinsm at gmail.com Sun Sep 26 13:05:11 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Sep 2010 12:05:11 +0100 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: On Tue, Sep 21, 2010 at 7:44 PM, Michael Gilbert wrote: > It would be really nice if elementary mathematical operations such as > sin/cosine (via __sin__ and __cos__) were available as base parts of > the python data model [0]. ?This would make it easier to write new math > classes, and it would eliminate the ugliness of things like self.exp(). > > This would also eliminate the need for separate math and cmath > libraries since those could be built into the default float and complex > types. Hmm. Are you proposing adding 'sin', 'cos', etc. as new builtins? If so, I think this is a nonstarter: the number of Python builtins is deliberately kept quite small, and adding all these functions (we could argue about which ones, but it seems to me that you're talking about around 18 new builtins---e.g., 6 trig and inverse trig, 6 hyperbolic and inverse hyperbolic, exp, expm1, log, log10, log1p, sqrt) would enlarge it considerably. For many users, those functions would just be additional bloat in builtins, and there's possibility of confusion with existing variables with the same name ('log' seems like a particular candidate for this; 'sin' less likely, but who knows ;-). A less invasive proposal would be just to introduce __sin__, etc. magic methods and have math.sin delegate to .__sin__; i.e., have math.sin work in exactly the same way that math.floor and math.ceil currently work. That would be quite nice for e.g., the decimal module: you'd be able to write something like: from math import sqrt root = (-b + sqrt(b*b - 4*a*c)) / (2*a) to compute the root of a quadratic equation, and it would work regardless of whether a, b, c were Decimal instances or floats. I'm not sure how I feel about the entailed magic method explosion, though. Mark From ncoghlan at gmail.com Sun Sep 26 14:07:50 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Sep 2010 22:07:50 +1000 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson wrote: > A less invasive proposal would be just to introduce __sin__, etc. > magic methods and have math.sin delegate to .__sin__; ?i.e., > have math.sin work in exactly the same way that math.floor and > math.ceil currently work. ?That would be quite nice for e.g., the > decimal module: ?you'd be able to write something like: > > from math import sqrt > root = (-b + sqrt(b*b - 4*a*c)) / (2*a) > > to compute the root of a quadratic equation, and it would work > regardless of whether a, b, c were Decimal instances or floats. > > I'm not sure how I feel about the entailed magic method explosion, though. Couple that with the extra function call overhead (since these wouldn't have real typeslots) and it still seems like a less than stellar idea. As another use case for solid, efficient generic function support though... great idea :) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From solipsis at pitrou.net Sun Sep 26 14:25:29 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 26 Sep 2010 14:25:29 +0200 Subject: [Python-ideas] Including elementary mathematical functions in the python data model References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: <20100926142529.79ffaabd@pitrou.net> On Sun, 26 Sep 2010 22:07:50 +1000 Nick Coghlan wrote: > On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson wrote: > > A less invasive proposal would be just to introduce __sin__, etc. > > magic methods and have math.sin delegate to .__sin__; ?i.e., > > have math.sin work in exactly the same way that math.floor and > > math.ceil currently work. ?That would be quite nice for e.g., the > > decimal module: ?you'd be able to write something like: > > > > from math import sqrt > > root = (-b + sqrt(b*b - 4*a*c)) / (2*a) > > > > to compute the root of a quadratic equation, and it would work > > regardless of whether a, b, c were Decimal instances or floats. > > > > I'm not sure how I feel about the entailed magic method explosion, though. > > Couple that with the extra function call overhead (since these > wouldn't have real typeslots) and it still seems like a less than > stellar idea. > > As another use case for solid, efficient generic function support > though... great idea :) At the cost of even more execution overhead? :) Regards Antoine. From ncoghlan at gmail.com Sun Sep 26 14:34:06 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Sep 2010 22:34:06 +1000 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: <20100926142529.79ffaabd@pitrou.net> References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> <20100926142529.79ffaabd@pitrou.net> Message-ID: On Sun, Sep 26, 2010 at 10:25 PM, Antoine Pitrou wrote: >> Couple that with the extra function call overhead (since these >> wouldn't have real typeslots) and it still seems like a less than >> stellar idea. >> >> As another use case for solid, efficient generic function support >> though... great idea :) > > At the cost of even more execution overhead? :) I did put that "efficient" in there for a reason! Now, I'm not saying anything about how *reasonable* that idea is, but I can dream ;) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From solipsis at pitrou.net Sun Sep 26 14:38:46 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 26 Sep 2010 14:38:46 +0200 Subject: [Python-ideas] Including elementary mathematical functions in the python data model References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> <20100926142529.79ffaabd@pitrou.net> Message-ID: <20100926143846.7021d807@pitrou.net> On Sun, 26 Sep 2010 22:34:06 +1000 Nick Coghlan wrote: > On Sun, Sep 26, 2010 at 10:25 PM, Antoine Pitrou wrote: > >> Couple that with the extra function call overhead (since these > >> wouldn't have real typeslots) and it still seems like a less than > >> stellar idea. > >> > >> As another use case for solid, efficient generic function support > >> though... great idea :) > > > > At the cost of even more execution overhead? :) > > I did put that "efficient" in there for a reason! Now, I'm not saying > anything about how *reasonable* that idea is, but I can dream ;) Well, I can't see how it could be less than the overhead involved in a sqrt(x) -> x.__sqrt__() indirection anyway. 
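(For reference, the indirection in question would presumably follow the same pattern math.floor() already uses for __floor__; __sqrt__ itself doesn't exist anywhere, so this is only a sketch of what Mark is proposing:

    import math

    def sqrt(x):
        # Delegate to a __sqrt__ hook if the type defines one,
        # otherwise fall back to the current float behaviour.
        try:
            hook = type(x).__sqrt__
        except AttributeError:
            return math.sqrt(x)
        return hook(x)

That is, one type attribute lookup and one extra call per invocation, before the method itself does any work.)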
When I read Mark's example, I wondered why he didn't simply write x**0.5 instead of sqrt(x), but it turns out it doesn't work on decimals. cheers Antoine. From masklinn at masklinn.net Sun Sep 26 14:33:14 2010 From: masklinn at masklinn.net (Masklinn) Date: Sun, 26 Sep 2010 14:33:14 +0200 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: <369EA2B0-54ED-4204-96F9-408C4B8CB5BE@masklinn.net> On 2010-09-26, at 14:07 , Nick Coghlan wrote: > On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson wrote: >> A less invasive proposal would be just to introduce __sin__, etc. >> magic methods and have math.sin delegate to .__sin__; i.e., >> have math.sin work in exactly the same way that math.floor and >> math.ceil currently work. That would be quite nice for e.g., the >> decimal module: you'd be able to write something like: >> >> from math import sqrt >> root = (-b + sqrt(b*b - 4*a*c)) / (2*a) >> >> to compute the root of a quadratic equation, and it would work >> regardless of whether a, b, c were Decimal instances or floats. >> >> I'm not sure how I feel about the entailed magic method explosion, though. > > Couple that with the extra function call overhead (since these > wouldn't have real typeslots) and it still seems like a less than > stellar idea. > > As another use case for solid, efficient generic function support > though... great idea :) > > Cheers, > Nick. Couldn't that also be managed via ABCs for numerical types? Make sqrt & al methods of those types, and roll out in the sunset, no? The existing `math` functions could check on the presence of those methods (or the input types being instances of the ABCs they need), and fall back on the current implementations if they don't match. From jason.orendorff at gmail.com Sun Sep 26 17:48:35 2010 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Sun, 26 Sep 2010 10:48:35 -0500 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: On Sun, Sep 26, 2010 at 7:07 AM, Nick Coghlan wrote: > On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson wrote: >> A less invasive proposal would be just to introduce __sin__, etc. >> magic methods [...] >> >> I'm not sure how I feel about the entailed magic method explosion, though. > > Couple that with the extra function call overhead (since these > wouldn't have real typeslots) and it still seems like a less than > stellar idea. This could certainly be implemented so as to be fast for floats and flexible for everything else. -j From greg.ewing at canterbury.ac.nz Mon Sep 27 00:21:47 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 27 Sep 2010 10:21:47 +1200 Subject: [Python-ideas] reusing realpath() In-Reply-To: References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> <4C9E7F68.9030308@canterbury.ac.nz> Message-ID: <4C9FC77B.1000104@canterbury.ac.nz> Nick Coghlan wrote: > Constant parameter territory isn't *necessarily* a bad thing if the > number of parameters is sufficiently high. That's true, but the number of parameters wouldn't be high in this case. 
-- Greg From greg.ewing at canterbury.ac.nz Mon Sep 27 00:29:19 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 27 Sep 2010 10:29:19 +1200 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> Message-ID: <4C9FC93F.9020708@canterbury.ac.nz> Nick Coghlan wrote: > Couple that with the extra function call overhead (since these > wouldn't have real typeslots) and it still seems like a less than > stellar idea. > > As another use case for solid, efficient generic function support > though... great idea :) Could a generic function mechanism be made to have any less overhead, though? -- Greg From ncoghlan at gmail.com Mon Sep 27 14:15:27 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Sep 2010 22:15:27 +1000 Subject: [Python-ideas] reusing realpath() In-Reply-To: <4C9FC77B.1000104@canterbury.ac.nz> References: <4C9531A7.10405@simplistix.co.uk> <4C9C79DA.7000506@simplistix.co.uk> <20100924121737.309071FA5C2@kimball.webabinitio.net> <4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com> <4C9D56B6.9050908@canterbury.ac.nz> <20100925121142.74fe35e1@pitrou.net> <87vd5t3oti.fsf_-_@benfinney.id.au> <20100925161157.17059398@pitrou.net> <4C9E7F68.9030308@canterbury.ac.nz> <4C9FC77B.1000104@canterbury.ac.nz> Message-ID: On Mon, Sep 27, 2010 at 8:21 AM, Greg Ewing wrote: > Nick Coghlan wrote: > >> Constant parameter territory isn't *necessarily* a bad thing if the >> number of parameters is sufficiently high. > > That's true, but the number of parameters wouldn't be > high in this case. How high is high enough? Just in realpath, normpath, normcase we already have 3 options, with the "match the existing case-preserving filename if it exists" variant requested in this discussion making it 4. Supporting platform appropriate Unicode normalisation would make it 5. Note that I'm not saying the swiss-army function is necessarily the right answer here, but remembering "use os.realpath to get canonical filenames" and then having a bunch of flags to enable/disable various aspects of the normalisation (defaulting to the current implementation of only expanding symlinks) fits my brain more easily than remembering the distinctions between the tasks that currently correspond to each function name. If there really isn't a name that makes sense for the new variant, then maybe adding some constant parameters to one of the existing methods is the way to go. realpath and normpath are the two most likely candidates to use as a basis for such an approach. If realpath was used as a basis, then it would gain keyword-only parameters along the lines of "expand_links=True", "collapse=False", "lower_case=False", "match_case=False". Setting both lower_case=True and match_case=True would trigger ValueError, but the API with separate boolean flags is easier to use than one with a single tri-state parameter for the case conversion. If normcase was used as a basis instead, then symlink expansion would remain a separate operation and normpath would gain "collapse=True", "lower_case=False", "match_case=False" as keyword-only parameters. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Mon Sep 27 14:20:14 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Sep 2010 22:20:14 +1000 Subject: [Python-ideas] Including elementary mathematical functions in the python data model In-Reply-To: <4C9FC93F.9020708@canterbury.ac.nz> References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com> <4C9FC93F.9020708@canterbury.ac.nz> Message-ID: On Mon, Sep 27, 2010 at 8:29 AM, Greg Ewing wrote: > Nick Coghlan wrote: > >> Couple that with the extra function call overhead (since these >> wouldn't have real typeslots) and it still seems like a less than >> stellar idea. >> >> As another use case for solid, efficient generic function support >> though... great idea :) > > Could a generic function mechanism be made to have any > less overhead, though? See my response to Antoine - probably not. Although, as has been pointed out by others, by doing the check for PyFloat_CheckExact early and running the fast path immediately if that check passes, you can avoid most of the overhead in the common case, even when using pseudo-typeslots. So performance impact likely isn't a major factor here after all. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From denis.spir at gmail.com Tue Sep 28 10:27:07 2010 From: denis.spir at gmail.com (spir) Date: Tue, 28 Sep 2010 10:27:07 +0200 Subject: [Python-ideas] multiline string notation Message-ID: <20100928102707.5b3467ac@o> Hello, multiline string By recently studying a game scripting language (*), and designing a toy language of mine, I realised the following 2 facts, that may be relevant for python as well: -1- no need for a separate multiline string notation A single string format can deal text including newlines, without any syntactic or parsing (**) issue: a string notation just ends with the second quote. No idea why python introduced that distinction (and would like to know it); possibly for historic reason? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this? -2- trimming of indentation On my computer, calling the following function: def write(): if True: print """To be or not to be, that is the question.""" results in the following output: |To be or not to be, | that is the question. This is certainly not the programmer's intent. To get what is expected, one should write instead: def write(): if True: print """To be or not to be, that is the question.""" ...which distorts the visual presentation of code by breaking correct indentation. To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like: def write(): if True: print "To be or not to be,\n" + \ "that is the question." (Actually, the '+' can be here omitted, but this fact is not commonly known.) My project uses a visual structure ? la python (and no curly braces). Indentation is removed by the arser from the significant part of code even inside strings (and also comments). This allows the programmer preserving clean source outline, while having multiline text be simply written as is. In other words, the following routine would work as you guess (':' is assignment sign): write : action if true terminal.write "To be or not to be, that is the question." I imagine the python parser replaces indentation by block-delimiting tokens (analog in role to C braces). 
My language's parser thus has a preprocessing phase that would transform the above piece of code above to: write : action { if true { terminal.write "To be or not to be, that is the question." } } The preprocess routine is actually easier than it would be with python rules, since one can trim indents systematically, without any exception for strings (and comments). Thank you for reading, Denis (*) namely WML, scripting language of the game called Wesnoth (**) This is true for 1-pass parsers (like PEG), as well as for 2-pass ones (with separate lexical phase). -- -- -- -- -- -- -- vit esse estrany ? spir.wikidot.com From mwm-keyword-python.b4bdba at mired.org Tue Sep 28 10:58:42 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Tue, 28 Sep 2010 04:58:42 -0400 Subject: [Python-ideas] multiline string notation In-Reply-To: <20100928102707.5b3467ac@o> References: <20100928102707.5b3467ac@o> Message-ID: <20100928045842.346bb9d0@bhuda.mired.org> On Tue, 28 Sep 2010 10:27:07 +0200 spir wrote: > Hello, > > > > multiline string > > By recently studying a game scripting language (*), and designing a toy language of mine, I realised the following 2 facts, that may be relevant for python as well: > > > > -1- no need for a separate multiline string notation > > A single string format can deal text including newlines, without any syntactic or parsing (**) issue: a string notation just ends with the second quote. > No idea why python introduced that distinction (and would like to know it); possibly for historic reason? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this? No, you're not. The ' form allows literal "'s, and vice versa. The reason for the triple-quoted string is to allow simple multi-line string literals. The reason you want both single and multi-line string literals is so the parser can properly flag the error line when you forget to terminate the far more common single-line literal. Not as important now that nearly everything does syntax coloring, but still a nice feature. > -2- trimming of indentation > > On my computer, calling the following function: > def write(): > if True: > print """To be or not to be, > that is the question.""" > results in the following output: > |To be or not to be, > | that is the question. > This is certainly not the programmer's intent. To get what is expected, one should write instead: > def write(): > if True: > print """To be or not to be, > that is the question.""" > ...which distorts the visual presentation of code by breaking correct indentation. > To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like: > def write(): > if True: > print "To be or not to be,\n" + \ > "that is the question." > (Actually, the '+' can be here omitted, but this fact is not commonly known.) And in 3.x, where print is a function instead of a statement, it could be (leaving off the optional "+"): def write(): if True: print("To be or not to be,\n" "that is the question.") So -1 for this idea. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ncoghlan at gmail.com Tue Sep 28 12:49:04 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 28 Sep 2010 20:49:04 +1000 Subject: [Python-ideas] multiline string notation In-Reply-To: <20100928102707.5b3467ac@o> References: <20100928102707.5b3467ac@o> Message-ID: These two questions are ones where good arguments can be made in both directions. Having explicit notation for multi-line strings is primarily a benefit for readability and error detection. The readability benefit is that it flags to the reader that the next string literal may cover several lines. As Mike noted, the error detection benefit is that the parser can more readily detect a missing end-quote from a normal string instead of inadvertently treating the entire rest of the file as part of the string and giving a relatively useless error regarding EOF while parsing a string. Stripping leading whitespace even inside strings is potentially convenient for the programmer, but breaks the tokenisation stream. String literals are meant to be atomic. Having the parser digging inside them to declare certain whitespace to not be part of the string despite its presence in the source code is certainly a valid design choice a language could make when defining its grammar, but would actually be a fairly significant change for Python. For Python, these two rules are a case of "status quo wins a stalemate". Changing Python's behaviour in this area would be difficult and time-consuming for negligible benefit, so it really isn't worth doing. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From taleinat at gmail.com Tue Sep 28 12:57:08 2010 From: taleinat at gmail.com (Tal Einat) Date: Tue, 28 Sep 2010 12:57:08 +0200 Subject: [Python-ideas] multiline string notation In-Reply-To: <20100928102707.5b3467ac@o> References: <20100928102707.5b3467ac@o> Message-ID: > > -2- trimming of indentation > > On my computer, calling the following function: > def write(): > if True: > print """To be or not to be, > that is the question.""" > results in the following output: > |To be or not to be, > | that is the question. > This is certainly not the programmer's intent. To get what is expected, one > should write instead: > def write(): > if True: > print """To be or not to be, > that is the question.""" > ...which distorts the visual presentation of code by breaking correct > indentation. > To have a multiline text written on multiple lines and preserve > indentation, one needs to use more complicated forms like: > def write(): > if True: > print "To be or not to be,\n" + \ > "that is the question." > (Actually, the '+' can be here omitted, but this fact is not commonly > known.) > > Have you heard of textwrap.dedent()? I usually would write this as: def write(): if True: print textwrap.dedent("""\ To be or not to be, that is the question.""") - Tal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Tue Sep 28 14:57:04 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Sep 2010 14:57:04 +0200 Subject: [Python-ideas] Prefetching on buffered IO files References: <20100928004119.3963a4ad@pitrou.net> Message-ID: <20100928145704.2fb2e382@pitrou.net> Hello, (moved to python-ideas) On Mon, 27 Sep 2010 17:39:45 -0700 Guido van Rossum wrote: > On Mon, Sep 27, 2010 at 3:41 PM, Antoine Pitrou wrote: > > While trying to solve #3873 (poor performance of pickle on file > > objects, due to the overhead of calling read() with very small values), > > it occurred to me that the prefetching facilities offered by > > BufferedIOBase are not flexible and efficient enough. > > I haven't read the whole bug but there seem to be lots of different > smaller issues there, right? The bug entry is quite old and at first the slowness had to do with the pure Python IO layer. Now the remaining performance difference with Python 2 is entirely caused by the following core issue: > It seems that one (unfortunate) > constraint is that reading pickles cannot use buffered I/O (at least > not on a non-seekable file) because the API has been documented to > leave the file positioned right after the last byte of the pickled > data, right? Right. > > Indeed, if you use seek() and read(), 1) you limit yourself to seekable > > files 2) performance can be hampered by very bad seek() performance > > (this is true on GzipFile). > > Ow... I've always assumed that seek() is essentially free, because > that's how a typical OS kernel implements it. If seek() is bad on > GzipFile, how hard would it be to fix this? The worst case is backwards seeks. Forward seeks are implemented as a simply read(), which makes them O(k) where k is the displacement. For buffering applications where k is bounded by the buffer size, it is O(1) (still with, of course, a non-trivial multiplier). Backwards seeks are implemented as rewinding the whole file (seek(0)) and then reading again up to the requested position, which makes them O(n) with n the absolute target position. When your requirement is to rewind by a bounded number of bytes in order to undo some readahead, this is rather catastrophic. I don't know how the gzip algorithm works under the hood; my impression is that optimizing backwards seeks would have us save us checkpoints of the decompressor state and restore it if needed. It doesn't sound like a trivial improvement, and would involve tradeoffs w.r.t. to performance of sequential reads. (I haven't looked at BZ2File, which has a totally different -- and outdated -- implementation) It's why I would favour the peek() (or peek()-like, as in the prefetch() idea) approach anyway. Not only it works on unseekable files, but implementing peek() when you have an internal buffer is quite simple (see GzipFile.peek here: http://bugs.python.org/issue9962). peek() could also be added to BytesIO even though it claims to implement RawIOBase rather than BufferedIOBase. (buf of course, when you have a BytesIO, you can simply feed its getvalue() or getbuffer() directly to pickle.loads) > How common is the use case where you need to read a gzipped pickle > *and* you need to leave the unzipped stream positioned exactly at the > end of the pickle? I really don't know. But I don't think we can break the API for a special case without potentially causing nasty surprises for the user. 
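(As a trivial example, nothing stops an application from writing several pickles back to back and relying on that positioning guarantee to read them again:

    import io
    import pickle

    f = io.BytesIO()
    pickle.dump([1, 2, 3], f)
    pickle.dump({"spam": "eggs"}, f)
    f.seek(0)
    first = pickle.load(f)   # leaves f positioned right after the first pickle
    second = pickle.load(f)  # so this one finds the second pickle

Reading ahead and seeking back afterwards would break the same code on a non-seekable stream.)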
Also, my intuition is that pickling directly from a stream is partly meant for cases where you want to access data following the pickle data in the stream. > > If instead you use peek() and read(), the situation is better, but you > > end up doing multiple copies of data; also, you must call read() to > > advance the file pointer even though you don't care about the results. > > Have you measured how bad the situation is if you do implement it this way? It is actually quite good compared to the statu quo (3x to 10x), and as good as the seek/read solution for regular files (and, of course, much better for gzipped files once GzipFile.peek is implemented): http://bugs.python.org/issue3873#msg117483 So, for solving the unpickle performance issue, it is sufficient. Chances are the bottleneck for further improvements would be in the unpickling logic itself. It feels a bit clunky, though. Direct timing shows that peek()+read() has a non-trivial cost compared to read(): $ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \ "while f.read(4096): pass" 1000 loops, best of 3: 277 usec per loop $ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \ "while f.read(4096): f.peek(4096)" 1000 loops, best of 3: 361 usec per loop (that's on a C extension type where peek() is almost a single call to PyBytes_FromStringAndSize) > > So I would propose adding the following method to BufferedIOBase: > > > > prefetch(self, buffer, skip, minread) > > > > Skip `skip` bytes from the stream. ?Then, try to read at > > least `minread` bytes and write them into `buffer`. The file > > pointer is advanced by at most `skip + minread`, or less if > > the end of file was reached. The total number of bytes written > > in `buffer` is returned, which can be more than `minread` > > if additional bytes could be prefetched (but, of course, > > cannot be more than `len(buffer)`). > > > > Arguments: > > - `buffer`: a writable buffer (e.g. bytearray) > > - `skip`: number of bytes to skip (must be >= 0) > > - `minread`: number of bytes to read (must be >= 0 and <= len(buffer)) > > I like the idea of an API that combines seek and read into a mutable > buffer. However the semantics of this call seem really weird: there is > no direct relationship between where it leaves the stream position and > how much data it reads into the buffer. can you explain how exactly > this will help solve the gzipped pickle performance problem? The general idea with buffering is that: - you want to skip the previously prefetched bytes (through peek() or prefetch()) which have been consumed -> hence the `skip` argument - you want to consume a known number of bytes from the stream (for example a 4-bytes little-endian integer) -> hence the `minread` argument - you would like to prefetch some more bytes if cheaply possible, so as to avoid calling read() or prefetch() too much; but you don't know yet if you will consume those bytes, so the file pointer shouldn't be advanced for them If you don't prefetch more than the minimum needed amount of bytes, you don't solve the performance problem at all (unpickling needs many tiny reads). If you advance the file pointer after the whole prefetched data (even though it might not be entirely consumed), you need to seek() back at the end: it doesn't work on unseekable files, and is very slow on some seekable file types. So, the proposal is like a combination of forward seek() + read() + peek() in a single call. 
With the advantages that: - it works on non-seekable files (things like SocketIO) - it allows the caller to operate in its own buffer (this is nice in C) - it returns the data naturally concatenated, so you don't have to do it yourself if needed - it gives more guarantees than peek() as to the min and max number of bytes returned; peek(), as it is not allowed to advance the file pointer, can return as little as 1 byte (even if you ask for 4096, and even if EOF isn't reached) I also find it interesting that implementing a single primitive be enough for creating custom buffered types (by deriving other methods from it), but the aesthetics of this can be controversial. Regards Antoine. From guido at python.org Tue Sep 28 16:08:08 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Sep 2010 07:08:08 -0700 Subject: [Python-ideas] Prefetching on buffered IO files In-Reply-To: <20100928145704.2fb2e382@pitrou.net> References: <20100928004119.3963a4ad@pitrou.net> <20100928145704.2fb2e382@pitrou.net> Message-ID: On Tue, Sep 28, 2010 at 5:57 AM, Antoine Pitrou wrote: > On Mon, 27 Sep 2010 17:39:45 -0700 > Guido van Rossum wrote: >> On Mon, Sep 27, 2010 at 3:41 PM, Antoine Pitrou wrote: >> > While trying to solve #3873 (poor performance of pickle on file >> > objects, due to the overhead of calling read() with very small values), >> > it occurred to me that the prefetching facilities offered by >> > BufferedIOBase are not flexible and efficient enough. >> >> I haven't read the whole bug but there seem to be lots of different >> smaller issues there, right? > > The bug entry is quite old and at first the slowness had to do with the > pure Python IO layer. Now the remaining performance difference with > Python 2 is entirely caused by the following core issue: > >> It seems that one (unfortunate) >> constraint is that reading pickles cannot use buffered I/O (at least >> not on a non-seekable file) because the API has been documented to >> leave the file positioned right after the last byte of the pickled >> data, right? > > Right. > >> > Indeed, if you use seek() and read(), 1) you limit yourself to seekable >> > files 2) performance can be hampered by very bad seek() performance >> > (this is true on GzipFile). >> >> Ow... I've always assumed that seek() is essentially free, because >> that's how a typical OS kernel implements it. If seek() is bad on >> GzipFile, how hard would it be to fix this? > > The worst case is backwards seeks. Forward seeks are implemented as a > simply read(), which makes them O(k) where k is the displacement. For > buffering applications where k is bounded by the buffer size, it is > O(1) (still with, of course, a non-trivial multiplier). > > Backwards seeks are implemented as rewinding the whole file (seek(0)) > and then reading again up to the requested position, which makes them > O(n) with n the absolute target position. When your requirement is to > rewind by a bounded number of bytes in order to undo some readahead, > this is rather catastrophic. > > I don't know how the gzip algorithm works under the hood; my impression > is that optimizing backwards seeks would have us save us checkpoints of > the decompressor state and restore it if needed. It doesn't sound like a > trivial improvement, and would involve tradeoffs w.r.t. to > performance of sequential reads. 
> > ?(I haven't looked at BZ2File, which has a totally different -- and > ?outdated -- implementation) > > It's why I would favour the peek() (or peek()-like, as in the prefetch() > idea) approach anyway. Not only it works on unseekable files, but > implementing peek() when you have an internal buffer is quite simple > (see GzipFile.peek here: http://bugs.python.org/issue9962). > > peek() could also be added to BytesIO even though it claims to > implement RawIOBase rather than BufferedIOBase. > (buf of course, when you have a BytesIO, you can simply feed its > getvalue() or getbuffer() directly to pickle.loads) > >> How common is the use case where you need to read a gzipped pickle >> *and* you need to leave the unzipped stream positioned exactly at the >> end of the pickle? > > I really don't know. But I don't think we can break the API for a > special case without potentially causing nasty surprises for the user. > > Also, my intuition is that pickling directly from a stream is partly > meant for cases where you want to access data following the pickle > data in the stream. > >> > If instead you use peek() and read(), the situation is better, but you >> > end up doing multiple copies of data; also, you must call read() to >> > advance the file pointer even though you don't care about the results. >> >> Have you measured how bad the situation is if you do implement it this way? > > It is actually quite good compared to the statu quo (3x to 10x), and as > good as the seek/read solution for regular files (and, of course, much > better for gzipped files once GzipFile.peek is implemented): > http://bugs.python.org/issue3873#msg117483 > > So, for solving the unpickle performance issue, it is sufficient. > Chances are the bottleneck for further improvements would be in the > unpickling logic itself. It feels a bit clunky, though. > > Direct timing shows that peek()+read() has a non-trivial cost compared > to read(): > > $ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \ > ?"while f.read(4096): pass" > 1000 loops, best of 3: 277 usec per loop > $ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \ > ?"while f.read(4096): f.peek(4096)" > 1000 loops, best of 3: 361 usec per loop > > (that's on a C extension type where peek() is almost a single call to > PyBytes_FromStringAndSize) > >> > So I would propose adding the following method to BufferedIOBase: >> > >> > prefetch(self, buffer, skip, minread) >> > >> > Skip `skip` bytes from the stream. ?Then, try to read at >> > least `minread` bytes and write them into `buffer`. The file >> > pointer is advanced by at most `skip + minread`, or less if >> > the end of file was reached. The total number of bytes written >> > in `buffer` is returned, which can be more than `minread` >> > if additional bytes could be prefetched (but, of course, >> > cannot be more than `len(buffer)`). >> > >> > Arguments: >> > - `buffer`: a writable buffer (e.g. bytearray) >> > - `skip`: number of bytes to skip (must be >= 0) >> > - `minread`: number of bytes to read (must be >= 0 and <= len(buffer)) >> >> I like the idea of an API that combines seek and read into a mutable >> buffer. However the semantics of this call seem really weird: there is >> no direct relationship between where it leaves the stream position and >> how much data it reads into the buffer. can you explain how exactly >> this will help solve the gzipped pickle performance problem? 
> > The general idea with buffering is that: > - you want to skip the previously prefetched bytes (through peek() > ?or prefetch()) which have been consumed -> hence the `skip` argument > - you want to consume a known number of bytes from the stream (for > ?example a 4-bytes little-endian integer) -> hence the `minread` > ?argument > - you would like to prefetch some more bytes if cheaply possible, so as > ?to avoid calling read() or prefetch() too much; but you don't know > ?yet if you will consume those bytes, so the file pointer shouldn't be > ?advanced for them > > If you don't prefetch more than the minimum needed amount of bytes, you > don't solve the performance problem at all (unpickling needs many tiny > reads). If you advance the file pointer after the whole prefetched data > (even though it might not be entirely consumed), you need to seek() > back at the end: it doesn't work on unseekable files, and is very slow > on some seekable file types. > > So, the proposal is like a combination of forward seek() + read() + > peek() in a single call. With the advantages that: > - it works on non-seekable files (things like SocketIO) > - it allows the caller to operate in its own buffer (this is nice in C) > - it returns the data naturally concatenated, so you don't have to do > ?it yourself if needed > - it gives more guarantees than peek() as to the min and max number of > ?bytes returned; peek(), as it is not allowed to advance the file > ?pointer, can return as little as 1 byte (even if you ask for 4096, > ?and even if EOF isn't reached) > > I also find it interesting that implementing a single primitive be > enough for creating custom buffered types (by deriving other methods > from it), but the aesthetics of this can be controversial. Thanks for the long explanation. I have some further questions: It seems this won't make any difference for a truly unbuffered stream, right? A truly unbuffered stream would not have a buffer where it could save the bytes that were prefetched past the stream position, so it wouldn't return any optional extra bytes, so there would be no speedup. And for a buffered stream, it would be much simpler to just read ahead in large chunks and seek back once you've found the end. (Actually for a buffered stream I suppose that many short read() and small seek() calls aren't actually slow since most of the time they work within the buffer.) So it seems the API is specifically designed to improve the situation with GzipFile since it maintains the fiction of an unbuffered file but in fact has some internal buffer space. I wonder if it wouldn't be better to add an extra buffer to GzipFile so small seek() and read() calls can be made more efficient? In fact, this makes me curious as to the use that unpickling can make of the prefetch() call -- I suppose you had to implement some kind of layer on top of prefetch() that behaves more like a plain unbuffered file? I want to push back on this more, primarily because a new primitive I/O operation has high costs: it can never be removed, it has to be added to every stream implementation, developers need to learn to use the new operation, and so on. A local change that only affects GzipFile doesn't have any of these problems. Also, if you can believe the multi-core crowd, a very different possible future development might be to run the gunzip algorithm and the unpickle algorithm in parallel, on separate cores. 
Truly such a solution would require totally *different* new I/O primitives, which might have a higher chance of being reusable outside the context of pickle. -- --Guido van Rossum (python.org/~guido) From daniel at stutzbachenterprises.com Tue Sep 28 16:26:30 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 28 Sep 2010 09:26:30 -0500 Subject: [Python-ideas] [Python-Dev] Prefetching on buffered IO files In-Reply-To: <20100928004119.3963a4ad@pitrou.net> References: <20100928004119.3963a4ad@pitrou.net> Message-ID: On Mon, Sep 27, 2010 at 5:41 PM, Antoine Pitrou wrote: > While trying to solve #3873 (poor performance of pickle on file > objects, due to the overhead of calling read() with very small values), > After looking over the relevant code, it looks to me like the overhead of calling the read() method compared to calling fread() in Python 2 is the overhead of calling PyObject_Call along with the construction of argument tuples and deconstruction of the return value. I don't think the extra interface would benefit code written in Python as much. Even if Python code gets the data into a buffer more easily, it's going to pay those costs to manipulate the buffered data. It would mostly help modules written in C, such as pickle, which right now are heavily bottlenecked getting the data into a buffer. Comparing the C code for Python 2's cPickle and Python 3's pickle, I see that Python 2 has paths for unpickling from a FILE *, cStringIO, and "other". Python effectively only has a code path for "other", so it's not surprising that it's slower. In the worst case, I am sure that if we re-added specialized code paths that we could make it just as fast as Python 2, although that would make the code messy. Some ideas: - Use readinto() instead of read(), to avoid extra allocations/deallocations - But first, fix bufferediobase_readinto() so it doesn't work by calling the read() method and/or follow up on the TODO in buffered_readinto() If you want a new API, I think a new C API for I/O objects with C-friendly arguments would be better than a new Python-level API. In a nutshell, if you feel the need to make a buffer around BufferedReader, then I agree there's a problem, but I don't think helping you make a buffer around BufferedReader is the right solution. ;-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Sep 28 16:32:49 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Sep 2010 16:32:49 +0200 Subject: [Python-ideas] Prefetching on buffered IO files In-Reply-To: References: <20100928004119.3963a4ad@pitrou.net> <20100928145704.2fb2e382@pitrou.net> Message-ID: <1285684369.3141.22.camel@localhost.localdomain> Le mardi 28 septembre 2010 ? 07:08 -0700, Guido van Rossum a ?crit : > > Thanks for the long explanation. I have some further questions: > > It seems this won't make any difference for a truly unbuffered stream, > right? A truly unbuffered stream would not have a buffer where it > could save the bytes that were prefetched past the stream position, so > it wouldn't return any optional extra bytes, so there would be no > speedup. Indeed. But you can trivially wrap an unbuffered stream inside a BufferedReader, and get peek() even when the raw stream is unseekable. > And for a buffered stream, it would be much simpler to just > read ahead in large chunks and seek back once you've found the end. 
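Concretely, the chunked read-ahead pattern suggested in that last quoted
paragraph might look something like the sketch below. The parse()
callable and the NeedMoreData exception are made up for illustration,
and the approach presupposes a seekable stream:

class NeedMoreData(Exception):
    pass

def read_record(f, parse, chunk_size=4096):
    """Read one variable-length record, leaving `f` positioned just after it.

    `parse` is a hypothetical callable returning (value, bytes_consumed),
    or raising NeedMoreData when the supplied bytes are incomplete.
    """
    start = f.tell()
    data = b""
    while True:
        chunk = f.read(chunk_size)       # read ahead in large chunks
        data += chunk
        try:
            value, consumed = parse(data)
        except NeedMoreData:
            if not chunk:                # EOF and still incomplete
                raise EOFError("truncated record")
            continue
        f.seek(start + consumed)         # seek back to just past the record
        return value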
Well, no, only if your stream is seekable and seek() is fast enough. So,
it wouldn't work on SocketIO for example (even wrapped inside a
BufferedReader, since BufferedReader will refuse to seek() if seekable()
returns False).

> I
> wonder if it wouldn't be better to add an extra buffer to GzipFile so
> small seek() and read() calls can be made more efficient?

The problem is that, since the buffer of the unpickler and the buffer of
the GzipFile are not aware of each other, the unpickler could easily ask
to seek() backwards past the current GzipFile buffer, and fall back on
the slow algorithm.

The "extra buffer" can trivially consist of wrapping the GzipFile inside
a BufferedReader (which is actually recommended if you want e.g. very
fast readlines()), but it doesn't solve the above issue.

> In fact, this makes me curious as to the use that unpickling can make
> of the prefetch() call -- I suppose you had to implement some kind of
> layer on top of prefetch() that behaves more like a plain unbuffered
> file?

I didn't implement prefetch() at all. It would be premature :)
But, if the stream had prefetch(), the unpickling would be simplified: I
would only have to call prefetch() once when refilling the buffer,
rather than two read()'s followed by a peek().

(I could try to coalesce the two reads, but it would complicate the code
a bit more...)

> I want to push back on this more, primarily because a new primitive
> I/O operation has high costs: it can never be removed, it has to be
> added to every stream implementation, developers need to learn to use
> the new operation, and so on.

I agree with this (except that most developers don't really need to
learn to use it: common uses of readable files are content with read()
and readline(), and need neither peek() nor prefetch()). I don't intend
to push this for 3.2; I'm throwing the idea around with a hypothetical
3.3 landing if it seems useful.

> Also, if you can believe the multi-core crowd, a very different
> possible future development might be to run the gunzip algorithm and
> the unpickle algorithm in parallel, on separate cores. Truly such a
> solution would require totally *different* new I/O primitives, which
> might have a higher chance of being reusable outside the context of
> pickle.

Well, it's a bit of a pie-in-the-sky perspective :)
Furthermore, such a solution won't improve CPU efficiency, so if your
workload is already able to utilize all CPU cores (which it can easily
do if you are in a VM, or have multiple busy daemons), it doesn't bring
anything.

Regards

Antoine.



From solipsis at pitrou.net  Tue Sep 28 17:06:44 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Sep 2010 17:06:44 +0200
Subject: [Python-ideas] Prefetching on buffered IO files
In-Reply-To:
References: <20100928004119.3963a4ad@pitrou.net>
Message-ID: <1285686404.3141.56.camel@localhost.localdomain>

> I don't think the extra interface would benefit code written in Python
> as much. Even if Python code gets the data into a buffer more
> easily, it's going to pay those costs to manipulate the buffered data.
> It would mostly help modules written in C, such as pickle, which right
> now are heavily bottlenecked getting the data into a buffer.

Right. It would, however, benefit /file objects/ written in Python
(since the cost of calling a peek() written in pure Python is certainly
significant compared to the cost of the actual peeking operation).
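As a rough illustration of that last point, consider a toy pure-Python
buffered object (a sketch with no error handling; the class name is made
up). The body of peek() does almost no work, so a consumer making many
tiny peek()/read() calls mostly pays Python-level call overhead --
exactly what a single prefetch()-style call would amortize:

class PyBufferedReader:
    """Toy pure-Python buffered reader exposing only read() and peek()."""

    def __init__(self, raw, bufsize=8192):
        self._raw = raw
        self._buf = b""
        self._bufsize = bufsize

    def peek(self, size=1):
        # The actual work is trivial: return already-buffered bytes.
        if not self._buf:
            self._buf = self._raw.read(self._bufsize)
        return self._buf

    def read(self, size=-1):
        data = self.peek()
        if size < 0 or size >= len(data):
            # A real implementation would keep reading from the raw
            # stream; this toy just hands back what is buffered.
            self._buf = b""
            return data
        self._buf = data[size:]
        return data[:size]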
> - But first, fix bufferediobase_readinto() so it doesn't work by > calling the read() method and/or follow up on the TODO in > buffered_readinto() Patches welcome :) > Comparing the C code for Python 2's cPickle and Python 3's pickle, I > see that Python 2 has paths for unpickling from a FILE *, cStringIO, > and "other". Python effectively only has a code path for "other", so > it's not surprising that it's slower. In the worst case, I am sure > that if we re-added specialized code paths that we could make it just > as fast as Python 2, although that would make the code messy. It would be very ugly, IMO. And it would still be slower than the clean solution, which is to have a buffer size big enough that the overhead of making a read() method call is dwarfed by the processing cost of the data (that's how TextIOWrapper works). (for the record, with the read()+peek() patch, unpickle is already faster than Python 2, but that's comparing apples to oranges because Python 3 got other unpickle optimizations in the meantime) > If you want a new API, I think a new C API for I/O objects with > C-friendly arguments would be better than a new Python-level API. I really think we should keep an unified API. A low-level C API would be difficult to get right, make implementations more complicated, and consumers would have to keep fallback code for objects not implementing the C API, which would complicate things on their side too. Conversely, one purpose of my prefetch() proposal, besides optimizing some workloads, is to *simplify* writing of buffered IO code. > In a nutshell, if you feel the need to make a buffer around > BufferedReader, then I agree there's a problem, but I don't think > helping you make a buffer around BufferedReader is the right > solution. ;-) In a layered approach, it's hard not to end up with multiple levels of buffering (think TextIOWrapper + BufferedReader + OS page-level caching) :) I agree that shared buffers sound more efficient but, again, I fear they would be a lot of work to get right. If you look at the BufferedReader code, it's already non-trivial, and bugs in this area can be really painful. Regards Antoine. From daniel at stutzbachenterprises.com Tue Sep 28 17:19:51 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 28 Sep 2010 10:19:51 -0500 Subject: [Python-ideas] Prefetching on buffered IO files In-Reply-To: <1285686404.3141.56.camel@localhost.localdomain> References: <20100928004119.3963a4ad@pitrou.net> <1285686404.3141.56.camel@localhost.localdomain> Message-ID: On Tue, Sep 28, 2010 at 10:06 AM, Antoine Pitrou wrote: > > - But first, fix bufferediobase_readinto() so it doesn't work by > > calling the read() method and/or follow up on the TODO in > > buffered_readinto() > > Patches welcome :) I'm not likely to get to it soon, but I've opened Issue 9971 to at least keep track of it. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Tue Sep 28 18:44:38 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Sep 2010 09:44:38 -0700 Subject: [Python-ideas] Prefetching on buffered IO files In-Reply-To: <1285684369.3141.22.camel@localhost.localdomain> References: <20100928004119.3963a4ad@pitrou.net> <20100928145704.2fb2e382@pitrou.net> <1285684369.3141.22.camel@localhost.localdomain> Message-ID: On Tue, Sep 28, 2010 at 7:32 AM, Antoine Pitrou wrote: [Guido] >> wonder if it wouldn't be better to add an extra buffer to GzipFile so >> small seek() and read() calls can be made more efficient? > > The problem is that, since the buffer of the unpickler and the buffer of > the GzipFile are not aware of each other, the unpickler could easily ask > to seek() backwards past the current GzipFile buffer, and fall back on > the slow algorithm. But AFAICT unpickle doesn't use seek()? [...] > But, if the stream had prefetch(), the unpickling would be simplified: I > would only have to call prefetch() once when refilling the buffer, > rather than two read()'s followed by a peek(). > > (I could try to coalesce the two reads, but it would complicate the code > a bit more...) Where exactly would the peek be used? (I must be confused because I can't find either peek or seek in _pickle.c.) It still seems to me that the "right" way to solve this would be to insert a transparent extra buffer somewhere, probably in the GzipFile code, and work in reducing the call overhead. >> I want to push back on this more, primarily because a new primitive >> I/O operation has high costs: it can never be removed, it has to be >> added to every stream implementation, developers need to learn to use >> the new operation, and so on. > > I agree with this (except that most developers don't really need to > learn to use it: common uses of readable files are content with read() > and readline(), and need neither peek() nor prefetch()). I don't intend > to push this for 3.2; I'm throwing the idea around with a hypothetical > 3.3 landing if it seems useful. So far it seems more awkward than useful. >> Also, if you can believe the multi-core crowd, a very different >> possible future development might be to run the gunzip algorithm and >> the unpickle algorithm in parallel, on separate cores. Truly such a >> solution would require totally *different* new I/O primitives, which >> might have a higher chance of being reusable outside the context of >> pickle. > > Well, it's a bit of a pie-in-the-sky perspective :) > Furthermore, such a solution won't improve CPU efficiency, so if your > workload is already able to utilize all CPU cores (which it can easily > do if you are in a VM, or have multiple busy daemons), it doesn't bring > anything. Agreed it's pie in the sky... Though the interface between the two CPUs might actually be designed to be faster than the current buffered I/O. I have (mostly :-) fond memories of async I/O on a mainframe I used in the '70s which worked this way. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Tue Sep 28 22:33:39 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Sep 2010 22:33:39 +0200 Subject: [Python-ideas] Prefetching on buffered IO files References: <20100928004119.3963a4ad@pitrou.net> <20100928145704.2fb2e382@pitrou.net> <1285684369.3141.22.camel@localhost.localdomain> Message-ID: <20100928223339.3f621915@pitrou.net> On Tue, 28 Sep 2010 09:44:38 -0700 Guido van Rossum wrote: > > But AFAICT unpickle doesn't use seek()? > > [...] 
> > But, if the stream had prefetch(), the unpickling would be simplified: I > > would only have to call prefetch() once when refilling the buffer, > > rather than two read()'s followed by a peek(). > > > > (I could try to coalesce the two reads, but it would complicate the code > > a bit more...) > > Where exactly would the peek be used? (I must be confused because I > can't find either peek or seek in _pickle.c.) peek/seek are not used currently (in SVN). Each of them is used in one of the prefetching approaches proposed to solve the unpickling performance problem. (the first approach uses seek() and read(), the second approach uses read() and peek(); as already explained, I tend to consider the second approach much better, and the prefetch() proposal comes in part from the experience gathered on that approach) > It still seems to me that the "right" way to solve this would be to > insert a transparent extra buffer somewhere, probably in the GzipFile > code, and work in reducing the call overhead. No, because if you don't have any buffering on the unpickling side (rather than the GzipFile or the BufferedReader side), then you still have the method call overhead no matter what. And this overhead is rather big when you're reading data byte per byte, or word per word (which unpickling very frequently does). (for the record, GzipFile already has an internal buffer. But calling GzipFile.read() still has a large overhead compared to reading data directly from a prefetch buffer inside the unpickler object) Regards Antoine.
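To make the second (read()+peek()) approach concrete, a toy model of an
unpickler-side prefetch buffer might look like the sketch below. It is
only an illustration of the idea, not the actual _pickle.c logic; the
class name and the 8192-byte readahead size are made up:

class PrefetchBuffer:
    """Toy model of an unpickler-side buffer refilled with read() + peek()."""

    def __init__(self, f):
        self._f = f        # underlying file; must provide read() and peek()
        self._buf = b""    # peeked bytes: the file position is still before them
        self._pos = 0      # how many of those bytes the consumer has used

    def read(self, n):
        avail = len(self._buf) - self._pos
        if n <= avail:
            # Fast path: tiny reads are served locally, with no method call.
            data = self._buf[self._pos:self._pos + n]
            self._pos += n
            return data
        kept = self._buf[self._pos:]
        # 1st read(): advance past all previously peeked bytes; the returned
        # data is thrown away since we only need to advance the position.
        self._f.read(len(self._buf))
        # 2nd read(): the extra bytes we actually need right now.
        data = kept + self._f.read(n - len(kept))
        # peek(): opportunistic readahead that does not advance the position,
        # so the stream is left exactly at the end of the consumed data.
        self._buf = self._f.peek(8192)
        self._pos = 0
        return data

With the proposed prefetch(), the two read() calls and the peek() in the
refill path above would collapse into a single method call.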