From abarnert at yahoo.com Sat Jun 1 02:43:04 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 31 May 2013 17:43:04 -0700 (PDT)
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: 
References: <1370025834.12159.YahooMailNeo@web184706.mail.ne1.yahoo.com> <8ED3592D-5613-4412-B792-4CF9D170A722@masklinn.net>
Message-ID: <1370047384.48409.YahooMailNeo@web184702.mail.ne1.yahoo.com>

From: Masklinn
Sent: Friday, May 31, 2013 1:45 PM

> On 2013-05-31, at 21:39 , Philipp A. wrote:
>> 2013/5/31 Masklinn
>>> On 2013-05-31, at 20:43 , Andrew Barnert wrote:
>>>>     try:
>>>>         from lxml import etree as ET
>>>>     except ImportError:
>>>>         from xml.etree import ElementTree as ET
>>>>
>>>> Your registration mechanism would mean they don't have to do this; they just import from the stdlib, and if lxml is present and registered, it would be loaded instead.
>>>
>>> That seems rife with potential issues and unintended side-effects, e.g. while lxml does for the most part provide ET's API, it also extends it and I do not know if it can run ET's testsuite. It also doesn't handle ET's implementation details for obvious reasons.
>>
>> and that's where my idea's "strict API" comes into play: compatible implementations would *have to* pass a test suite and implement a certain API and comply with the standard.
>
> But that's not sufficient is the issue here, as I tried to point out: when somebody uses ElementTree they may be using more than just the API, they may well be relying on implementation details (e.g. namespace behavior or the _namespace_map).

In Philipp A.'s original suggestion, he made it explicitly clear that it should be easy to pick the best-installed, but also easy to pick a specific implementation.
And I brought up ElementTree specifically because the same is true there: some people just want the best-installed (the ones who are currently doing the try:/except ImportError:, or would be if they knew about it), others specifically want one or the other.

Maybe I didn't make things clear enough with my examples, so let's look at one of them, dbm, in more depth.

Plenty of code just wants any implementation of the core API. For that case, you import dbm (or anydbm, in 2.x). But there's also code that specifically needs the traditional 100%-ndbm-compatible implementation. For that case, you import dbm.ndbm (or dbm, in 2.x). And there's code that specifically needs the extended functionality of gdbm, and is willing to deal with the possibility that it isn't available. For that case, you import dbm.gdbm (or gdbm, in 2.x).

So, what would be the equivalent for ElementTree? Code that just wants the best implementation of the core API can import xml.etree (or xml.etree.AnyElementTree, if you like the 2.x naming better). Code that specifically needs the traditional implementation, e.g., to use _namespace_map, would import xml.etree.ElementTree. Code that specifically needs the extended functionality of lxml, and is willing to deal with the possibility that it isn't available, can import lxml.etree.

Notice that this is no change from the present day at all for the latter two cases; the only thing it changes is the first case, which goes from a try/import/except ImportError/import to a simple import. And all existing code would continue to work exactly the same way it always has.

So, unless your worry is that it will be an attractive nuisance, causing people to change their code to import xml.etree and try to use _namespace_map anyway and not know why they're getting NameErrors, I don't understand what you're arguing about.
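(The "best-installed" dispatch described above can be sketched as a small helper; the function name and candidate lists here are illustrative, not an existing stdlib API.)

```python
import importlib

def best_installed(candidates):
    """Return the first importable module from an ordered, best-first list.

    This mirrors the dbm/anydbm pattern: third-party accelerators come
    first, a stdlib fallback comes last so a stock install never fails.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r is installed" % (candidates,))

# With the stdlib ElementTree as the final fallback, this always succeeds:
etree = best_installed(["lxml.etree", "xml.etree.ElementTree"])
```

Code that needs lxml-specific extensions would still import lxml.etree directly, exactly as today; only the "give me whichever is best" case changes.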
From abarnert at yahoo.com Sat Jun 1 03:18:38 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 31 May 2013 18:18:38 -0700 (PDT)
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: 
References: <1370025834.12159.YahooMailNeo@web184706.mail.ne1.yahoo.com> <8ED3592D-5613-4412-B792-4CF9D170A722@masklinn.net>
Message-ID: <1370049518.38392.YahooMailNeo@web184706.mail.ne1.yahoo.com>

From: Philipp A.
Sent: Friday, May 31, 2013 12:39 PM

> and that's where my idea's "strict API" comes into play: compatible implementations would *have to* pass a test suite and implement a certain API and comply with the standard.
>
> unsure if and how to test the latter (surely running a testsuite when something wants to register isn't practical -- or is it?)

After sleeping on this question, I'm not sure the best-implementation wrapper really needs to be that dynamic. There are only so many YAML libraries out there, and the set doesn't change that rapidly. (And the same is true for dbm, ElementTree, etc.)

So, maybe we just put a static list in yaml, like the one in dbm (http://hg.python.org/cpython/file/3.3/Lib/dbm/__init__.py#l41):

    _names = ['pyyaml', 'yaml.yaml']

If a new implementation reaches maturity, it goes through some process TBD, and in 3.5.2, we change that one line to:

    _names = ['pyyaml', 'yayaml', 'yaml.yaml']

And that's how you "register" a new implementation.

The only real downside I can see is that some people might stick to 3.5.1 for a long time after 3.5.2 is released (maybe because it comes pre-installed with OS X 10.9 or RHEL 7.2 ESR or something), but still want to accept yayaml. If that's really an issue, someone could put an "anyyaml" backport project on PyPI that offered the latest registered name list to older versions of Python (maybe even including 2.7).

Is that good enough?
From steve at pearwood.info Sat Jun 1 11:24:50 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 01 Jun 2013 19:24:50 +1000
Subject: [Python-ideas] Timing hefty or long-running blocks of code
Message-ID: <51A9BDE2.4060201@pearwood.info>

The timeit module is great for timing small code snippets, but it is rather inconvenient for timing larger blocks of code. It's also over-kill: under normal circumstances, there is little need for the heroic measures timeit goes through to accurately time long-running code.

When people want to time something that doesn't require timeit, they have to write their own timer boilerplate:

from time import time
t = time()
do_this()
do_that()
print(time() - t)

Although only a few lines of boilerplate, this has a few problems, the biggest two being:

* the user has to decide which of the many clocks provided by Python is suitable (hint: time.time is not necessarily the best);

* this is inconvenient at the interactive interpreter, since the timer is started before the code being typed has been entered.

I propose a context manager in the timeit module:

with Stopwatch():
    do_this()
    do_that()

=> results are automatically printed, or available for programmatic access

I have had a recipe for this on ActiveState for about 18 months, where it is relatively popular:

http://code.activestate.com/recipes/577896-benchmark-code-with-the-with-statement

and I have extensively used it interactively, in Python 2.6 through 3.3 inclusive. The latest version can be found here:

https://code.google.com/p/my-startup-file/source/browse/timer.py

This solves the two problems listed above:

* the Stopwatch can use the most appropriate timer by default, and allow the user to override it if they so choose;

* the Stopwatch will not start until the with-block is actually entered.

Is there interest in seeing this in the standard library?

-- 
Steven

From flying-sheep at web.de Sat Jun 1 12:38:18 2013
From: flying-sheep at web.de (Philipp A.)
Date: Sat, 1 Jun 2013 12:38:18 +0200
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: 
References: 
Message-ID: 

2013/5/31 Brett Cannon

>> [citation needed]
>
> OK, I claim it isn't as widely used as I think would warrant inclusion, you disagree. Asking for a citation could be thrown in either direction and on any point in this discussion and it comes off as aggressive.

it was a joke. you say "it's not widely used" and i used Wikipedia's citation tag in order to say: "i'm not so sure about this"

>> wrong, wrong and irrelevant.
>
> It might be irrelevant to you, but it isn't irrelevant to everyone. Remember, this is for the stdlib which means its use needs to be beyond just what packaging wants.

you said JSON is faster to parse in the context of metadata files. that's really irrelevant, because those metadata files are tiny. speed only matters if you have to do an operation very often in some way. that isn't at all the case if your task is parsing one <100 lines metadata file. for every other use, the user would be able to decide if it was worth the very slightly slower parsing. the area where parsing speed is of the essence is a use case for something like protocol buffers anyway, not JSON or YAML.

>> Yes, says me. It's my opinion and I am allowed to express it here.
>
> You are beginning to take this personally and become a bit hostile. Please take a moment to step back and realize this is just a discussion and just because I disagree with it doesn't mean I think that's bad or I think negatively of you, I just disagree with you.

i'm sorry if i came over like this. let me tell you that this was not at all my intention!

best regards, philipp
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From masklinn at masklinn.net Sat Jun 1 12:50:14 2013
From: masklinn at masklinn.net (Masklinn)
Date: Sat, 1 Jun 2013 12:50:14 +0200
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <51A9BDE2.4060201@pearwood.info>
References: <51A9BDE2.4060201@pearwood.info>
Message-ID: <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>

On 2013-06-01, at 11:24 , Steven D'Aprano wrote:
> The timeit module is great for timing small code snippets, but it is rather inconvenient for timing larger blocks of code. It's also over-kill: under normal circumstances, there is little need for the heroic measures timeit goes through to accurately time long-running code.

Does timeit really go through much heroics when using it in code? The CLI version does a bunch of testing and adaptation to try to get accurate measures, but as far as I know the Python API has no such facilities: it runs what you gave it the number of times you tell it. The only "heroics" it goes through is using timeit.default_timer(), which is supposed to provide the most accurate clock available.

> When people want to time something that doesn't require timeit, they have to write their own timer boilerplate:
>
> from time import time
> t = time()
> do_this()
> do_that()
> print(time() - t)
>
> Although only a few lines of boilerplate, this has a few problems, the biggest two being:
>
> * the user has to decide which of the many clocks provided by Python is suitable (hint: time.time is not necessarily the best);

But that's what timeit.default_timer() is for, right? This seems to be a problem of awareness more than availability.

> * this is inconvenient at the interactive interpreter, since the timer is started before the code being typed has been entered.
>
> I propose a context manager in the timeit module:
>
> with Stopwatch():
>     do_this()
>     do_that()
>
> => results are automatically printed, or available for programmatic access
>
> I have had a recipe for this on ActiveState for about 18 months, where it is relatively popular:
>
> http://code.activestate.com/recipes/577896-benchmark-code-with-the-with-statement
>
> and I have extensively used it interactively, in Python 2.6 through 3.3 inclusive. The latest version can be found here:
>
> https://code.google.com/p/my-startup-file/source/browse/timer.py
>
> This solves the two problems listed above:
>
> * the Stopwatch can use the most appropriate timer by default, and allow the user to override it if they so choose;
>
> * the Stopwatch will not start until the with-block is actually entered.
>
> Is there interest in seeing this in the standard library?

A more general version can be achieved through timeit.Timer, though it's got more boilerplate and as far as I know there's no example of it in the documentation:

@timeit.Timer
def foo():
    do_this()
    do_that()
print foo.timeit(1)

On the other hand it allows repeated benching to try and refine the iterations count.

Maybe context-management could simply be added to timeit.Timer, rather than be a separate object?

From ncoghlan at gmail.com Sat Jun 1 16:04:21 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 2 Jun 2013 00:04:21 +1000
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jun 1, 2013 at 2:35 AM, Philipp A. wrote:
> Hi, reading PEP 426, I made a connection to a (IMHO) longstanding issue: YAML not being in the stdlib.

As MAL already noted, PEP 426 defines a data interchange format for automated tools.
Build tools should not (and likely will not) use it as an input format (JSON is a human *readable* serialisation format, but it's a bit of a stretch to call it human *editable* - editing JSON files directly is in some respects even more painful than writing XML by hand). While having a YAML parser in the standard library is a defensible idea, it still wouldn't make any difference to the packaging metadata standards. Those are going to be strictly JSON, as it is a simpler format and there is no technical reason to allow YAML for files that should *never* be edited by hand.

From my perspective, at this point in time, you have 3 reasonable choices for storing application configuration data on disk:

* .ini syntax
  * easy for humans to read and write
  * stdlib support
  * can't handle structured data without awkward hacks
  * relatively simple
* JSON
  * easy for humans to read, hard for humans to write
  * stdlib support
  * handles structured data
  * relatively simple
* YAML
  * easy for humans to read and write
  * no stdlib support
  * handles structured data
  * relatively complex

(Historically, XML may also have been on that list, but in most new code, JSON or YAML will be a better choice wherever XML would have previously been suitable)

YAML's complexity is the reason I prefer JSON as a data interchange format, but I still believe YAML fills a useful niche for more complex configuration files where .ini syntax is too limited and JSON is too hard to edit by hand. Thus, for me, I answer "Is it worth supporting YAML in the standard library?" with a definite "Yes" (as a more complex configuration file format for the cases .ini syntax can't readily handle).
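(The "awkward hacks" point in the comparison above can be made concrete with the two stdlib options; the config keys below are invented for illustration.)

```python
import configparser
import io
import json

# The same small structured config, rendered two ways.
# JSON handles the nested list naturally...
data = {"server": {"host": "example.com", "ports": [8080, 8081]}}
json_text = json.dumps(data, indent=2)

# ...while .ini values are always flat strings, so a list needs a
# convention such as comma-joining, plus manual parsing on the way back.
cp = configparser.ConfigParser()
cp["server"] = {"host": "example.com", "ports": "8080,8081"}
buf = io.StringIO()
cp.write(buf)

ports = [int(p) for p in cp["server"]["ports"].split(",")]
```

A YAML rendering would be as structured as the JSON one but easier to hand-edit; it needs a third-party library such as PyYAML, which is the gap being discussed.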
That answer leaves the key open questions as:

* whether or not there are any YAML implementations for Python that are sufficiently mature for one to be locked into the standard library's 18-24 month upgrade cycle and 6 month bugfix cycle
* whether including such a library would carry an acceptable increase in our risk of security vulnerabilities
* whether the authors/maintainers of any such library are prepared to accept the implications of standard library inclusion.

Given the mess that was the partial inclusion of PyXML, the explicit decision to disallow any future externally maintained libraries (see PEP 360) and the existing proposal to include a pip bootstrapping mechanism in Python 3.4 (see PEP 439), I have my doubts that Python 3.4 is the right time to be including a potentially volatile library, even if providing a YAML parser as an included battery is a good idea in the long run.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ubershmekel at gmail.com Sat Jun 1 16:19:50 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sat, 1 Jun 2013 17:19:50 +0300
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: 
References: 
Message-ID: 

On Fri, May 31, 2013 at 9:43 PM, Guido van Rossum wrote:
> On Fri, May 31, 2013 at 11:35 AM, Brett Cannon wrote:
>>> Is there a reason JSON is used other than YAML not being in the stdlib?
>>
>> It's simpler, it's Python syntax, it's faster to parse.
>
> I would warn strongly against the "JSON is Python syntax" meme.

Another warning - this javascript tidbit broke my heart more times than I care to admit:

>var k = 'foo'
>var obj = {k: 'wonderful'}
>obj[k]
undefined
>obj['k']
"wonderful"

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ron3200 at gmail.com Sat Jun 1 17:16:04 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 01 Jun 2013 10:16:04 -0500
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>
References: <51A9BDE2.4060201@pearwood.info> <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>
Message-ID: <51AA1034.3080809@gmail.com>

On 06/01/2013 05:50 AM, Masklinn wrote:
> A more general version can be achieved through timeit.Timer, though it's got more boilerplate and as far as I know there's no example of it in the documentation:
>
> @timeit.Timer
> def foo():
>     do_this()
>     do_that()
> print foo.timeit(1)
>
> On the other hand it allows repeated benching to try and refine the iterations count.
>
> Maybe context-management could simply be added to timeit.Timer, rather than be a separate object?

I like the decorator variation.

+1 for adding it to timeit.

I would like to be able to specify the maximum count to time, and to be able to specify where to send the output.

function_timer = timeit.FunctionTimer(count=3, file=sys.stderr)

@function_timer
def foo():
    ...

Which would print ...

Foo timer 1/3: n seconds
Foo timer 2/3: n seconds
Foo timer 3/3: n seconds

It would only time 'count' times through the function.

A function timer like this requires minimal changes to the code and can be pasted in or commented out as needed. For me, timing a function is the most common type of timing I do in real programs.

Cheers,
Ron

From vito.detullio at gmail.com Sat Jun 1 18:29:09 2013
From: vito.detullio at gmail.com (Vito De Tullio)
Date: Sat, 01 Jun 2013 18:29:09 +0200
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
References: 
Message-ID: 

Yuval Greenfield wrote:
>>> It's simpler, it's Python syntax, it's faster to parse.
>>
>> I would warn strongly against the "JSON is Python syntax" meme.
>>
>> Another warning - this javascript tidbit broke my heart more times than I care to admit:
>>
>> >var k = 'foo'
>> >var obj = {k: 'wonderful'}
>> >obj[k]
>> undefined
>> >obj['k']
>> "wonderful"

*sigh* brother! worse if instead of a generic k you have:

var MY_CONFIGURATION_CONSTANT = 'foo';

fortunately JSON does not allow unquoted keys (and variables...) so at least this behaviour does not cause surprises.

-- 
ZeD

From abarnert at yahoo.com Sat Jun 1 18:42:53 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 1 Jun 2013 09:42:53 -0700
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: 
References: 
Message-ID: <9E465DF0-739A-4CA6-B9AD-A4BF1D0CEB7F@yahoo.com>

On Jun 1, 2013, at 7:04, Nick Coghlan wrote:

> Given the mess that was the partial inclusion of PyXML, the explicit decision to disallow any future externally maintained libraries (see PEP 360) and the existing proposal to include a pip bootstrapping mechanism in Python 3.4 (see PEP 439), I have my doubts that Python 3.4 is the right time to be including a potentially volatile library, even if providing a YAML parser as an included battery is a good idea in the long run.

For the record, the OP (Philipp) was thinking 3.5 or later. I'm the one who made the assumption he wanted this in 3.4, and he immediately corrected me.

Also, I get the impression that he wants to define a new API which doesn't match any of the existing libraries, and not add anything to the stdlib until one of the existing (fast) libraries has an adaptation to the new API, which means it's clearly not an immediate-term goal. He just wants to get some consensus that this is a good idea before starting the work of defining that API, finding or building a reference implementation in pure Python, and convincing the existing library authors to adapt to the standard API.

If I've interpreted him wrong, I apologize.
From stefan at drees.name Sat Jun 1 20:07:31 2013
From: stefan at drees.name (Stefan Drees)
Date: Sat, 01 Jun 2013 20:07:31 +0200
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: <9E465DF0-739A-4CA6-B9AD-A4BF1D0CEB7F@yahoo.com>
References: <9E465DF0-739A-4CA6-B9AD-A4BF1D0CEB7F@yahoo.com>
Message-ID: <51AA3863.7070907@drees.name>

On 01.06.13 18:42, Andrew Barnert wrote:
> On Jun 1, 2013, at 7:04, Nick Coghlan ... wrote:
>
>> ... I have my doubts that Python 3.4 is the right time to be including a potentially volatile library, even if providing a YAML parser as an included battery is a good idea in the long run.
>
> For the record, the OP (Philipp) was thinking 3.5 or later. I'm the one who made the assumption he wanted this in 3.4, and he immediately corrected me.

At least I spotted no '3.4' in the first mail of this thread.

> Also, I get the impression that he wants to define a new API which doesn't match any of the existing libraries, and not add anything to the stdlib until one of the existing (fast) libraries has an adaptation to the new API, which means it's clearly not an immediate-term goal. He just wants to get some consensus that this is a good idea before starting the work of defining that API, finding or building a reference implementation in pure Python, and convincing the existing library authors to adapt to the standard API.

... that is what I also distilled (after removing some add-on topics). I am +1 both on designing a new API (willing to support) and on nudging the existing libraries to adopt it **before** lobbying towards stdlib inclusion. Personally, I also perceive YAML, as Nick put it, between .ini and JSON.
All the best,
Stefan

From masklinn at masklinn.net Sat Jun 1 20:15:00 2013
From: masklinn at masklinn.net (Masklinn)
Date: Sat, 1 Jun 2013 20:15:00 +0200
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <51AA1034.3080809@gmail.com>
References: <51A9BDE2.4060201@pearwood.info> <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net> <51AA1034.3080809@gmail.com>
Message-ID: <1CC76AFE-E0B5-4C20-A9B2-294570718FC2@masklinn.net>

On 2013-06-01, at 17:16 , Ron Adam wrote:
> On 06/01/2013 05:50 AM, Masklinn wrote:
>> A more general version can be achieved through timeit.Timer, though it's got more boilerplate and as far as I know there's no example of it in the documentation:
>>
>> @timeit.Timer
>> def foo():
>>     do_this()
>>     do_that()
>> print foo.timeit(1)
>>
>> On the other hand it allows repeated benching to try and refine the iterations count.
>>
>> Maybe context-management could simply be added to timeit.Timer, rather than be a separate object?
>
> I like the decorator variation.
>
> +1 for adding it to timeit.
>
> I would like to be able to specify the maximum count to time, and to be able to specify where to send the output.
>
> function_timer = timeit.FunctionTimer(count=3, file=sys.stderr)
>
> @function_timer
> def foo():
>     ...
>
> Which would print ...
>
> Foo timer 1/3: n seconds
> Foo timer 2/3: n seconds
> Foo timer 3/3: n seconds
>
> It would only time 'count' times through the function.
>
> A function timer like this requires minimal changes to the code and can be pasted in or commented out as needed.

FWIW that can also be achieved by using timeit.Timer as a decorator: simply call Timer.repeat afterwards instead of Timer.timeit, it defaults to repeat=3 (which corresponds to your count=3) and returns a list of results.

So

@timeit.Timer
def foo():
    do_this()
    do_that()
print foo.repeat(repeat=3, number=1)

will print a list of 3 elements, each of which corresponds to running the function once.
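(A runnable Python 3 version of the pattern just described; the do_this/do_that bodies are stand-ins for real work.)

```python
import timeit

def do_this():
    sum(range(10000))  # placeholder workload

def do_that():
    sum(range(10000))  # placeholder workload

@timeit.Timer          # rebinds foo to a Timer wrapping the function
def foo():
    do_this()
    do_that()

# Three independent timings of a single run each, as in the post.
results = foo.repeat(repeat=3, number=1)
print(results)
```

Note the trade-off: after decoration, foo is a Timer object, not a callable function, which is why Serhiy's follow-up suggests the non-decorator spelling timeit.Timer(foo).timeit(1) instead.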
> For me, timing a function is the most common type of timing I do in real programs.

All of timeit's stuff can take functions instead of statement strings, both are supported.

From storchaka at gmail.com Sat Jun 1 20:34:45 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 01 Jun 2013 21:34:45 +0300
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>
References: <51A9BDE2.4060201@pearwood.info> <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>
Message-ID: 

01.06.13 13:50, Masklinn wrote:
> A more general version can be achieved through timeit.Timer, though it's got more boilerplate and as far as I know there's no example of it in the documentation:
>
> @timeit.Timer
> def foo():
>     do_this()
>     do_that()
> print foo.timeit(1)

Why not just timeit.Timer(foo).timeit(1)?

From abarnert at yahoo.com Sat Jun 1 22:28:27 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 1 Jun 2013 13:28:27 -0700
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <51A9BDE2.4060201@pearwood.info>
References: <51A9BDE2.4060201@pearwood.info>
Message-ID: 

On Jun 1, 2013, at 2:24, Steven D'Aprano wrote:
> The timeit module is great for timing small code snippets, but it is rather inconvenient for timing larger blocks of code.

If you factor the block out into a function, you can just timeit that function. And often (not always, I realize, but often) the refactoring will be worth doing anyway. After all, you have a block of code that's sufficiently separate to be worth timing separately.

> from time import time
> t = time()
> do_this()
> do_that()
> print(time() - t)

def do_them():
    do_this()
    do_that()

print(timeit(do_them, number=1))

This eliminates both of your problems. You don't have to guess which timing function to use, and it doesn't start the timer until the function starts.
In 2.x I've used your recipe, because there's no trivial way to handle this:

x = do_this()
y = do_that()

But in 3.x you just add "nonlocal x, y" inside the function. So, I've rarely used it. (I've also just called timeit(more_functools.sequence(do_this, do_that)), but that isn't very pythonic. If you want a new non-trivial function, it's usually better to just def it.)

Again, I realize that sometimes (even in 3.x) it's not appropriate to convert the block into a function. But often enough to be worth adding to the stdlib? I don't know.

From ncoghlan at gmail.com Sun Jun 2 01:38:02 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 2 Jun 2013 09:38:02 +1000
Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
In-Reply-To: <9E465DF0-739A-4CA6-B9AD-A4BF1D0CEB7F@yahoo.com>
References: <9E465DF0-739A-4CA6-B9AD-A4BF1D0CEB7F@yahoo.com>
Message-ID: 

On 2 Jun 2013 02:43, "Andrew Barnert" wrote:
>
> On Jun 1, 2013, at 7:04, Nick Coghlan wrote:
>
>> Given the mess that was the partial inclusion of PyXML, the explicit decision to disallow any future externally maintained libraries (see PEP 360) and the existing proposal to include a pip bootstrapping mechanism in Python 3.4 (see PEP 439), I have my doubts that Python 3.4 is the right time to be including a potentially volatile library, even if providing a YAML parser as an included battery is a good idea in the long run.
>
> For the record, the OP (Philipp) was thinking 3.5 or later. I'm the one who made the assumption he wanted this in 3.4, and he immediately corrected me.
>
> Also, I get the impression that he wants to define a new API which doesn't match any of the existing libraries, and not add anything to the stdlib until one of the existing (fast) libraries has an adaptation to the new API, which means it's clearly not an immediate-term goal.
He just wants to get some consensus that this is a good idea before starting the work of defining that API, finding or building a reference implementation in pure Python, and convincing the existing library authors to adapt to the standard API.

Ah, I missed that. If the target time frame is 3.5 and the API design goals include "secure by default, full power of YAML when requested" then it sounds like a fine idea to try.

Cheers,
Nick.

> If I've interpreted him wrong, I apologize.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ron3200 at gmail.com Sun Jun 2 01:53:58 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 01 Jun 2013 18:53:58 -0500
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <1CC76AFE-E0B5-4C20-A9B2-294570718FC2@masklinn.net>
References: <51A9BDE2.4060201@pearwood.info> <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net> <51AA1034.3080809@gmail.com> <1CC76AFE-E0B5-4C20-A9B2-294570718FC2@masklinn.net>
Message-ID: <51AA8996.3080704@gmail.com>

On 06/01/2013 01:15 PM, Masklinn wrote:
>> > @function_timer
>> > def foo():
>> >     ...
>> >
>> > Which would print ...
>> >
>> > Foo timer 1/3: n seconds
>> > Foo timer 2/3: n seconds
>> > Foo timer 3/3: n seconds
>> >
>> > It would only time 'count' times through the function.
>> >
>> > A function timer like this requires minimal changes to the code and can be pasted in or commented out as needed.
> FWIW that can also be achieved by using timeit.Timer as a decorator: simply call Timer.repeat afterwards instead of Timer.timeit, it defaults to repeat=3 (which corresponds to your count=3) and returns a list of results.
>
> So
>
> @timeit.Timer
> def foo():
>     do_this()
>     do_that()
> print foo.repeat(repeat=3, number=1)
>
> will print a list of 3 elements, each of which corresponds to running the function once.
>
>> > For me, timing a function is the most common type of timing I do in real programs.
> All of timeit's stuff can take functions instead of statement strings, both are supported.

Yes, but that is very different in context. A more realistic example would include arguments...

@function_timer
def foo(*args, **kwds):
    ...

If the args and kwds are generated by other parts of the program, then you would need to create more boilerplate to generate the arguments for the out-of-context print call. This could get quite involved depending on how complex the arguments are. The advantage is that it gives a more accurate time for a very specific set of arguments.

The decorator I suggest gives the times of the actual usage while the program (or module) is running, with a minimal amount of changes to the program and a minimal amount of work to get meaningful output.

The function timer counter should be called max_count rather than count. If the function is only called once, then you will only get one time for it. If the function is called many times, then you get a *max* of three times in the case max_count is 3. Then it would stop timing that function.

And the function_timer decorator can be replaced with a null decorator at the top to disable all of them at once:

function_timer = null_decorator  # disable this decorator

I think this is closer to what Steven was asking for. His with-statement example is better suited for targeted timing of particular blocks of code in a program while it's executing.
Ron

From steve at pearwood.info Sun Jun 2 03:58:43 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 02 Jun 2013 11:58:43 +1000
Subject: [Python-ideas] Timing hefty or long-running blocks of code
In-Reply-To: <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>
References: <51A9BDE2.4060201@pearwood.info> <99340D41-E9E9-4439-9059-C7A78C3E85E1@masklinn.net>
Message-ID: <51AAA6D3.4080300@pearwood.info>

On 01/06/13 20:50, Masklinn wrote:
> On 2013-06-01, at 11:24 , Steven D'Aprano wrote:
>> The timeit module is great for timing small code snippets, but it is rather inconvenient for timing larger blocks of code. It's also over-kill: under normal circumstances, there is little need for the heroic measures timeit goes through to accurately time long-running code.
>
> Does timeit really go through much heroics when using it in code? The CLI version does a bunch of testing and adaptation to try to get accurate measures but as far as I know the Python API has no such facilities, it runs what you gave it the number of time you tell it.

Perhaps "heroic" is an exaggeration, but timeit.Timer does more than just run the code snippet. By default it repeats the code one million times, to dissipate any cache effects or other interference; takes efforts to minimize the overhead of the looping construct; it tries to pick the best timer function it can; and it disables garbage collection. Most of these things are unnecessary or even counter-productive when timing a single run of some long-running block of code.

[...]
> A more general version can be achieved through timeit.Timer, though it's > got more boilerplate and as far as I know there's no example of it in > the documentation: > > @timeit.Timer > def foo(): > do_this() > do_that() > print foo.timeit(1) I'm not suggesting that timeit.Timer go away :-) But I am suggesting that for timing code with a minimum of friction and a maximum of convenience, particularly in the interactive interpreter, a context manager beats Timer, even the decorator version. with Stopwatch(): do_this() do_that() No mess, no fuss, no need to wrap things in an extraneous function, or to use nonlocal (as per Andrew Barnert's email); it just times the code you run. This is not just for benchmarking; sometimes in the interactive interpreter you just want to know how long something will take, in a hand-wavy sense. "It takes about a minute to process this directory, so when I process this other one I'll allow about ten minutes. Time enough for a coffee break!" > On the other hand it allows repeated benching to try and refine the > iterations count. > > Maybe context-management could simply be added to timeit.Timer, > rather than be a separate object? I had thought about that, but the constructor signatures are just too different. -- Steven From clay.sweetser at gmail.com Sun Jun 2 06:15:10 2013 From: clay.sweetser at gmail.com (Clay Sweetser) Date: Sun, 2 Jun 2013 00:15:10 -0400 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: References: Message-ID: I quite like this proposal. To me, the main advantage of such a context manager/decorator over the current timeit module is that it can be used for pieces of code that either can't or shouldn't be arbitrarily repeated. For example, I am currently writing a plugin for the Sublime Text editor, and it would be very convenient for me to be able to get a quick summary of timing statistics for functions that make use of the editor's plugin API to modify documents.
My only suggestion for the proposed function is to allow the output of statistics similar to the ones the timeit module allows, such as the mean runtime of a block of code. Sincerely, Clay Sweetser -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Jun 2 07:29:40 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 02 Jun 2013 15:29:40 +1000 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: References: Message-ID: <51AAD844.6080705@pearwood.info> On 02/06/13 14:15, Clay Sweetser wrote: > I quite like this proposal. To me, the main advantage of such a context > manager/decorator over the current timeit module is that it can be used for > pieces of code that either can't or shouldn't be arbitrarily repeated. > For example, I am currently writing a plugin for the Sublime Text > editor, and it would be very convenient for me to be able to get a quick > summary of timing statistics for functions that make use of the editor's > plugin API to modify documents. > My only suggestion for the proposed function is to allow the output of > statistics similar to the ones the timeit module allows, such as the mean > runtime of a block of code. While I don't wish to discourage anyone with the good sense to agree with my proposal *wink*, in fairness I should say a couple of things: * A context manager is not capable of running the body of the block multiple times automatically. If you want to do that, you need to wrap it in a for-loop, or just use timeit.Timer. * Even when you run multiple trials, say using Timer.repeat(), taking the mean is statistically the wrong thing to do. The docs already say this: [quote] It's tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful.
In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python's speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics. [end quote] http://docs.python.org/2/library/timeit.html#timeit.Timer.repeat To put it another way, you can assume that each timeit result is made up of two components: - the "actual" or "real" time that the code would take in a perfect world where no other processes are running and your code is run at the highest speed possible; - plus some unknown error, due to external factors, overhead of the timing code, etc. Note that the error is always strictly positive, never negative or zero. So the min() of the timing results is the best estimate of the "actual" time. Any other estimate, including mean or median, will be higher than the minimum, and therefore less accurate. -- Steven From ron3200 at gmail.com Sun Jun 2 17:00:29 2013 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 02 Jun 2013 10:00:29 -0500 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: <51AAD844.6080705@pearwood.info> References: <51AAD844.6080705@pearwood.info> Message-ID: <51AB5E0D.8080201@gmail.com> On 06/02/2013 12:29 AM, Steven D'Aprano wrote: > > To put it another way, you can assume that each timeit result is made up of > two components: > > - the "actual" or "real" time that the code would take in a perfect world > where no other processes are running and your code is run at the highest > speed possible; > > - plus some unknown error, due to external factors, overhead of the timing > code, etc. > > Note that the error is always strictly positive, never negative or zero.
So > the min() of the timing results is the best estimate of the "actual" time. > Any other estimate, including mean or median, will be higher than the > minimum, and therefore less accurate. Yes to all that, and in addition, it only gives results for a set of very limited conditions. When doing larger blocks of code (hefty or long-running), there may be different sub-paths of execution that result in a wide range of times when in actual use. In those cases, while the minimum is nice to know, I want to identify the sections of code and conditions that take the longest. Those are the parts I will try to make more efficient. I like your practical description of just wanting to know how long something will take in a hand-wavy way. My thoughts this morning are that it would be nice to have both your stopwatch context timer, along with timers for use in running programs. For example, if a program I'm working on seems a bit slower than I think it should be and I want to look into how it can be made more efficient. What I'd like to do is ... * Import the timers from a single module. * Use a function timer decorator to get a general idea of what larger blocks of code are long running, as in your subject title for this thread. * And use the context timer to look at what parts within the functions I can improve. Maybe something like... @function_timer(3) def foo(...): ... with block_timer(10, id="B1"): do_this() do_that() ... The output might be something like this. 1/3: 8.56 secs, foo(), args=(...), kwds={...} 1/10: 2.34 secs, B1 2/3: 8.56 secs, foo(), args=(...), kwds={...} 2/10: 2.34 secs, B1 3/3: 8.56 secs, foo(), args=(...), kwds={...} 3/10: 2.34 secs, B1 4/10: 2.34 secs, B1 .... 19/10: 2.34 secs, B1 (The times wouldn't be the same for each count in reality.) In this case, the output is when the program is actually running, so you don't want to repeat each function or block n times, but instead want to log each normal iteration up to the max count.
That keeps the amount of output within looping code reasonable. The stopwatch context is useful on the command line; these program timers would be useful in larger, longer running programs and scripts. Both are much easier to use than timeit. Timeit would still be the better way to benchmark scripts and micro-benchmark small bits of similar code. cheers, Ron From ron3200 at gmail.com Sun Jun 2 17:15:33 2013 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 02 Jun 2013 10:15:33 -0500 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: <51AB5E0D.8080201@gmail.com> References: <51AAD844.6080705@pearwood.info> <51AB5E0D.8080201@gmail.com> Message-ID: <51AB6195.9000503@gmail.com> On 06/02/2013 10:00 AM, Ron Adam wrote: > 19/10: 2.34 secs, B1 Should be.. 10/10 not 19/10 Cheers, Ron From masklinn at masklinn.net Sun Jun 2 17:20:09 2013 From: masklinn at masklinn.net (Masklinn) Date: Sun, 2 Jun 2013 17:20:09 +0200 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: <51AB5E0D.8080201@gmail.com> References: <51AAD844.6080705@pearwood.info> <51AB5E0D.8080201@gmail.com> Message-ID: On 2013-06-02, at 17:00 , Ron Adam wrote: > On 06/02/2013 12:29 AM, Steven D'Aprano wrote: >> >> To put it another way, you can assume that each timeit result is made up of >> two components: >> >> - the "actual" or "real" time that the code would take in a perfect world >> where no other processes are running and your code is run at the highest >> speed possible; >> >> - plus some unknown error, due to external factors, overhead of the timing >> code, etc. >> >> Note that the error is always strictly positive, never negative or zero. So >> the min() of the timing results is the best estimate of the "actual" time. >> Any other estimate, including mean or median, will be higher than the >> minimum, and therefore less accurate. > > Yes to all that, and in addition, it only gives results for a set of very limited conditions.
> > When doing larger blocks of code (hefty or long-running), there may be different sub-paths of execution that result in a wide range of times when in actual use. > > In those cases, while the minimum is nice to know, I want to identify the sections of code and conditions that take the longest. Those are the parts I will try to make more efficient. > > > I like your practical description of just wanting to know how long something will take in a hand-wavy way. > > My thoughts this morning are that it would be nice to have both your stopwatch context timer, along with timers for use in running programs. > > > For example, if a program I'm working on seems a bit slower than I think it should be and I want to look into how it can be made more efficient. That's profiling. Note that you can add both timing and profiling with low syntactic overhead (via function decorators) through the profilehooks package (available on PyPI): profilehooks.profile will do profiling runs on the function it decorates when that function is run, profilehooks.timecall will just print the function's runtime. Both can be either immediate (print whatever's been collected for a call at the end of the call) or not immediate (print a summary of all calls when the program terminates): >>> import profilehooks >>> @profilehooks.timecall ... def immediate(n): ... list(xrange(n)) ... >>> @profilehooks.timecall(immediate=False) ... def deferred(n): ... list(xrange(n)) ...
>>> immediate(40000) immediate (:1): 0.002 seconds >>> immediate(400000) immediate (:1): 0.023 seconds >>> immediate(4000000) immediate (:1): 0.198 seconds >>> deferred(40000) >>> deferred(400000) >>> deferred(4000000) >>> ^D deferred (:1): 3 calls, 0.155 seconds (0.052 seconds per call) From andrew at acooke.org Sun Jun 2 19:26:58 2013 From: andrew at acooke.org (andrew cooke) Date: Sun, 2 Jun 2013 13:26:58 -0400 Subject: [Python-ideas] A Simpler Enum For Python 3 Message-ID: <20130602172658.GA10367@acooke.org> Hi, Not sure if this is appropriate or useful, so won't repeat, but I wrote an alternative Enum implementation for Python 3, which is available at https://github.com/andrewcooke/simple-enum There's a discussion of how it differs from PEP 0435 at https://github.com/andrewcooke/simple-enum#discussion and some examples of the differences at https://github.com/andrewcooke/simple-enum#things-you-can-do-with-the-simpler-enum-that-you-cant-do-with-the-standard-enum Also, I have an almost-complete codebase that extends Ethan's "standard" implementation with some of the ideas in this design, available at https://github.com/andrewcooke/bnum - I have abandoned that project in favour of the one mentioned first, above, but it may be of interest if you want to consider modifying the PEP design slightly (in other words: bnum includes inheritance, simple-enum does not). Cheers, Andrew From techtonik at gmail.com Sun Jun 2 20:23:13 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 2 Jun 2013 21:23:13 +0300 Subject: [Python-ideas] Stdlib YAML evolution (Was: PEP 426, YAML in the stdlib and implementation discovery) Message-ID: FWIW, I am +1 on the ability to read YAML-based configs in Python without dependencies, but waiting for several years is hard. Maybe try an alternative data driven development process as opposed to traditional PEP based all-or-nothing style to speed up the process?
It is possible to make users happy incrementally and keep development fun without sacrificing too much on the Zen side. If open source is about scratching your own itches, then the most effective way to implement a spec would be to allow people to add support for their own flavor without disrupting the work of others. For some reason I think most people don't need the full YAML spec, especially if the final implementation will be slow and heavy. So instead of: import yaml I'd start with something more generic and primitive: from datatrans import yamlish Where `datatrans` is a data transformation framework taking care of usual text parsing (data partitioning), partition mapping (structure transformation) and conversion (binary to string etc.) trying to be as fast and lightweight as possible, bringing a vast field for future optimizations on the algorithmic level. `yamlish` is an implementation which is not vastly optimized (datatrans to yamlish is like RPython to PyPy) and can be easily extended to cover more YAML (hopefully). Therefore the name - it doesn't pretend to parse YAML - it parses some supported subset, which is improved over time by different parties (if datatrans is done right to provide readable (maintainable + extensible) code for implementation). There is an existing package called `yamlish` on PyPI - I am not talking about it - it is PyYAML based, which is not an option for now as I see it. So I stole its name. Sorry. This PyPI package was used to parse TAP format, which is again, a subset. Subset.. It appears that YAML is good for humans for its subsets. It leaves an impression (maybe it's just an illusion) that development work for subset support can also be partitioned. If `datatrans` "done right" is possible, it will allow incremental addition of new YAML features as the need for them arises (new data examples are added). Or it can help to build parsers for YAML subsets that are intentionally limited to make them performance efficient.
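As a toy illustration of what an "intentionally limited" subset parser could look like (this is not anatoly's actual design; the function name, the chosen subset, and the behavior are all hypothetical), a parser for a flat YAML-style mapping fits in a couple of dozen lines:

```python
def load_flat(text):
    """Parse a deliberately tiny YAML subset: a single flat mapping of
    'key: value' lines with '#' comments and a few scalar types.
    Anything outside the subset raises ValueError instead of guessing."""
    scalars = {"true": True, "false": False, "null": None}
    result = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].rstrip()   # strip comments (naively)
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("line %d is not 'key: value'" % lineno)
        value = value.strip()
        if value in scalars:
            result[key.strip()] = scalars[value]
        else:
            try:
                result[key.strip()] = int(value)
            except ValueError:
                result[key.strip()] = value.strip("'\"")
    return result

doc = """\
name: demo        # inline comment
count: 3
debug: true
empty: null
"""
config = load_flat(doc)
```

The point of the sketch is the failure mode: by refusing anything outside the subset rather than guessing, such a parser stays small while its output remains valid under a full YAML loader.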
Because `datatrans` is a package isolating the parsing, mapping and conversion parts of the process to make it modular and extensible, it can serve as a reference point for various kinds of (scientific) papers, including the ones that prove that such a data transformation framework is impossible. As for the `yamlish` submodule, the first important paper covering it will be a reference table matrix of supported features. While it all sounds too complicated, driving development by data and real short term user needs (as opposed to designing everything upfront) will make the process more attractive. In data driven development, there are not many things that can break - you either continue parsing previous data or not. The output from the parsing process may change over time, but it may be controlled by configuring the last step of the data transformation phase. `Parsing AppEngine config file` or `reading package meta data` are good starting points. Once the package meta data subset is parsed, it is done and won't break. The implementation for meta data parsing may mature in the distutils package, for AppEngine in its SDK, and merged in either of those, sending patches for `datatrans` to stdlib. The question is only to design an output format for the parse stage. I am not sure everything should be convertible into Python objects using the "best fit" technique. I will be pretty comfortable if the target format will not be native Python objects at all. More than that - I will even insist on avoiding converting to native Python objects from the start. The ideal output for the first version should be a generic tree structure with defined names for YAML elements. A tree that can be represented as XML where these names are tags. The tree can be therefore traversed and selected using XPath/JQuery syntax. It will take several years for the implementation to mature and in the end there will be plenty of backward compatibility matters with the API, formatting and serializing.
So the first thing I'd do is [abandon serialization]. From the point of view of my self-proclaimed data transformation theory, the input and output formats are data. If the output format is not human-readable - as some linked Python data structures in memory - it wastes time and hinders development. Serializing Python is a problem of a different level, which is an example of a binary, abstract, memory-only output format - a lot of properties that you don't want to deal with while working with data. To summarize: 1. full spec support is no goal 2. data driven (using real world examples/stories) 3. incremental (one example/story at a time) 4. research based (beautiful ideas vs ugly real world limitations) 5. maintainable (code is easy to read and understand the structure) 6. extensible (easy to find out the place to be modified) 7. core "generic tree" data structure as an intermediate format and "yaml tree" data structure as a final format from the parsing process P.S. I am willing to work on this "data transformation theory" stuff and a prototype implementation, because it is generally useful in many areas. But I need support. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Jun 2 21:56:59 2013 From: mertz at gnosis.cx (David Mertz) Date: Sun, 2 Jun 2013 12:56:59 -0700 Subject: [Python-ideas] Stdlib YAML evolution (Was: PEP 426, YAML in the stdlib and implementation discovery) In-Reply-To: References: Message-ID: <1702023F-0AAB-4B6A-8FEB-D026693F9431@gnosis.cx> I would definitely like to have a YAML library--even one with restricted function--in the standard library. Anatoly's suggestion of 'yamlish' is fine. I have myself had to write a 'miniyaml.py' as part of one work project. We have some configuration files in a bit of software I wrote/maintain, and these files are very much human-edited, and their structure very well fits a subset of YAML.
There are many things you can do in YAML that we do not need to do. Using JSON, or INI format, or XML, or other options supported in the Python standard library would have been possible, but in all cases the configuration files would have been far less readable than in the way I did it. Within the particular environment I'm supporting, it is possible but slightly cumbersome to install a 3rd party library like PyYaml. So I wrote <40 lines of code to support the subset we need, but keeping the API compatible. There's nothing special about my implementation, and it's certainly not ready for inclusion in STDLIB, but the fact that I needed to do it is suggestive to me. Below is the docstring for my small module. Basically, ANY reasonable subset of YAML that was supported in the STDLIB would also be a drop-in replacement for my trivial code. Well, I suppose that if its API were different from that of PyYaml, some small change might be needed, but it would certainly support my simple use case. > This module provides an implementation for a small subset of YAML > > Only the constructs needed for parsing an "invariants" file are supported here. > The only supported API function of the PyYaml library is 'load_all()'. However, > within that restriction, the result returned by 'miniyaml.load_all()'--if loading > a string that this module can parse--is intended to be identical to that returned > by 'yaml.load_all()'. > > The intended use of this module is with an import like: > > import miniyaml as yaml > > In the presence of an actual PyYaml installation, this can simply be instead: > > import yaml > > The parsed subset used in invariants files looks like the below. If multiline > comments (with line breaks) are needed, the usual YAML construct is used.
Each > invariant block is a new YAML "document": > > Invariant: some_python_construct(anton.leaf.parameter) is something_else > Comment: This describes the invariant in more human readable terms > --- > Invariant: isMultiple(DT, some.type.of.interval) > Comment: | > The interval should really be a multiple of timestep because of equation: > sigma = epsilon^2 + (3*foo)^5 > And that's how it works. > --- On Jun 2, 2013, at 11:23 AM, anatoly techtonik wrote: > FWIW, I am +1 on for the ability to read YAML based configs Python > without dependencies, but waiting for several years is hard. > > Maybe try an alternative data driven development process as opposed > to traditional PEP based all-or-nothing style to speed up the process? > It is possible to make users happy incrementally and keep development > fun without sacrificing too much on the Zen side. If open source is > about scratching your own itches, then the most effective way to > implement a spec would be to allow people add support for their own > flavor without disrupting works of the others. > > For some reason I think most people don't need full YAML > speccy, especially if final implementation will be slow and heavy. > > So instead of: > import yaml > I'd start with something more generic and primitive: > from datatrans import yamlish > > Where `datatrans` is data transformation framework taking care > of usual text parsing (data partitioning), partition mapping (structure > transformation) and conversion (binary to string etc.) trying to be as fast > and lightweight as possible, bringing vast field for future optimizations on > algorithmic level. `yamlish` is an implementation which is not vastly > optimized (datatrans to yamlish is like RPython to PyPy) and can be > easily extended to cover more YAML (hopefully). 
Therefore the name - > it doesn't pretend to parse YAML - it parses some supported subset, > which is improved over time by different parties (if datatrans is done right > to provide readable (maintainable + extensible) code for implementation). > > > There is an exisiting package called `yamlish` on PyPI - I am not talking > about it - it is PyYAML based, which is not an option for now as I see it. > So I stole its name. Sorry. This PyPI package was used to parse TAP > format, which is again, a subset. Subset.. > > It appears that YAML is good for humans for its subsets. It leaves an > impression (maybe it's just an illusion) that development work for subset > support can also be partitioned. If `datatrans` "done right" is possible, it > will allow incremental addition of new YAML features as the need for > them arises (new data examples are added). Or it can help to build > parsers for YAML subsets that are intentionally limited to make them > performance efficient. > > Because `datatrans` is a package isolating parsing, mapping and > conversion parts of the process to make it modular and extensible, it can > serve as a reference point for various kinds of (scientific) papers including > the ones that prove that such data transformation framework is impossible. > As for `yamlish` submodule, the first important paper covering it will be a > reference table matrix of supported features. > > > While it all sounds too complicated, driving development by data and real > short term user needs (as opposed to designing everything upfront) will > make the process more attractive. In data driven development, there are not > many things that can break - you either continue parsing previous data or > not. The output from the parsing process may change over time, but it may > be controlled by configuring the last step of data transformation phase. > > `Parsing AppEngine config file` or `reading package meta data` > are good starting points. 
Once package meta data subset is parsed, it is > done and won't break. The implementation for meta data parsing may mature > in distutils package, for AppEngine in its SDK, and merged in either of those, > sending patches for `datastrans` to stdlib. The question is only to design > output format for the parse stage. I am not sure everything should be > convertible into Python objects using the "best fit" technique. > > I will be pretty comfortable if target format will not be native Python objects > at all. More than that - I will even insist to avoid converting to native Python > object from the start. The ideal output for the first version should be generic > tree structure with defined names for YAML elements. The tree that can be > represented as XML where these names are tags. The tree can be > therefore traversed and selected using XPath/JQuery syntax. > > It will take several years for implementation to mature and in the end there > will be a plenty of backward compatibility matters with the API, formatting > and serializing. So the first thing I'd do is [abandon serialization]. From the > point of view of my self-proclaimed data transformation theory, the input and > output formats are data. If output format is not human readable - as some > linked Python data structures in memory - it wastes time and hinders > development. Serializing Python is a problem of different level, which is an > example of binary, abstract, memory-only output format - a lot of properties > that you don't want to deal with while working with data. > > > To summarize: > 1. full spec support is no goal > 2. data driven (using real world examples/stories) > 3. incremental (one example/story at a time) > 4. research based (beautiful ideas vs ugly real world limitations) > > 5. maintainable (code is easy to read and understand the structure) > 6. extensible (easy to find out the place to be modified) > 7. 
core "generic tree" data structure as an intermediate format and > "yaml tree" data structure as a final format from parsing process > > > P.S. I am willing to wok on this "data transformation theory" stuff and > prototype implementation, because it is generally useful in many > areas. But I need support. > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- mertz@ THIS MESSAGE WAS BROUGHT TO YOU BY: n o gnosis Postmodern Enterprises .cx IN A WORLD W/O WALLS, THERE WOULD BE NO GATES d o z e From ncoghlan at gmail.com Sun Jun 2 23:03:06 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 3 Jun 2013 07:03:06 +1000 Subject: [Python-ideas] A Simpler Enum For Python 3 In-Reply-To: <20130602172658.GA10367@acooke.org> References: <20130602172658.GA10367@acooke.org> Message-ID: Please call it something like "magic enum" rather than "simple enum" - the reason we rejected that design for the PEP is because it is anything *but* simple and breaks most standard assumptions about how class bodies work. (If you look at the enum discussion threads, this syntax was proposed *and* implemented prior to acceptance of the design in the PEP) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 3 01:53:55 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 2 Jun 2013 16:53:55 -0700 (PDT) Subject: [Python-ideas] Stdlib YAML evolution (Was: PEP 426, YAML in the stdlib and implementation discovery) In-Reply-To: References: Message-ID: <1370217235.82524.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: anatoly techtonik Sent: Sunday, June 2, 2013 11:23 AM >FWIW, I am +1 on for the ability to read YAML based configs Python >without dependencies, but waiting for several years is hard. 
With all due respect, I don't think you've read even a one-sentence description of YAML, so your entire post is nonsense. The first sentence of the abstract says, "YAML is a data serialization language designed around the common native data types of agile programming languages." So, your idea that we shouldn't use it for serialization, and shouldn't map it to native Python data types, is ridiculous. You specifically suggest mapping YAML to XML so we can treat it as a structured document. From the "Relation to XML" section: "YAML is primarily a data serialization language. XML was designed to support structured documentation." You suggest that we shouldn't build all of YAML, just some bare-minimum subset that's good enough to get started. JSON is already _more_ than a bare-minimum subset of YAML, so we're already done. But you'd also like some data-driven way to extend this. YAML has already designed exactly that. Once you have the core schema, you can add new types, and the syntax for those types is data-driven (although the semantics are really only defined in hand-wavy English and probably require code to implement, but I'm not sure how you expect your proposal to be any different, unless you're proposing something like XML Schema). So, either the necessary subset of YAML you want is the entire spec, or you want to do an equal amount of work building something just as complex but not actually YAML. The idea of building a useful subset of YAML isn't a bad one. But the way to do that is to go through the features of YAML that JSON doesn't have, and decide which ones you want. For example, YAML with the core schema, but no aliases, no plain strings, and no explicit tags is basically JSON with indented block structure, raw strings, and useful synonyms for key constants (so you can write True instead of true).
You could even carefully import a few useful definitions from the type library as long as they're unambiguous (e.g., timestamp). That gives you most of the advantages of YAML that don't bring any safety risks, and its output would be interpretable as full YAML, and it might be a little easier to implement than the full spec. But that has very little to do with your proposal. In particular, leaving out the data-driven features of YAML is what makes it safe and simple. Meanwhile, I think what you actually want is XSLT processors to convert YAML to and from XML. Fortunately, the YAML community is already working on that at http://www.yaml.org/xml. Then you don't need any new Python code at all; just convert your YAML to XML and use whichever XML library (in the stdlib or not), and you're done. >Maybe try an alternative data driven development process as opposed >to traditional PEP based all-or-nothing style to speed up the process? >It is possible to make users happy incrementally and keep development >fun without sacrificing too much on the Zen side. If open source is >about scratching your own itches, then the most effective way to >implement a spec would be to allow people to add support for their own >flavor without disrupting the work of others. > > >For some reason I think most people don't need full YAML > >speccy, especially if final implementation will be slow and heavy. > > >So instead of: >    import yaml >I'd start with something more generic and primitive: >    from datatrans import yamlish > > >Where `datatrans` is data transformation framework taking care >of usual text parsing (data partitioning), partition mapping (structure >transformation) and conversion (binary to string etc.) trying to be as fast >and lightweight as possible, bringing vast field for future optimizations on >algorithmic level.
`yamlish` is an implementation which is not vastly >optimized (datatrans to yamlish is like RPython to PyPy) and can be >easily extended to cover more YAML (hopefully). Therefore the name - >it doesn't pretend to parse YAML - it parses some supported subset, >which is improved over time by different parties (if datatrans is done right >to provide readable (maintainable + extensible) code for implementation). > > > > >There is an existing package called `yamlish` on PyPI - I am not talking >about it - it is PyYAML based, which is not an option for now as I see it. >So I stole its name. Sorry. This PyPI package was used to parse TAP >format, which is again, a subset. Subset.. > > >It appears that YAML is good for humans for its subsets. It leaves an >impression (maybe it's just an illusion) that development work for subset >support can also be partitioned. If `datatrans` "done right" is possible, it >will allow incremental addition of new YAML features as the need for >them arises (new data examples are added). Or it can help to build >parsers for YAML subsets that are intentionally limited to make them >performance efficient. > > >Because `datatrans` is a package isolating parsing, mapping and >conversion parts of the process to make it modular and extensible, it can >serve as a reference point for various kinds of (scientific) papers including >the ones that prove that such data transformation framework is impossible. >As for `yamlish` submodule, the first important paper covering it will be a >reference table matrix of supported features. > > > > >While it all sounds too complicated, driving development by data and real >short term user needs (as opposed to designing everything upfront) will >make the process more attractive. In data driven development, there are not >many things that can break - you either continue parsing previous data or >not.
The output from the parsing process may change over time, but it may >be controlled by configuring the last step of data transformation phase. > > >`Parsing AppEngine config file` or `reading package meta data` >are good starting points. Once package meta data subset is parsed, it is >done and won't break. The implementation for meta data parsing may mature >in distutils package, for AppEngine in its SDK, and merged in either of those, >sending patches for `datatrans` to stdlib. The question is only to design >output format for the parse stage. I am not sure everything should be >convertible into Python objects using the "best fit" technique. > > >I will be pretty comfortable if target format will not be native Python objects > >at all. More than that - I will even insist to avoid converting to native Python >object from the start. The ideal output for the first version should be generic >tree structure with defined names for YAML elements. The tree that can be >represented as XML where these names are tags. The tree can be >therefore traversed and selected using XPath/JQuery syntax. > > >It will take several years for implementation to mature and in the end there >will be a plenty of backward compatibility matters with the API, formatting >and serializing. So the first thing I'd do is [abandon serialization]. From the >point of view of my self-proclaimed data transformation theory, the input and >output formats are data. If output format is not human readable - as some >linked Python data structures in memory - it wastes time and hinders >development. Serializing Python is a problem of different level, which is an >example of binary, abstract, memory-only output format - a lot of properties >that you don't want to deal with while working with data. > > > > >To summarize: > >1. full spec support is no goal >2. data driven (using real world examples/stories) >3. incremental (one example/story at a time) >4.
research based (beautiful ideas vs ugly real world limitations) > > >5. maintainable (code is easy to read and understand the structure) >6. extensible (easy to find out the place to be modified) >7. core "generic tree" data structure as an intermediate format and >    "yaml tree" data structure as a final format from parsing process > > > > >P.S. I am willing to work on this "data transformation theory" stuff and >    prototype implementation, because it is generally useful in many >    areas. But I need support. >-- >anatoly t.

From jeanpierreda at gmail.com Mon Jun 3 02:16:19 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 2 Jun 2013 20:16:19 -0400 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Sat, Jun 1, 2013 at 10:04 AM, Nick Coghlan wrote: > YAML's complexity is the reason I prefer JSON as a data interchange > format, but I still believe YAML fills a useful niche for more complex > configuration files where .ini syntax is too limited and JSON is too > hard to edit by hand. YAML, unlike JSON, is able to represent non-tree data (it can store any reference graph). At this point the only stdlib module that can do that is pickle, which would ordinarily be precluded by security concerns. So adding YAML is excellent even as a data interchange format in the stdlib -- there is no stdlib module that occupies this particular intersection of functionality (nominal security and object graphs). I wrote a Python-Ideas post in the past about including something in-between json and pickle on the power spectrum, it didn't go over well, but I figured I'd mention this point again anyway. Sorry. -- Devin

From stephen at xemacs.org Mon Jun 3 02:59:57 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 03 Jun 2013 09:59:57 +0900 Subject: [Python-ideas] Stdlib YAML evolution (Was: PEP 426, YAML in the stdlib and implementation discovery) In-Reply-To: <1702023F-0AAB-4B6A-8FEB-D026693F9431@gnosis.cx> References: <1702023F-0AAB-4B6A-8FEB-D026693F9431@gnosis.cx> Message-ID: <87ip1wau0i.fsf@uwakimon.sk.tsukuba.ac.jp> David Mertz writes: > I would definitely like to have a YAML library--even one with > restricted function--in the standard library. Different use cases (and users) will stick at different restrictions. This would be endlessly debatable. I think the only restriction that really makes sense is the load vs. load_unsafe restriction (and that should be a user decision; the "unsafe" features should be available to users who want them). From ncoghlan at gmail.com Mon Jun 3 03:05:36 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 3 Jun 2013 11:05:36 +1000 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Mon, Jun 3, 2013 at 10:16 AM, Devin Jeanpierre wrote: > On Sat, Jun 1, 2013 at 10:04 AM, Nick Coghlan wrote: >> YAML's complexity is the reason I prefer JSON as a data interchange >> format, but I still believe YAML fills a useful niche for more complex >> configuration files where .ini syntax is too limited and JSON is too >> hard to edit by hand. > > YAML, unlike JSON, is able to represent non-tree data (it can store > any reference graph). At this point the only stdlib module that can do > that is pickle, which would ordinarily precluded by security concerns. > So adding YAML is excellent even as a data interchange format in the > stdlib -- there is no stdlib module that occupies this particular > intersection of functionality (nominal security and object graphs). Note that while JSON itself doesn't handle arbitrary reference graphs, people actually *store* complex graph structures in JSON all the time. 
It's just that there are no standard syntax/semantics for doing so (logging.dictConfig, for example, uses context dependent named references, while jsonschema uses $ref fields). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From dholth at gmail.com Mon Jun 3 04:11:23 2013 From: dholth at gmail.com (Daniel Holth) Date: Sun, 2 Jun 2013 22:11:23 -0400 Subject: [Python-ideas] iterator/stream-based JSON API Message-ID: When will the stdlib get a decent iterator/stream-based JSON API? For example automated packaging tools may be parsing a lot of JSON but ignoring most of it, and it would be lovely to say "if key not in interesting: skip_without_parsing()". Or to go straight from parse to domain objects without putting the whole thing in an intermediate dict. http://code.google.com/p/jsonpull/ is a nice looking but woefully undocumented Java one, and https://crate.io/packages/ijson/ is one with a much different API that is for Python.

From abarnert at yahoo.com Mon Jun 3 04:53:58 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 2 Jun 2013 19:53:58 -0700 (PDT) Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: References: Message-ID: <1370228038.57427.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Daniel Holth Sent: Sunday, June 2, 2013 7:11 PM > When will the stdlib get a decent iterator/stream-based JSON API? For > example automated packaging tools may be parsing a lot of JSON but > ignoring most of it, and it would be lovely to say "if key not in > interesting: skip_without_parsing()". This sounds very interesting, but I'm not sure I'm thinking of the same thing you are. First, are you looking for a SAX-style API? Or something more like fts (http://man7.org/linux/man-pages/man3/fts.3.html) but with a nicer (iterator-plus-methods) syntax? Or can you just post a complete simple example of what you'd like it to look like?
>Or to go straight from parse to >domain objects without putting the whole thing in an intermediate >dict.

What does this part mean? You mean you want an opaque object with a DOM API, or an XPath-style accessor, instead of a native Python object? If so, why? Do you think having a dict hidden inside an opaque object, or using some other hash table implementation, is going to save space over just having a dict? Or would you prefer to write doc.find_element("my_key").find_element("my_other_key").find_element(3) or doc.find("//my_key/my_other_key[3]") instead of doc["my_key"]["my_other_key"][3]?

From guido at python.org Mon Jun 3 05:14:57 2013 From: guido at python.org (Guido van Rossum) Date: Sun, 2 Jun 2013 20:14:57 -0700 Subject: [Python-ideas] Loaded questions Message-ID: On Sun, Jun 2, 2013 at 7:11 PM, Daniel Holth wrote: > When will the stdlib get a decent iterator/stream-based JSON API? As a reminder, a loaded question is not a good way to suggest a new feature. If you want a feature, great. If you don't feel you have the skills to implement it, you can still play*. But if you want a feature, please say something like "I think it would be cool if the stdlib had X" and we can discuss whether we all feel the same way. To most people, "when will the stdlib get a decent X" reads as either implying "the stdlib has a lousy X" or "the stdlib is stupid (or whatever negative adjective) for not having X." And many of the folks here who together have made the stdlib what it is and never missed X will feel offended. Do you think it is a good thing to start a request by offending the person you are asking? (Don't answer. :-) It's possible you didn't mean this at all. Likely, even. So next time, think about how the words you use might be interpreted by others, and leave your loaded questions at home. Please.
:-) __________ * Though if nobody can be found who wants a feature badly enough to implement it, it won't happen, and you shouldn't complain about it -- we're all volunteers here, and nobody has enough time to implement every desirable feature. -- --Guido van Rossum (python.org/~guido)

From raymond.hettinger at gmail.com Mon Jun 3 06:14:35 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 2 Jun 2013 21:14:35 -0700 Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: References: Message-ID: <9173B4CE-24F0-4BA0-B018-215F171684A7@gmail.com> On Jun 2, 2013, at 7:11 PM, Daniel Holth wrote: > When will the stdlib get a decent iterator/stream-based JSON API? You could probably write one right now using the object hooks, but I'm not sure it would be useful. If JSON data were too big to fit into memory (measured in gigabytes), it is going to have a host of other issues. Also, it might not work well with JSON where the outermost object is a dictionary in arbitrary order, meaning that one would potentially have to read the whole stream to find the first key. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL:

From solipsis at pitrou.net Mon Jun 3 11:12:06 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 3 Jun 2013 11:12:06 +0200 Subject: [Python-ideas] iterator/stream-based JSON API References: Message-ID: <20130603111206.7ed511dc@fsol> On Sun, 2 Jun 2013 22:11:23 -0400 Daniel Holth wrote: > When will the stdlib get a decent iterator/stream-based JSON API? For > example automated packaging tools may be parsing a lot of JSON but > ignoring most of it, and it would be lovely to say "if key not in > interesting: skip_without_parsing()". Do you think that would make a significant difference? I'm not sure what "parsing a lot of JSON" means, but I suppose packaging metadata is usually quite small. Regards Antoine.
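Raymond's object-hooks remark can be sketched against today's json module: an `object_pairs_hook` that keeps only a whitelist of keys. The key names below are illustrative; note the hook fires for every object (not just the top level), and the discarded values are still parsed before being thrown away, so this trims the resulting dict rather than the parsing work:

```python
import json

INTERESTING = {"name", "version"}  # illustrative whitelist

def keep_interesting(pairs):
    # pairs is a list of (key, value) tuples for one JSON object;
    # keep only the whitelisted keys.
    return {k: v for k, v in pairs if k in INTERESTING}

doc = '{"name": "demo", "version": "1.0", "description": "ignored"}'
data = json.loads(doc, object_pairs_hook=keep_interesting)
# data == {'name': 'demo', 'version': '1.0'}
```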
From luca.sbardella at gmail.com Mon Jun 3 12:51:55 2013 From: luca.sbardella at gmail.com (Luca Sbardella) Date: Mon, 3 Jun 2013 03:51:55 -0700 (PDT) Subject: [Python-ideas] ThreadPool worker function Message-ID: <9e99caba-e229-4fb2-886b-6a3460160964@googlegroups.com> While using the ThreadPool in the multiprocessing.pool module, I needed to use a different implementation for the *worker* function. Since there is no way to overwrite it directly, I'm overwriting the Process attribute with a method:

    def myworker(...):
        ...

    class ThreadPool(pool.ThreadPool):
        def Process(self, target=None, **kwargs):
            return Thread(target=myworker, **kwargs)

which is good enough, works for 2.6 and up. On the other hand, *worker* could be defined as a class attribute in the same way as the Process. Something like this:

    def default_worker(...):
        ...

    class Pool(object):
        Process = Process
        worker = default_worker
        ...

Less verbose and less of a hack IMO. -------------- next part -------------- An HTML attachment was scrubbed... URL:

From dholth at gmail.com Mon Jun 3 14:37:05 2013 From: dholth at gmail.com (Daniel Holth) Date: Mon, 3 Jun 2013 08:37:05 -0400 Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: <20130603111206.7ed511dc@fsol> References: <20130603111206.7ed511dc@fsol> Message-ID: On Mon, Jun 3, 2013 at 5:12 AM, Antoine Pitrou wrote: > On Sun, 2 Jun 2013 22:11:23 -0400 > Daniel Holth wrote: >> When will the stdlib get a decent iterator/stream-based JSON API? For >> example automated packaging tools may be parsing a lot of JSON but >> ignoring most of it, and it would be lovely to say "if key not in >> interesting: skip_without_parsing()". No offense meant. The existing JSON API is quite good. > Do you think that would make a significant difference? I'm not sure > what "parsing a lot of JSON" means, but I suppose packaging metadata is > usually quite small. I don't know whether it would matter for packaging but it would be very useful sometimes.
The jsonpull API looks like: http://code.google.com/p/jsonpull/source/browse/trunk/Example.java

A bit like:

    parser = Json(text)
    parser.eat('{')  # expect an object
    for element in parser.objectElements():
        parser.eat(Json.KEY)
        key = parser.getString()
        if key == "name":
            name = parser.getStringValue()
        elif key == "contact":

You can ask it what the next token is, seek ahead (never behind) to a named key in an object, or iterate over all the keys in an object without necessarily iterating over child objects. Once you get to an interesting sub-object you can get an iterator for that sub-object and perhaps pass it to a child constructor.

From dholth at gmail.com Mon Jun 3 15:01:43 2013 From: dholth at gmail.com (Daniel Holth) Date: Mon, 3 Jun 2013 09:01:43 -0400 Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: References: <20130603111206.7ed511dc@fsol> Message-ID: On Mon, Jun 3, 2013 at 8:37 AM, Daniel Holth wrote: > On Mon, Jun 3, 2013 at 5:12 AM, Antoine Pitrou wrote: >> On Sun, 2 Jun 2013 22:11:23 -0400 >> Daniel Holth wrote: >>> When will the stdlib get a decent iterator/stream-based JSON API? For >>> example automated packaging tools may be parsing a lot of JSON but >>> ignoring most of it, and it would be lovely to say "if key not in >>> interesting: skip_without_parsing()". > > No offense meant. The existing JSON API is quite good. > >> Do you think that would make a significant difference? I'm not sure >> what "parsing a lot of JSON" means, but I suppose packaging metadata is >> usually quite small. > > I don't know whether it would matter for packaging but it would be > very useful sometimes.
>
> The jsonpull API looks like:
> http://code.google.com/p/jsonpull/source/browse/trunk/Example.java
>
> A bit like:
>
> parser = Json(text)
> parser.eat('{') # expect an object
> for element in parser.objectElements():
> parser.eat(Json.KEY)
> key = parser.getString()
> if key == "name":
> name = parser.getStringValue()
> elif key == "contact":
>
>
> You can ask it what the next token is, seek ahead (never behind) to a
> named key in an object, or iterate over all the keys in an object
> without necessarily iterating over child objects. Once you get to an
> interesting sub-object you can get an iterator for that sub-object and
> perhaps pass it to a child constructor.

The ijson API yields a stream of events containing the full path to each item in the parsed JSON, an event name like "start_map", "end_map", "start_array", ...

    list(ijson.parse(StringIO.StringIO("""{ "a": { "b": "c" } }""")))

    [('', 'start_map', None),
     ('', 'map_key', 'a'),
     ('a', 'start_map', None),
     ('a', 'map_key', 'b'),
     ('a.b', 'string', u'c'),
     ('a', 'end_map', None),
     ('', 'end_map', None)]

It also has a higher-level API yielding only the objects under a certain prefix. Pass "a.b" and you would get only "c". Besides memory this kind of thing makes it much easier to know which level of the JSON structure you are in compared to the existing object_pairs hook. I kind of like the pull API because you can "choose your own adventure", deciding whether to do higher or lower level parsing depending on where in the JSON you are. But you could easily get lost and do things that aren't permitted based on the parser state.
Daniel Holth

From robert.kern at gmail.com Mon Jun 3 15:14:16 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 03 Jun 2013 14:14:16 +0100 Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: References: <20130603111206.7ed511dc@fsol> Message-ID: On 2013-06-03 13:37, Daniel Holth wrote: > On Mon, Jun 3, 2013 at 5:12 AM, Antoine Pitrou wrote: >> On Sun, 2 Jun 2013 22:11:23 -0400 >> Daniel Holth wrote: >>> When will the stdlib get a decent iterator/stream-based JSON API? For >>> example automated packaging tools may be parsing a lot of JSON but >>> ignoring most of it, and it would be lovely to say "if key not in >>> interesting: skip_without_parsing()". > > No offense meant. The existing JSON API is quite good. > >> Do you think that would make a significant difference? I'm not sure >> what "parsing a lot of JSON" means, but I suppose packaging metadata is >> usually quite small. > > I don't know whether it would matter for packaging but it would be > very useful sometimes. > > The jsonpull API looks like: > http://code.google.com/p/jsonpull/source/browse/trunk/Example.java > > A bit like: > > parser = Json(text) > parser.eat('{') # expect an object > for element in parser.objectElements(): > parser.eat(Json.KEY) > key = parser.getString() > if key == "name": > name = parser.getStringValue() > elif key == "contact": > > > You can ask it what the next token is, seek ahead (never behind) to a > named key in an object, or iterate over all the keys in an object > without necessarily iterating over child objects. Once you get to an > interesting sub-object you can get an iterator for that sub-object and > perhaps pass it to a child constructor. Unless your JSON file has dict values of, say, megabytes in size, I doubt that writing such code is going to be much more efficient than just building the whole dict and ignoring the keys that you don't want.
I suspect that most of the use cases could be satisfied by being able to either whitelist or blacklist top-level keys. This would be a relatively simple modification to _json.c, I think, if you wanted to pursue it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From ron3200 at gmail.com Mon Jun 3 15:44:26 2013 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 03 Jun 2013 08:44:26 -0500 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: References: <51AAD844.6080705@pearwood.info> <51AB5E0D.8080201@gmail.com> Message-ID: <51AC9DBA.8000908@gmail.com> On 06/02/2013 10:20 AM, Masklinn wrote: > That's a profile. Note that you can add both timing and profiling with > low syntactic overhead (via function decorators) through the > profilehooks package (available on pypi): profilehooks.profile will do > profiling runs on the function it decorates when that function is run, > profilehook.timecall will just print the function's runtime. Both can be > either immediate (print whatever's been collected for a call at the end > of the call) or not immediate (print a summary of all calls when the > program terminates): Thanks, I'll definitely find this useful. Seems like it would make a nice addition to the library.

    >>> import profilehooks
    >>> @profilehooks.timecall
    ... def immediate(n):
    ...     list(xrange(n))
    ...
    >>> immediate(40000)
    immediate (:1): 0.002 seconds

It would be nice if the output included the function arguments, if they are not too long. When running a script that information isn't as obvious as it is on a command line. Cheers, Ron

From flying-sheep at web.de Mon Jun 3 16:56:06 2013 From: flying-sheep at web.de (Philipp A.)
Date: Mon, 3 Jun 2013 16:56:06 +0200 Subject: [Python-ideas] Stdlib YAML evolution (Was: PEP 426, YAML in the stdlib and implementation discovery) In-Reply-To: <87ip1wau0i.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1702023F-0AAB-4B6A-8FEB-D026693F9431@gnosis.cx> <87ip1wau0i.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > > it is PyYAML based, which is not an option for now as I see it. > can you please elaborate on why this is the case? did Kirill Simonov say "no"? -------------- next part -------------- An HTML attachment was scrubbed... URL:

From solipsis at pitrou.net Mon Jun 3 17:43:04 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 3 Jun 2013 17:43:04 +0200 Subject: [Python-ideas] iterator/stream-based JSON API References: <20130603111206.7ed511dc@fsol> Message-ID: <20130603174304.2765409e@fsol> On Mon, 3 Jun 2013 09:01:43 -0400 Daniel Holth wrote: > > The ijson API yields a stream of events containing the full path to > each item in the parsed JSON, an event name like "start_map", > "end_map", "start_array", ... > > list(ijson.parse(StringIO.StringIO("""{ "a": { "b": "c" } }"""))) > > [('', 'start_map', None), > ('', 'map_key', 'a'), > ('a', 'start_map', None), > ('a', 'map_key', 'b'), > ('a.b', 'string', u'c'), > ('a', 'end_map', None), > ('', 'end_map', None)] > > It also has a higher-level API yielding only the objects under a > certain prefix. Pass "a.b" and you would get only "c". > > Besides memory this kind of thing makes it much easier to know which > level of the JSON structure you are in compared to the existing > object_pairs hook. But did you encounter a use case where the existing API didn't fit the bill? Regards Antoine.
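The (prefix, event, value) vocabulary quoted above can be pinned down with a short stdlib-only sketch. Note this walker does not actually stream -- it traverses an already-parsed object -- and the array-prefix convention is an assumption, but it reproduces ijson's output for the example under discussion:

```python
import json

def events(obj, prefix=""):
    # Yield (prefix, event, value) tuples in the style of ijson.parse().
    if isinstance(obj, dict):
        yield (prefix, "start_map", None)
        for key, value in obj.items():
            yield (prefix, "map_key", key)
            yield from events(value, f"{prefix}.{key}" if prefix else key)
        yield (prefix, "end_map", None)
    elif isinstance(obj, list):
        yield (prefix, "start_array", None)
        for value in obj:
            # "item" as the array-element prefix is a guess at ijson's scheme
            yield from events(value, f"{prefix}.item" if prefix else "item")
        yield (prefix, "end_array", None)
    elif isinstance(obj, str):
        yield (prefix, "string", obj)
    else:
        yield (prefix, "value", obj)

stream = list(events(json.loads('{ "a": { "b": "c" } }')))
# stream == [('', 'start_map', None), ('', 'map_key', 'a'),
#            ('a', 'start_map', None), ('a', 'map_key', 'b'),
#            ('a.b', 'string', 'c'), ('a', 'end_map', None),
#            ('', 'end_map', None)]
```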
From dholth at gmail.com Mon Jun 3 17:58:21 2013 From: dholth at gmail.com (Daniel Holth) Date: Mon, 3 Jun 2013 11:58:21 -0400 Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: <20130603174304.2765409e@fsol> References: <20130603111206.7ed511dc@fsol> <20130603174304.2765409e@fsol> Message-ID: On Mon, Jun 3, 2013 at 11:43 AM, Antoine Pitrou wrote: > On Mon, 3 Jun 2013 09:01:43 -0400 > Daniel Holth wrote: >> >> The ijson API yields a stream of events containing the full path to >> each item in the parsed JSON, an event name like "start_map", >> "end_map", "start_array", ... >> >> list(ijson.parse(StringIO.StringIO("""{ "a": { "b": "c" } }"""))) >> >> [('', 'start_map', None), >> ('', 'map_key', 'a'), >> ('a', 'start_map', None), >> ('a', 'map_key', 'b'), >> ('a.b', 'string', u'c'), >> ('a', 'end_map', None), >> ('', 'end_map', None)] >> >> It also has a higher-level API yielding only the objects under a >> certain prefix. Pass "a.b" and you would get only "c". >> >> Besides memory this kind of thing makes it much easier to know which >> level of the JSON structure you are in compared to the existing >> object_pairs hook. > > But did you encounter a use case where the existing API didn't fit the > bill? Sometimes it's nice to have a stream-based API; when your memory is very small, your JSON is very large, or your JSON may be very large, or the JSON is being streamed to you little by little and you want to parse part of it, or just don't want to wait for that closing }. It's just a different way to parse than the current all-at-once option. 
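One streaming-adjacent trick the stdlib already supports for data that arrives little by little is pulling successive top-level values out of a buffer with `json.JSONDecoder.raw_decode`, which returns the decoded object plus the offset where decoding stopped. A sketch for concatenated or newline-delimited JSON -- not a substitute for a real event-based API:

```python
import json

def iter_values(buf):
    # Decode successive top-level JSON values from one string, e.g. a
    # newline-delimited or concatenated stream that has been buffered.
    decoder = json.JSONDecoder()
    idx, end = 0, len(buf)
    while idx < end:
        while idx < end and buf[idx] in " \t\r\n":  # skip separators
            idx += 1
        if idx == end:
            break
        obj, idx = decoder.raw_decode(buf, idx)
        yield obj

values = list(iter_values('{"a": 1}\n{"b": 2}\n[3, 4]'))
# values == [{'a': 1}, {'b': 2}, [3, 4]]
```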
From stefan at drees.name Mon Jun 3 18:16:24 2013 From: stefan at drees.name (Stefan Drees) Date: Mon, 03 Jun 2013 18:16:24 +0200 Subject: [Python-ideas] iterator/stream-based JSON API In-Reply-To: References: <20130603111206.7ed511dc@fsol> <20130603174304.2765409e@fsol> Message-ID: <51ACC158.3020807@drees.name> On 03.06.13 17:58, Daniel Holth wrote: > On Mon, Jun 3, 2013 at 11:43 AM, Antoine Pitrou wrote: >> On Mon, 3 Jun 2013 09:01:43 -0400 >> Daniel Holth wrote: >>> >>> The ijson API yields a stream of events containing the full path to >>> each item in the parsed JSON, an event name like "start_map", >>> "end_map", "start_array", ... >>> >>> list(ijson.parse(StringIO.StringIO("""{ "a": { "b": "c" } }"""))) >>> >>> [('', 'start_map', None), >>> ('', 'map_key', 'a'), >>> ('a', 'start_map', None), >>> ('a', 'map_key', 'b'), >>> ('a.b', 'string', u'c'), >>> ('a', 'end_map', None), >>> ('', 'end_map', None)] >>> >>> It also has a higher-level API yielding only the objects under a >>> certain prefix. Pass "a.b" and you would get only "c". >>> >>> Besides memory this kind of thing makes it much easier to know which >>> level of the JSON structure you are in compared to the existing >>> object_pairs hook. >> >> But did you encounter a use case where the existing API didn't fit the >> bill? > > Sometimes it's nice to have a stream-based API; when your memory is > very small, your JSON is very large, or your JSON may be very large, > or the JSON is being streamed to you little by little and you want to > parse part of it, or just don't want to wait for that closing }. It's > just a different way to parse than the current all-at-once option. I share this experience but would like to add: The JSON RFC (like standard python dicts) makes no ordering assumption on keys but OTOH serialized data must be somehow streamable, i.e. even with keys per application protocol predefined and meaningfully ordered. Per RFC you have to collect all keys of a certain level. 
Upon encountering the matching closing curly brace you may finally inspect them, where in real life(tm) you often look for an early out (ingredients a) missing or b) all mandatory present) to accelerate node processing. So I guess the usefulness of streaming solutions based on JSON really also depends on the additional artifact, that serializer and deserializer must share an out-of-band convention. Stefan.

From vinay_sajip at yahoo.co.uk Mon Jun 3 20:58:51 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Mon, 3 Jun 2013 18:58:51 +0000 (UTC) Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery References: Message-ID: Nick Coghlan writes: > Note that while JSON itself doesn't handle arbitrary reference graphs, > people actually *store* complex graph structures in JSON all the time. > It's just that there are no standard syntax/semantics for doing so > (logging.dictConfig, for example, uses context dependent named > references, while jsonschema uses $ref fields). A more generalised version of the dictConfig approach, using JSON-serialisable dictionaries to describe more complex object graphs (including cross-references), is described here: http://pymolurus.blogspot.co.uk/2013/04/using-dictionaries-to-configure-objects.html Regards, Vinay Sajip

From ronaldoussoren at mac.com Tue Jun 4 12:00:16 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 4 Jun 2013 12:00:16 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: <1370049518.38392.YahooMailNeo@web184706.mail.ne1.yahoo.com> References: <1370025834.12159.YahooMailNeo@web184706.mail.ne1.yahoo.com> <8ED3592D-5613-4412-B792-4CF9D170A722@masklinn.net> <1370049518.38392.YahooMailNeo@web184706.mail.ne1.yahoo.com> Message-ID: On 1 Jun, 2013, at 3:18, Andrew Barnert wrote: > From: Philipp A. > Sent: Friday, May 31, 2013 12:39 PM > > >> and that's where my idea's "strict API"
comes into play: compatible implementations would *have to* pass a test suite and implement a certain API and comply with the standard. >> >> unsure if and how to test the latter (surely running a testsuite when something wants to register isn't practical -- or is it?) > > > After sleeping on this question, I'm not sure the best-implementation wrapper really needs to be that dynamic. There are only so many YAML libraries out there, and the set doesn't change that rapidly. (And the same is true for dbm, ElementTree, etc.) > > So, maybe we just put a static list in yaml, like the one in dbm (http://hg.python.org/cpython/file/3.3/Lib/dbm/__init__.py#l41): _names = ['pyyaml', 'yaml.yaml'] If a new implementation reaches maturity, it goes through some process TBD, and in 3.5.2, we change that one line to _names = ['pyyaml', 'yayaml', 'yaml.yaml']. And that's how you "register" a new implementation. > > The only real downside I can see is that some people might stick to 3.5.1 for a long time after 3.5.2 is released (maybe because it comes pre-installed with OS X 10.9 or RHEL 7.2 ESR or something), but still want to accept yayaml. If that's really an issue, someone could put an "anyyaml" backport project on PyPI that offered the latest registered name list to older versions of Python (maybe even including 2.7). > > Is that good enough? Please don't. I have no particular opinion on adding YAML support to the stdlib, but if support is added it should be usable on its own and users shouldn't have to rely on third-party libraries for serious use. That is, the stdlib version should be complete and fast enough (for some definition of fast enough). Having the stdlib depend on "random" third-party libraries is IMHO code smell and makes debugging issues harder (why does my script that only uses the stdlib work on machine 1 but not on machine 2... oops, one of the machines has some third-party yaml implementation that hides a bug in the stdlib even though I don't use it explicitly). BTW.
That doesn't mean the stdlib version should contain as many features as possible. Compare this with XML parsing: the xml.etree implementation is quite usable on its own, but sometimes you need more advanced XML features and then you can use lxml which has a similar API but a lot more features. Ronald > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas

From ironfroggy at gmail.com Tue Jun 4 13:26:40 2013 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 4 Jun 2013 07:26:40 -0400 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: At this point I would argue against any new modules by default. How does the cost and liability of yaml in the standard library make up for such a minor benefit as excluding one little line from my requirements.txt? On May 31, 2013 12:46 PM, "Philipp A." wrote: > Hi, reading PEP 426, > I made a connection to a (IMHO) longstanding issue: YAML not being in the > stdlib. > > I'm no big fan of JSON, because it's so strict and comparatively verbose > compared with YAML. I just think YAML is more pythonic, and a better choice > for any kind of human-written data format. > > So i devised 3 ideas: > > 1. *YAML in the stdlib* > The stdlib shouldn't get more C code; that's what I've gathered. > So let's put a pure-python implementation of YAML into the stdlib. > Let's also strictly define the API and make it secure-by-naming. What > i mean is let's use the safe load function that doesn't instantiate > user-defined classes (in PyYAML called "safe_load") as default load > function "load", and call the unsafe one by a longer, explicit name (e.g. > "unsafe_load" or "extended_load" or something) > Let's base the parser on generators, since generators are cool, easy > to debug, and allow us to emit and test the token stream (other than e.g. > the HTML parser we have) > 2.
*Implementation discovery* > People want fast parsing. That's incompatible with a pure-python > implementation. > So let's define (or use, if there is one I'm not aware of) a discovery > mechanism that allows implementations of certain APIs to register > themselves as such. > Let "import yaml" use this mechanism to import a compatible 3rd-party > implementation in preference to the stdlib one. > Let's define a property of the implementation that tells the user > which implementation he's using, and a way to select a specific > implementation (although that's probably easily done by just not doing > "import yaml", but "import std_yaml" or "import pyyaml2"). > 3. Allow YAML to be used alongside JSON as metadata as in PEP 426 (so > including either pymeta.yaml or pymeta.json makes a valid package). > I don't propose that we exclusively use YAML, but only because I think > that PEP 426 shouldn't be hindered from being implemented ASAP by waiting > for a new stdlib module to be ready. > > What do you think? > > Is there a reason for not including a YAML lib that I didn't cover? > Is there a reason JSON is used other than YAML not being in the stdlib? > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Tue Jun 4 14:38:46 2013 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 4 Jun 2013 08:38:46 -0400 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: Right, "This specific idea isn't worth its specific costs in exchange for its specific benefits" is the same as "let's give up!" On Tue, Jun 4, 2013 at 7:29 AM, Philipp A. wrote: > 2013/6/4 Calvin Spealman > >> At this point I would argue against any new modules by default.
How does >> the cost and liability of yaml in the standard library make up for such a >> ring benefit as excluding one little line from my requirements.txt? >> > how about we stop innovating completely and just give up Python? there's > so many languages out there, I can't see the cost and liability of > maintaining Python outweighing the small benefit of not using Java. > > in other words: are you serious? > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Tue Jun 4 14:42:23 2013 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 4 Jun 2013 08:42:23 -0400 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Tue, Jun 4, 2013 at 8:38 AM, Paul Moore wrote: > On 4 June 2013 12:26, Calvin Spealman wrote: > >> At this point I would argue against any new modules by default. How does >> the cost and liability of yaml in the standard library make up for such a >> ring benefit as excluding one little line from my requirements.txt? > > > Windows users without compilers can't "just add one little line". IIRC, > Windows users *with* compilers have to fiddle around with dependent > libraries in a messy way. Users without compilers working on prerelease > versions of Python can't find prebuilt binaries anyway. Corporate > environments may not allow 3rd party libraries. Some environments require > pure-python (non-stdlib) code. Etc, etc. > The proposal was about adding, explicitly, only a pure-python YAML library, so the compiler- and binary-related arguments don't apply here.
Also, the fact that we don't have practical packaging solutions to this is something that, hopefully, we'll fix and so I think, until then, adding new modules should be scrutinized with the question: Are we just doing this to get around packaging limitations? New modules have a pretty high bar to adoption in any case. Let's not make > it impossibly high by assuming (wrongly) that there are no difficulties > involved in external dependencies. > > Paul. > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Tue Jun 4 14:56:57 2013 From: dholth at gmail.com (Daniel Holth) Date: Tue, 4 Jun 2013 08:56:57 -0400 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: Everyone argues against new standard library modules by default. YAML is the kind of thing that might make sense. It is a generally useful serialization format that fills a gap between ini, json, and xml, and it is a very stationary target. There seems to be some doubt about whether any of the existing libraries fit the bill. From p.f.moore at gmail.com Tue Jun 4 15:14:10 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 4 Jun 2013 14:14:10 +0100 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On 4 June 2013 13:56, Daniel Holth wrote: > Everyone argues against new standard library modules by default. > > YAML is the kind of thing that might make sense. It is a generally > useful serialization format that fills a gap between ini, json, and > xml, and it is a very stationary target. > Agreed. I would definitely use YAML if it were in the stdlib. 
Because it's an external dependency, I tend to end up using ini format, and hacking round its limitations in at least some cases. There seems to be some doubt about whether any of the existing > libraries fit the bill. Agreed, but adoption of an existing library (presumably PyYAML as it's the only mature one I'm aware of) is the only sensible option, surely? Writing something brand new for the stdlib - or worse still, writing something *partial* and brand new - can't be a sensible option. (And I'd consider a pure-python implementation "partial" in that sense; I see no reason to have a version that is deliberately slower than the competition in the stdlib). Paul. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 4 15:22:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Jun 2013 23:22:13 +1000 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Tue, Jun 4, 2013 at 10:56 PM, Daniel Holth wrote: > Everyone argues against new standard library modules by default. > > YAML is the kind of thing that might make sense. It is a generally > useful serialization format that fills a gap between ini, json, and > xml, and it is a very stationary target. > > There seems to be some doubt about whether any of the existing > libraries fit the bill. +1 It makes sense to discuss having YAML in the standard library, because constrained variants of it can be an excellent user-facing configuration file format that's easier to write than JSON or XML and more powerful than ini files. For purely programmatic data interchange, I'd still favour JSON or XML, but once humans get involved, YAML is a good option to have available. While we are working hard to make the software distribution system easier to use (especially out of the box), that's still a far cry from educators and others being able to *assume* YAML is present whenever Python is.
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Jun 4 15:33:21 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Jun 2013 23:33:21 +1000 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Tue, Jun 4, 2013 at 11:14 PM, Paul Moore wrote: > Agreed, but adoption of an existing library (presumably PyYAML as it's the > only mature one I'm aware of) is the only sensible option, surely? Writing > something brand new for the stdlib - or worse still, writing something > *partial* and brand new - can't be a sensible option. (And I'd consider a > pure-python implementation "partial" in that sense, I see no reason to have > a version that is deliberately slower than the competition in the stdlib). Technically two of the recent additions (ipaddress and enum) were at least redesigns and arguably fairly substantial rewrites of their PyPI inspirations (ipaddr and flufl.enum respectively). This "like that existing library, but..." model can be useful at times, as the standard library often needs to take into consideration things that third party library developers don't need to worry about. In particular, third party library authors can usually assume that their users are already familiar with the domain that the library covers. By contrast, for the standard library we typically have to assume that our APIs will be used by people that need to *learn* the domain specific concepts. This was fairly stark with the ipaddr->ipaddress changes, where ipaddr played fast and loose with networking terminology in a way that anyone familiar with IP based networking could deal with easily, but was an absolute nightmare if you were trying to use the API to learn IP addressing concepts in the first place. 
So we ended up keeping the algorithmic guts of ipaddr (all the stuff related to RFC compliance) and changing the API around to more accurately reflect the formal terminology, as well as greatly improving the error messages produced for the inevitable mistakes that users will make (especially when entering IPv6 addresses). With YAML, we *may* end up with a similar situation where an existing library is *almost* suitable, but needs some API tweaks to be suitable for standard library inclusion, while preserving the core parsing and generation engine. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From flying-sheep at web.de Tue Jun 4 16:07:17 2013 From: flying-sheep at web.de (Philipp A.) Date: Tue, 4 Jun 2013 16:07:17 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: I think we should ask the main author of PyYAML whether we could put a modified version of it into the stdlib. Unfortunately, I don't know how to contact him, since his mail address appears neither on GitHub nor on Bitbucket, and he doesn't seem to be active enough there for a Bitbucket message to make sense. PyYAML might not implement YAML 1.2 fully on paper, but the most useful part of 1.2 (parsing arbitrary JSON) works flawlessly. If we may use it, we should tweak it to suit our to-be-specified API: I think we all agree after the Ruby debacle that calling plain load() should be the safe variant. Also, since YAML has the concept of document streams, I'd propose the addition of load_iter(), which yields documents as soon as they have finished arriving. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stefan at drees.name Tue Jun 4 16:25:21 2013 From: stefan at drees.name (Stefan Drees) Date: Tue, 04 Jun 2013 16:25:21 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: <51ADF8D1.90400@drees.name> On 04.06.13 16:07, Philipp A. wrote: > I think we should ask the main author of PyYAML whether we could put a > modified version of it into the stdlib. Unfortunately, I don't know how > to contact him, since his mail address appears neither on GitHub > nor on Bitbucket, and he doesn't > seem to be active enough there for a Bitbucket > message to make sense. > > PyYAML might not implement YAML 1.2 fully on paper, but the most useful > part of 1.2 (parsing arbitrary JSON) works flawlessly. > > If we may use it, we should tweak it to suit our to-be-specified API: I > think we all agree after the Ruby debacle that calling plain load() > should be the safe variant. Also, since YAML has the concept of document > streams, I'd propose the addition of load_iter() which yields documents > as soon as they have finished arriving. ... I wrote to the yaml-core mailinglist as this seems to be the best (and only suggested) way of contacting Kirill I could come up with: """ Dear Yaml-Core subscribers, I am writing this mail to reach out to Kirill Simonov as the author of PyYAML. This is, as advised at the current state of http://pyyaml.org/wiki/PyYAML under the section "Development and bug reports", the best I could come up with. As there is an interesting discussion (IMO) going on at python-ideas finally opening the gates for first-class YAML support in the python stdlib, I am sure all would benefit from Kirill and others related to PyYAML (https://pypi.python.org/pypi/PyYAML/3.10) stepping up and contributing their thoughts, and hopefully suggesting some ideas about a featureset and a roadmap based on the current PyYAML state. Good idea? Not so good idea? Please tell me what you think. """ All the best, Stefan.
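Philipp's quoted proposal (safe load() by default, plus a streaming load_iter() for document streams) could be sketched roughly as follows. This is only a toy, under stated assumptions: it splits a stream on "---" document markers and yields raw text; a real load_iter() would feed each chunk to a safe-by-default parser, and the name load_iter is taken from the proposal, not from any existing library.

```python
def load_iter(lines):
    """Toy sketch of the proposed load_iter(): yield each document of a
    YAML stream as soon as its end is seen, instead of waiting for the
    whole stream.  Here a document is just the raw text between "---"
    markers; no actual YAML parsing happens."""
    doc = []
    for line in lines:
        if line.strip() == "---":  # document separator
            if doc:
                yield "".join(doc)
            doc = []
        else:
            doc.append(line)
    if doc:  # last document has no trailing separator
        yield "".join(doc)

stream = ["a: 1\n", "---\n", "b: 2\n"]
print(list(load_iter(stream)))  # ['a: 1\n', 'b: 2\n']
```

The point of the generator shape is exactly what the proposal asks for: a consumer can react to each finished document while later ones are still arriving.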
From eric at trueblade.com Tue Jun 4 16:41:12 2013 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 04 Jun 2013 10:41:12 -0400 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: <1370025834.12159.YahooMailNeo@web184706.mail.ne1.yahoo.com> <8ED3592D-5613-4412-B792-4CF9D170A722@masklinn.net> <1370049518.38392.YahooMailNeo@web184706.mail.ne1.yahoo.com> Message-ID: <51ADFC88.8010409@trueblade.com> On 06/04/2013 06:00 AM, Ronald Oussoren wrote: > > On 1 Jun, 2013, at 3:18, Andrew Barnert wrote: > >> From: Philipp A. >> Sent: Friday, May 31, 2013 12:39 PM >> >> >>> and that's where my idea's "strict API" comes into play: compatible implementations would *have to* pass a test suite and implement a certain API and comply with the standard. >>> >>> unsure if and how to test the latter (surely running a testsuite when something wants to register isn't practical -- or is it?) >> >> >> After sleeping on this question, I'm not sure the best-implementation wrapper really needs to be that dynamic. There are only so many YAML libraries out there, and the set doesn't change that rapidly. (And the same is true for dbm, ElementTree, etc.) >> >> So, maybe we just put a static list in yaml, like the one in dbm (http://hg.python.org/cpython/file/3.3/Lib/dbm/__init__.py#l41): _names = ['pyyaml', 'yaml.yaml'] If a new implementation reaches maturity, it goes through some process TBD, and in 3.5.2, we change that one line to _names = ['pyyaml', 'yayaml', 'yaml.yaml']. And that's how you "register" a new implementation. >> >> The only real downside I can see is that some people might stick to 3.5.1 for a long time after 3.5.2 is released (maybe because it comes pre-installed with OS X 10.9 or RHEL 7.2 ESR or something), but still want to accept yayaml.
If that's really an issue, someone could put an "anyyaml" backport project on PyPI that offered the latest registered name list to older versions of Python (maybe even including 2.7). >> >> Is that good enough? > > Please don't. I have no particular opinion on adding YAML support to the stdlib, but if support is added it should be usable on its own and users shouldn't have to rely on 3rd-party libraries for serious use. That is, the stdlib version should be complete and fast enough (for some definition of fast enough). > > Having the stdlib depend on "random" 3rd-party libraries is IMHO a code smell and makes debugging issues harder (why does my script that only uses the stdlib work on machine 1 but not on machine 2... oops, one of the machines has some 3rd-party yaml implementation that hides a bug in the stdlib even though I don't use it explicitly). > > BTW. That doesn't mean the stdlib version should contain as many features as possible. Compare this with XML parsing: the xml.etree implementation is quite usable on its own, but sometimes you need more advanced XML features and then you can use lxml which has a similar API but a lot more features. I completely agree with Ronald here. I don't see the need for a complicated "yaml parser registry". If you want the one in the stdlib (if it ever exists), then use it. Otherwise, use a different one. As with XML parsing, this doesn't mean they all can't share an API. Eric. From guido at python.org Tue Jun 4 17:34:54 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Jun 2013 08:34:54 -0700 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Tue, Jun 4, 2013 at 4:26 AM, Calvin Spealman wrote: > At this point I would argue against any new modules by default. How does the > cost and liability of yaml in the standard library make up for such a ring > benefit as excluding one little line from my requirements.txt? Disagree.
Lots of users don't have the luxury of running in an environment where internet dependencies can be automatically installed. For many (probably most non-developer) users, there is a *huge* gap between "is in the stdlib" and "requires internet access to install". -- --Guido van Rossum (python.org/~guido) From abarnert at yahoo.com Tue Jun 4 18:17:05 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 4 Jun 2013 09:17:05 -0700 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: <931DE7C2-A2FE-4D59-B351-C46939B478D3@yahoo.com> On Jun 4, 2013, at 7:07, "Philipp A." wrote: > i think we should ask the main author of PyYAML if we could put a modified version of it into the stdlib. UNfortunately i don?t know how to contact him, since neither on github nor on bitbucket seems to be his mail address, and he doesn?t seem to be active enough there that abitbucket message makes sense. I wrote him an email at the address in the readme (xi at resolvent.net) and he responded almost immediately. I don't know where people got the impression that he's hard to track down. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jun 4 18:37:57 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 4 Jun 2013 09:37:57 -0700 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: On Jun 4, 2013, at 6:14, Paul Moore wrote: > Agreed, but adoption of an existing library (presumably PyYAML as it's the only mature one I'm aware of) is the only sensible option, surely? Writing something brand new for the stdlib - or worse still, writing something *partial* and brand new - can't be a sensible option. (And I'd consider a pure-python implementation "partial" in that sense, I see no reason to have a version that is deliberately slower than the competition in the stdlib). 
PyYAML has two implementations: a pure Python one, and a set of bindings around libyaml. Not having a dependency on an external C library may be a good reason to have a version that's slower than the competition. It's quite possible that the pure python implementation of PyYAML is already fast enough for the vast majority of use cases. Most people are talking about using it for config files. Even the massive, unwieldy config files used by some mail, web, and streaming servers are on the order of 10KB, not 10MB. Since libyaml is maintained by the same author as PyYAML, it might not be a problem to just incorporate it into the python source tree. But if there are other projects using it directly, forking it like that might be a problem. And I assume making Python require libyaml is not acceptable. Of course we could make libyaml an optional dependency at build time, and link it statically into _yaml.so, so it isn't a runtime dependency. Or make it a runtime dependency. Other options are to build a new C implementation based on the pure Python PyYAML or on the libyaml code, or to look for ways to optimize the pure Python code by implementing various helper pieces in C, instead of the whole thing. But first, let's make sure the problem actually needs to be solved. From amauryfa at gmail.com Tue Jun 4 20:05:47 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 4 Jun 2013 20:05:47 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: 2013/6/4 Andrew Barnert > Since libyaml is maintained by the same author as PyYAML, it might not be > a problem to just incorporate it into the python source tree. But if there > are other projects using it directly, forking it like that might be a > problem. > > And I assume making Python require libyaml is not acceptable. 
> > Of course we could make libyaml an optional dependency at build time, and > link it statically into _yaml.so, so it isn't a runtime dependency. Or make > it a runtime dependency. > There are already modules around libexpat and zlib, so this issue is already solved: on Unix-like systems, an installed libyaml is required, with C headers. On Windows, an "external" repository contains all the needed dependencies. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Jun 4 20:34:53 2013 From: brett at python.org (Brett Cannon) Date: Tue, 4 Jun 2013 14:34:53 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list Message-ID: Titus and I have discussed it and we have decided to apply the PSF's Code of Conduct (http://hg.python.org/coc) to this mailing list. In a nutshell everyone should be open, respectful, and considerate to each other, i.e. don't be rude. If you think someone is not following the CoC, either publicly let them know you think they are not (obviously do this politely; publicly notifying someone is to help others on the list understand what may be considered inappropriate behaviour, not to chastise them in front of a crowd) or email python-ideas-owner@ privately with a link to the email in question from the archives and we will handle it w/o disclosing who brought forward the complaint. Anyone found repeatedly not following the CoC will be banned from the list. You can always ask to be allowed back on based on the discretion of the list owners. Because this is a new policy, no one is starting with any strikes against them. Everyone is assumed to have been polite and on good terms up to this point. In the 6.5-year history of this list there is possibly only one instance where this would have really come into play in removing someone from this list, so this isn't expected to be used very often.
But we felt we wanted something in place to remind folks that we expect all subscribers to be respectful of one another and to have something to use as a guideline for removals if it came to that. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Tue Jun 4 21:57:21 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 4 Jun 2013 19:57:21 +0000 (UTC) Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery References: Message-ID: Philipp A. writes: > PyYAML might not implement YAML 1.2 fully on paper, but the most useful > part of 1.2 (parsing arbitrary JSON) works flawlessly. Does it? What about this issue? https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded I also note that on the Trac issue tracker on PyYAML.org, there are 65 open issues of type "defect" relating to PyYAML, and 15 such relating to libyaml. I hope Kirill Simonov can indicate what the status of these issues is - they don't appear to have been moved over to the project's new home on BitBucket. Are they valid issues? Some of them appear to have been around for ~18 months. Some which relate to basic functionality (not odd corner cases) don't even have a response, so it's hard to assess whether or not they are valid and/or have been addressed. I do think there is a place for YAML support in the stdlib, but I'm concerned by the number of open issues relating to the PyYAML implementation, what functionality they relate to, how long they've been open for and the lack of clarity regarding their validity/status. PyYAML may be the most mature YAML implementation in Python, but surely that does not trump basic quality concerns? I hope Kirill Simonov can weigh in on this.
Regards, Vinay Sajip From dreamingforward at gmail.com Tue Jun 4 22:49:47 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Tue, 4 Jun 2013 13:49:47 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: Message-ID: Guido called me "an embarrassment". Does that count? On 6/4/13, Brett Cannon wrote: > Titus and I have discussed it and we have decided to apply the PSF's Code > of Conduct (http://hg.python.org/coc) to this mailing list. In a nutshell > everyone should be open, respectful, and considerate to each other, i.e. > don't be rude. > > If you think someone is not following the CoC, either publicly let them > know you think they are not (obviously do this politely; publicly notifying > someone is to help others on the list understand what may be considered > inappropriate behaviour, not to chastise them in front of a crowd) or email > python-ideas-owner@ privately with a link to the email in question from the > archives and we will handle it w/o disclosing who brought forward the > complaint. > > Anyone found repeatedly not following the CoC will be banned from the list. > You can always ask to be allowed back on based on the discretion of the > list owners. > > Because this is a new policy, no one is starting with any strikes against > them. Everyone is assumed to have been polite and on good terms up to this > point. > > In the 6.5 year history of this list there is possibly only one instance > where this would have really come into play in removing someone from this > list so this isn't expected to be used very often. But we felt we wanted > something in place to remind folks that we expect all subscribers to be > respectful of one another and to have something to use as a guideline for > removals if it came to that. 
> -- MarkJ Tacoma, Washington From brett at python.org Tue Jun 4 23:05:43 2013 From: brett at python.org (Brett Cannon) Date: Tue, 4 Jun 2013 17:05:43 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: Message-ID: On Tue, Jun 4, 2013 at 4:49 PM, Mark Janssen wrote: > Guido called me "an embarrassment". Does that count? > I don't know the context so I can't really answer the question. -Brett > > On 6/4/13, Brett Cannon wrote: > > Titus and I have discussed it and we have decided to apply the PSF's Code > > of Conduct (http://hg.python.org/coc) to this mailing list. In a > nutshell > > everyone should be open, respectful, and considerate to each other, i.e. > > don't be rude. > > > > If you think someone is not following the CoC, either publicly let them > > know you think they are not (obviously do this politely; publicly > notifying > > someone is to help others on the list understand what may be considered > > inappropriate behaviour, not to chastise them in front of a crowd) or > email > > python-ideas-owner@ privately with a link to the email in question from > the > > archives and we will handle it w/o disclosing who brought forward the > > complaint. > > > > Anyone found repeatedly not following the CoC will be banned from the > list. > > You can always ask to be allowed back on based on the discretion of the > > list owners. > > > > Because this is a new policy, no one is starting with any strikes against > > them. Everyone is assumed to have been polite and on good terms up to > this > > point. > > > > In the 6.5 year history of this list there is possibly only one instance > > where this would have really come into play in removing someone from this > > list so this isn't expected to be used very often. 
But we felt we wanted > > something in place to remind folks that we expect all subscribers to be > > respectful of one another and to have something to use as a guideline for > > removals if it came to that. > > > > > -- > MarkJ > Tacoma, Washington > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ctb at msu.edu Wed Jun 5 00:37:26 2013 From: ctb at msu.edu (C. Titus Brown) Date: Tue, 4 Jun 2013 15:37:26 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: Message-ID: <20130604223726.GI15873@idyll.org> > On 6/4/13, Brett Cannon wrote: > > Because this is a new policy, no one is starting with any strikes against > > them. Everyone is assumed to have been polite and on good terms up to this > > point. On Tue, Jun 04, 2013 at 01:49:47PM -0700, Mark Janssen wrote: > Guido called me "an embarrassment". Does that count? Mark, please see the paragraph at the top from Brett, which seems rather unambiguous to me. We're starting with a clean slate. I think Brett laid out a decent procedure for inquiring about this if you want to ask this as a hypothetical. thanks, --titus -- C. Titus Brown, ctb at msu.edu From dreamingforward at gmail.com Wed Jun 5 01:17:07 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Tue, 4 Jun 2013 16:17:07 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <20130604223726.GI15873@idyll.org> References: <20130604223726.GI15873@idyll.org> Message-ID: Sorry. As I responded in my private message to Brett, I was joking. --mark On 6/4/13, C. Titus Brown wrote: >> On 6/4/13, Brett Cannon wrote: >> > Because this is a new policy, no one is starting with any strikes >> > against >> > them. Everyone is assumed to have been polite and on good terms up to >> > this >> > point. > > On Tue, Jun 04, 2013 at 01:49:47PM -0700, Mark Janssen wrote: >> Guido called me "an embarrassment". Does that count? 
> > Mark, > > please see the paragraph at the top from Brett, which seems rather > unambiguous > to me. We're starting with a clean slate. I think Brett laid out a decent > procedure for inquiring about this if you want to ask this as a > hypothetical. > > thanks, > --titus > -- > C. Titus Brown, ctb at msu.edu > -- MarkJ Tacoma, Washington From steve at pearwood.info Wed Jun 5 01:20:49 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 05 Jun 2013 09:20:49 +1000 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: Message-ID: <51AE7651.9080507@pearwood.info> On 05/06/13 04:34, Brett Cannon wrote: > Titus and I have discussed it and we have decided to apply the PSF's Code > of Conduct (http://hg.python.org/coc) to this mailing list. In a nutshell > everyone should be open, respectful, and considerate to each other, i.e. > don't be rude. Excuse my ignorance, but who are you and Titus to make that decision? -- Steven From mertz at gnosis.cx Wed Jun 5 01:31:03 2013 From: mertz at gnosis.cx (David Mertz) Date: Tue, 4 Jun 2013 16:31:03 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AE7651.9080507@pearwood.info> References: <51AE7651.9080507@pearwood.info> Message-ID: My understanding is that Brett and Titus are the list maintainers. On Tue, Jun 4, 2013 at 4:20 PM, Steven D'Aprano wrote: > On 05/06/13 04:34, Brett Cannon wrote: > >> Titus and I have discussed it and we have decided to apply the PSF's Code >> of Conduct (http://hg.python.org/coc) to this mailing list. In a nutshell >> everyone should be open, respectful, and considerate to each other, i.e. >> don't be rude. >> > > Excuse my ignorance, but who are you and Titus to make that decision? 
> > > -- > Steven > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at drees.name Wed Jun 5 08:25:50 2013 From: stefan at drees.name (Stefan Drees) Date: Wed, 05 Jun 2013 08:25:50 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: <51AED9EE.4060901@drees.name> On 04.06.13 21:57, Vinay Sajip wrote: > Philipp A. writes: > >> PyYAML might not implement YAML 1.2 fully on paper, but the most useful >> part of 1.2 (parsing arbitrary JSON) works flawlessly. > > Does it? What about this issue? > > https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded ... if "TL;DR": summary() parsing arbitrary JSON is not guaranteed by [1] "the spec" (version 1.2, section 1.3, third paragraph). There I read wording like e.g. """YAML can therefore be viewed as a natural superset of JSON, offering improved human readability and a more complete information model. This is also the case in practice; every JSON file is also a valid YAML file.[...]""" and it even states that the only issue might be """JSON's RFC4627 requires that mapping keys merely "SHOULD" be unique, while YAML insists they "MUST" be. Technically, YAML therefore complies with the JSON spec, choosing to treat duplicates as an error. In practice, since JSON is silent on the semantics of such duplicates, the only portable JSON files are those with unique keys, which are therefore valid YAML files. """ (4th paragraph ibid).
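The SHOULD/MUST difference in the spec excerpt above is easy to see from the JSON side using only the stdlib: Python's json parser accepts duplicate object keys and silently keeps the last value, whereas a YAML parser following the quoted rule must treat the duplicate as an error.

```python
import json

# RFC 4627 only says object keys SHOULD be unique, so this parses;
# Python's json module keeps the last value seen for a duplicated key.
doc = '{"a": 1, "a": 2}'
print(json.loads(doc))  # {'a': 2}
```

This is exactly the class of "portable in JSON, rejected by YAML" files the spec paragraph is worried about.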
So the first sentence might even match perl "can [...] be viewed" as python :-) and the second (as the 4th paragraph) is in error, as JSON allows insertion of a backslash in a double quoted string value and associates no meaning with it, but YAML (read on) does! So arbitrary JSON documents that use the "russian doll" style of escaping data in serialization for some end-target (like the use case in the ticket, where some client likes slashes to be prepended by a backslash) are brittle at best. Here the YAML spec is very clear, as it uses the C-escape characters. Cf. section 2.3: "The double-quoted style provides escape sequences." Escape sequences in YAML are explained in section 5.7, which starts with """ All non-printable characters must be escaped. YAML escape sequences use the "\" notation common to most modern computer languages. Each escape sequence must be parsed into the appropriate Unicode character. The original escape sequence is a presentation detail and must not be used to convey content information. Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the "\" character has no special meaning and non-printable characters are not available. """ and continues with """ YAML escape sequences are a superset of C's escape sequences: """ As the JSON of the ticket is {"key": "hi\/there"}, this will not work in YAML as specified (in the relevant escape sequence section), as "\/" will not find the target Unicode character to replace it. This is not a PyYAML or libyaml problem. Consider the following C program: main(){char a[] = "\/";} Compiling will not work, as the compiler catches the error: unknown escape sequence '\/' This is where PyYAML (and libyaml) is correct in throwing an error, as the spec is mandating escape sequences (and their interpretation).
The above mentioned 3rd paragraph, claiming the JSON - YAML relation is automatic as long as the keys are unique, is wrong and should be corrected, in a version 1.3 or as errata to 1.2 (while I would prefer the former, as this is IMO a nasty and irritating inconsistency). References: [1]: http://www.yaml.org/spec/1.2/spec.html

def summary():
    """If the post is too long, summarize."""
    print('YAML v1.2 is inconsistent and')
    print("can't parse \\/ in a double quoted string")

All the best, Stefan. From stefan at drees.name Wed Jun 5 09:31:35 2013 From: stefan at drees.name (Stefan Drees) Date: Wed, 05 Jun 2013 09:31:35 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: Message-ID: <51AEE957.6020105@drees.name> On 04.06.13 21:57, Vinay Sajip wrote: > Philipp A. writes: > >> PyYAML might not implement YAML 1.2 fully on paper, but the most useful >> part of 1.2 (parsing arbitrary JSON) works flawlessly. > > Does it? What about this issue? > > https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded ... while some text of my previous reply to this mail (which is already in the archives, but has not yet reached my mailbox again) may be an interesting read and the YAML spec might really profit from some rewriting to clarify the JSON-YAML relation, I overlooked "production 53" in [1] and the notion of what "YAML uses a superset of C escape sequences" really means =( To make it short, I hereby state: Oops, the YAML spec claims a superset of C escape-sequences, and in production 53 (c.f. [1]) explicitly adds "/" to the grammar as """Escaped ASCII slash (#x2F), for JSON compatibility. """ So automatic hand-over of a double-quoted JSON string as YAML to a C compiler does not conform with the spec. References: [1]: http://www.yaml.org/spec/1.2/spec.html#id2776785 All the best and sorry for any confusion I may have introduced that was not educational nor entertaining, Stefan.
From abarnert at yahoo.com Wed Jun 5 09:48:23 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 5 Jun 2013 00:48:23 -0700 (PDT) Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: <51AED9EE.4060901@drees.name> References: <51AED9EE.4060901@drees.name> Message-ID: <1370418503.24120.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: Stefan Drees Sent: Tuesday, June 4, 2013 11:25 PM > On 04.06.13 21:57, Vinay Sajip wrote: >> Philipp A. writes: >> >>> PyYAML might not implement YAML 1.2 fully on paper, but the most useful >>> part of 1.2 (parsing arbitrary JSON) works flawlessly. >> >> Does it? What about this issue? >> >> https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded ... > > parsing arbitrary JSON is not guaranteed by [1] "the spec" (version > 1.2, section 1.3, third paragraph). Parsing arbitrary JSON _is_ guaranteed by the spec, except for the semantics of repeated keys (undefined in JSON, illegal in YAML). Parsing arbitrary JSON is not guaranteed by section 1.3, but that's non-normative text from the introduction, and doesn't really guarantee anything. Of course it's possible that there are errors in the spec, and therefore YAML is not a strict superset of JSON. (The errata list has been empty since the third patch to YAML 1.2 on 2009-10-01, but that just means nobody has _found_ any errors, not that none exist.) But the example provided doesn't show that. > Here YAML spec is very clear, as it uses the C-escape characters. The YAML spec is very clear, as it uses _a superset of_ the C escape sequences, as you quoted immediately below. As a superset, it includes sequences that C does not, and "\/" is one of them. In section 5.7, #53 is:

    ns-esc-slash ::= "/"

With comment: "Escaped ASCII slash (#x2F), for JSON compatibility." Then, #62 is:

    c-ns-esc-char ::= "\" ( ns-esc-null | ... | ns-esc-slash | ... )

So, "\/" is unambiguously a valid escape sequence.
You could argue that the semantics could be explained more clearly. (It's a huge improvement over 1.1, which can be read to imply that each escape sequence is interpreted as itself.) But it's pretty clear what's intended, and I can't think of any other reasonable way to interpret it besides the intended way. You could also argue that it would be clearer for the spec to give the official Unicode names and/or codepoints of each escaped character, instead of informal descriptions. But I can't imagine anyone would interpret "ASCII slash (#x2F)" as ambiguous, or as a description of any Unicode character other than "Solidus (Slash)" (#x2F). > As the JSON of the ticket is {"key": "hi\/there"} this > will not work in YAML as specified (in the relevant escape sequence section, as > "\/" will not find the target Unicode character to replace it. Sure it will. The target Unicode character is "/". This will not work in YAML 1.1 (see section 5.6), but it will work in YAML 1.2; this was, in fact, one of the changes specifically made to fix YAML so that it's a strict superset of JSON. > This is not a PyYAML or libyaml problem. Well, it's not a PyYAML problem in that PyYAML supports YAML 1.1 properly, and 1.1, unlike 1.2, did not allow "\/". Whether it's a problem for using PyYAML as the basis for a stdlib package is a different question. Is it important that a stdlib yaml package support YAML 1.2, or that it support a strict superset of JSON? If so, it's definitely a problem; if not, it probably isn't. > The above mentioned 3rd paragraph claiming the JSON - YAML relation as > automatic, as long as the keys are unique, is wrong and should be corrected, in > a version 1.3 or as errata to 1.2 (while I would prefer the former, as this is > IMO a nasty and irritating inconsistency). Both JSON and YAML 1.2 will interpret the sequence "\/" as the Unicode character "/". So, the 3rd paragraph is correct.
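For the JSON half, the stdlib's own json module makes this easy to cross-check; a minimal sketch, purely illustrative:

```python
import json

# JSON's grammar allows the escaped solidus "\/" and maps it to "/"
doc = '{"key": "hi\\/there"}'  # the raw JSON text contains the two characters \ and /
parsed = json.loads(doc)
print(parsed["key"])  # hi/there
```

A YAML 1.2 processor is required to produce the same mapping for this document.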
At any rate, if you're just correcting Vinay Sajip's raising of errors with PyYAML by pointing out that this isn't an error... well, that's true, but not for the reasons you gave. But if you're arguing that YAML 1.2 is a broken spec that's impossible to implement, or not worth implementing, you haven't made that point. And if you're not arguing either of those, I think I've missed your point entirely. That could easily be my fault. From abarnert at yahoo.com Wed Jun 5 09:53:40 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 5 Jun 2013 00:53:40 -0700 (PDT) Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: <51AEE957.6020105@drees.name> References: <51AEE957.6020105@drees.name> Message-ID: <1370418820.69717.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: Stefan Drees Sent: Wednesday, June 5, 2013 12:31 AM > while some text of my previous reply to this mail (which is already in the > archives, but has not yet reached my mailbox again) may be an interesting read and > the YAML spec might really profit from some rewriting to clarify the JSON-YAML > relation, I overlooked "production 53" in [1] and the notion of what > YAML uses a superset of C escape sequences really means =( Perfect timing; right after I hit "send", I see your followup... That's very different. Never mind. :) Still, despite your mistake, you raised the point that the version of YAML implemented by PyYAML (YAML 1.1) is not a strict superset of JSON, and that may be worth addressing. Would PyYAML have to upgrade to 1.2, or at least to 1.1 plus enough 1.2 changes to make it a strict superset of JSON, to be worth adding to the stdlib? Or would it be reasonable to say "Don't use the yaml module to parse JSON, dummy"?
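For what it's worth, a stdlib wrapper could at least tell the two dialects apart by peeking at the optional %YAML directive before deciding how strict to be. A rough sketch (yaml_version is a hypothetical helper, not existing PyYAML API):

```python
def yaml_version(text, default=(1, 1)):
    """Return (major, minor) from a leading %YAML directive.

    Falls back to 1.1, which is what PyYAML currently assumes.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments ahead of the directive
        if stripped.startswith("%YAML"):
            major, minor = stripped.split()[1].split(".")
            return (int(major), int(minor))
        break  # any other content means there is no version directive
    return default

print(yaml_version("%YAML 1.2\n---\nkey: value"))  # (1, 2)
print(yaml_version("key: value"))                  # (1, 1)
```

A real implementation would of course leave directive handling to the parser proper; this only illustrates that the version question is decidable up front.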
From stefan at drees.name Wed Jun 5 10:08:40 2013 From: stefan at drees.name (Stefan Drees) Date: Wed, 05 Jun 2013 10:08:40 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: <1370418820.69717.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <51AEE957.6020105@drees.name> <1370418820.69717.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <51AEF208.8070407@drees.name> On 05.06.13 09:53, Andrew Barnert wrote: > From: Stefan Drees ... Sent: Wednesday, June 5, 2013 12:31 AM ... >> while some text of my previous reply to this mail (which is already in the >> archives, but has not yet reached my mailbox again) may be an interesting read and >> the YAML spec might really profit from some rewriting to clarify the JSON-YAML >> relation, I overlooked "production 53" in [1] and the notion of what >> YAML uses a superset of C escape sequences really means =( > > Perfect timing; right after I hit "send", I see your followup... > That's very different. Never mind. :) mistakes are so efficient in getting attention ;-) > Still, despite your mistake, you raised the point that the version of > YAML implemented by PyYAML (YAML 1.1) is not a strict superset of JSON, > and that may be worth addressing. Would PyYAML have to upgrade to 1.2, > or at least to 1.1 plus enough 1.2 changes to make it a strict superset > of JSON, to be worth adding to the stdlib? Or would it be reasonable to > say "Don't use the yaml module to parse JSON, dummy"? Someone (I think the OP of the cited pyyaml issue #11) noted that they receive a mix of JSON and YAML messages over some interface and tried to handle that by simply feeding everything through a YAML parser. In the light of the wide deployment of YAML v1.1 in tools, I would at least for some time resort to your latter alternative, maybe checking the last word against some applicable CoC ;-) But, I will mainly think a bit or two more about what the price for this JSONesque attitude actually is.
If you're so inclined, please repeat after me: "You can't simply delegate processing of a double quoted JSON string to C functions expecting C escape sequences" (and it is not only production 53 ...) I am all for a YAML parser inside the python stdlib - at some time - with clear expectations on conformance, performance and "expectation management" for the users thereof. The latter, because I am convinced that as soon as something is part of the stdlib, people will start using it to also "learn" from its API and "terms" about the domain handled. So terms, API and error messages should all conform to the YAML spec version 1.2+errata and be well "settled". Glossary (for dummies like me :) CoC: Code of Conduct. All the best, Stefan. From vinay_sajip at yahoo.co.uk Wed Jun 5 13:18:20 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Wed, 5 Jun 2013 11:18:20 +0000 (UTC) Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery References: <51AED9EE.4060901@drees.name> <1370418503.24120.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: Andrew Barnert writes: > At any rate, if you're just correcting Vinay Sajip's raising of errors > with PyYAML by pointing out that this isn't an error... well, that's true, > but not for the reasons you gave. I was pointing out that specific issue in response to Philipp A.'s assertion that PyYAML, when parsing arbitrary JSON, "works flawlessly". According to that issue, it doesn't - but whether it should or it shouldn't is a different question. Since we have JSON support in the stdlib already, I don't see why it's all that important that a YAML implementation can parse arbitrary JSON (unless it claims YAML 1.2 compliance or one wants 1.2 compliance, of course). The bulk of the open issues reported on the PyYAML.org tracker are related to how it handles YAML, which is altogether more pertinent when considering it as a potential candidate for stdlib inclusion.
Regards, Vinay Sajip From g.rodola at gmail.com Wed Jun 5 13:43:33 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Wed, 5 Jun 2013 13:43:33 +0200 Subject: [Python-ideas] Timing hefty or long-running blocks of code In-Reply-To: <51A9BDE2.4060201@pearwood.info> References: <51A9BDE2.4060201@pearwood.info> Message-ID: 2013/6/1 Steven D'Aprano : > The timeit module is great for timing small code snippets, but it is rather > inconvenient for timing larger blocks of code. It's also over-kill: under > normal circumstances, there is little need for the heroic measures timeit > goes through to accurately time long-running code. [...] > Is there interest in seeing this in the standard library? A strong +1. This is one of those things I personally need quite often. FWIW, in the 'utils' module I use across multiple projects I have a modified version of this: http://dabeaz.blogspot.it/2010/02/function-that-works-as-context-manager.html --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From xi at resolvent.net Wed Jun 5 15:18:06 2013 From: xi at resolvent.net (Kirill Simonov) Date: Wed, 05 Jun 2013 08:18:06 -0500 Subject: [Python-ideas] Status of PyYAML - Re: Proposal for closing ticket pyyaml/issue/11 In-Reply-To: <51AEDDF4.2090009@drees.name> References: <51AEDDF4.2090009@drees.name> Message-ID: <51AF3A8E.1080105@resolvent.net> Hi Stefan, I'm cross-posting this to the python-ideas list. Regarding the status of PyYAML -- after 10 releases over 6 years, I consider it stable enough so that I don't need to pay much attention to it anymore. I'm migrating the project, including the issues database, from Trac to Bitbucket, but haven't finished it yet since Bitbucket added issues import/export API only a month ago and I didn't have a chance to look at it. As for the discussion on python-ideas, I looked through it, but I'm not sure how I could contribute. Does it make sense to have YAML support in stdlib?
Definitely, the format is popular enough and has some unique advantages over other options. But to the next question, how it should be done, I don't have a good answer. Would you include PyYAML+libyaml directly into the stdlib? It's a big chunk of C, Python and Pyrex/Cython code and I doubt core developers are willing to accept it. Besides, PyYAML API was designed for Python 2.3 and may feel outdated for Python 3.5. So you need to rewrite low-level parsing and emitting code to C+Python API and design a new high-level extension API, but who's going to do all this work? Thanks, Kirill On 06/05/2013 01:43 AM, Stefan Drees wrote: > Hi Kirill, > > as we have interesting discussions w.r.t. YAML and the python stdlib, I > thought you might be interested to know. > > Vinay Sajip was so nice as to point to an open ticket at [1] as a sample > and asked (in other words) "if this project is alive and well" ;-) > (he also noted that there """are 65 open issues of type "defect" > relating to PyYAML, and 15 such relating to libyaml.""" c.f. [2]) > > As it seems to be unclear how and where issues are handled for PyYAML > and libyaml (TRAC on pyyaml.org or bitbucket repo?) I spent some minutes > to remove the common fairy tale that "arbitrary JSON (with unique keys) > is automatically correctly parseable YAML" and wrote [3]. > > Maybe you can use some of the wording at [3], to close the ticket and > also others in relation to this misconception, that YAML = JSON and > unique keys. > > I personally would really enjoy it if you could participate in these > discussions on python-ideas as it would certainly be beneficial to read > your opinions and advice based on your experiences. > > Please consider it at least. It would be great!
> > References: > > [1]: https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded > [2]: http://mail.python.org/pipermail/python-ideas/2013-June/021088.html > [3]: http://mail.python.org/pipermail/python-ideas/2013-June/021095.html > > All the best, > Stefan. From brett at python.org Wed Jun 5 15:32:05 2013 From: brett at python.org (Brett Cannon) Date: Wed, 5 Jun 2013 09:32:05 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> Message-ID: On Tue, Jun 4, 2013 at 7:31 PM, David Mertz wrote: > My understanding is that Brett and Titus are the list maintainers. > What David said. Titus and I have owned/managed the list since its inception. > > > On Tue, Jun 4, 2013 at 4:20 PM, Steven D'Aprano wrote: > >> On 05/06/13 04:34, Brett Cannon wrote: >> >>> Titus and I have discussed it and we have decided to apply the PSF's Code >>> of Conduct (http://hg.python.org/coc) to this mailing list. In a >>> nutshell >>> everyone should be open, respectful, and considerate to each other, i.e. >>> don't be rude. >>> >> >> Excuse my ignorance, but who are you and Titus to make that decision? >> >> >> -- >> Steven >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From flying-sheep at web.de Wed Jun 5 15:55:47 2013 From: flying-sheep at web.de (Philipp A.) Date: Wed, 5 Jun 2013 15:55:47 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: <51AED9EE.4060901@drees.name> <1370418503.24120.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: 2013/6/5 Vinay Sajip > I was pointing out that specific issue in response to Philipp A.'s > assertion > that PyYAML, when parsing arbitrary JSON, "works flawlessly". According to > that issue, it doesn't - but whether it should or it shouldn't is a > different question. > it should if the YAML document starts with %YAML 1.2 --- -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Wed Jun 5 16:16:27 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Wed, 5 Jun 2013 14:16:27 +0000 (UTC) Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery References: <51AED9EE.4060901@drees.name> <1370418503.24120.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: Philipp A. writes: > it should if the YAML document starts with %YAML 1.2 --- And if the library supports 1.2, which is what I was getting at. In [1], support for YAML 1.1 is asserted, but nothing is said about YAML 1.2. Regards, Vinay Sajip [1] http://pyyaml.org/wiki/PyYAML From stefan at drees.name Wed Jun 5 16:27:07 2013 From: stefan at drees.name (Stefan Drees) Date: Wed, 05 Jun 2013 16:27:07 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: References: <51AED9EE.4060901@drees.name> <1370418503.24120.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: <51AF4ABB.7080209@drees.name> On 2013-06-05 15:55, Philipp A. wrote: > 2013/6/5 Vinay Sajip ... wrote > >> I was pointing out that specific issue in response to Philipp A.'s >> assertion that PyYAML, when parsing arbitrary JSON, "works flawlessly".
>> According to that issue, it doesn't - but whether it should or it >> shouldn't is a different question. > > it should if the YAML document starts with > > %YAML 1.2 > --- :) but that is hard for a valid JSON document ... I could not resist, sorry. If a receiving end digests a JSON message, do you mean it should switch to YAML v1.2 mode? I think yes. But, I think we should concentrate on correct YAML parsing itself and simply not care why and how a JSON producer might utilize a YAML consumer. Now that Kirill gave some input and posed some questions, we might be better off discussing how to proceed with YAML (as this seems a complicated enough beast). What do you think? Stefan. From brett at python.org Wed Jun 5 16:49:37 2013 From: brett at python.org (Brett Cannon) Date: Wed, 5 Jun 2013 10:49:37 -0400 Subject: [Python-ideas] Status of PyYAML - Re: Proposal for closing ticket pyyaml/issue/11 In-Reply-To: <51AF3A8E.1080105@resolvent.net> References: <51AEDDF4.2090009@drees.name> <51AF3A8E.1080105@resolvent.net> Message-ID: On Wed, Jun 5, 2013 at 9:18 AM, Kirill Simonov wrote: > Hi Stefan, > > I'm cross-posting this to the python-ideas list. > > Regarding the status of PyYAML -- after 10 releases over 6 years, > I consider it stable enough so that I don't need to pay much attention > to it anymore. I'm migrating the project, including the issues > database, from Trac to Bitbucket, but haven't finished it yet since > Bitbucket added issues import/export API only a month ago and I didn't > have a chance to look at it. > > On discussion on python-ideas, I looked through it, but I'm not sure how I > could contribute. Does it make sense to have YAML support in > stdlib? Definitely, the format is popular enough and has some unique > advantages over other options. > > But to the next question, how it should be done, I don't have a good > answer. Would you include PyYAML+libyaml directly into the stdlib?
> It's a big chunk of C, Python and Pyrex/Cython code and I doubt core > developers are willing to accept it. Besides, PyYAML API was designed > for Python 2.3 and may feel outdated for Python 3.5. So you need to > rewrite low-level parsing and emitting code to C+Python API and design > a new high-level extension API, but who's going to do all this work? > And that's the trick. =) If PyYAML were pulled into the stdlib, presumably it would be you, Kirill. And development would need to move to python-dev and not be externally maintained (although you can make external releases). We also don't use Cython in Python itself, so that code would not get pulled in. The basic requirements are outlined at http://docs.python.org/devguide/stdlibchanges.html -Brett > > Thanks, > Kirill > > > On 06/05/2013 01:43 AM, Stefan Drees wrote: > >> Hi Kirill, >> >> as we have interesting discussions w.r.t. YAML and the python stdlib, I >> thought you might be interested to know. >> >> Vinay Sajip was so nice as to point to an open ticket at [1] as a sample >> and asked (in other words) "if this project is alive and well" ;-) >> (he also noted that there """are 65 open issues of type "defect" >> relating to PyYAML, and 15 such relating to libyaml.""" c.f. [2]) >> >> As it seems to be unclear how and where issues are handled for PyYAML >> and libyaml (TRAC on pyyaml.org or bitbucket repo?) I spent some minutes >> to remove the common fairy tale that "arbitrary JSON (with unique keys) >> is automatically correctly parseable YAML" and wrote [3]. >> >> Maybe you can use some of the wording at [3], to close the ticket and >> also others in relation to this misconception, that YAML = JSON and >> unique keys. >> >> I personally would really enjoy it if you could participate in these >> discussions on python-ideas as it would certainly be beneficial to read >> your opinions and advice based on your experiences. >> >> Please consider it at least. It would be great!
>> >> References: >> >> [1]: https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded >> [2]: http://mail.python.org/pipermail/python-ideas/2013-June/021088.html >> [3]: http://mail.python.org/pipermail/python-ideas/2013-June/021095.html >> >> All the best, >> Stefan. >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Wed Jun 5 17:54:33 2013 From: dholth at gmail.com (Daniel Holth) Date: Wed, 5 Jun 2013 11:54:33 -0400 Subject: [Python-ideas] without(str(bytes)) Message-ID: Can I have a context manager that disables str(bytes) just for part of my code, the same as python -bb? with bb: serialize_something() From storchaka at gmail.com Wed Jun 5 18:33:49 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 05 Jun 2013 19:33:49 +0300 Subject: [Python-ideas] without(str(bytes)) In-Reply-To: References: Message-ID: 05.06.13 18:54, Daniel Holth написав(ла): > Can I have a context manager that disables str(bytes) just for part of > my code, the same as python -bb? > > with bb: > serialize_something() No, you can't. But it looks like an interesting idea. Perhaps it should be a function in the sys module. with sys.alter_flags(bytes_warning=2, dont_write_bytecode=1): ... From brett at python.org Wed Jun 5 19:10:52 2013 From: brett at python.org (Brett Cannon) Date: Wed, 5 Jun 2013 13:10:52 -0400 Subject: [Python-ideas] without(str(bytes)) In-Reply-To: References: Message-ID: On Wed, Jun 5, 2013 at 12:33 PM, Serhiy Storchaka wrote: > 05.06.13 18:54, Daniel Holth написав(ла): > > Can I have a context manager that disables str(bytes) just for part of >> my code, the same as python -bb? >> >> with bb: >> serialize_something() >> > > No, you can't. > > But it looks like an interesting idea.
Perhaps it should be a function in the > sys module. > > with sys.alter_flags(bytes_warning=2, dont_write_bytecode=1): > ... Future statements affect the parser, so by the time the code is executed there's nothing you could affect. If you really want this sort of thing you can use compile() and its flags argument. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Wed Jun 5 19:53:16 2013 From: dholth at gmail.com (Daniel Holth) Date: Wed, 5 Jun 2013 13:53:16 -0400 Subject: [Python-ideas] without(str(bytes)) In-Reply-To: References: Message-ID: On Wed, Jun 5, 2013 at 1:10 PM, Brett Cannon wrote: > > > > On Wed, Jun 5, 2013 at 12:33 PM, Serhiy Storchaka > wrote: >> >> 05.06.13 18:54, Daniel Holth написав(ла): >> >>> Can I have a context manager that disables str(bytes) just for part of >>> my code, the same as python -bb? >>> >>> with bb: >>> serialize_something() >> >> >> No, you can't. >> >> But it looks like an interesting idea. Perhaps it should be a function in the >> sys module. >> >> with sys.alter_flags(bytes_warning=2, dont_write_bytecode=1): >> ... > > > Future statements affect the parser, so by the time the code is executed > there's nothing you could affect. If you really want this sort of thing you > can use compile() and its flags argument. I don't know about the bytecodes, I was thinking more along the lines of turning Py_BytesWarningFlag into a threadlocal and having the new context manager modify that and the warning filter for PyExc_BytesWarning. From benjamin at python.org Wed Jun 5 21:05:38 2013 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 5 Jun 2013 19:05:38 +0000 (UTC) Subject: [Python-ideas] without(str(bytes)) References: Message-ID:
Most (except -O I think) flags don't affect bytecode, though. From brett at python.org Wed Jun 5 22:16:22 2013 From: brett at python.org (Brett Cannon) Date: Wed, 5 Jun 2013 16:16:22 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: Message-ID: Here is an HTML version of the CoC: http://python.org/psf/codeofconduct/ On Tue, Jun 4, 2013 at 2:34 PM, Brett Cannon wrote: > Titus and I have discussed it and we have decided to apply the PSF's Code > of Conduct (http://hg.python.org/coc) to this mailing list. In a nutshell > everyone should be open, respectful, and considerate to each other, i.e. > don't be rude. > > If you think someone is not following the CoC, either publicly let them > know you think they are not (obviously do this politely; publicly notifying > someone is to help others on the list understand what may be considered > inappropriate behaviour, not to chastise them in front of a crowd) or email > python-ideas-owner@ privately with a link to the email in question from > the archives and we will handle it w/o disclosing who brought forward the > complaint. > > Anyone found repeatedly not following the CoC will be banned from the > list. You can always ask to be allowed back on based on the discretion of > the list owners. > > Because this is a new policy, no one is starting with any strikes against > them. Everyone is assumed to have been polite and on good terms up to this > point. > > In the 6.5 year history of this list there is possibly only one instance > where this would have really come into play in removing someone from this > list so this isn't expected to be used very often. But we felt we wanted > something in place to remind folks that we expect all subscribers to be > respectful of one another and to have something to use as a guideline for > removals if it came to that. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From flying-sheep at web.de Wed Jun 5 22:26:46 2013 From: flying-sheep at web.de (Philipp A.) Date: Wed, 5 Jun 2013 22:26:46 +0200 Subject: [Python-ideas] PEP 426, YAML in the stdlib and implementation discovery In-Reply-To: <51AF4ABB.7080209@drees.name> References: <51AED9EE.4060901@drees.name> <1370418503.24120.YahooMailNeo@web184703.mail.ne1.yahoo.com> <51AF4ABB.7080209@drees.name> Message-ID: 2013/6/5 Stefan Drees > Now that Kirill gave some input and posed some questions, we might be > better discussing how to proceed with YAML (as this seems a complicated > enough beast). > > What do you think? > yes. so there are no objections to those tentative API cornerstones? 1. "load" is PyYAML's safe_load 2. PyYAML's "load" will require specifying a kwarg to "load" or gets a longer toplevel name 3. there is a "load_iter" which yields documents from a stream - also possibly a "load_all", but maybe that's just "list(load_iter(...))" 4. similarly: we should supply a dump function that interacts well with streams what i'm unsure of: should "load" and "dump" determine if the input arg is a string or a file object, or should we supply "loads" and "dumps"? i prefer the former, since i can't think of a case where there's a doubt which is which, but obviously someone decided to use the "...s" versions for the stdlib json module, so maybe there's a reason...? then for implementation, if possible, we should use PyYAML, and modify it to our needs (e.g. fully implement YAML 1.2 if that isn't too much, fix bugs, and change the API.) also we have to discuss how and if we want to include the C implementation. PyYAML's mechanism is to use "Loaders" explicitly, but people are surprised and dissatisfied with that. if we include a C implementation, should we implicitly use it when the user imports yaml? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at pearwood.info Thu Jun 6 00:44:06 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 06 Jun 2013 08:44:06 +1000 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> Message-ID: <51AFBF36.6030406@pearwood.info> On 05/06/13 09:31, David Mertz wrote: > My understanding is that Brett and Titus are the list maintainers. Thanks David, but I misworded the question. Rather than asking *who*, I should have asked why Brett and Titus *alone* are making this decision. Please someone correct me if I am misinformed, but I don't believe that this is a private mailing list "owned" by Brett and Titus, emphasis on the "private" part. As maintainers, they maintain the list on behalf of the community, they are not owners who get to unilaterally set policy for it. But even if I am wrong, and Brett and Titus are owners in the sense that they, and they alone, get to decide what happens with this list, it is hardly open, respectful or considerate to impose this sort of sort of policy change on the list without giving members the opportunity to express their thoughts on the matter first. > On Tue, Jun 4, 2013 at 4:20 PM, Steven D'Aprano wrote: > >> On 05/06/13 04:34, Brett Cannon wrote: >> >>> Titus and I have discussed it and we have decided to apply the PSF's Code >>> of Conduct (http://hg.python.org/coc) to this mailing list. In a nutshell >>> everyone should be open, respectful, and considerate to each other, i.e. >>> don't be rude. >>> >> >> Excuse my ignorance, but who are you and Titus to make that decision? 
>> >> >> -- >> Steven -- Steven From ethan at stoneleaf.us Thu Jun 6 00:49:52 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 05 Jun 2013 15:49:52 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AFBF36.6030406@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: <51AFC090.2060604@stoneleaf.us> On 06/05/2013 03:44 PM, Steven D'Aprano wrote: > > But even if I am wrong, and Brett and Titus are owners in the sense that they, and they alone, get to decide what > happens with this list, it is hardly open, respectful or considerate to impose this sort of sort of policy change on the > list without giving members the opportunity to express their thoughts on the matter first. This list is not a democracy. If you (generic you) don't like the changes that are made, vote with your feet (fingers?) and create your own mailing list. -- ~Ethan~ From jbvsmo at gmail.com Thu Jun 6 01:47:20 2013 From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=) Date: Wed, 5 Jun 2013 20:47:20 -0300 Subject: [Python-ideas] PEP 443 - multiple types registered Message-ID: I was reading the recently accepted PEP 443 and this is one of the examples: @fun.register(float) @fun.register(Decimal) def fun_num(arg, verbose=False): if verbose: print("Half of your number:", end=" ") print(arg / 2) Wouldn't a tuple of types be more suitable? I mean, programmatically you will not apply the decorators from a list, but define the function and then write a loop: for t in some_types: fun.register(t, fun_num) which is less pleasant on the eyes. I would rather do: @fun.register((float, Decimal)) def fun_num(arg, verbose=False): ... This makes a lot of sense when you think that Python functions and constructs dealing with types usually also deal with tuples of types. 
E.g.: isinstance(foo, (Bar, Baz)) except (Foo, Bar) as baz: Last thought: People are used to decorators returning a decorated function. This is not the case here... For some, it may be confusing at first. João Bernardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From lukasz at langa.pl Thu Jun 6 02:02:09 2013 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Thu, 6 Jun 2013 02:02:09 +0200 Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: References: Message-ID: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> On 6 cze 2013, at 01:47, João Bernardo wrote: > I was reading the recently accepted PEP 443 and this is one of the examples: > > @fun.register(float) > @fun.register(Decimal) > def fun_num(arg, verbose=False): > if verbose: > print("Half of your number:", end=" ") > print(arg / 2) > > Wouldn't a tuple of types be more suitable? There is barely any gain from supporting a tuple syntax, whereas it closes up ways of extending the API in the future or introducing new ways of dispatch with an analogous API. We left it out consciously. > Last thought: > People are used to decorators returning a decorated function. This is not the case here... For some, it may be confusing at first. fun.register() does return the decorated function. This is why the example above works. @singledispatch returns a compatible wrapper, which is analogous to what @lru_cache and countless other decorators do. Is there something I am missing here? Thanks for your feedback! -- Best regards, Łukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jbvsmo at gmail.com Thu Jun 6 02:05:51 2013 From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=) Date: Wed, 5 Jun 2013 21:05:51 -0300 Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> References: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> Message-ID: > fun.register() does return the decorated function. This is why the example above works. @singledispatch returns a compatible wrapper, which is analogous to what @lru_cache and countless other decorators do. Is there something I am missing > here? > Thanks for your feedback! Copied from the PEP text: The register() attribute returns the *undecorated *function. This enables decorator stacking, pickling, as well as creating unit tests for each variant independently: It's not big deal, just not common -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Thu Jun 6 02:19:42 2013 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 5 Jun 2013 17:19:42 -0700 Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: References: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> Message-ID: On Wed, Jun 5, 2013 at 5:05 PM, Jo?o Bernardo wrote: > Copied from the PEP text: > > The register() attribute returns the *undecorated *function. This > enables decorator stacking, pickling, as well as creating unit tests for > each variant independently: > The decorator doesn't change the function so there is no decorated version of the function to return. It just registers it in a dispatch table that the generic function uses. I think this is a fairly common pattern. For example, see flask.route. --- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... 
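The registration-table pattern described in this thread can be seen end to end with functools.singledispatch itself (new in Python 3.4, per PEP 443). This sketch mirrors the PEP's example but returns strings instead of printing, so the dispatch result is easy to verify; stacking works precisely because register() hands back the undecorated function, so the second decorator receives the same fun_num it was given:

```python
from decimal import Decimal
from functools import singledispatch

@singledispatch
def fun(arg):
    # Fallback implementation for unregistered types.
    return "base: {!r}".format(arg)

# Stacked registration, as in the PEP 443 example: register() records
# each type in the dispatch table and returns the function unchanged.
@fun.register(float)
@fun.register(Decimal)
def fun_num(arg):
    return "half: {}".format(arg / 2)
```

After this, fun(1.0) and fun(Decimal(3)) both dispatch to fun_num, fun("x") falls back to the base implementation, and fun_num itself remains a plain function that can be called or unit-tested directly.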
URL: From tjreedy at udel.edu Thu Jun 6 02:20:50 2013 From: tjreedy at udel.edu (Terry Jan Reedy) Date: Wed, 05 Jun 2013 20:20:50 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AFBF36.6030406@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: On 6/5/2013 6:44 PM, Steven D'Aprano wrote: > On 05/06/13 09:31, David Mertz wrote: >> My understanding is that Brett and Titus are the list maintainers. > > Thanks David, but I misworded the question. Rather than asking *who*, I > should have asked why Brett and Titus *alone* are making this decision. > > Please someone correct me if I am misinformed, but I don't believe that > this is a private mailing list "owned" by Brett and Titus, emphasis on > the "private" part. As maintainers, they maintain the list on behalf of > the community, they are not owners who get to unilaterally set policy > for it. > > But even if I am wrong, and Brett and Titus are owners in the sense that > they, and they alone, get to decide what happens with this list, it is > hardly open, respectful or considerate to impose this sort of sort of > policy change on the list without giving members the opportunity to > express their thoughts on the matter first. I believe the PSF CoC was both ratified by a vote of members and intended to apply to all PSF activities, including PSF mailing lists based on PSF-paid servers. I understand Brett as saying that he and Titus intend to actively enforce it for this list. 
From dreamingforward at gmail.com Thu Jun 6 02:26:01 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Wed, 5 Jun 2013 17:26:01 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AFBF36.6030406@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: On 6/5/13, Steven D'Aprano wrote: > On 05/06/13 09:31, David Mertz wrote: >> My understanding is that Brett and Titus are the list maintainers. > > Thanks David, but I misworded the question. Rather than asking *who*, I > should have asked why Brett and Titus *alone* are making this decision. > > Please someone correct me if I am misinformed, but I don't believe that this > is a private mailing list "owned" by Brett and Titus, emphasis on the > "private" part. As maintainers, they maintain the list on behalf of the > community, they are not owners who get to unilaterally set policy for it. That is not the community standard on the internet. The standard is that people who create a list are the de facto policy setters, mostly because anyone else can create their own list if they don't like the policies. Unless there's a severe problem with the policy, it would be considered rude to object to the volunteers who maintain the list (and tacit, if not actual, "owners"), since you are free to start your own. > But even if I am wrong, and Brett and Titus are owners in the sense that > they, and they alone, get to decide what happens with this list, it is > hardly open, respectful or considerate to impose this sort of sort of policy > change on the list without giving members the opportunity to express their > thoughts on the matter first. If you want this kind of governance here for the 'net, consider instead joining the US government where these political questions have been being hashed out for about 400 years. 
-- MarkJ Tacoma, Washington From jbvsmo at gmail.com Thu Jun 6 02:28:26 2013 From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=) Date: Wed, 5 Jun 2013 21:28:26 -0300 Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: References: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> Message-ID: > The decorator doesn't change the function so there is no decorated version of the function to return. It just registers it in a dispatch table that the generic function uses. I think this is a fairly common pattern. For example, see flask.route. I've written all kinds of decorators, including some that just store some information and return the original function. This is not very important... The fact is that I still feel that register could allow a tuple of types without compromising the API. Maybe a "register_many" could be added then... -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Thu Jun 6 02:24:51 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Wed, 5 Jun 2013 20:24:51 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: > This list is not a democracy. If you (generic you) don't like the changes that are made, vote with your feet (fingers?) and create your own mailing list. I agree with the fact that the list is not a democracy, and technically the list maintainers can do whatever they want, and I even think this decision is the right thing to do. Regardless, I believe that to some extent this list (like many similar lists) belongs to the community, and that generally a decision like this should be put out to air before becoming final. Saying "yeah GTFO if you don't like it" is perfectly valid, but isn't very conducive to building a friendly, collaborative and welcoming environment. 
Even if in the end the decision is the same, I think (some) people will generally be much happier for the feeling of involvement, and feeling that their opinion is valued. I support Steven because I believe that we should discuss issues we disagree with, rather than just immediately removing ourselves from the community at the slightest disagreement. It may not make any difference from an optimization/correctness point of view, but people generally feel happier when they get a heads up and they can talk about something before it actually happens. -Haoyi On Wed, Jun 5, 2013 at 8:20 PM, Terry Jan Reedy wrote: > On 6/5/2013 6:44 PM, Steven D'Aprano wrote: >> On 05/06/13 09:31, David Mertz wrote: >> >>> My understanding is that Brett and Titus are the list maintainers. >>> >> >> Thanks David, but I misworded the question. Rather than asking *who*, I >> should have asked why Brett and Titus *alone* are making this decision. >> >> Please someone correct me if I am misinformed, but I don't believe that >> this is a private mailing list "owned" by Brett and Titus, emphasis on >> the "private" part. As maintainers, they maintain the list on behalf of >> the community, they are not owners who get to unilaterally set policy >> for it. >> >> But even if I am wrong, and Brett and Titus are owners in the sense that >> they, and they alone, get to decide what happens with this list, it is >> hardly open, respectful or considerate to impose this sort of sort of >> policy change on the list without giving members the opportunity to >> express their thoughts on the matter first. >> > > I believe the PSF CoC was both ratified by a vote of members and intended > to apply to all PSF activities, including PSF mailing lists based on > PSF-paid servers. I understand Brett as saying that he and Titus intend to > actively enforce it for this list. 
> > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jun 6 02:10:53 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 05 Jun 2013 17:10:53 -0700 Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: References: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> Message-ID: <51AFD38D.9090103@stoneleaf.us> On 06/05/2013 05:05 PM, Jo?o Bernardo wrote: >> fun.register() does return the decorated function. This is why the example above works. @singledispatch returns a compatible wrapper, which is analogous to what @lru_cache and countless other decorators do. Is there something I am missing >> here? > > Copied from the PEP text: > > The register() attribute returns the *undecorated *function. This enables decorator stacking, pickling, as well as > creating unit tests for each variant independently: The function is undecorated because it is simply a registration mechanism. The function does not need to be altered for this to work. -- ~Ethan~ From haoyi.sg at gmail.com Thu Jun 6 02:42:22 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Wed, 5 Jun 2013 20:42:22 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: > That is not the community standard on the internet. The standard is that people who create a list are the de facto policy setters, mostly because anyone else can create their own list if they don't like the policies. That hasn't been my (admittedly limited) experience. In many of the forums and lists I've been on, this kind of change gets a "Hey guys, we're thinking of ... so speak up now if you have any objections." before it happens. 
Nobody ever objects, but I really think this sort of thing goes a long way to maintaining a happy community. Emotions are funny like that. Nobody ever says "GTFO if you have any objections", and for good reason. On Wed, Jun 5, 2013 at 8:26 PM, Mark Janssen wrote: > On 6/5/13, Steven D'Aprano wrote: > > On 05/06/13 09:31, David Mertz wrote: > >> My understanding is that Brett and Titus are the list maintainers. > > > > Thanks David, but I misworded the question. Rather than asking *who*, I > > should have asked why Brett and Titus *alone* are making this decision. > > > > Please someone correct me if I am misinformed, but I don't believe that > this > > is a private mailing list "owned" by Brett and Titus, emphasis on the > > "private" part. As maintainers, they maintain the list on behalf of the > > community, they are not owners who get to unilaterally set policy for it. > > That is not the community standard on the internet. The standard is > that people who create a list are the de facto policy setters, mostly > because anyone else can create their own list if they don't like the > policies. > > Unless there's a severe problem with the policy, it would be > considered rude to object to the volunteers who maintain the list (and > tacit, if not actual, "owners"), since you are free to start your own. > > > But even if I am wrong, and Brett and Titus are owners in the sense that > > they, and they alone, get to decide what happens with this list, it is > > hardly open, respectful or considerate to impose this sort of sort of > policy > > change on the list without giving members the opportunity to express > their > > thoughts on the matter first. > > If you want this kind of governance here for the 'net, consider > instead joining the US government where these political questions have > been being hashed out for about 400 years. 
> -- > MarkJ > Tacoma, Washington > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Thu Jun 6 02:54:14 2013 From: dholth at gmail.com (Daniel Holth) Date: Wed, 5 Jun 2013 20:54:14 -0400 Subject: [Python-ideas] without(str(bytes)) In-Reply-To: References: Message-ID: On Wed, Jun 5, 2013 at 8:51 PM, Andrew Barnert wrote: > On Jun 5, 2013, at 10:53, Daniel Holth wrote: > >> On Wed, Jun 5, 2013 at 1:10 PM, Brett Cannon wrote: >>> >>> >>> >>> On Wed, Jun 5, 2013 at 12:33 PM, Serhiy Storchaka >>> wrote: >>>> >>>> 05.06.13 18:54, Daniel Holth ???????(??): >>>> >>>>> Can I have a context manager that disables str(bytes) just for part of >>>>> my code, the same as python -bb? >>>>> >>>>> with bb: >>>>> serialize_something() >>>> >>>> >>>> No, you can't. >>>> >>>> But it looks as interesting idea. Perhaps it should be a function in the >>>> sys module. >>>> >>>> with sys.alter_flags(bytes_warning=2, dont_write_bytecode=1): >>>> ... >>> >>> >>> Future statements affect the parser, so by the time the code is executed >>> there's nothing you could affect. If you really want this sort of thing you >>> can use compile() and it's flag argument. >> >> I don't know about the bytecodes, I was thinking more along the lines >> of turning Py_BytesWarningFlag into a threadlocal and having the new >> context manager modify that and the warning filter for >> PyExc_BytesWarning. > > The problem is that the warning happens at compile time, but the context manager runs at runtime, at which point it's too late to do any good. You'd need some kind of compile-time context manager (and compile-time-with statement) to make that work. > > Or you can just explicitly call the compiler at runtime, which is what the others were suggesting. No, this one is totally a runtime warning. 
It's very clear where to add in the CPython source code. I don't think I can alter the warnings state per-thread sanely though, but I think altering whether this particular warning is thrown at all may be enough. From abarnert at yahoo.com Thu Jun 6 02:51:24 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 5 Jun 2013 17:51:24 -0700 Subject: [Python-ideas] without(str(bytes)) In-Reply-To: References: Message-ID: On Jun 5, 2013, at 10:53, Daniel Holth wrote: > On Wed, Jun 5, 2013 at 1:10 PM, Brett Cannon wrote: >> >> >> >> On Wed, Jun 5, 2013 at 12:33 PM, Serhiy Storchaka >> wrote: >>> >>> 05.06.13 18:54, Daniel Holth ???????(??): >>> >>>> Can I have a context manager that disables str(bytes) just for part of >>>> my code, the same as python -bb? >>>> >>>> with bb: >>>> serialize_something() >>> >>> >>> No, you can't. >>> >>> But it looks as interesting idea. Perhaps it should be a function in the >>> sys module. >>> >>> with sys.alter_flags(bytes_warning=2, dont_write_bytecode=1): >>> ... >> >> >> Future statements affect the parser, so by the time the code is executed >> there's nothing you could affect. If you really want this sort of thing you >> can use compile() and it's flag argument. > > I don't know about the bytecodes, I was thinking more along the lines > of turning Py_BytesWarningFlag into a threadlocal and having the new > context manager modify that and the warning filter for > PyExc_BytesWarning. The problem is that the warning happens at compile time, but the context manager runs at runtime, at which point it's too late to do any good. You'd need some kind of compile-time context manager (and compile-time-with statement) to make that work. Or you can just explicitly call the compiler at runtime, which is what the others were suggesting. 
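For completeness, here is what is already possible today with only the warnings machinery, under the caveats raised in this thread: CPython emits BytesWarning for str(bytes) only when the interpreter was started with -b/-bb, and warning filters are process-wide rather than per-thread. So this sketch of a context manager only changes how such a warning is handled once emitted (escalating it to an exception, as -bb does globally); it cannot force the warning into existence:

```python
import warnings
from contextlib import contextmanager

@contextmanager
def error_on_bytes_warning():
    # Escalate BytesWarning to an exception inside the block, then
    # restore the previous filters on exit. catch_warnings() saves and
    # restores the global filter list, so this is not thread-safe in
    # the sense Daniel is after -- the filter change is process-wide
    # for the duration of the block.
    with warnings.catch_warnings():
        warnings.simplefilter("error", BytesWarning)
        yield
```

Inside the block, any BytesWarning that the interpreter does emit (e.g. from str(b'x') under python -b) is raised as an exception instead of printed.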
From python at mrabarnett.plus.com Thu Jun 6 03:09:16 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Jun 2013 02:09:16 +0100 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: <51AFE13C.8030201@mrabarnett.plus.com> On 06/06/2013 01:26, Mark Janssen wrote: > On 6/5/13, Steven D'Aprano wrote: >> On 05/06/13 09:31, David Mertz wrote: >>> My understanding is that Brett and Titus are the list maintainers. >> >> Thanks David, but I misworded the question. Rather than asking *who*, I >> should have asked why Brett and Titus *alone* are making this decision. >> >> Please someone correct me if I am misinformed, but I don't believe that this >> is a private mailing list "owned" by Brett and Titus, emphasis on the >> "private" part. As maintainers, they maintain the list on behalf of the >> community, they are not owners who get to unilaterally set policy for it. > > That is not the community standard on the internet. The standard is > that people who create a list are the de facto policy setters, mostly > because anyone else can create their own list if they don't like the > policies. > > Unless there's a severe problem with the policy, it would be > considered rude to object to the volunteers who maintain the list (and > tacit, if not actual, "owners"), since you are free to start your own. > >> But even if I am wrong, and Brett and Titus are owners in the sense that >> they, and they alone, get to decide what happens with this list, it is >> hardly open, respectful or considerate to impose this sort of sort of policy >> change on the list without giving members the opportunity to express their >> thoughts on the matter first. > > If you want this kind of governance here for the 'net, consider > instead joining the US government where these political questions have > been being hashed out for about 400 years. > "About 400 years"? 
I didn't know that the US government had been around for that amount of time! :-) From mertz at gnosis.cx Thu Jun 6 03:34:35 2013 From: mertz at gnosis.cx (David Mertz) Date: Wed, 5 Jun 2013 18:34:35 -0700 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AFBF36.6030406@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: As I said, Brett and Titus should be the ones making this decision because they are the list maintainers. If you wish to maintain a list with different rules, that is your prerogative. I think they do a good job with this one (although I haven't subscribed for very long), and it is their decision alone. They should unilaterally set policy, but especially since this is so obviously in keeping with voted policy of the PSF as a whole (which *is* democratic, in a certain way). I dislike the facile notion of "democracy" that some people advocate about things like this. Democracy has little place in FLOSS. There are simply volunteers who choose to put in whatever specific efforts they can and wish to. Just as we don't "democratically vote a patch" to the Python language (but rather defer, ultimately, to the BDFL), a mailing list is a thing that specific people have taken responsibility for. On Wed, Jun 5, 2013 at 3:44 PM, Steven D'Aprano wrote: > On 05/06/13 09:31, David Mertz wrote: > >> My understanding is that Brett and Titus are the list maintainers. >> > > Thanks David, but I misworded the question. Rather than asking *who*, I > should have asked why Brett and Titus *alone* are making this decision. > > Please someone correct me if I am misinformed, but I don't believe that > this is a private mailing list "owned" by Brett and Titus, emphasis on the > "private" part. As maintainers, they maintain the list on behalf of the > community, they are not owners who get to unilaterally set policy for it. 
> > But even if I am wrong, and Brett and Titus are owners in the sense that > they, and they alone, get to decide what happens with this list, it is > hardly open, respectful or considerate to impose this sort of sort of > policy change on the list without giving members the opportunity to express > their thoughts on the matter first. > > > > > On Tue, Jun 4, 2013 at 4:20 PM, Steven D'Aprano >> wrote: >> >> On 05/06/13 04:34, Brett Cannon wrote: >>> >>> Titus and I have discussed it and we have decided to apply the PSF's >>>> Code >>>> of Conduct (http://hg.python.org/coc) to this mailing list. In a >>>> nutshell >>>> everyone should be open, respectful, and considerate to each other, i.e. >>>> don't be rude. >>>> >>>> >>> Excuse my ignorance, but who are you and Titus to make that decision? >>> >>> >>> -- >>> Steven >>> >> > > > > -- > Steven > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Jun 6 05:03:31 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 06 Jun 2013 13:03:31 +1000 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: <51AFFC03.7070605@pearwood.info> On 06/06/13 10:20, Terry Jan Reedy wrote: > I believe the PSF CoC was both ratified by a vote of members and intended to apply to all PSF activities, This is not the PSF, this is an open list with many members who are not PSF members. 
-- Steven From steve at pearwood.info Thu Jun 6 05:38:29 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 06 Jun 2013 13:38:29 +1000 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AFC090.2060604@stoneleaf.us> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> <51AFC090.2060604@stoneleaf.us> Message-ID: <51B00435.9070800@pearwood.info> On 06/06/13 08:49, Ethan Furman wrote: > On 06/05/2013 03:44 PM, Steven D'Aprano wrote: >> >> But even if I am wrong, and Brett and Titus are owners in the sense that they, and they alone, get to decide what >> happens with this list, it is hardly open, respectful or considerate to impose this sort of sort of policy change on the >> list without giving members the opportunity to express their thoughts on the matter first. > > This list is not a democracy. Please read my comment again. Even if Brett and Titus are dictators entitled to do whatever they like with this mailing list, shut it down or put it behind a paywall or censor emails or evict people on a whim or retroactively change the terms and conditions of being a member, doing so *is not open, respectful or considerate* -- the three things which the CoC is supposed to enforce. > If you (generic you) don't like the changes that are made, vote with your feet (fingers?) and create your own mailing list. Is this an example of the openness to all that the CoC is supposed to stand for? "Agree with us, or GTFO". -- Steven From ncoghlan at gmail.com Thu Jun 6 05:46:19 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Jun 2013 13:46:19 +1000 Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: References: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> Message-ID: On 6 Jun 2013 10:29, "Jo?o Bernardo" wrote: > > > The decorator doesn't change the function so there is no decorated version of the function to return. 
It just registers it in a dispatch table that the generic function uses. I think this is a fairly common pattern. For example, see flask.route. > > I've written all kinds of decorators, including some that just store some information and return the original function. This is not very important... > > The fact is that I still feel that register could allow a tuple of types without compromising the API. It's ambiguous relative to multiple dispatch APIs. >Maybe a "register_many" could be added then... Anyone that is regularly registering multiple types for the same implementations would be well advised to start defining some appropriate ABCs and register those instead. Cheers, Nick. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuwei23 at gmail.com Thu Jun 6 03:03:35 2013 From: wuwei23 at gmail.com (alex23) Date: Wed, 5 Jun 2013 18:03:35 -0700 (PDT) Subject: [Python-ideas] PEP 443 - multiple types registered In-Reply-To: References: <6F6BC630-B596-4633-90CF-855A7701B996@langa.pl> Message-ID: <3d116f97-c06a-4547-b7dc-ea3133c6a718@v5g2000pbv.googlegroups.com> On Jun 6, 10:28 am, João Bernardo wrote: > The fact is that I still feel that register could allow a tuple of types > without compromising the API. Maybe a "register_many" could be added then... Isn't it simple enough to do yourself? def register_many(types): def _register_many(fn): for t in types: fun.register(t)(fn) return fn return _register_many From ctb at msu.edu Thu Jun 6 05:27:21 2013 From: ctb at msu.edu (C. 
Titus Brown) Date: Wed, 5 Jun 2013 23:27:21 -0400 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51AFFC03.7070605@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> <51AFFC03.7070605@pearwood.info> Message-ID: Hi Steven, the list is nonetheless supported by psf resources and is an official part of the python developer workflow. While Brett and I could perhaps have been less dictatorial in our approach, I continue to fail to see why basic rules of civility and positivity represent any challenge for any member of the Python community - of which the members of this list are. If you or any others have concerns about the new list policy, I encourage you to take it up with the list hosts or the psf board itself. I'm not sure what else to say other than that I view continued discussion of list policy to be outside the scope of this list and hence something to be truncated. Feel free to contact Brett and myself off list if you want us to identify specific ways to pursue altering the list policy. Best, Titus --- C. Titus Brown, ctb at msu.edu On Jun 5, 2013, at 23:03, Steven D'Aprano wrote: > On 06/06/13 10:20, Terry Jan Reedy wrote: > >> I believe the PSF CoC was both ratified by a vote of members and intended to apply to all PSF activities, > > This is not the PSF, this is an open list with many members who are not PSF members. 
> > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ncoghlan at gmail.com Thu Jun 6 11:16:28 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Jun 2013 19:16:28 +1000 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51B00435.9070800@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> <51AFC090.2060604@stoneleaf.us> <51B00435.9070800@pearwood.info> Message-ID: On 6 June 2013 13:38, Steven D'Aprano wrote: > On 06/06/13 08:49, Ethan Furman wrote: > >> On 06/05/2013 03:44 PM, Steven D'Aprano wrote: >> >>> >>> But even if I am wrong, and Brett and Titus are owners in the sense that >>> they, and they alone, get to decide what >>> happens with this list, it is hardly open, respectful or considerate to >>> impose this sort of sort of policy change on the >>> list without giving members the opportunity to express their thoughts on >>> the matter first. >>> >> >> This list is not a democracy. >> > > Please read my comment again. Even if Brett and Titus are dictators > entitled to do whatever they like with this mailing list, shut it down or > put it behind a paywall or censor emails or evict people on a whim or > retroactively change the terms and conditions of being a member, doing so > *is not open, respectful or considerate* -- the three things which the CoC > is supposed to enforce. Brett & Titus have always had the power to ban anyone they wanted on any grounds they chose. While it's been a close call on at least one occasion, they have to date chosen not to ban anyone. However, choosing not to exercise a power isn't the same thing as not having it. 
All this announcement means is that rather than leaving the expected standards of behaviour completely unwritten (which is the status quo), they have chosen to use the PSF's template CoC as an accurate description of the *way the list already operates*. To put it another way, the expected standards of behaviour on this list *haven't changed*. The only thing that has changed is that they're actually written down somewhere, rather than people having to pick them up by lurking on the list for a while. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jun 6 11:20:21 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Jun 2013 19:20:21 +1000 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> Message-ID: On 6 June 2013 10:20, Terry Jan Reedy wrote: > On 6/5/2013 6:44 PM, Steven D'Aprano wrote: > >> On 05/06/13 09:31, David Mertz wrote: >> >>> My understanding is that Brett and Titus are the list maintainers. >>> >> >> Thanks David, but I misworded the question. Rather than asking *who*, I >> should have asked why Brett and Titus *alone* are making this decision. >> >> Please someone correct me if I am misinformed, but I don't believe that >> this is a private mailing list "owned" by Brett and Titus, emphasis on >> the "private" part. As maintainers, they maintain the list on behalf of >> the community, they are not owners who get to unilaterally set policy >> for it. >> >> But even if I am wrong, and Brett and Titus are owners in the sense that >> they, and they alone, get to decide what happens with this list, it is >> hardly open, respectful or considerate to impose this sort of sort of >> policy change on the list without giving members the opportunity to >> express their thoughts on the matter first. 
>> > > I believe the PSF CoC was both ratified by a vote of members and intended > to apply to all PSF activities, including PSF mailing lists based on > PSF-paid servers. Not quite - we voted to publish a template CoC that the community (including list administrators) may choose to adopt, and *separately* voted to adopt a derivative of that template for the PSF itself. Whether any given list (or other group within the community) adopts the CoC is up to the organisers of that group. I understand Brett as saying that he and Titus intend to actively enforce > it for this list. Given the list generally follows that CoC *without* active enforcement, nothing is likely to have to change in that regard. The only difference is that the expected standards of behaviours are documented rather than needing to be picked up through observation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ronaldoussoren at mac.com Thu Jun 6 11:48:31 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 6 Jun 2013 11:48:31 +0200 Subject: [Python-ideas] without(str(bytes)) In-Reply-To: References: Message-ID: <22E4C211-5D79-4D31-A98C-617929EFB873@mac.com> On 6 Jun, 2013, at 2:51, Andrew Barnert wrote: > On Jun 5, 2013, at 10:53, Daniel Holth wrote: > >> On Wed, Jun 5, 2013 at 1:10 PM, Brett Cannon wrote: >>> >>> >>> >>> On Wed, Jun 5, 2013 at 12:33 PM, Serhiy Storchaka >>> wrote: >>>> >>>> 05.06.13 18:54, Daniel Holth wrote: >>>> >>>>> Can I have a context manager that disables str(bytes) just for part of >>>>> my code, the same as python -bb? >>>>> >>>>> with bb: >>>>> serialize_something() >>>> >>>> >>>> No, you can't. >>>> >>>> But it looks as interesting idea. Perhaps it should be a function in the >>>> sys module. >>>> >>>> with sys.alter_flags(bytes_warning=2, dont_write_bytecode=1): >>>> ...
>>> >>> Future statements affect the parser, so by the time the code is executed >>> there's nothing you could affect. If you really want this sort of thing you >>> can use compile() and it's flag argument. >> >> I don't know about the bytecodes, I was thinking more along the lines >> of turning Py_BytesWarningFlag into a threadlocal and having the new >> context manager modify that and the warning filter for >> PyExc_BytesWarning. > > The problem is that the warning happens at compile time, but the context manager runs at runtime, at which point it's too late to do any good. You'd need some kind of compile-time context manager (and compile-time-with statement) to make that work. I might be missing something, but what Daniel proposes is a change in runtime behavior: the '-bb' flag affects the behavior of bytes.__str__ and not the generated bytecode (and the warning is emitted when bytes.__str__ is actually called, not during bytecode compilation). The context manager would set and reset 'Py_BytesWarningFlag'. Ronald From stephen at xemacs.org Thu Jun 6 12:02:39 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 06 Jun 2013 19:02:39 +0900 Subject: [Python-ideas] applying the PSF Code of Conduct to this mailing list In-Reply-To: <51B00435.9070800@pearwood.info> References: <51AE7651.9080507@pearwood.info> <51AFBF36.6030406@pearwood.info> <51AFC090.2060604@stoneleaf.us> <51B00435.9070800@pearwood.info> Message-ID: <87ppvza75s.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Even if Brett and Titus are dictators entitled to do whatever they > like with this mailing list, shut it down or put it behind a > paywall or censor emails or evict people on a whim or retroactively > change the terms and conditions of being a member, doing so *is not > open, respectful or considerate* -- the three things which the CoC > is supposed to enforce. And they've not done, and will not do, any of those nasty things listed above.
Several years of history demonstrate that they are *benevolent* dictators who do a good job of *representing* community values, rather than trying to *set* values. BDship is a governance structure that has proved its worth in Python in several ways. IMHO, this bugfix ("documenting the undocumented") proves it again. And we each get a pony: if any one violated any of the rules as unwritten, he is forgiven and starts fresh now that they are written. From ram.rachum at gmail.com Thu Jun 6 16:46:39 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Thu, 6 Jun 2013 07:46:39 -0700 (PDT) Subject: [Python-ideas] Allow using ** twice Message-ID: I'd like to be able to use ** twice (using 2 different dicts) when calling a function, and have them both feed arguments into the function. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram.rachum at gmail.com Thu Jun 6 16:47:29 2013 From: ram.rachum at gmail.com (Ram Rachum) Date: Thu, 6 Jun 2013 07:47:29 -0700 (PDT) Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <97df847d-b231-4c72-8f1d-265b776ba7d4@googlegroups.com> I should have said "multiple times" instead of twice, as there's no reason to limit to just 2. I'm suggesting an unlimited number of ** usages in any single call. On Thursday, June 6, 2013 5:46:39 PM UTC+3, Ram Rachum wrote: > > I'd like to be able to use ** twice (using 2 different dicts) when calling > a function, and have them both feed arguments into the function. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus at unterwaditzer.net Thu Jun 6 16:54:12 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Thu, 06 Jun 2013 16:54:12 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: This indicates for me that it generally should be possible to generate the union of two dicts with sth like {} + {}. 
-- Markus (from phone) Ram Rachum wrote: >I'd like to be able to use ** twice (using 2 different dicts) when >calling >a function, and have them both feed arguments into the function. > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas From christian at python.org Thu Jun 6 16:58:36 2013 From: christian at python.org (Christian Heimes) Date: Thu, 06 Jun 2013 16:58:36 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: Am 06.06.2013 16:46, schrieb Ram Rachum: > I'd like to be able to use ** twice (using 2 different dicts) when > calling a function, and have them both feed arguments into the function. -1. It's trivial to do so with a few lines of code: kwargs = {} kwargs.update(d1) kwargs.update(d2) kwargs.update(d3) ... func(**kwargs) Christian From oscar.j.benjamin at gmail.com Thu Jun 6 16:58:36 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 6 Jun 2013 15:58:36 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <97df847d-b231-4c72-8f1d-265b776ba7d4@googlegroups.com> References: <97df847d-b231-4c72-8f1d-265b776ba7d4@googlegroups.com> Message-ID: On 6 June 2013 15:47, Ram Rachum wrote: > I should have said "multiple times" instead of twice, as there's no reason > to limit to just 2. I'm suggesting an unlimited number of ** usages in any > single call. Could you perhaps give some examples of usage and explain what behaviour you expect if a key is present in both dicts? 
Oscar From python at mrabarnett.plus.com Thu Jun 6 17:02:41 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Jun 2013 16:02:41 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <51B0A491.5090906@mrabarnett.plus.com> On 06/06/2013 15:46, Ram Rachum wrote: > I'd like to be able to use ** twice (using 2 different dicts) when > calling a function, and have them both feed arguments into the function. > It's not possible to use * twice to feed 2 sets of positional arguments, but you can use + to concatenate them, so would a better solution be to have a way to merge 2 dicts? For example: temp = dict1 | dict2 would have the same result as: temp = {} temp.update(dict1) temp.update(dict2) From oscar.j.benjamin at gmail.com Thu Jun 6 17:05:08 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 6 Jun 2013 16:05:08 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: On 6 June 2013 15:54, Markus Unterwaditzer wrote: > This indicates for me that it generally should be possible to generate the union of two dicts with sth like {} + {}. It cannot be a union since the dicts can have different values correspond to the same keys. However you can make a class that will try several mappings sequentially e.g.: class MapJoin: def __init__(self, *mappings): self.mappings = mappings def __getitem__(self, key): for mapping in self.mappings: try: return mapping[key] except KeyError: pass else: raise KeyError(str(key)) # Perhaps other mapping methods here... 
Then you can do: func(**MapJoin(map1, map2)) Oscar From oscar.j.benjamin at gmail.com Thu Jun 6 17:08:07 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 6 Jun 2013 16:08:07 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0A491.5090906@mrabarnett.plus.com> References: <51B0A491.5090906@mrabarnett.plus.com> Message-ID: On 6 June 2013 16:02, MRAB wrote: > For example: > > temp = dict1 | dict2 > > would have the same result as: > > temp = {} > temp.update(dict1) > temp.update(dict2) I would expect this to have the reverse order e.g.: temp = {} temp.update(dict2) temp.update(dict1) I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 if the key is not in dict1. Oscar From zachary.ware+pyideas at gmail.com Thu Jun 6 17:08:42 2013 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Thu, 6 Jun 2013 10:08:42 -0500 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: On Thu, Jun 6, 2013 at 10:05 AM, Oscar Benjamin wrote: > On 6 June 2013 15:54, Markus Unterwaditzer wrote: >> This indicates for me that it generally should be possible to generate the union of two dicts with sth like {} + {}. > > It cannot be a union since the dicts can have different values > correspond to the same keys. > > However you can make a class that will try several mappings sequentially e.g.: > > class MapJoin: > def __init__(self, *mappings): > self.mappings = mappings > def __getitem__(self, key): > for mapping in self.mappings: > try: > return mapping[key] > except KeyError: > pass > else: > raise KeyError(str(key)) > # Perhaps other mapping methods here... > > Then you can do: > > func(**MapJoin(map1, map2)) > That sounds rather like collections.ChainMap. 
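[For reference, a minimal sketch of how collections.ChainMap (added in Python 3.3) covers the MapJoin use case above — the func and the two dicts here are hypothetical, purely for illustration:]

```python
from collections import ChainMap

def func(**kwargs):
    # Stand-in for any function accepting keyword arguments.
    return kwargs

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 20, "c": 30}

# ChainMap looks keys up left to right, so for the duplicated
# key "b" the value from dict1 (the leftmost mapping) wins.
merged = func(**ChainMap(dict1, dict2))
assert merged == {"a": 1, "b": 2, "c": 30}
```

[Note the precedence: earlier mappings shadow later ones, which matches the MapJoin sketch but is the reverse of a plain dict.update() chain.]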
From oscar.j.benjamin at gmail.com Thu Jun 6 17:11:51 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 6 Jun 2013 16:11:51 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: On 6 June 2013 16:08, Zachary Ware wrote: >> However you can make a class that will try several mappings sequentially e.g.: > > That sounds rather like collections.ChainMap. So it does. Thanks for pointing this out; I wasn't aware of it. So as of Python 3.3 you can do: from collections import ChainMap func(**ChainMap(dict1, dict2)) Although this does not have the behaviour suggested by the OP (an error when a key is duplicated). Oscar From haoyi.sg at gmail.com Thu Jun 6 17:13:59 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 6 Jun 2013 11:13:59 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: -1 for multiple **, +1 for easy merging of dicts, since that would let you trivially do **(dict_a + dict_b) or similar. Looking at http://stackoverflow.com/questions/38987/how-can-i-merge-union-two-python-dictionaries-in-a-single-expression You have a bunch of different solutions that all work with varying degrees of obtuseness. The least obtuse being: dict(x,**y) But it's still entirely non-obvious what the intent of this statement is ("combine these dicts"). I agree that the multiple-key issue is a problem, but I think that providing support for the "common" case (whichever that may be, I suspect it's "dicts on right take priority over dicts on left") will be beneficial.
Now merging two dicts takes far too much code and boilerplate, and there are far too many equivalent ways: dict(x.items() + y.items()) {xi:(x[xi] if xi not in list(y.keys()) else y[xi]) for xi in list(x.keys())+(list(y.keys()))} def dict_merge(a, b): c = a.copy() c.update(b) return c new = dict_merge(old, extras) dict(list(dict1.items()) + list(dict2.items())) from collections import ChainMap; ChainMap(y, x) of performing an operation that is almost (not quite) as fundamental as +ing lists and &ing sets. On Thu, Jun 6, 2013 at 10:54 AM, Markus Unterwaditzer < markus at unterwaditzer.net> wrote: > This indicates for me that it generally should be possible to generate the > union of two dicts with sth like {} + {}. > > -- Markus (from phone) > > Ram Rachum wrote: > >I'd like to be able to use ** twice (using 2 different dicts) when > >calling > >a function, and have them both feed arguments into the function. > > > > > >------------------------------------------------------------------------ > > > >_______________________________________________ > >Python-ideas mailing list > >Python-ideas at python.org > >http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Thu Jun 6 17:17:36 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 6 Jun 2013 11:17:36 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: > I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 if the key is not in dict1. Idea: dict1 + dict2 -> dict2 takes priority dict1 | dict2 -> dict1 takes priority Does that make sense to anyone? Both + and | are currently un-used iirc. 
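[The proposed pair of semantics can be sketched with a hypothetical dict subclass — MergeDict is purely illustrative, since built-in dict defines neither operator here; it just demonstrates the two precedence rules being discussed:]

```python
class MergeDict(dict):
    """Illustrative sketch of the proposed + and | merge semantics."""

    def __add__(self, other):
        # dict1 + dict2: the right-hand operand takes priority,
        # i.e. the familiar copy-then-update idiom.
        merged = MergeDict(self)
        merged.update(other)
        return merged

    def __or__(self, other):
        # dict1 | dict2: the left-hand operand takes priority,
        # i.e. fall back to dict2 only for missing keys.
        merged = MergeDict(other)
        merged.update(self)
        return merged

a = MergeDict({"x": 1, "y": 2})
b = MergeDict({"y": 20, "z": 30})
assert a + b == {"x": 1, "y": 20, "z": 30}
assert a | b == {"x": 1, "y": 2, "z": 30}
```

[Both operators are just copy-then-update in opposite orders, which is why a + b and b | a give the same result.]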
On Thu, Jun 6, 2013 at 11:11 AM, Oscar Benjamin wrote: > On 6 June 2013 16:08, Zachary Ware wrote: > >> However you can make a class that will try several mappings > sequentially e.g.: > > > > That sounds rather like collections.ChainMap. > > So it does. Thanks for pointing this out; I wasn't aware of it. So as > of PYthon 3.3 you can do: > > from collections import ChainMap > > func(**ChainMap(dict1, dict2)) > > Although this does not have the behaviour suggested by the OP (an > error when a key is duplicated). > > > Oscar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus at unterwaditzer.net Thu Jun 6 17:27:26 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Thu, 06 Jun 2013 17:27:26 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <9d5e375e-ca96-4a2c-99f8-b7f44a98e42e@email.android.com> Ah, i didn't mean union then, but rather a behavior where the right-hand side dict overrides the values of the left-hand side. -- Markus (from phone) Oscar Benjamin wrote: >On 6 June 2013 15:54, Markus Unterwaditzer >wrote: >> This indicates for me that it generally should be possible to >generate the union of two dicts with sth like {} + {}. > >It cannot be a union since the dicts can have different values >correspond to the same keys. > >However you can make a class that will try several mappings >sequentially e.g.: > >class MapJoin: > def __init__(self, *mappings): > self.mappings = mappings > def __getitem__(self, key): > for mapping in self.mappings: > try: > return mapping[key] > except KeyError: > pass > else: > raise KeyError(str(key)) > # Perhaps other mapping methods here... 
> >Then you can do: > >func(**MapJoin(map1, map2)) > > >Oscar From markus at unterwaditzer.net Thu Jun 6 17:28:38 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Thu, 06 Jun 2013 17:28:38 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0A491.5090906@mrabarnett.plus.com> References: <51B0A491.5090906@mrabarnett.plus.com> Message-ID: <7078067d-9b99-4bca-b2c8-44cf933b91b6@email.android.com> Yes, this was my suggestion, but rather using the + operator. -- Markus (from phone) MRAB wrote: >On 06/06/2013 15:46, Ram Rachum wrote: >> I'd like to be able to use ** twice (using 2 different dicts) when >> calling a function, and have them both feed arguments into the >function. >> >It's not possible to use * twice to feed 2 sets of positional >arguments, >but you can use + to concatenate them, so would a better solution be to >have a way to merge 2 dicts? > >For example: > > temp = dict1 | dict2 > >would have the same result as: > > temp = {} > temp.update(dict1) > temp.update(dict2) > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas From nathan at cmu.edu Thu Jun 6 17:31:35 2013 From: nathan at cmu.edu (Nathan Schneider) Date: Thu, 6 Jun 2013 11:31:35 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: On Thu, Jun 6, 2013 at 11:17 AM, Haoyi Li wrote: > > I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 if > the key is not in dict1. > > Idea: > > dict1 + dict2 -> dict2 takes priority > dict1 | dict2 -> dict1 takes priority > > Does that make sense to anyone? Both + and | are currently un-used iirc. > > It seems to me that adding *two* new ways of doing the same thing (in a different order) would only add to the confusion. Really, the issue is that neither the + or | idioms we are accustomed to for sequences and sets quite fit the dict use case.
When this came up on the list in 2011, I suggested a new idiom using the << operator so as to make the priority less ambiguous.[1] But nobody expressed support for that idea, and the thread ultimately fizzled. Cheers, Nathan [1] http://mail.python.org/pipermail/python-ideas/2011-December/013232.html > > On Thu, Jun 6, 2013 at 11:11 AM, Oscar Benjamin < > oscar.j.benjamin at gmail.com> wrote: > >> On 6 June 2013 16:08, Zachary Ware >> wrote: >> >> However you can make a class that will try several mappings >> sequentially e.g.: >> > >> > That sounds rather like collections.ChainMap. >> >> So it does. Thanks for pointing this out; I wasn't aware of it. So as >> of PYthon 3.3 you can do: >> >> from collections import ChainMap >> >> func(**ChainMap(dict1, dict2)) >> >> Although this does not have the behaviour suggested by the OP (an >> error when a key is duplicated). >> >> >> Oscar >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus at unterwaditzer.net Thu Jun 6 17:35:02 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Thu, 06 Jun 2013 17:35:02 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> A + B would then be the same as B | A, but: There should be one way to do it... -- Markus (from phone) Haoyi Li wrote: >> I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 >if >the key is not in dict1. > >Idea: > >dict1 + dict2 -> dict2 takes priority >dict1 | dict2 -> dict1 takes priority > >Does that make sense to anyone? Both + and | are currently un-used >iirc. 
> > >On Thu, Jun 6, 2013 at 11:11 AM, Oscar Benjamin >wrote: > >> On 6 June 2013 16:08, Zachary Ware >wrote: >> >> However you can make a class that will try several mappings >> sequentially e.g.: >> > >> > That sounds rather like collections.ChainMap. >> >> So it does. Thanks for pointing this out; I wasn't aware of it. So as >> of PYthon 3.3 you can do: >> >> from collections import ChainMap >> >> func(**ChainMap(dict1, dict2)) >> >> Although this does not have the behaviour suggested by the OP (an >> error when a key is duplicated). >> >> >> Oscar >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas From haoyi.sg at gmail.com Thu Jun 6 17:43:18 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 6 Jun 2013 11:43:18 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> Message-ID: > There should be one way to do it... I'd agree except, well, there are two different things that people want to do here! So in the end you're gonna have to let people do both, somehow. I'd argue that providing two operators cuts down the number of ways people do these two things down to two ways, while not providing any operators leaves people to perform these tasks in a dozen different ways (see the SO thread I linked). Two ways for two things isn't too bad, when the status quo is a dozen ways of doing the same two things. 
-Haoyi On Thu, Jun 6, 2013 at 11:35 AM, Markus Unterwaditzer < markus at unterwaditzer.net> wrote: > A + B would then be the same as B | A, but: > > There should be one way to do it... > > -- Markus (from phone) > > Haoyi Li wrote: > >> I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 > >if > >the key is not in dict1. > > > >Idea: > > > >dict1 + dict2 -> dict2 takes priority > >dict1 | dict2 -> dict1 takes priority > > > >Does that make sense to anyone? Both + and | are currently un-used > >iirc. > > > > > >On Thu, Jun 6, 2013 at 11:11 AM, Oscar Benjamin > >wrote: > > > >> On 6 June 2013 16:08, Zachary Ware > >wrote: > >> >> However you can make a class that will try several mappings > >> sequentially e.g.: > >> > > >> > That sounds rather like collections.ChainMap. > >> > >> So it does. Thanks for pointing this out; I wasn't aware of it. So as > >> of PYthon 3.3 you can do: > >> > >> from collections import ChainMap > >> > >> func(**ChainMap(dict1, dict2)) > >> > >> Although this does not have the behaviour suggested by the OP (an > >> error when a key is duplicated). > >> > >> > >> Oscar > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > >> > > > > > >------------------------------------------------------------------------ > > > >_______________________________________________ > >Python-ideas mailing list > >Python-ideas at python.org > >http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From larchange at gmail.com Thu Jun 6 17:49:05 2013 From: larchange at gmail.com (Gabriel AHTUNE) Date: Thu, 6 Jun 2013 23:49:05 +0800 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> Message-ID: 2013/6/6 Markus Unterwaditzer > A + B would then be the same as B | A, but: > There should be one way to do it... > Then I would prefer the + operator, because "|" has a logical connotation and overwriting a value is logic-independent... even if by twisting my mind I can understand where this convention comes from. Gabriel AHTUNE -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Jun 6 18:03:03 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Jun 2013 17:03:03 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <51B0B2B7.4070306@mrabarnett.plus.com> On 06/06/2013 16:17, Haoyi Li wrote: > > I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 if > the key is not in dict1. > > Idea: > > dict1 + dict2 -> dict2 takes priority > dict1 | dict2 -> dict1 takes priority > > Does that make sense to anyone? Both + and | are currently un-used iirc. > It occurs to me that 'Counter' is dict-like, but already uses both + and |. Would there be any times when you want to merge Counter instances in the same manner? It could be confusing if you thought of them as dict-like but they didn't behave like dicts... From jsbueno at python.org.br Thu Jun 6 18:19:34 2013 From: jsbueno at python.org.br (Joao S. O.
Bueno) Date: Thu, 6 Jun 2013 13:19:34 -0300 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0B2B7.4070306@mrabarnett.plus.com> References: <51B0B2B7.4070306@mrabarnett.plus.com> Message-ID: On 6 June 2013 13:03, MRAB wrote: > On 06/06/2013 16:17, Haoyi Li wrote: >> >> > I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 if >> the key is not in dict1. >> >> Idea: >> >> dict1 + dict2 -> dict2 takes priority >> dict1 | dict2 -> dict1 takes priority >> >> Does that make sense to anyone? Both + and | are currently un-used iirc. >> > It occurs to me that 'Counter' is dict-like, but already uses both + > and |. > > Would there be any times when you want to merge Counter instances in > the same manner? It could be confusing if you thought of them as > dict-like but they didn't behave like dicts... What if instead of simply checking if a key exists or not, these operators just operate themselves recursively into the values() ? It is not at all unexpected, as "==" already does that - so "dct3 = dct1 + dct2" would actually perform: dct3 = dct1.copy() for k, v in dct2.items(): dct3[k] = dct3[k] + v if k in dct3 else v In that case, it would make more sense to make use of "or" instead of "|" - although other binary logic and arithmetic operators could do the same. But that would bring no surprises to the already working-fine logic of counters.
js -><- > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ethan at stoneleaf.us Thu Jun 6 17:39:33 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Jun 2013 08:39:33 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> Message-ID: <51B0AD35.2030406@stoneleaf.us> On 06/06/2013 08:35 AM, Markus Unterwaditzer wrote: > A + B would then be the same as B | A, but: > > There should be one way to do it... There should be *one obvious* way to do it. But using + for one way and | for the other is not obvious. -- ~Ethan~ From ethan at stoneleaf.us Thu Jun 6 18:43:58 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Jun 2013 09:43:58 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> Message-ID: <51B0BC4E.9030907@stoneleaf.us> On 06/06/2013 09:19 AM, Joao S. O. Bueno wrote: > On 6 June 2013 13:03, MRAB wrote: >> On 06/06/2013 16:17, Haoyi Li wrote: >>> >>> > I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 if >>> the key is not in dict1. >>> >>> Idea: >>> >>> dict1 + dict2 -> dict2 takes priority >>> dict1 | dict2 -> dict1 takes priority >>> >>> Does that make sense to anyone? Both + and | are currently un-used iirc. >>> >> It occurs to me that 'Counter' is dict-like, but already uses both + >> and |. >> >> Would there be any times when you want to merge Counter instances in >> the same manner? It could be confusing if you thought of them as >> dict-like but they didn't behave like dicts... > > What if instead of simply checking if a key exists or not, these operators jsut > operate themselves recursively into the values() ? 
> > It is not all unexpected, as "==" already does that - > > so "dct3 = dct1 + dct2" would actually perform: > > dct3 = dct1.copy() > for k, v in dct2.items(): > dct3[k] = dct3[k] + v if k in dct3 else v > > -In that case, it would make more sense to make use of > "or" instead of "|" - although other binary logic and aritmetic > operators could do the same. > > But that would bring no surprises to the already working-fine logic > of counters. No surprises? What about when the function suddenly receives a list when it wasn't expecting one? Or did I totally misunderstand? -- ~Ethan~ From abarnert at yahoo.com Thu Jun 6 19:29:43 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 6 Jun 2013 10:29:43 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> Message-ID: <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> On Jun 6, 2013, at 8:43, Haoyi Li wrote: > > There should be one way to do it... > > I'd agree except, well, there are two different things that people want to do here! Actually, there's three: 1. Right values take precedence. 2. Left values take precedence. 3. Assert that there are no duplicate keys. I think the first is what people would expect, and want, more often, because it's exactly what you get when you do the obvious thing today. It also means a+=b means a.update(b), just as it means a.extend(b) for sequences--and note that there is no method that adds the new keys from b while leaving existing ones alone or raising an exception. But I also think that most of the time people won't care. If you write draw(x, y, **(image_keywords * window_keywords)), you've got two non overlapping sets of keys. > So in the end you're gonna have to let people do both, somehow. Or you can provide one obvious way to do the common case that people will use all the time, and explain how you do the less obvious cases when they occasionally come up. 
Also, note that if we implement case 1, it's pretty obvious how to do case 2: just do b+a instead of a+b. -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus at unterwaditzer.net Thu Jun 6 19:40:43 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Thu, 06 Jun 2013 19:40:43 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> Message-ID: <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. -- Markus (from phone) Andrew Barnert wrote: >On Jun 6, 2013, at 8:43, Haoyi Li wrote: > >> > There should be one way to do it... >> >> I'd agree except, well, there are two different things that people >want to do here! > >Actually, there's three: > >1. Right values take precedence. >2. Left values take precedence. >3. Assert that there are no duplicate keys. > >I think the first is what people would expect, and want, more often, >because it's exactly what you get when you do the obvious thing today. >It also means a+=b means a.update(b), just as it means a.extend(b) for >sequences--and note that there is no method that adds the new keys from >b while leaving existing ones alone or raising an exception. > >But I also think that most of the time people won't care. If you write >draw(x, y, **(image_keywords * window_keywords)), you've got two non >overlapping sets of keys. > >> So in the end you're gonna have to let people do both, somehow. > >Or you can provide one obvious way to do the common case that people >will use all the time, and explain how you do the less obvious cases >when they occasionally come up. > >Also, note that if we implement case 1, it's pretty obvious how to do >case 2: just do b+a instead of a+b. 
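For concreteness, all three behaviors listed above can already be written as short helpers today (Python 3; the function names are illustrative only, nothing here is an existing API):

```python
# The three merge semantics from the list above, as plain functions.
def merge_right_wins(a, b):
    # Case 1: right values take precedence (what copy-then-update gives you).
    c = a.copy()
    c.update(b)
    return c

def merge_left_wins(a, b):
    # Case 2: left values take precedence -- just swap the operands.
    return merge_right_wins(b, a)

def merge_strict(a, b):
    # Case 3: refuse duplicate keys outright.
    dupes = a.keys() & b.keys()
    if dupes:
        raise ValueError("duplicate keys: %r" % sorted(dupes))
    return merge_right_wins(a, b)

print(merge_right_wins({"x": 1}, {"x": 2, "y": 3}))  # {'x': 2, 'y': 3}
print(merge_left_wins({"x": 1}, {"x": 2, "y": 3}))   # {'x': 1, 'y': 3}
```

This also makes Andrew's closing observation visible: once case 1 exists, case 2 is just the same operation with the operands swapped.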
From jeanpierreda at gmail.com Thu Jun 6 19:47:05 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 6 Jun 2013 13:47:05 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> Message-ID: On Thu, Jun 6, 2013 at 1:40 PM, Markus Unterwaditzer wrote: > Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. That would be inconsistent with e.g. += on list (which is roughly equivalent to list.extend). -- Devin From haoyi.sg at gmail.com Thu Jun 6 19:53:50 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 6 Jun 2013 13:53:50 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> Message-ID: > Also, note that if we implement case 1, it's pretty obvious how to do case 2: just do b+a instead of a+b. I did not think of it like that, but now that you mention it, I agree 100%. b+a really isn't any harder than a+b. We won't need a second operator like | at all then. > No surprises? What about when the function suddenly receives a list when it wasn't expecting one? Or did I totally misunderstand? Then you have a bug, exactly like the status quo; I've had plenty of bugs writing list_a += dict_b instead of list_a += [dict_b], resulting in the *keys* of dict_b being appended to the list. Now that's a weird bug to have! > Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. 
Yeah, and I didn't expect list_a += list_b to be the same as list_a.extend(list_b) (I've had plenty of bugs from this, too!), but it is. I think dicts should be consistent with lists, even if I'd prefer it if both of them had a += b desugar into a = a + b. -Haoyi On Thu, Jun 6, 2013 at 1:40 PM, Markus Unterwaditzer < markus at unterwaditzer.net> wrote: > Actually i wouldn't expect += to be the same as dict.update, but it rather > would create a new dictionary. > > -- Markus (from phone) > > Andrew Barnert wrote: > >On Jun 6, 2013, at 8:43, Haoyi Li wrote: > > > >> > There should be one way to do it... > >> > >> I'd agree except, well, there are two different things that people > >want to do here! > > > >Actually, there's three: > > > >1. Right values take precedence. > >2. Left values take precedence. > >3. Assert that there are no duplicate keys. > > > >I think the first is what people would expect, and want, more often, > >because it's exactly what you get when you do the obvious thing today. > >It also means a+=b means a.update(b), just as it means a.extend(b) for > >sequences--and note that there is no method that adds the new keys from > >b while leaving existing ones alone or raising an exception. > > > >But I also think that most of the time people won't care. If you write > >draw(x, y, **(image_keywords * window_keywords)), you've got two non > >overlapping sets of keys. > > > >> So in the end you're gonna have to let people do both, somehow. > > > >Or you can provide one obvious way to do the common case that people > >will use all the time, and explain how you do the less obvious cases > >when they occasionally come up. > > > >Also, note that if we implement case 1, it's pretty obvious how to do > >case 2: just do b+a instead of a+b. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Jun 6 20:10:53 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Jun 2013 19:10:53 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> Message-ID: <51B0D0AD.6040901@mrabarnett.plus.com> On 06/06/2013 18:47, Devin Jeanpierre wrote: > On Thu, Jun 6, 2013 at 1:40 PM, Markus Unterwaditzer > wrote: >> Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. > > That would be inconsistent with e.g. += on list (which is roughly > equivalent to list.extend). > With regard to '+=' vs '|=', '+=' on lists adds to what's already there, leaving existing items intact, but on dicts it wouldn't, which is why I think that '|=' might be clearer (I tend to think of it more like set union), but that's just my own view. From flying-sheep at web.de Thu Jun 6 20:19:23 2013 From: flying-sheep at web.de (Philipp A.) Date: Thu, 6 Jun 2013 20:19:23 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0D0AD.6040901@mrabarnett.plus.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <51B0D0AD.6040901@mrabarnett.plus.com> Message-ID: 2013/6/6 MRAB > With regard to '+=' vs '|=', '+=' on lists adds to what's already > there, leaving existing items intact, but on dicts it wouldn't, which > is why I think that '|=' might be clearer (I tend to think of it more > like set union), but that's just my own view. 
> += on deques with maxlength < ∞ *does* modify them. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jun 6 20:36:23 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 6 Jun 2013 11:36:23 -0700 (PDT) Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> Message-ID: <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> On Thu, Jun 6, 2013 at 1:40 PM, Markus Unterwaditzer wrote: >> Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. Why? The whole point of += is that it does a mutating in-place addition. That's how it works with all other mutable types; why would it work differently with dict? The reason we don't have this today is that it's not necessarily clear what "addition" means for dicts; it's completely clear what "mutable addition" would mean if we knew what "addition" meant. From: Haoyi Li Sent: Thursday, June 6, 2013 10:53 AM >Yeah, and I didn't expect list_a += list_b to be the same as list_a.extend(list_b) (I've had plenty of bugs from this, too!), but it is. I think dicts should be consistent with lists, even if I'd prefer it if both of them had a += b desugar into a = a + b. That would make += misleading. In any other language with a += operator, it mutates. (And pure immutable languages don't have a += operator.) That's why we have __iadd__ and friends in the first place. In fact, I'd almost prefer it if a += b _never_ desugared into a = a + b (that is, if the default implementation of __iadd__ were to raise NotImplemented instead of to call __add__), but I understand why it's useful for, e.g., teaching novices with integer variables. 
In general, for mutable objects, += is the primitive operation (extend, update, etc.), and + is conceptually "copy, then += the copy". (But of course it's often more efficient or more readable to implement each of them independently.) From haoyi.sg at gmail.com Thu Jun 6 20:46:48 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 6 Jun 2013 14:46:48 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: > That would make += misleading. In any other language with a += operator, it mutates Does it? All the languages which only allow += on immutable values (e.g. Java, C, Javascript) obviously don't mutate anything. C# does a straight desugar into a = a + b. Scala allows you to override it separately, but the vast majority of them (by default) are simple desugars into a = a + b. Not familiar with how C++ does it, perhaps someone could chime in? I don't know the answer, but I don't think it's obvious at all that += mutates. What (non-python) background are you coming from where this is common practice? Anyway, this is all bikeshedding. Regardless of how the AugAssign operators work, I think that having dict_a + dict_b merge dict_a and dict_b, with dict_a's keys taking precedence, would be a wonderful thing! On Thu, Jun 6, 2013 at 2:36 PM, Andrew Barnert wrote: > On Thu, Jun 6, 2013 at 1:40 PM, Markus Unterwaditzer < > markus at unterwaditzer.net> wrote: > > >> Actually i wouldn't expect += to be the same as dict.update, but it > rather would create a new dictionary. > > Why? The whole point of += is that it does a mutating in-place addition. > That's how it works with all other mutable types; why would it work > differently with dict? 
> > The reason we don't have this today is that it's not necessarily clear > what "addition" means for dicts; it's completely clear what "mutable > addition" would mean if we knew what "addition" meant. > > > From: Haoyi Li > Sent: Thursday, June 6, 2013 10:53 AM > >Yeah, and I didn't expect list_a += list_b to be the same as > list_a.extend(list_b) (I've had plenty of bugs from this, too!), but it is. > I think dicts should be consistent with lists, even if I'd prefer it if > both of them had a += b desugar into a = a + b. > > > That would make += misleading. In any other language with a += operator, > it mutates. (And pure immutable languages don't have a += operator.) That's > why we have __iadd__ and friends in the first place. > > In fact, I'd almost prefer it if a += b _never_ desugared into a = a + b > (that is, if the default implementation of __iadd__ were to raise > NotImplemented instead of to call __add__), but I understand why it's > useful for, e.g., teaching novices with integer variables. > > > In general, for mutable objects, += is the primitive operation (extend, > update, etc.), and + is conceptually "copy, then += the copy". (But of > course it's often more efficient or more readable to implement each of them > independently.) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Thu Jun 6 20:51:54 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 6 Jun 2013 14:51:54 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: On Thu, Jun 6, 2013 at 2:36 PM, Andrew Barnert wrote: > That would make += misleading. In any other language with a += operator, it mutates. 
(And pure immutable languages don't have a += operator.) That's why we have __iadd__ and friends in the first place. Can you name a language other than Python where `a += b` usually does something different for mutable values from `a = a + b`, because of a difference in how the + and += operators work w.r.t. if/how they mutate their arguments? I can't think of any. -- Devin From ethan at stoneleaf.us Thu Jun 6 19:50:56 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Jun 2013 10:50:56 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> Message-ID: <51B0CC00.5020503@stoneleaf.us> On 06/06/2013 10:40 AM, Markus Unterwaditzer wrote: > > Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. += is an in-place instruction. The only time it doesn't (shouldn't) modify the object is when the object is immutable (such as an int). Having a mutable object not be updated in place would be very surprising. -- ~Ethan~ From jsbueno at python.org.br Thu Jun 6 21:20:25 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 6 Jun 2013 16:20:25 -0300 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0BC4E.9030907@stoneleaf.us> References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> Message-ID: On 6 June 2013 13:43, Ethan Furman wrote: > On 06/06/2013 09:19 AM, Joao S. O. Bueno wrote: >> >> On 6 June 2013 13:03, MRAB wrote: >>> >>> On 06/06/2013 16:17, Haoyi Li wrote: >>>> >>>> >>>> > I read `dict1 | dict2` as a mapping that would try dict1 *or* dict2 >>>> if >>>> the key is not in dict1. 
>>>> Idea: >>>> >>>> dict1 + dict2 -> dict2 takes priority >>>> dict1 | dict2 -> dict1 takes priority >>>> >>>> Does that make sense to anyone? Both + and | are currently un-used iirc. >>>> >>> It occurs to me that 'Counter' is dict-like, but already uses both + >>> and |. >>> >>> Would there be any times when you want to merge Counter instances in >>> the same manner? It could be confusing if you thought of them as >>> dict-like but they didn't behave like dicts... >> >> >> What if instead of simply checking if a key exists or not, these operators >> just >> operate themselves recursively into the values()? >> >> It is not at all unexpected, as "==" already does that - >> >> so "dct3 = dct1 + dct2" would actually perform: >> >> dct3 = dct1.copy() >> for k, v in dct2.items(): >> dct3[k] = dct3[k] + v if k in dct3 else v >> >> In that case, it would make more sense to make use of >> "or" instead of "|" - although other binary logic and arithmetic >> operators could do the same. >> >> But that would bring no surprises to the already working-fine logic >> of counters. > > > No surprises? What about when the function suddenly receives a list when it > wasn't expecting one? Or did I totally misunderstand? That is no surprise - there won't be a list in there if it was not the programmer's intention: >>> dict1["arg1"] = 5 >>> dict2["arg1"] = 3 >>> dict1 + dict2 {"arg1": 8} >>> dict1["arg1"] = [5] >>> dict2["arg1"] = 3 >>> dict1 + dict2 TypeError: can only concatenate list (not "int") to list >>> dict1 or dict2 {"arg1": [5]} >>> dict1 and dict2 {"arg1": 3} ------------------ But then, I agree that it feels loose - although appropriate. So maybe, an "operate" method that would perform the operations above for each item, and still allow one to specify the behavior when one key is missing - I think it would resolve the question for the need for one or more ambiguous operators = dict.operate (or dict.foreach) - semantically equivalent to: class dict: (...) 
def operate(self, other, operator=None, default=None): res = self.copy() for k, v in other.items(): if default == "overwrite": res[k] = v elif k in res and default is None: res[k] = operator(res[k], v) # + extra logic for the default behavior, like leaving only the intersection # of the keys, and so on return res Thus, the problem of the O.P. would be reduced to: my_func(param1, **dict1.operate(dict2, default="overwrite") ) And there would be many other possibilities and no ambiguities. > > -- > ~Ethan~ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From flying-sheep at web.de Thu Jun 6 21:23:28 2013 From: flying-sheep at web.de (Philipp A.) Date: Thu, 6 Jun 2013 21:23:28 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0CC00.5020503@stoneleaf.us> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <51B0CC00.5020503@stoneleaf.us> Message-ID: 2013/6/6 Ethan Furman > += is an in-place instruction. The only time it doesn't (shouldn't) > modify the object is when the object is immutable (such as an int). > > Having a mutable object not be updated in place would be very surprising. > you are right. obj1 += obj2 is different from obj1 = obj1 + obj2, because it just calls a method (__iadd__) on obj1 instead of calling a method (__add__) AND reassigning the name 'obj1' to that method's return value. 2013/6/6 Devin Jeanpierre > Can you name a language other than Python where `a += b` usually does > something different for mutable values from `a = a + b`, because of a > difference in how the + and += operators work w.r.t. if/how they mutate > their arguments? I can't think of any. > Scala. It works exactly like Python, i.e. 
that += mutates objects without rebinding the variable name, while + computes a new value and binds the variable to it. e.g.: val s = StringBuffer("mutable constant") s += "foo" //s.+=("foo") is called var s = "immutable variable" s += "foo" //String has no method '+=', so it gets transformed to s = s + "foo" val s = "immutable constant" s += "foo" //String has no method '+=', so it gets transformed to s = s + "foo" -> compile error due to trying to rebind constant -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilmiacs at gmail.com Thu Jun 6 21:33:57 2013 From: ilmiacs at gmail.com (Peter Jung) Date: Thu, 6 Jun 2013 21:33:57 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <0530ED6A-F500-4585-AA26-D3BE3ABDEF60@gmail.com> Immutability of an object is defined as the guarantee that the object cannot change state after creation. > All the languages which only allow += on immutable values therefore is a misinterpretation of the term immutable. Example: In C with int a = 42; the variable a is not immutable per definition because after a += 1; the state did change to 43. I'd take "+= is not allowed" as synonymous to "is immutable". Anyway, I'd be superglad to have A + B on dicts. My intuition tells me the left operand should be evaluated first, so the keys of B should win. Am 06.06.2013 um 20:46 schrieb Haoyi Li : > > That would make += misleading. In any other language with a += operator, it mutates > > Does it? All the languages which only allow += on immutable values (e.g. Java, C, Javascript) obviously don't mutate anything. C# does a straight desugar into a = a + b. 
Scala allows you to override it separately, but the vast majority of them (by default) are simple desugars into a = a + b. Not familiar with how C++ does it, perhaps someone could chime in? > > I don't know the answer, but I don't think it's obvious at all that += mutates. What (non-python) background are you coming from where this is common practice? > > Anyway, this is all bikeshedding. Regardless of how the AugAssign operators work, I think that having dict_a + dict_b merge dict_a and dict_b, with dict_a's keys taking precedence, would be a wonderful thing! > > > > > On Thu, Jun 6, 2013 at 2:36 PM, Andrew Barnert wrote: >> On Thu, Jun 6, 2013 at 1:40 PM, Markus Unterwaditzer wrote: >> >> >> Actually i wouldn't expect += to be the same as dict.update, but it rather would create a new dictionary. >> >> Why? The whole point of += is that it does a mutating in-place addition. That's how it works with all other mutable types; why would it work differently with dict? >> >> The reason we don't have this today is that it's not necessarily clear what "addition" means for dicts; it's completely clear what "mutable addition" would mean if we knew what "addition" meant. >> >> >> From: Haoyi Li >> Sent: Thursday, June 6, 2013 10:53 AM >> >Yeah, and I didn't expect list_a += list_b to be the same as list_a.extend(list_b) (I've had plenty of bugs from this, too!), but it is. I think dicts should be consistent with lists, even if I'd prefer it if both of them had a += b desugar into a = a + b. >> >> >> That would make += misleading. In any other language with a += operator, it mutates. (And pure immutable languages don't have a += operator.) That's why we have __iadd__ and friends in the first place. >> >> In fact, I'd almost prefer it if a += b _never_ desugared into a = a + b (that is, if the default implementation of __iadd__ were to raise NotImplemented instead of to call __add__), but I understand why it's useful for, e.g., teaching novices with integer variables. 
>> >> >> In general, for mutable objects, += is the primitive operation (extend, update, etc.), and + is conceptually "copy, then += the copy". (But of course it's often more efficient or more readable to implement each of them independently.) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From vernondcole at gmail.com Thu Jun 6 21:36:41 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Thu, 6 Jun 2013 13:36:41 -0600 Subject: [Python-ideas] Allow using ** twice Message-ID: On Thu, Jun 6, 2013 at 12:47 PM, wrote: > > Does it? All the languages which only allow += on immutable values (e.g. > Java, C, Javascript) obviously don't mutate anything. Eh? It's been twenty years or so since I programmed C for a living, but I seem to remember that a C variable is actually only a handy label for a memory location, and that therefore *everything* is mutable -- even when it shouldn't be. And += was invariably implemented in machine language by a two-argument add straight to that memory location. Some of the early C compilers would even allow you to use a literal on the left side, like "2 += 3" which would change the value of the literal '2' to become five. Made for some interesting bugs! Ignoring all of that, if + worked on dictionaries, I would expect that duplicate keys would receive the value from the right-hand argument, and that the operation would create a new dictionary. (and I think it would be a great +1 idea.) -- Vernon Cole -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Thu Jun 6 21:51:06 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Jun 2013 20:51:06 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <51B0E82A.7010107@mrabarnett.plus.com> On 06/06/2013 19:46, Haoyi Li wrote: > > That would make += misleading. In any other language with a += > operator, it mutates > > Does it? All the languages which only allow += on immutable values (e.g. > Java, C, Javascript) obviously don't mutate anything. C# does a straight > desugar into a = a + b. Scala allows you to override it separately, but > the vast majority of them (by default) are simple desugars into a = a + > b. Not familiar with how C++ does it, perhaps someone could chime in? > > I don't know the answer, but I don't think it's obvious at all that += > mutates. What (non-python) background are you coming from where this is > common practice? > > Anyway, this is all bikeshedding. Regardless of how the AugAssign > operators work, I think that having dict_a + dict_b merge dict_a and > dict_b, with dict_a's keys taking precedence, would be a wonderful thing! > Except that dict a'b keys should take precedence. :-) From ethan at stoneleaf.us Thu Jun 6 21:19:24 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Jun 2013 12:19:24 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <51B0E0BC.6080400@stoneleaf.us> On 06/06/2013 11:46 AM, Haoyi Li wrote: >>That would make += misleading. 
In any other language with a += operator, it mutates > > Does it? Which is all irrelevant. Python already has a well-defined meaning for += and friends, which is in-place mutation*. *Immutables, of course, fall back to the corresponding non-mutatation methods. -- ~Ethan~ From ilmiacs at gmail.com Thu Jun 6 21:56:32 2013 From: ilmiacs at gmail.com (Peter Jung) Date: Thu, 6 Jun 2013 21:56:32 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: Am 06.06.2013 um 21:36 schrieb "Vernon D. Cole" : > > On Thu, Jun 6, 2013 at 12:47 PM, wrote: >> >> Does it? All the languages which only allow += on immutable values (e.g. >> Java, C, Javascript) obviously don't mutate anything. > > Eh? It's been twenty years or so since I programmed C for a living, but I seem to remember that a C variable is actually only a handy label for a memory location, and that therefore *everything* is mutable -- even when it shouldn't be. And += was invariably implemented in machine language by a two-argument add straight to that memory location. Some of the early C compilers would even allow you to use a literal on the left side, like "2 += 3" which would change the value of the literal '2' to become five. Made for some interesting bugs! > Agreed 100% > Ignoring all of that, if + worked on dictionaries, I would expect that duplicate keys would receive the value from the right-hand argument, this in fact would be the only order consistent with the evaluation order defined in python: A+B+C is interpreted as (A+B)+C, always. > and that the operation would create a new dictionary. (and I think it would be a great +1 idea.) I disagree. With C=A A+=B I'd expect A==C still to be true, whatever B would be. This is how all nonatomic types in python behave (the symbol is just a pointer) and a dict is not atomic by any means. 
> -- > Vernon Cole > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jun 6 21:41:35 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 06 Jun 2013 12:41:35 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> Message-ID: <51B0E5EF.4010508@stoneleaf.us> On 06/06/2013 12:20 PM, Joao S. O. Bueno wrote: > On 6 June 2013 13:43, Ethan Furman wrote: >> On 06/06/2013 09:19 AM, Joao S. O. Bueno wrote: >>> >>> What if instead of simply checking if a key exists or not, these operators >>> just operate themselves recursively into the values() ? >>> >>> It is not all unexpected, as "==" already does that - I still don't understand this comment on '=='. >>> so "dct3 = dct1 + dct2" would actually perform: >>> >>> dct3 = dct1.copy() >>> for k, v in dct2.items(): >>> dct3[k] = dct3[k] + v if k in dct3 else v >>> >>> -In that case, it would make more sense to make use of >>> "or" instead of "|" - although other binary logic and aritmetic >>> operators could do the same. >>> >>> But that would bring no surprises to the already working-fine logic >>> of counters. >> >> >> No surprises? What about when the function suddenly receives a list when it >> wasn't expecting one? Or did I totally misunderstand? > > That is no surprises - there won't be a list in there if it was not > the programer's intention: > >>>> dict1["arg1"] = 5 >>>> dict2["arg1"] = 3 >>>> dict1 + dict2 > {"arg1": 8} This strikes me as very surprising. And what happens when arg1 is a str? Or some other interesting datatype? -1 from me. 
-- ~Ethan~ From jeanpierreda at gmail.com Thu Jun 6 23:30:41 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 6 Jun 2013 17:30:41 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <0530ED6A-F500-4585-AA26-D3BE3ABDEF60@gmail.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> <0530ED6A-F500-4585-AA26-D3BE3ABDEF60@gmail.com> Message-ID: On Thu, Jun 6, 2013 at 3:33 PM, Peter Jung wrote: > I'd take "+= is not allowed" as synonymous to "is immutable". >>> a = (1,2) >>> a += (3, 4) >>> a (1, 2, 3, 4) By this definition, tuples are not immutable. That would be nonstandard terminology. I think you've confused "mutability" of variables (more clearly: rebindability) with mutability of objects. -- Devin From jsbueno at python.org.br Thu Jun 6 23:36:51 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 6 Jun 2013 18:36:51 -0300 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0E5EF.4010508@stoneleaf.us> References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> Message-ID: On 6 June 2013 16:41, Ethan Furman wrote: > On 06/06/2013 12:20 PM, Joao S. O. Bueno wrote: >> >> On 6 June 2013 13:43, Ethan Furman wrote: >>> >>> On 06/06/2013 09:19 AM, Joao S. O. Bueno wrote: >>>> >>>> >>>> What if instead of simply checking if a key exists or not, these >>>> operators >>>> just operate themselves recursively into the values() ? >>>> >>>> >>>> It is not all unexpected, as "==" already does that - > > > I still don't understand this comment on '=='. 
>>> a = {"a": 1, "b": [2,3]} >>> b = {"a": 1, "b": [2,3]} >>> a == b True > > > >>>> so "dct3 = dct1 + dct2" would actually perform: >>>> >>>> dct3 = dct1.copy() >>>> for k, v in dct2.items(): >>>> dct3[k] = dct3[k] + v if k in dct3 else v >>>> >>>> -In that case, it would make more sense to make use of >>>> "or" instead of "|" - although other binary logic and aritmetic >>>> operators could do the same. >>>> >>>> But that would bring no surprises to the already working-fine logic >>>> of counters. >>> >>> >>> >>> No surprises? What about when the function suddenly receives a list when >>> it >>> wasn't expecting one? Or did I totally misunderstand? >> >> >> That is no surprises - there won't be a list in there if it was not >> the programer's intention: >> >>>>> dict1["arg1"] = 5 >>>>> dict2["arg1"] = 3 >>>>> dict1 + dict2 >> >> {"arg1": 8} > > > This strikes me as very surprising. And what happens when arg1 is a str? Or > some other interesting datatype? > What happens when you try to do "+" with incompatible data types? You would get a TypeError raised. That is it. Now - I think the proposed behavior on the thread for "dict1 + dict2" is allright - but a specialized method like the one you truncated on this reply would allow better control of what happens - otoh, one can easily subclass dict if such functionality is needed - I am withdrawing both proposals - both the enhanced "+" and the "operate" dict method. > -1 from me. 
> > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From phd at phdru.name Thu Jun 6 23:45:31 2013 From: phd at phdru.name (Oleg Broytman) Date: Fri, 7 Jun 2013 01:45:31 +0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <021943c5-a418-4c90-83d6-d643700108cb@email.android.com> <1370543783.40384.YahooMailNeo@web184702.mail.ne1.yahoo.com> <0530ED6A-F500-4585-AA26-D3BE3ABDEF60@gmail.com> Message-ID: <20130606214531.GA21935@iskra.aviel.ru> On Thu, Jun 06, 2013 at 05:30:41PM -0400, Devin Jeanpierre wrote: > On Thu, Jun 6, 2013 at 3:33 PM, Peter Jung wrote: > > I'd take "+= is not allowed" as synonymous to "is immutable". > > >>> a = (1,2) > >>> a += (3, 4) > >>> a > (1, 2, 3, 4) > > By this definition, tuples are not immutable. That would be > nonstandard terminology. > > I think you've confused "mutability" of variables (more clearly: > rebindability) with mutability of objects. It's quite easy to explain the difference: >>> l = [1, 2] >>> id(l) 3071923884L >>> l += [3, 4] >>> id(l) 3071923884L >>> >>> t = (1, 2) >>> id(t) 3074977068L >>> t += (3, 4) >>> id(t) 3075097020L Lists are updated in-place. Tuples are recreated. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From python at mrabarnett.plus.com Fri Jun 7 00:03:51 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Jun 2013 23:03:51 +0100 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B0E5EF.4010508@stoneleaf.us> References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> Message-ID: <51B10747.8000502@mrabarnett.plus.com> On 06/06/2013 20:41, Ethan Furman wrote: > On 06/06/2013 12:20 PM, Joao S. O. 
Bueno wrote: >> On 6 June 2013 13:43, Ethan Furman wrote: >>> On 06/06/2013 09:19 AM, Joao S. O. Bueno wrote: >>>> >>>> What if instead of simply checking if a key exists or not, these operators >>>> just operate themselves recursively into the values() ? >>>> >>>> It is not all unexpected, as "==" already does that - > > I still don't understand this comment on '=='. > > >>>> so "dct3 = dct1 + dct2" would actually perform: >>>> >>>> dct3 = dct1.copy() >>>> for k, v in dct2.items(): >>>> dct3[k] = dct3[k] + v if k in dct3 else v >>>> >>>> -In that case, it would make more sense to make use of >>>> "or" instead of "|" - although other binary logic and aritmetic >>>> operators could do the same. >>>> >>>> But that would bring no surprises to the already working-fine logic >>>> of counters. >>> >>> >>> No surprises? What about when the function suddenly receives a list when it >>> wasn't expecting one? Or did I totally misunderstand? >> >> That is no surprises - there won't be a list in there if it was not >> the programer's intention: >> >>>>> dict1["arg1"] = 5 >>>>> dict2["arg1"] = 3 >>>>> dict1 + dict2 >> {"arg1": 8} > > This strikes me as very surprising. And what happens when arg1 is a str? Or some other interesting datatype? > > -1 from me. > If you wanted to do that you'd use a Counter: >>> from collections import Counter >>> dict1 = Counter() >>> dict2 = Counter() >>> dict1["arg1"] = 5 >>> dict2["arg1"] = 3 >>> dict1 + dict2 Counter({'arg1': 8}) As for dicts, -1. 
From ncoghlan at gmail.com Fri Jun 7 00:38:48 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 7 Jun 2013 08:38:48 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B10747.8000502@mrabarnett.plus.com> References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: > If you wanted to do that you'd use a Counter: > > >>> from collections import Counter > >>> dict1 = Counter() > >>> dict2 = Counter() > > >>> dict1["arg1"] = 5 > >>> dict2["arg1"] = 3 > >>> dict1 + dict2 > Counter({'arg1': 8}) > > As for dicts, -1. *If* anything were to change for dicts, it would be to change or add methods (such as a new alternative constructor) rather than add operator support. Cheers, Nick. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Jun 7 02:06:57 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 07 Jun 2013 10:06:57 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <51B12421.20009@pearwood.info> On 07/06/13 00:54, Markus Unterwaditzer wrote: > This indicates for me that it generally should be possible to generate the union of two dicts with sth like {} + {}. What is the union of two dicts? Given: a = {'a':1, 'b':2, 'c':3} b = {'b':20, 'c':30, 'd':40} the union a+b could be any of: {'a':1, 'b':2, 'c':3, 'd':40} {'a':1, 'b':20, 'c':30, 'd':40} {'a':1, 'b':22, 'c':33, 'd':40} raise an exception due to duplicate keys and I am sure that there are use-cases for all four, or more, strategies. Since all of these can be easily done with a small helper function, that is probably the best way to do this. E.g. 
to implement the first behaviour: def merge(*dicts): D = {} for d in reversed(dicts): D.update(d) return D and then: spam(**merge(a, b)) solves the O.P.'s problem nicely. So I am -1 on both the original suggestion spam(**a, **b) and any particular union or merge operator for dicts. -- Steven From steve at pearwood.info Fri Jun 7 02:18:15 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 07 Jun 2013 10:18:15 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <51B126C7.7050903@pearwood.info> On 07/06/13 05:56, Peter Jung wrote: > I disagree. With > > C=A > A+=B > > I'd expect A==C still to be true, whatever B would be. > This is how all nonatomic types in Python behave (the symbol is just a pointer) and a dict is not atomic by any means. Strings are not atomic, nor are tuples: py> a = () py> c = a py> a += (1,) py> a == c False The relevant distinction here is not atomic or nonatomic, but mutable or immutable. -- Steven From steve at pearwood.info Fri Jun 7 02:21:33 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 07 Jun 2013 10:21:33 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: Message-ID: <51B1278D.7080003@pearwood.info> On 07/06/13 05:36, Vernon D. Cole wrote: > Some of the early C > compilers would even allow you to use a literal on the left side, like "2 > += 3" which would change the value of the literal '2' to become five. > Made for some interesting bugs! I have never heard this said about C. Are you sure you aren't thinking of Fortran? There was a well-known issue in *very* early Fortran compilers, FORTRAN 63 or even older if I remember correctly, that you could assign to a literal constant and change it. I would provide a link, but my google-fu is failing me at the moment.
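Of Steven's four union strategies above, his merge() implements the first (leftmost dict wins). The fourth, rejecting duplicate keys outright, can be sketched in the same style (strict_merge is an illustrative name, not an existing API):

```python
# Sketch of the "raise an exception due to duplicate keys" strategy
# from Steven's list; dict key views support & for the overlap test.

def strict_merge(*dicts):
    result = {}
    for d in dicts:
        dupes = result.keys() & d.keys()
        if dupes:
            raise ValueError("duplicate keys: %r" % sorted(dupes))
        result.update(d)
    return result

disjoint = strict_merge({'a': 1}, {'d': 40})   # {'a': 1, 'd': 40}

# Overlapping keys are refused instead of silently resolved:
# strict_merge({'a': 1, 'b': 2}, {'b': 20}) raises ValueError.
```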
-- Steven From ncoghlan at gmail.com Fri Jun 7 04:27:49 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 7 Jun 2013 12:27:49 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: On 7 June 2013 08:38, Nick Coghlan wrote: > > > If you wanted to do that you'd use a Counter: > > > > >>> from collections import Counter > > >>> dict1 = Counter() > > >>> dict2 = Counter() > > > > >>> dict1["arg1"] = 5 > > >>> dict2["arg1"] = 3 > > >>> dict1 + dict2 > > Counter({'arg1': 8}) > > > > As for dicts, -1. > > *If* anything were to change for dicts, it would be to change or add methods (such as a new alternative constructor) rather than add operator support. In an attempt to frame the discussion more productively: 1. Python already has a well-defined notion of what it means to merge two dictionaries: d1.update(d2) 2. This notion is asymmetric, thus it makes sense to use an asymmetric notation for it (specifically, a method call) rather than a traditionally symmetric binary operator notation like + or |. 3. However, like other mutating methods on builtin types, dict.update does *not* return a reference to the original object (this is deliberate, to encourage the treatment of containers as immutable when using a functional programming style) 4. Thus, just as sorted() was added as a functional counterpart to list.sort, is there something that can be added as a functional counterpart to dict.update? There are a few available responses to this kind of question: 1. Do nothing and preserve the status quo 2. Add a new standard library function. 3. Add a new non-mutating instance method. 4. Change the instance constructor (or add an alternate constructor). 5. Add a new builtin 6. Add new syntax Merging dictionaries isn't a common enough use case for options 5 or 6 to be on the table. 
The signature of the dict constructor (and that of dict.update) is already incredibly complicated, so you really wouldn't want to add support for multiple positional arguments to those. Coming up with a good name for an alternate constructor is also difficult, effectively ruling out option 4 as well. The following variants (based on Haoyi's list of existing alternatives) pretty much cover option 1: z = dict(x.items() + y.items()) z = dict(ChainMap(y, x)) # Note the reversed order of the arguments! z = copy_and_update(x, y) Where ChainMap is collections.ChainMap and copy_and_update is something like: def copy_and_update(base, *others): result = base.copy() for other in others: result.update(other) return result That last alternative also suggests possible names for a standard library function (collections.copy_and_update) or a new instance method (dict.copy_and_update and collections.Mapping.copy_and_update). The need to reverse the arguments to ChainMap to get the standard update behaviour, together with the fact an instance method would need to be implemented in at least two places, means that I am +0 on the idea of adding a helper function to collections (in the spirit of providing one-obvious-way to do it), -0 for adding an instance method or retaining the status quo indefinitely, but -1 for the other alternatives. Cheers, Nick. > > Cheers, > Nick. 
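Two details of Nick's framing are easy to verify in a few lines: dict.update really does return None (point 3, so it cannot be chained), and ChainMap needs its arguments in the opposite order to reproduce update semantics, because the *first* ChainMap argument wins lookups while the *last* update() call wins.

```python
from collections import ChainMap

# copy_and_update as given in the post above.
def copy_and_update(base, *others):
    result = base.copy()
    for other in others:
        result.update(other)
    return result

# Point 3: the mutating method returns None rather than the dict.
assert {'a': 1}.update({'b': 2}) is None

x = {'a': 1, 'b': 2}
y = {'b': 20, 'c': 30}

via_update = copy_and_update(x, y)   # last update wins: y's 'b'
via_chainmap = dict(ChainMap(y, x))  # first ChainMap argument wins: also y's 'b'

assert via_update == via_chainmap == {'a': 1, 'b': 20, 'c': 30}
```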
> > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From vince.vinet at gmail.com Fri Jun 7 05:02:34 2013 From: vince.vinet at gmail.com (Vince Vinet) Date: Thu, 6 Jun 2013 23:02:34 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: On Thu, Jun 6, 2013 at 10:27 PM, Nick Coghlan wrote: > > In an attempt to frame the discussion more productively: > > 1. Python already has a well-defined notion of what it means to merge > two dictionaries: d1.update(d2) > In the context of the original post and using ** multiple times, I am not so sure about what it would actually mean to provide the same argument multiple times. Let us consider the following call: >>> foo(a=1, **{"a":2}) Traceback (most recent call last): ... TypeError: foo() got multiple values for keyword argument 'a' If functions were to support multiple **kw, an error on multiple keys would seem (to me) to be the current behavior. I do not think everyone would have the same expectation as me concerning what should happen, or that any particular behavior is obvious. As far as providing other ways to do the call itself, functools.partial is yet another... partial(foo, **d1)(**d2) which overrides d1 with d2. -Vince -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From haoyi.sg at gmail.com Fri Jun 7 05:08:52 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 6 Jun 2013 23:08:52 -0400 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: I agree with everything Nick said. Here's another idea I'll throw out there for option #3: dict_a.updated(dict_b) - Asymmetry is obvious (In contrast to `dict_a + dict_b` or `updated(dict_a, dict_b)`) - Concise with obvious intent (contrast with the various snippets pulled from SO) - update/updated mirrors sort/sorted reverse/reversed (contrast to `x.update(y)`/`collections.copy_and_update(x, y)`) There's a bit of unconventionality in that it's a method rather than a builtin (bullet 3), but even that has benefits in terms of making the asymmetry clear (bullet 1). On Thu, Jun 6, 2013 at 10:27 PM, Nick Coghlan wrote: > On 7 June 2013 08:38, Nick Coghlan wrote: > > > > > If you wanted to do that you'd use a Counter: > > > > > > >>> from collections import Counter > > > >>> dict1 = Counter() > > > >>> dict2 = Counter() > > > > > > >>> dict1["arg1"] = 5 > > > >>> dict2["arg1"] = 3 > > > >>> dict1 + dict2 > > > Counter({'arg1': 8}) > > > > > > As for dicts, -1. > > > > *If* anything were to change for dicts, it would be to change or add > methods (such as a new alternative constructor) rather than add operator > support. > > In an attempt to frame the discussion more productively: > > 1. Python already has a well-defined notion of what it means to merge > two dictionaries: d1.update(d2) > > 2. This notion is asymmetric, thus it makes sense to use an asymmetric > notation for it (specifically, a method call) rather than a > traditionally symmetric binary operator notation like + or |. > > 3. 
However, like other mutating methods on builtin types, dict.update > does *not* return a reference to the original object (this is > deliberate, to encourage the treatment of containers as immutable when > using a functional programming style) > > 4. Thus, just as sorted() was added as a functional counterpart to > list.sort, is there something that can be added as a functional > counterpart to dict.update? > > There are a few available responses to this kind of question: > > 1. Do nothing and preserve the status quo > 2. Add a new standard library function. > 3. Add a new non-mutating instance method. > 4. Change the instance constructor (or add an alternate constructor). > 5. Add a new builtin > 6. Add new syntax > > Merging dictionaries isn't a common enough use case for options 5 or 6 > to be on the table. > > The signature of the dict constructor (and that of dict.update) is > already incredibly complicated, so you really wouldn't want to add > support for multiple positional arguments to those. Coming up with a > good name for an alternate constructor is also difficult, effectively > ruling out option 4 as well. > > The following variants (based on Haoyi's list of existing > alternatives) pretty much cover option 1: > > z = dict(x.items() + y.items()) > z = dict(ChainMap(y, x)) # Note the reversed order of the arguments! > z = copy_and_update(x, y) > > Where ChainMap is collections.ChainMap and copy_and_update is something > like: > > def copy_and_update(base, *others): > result = base.copy() > for other in others: > result.update(other) > return result > > That last alternative also suggests possible names for a standard > library function (collections.copy_and_update) or a new instance > method (dict.copy_and_update and collections.Mapping.copy_and_update). 
> > The need to reverse the arguments to ChainMap to get the standard > update behaviour, together with the fact an instance method would need > to be implemented in at least two places, means that I am +0 on the > idea of adding a helper function to collections (in the spirit of > providing one-obvious-way to do it), -0 for adding an instance method > or retaining the status quo indefinitely, but -1 for the other > alternatives. > > Cheers, > Nick. > > > > > > Cheers, > > Nick. > > > > > > > > > > > _______________________________________________ > > > Python-ideas mailing list > > > Python-ideas at python.org > > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jun 7 05:25:29 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 7 Jun 2013 13:25:29 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: On 7 June 2013 13:02, Vince Vinet wrote: > > On Thu, Jun 6, 2013 at 10:27 PM, Nick Coghlan wrote: >> >> In an attempt to frame the discussion more productively: >> >> 1. Python already has a well-defined notion of what it means to merge >> two dictionaries: d1.update(d2) > > > In the context of the original post and using ** multiple times That syntax change is never going to happen, and hence has no further implications for any design changes. I explained why new syntax isn't an option in my post. 
All current mechanisms involved merging the dictionaries *first*, and thus always use the principle of rebinding keys to refer to different values (unless someone completely ignores dict.update and creates their own merging mechanism). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jun 7 05:31:37 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 7 Jun 2013 13:31:37 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: On 7 June 2013 13:08, Haoyi Li wrote: > I agree with everything Nick said. > > Here's another idea I'll throw out there for option #3: > > dict_a.updated(dict_b) > > - Asymmetry is obvious (In contrast to `dict_a + dict_b` or `updated(dict_a, > dict_b)`) > - Concise with obvious intent (contrast with the various snippets pulled > from SO) > - update/updated mirrors sort/sorted reverse/reversed (contrast to > `x.update(y)`/`collections.copy_and_update(x, y)`) > > There's a bit of unconventionality in that it's a method rather than a > builtin (bullet 3), but even that has benefits in terms of making the > asymmetry clear (bullet 1). I did consider that, but I didn't like the idea of hinging such a significant difference in semantics on a single "d". That's a bug magnet if ever I saw one. If you consider it as the possible name of a class method instead of an instance method, you still have a bug magnet problem, in that "dict_a.updated(dict_b)" would mean the same thing as "dict.updated(dict_b)", which almost certainly isn't what the programmer wanted. By contrast, a collections.copy_and_update(dict_a, dict_b) function is explicit and has the virtue of working with any ducktyped container with copy() and update() methods, rather than being specific to dict and collections.Mapping. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Fri Jun 7 07:04:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 6 Jun 2013 22:04:08 -0700 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: On Jun 6, 2013, at 20:31, Nick Coghlan wrote: > On 7 June 2013 13:08, Haoyi Li wrote: >> I agree with everything Nick said. >> >> Here's another idea I'll throw out there for option #3: >> >> dict_a.updated(dict_b) >> >> - Asymmetry is obvious (In contrast to `dict_a + dict_b` or `updated(dict_a, >> dict_b)`) >> - Concise with obvious intent (contrast with the various snippets pulled >> from SO) >> - update/updated mirrors sort/sorted reverse/reversed (contrast to >> `x.update(y)`/`collections.copy_and_update(x, y)`) >> >> There's a bit of unconventionality in that it's a method rather than a >> builtin (bullet 3), but even that has benefits in terms of making the >> asymmetry clear (bullet 1). > > I did consider that, but I didn't like the idea of hinging such a > significant difference in semantics on a single "d". That's a bug > magnet if ever I saw one. At least it would be a very obvious bug, because update returns None. But I think you already have a perfect argument against this below, so it doesn't really matter. > If you consider it as the possible name of a class method instead of > an instance method, you still have a bug magnet problem, in that > "dict_a.updated(dict_b)" would mean the same thing as > "dict.updated(dict_b)", which almost certainly isn't what the > programmer wanted. 
This would surely be an error for too few args, so again, dead obvious -- but again, doesn't matter, because: > By contrast, a collections.copy_and_update(dict_a, dict_b) function is > explicit and has the virtue of working with any ducktyped container > with copy() and update() methods, rather than being specific to dict > and collections.Mapping. To me, this clearly beats all other considerations for all of the other options except for operators. If you rule out __add__ and __iadd__, nothing else has any significant compensating advantages. Personally I'm still not sold on the operators being a bad idea, but it sounds like it's a non-starter. So, the only question is whether this actually needs to be in collections, or whether it's so trivial that it can just be mentioned in a tutorial or recipe or something for novices and that's all that's needed. From ncoghlan at gmail.com Fri Jun 7 08:47:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 7 Jun 2013 16:47:35 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: On 7 June 2013 15:04, Andrew Barnert wrote: > On Jun 6, 2013, at 20:31, Nick Coghlan wrote: >> By contrast, a collections.copy_and_update(dict_a, dict_b) function is >> explicit and has the virtue of working with any ducktyped container >> with copy() and update() methods, rather than being specific to dict >> and collections.Mapping. > > To me, this clearly beats all other considerations for all of the other options except for operators. If you rule out __add__ and __iadd__, nothing else has any significant compensating advantages. > > Personally I'm still not sold on the operators being a bad idea, but it sounds like it's a non-starter.
> > So, the only question is whether this actually needs to be in collections, or whether it's so trivial that it can just be mentioned in a tutorial or recipe or something for novices and that's all that's needed. You just nailed why I'm only +0 instead of +1. In the "For" column, we have the advantage of making the obvious way to do it (a helper function) more obvious by providing that function as an included battery in the collections module. In the "Against" column, we have the general principle "not every three-line function needs to be a builtin (or in the standard library)". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Fri Jun 7 09:31:38 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 07 Jun 2013 17:31:38 +1000 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> Message-ID: <51B18C5A.9010200@pearwood.info> On 07/06/13 12:27, Nick Coghlan wrote: > In an attempt to frame the discussion more productively: > > 1. Python already has a well-defined notion of what it means to merge > two dictionaries: d1.update(d2) With the proviso that while it might be Python's idea of merging two dicts, it isn't necessarily the most useful way of merging two dicts. I've sometimes found myself needing to ignore dict.update and write my own. This is why I'm only luke-warm, at best, for any of these proposals. I expect that whatever solution gets taken up, if any, I'm still going to need to write my own helper function from time to time :-) > 2. This notion is asymmetric, thus it makes sense to use an asymmetric > notation for it (specifically, a method call) rather than a > traditionally symmetric binary operator notation like + or |. The usual term for this is "commutative".
But note that many non-commutative operations are still written with binary operators. We don't get confused by the fact that: 7-2 != 2-7 1/5 != 5/1 3**4 != 4**3 or even: "foo" + "bar" != "bar" + "foo" so while the non-commutativity of (some definitions of) dict merging can be taken as a weak argument against using an operator, it is a *very* weak argument. If I were to argue against an operator, I'd be more inclined to argue the more general point "operator overloading considered harmful" than "merging dicts is non-commutative". > 3. However, like other mutating methods on builtin types, dict.update > does *not* return a reference to the original object (this is > deliberate, to encourage the treatment of containers as immutable when > using a functional programming style) > > 4. Thus, just as sorted() was added as a functional counterpart to > list.sort, is there something that can be added as a functional > counterpart to dict.update? At this point, I'd begin to think of a built-in updated function. Here's one possible implementation, slightly different from yours below: def updated(*mappings, **kw): new = {} for mapping in mappings + (kw,): new.update(mapping) return new Given how trivial it is, I'm not sure it's even worthwhile. But then, prior to sorted and reversed being added as built-ins, people argued that they were "too trivial" too. On the third hand, the existence of two, or more, implementations with subtly different semantics is an argument for picking the most useful version and putting it in the standard library. So I'm now +0 for a built-in updated and +0.5 for collections.updated. [...] > def copy_and_update(base, *others): > result = base.copy() > for other in others: > result.update(other) > return result > > That last alternative also suggests possible names for a standard > library function (collections.copy_and_update) or a new instance > method (dict.copy_and_update and collections.Mapping.copy_and_update).
But we don't write "copy_and_sort(L)", we write "sorted(L)". It's enough to document that the function returns a new dict, no need to explicitly state it copies its arguments. It's possible to be *too explicit* :-) -- Steven From stephen at xemacs.org Fri Jun 7 09:23:35 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 07 Jun 2013 16:23:35 +0900 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> Message-ID: <87fvwu9yfc.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > On Jun 6, 2013, at 8:43, Haoyi Li wrote: > There should be one way to do it...I'd agree except, well, there > are two different things that people want to do here! > Actually, there's three: Four: associate a multivalue containing all of the individual values to the key (sort of a set-valued Counter). (This comes up in practice in applications that assume UUIDs on the Internet, especially in mail. For example, when I receive a post via both a mailing list and a personal copy, eventually I want to save the mailing list's version in preference to the personal copy. But I only want to display one version in the MUA. In archiving, you run into the occasional author who provides their own non-unique message ID conflicting with a *different* message. Etc.) And somebody was just asking for Counter addition. Counter is a dict subclass, so "dict(...) + dict(...) = updated_dict" would imply really perverse semantics for "Counter(...) + Counter(...)". How about extending .update to take multiple positional arguments? Then the TOOWTDI idiom would be {}.update(d1, d2, ...) The "first found wins" interpretation could use a different method: {}.extend(d1, d2, ...)
From masklinn at masklinn.net Fri Jun 7 10:07:21 2013 From: masklinn at masklinn.net (Masklinn) Date: Fri, 7 Jun 2013 10:07:21 +0200 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <87fvwu9yfc.fsf@uwakimon.sk.tsukuba.ac.jp> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <87fvwu9yfc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <99DB21E2-5EC4-4102-9BE4-7CBE8707D7BA@masklinn.net> On 2013-06-07, at 09:23 , Stephen J. Turnbull wrote: > Andrew Barnert writes: >> On Jun 6, 2013, at 8:43, Haoyi Li wrote: > >> There should be one way to do it...I'd agree except, well, there >> are two different things that people want to do here! > >> Actually, there's three: > > Four: associate a multivalue containing all of the individual values > to the key (sort of a set-valued Counter). (This comes up in practice > in applications that assume UUIDs on the Internet, especially in mail. > For example, when I receive a post via both a mailing list and a > personal copy, eventually I want to save the mailing list's version in > preference to the personal copy. But I only want to display one > version in the MUA. In archiving, you run into the occasional author > who provides their own non-unique message ID conflicting with a > *different* message. Etc.) > > And somebody was just asking for Counter addition. Counter is a dict > subclass, so "dict(...) + dict(...) = updated_dict" would imply really > perverse semantics for "Counter(...) + Counter(...)". > > How about extending .update to take multiple positional arguments? > Then TOOWTDI idiom would be > > {}.update(d1, d2, ?) One of the rather annoying things in dict.update is that it alters the caller in place, so it can't be used to merge multiple dicts in an expression. dict(d1, **d2) works to a certain extent, but is not ideal either. > The "first found wins" interpretation could use a different method: > > {}.extend(d1, d2, ?) 
IIRC in underscore.js this is called "defaults", d1, d2, ? are a "stack" of applicable defaults, and thus are merged into the subject if and only if the corresponding keys are missing from the subject. From storchaka at gmail.com Fri Jun 7 12:06:35 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 07 Jun 2013 13:06:35 +0300 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <51B18C5A.9010200@pearwood.info> References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> <51B18C5A.9010200@pearwood.info> Message-ID: 07.06.13 10:31, Steven D'Aprano ???????(??): > At this point, I'd begin to think of a built-in updated function. Here's > one possible implementation, slightly different from yours below: > > def updated(*mappings, **kw): > new = {} > for mapping in mappings + (kw,): > new.update(mapping) > return new Or def updated(*mappings, **kw): from collections import ChainMap return ChainMap((kw,) + mappings[::-1]) > Given how trivial is it, I'm not sure it's even worthwhile. But then, > prior to sorted and reversed being added as built-ins, people argued > that they were "too trivial" too. On the third hand, the existence of > two, or more, implementations with subtly different semantics is an > argument on picking the most useful version and putting it in the > standard library. > > So I'm now +0 for a built-in updated and +0.5 for collections.updated. I'm -1 for both. The collections module already contains an appropriate tool. And if ChainMap() is not a builtin, why updated() should be? 
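Serhiy's `ChainMap` snippet above passes a single tuple to the constructor, but `ChainMap` actually takes its maps as separate arguments (leftmost map wins on lookup). A working version of that sketch, with the keyword dict first and the positional mappings reversed so that later arguments win, would be:

```python
from collections import ChainMap

# Corrected sketch of the ChainMap-based updated() from the message
# above: maps are passed as separate arguments, not as one tuple.
def updated(*mappings, **kw):
    return ChainMap(kw, *mappings[::-1])

view = updated({"a": 1, "b": 2}, {"b": 20}, b=200)
assert view["a"] == 1
assert view["b"] == 200
```

As the follow-up messages point out, this returns a view, not a copy; wrap it in `dict(...)` if a snapshot is needed.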
From yoavglazner at gmail.com Fri Jun 7 12:27:48 2013 From: yoavglazner at gmail.com (yoav glazner) Date: Fri, 7 Jun 2013 13:27:48 +0300 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> <51B18C5A.9010200@pearwood.info> Message-ID: On Jun 7, 2013 1:08 PM, "Serhiy Storchaka" wrote: > > 07.06.13 10:31, Steven D'Aprano ???????(??): > >> At this point, I'd begin to think of a built-in updated function. Here's >> one possible implementation, slightly different from yours below: >> >> def updated(*mappings, **kw): >> new = {} >> for mapping in mappings + (kw,): >> new.update(mapping) >> return new > > > Or > > def updated(*mappings, **kw): > from collections import ChainMap > return ChainMap((kw,) + mappings[::-1]) > > >> Given how trivial is it, I'm not sure it's even worthwhile. But then, >> prior to sorted and reversed being added as built-ins, people argued >> that they were "too trivial" too. On the third hand, the existence of >> two, or more, implementations with subtly different semantics is an >> argument on picking the most useful version and putting it in the >> standard library. >> >> So I'm now +0 for a built-in updated and +0.5 for collections.updated. > > > I'm -1 for both. The collections module already contains an appropriate tool. And if ChainMap() is not a builtin, why updated() should be? > ChainMap does not return a copy its a view -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Fri Jun 7 12:54:24 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 07 Jun 2013 13:54:24 +0300 Subject: [Python-ideas] Allow using ** twice In-Reply-To: References: <51B0B2B7.4070306@mrabarnett.plus.com> <51B0BC4E.9030907@stoneleaf.us> <51B0E5EF.4010508@stoneleaf.us> <51B10747.8000502@mrabarnett.plus.com> <51B18C5A.9010200@pearwood.info> Message-ID: 07.06.13 13:27, yoav glazner ???????(??): > ChainMap does not return a copy its a view This is its advantage. As many functions and methods in Python 3 return view or iterator instead of a copy. You always can get a dict with explicit constructor: dict(ChainMap(...)). From ericsnowcurrently at gmail.com Fri Jun 7 18:06:51 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 7 Jun 2013 10:06:51 -0600 Subject: [Python-ideas] Allow using ** twice In-Reply-To: <99DB21E2-5EC4-4102-9BE4-7CBE8707D7BA@masklinn.net> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <87fvwu9yfc.fsf@uwakimon.sk.tsukuba.ac.jp> <99DB21E2-5EC4-4102-9BE4-7CBE8707D7BA@masklinn.net> Message-ID: On Fri, Jun 7, 2013 at 2:07 AM, Masklinn wrote: > On 2013-06-07, at 09:23 , Stephen J. Turnbull wrote: >> How about extending .update to take multiple positional arguments? >> Then TOOWTDI idiom would be >> >> {}.update(d1, d2, ?) > > One of the rather annoying things in dict.update is that it alters the > caller in place, so it can't be used to merge multiple dicts in an > expression. dict(d1, **d2) works to a certain extent, but is not ideal > either. I've sometimes found it useful to have the copy() method on some of my classes to take a signature similar to that of dict.update(), rather than no arguments. The semantics are more akin to namedtuple's _replace(), where a copy is made with certain things replaced. Ultimately it's equivalent to `copied = myclass(); copied.update(**other)`. 
So it's like a version of update that returns a new updated copy rather than updating in place. Applying this to dict, Suppose you could replace dict's copy() method (for which I'm not advocating). You could do something like this: old_copy = dict.copy @wraps(dict.copy) def new_copy_for_dict(self, *args, **kwargs): copied = old_copy(self) copied.update(*args, **kwargs) return copied dict.copy = new_copy_for_dict d = {1:2, 3:4} updated = d.copy({5:6}) assert d == {1:2, 3:4} assert updated == {1:2, 3:4, 5:6} assert d is not updated or to go along with the original request: f(**d1.copy(d2)) f(**d1.copy(d2.copy(d3...))) Doing this to dict.copy() might not be the right thing to do, but I'd like to suggest this as one of the possibilities to consider if anything progresses out of this discussion. In my mind it is the simplest way to address the issue and introduces the pattern for explicitly creating updated copies of mutable objects. -eric From stephen at xemacs.org Sat Jun 8 07:47:43 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 08 Jun 2013 14:47:43 +0900 Subject: [Python-ideas] Adding dictionary merge operator(s) [was: Allow using ** twice] In-Reply-To: <99DB21E2-5EC4-4102-9BE4-7CBE8707D7BA@masklinn.net> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <87fvwu9yfc.fsf@uwakimon.sk.tsukuba.ac.jp> <99DB21E2-5EC4-4102-9BE4-7CBE8707D7BA@masklinn.net> Message-ID: <878v2l9mrk.fsf@uwakimon.sk.tsukuba.ac.jp> Updating the topic. Masklinn writes: > One of the rather annoying things in dict.update is that it alters the > caller in place, If it didn't, it wouldn't be "update", it would be "merge". > so it can't be used to merge multiple dicts in an expression. The fundamental problem is that "dict.merge" is not a well-defined operation. 
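CPython will not actually let you assign to `dict.copy` as Eric's illustration does, but the same copy-and-update pattern can be demonstrated on a `dict` subclass (a sketch of the idea, not his literal proposal):

```python
# Copy-and-update pattern from the message above, shown on a subclass
# because built-in dict methods cannot be monkeypatched in CPython.
class UpdatableDict(dict):
    def copy(self, *args, **kwargs):
        copied = UpdatableDict(self)
        copied.update(*args, **kwargs)
        return copied

d = UpdatableDict({1: 2, 3: 4})
updated = d.copy({5: 6})
assert d == {1: 2, 3: 4}
assert updated == {1: 2, 3: 4, 5: 6}
assert d is not updated
```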
Instead, it's more like image composition, which is a family of trinary operators (something like 26 of them), combining a target, source, and alpha mask in various ways. "Update" presumably is the most often used operator, but once you get into *any* variations from that, there are lots of plausible ones. > > The "first found wins" interpretation could use a different method: > > > > {}.extend(d1, d2, ?) > > IIRC in underscore.js this is called "defaults", d1, d2, ? are a > "stack" of applicable defaults, and thus are merged into the > subject if and only if the corresponding keys are missing from > the subject. The "stack of defaults" interpretation of the list seems natural to me, but it's not obvious to me whether the list *constructs* the stack by pushing each argument in turn (ie, di is consulted in R2L order), or *is* the stack (so each di is consulted in L2R order). I guess I would use "fallback", so that d0.fallback(d1, d2, ...) reads "start with d0, fall back to d1 for missing keys, fall back to d2 for still missing keys, ...". YMMV. Then-again-maybe-I've-just-been-programming-Lisp-for-too-long-ly y'rs, From guido at python.org Sat Jun 8 08:17:39 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 7 Jun 2013 23:17:39 -0700 Subject: [Python-ideas] Adding dictionary merge operator(s) [was: Allow using ** twice] In-Reply-To: <878v2l9mrk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <497e71b6-7699-46c6-a1c8-5185ba64c811@email.android.com> <0A34E9EB-B34B-4DA8-8873-FA5ABCA123D2@yahoo.com> <87fvwu9yfc.fsf@uwakimon.sk.tsukuba.ac.jp> <99DB21E2-5EC4-4102-9BE4-7CBE8707D7BA@masklinn.net> <878v2l9mrk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Actually the C API recognizes exactly two variations. See PyDict_Merge in http://docs.python.org/3/c-api/dict.html. On Friday, June 7, 2013, Stephen J. Turnbull wrote: > Updating the topic. 
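The "fallback" reading Stephen describes can be sketched as a free function. The name and the left-to-right order (each later dict is consulted only for still-missing keys) are assumptions taken from his wording, not an existing API:

```python
# Sketch of d0.fallback(d1, d2, ...) as described above: start with d0,
# fill in missing keys from each later dict in turn (first found wins).
def fallback(d0, *defaults):
    result = dict(d0)
    for d in defaults:
        for k, v in d.items():
            result.setdefault(k, v)
    return result

out = fallback({"a": 1}, {"a": 9, "b": 2}, {"b": 99, "c": 3})
assert out == {"a": 1, "b": 2, "c": 3}
```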
> > Masklinn writes: > > > One of the rather annoying things in dict.update is that it alters the > > caller in place, > > If it didn't, it wouldn't be "update", it would be "merge". > > > so it can't be used to merge multiple dicts in an expression. > > The fundamental problem is that "dict.merge" is not a well-defined > operation. Instead, it's more like image composition, which is a > family of trinary operators (something like 26 of them), combining a > target, source, and alpha mask in various ways. > > "Update" presumably is the most often used operator, but once you get > into *any* variations from that, there are lots of plausible ones. > > > > The "first found wins" interpretation could use a different method: > > > > > > {}.extend(d1, d2, ?) > > > > IIRC in underscore.js this is called "defaults", d1, d2, ? are a > > "stack" of applicable defaults, and thus are merged into the > > subject if and only if the corresponding keys are missing from > > the subject. > > The "stack of defaults" interpretation of the list seems natural to > me, but it's not obvious to me whether the list *constructs* the stack > by pushing each argument in turn (ie, di is consulted in R2L order), > or *is* the stack (so each di is consulted in L2R order). I guess I > would use "fallback", so that > > d0.fallback(d1, d2, ...) > > reads "start with d0, fall back to d1 for missing keys, fall back to > d2 for still missing keys, ...". YMMV. > > Then-again-maybe-I've-just-been-programming-Lisp-for-too-long-ly y'rs, > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From techtonik at gmail.com Sat Jun 8 15:13:22 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 8 Jun 2013 16:13:22 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts Message-ID: Without reading subject of this letter, what is your idea about which encoding Python 3 uses with open() calls on a text file? Please write in reply and then scroll down. Without cheating my opinion was cp1252 (latin-1), because it was the way Python 2 assumed all text files are. Or Python 2 uses ISO-8859-1? But it appeared to be different way - http://docs.python.org/3/library/functions.html#open. No, it appeared here - https://bitbucket.org/techtonik/hexdump/pull-request/1/ and after a small lecture I realized how things are bad. open() in Python uses system encoding to read files by default. So, if Python script writes text file with some Cyrillic character on my Russian Windows, another Python script on English Windows or Greek Windows will not be able to read it. This is just what happened. The solution proposed is to specify encoding explicitly. That means I have to know it. Luckily, in this case the text file is my .py where I knew the encoding beforehand. In real world you can never know the encoding beforehand. So, what should Python do if it doesn't know the encoding of text file it opens: 1. Assume that encoding of text file is the encoding of your operating system 2. Assume that encoding of text file is ASCII 3. Assume that encoding of text file is UTF-8 Please write in reply and then scroll down. I propose three, because ASCII is a binary compatible subset of UTF-8. Choice one is the current behaviour, and it is very bad. Troubleshooting this issue, which should be very common, requires a lot of prior knowledge about encodings and awareness of difference system defaults. For cross-platform work with text files this fact implicitly requires you to always use 'encoding' parameter for open(). 
Is it enough for a PEP? This stuff is rather critical IMO, so even if it will be rejected there will be a documented design decision. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Sat Jun 8 15:57:24 2013 From: dholth at gmail.com (Daniel Holth) Date: Sat, 8 Jun 2013 09:57:24 -0400 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: I certainly wish it was all utf8 all the time. There are other encodings?! ;-) On Jun 8, 2013 9:14 AM, "anatoly techtonik" wrote: > Without reading subject of this letter, what is your idea about which > encoding Python 3 uses with open() calls on a text file? Please write in > reply and then scroll down. > > > Without cheating my opinion was cp1252 (latin-1), because it was the way > Python 2 assumed all text files are. Or Python 2 uses ISO-8859-1? > > But it appeared to be different way - > http://docs.python.org/3/library/functions.html#open. No, it appeared > here - https://bitbucket.org/techtonik/hexdump/pull-request/1/ and after > a small lecture I realized how things are bad. > > open() in Python uses system encoding to read files by default. So, if > Python script writes text file with some Cyrillic character on my Russian > Windows, another Python script on English Windows or Greek Windows will not > be able to read it. This is just what happened. > > The solution proposed is to specify encoding explicitly. That means I have > to know it. Luckily, in this case the text file is my .py where I knew the > encoding beforehand. In real world you can never know the encoding > beforehand. > > So, what should Python do if it doesn't know the encoding of text file it > opens: > 1. Assume that encoding of text file is the encoding of your operating > system > 2. Assume that encoding of text file is ASCII > 3. 
Assume that encoding of text file is UTF-8 > > Please write in reply and then scroll down. > > > I propose three, because ASCII is a binary compatible subset of UTF-8. > Choice one is the current behaviour, and it is very bad. Troubleshooting > this issue, which should be very common, requires a lot of prior knowledge > about encodings and awareness of difference system defaults. For > cross-platform work with text files this fact implicitly requires you to > always use 'encoding' parameter for open(). > > > Is it enough for a PEP? This stuff is rather critical IMO, so even if it > will be rejected there will be a documented design decision. > -- > anatoly t. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Sat Jun 8 16:07:01 2013 From: flying-sheep at web.de (Philipp A.) Date: Sat, 8 Jun 2013 16:07:01 +0200 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: from io import open at the top, problem solved ;) but i agree, letting Pat? do it for you would be nice! -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Sat Jun 8 16:07:46 2013 From: flying-sheep at web.de (Philipp A.) Date: Sat, 8 Jun 2013 16:07:46 +0200 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: 2013/6/8 Philipp A. > but i agree, letting Pat? do it for you would be nice! > oh, wrong list, excuse the spam (headdesk) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eliben at gmail.com Sat Jun 8 18:30:33 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 8 Jun 2013 09:30:33 -0700 Subject: [Python-ideas] A Simpler Enum For Python 3 In-Reply-To: <20130602172658.GA10367@acooke.org> References: <20130602172658.GA10367@acooke.org> Message-ID: > Not sure if this is appropriate or useful, so won't repeat, but I wrote an > alternative Enum implementation for Python 3, which is available at > https://github.com/andrewcooke/simple-enum > > It may have been appropriate and useful if you gave us the benefit of the doubt and wouldn't assume we are oblivious to the possibility of such an approach. Hint: it had seen probably at least 100 emails of discussion in this list (months back). Yes, it's way more fun to write flamy rants and new code than digging in mailing list archives; yet, language design often requires much of the latter as well. Other than that, it's perfectly fine to have alternative enum proposals out in the wild. I sincerely wish you well in the sense that if someone doesn't like the stdlib enum and wants to use something like your bnum, they will find it online and will happily use it. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Jun 8 18:48:46 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 08 Jun 2013 17:48:46 +0100 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: <51B3606E.6050305@mrabarnett.plus.com> On 08/06/2013 14:13, anatoly techtonik wrote: > Without reading subject of this letter, what is your idea about which > encoding Python 3 uses with open() calls on a text file? Please write in > reply and then scroll down. > > > Without cheating my opinion was cp1252 (latin-1), because it was the way > Python 2 assumed all text files are. Or Python 2 uses ISO-8859-1? 
> > But it appeared to be different way - > http://docs.python.org/3/library/functions.html#open. No, it appeared > here - https://bitbucket.org/techtonik/hexdump/pull-request/1/ and after > a small lecture I realized how things are bad. > > open() in Python uses system encoding to read files by default. So, if > Python script writes text file with some Cyrillic character on my > Russian Windows, another Python script on English Windows or Greek > Windows will not be able to read it. This is just what happened. > > The solution proposed is to specify encoding explicitly. That means I > have to know it. Luckily, in this case the text file is my .py where I > knew the encoding beforehand. In real world you can never know the > encoding beforehand. > > So, what should Python do if it doesn't know the encoding of text file > it opens: > 1. Assume that encoding of text file is the encoding of your operating > system > 2. Assume that encoding of text file is ASCII > 3. Assume that encoding of text file is UTF-8 > [snip] I always use '''encoding="utf-8"''', but it's annoying that it's not the default. 'open' defaults to universal newline support when opening for reading (though that's not possible when opening for writing!), and it would be nice if it also defaulted to a 'universal' encoding, i.e. UTF-8. You can still use '''encoding=None''' if you want the operating system's encoding. From eliben at gmail.com Sat Jun 8 19:00:22 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 8 Jun 2013 10:00:22 -0700 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B3606E.6050305@mrabarnett.plus.com> References: <51B3606E.6050305@mrabarnett.plus.com> Message-ID: [snip] > I always use '''encoding="utf-8"''', but it's annoying that it's not > the default. 
> > 'open' defaults to universal newline support when opening for reading > (though that's not possible when opening for writing!), and it would be > nice if it also defaulted to a 'universal' encoding, i.e. UTF-8. > > [snip] > You can still use '''encoding=None''' if you want the operating > system's encoding. > And this is the default... -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Jun 8 20:02:03 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 09 Jun 2013 03:02:03 +0900 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B3606E.6050305@mrabarnett.plus.com> References: <51B3606E.6050305@mrabarnett.plus.com> Message-ID: <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > 'open' defaults to universal newline support when opening for > reading (though that's not possible when opening for writing!), and > it would be nice if it also defaulted to a 'universal' encoding, > i.e. UTF-8. There's no such thing as a universal encoding. Unicode is a universal character set in the sense that it can encode all characters, but there is no universal encoding that can be used to decode all texts. If the OS's default encoding is not UTF-8, then you can and should bet that most files on that system will not be in UTF-8. That's still true today. Few users will be made happy by a Python that forces them to do something special to read files in the default encoding. The real question is why your system's default encoding is something that makes you unhappy, isn't it? 
From ubershmekel at gmail.com Sat Jun 8 20:26:01 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 8 Jun 2013 21:26:01 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jun 8, 2013 at 9:02 PM, Stephen J. Turnbull wrote: > MRAB writes: > > > 'open' defaults to universal newline support when opening for > > reading (though that's not possible when opening for writing!), and > > it would be nice if it also defaulted to a 'universal' encoding, > > i.e. UTF-8. > > There's no such thing as a universal encoding. Unicode is a > universal character set in the sense that it can encode all > characters, but there is no universal encoding that can be used to > decode all texts. > > If the OS's default encoding is not UTF-8, then you can and should bet > that most files on that system will not be in UTF-8. That's still > true today. Few users will be made happy by a Python that forces them > to do something special to read files in the default encoding. > > The real question is why your system's default encoding is something > that makes you unhappy, isn't it? > > The real question is on which side of the following tradeoff you want to be. Make python consistent and explicit across platforms when handling text files. Do you want this to work: open('//remote_pc/text_file', 'w').write('whatever') # on remote_pc open('text_file').read() Or have python implicitly play nice with the platform's native encoding, i.e. have this work: open('a_file_i_wrote_in_this_platforms_notepad').read() open('a_file_i_want_to_open_in_this_platforms_notepad', 'w').write('whatever') Both are legitimate decisions. 
Personally I favor the first because more often than not files aren't encoded in the platform's chosen encoding, so it's better to be explicit and consistent. Yuval utf-8-4-life -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Sat Jun 8 21:35:44 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 8 Jun 2013 12:35:44 -0700 Subject: [Python-ideas] A Simpler Enum For Python 3 In-Reply-To: <20130608180405.GV545@acooke.org> References: <20130602172658.GA10367@acooke.org> <20130608180405.GV545@acooke.org> Message-ID: On Sat, Jun 8, 2013 at 11:04 AM, andrew cooke wrote: > > > or, you know, you could have documented the design in the pep? isn't that > what's supposed to happen? > Did you read the PEP before you started spreading your FUD? http://www.python.org/dev/peps/pep-0435/#not-having-to-specify-values-for-enums > > if you want a secretay, pay them. > > I have absolutely no idea what you're talking about here. Eli > andrew > > > On Sat, Jun 08, 2013 at 09:30:33AM -0700, Eli Bendersky wrote: > > > Not sure if this is appropriate or useful, so won't repeat, but I > wrote an > > > alternative Enum implementation for Python 3, which is available at > > > https://github.com/andrewcooke/simple-enum > > > > > > > > It may have been appropriate and useful if you gave us the benefit of the > > doubt and wouldn't assume we are oblivious to the possibility of such an > > approach. Hint: it had seen probably at least 100 emails of discussion in > > this list (months back). Yes, it's way more fun to write flamy rants and > > new code than digging in mailing list archives; yet, language design > often > > requires much of the latter as well. > > > > Other than that, it's perfectly fine to have alternative enum proposals > out > > in the wild. 
I sincerely wish you well in the sense that if someone > doesn't > > like the stdlib enum and wants to use something like your bnum, they will > > find it online and will happily use it. > > > > Eli > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Jun 8 21:43:12 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 08 Jun 2013 20:43:12 +0100 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B38950.6010108@mrabarnett.plus.com> On 08/06/2013 19:02, Stephen J. Turnbull wrote: > MRAB writes: > > > 'open' defaults to universal newline support when opening for > > reading (though that's not possible when opening for writing!), and > > it would be nice if it also defaulted to a 'universal' encoding, > > i.e. UTF-8. > > There's no such thing as a universal encoding. Unicode is a > universal character set in the sense that it can encode all > characters, but there is no universal encoding that can be used to > decode all texts. > I didn't say "universal encoding", I said "'universal' encoding". :-) What I meant was that I'd prefer it to default to an encoding that was the same on all platforms, not whatever encoding _this_ machine happens to be using, which might be different from whatever encoding _that_ machine happens to be using. Or, in summary, I think that portability is more important. > If the OS's default encoding is not UTF-8, then you can and should bet > that most files on that system will not be in UTF-8. That's still > true today. Few users will be made happy by a Python that forces them > to do something special to read files in the default encoding. > It would be the default encoding only for the machine on which it was created. 
If I moved the file to another machine, however, I could get mojibake. > The real question is why your system's default encoding is something > that makes you unhappy, isn't it? > From victor.stinner at gmail.com Sun Jun 9 00:30:25 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 9 Jun 2013 00:30:25 +0200 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: Changing the default encoding of open() was already discussed 2 years ago. See this discussion: http://mail.python.org/pipermail/python-dev/2011-June/112086.html I did a long analysis of the Python standard library and I tried a modified Python with a default encoding set to utf-8. The conclusion is that the locale encoding is the least worst choice. The main reason is the compatibility with all other applications on the same computer. Using a different encoding than the locale encoding leads quickly to mojibake issues and other bugs. Just one example: configure script generates a Makefile using the locale encoding, Python gets data from Makefile. If you use a path with non-ascii character, use utf-8 in python whereas the locale is iso-8859-1, python cannot be compiled anymore or will refuse to start. Remember the zen of python: explicit is better of implicit. So set encoding parameter in your code. When i made the encoding mandatory in my test, more than 70% of calls to open() used encoding="locale". So it's simpler to keep the current default choice. The documentation can maybe be improved? Victor Le 8 juin 2013 15:14, "anatoly techtonik" a ?crit : > Without reading subject of this letter, what is your idea about which > encoding Python 3 uses with open() calls on a text file? Please write in > reply and then scroll down. > > > Without cheating my opinion was cp1252 (latin-1), because it was the way > Python 2 assumed all text files are. Or Python 2 uses ISO-8859-1? 
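Victor's practical advice above ("set encoding parameter in your code") looks like this in use; passing `encoding` explicitly makes the file round-trip identically regardless of the machine's locale:

```python
# Explicit encoding, per the recommendation above: the file reads back
# the same on any platform, independent of the locale encoding.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("привет")  # the Cyrillic text from the example scenario
with open(path, encoding="utf-8") as f:
    assert f.read() == "привет"
```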
> > But it appeared to be different way - > http://docs.python.org/3/library/functions.html#open. No, it appeared > here - https://bitbucket.org/techtonik/hexdump/pull-request/1/ and after > a small lecture I realized how things are bad. > > open() in Python uses system encoding to read files by default. So, if > Python script writes text file with some Cyrillic character on my Russian > Windows, another Python script on English Windows or Greek Windows will not > be able to read it. This is just what happened. > > The solution proposed is to specify encoding explicitly. That means I have > to know it. Luckily, in this case the text file is my .py where I knew the > encoding beforehand. In real world you can never know the encoding > beforehand. > > So, what should Python do if it doesn't know the encoding of text file it > opens: > 1. Assume that encoding of text file is the encoding of your operating > system > 2. Assume that encoding of text file is ASCII > 3. Assume that encoding of text file is UTF-8 > > Please write in reply and then scroll down. > > > I propose three, because ASCII is a binary compatible subset of UTF-8. > Choice one is the current behaviour, and it is very bad. Troubleshooting > this issue, which should be very common, requires a lot of prior knowledge > about encodings and awareness of difference system defaults. For > cross-platform work with text files this fact implicitly requires you to > always use 'encoding' parameter for open(). > > > Is it enough for a PEP? This stuff is rather critical IMO, so even if it > will be rejected there will be a documented design decision. > -- > anatoly t. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Jun 9 00:30:52 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 8 Jun 2013 15:30:52 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: Message-ID: [Diverting to python-ideas, since this isn't as clear-cut as you think.] Why exactly is that expected behavior? What's the use case? (Surely you don't have a keyboard that generates \u2212 when you hit the minus key? :-) Is there a Unicode standard for parsing numbers? IIRC there are a variety of other things marked as "digits" in the Unicode standard -- do we do anything with those? If we do anything we should be consistent. For now, I think we *are* consistent -- we only support the ASCII representation of numbers. (And that's the only representation we generate as output as well -- think about symmetry too.) This page scares me: http://en.wikipedia.org/wiki/Numerals_in_Unicode --Guido On Sat, Jun 8, 2013 at 2:49 PM, Łukasz Langa wrote: > Expected behaviour: >>>> float('\N{MINUS SIGN}12.34') > -12.34 > > > Current behaviour: > Traceback (most recent call last): > ... 
> ValueError: could not convert string to float: '−12.34' > > > Please note: '\N{MINUS SIGN}' == '\u2212' > > -- > Best regards, > Łukasz Langa > > WWW: http://lukasz.langa.pl/ > Twitter: @llanga > IRC: ambv on #python-dev > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Jun 9 00:52:41 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 8 Jun 2013 15:52:41 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: Message-ID: Apologies, Python 3 does actually have limited support for the other Unicode digits (actually only the ones marked "Decimal" IIUC). I'd totally forgotten about that (since I still live primarily in an ASCII world :-). E.g. >>> a = '\uff14\uff12' >>> int(a) 42 >>> Still I'd like to understand your use case better. Is there a character property to identify the minus sign? How many are there? A little investigation reveals a lot of minus signs just in the basic plane: >>> import unicodedata >>> for i in range(2**16): ... c = chr(i) ... if 'MINUS' in unicodedata.name(c, ''): print(i, c, unicodedata.name(c, '')) ... 45 - HYPHEN-MINUS 177 ± PLUS-MINUS SIGN 727 ˗ MODIFIER LETTER MINUS SIGN 800 ̠ COMBINING MINUS SIGN BELOW 8274 ⁒ COMMERCIAL MINUS SIGN 8315 ⁻ SUPERSCRIPT MINUS 8331 ₋ SUBSCRIPT MINUS 8722 − MINUS SIGN 8723 ∓ MINUS-OR-PLUS SIGN 8726 ∖ SET MINUS 8760 ∸ DOT MINUS 8770 ≂ MINUS TILDE 8854 ⊖ CIRCLED MINUS 8863 ⊟ SQUARED MINUS 10070 ❖ BLACK DIAMOND MINUS WHITE X 10134 ➖ HEAVY MINUS SIGN 10556 ⤼ TOP ARC CLOCKWISE ARROW WITH MINUS 10793 ⨩ MINUS SIGN WITH COMMA ABOVE 10794 ⨪ MINUS SIGN WITH DOT BELOW 10795 ⨫ MINUS SIGN WITH FALLING DOTS 10796 ⨬ MINUS SIGN WITH RISING DOTS 10810 ⨺ MINUS SIGN IN TRIANGLE 10817 ⩁ 
UNION WITH MINUS SIGN 10860 ⩬ SIMILAR MINUS SIMILAR 65123 ﹣ SMALL HYPHEN-MINUS 65293 － FULLWIDTH HYPHEN-MINUS >>> There are also a lot of plus signs (including INVISIBLE PLUS :-). Again, maybe the Unicode consortium has a standard we could implement? --Guido On Sat, Jun 8, 2013 at 3:30 PM, Guido van Rossum wrote: > [Diverting to python-ideas, since this isn't as clear-cut as you think.] > > Why exactly is that expected behavior? What's the use case? (Surely > you don't have a keyboard that generates \u2212 when you hit the minus > key? :-) > > Is there a Unicode standard for parsing numbers? IIRC there are a > variety of other things marked as "digits" in the Unicode standard -- > do we do anything with those? If we do anything we should be > consistent. For now, I think we *are* consistent -- we only support > the ASCII representation of numbers. (And that's the only > representation we generate as output as well -- think about symmetry > too.) > > This page scares me: http://en.wikipedia.org/wiki/Numerals_in_Unicode > > --Guido > > On Sat, Jun 8, 2013 at 2:49 PM, Łukasz Langa wrote: >> Expected behaviour: >>>>> float('\N{MINUS SIGN}12.34') >> -12.34 >> >> >> Current behaviour: >> Traceback (most recent call last): >> ... 
>> ValueError: could not convert string to float: '−12.34' >> >> >> Please note: '\N{MINUS SIGN}' == '\u2212' >> >> -- >> Best regards, >> Łukasz Langa >> >> WWW: http://lukasz.langa.pl/ >> Twitter: @llanga >> IRC: ambv on #python-dev >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > -- > --Guido van Rossum (python.org/~guido) -- --Guido van Rossum (python.org/~guido) From mertz at gnosis.cx Sun Jun 9 01:21:22 2013 From: mertz at gnosis.cx (David Mertz) Date: Sat, 8 Jun 2013 16:21:22 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: Message-ID: <82D342C0-2960-40F5-BAB5-62FDFC3128F7@gnosis.cx> On Jun 8, 2013, at 3:52 PM, Guido van Rossum wrote: > Apologies, Python 3 does actually have limited support for the other > Unicode digits (actually only the ones marked "Decimal" IIUC). I'd > totally forgotten about that (since I still live primarily in an ASCII > world :-). E.g. This is cool, and I hadn't known about it. I had just written a toy implementation of my own _float() to show a possible behavior. Then looking at Guido's post, I find that: >>> import unicodedata >>> x = ( ... unicodedata.lookup('ARABIC-INDIC DIGIT ONE')+ ... unicodedata.lookup('ARABIC-INDIC DIGIT TWO')+ ... unicodedata.lookup('ARABIC-INDIC DIGIT THREE')+ ... "."+ ... unicodedata.lookup('ARABIC-INDIC DIGIT FOUR')+ ... unicodedata.lookup('ARABIC-INDIC DIGIT FIVE')) >>> x '١٢٣.٤٥' >>> float(x) 123.45 ... my idea was to add an optional named argument like 'lang="Arabic"', but really it isn't needed since the digits MEAN the same thing in various scripts. However, this DOES seem arguably strange behavior: >>> x = ('123.'+ ... unicodedata.lookup('ARABIC-INDIC DIGIT FOUR')+ ... unicodedata.lookup('ARABIC-INDIC DIGIT FIVE')) >>> x '123.٤٥' 
>>> float(x) 123.45 Not wrong, but possibly surprising. -- If I seem shortsighted to you, it is only because I have stood on the backs of midgets. From stephen at xemacs.org Sun Jun 9 02:08:19 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 09 Jun 2013 09:08:19 +0900 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B38950.6010108@mrabarnett.plus.com> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B38950.6010108@mrabarnett.plus.com> Message-ID: <87obbg87t8.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > What I meant was that I'd prefer it to default to an encoding that was > the same on all platforms, not whatever encoding _this_ machine happens > to be using, which might be different from whatever encoding _that_ > machine happens to be using. > > Or, in summary, I think that portability is more important. No, you think that it would be more convenient for you if you didn't have to specify UTF-8 because that's often what you want, and would like to impose that decision on people whose needs are *manifestly* different from yours. It's *obvious* that their needs are different because they have a default encoding different from UTF-8. And even if that's an historical accident, backward compatibility with existing texts is a real need for them. On several occasions (including PEP 263, where I took a position similar to yours, and was wrong), within-system compatibility has been opposed to across-system compatibility. The former won each time. It's not an entirely satisfactory compromise, but it really was the best default policy. I don't see that has changed, except that the number of systems where the two conflict has decreased. Still, on those systems, within- system compatibility by default is the better choice. If it weren't, the systems' admins should be convinced to change their default. 
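Guido's fullwidth-digit example and David Mertz's mixed-script float above both follow from the same rule: any character with the Unicode decimal-digit property is accepted by int() and float(), while only the ASCII hyphen-minus is accepted as a sign. A sketch of the small wrapper that would give Łukasz the behaviour he expected (`parse_float` is a hypothetical helper, not a proposed API):

```python
def parse_float(s: str) -> float:
    # float() already accepts any Unicode decimal digit, but only the
    # ASCII '-' as a sign; map U+2212 MINUS SIGN to '-' before parsing.
    return float(s.translate({0x2212: "-"}))

print(int("\uff14\uff12"))                 # fullwidth '42' -> 42
print(parse_float("\N{MINUS SIGN}12.34"))  # -12.34
print(parse_float("123.\u0664\u0665"))     # mixed-script -> 123.45
```

The last call reproduces Mertz's "surprising" case: the digit values are unambiguous, so mixed scripts parse fine; it is only the sign character that Python leaves to the application.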
From python at mrabarnett.plus.com Sun Jun 9 02:45:30 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 09 Jun 2013 01:45:30 +0100 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <82D342C0-2960-40F5-BAB5-62FDFC3128F7@gnosis.cx> References: <82D342C0-2960-40F5-BAB5-62FDFC3128F7@gnosis.cx> Message-ID: <51B3D02A.2050500@mrabarnett.plus.com> On 09/06/2013 00:21, David Mertz wrote: > On Jun 8, 2013, at 3:52 PM, Guido van Rossum wrote: >> Apologies, Python 3 does actually have limited support for the other >> Unicode digits (actually only the ones marked "Decimal" IIUC). I'd >> totally forgotten about that (since I still live primarily in an ASCII >> world :-). E.g. > > This is cool, and I hadn't known about it. I had just written a toy implementation of my own _float() to show a possible behavior. Then looking at Guido's post, I find that: > >>>> import unicodedata >>>> x = ( > ... unicodedata.lookup('ARABIC-INDIC DIGIT ONE')+ > ... unicodedata.lookup('ARABIC-INDIC DIGIT TWO')+ > ... unicodedata.lookup('ARABIC-INDIC DIGIT THREE')+ > ... "."+ > ... unicodedata.lookup('ARABIC-INDIC DIGIT FOUR')+ > ... unicodedata.lookup('ARABIC-INDIC DIGIT FIVE')) >>>> x > '???.??' >>>> float(x) > 123.45 > > ... my idea was to add an optional named argument like 'lang="Arabic"', but really it isn't needed since the digits MEAN the same thing in various scripts. However, this DOES seem a arguably strange as behavior: > >>>> x = ('123.'+ > ... unicodedata.lookup('ARABIC-INDIC DIGIT FOUR')+ > ... unicodedata.lookup('ARABIC-INDIC DIGIT FIVE')) >>>> x > '123.??' >>>> float(x) > 123.45 > > Not wrong, but possibly surprising. > FYI, you don't need to use 'unicodedata.lookup': >>> import unicodedata >>> '\N{ARABIC-INDIC DIGIT ONE}' == unicodedata.lookup('ARABIC-INDIC DIGIT ONE') True From stephen at xemacs.org Sun Jun 9 03:04:54 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sun, 09 Jun 2013 10:04:54 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: Message-ID: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Is there a Unicode standard for parsing numbers? Probably UTR #35 is most relevant. Unicode Technical Standard #35 Unicode Locale Data Markup Language (LDML) Part 3: Numbers http://www.unicode.org/reports/tr35/tr35-numbers.html (Maintained separately from "the" Unicode standard.) Cf. http://www.unicode.org/versions/Unicode6.2.0/ch05.pdf, section 5.5, and http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf, section 4.4. tl;dr version: *If* you recognize a character as a digit, it *must* be given the correct value. Parsing numerical components from text is inherently hairy (eg, Roman numerals, which exist in Unicode only as compatibility characters and therefore are not normatively digits, and variable names such as "x2"), and so considered application-specific. UTR #35 recommends the "lenient" rules appended below for parsing numerical data. N.B. Some terms such as "'ignore' set" are defined elsewhere in the TR. Apparently lenient parsing is expected to cause no problems in "honest" environments (the only exception mentioned is security, eg, confusables). The parse is explicitly locale-dependent in UTR #35, so could depend on a subset of all possible numeric expression characters. Python could take the position that the only numeric formats known to Python are those based on ASCII (but even then there are ambiguities by locale -- cf the 4th rule below). I think this is probably best by default. Parsing numbers expressed in mixed scripts is clearly a security risk, and doesn't seem to serve a useful purpose in numerical data. To do a good job of parsing numerical information from general text, a separate library which is sensitive to the context in which the numbers are embedded is required. 
------------------------------------------------------------------------ Here is a set of heuristic rules that may be helpful: - Any character with the decimal digit property is unambiguous and should be accepted. Note: In some environments, applications may independently wish to restrict the decimal digit set to prevent security problems. See [UTR36]. - The exponent character can only be interpreted as such if it occurs after at least one digit, and if it is followed by at least one digit, with only an optional sign in between. A regular expression may be helpful here. - For the sign, decimal separator, percent, and per mille, use a set of all possible characters that can serve those functions. For example, the decimal separator set could include all of [.,']. (The actual set of characters can be derived from the number symbols in the By-Type charts [ByType], which list all of the values in CLDR.) To disambiguate, the decimal separator for the locale must be removed from the "ignore" set, and the grouping separator for the locale must be removed from the decimal separator set. The same principle applies to all sets and symbols: any symbol must appear in at most one set. - Since there are a wide variety of currency symbols and codes, this should be tried before the less ambiguous elements. It may be helpful to develop a set of characters that can appear in a symbol or code, based on the currency symbols in the locale. - Otherwise, a character should be ignored unless it is in the "stop" set. This includes even characters that are meaningful for formatting, for example, the grouping separator. - If more than one sign, currency symbol, exponent, or percent/per mille occurs in the input, the first found should be used. - A currency symbol in the input should be interpreted as the longest match found in the set of possible currency symbols. 
- Especially in cases of ambiguity, the user's input should be echoed back, properly formatted according to the locale, before it is actually used for anything. ------------------------------------------------------------------------ From abarnert at yahoo.com Sun Jun 9 03:10:30 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 8 Jun 2013 18:10:30 -0700 (PDT) Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: anatoly techtonik Sent: Saturday, June 8, 2013 6:13 AM >Without reading subject of this letter, what is your idea about which encoding Python 3 uses with open() calls on a text file? Please write in reply and then scroll down. It uses the system locale encoding (which is slightly complicated on Windows by the fact that there are two different things that count as default in different cases, but whichever one is the right one is the one Python uses). That's why I can write a file in vi/emacs/TextEdit/Notepad/whatever and read it in a Python script (or vice-versa) and everything works. >open() in Python uses system encoding to read files by default. So, if Python script writes text file with some Cyrillic character on my Russian Windows, another Python script on English Windows or Greek Windows will not be able to read it. This is just what happened. True. But if I create a text file with "type >foo.txt" or Notepad on a Russian system, I won't be able to open it on that English or Greek system either, but at least I'll be able to open it on Python on the same Russian system. If you changed Python to ignore the locale, that would cause a new problem (the latter would no longer be true) without fixing any existing problem (the former would still not be true). >The solution proposed is to specify encoding explicitly. That means I have 
Luckily, in this case the text file is my .py where I knew the encoding beforehand. In real world you can never know the encoding beforehand. This is an inherent problem that Python didn't cause, and can't solve. As long as there are text files in different encodings out there, you need to pass the encoding out-of-band. If you're behind the process that creates the files (whether it's a program you wrote, or options you set in Notepad's Save As dialog), you can just make sure to use the same encoding on every system, and you have no problem. But if you need to deal with files that others have created, that won't work. >So, what should Python do if it doesn't know the encoding of text file it opens: >1. Assume that encoding of text file is the encoding of your operating system >2. Assume that encoding of text file is ASCII >3. Assume that encoding of text file is UTF-8 > >Please write in reply and then scroll down. That order happens to be exactly my preference. #1 helps for one very common problem: files created by other programs on the same machine. #2 is generally at least safe, in that you'll get an error instead of mojibake. #3 doesn't really help anything. >I propose three, because ASCII is a binary compatible subset of UTF-8. Choice one is the current behaviour, and it is very bad. Troubleshooting this issue, which should be very common, requires a lot of prior knowledge about encodings and awareness of difference system defaults. For cross-platform work with text files this fact implicitly requires you to always use 'encoding' parameter for open(). I'm not sure what you mean by "cross-platform" here. Most non-Windows platforms nowadays set the locale to UTF-8 by default (and if you're using an older *nix, or deliberately chose not to use UTF-8 even though it's the default, you already know how to deal with these issues). So, it's really a Windows problem, if anything. I can understand why Windows users are confused. 
While OS X and most linux distros decided that the only way to solve this problem was to push UTF-8 as hard as possible, Windows went a different way. For everything but text files (remember that encoding is an issue for stdio, filesystems, etc.), there's a UTF-16 API that you can use, and "native" apps use it consistently. And for text files, many "native" apps will save in UTF-8, with the legal-but-discouraged UTF-8 BOM, and can automatically load both UTF-8 and local files by checking for the BOM. Python _could_ do the same thing. By default, opening a new file for "w" could select UTF-8 and write the BOM, opening a file for "r" (or "r+" or "a") could look for a BOM and decide whether to select UTF-8 or the locale. And that would improve interoperability on Windows. However, I think that would be a very bad idea. Most non-Windows programs, and even some Windows programs, do not expect the UTF-8 BOM, and will open the file with your locale charset anyway, and show three garbage characters at the front. That's why you sometimes get that "ï»¿" garbage at the start of files, and why you often get errors from programs after you try to write their INI, YAML, etc. files in Notepad, Word, etc. Switching to UTF-8 would make it harder to read and write files created by other programs on the same machine, and it still wouldn't magically make you able to read and write files created on other machines, unless you only care about files created on recent *nix platforms. The only case it would help is making it easier to read and write files created by _your program_ without worrying about the local machine. While that isn't _nothing_, I don't think it's so important that we can just dismiss dealing with files created by other programs. After all, you're presumably using plain text files, rather than some binary format or JSON or YAML or XML or whatever, because you want users to be able to view and edit those files, right? 
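The BOM-sniffing behaviour described above is easy to prototype in user code. A minimal sketch (`open_text` is a hypothetical helper; a fuller version would also check for UTF-16 BOMs):

```python
import codecs
import locale

def open_text(path):
    """Open a file for reading: honour a UTF-8 BOM if present,
    otherwise fall back to the locale encoding. The 'utf-8-sig'
    codec strips the BOM on read."""
    with open(path, "rb") as f:
        has_bom = f.read(3) == codecs.BOM_UTF8
    encoding = "utf-8-sig" if has_bom else locale.getpreferredencoding(False)
    return open(path, "r", encoding=encoding)
```

This mirrors what BOM-aware Windows editors do, which is precisely why the message argues it would surprise the non-Windows tools that don't.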
From grosser.meister.morti at gmx.net Sun Jun 9 03:34:25 2013 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sun, 09 Jun 2013 03:34:25 +0200 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B3DBA1.9060801@gmx.net> On 06/08/2013 08:02 PM, Stephen J. Turnbull wrote: > > Unicode is a > universal character set in the sense that it can encode all > characters, I guess Japanese people beg to differ. There are some Japanese symbols that aren't covered by Unicode, or at least not to the extent Japanese people would like them to be. Which is why they use (Shift-)JIS a lot of the time. Basically Shift-JIS <-> Unicode is not round-trip safe. From steve at pearwood.info Sun Jun 9 03:40:53 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 09 Jun 2013 11:40:53 +1000 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B38950.6010108@mrabarnett.plus.com> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B38950.6010108@mrabarnett.plus.com> Message-ID: <51B3DD25.4060105@pearwood.info> On 09/06/13 05:43, MRAB wrote: > On 08/06/2013 19:02, Stephen J. Turnbull wrote: >> MRAB writes: >> >> > 'open' defaults to universal newline support when opening for >> > reading (though that's not possible when opening for writing!), and >> > it would be nice if it also defaulted to a 'universal' encoding, >> > i.e. UTF-8. >> >> There's no such thing as a universal encoding. Unicode is a >> universal character set in the sense that it can encode all >> characters, but there is no universal encoding that can be used to >> decode all texts. >> > I didn't say "universal encoding", I said "'universal' encoding". 
:-) > > What I meant was that I'd prefer it to default to an encoding that was > the same on all platforms, not whatever encoding _this_ machine happens > to be using, which might be different from whatever encoding _that_ > machine happens to be using. > > Or, in summary, I think that portability is more important. Oh, I *dream* of the day when everyone everywhere standardizes on UTF-8 for storage of text data. As a Linux user, I'm partly there -- the locale on most Linux systems default to UTF-8, and most apps honour that. But so long as Windows machines normally default to some legacy encoding, as I believe they do, Python cannot afford to force the issue. It's unfortunate when Python cannot trivially[1] read text files created on a Windows 8 box using (say) ISO-8859-7 (Greek) and then poorly transferred to another machine using (say) UTF-8. But it would be unacceptable if Python could not trivially read files *on the same machine* if they happened to have been created by some other app. >> If the OS's default encoding is not UTF-8, then you can and should bet >> that most files on that system will not be in UTF-8. That's still >> true today. Few users will be made happy by a Python that forces them >> to do something special to read files in the default encoding. >> > It would be the default encoding only for the machine on which it was > created. If I moved the file to another machine, however, I could get > mojibake. How is Python supposed to know which files were created on the same machine, and which came from somewhere else? [1] By trivially, I mean without having to worry about encoding issues. 
-- Steven From guido at python.org Sun Jun 9 07:30:52 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 8 Jun 2013 22:30:52 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I'm beginning to feel that it was even a mistake to accept all those other Unicode decimal digits, because it leads to the mistaken belief that one can parse a number without knowing the locale. Python's position about this is very different from what the heuristics you quote seem to suggest: "refuse the temptation to guess" leads to our current, simple rule which only accepts '.' as the decimal indicator, and leaves localized parsing strictly to the application (or to some other library module). Still, I suppose I could defend the current behavior from the perspective of writing such a localized parser -- at some point you've got a digit and you need to know its numeric value, and it is convenient that int(c) does so. But interpreting the sign is different -- once you know that a minus sign is present, there's already a built-in operation to apply it (-x). So I'm not at all sure that the behavior ?ukasz observed should be considered a bug in the language. --Guido On Sat, Jun 8, 2013 at 6:04 PM, Stephen J. Turnbull wrote: > Guido van Rossum writes: > > > Is there a Unicode standard for parsing numbers? > > Probably UTR #35 is most relevant. > > Unicode Technical Standard #35 > Unicode Locale Data Markup Language (LDML) > Part 3: Numbers > http://www.unicode.org/reports/tr35/tr35-numbers.html > (Maintained separately from "the" Unicode standard.) > > Cf. http://www.unicode.org/versions/Unicode6.2.0/ch05.pdf, section > 5.5, and http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf, > section 4.4. tl;dr version: *If* you recognize a character as a > digit, it *must* be given the correct value. 
Parsing numerical > components from text is inherently hairy (eg, Roman numerals, which > exist in Unicode only as compatibility characters and therefore are > not normatively digits, and variable names such as "x2"), and so > considered application-specific. > > UTR #35 recommends the "lenient" rules appended below for parsing > numerical data. N.B. Some terms such as "'ignore' set" are defined > elsewhere in the TR. Apparently lenient parsing is expected to cause > no problems in "honest" environments (the only exception mentioned is > security, eg, confusables). > > The parse is explicitly locale-dependent in UTR #35, so could depend > on a subset of all possible numeric expression characters. Python > could take the position that the only numeric formats known to Python > are those based on ASCII (but even then there are ambiguities by > locale -- cf the 4th rule below). I think this is probably best by > default. Parsing numbers expressed in mixed scripts is clearly a > security risk, and doesn't seem to serve a useful purpose in numerical > data. To do a good job of parsing numerical information from general > text, a separate library which is sensitive to the context in which > the numbers are embedded is required. > > > ------------------------------------------------------------------------ > Here is a set of heuristic rules that may be helpful: > > - Any character with the decimal digit property is unambiguous and > should be accepted. > Note: In some environments, applications may independently wish to > restrict the decimal digit set to prevent security problems. See > [UTR36]. > > - The exponent character can only be interpreted as such if it > occurs after at least one digit, and if it is followed by at least > one digit, with only an optional sign in between. A regular > expression may be helpful here. > > - For the sign, decimal separator, percent, and per mille, use a set > of all possible characters that can serve those functions. 
For > example, the decimal separator set could include all of > [.,']. (The actual set of characters can be derived from the > number symbols in the By-Type charts [ByType], which list all of > the values in CLDR.) To disambiguate, the decimal separator for > the locale must be removed from the "ignore" set, and the grouping > separator for the locale must be removed from the decimal > separator set. The same principle applies to all sets and symbols: > any symbol must appear in at most one set. > > - Since there are a wide variety of currency symbols and codes, this > should be tried before the less ambiguous elements. It may be > helpful to develop a set of characters that can appear in a symbol > or code, based on the currency symbols in the locale. > > - Otherwise, a character should be ignored unless it is in the > "stop" set. This includes even characters that are meaningful for > formatting, for example, the grouping separator. > > - If more than one sign, currency symbol, exponent, or percent/per > mille occurs in the input, the first found should be used. > > - A currency symbol in the input should be interpreted as the > longest match found in the set of possible currency symbols. > > - Especially in cases of ambiguity, the user's input should be > echoed back, properly formatted according to the locale, before it > is actually used for anything. 
> ------------------------------------------------------------------------ > > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sun Jun 9 08:21:25 2013 From: tjreedy at udel.edu (Terry Jan Reedy) Date: Sun, 09 Jun 2013 02:21:25 -0400 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6/9/2013 1:30 AM, Guido van Rossum wrote: > I'm beginning to feel that it was even a mistake to accept all those > other Unicode decimal digits, because it leads to the mistaken belief > that one can parse a number without knowing the locale. Python's > position about this is very different from what the heuristics you > quote seem to suggest: "refuse the temptation to guess" leads to our > current, simple rule which only accepts '.' as the decimal indicator, > and leaves localized parsing strictly to the application (or to some > other library module). The referenced algorithm is about extracting number literals, especially integers, out of text. Doing that 'properly' depends on locale and purpose. int() is about converting a literal to its numeric value, whether written directly or extracted. Once a decision is made to convert a literal, conversion itself (for integers) is not locale-dependent. > Still, I suppose I could defend the current behavior from the > perspective of writing such a localized parser -- at some point you've > got a digit and you need to know its numeric value, and it is > convenient that int(c) does so. Exactly. I believe that is why it was extended to do that. 
tjr From abarnert at yahoo.com Sun Jun 9 09:22:43 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 00:22:43 -0700 (PDT) Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B3DBA1.9060801@gmx.net> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> Message-ID: <1370762563.71270.YahooMailNeo@web184706.mail.ne1.yahoo.com> From: Mathias Panzenböck Sent: Saturday, June 8, 2013 6:34 PM > On 06/08/2013 08:02 PM, Stephen J. Turnbull wrote: >> >> Unicode is a >> universal character set in the sense that it can encode all >> characters, > > I guess Japanese people beg to differ. There are some Japanese symbols that > aren't covered by Unicode, > or at least not to the extent Japanese people would like them to be. Which is why > they use (Shift-)JIS a > lot of the time. Basically Shift-JIS <-> Unicode is not round-trip safe. That's not true. Shift-JIS <-> Unicode 6.0 is completely round-trip safe. And there hasn't been a practical problem for UTF-8 or UTF-32 since they were introduced in Unicode 2.0 in the mid-90s. The problem is with UTF-16. Many early Unicode apps were built around UCS-2, a fixed-width 16-bit encoding. UCS-2 didn't have room for the extra characters that Japanese (among other languages) needed, so it was replaced with UTF-16, a variable-width 16- or 32-bit encoding. But historically, there's been a lot of software that treated UTF-16 as fixed-width (after all, you can test with hiragana and common kanji and it seems to work), which means it breaks if you give it any of the new characters added since the original version. This is sometimes still a problem today for Windows native apps. But again, it does not affect UTF-8, just UTF-16. Another reason people used Shift-JIS until a few years ago was emoji. 
But today, Unicode supports more emoji than Shift-JIS, and in fact people complain about only having the original 176 if they're forced to use Shift-JIS. Some Japanese people still refuse to use Unicode because of the Unihan controversy. Briefly: Characters like 刃 (U+5203) are drawn differently in Japanese and Chinese, but Unicode considers them the same character (to get the Chinese variation, you have to use a Chinese font). This is a problem, but Shift-JIS has the exact same problem. Finally, for typical Japanese text, Shift-JIS takes the fewest bytes per character of any major charset. Shift-JIS takes 1 byte for ASCII, 2 bytes for everything else. UTF-8 takes 3 bytes for kana and kanji, so if you, e.g., download an article and store it as UTF-8, it'll get almost 50% bigger. UTF-16 solves that by making kana and most kanji 2 bytes (although uncommon kanji are 4), but it makes ASCII 2 bytes instead of 1, which means you double the size of many files. Shift-JIS is a pretty good compromise for compactness. From phd at phdru.name Sun Jun 9 10:51:22 2013 From: phd at phdru.name (Oleg Broytman) Date: Sun, 9 Jun 2013 12:51:22 +0400 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> References: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: <20130609085122.GA28749@iskra.aviel.ru> Hi! On Sat, Jun 08, 2013 at 06:10:30PM -0700, Andrew Barnert wrote: > if I create a text file with "type >foo.txt" or Notepad on a Russian system, I won't be able to open it on that English or Greek system either -- but at least I'll be able to open it on Python on the same Russian system. He-he, wrong! If you create a text file on a Russian system with "type >foo.txt" you get a file in a different encoding compared with a file created in Notepad.
Notepad creates files in the system encoding -- code page 1251 on Russian systems; "console" works with OEM encoding which in this example is cp 866. So if you create a text file on a Russian system with "type >foo.txt" and want to open it in Python you have to know in advance that the file is not in the system encoding but in OEM encoding, and what that OEM encoding is. I don't know how to query the OEM encoding. I'm sure pywin32 can do that, but how? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From techtonik at gmail.com Sun Jun 9 11:26:45 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 9 Jun 2013 12:26:45 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <20130609085122.GA28749@iskra.aviel.ru> References: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130609085122.GA28749@iskra.aviel.ru> Message-ID: On Sun, Jun 9, 2013 at 11:51 AM, Oleg Broytman wrote: > Hi! > > On Sat, Jun 08, 2013 at 06:10:30PM -0700, Andrew Barnert < > abarnert at yahoo.com> wrote: > > if I create a text file with "type >foo.txt" or Notepad on a Russian > system, I won't be able to open it on that English or Greek system > either -- but at least I'll be able to open it on Python on the same > Russian system. > > He-he, wrong! If you create a text file on a Russian system with > "type >foo.txt" you get a file in a different encoding compared with a > file created in Notepad. Notepad creates files in the system encoding -- > code page 1251 on Russian systems; "console" works with OEM encoding > which in this example is cp 866. > So if you create a text file on a Russian system with "type >foo.txt" > and want to open it in Python you have to know in advance that the file > is not in the system encoding but in OEM encoding, and what that OEM > encoding is. > Right, Oleg.
Thanks for reminding that on 90% of machines [1] the argument "reading a file in the same encoding it was written" is not a reason to complicate open() logic with system defaults. I'd add that user files on these platforms are mostly binaries created with dedicated tools (Word etc.). Python programs are being typed in Notepad or Wordpad (because Notepad doesn't understand Unix linefeeds). 1. http://en.wikipedia.org/wiki/Usage_share_of_operating_systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sun Jun 9 12:00:51 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 9 Jun 2013 13:00:51 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: Message-ID: On Sun, Jun 9, 2013 at 1:30 AM, Victor Stinner wrote: > Changing the default encoding of open() was already discussed 2 years ago. > See this discussion: > http://mail.python.org/pipermail/python-dev/2011-June/112086.html > > I did a long analysis of the Python standard library and I tried a > modified Python with a default encoding set to utf-8. > > The conclusion is that the locale encoding is the least worst choice. The > main reason is the compatibility with all other applications on the same > computer. Using a different encoding than the locale encoding leads quickly > to mojibake issues and other bugs. > Any default encoding means deterministic behavior for an open() call with the same set of input data. For a cross-platform language, as a programmer, you're responsible to detect the particular feature of operating > Just one example: configure script generates a Makefile using the locale > encoding, Python gets data from Makefile. If you use a path with non-ascii > character, use utf-8 in python whereas the locale is iso-8859-1, python > cannot be compiled anymore or will refuse to start.
> I am not a C developer, but as SCons committer I don't know Python tools that directly work with Makefiles. To me that example with Makefile generated by configure is out of Python domain, so real examples are still welcome. > Remember the zen of python: explicit is better than implicit. So set > encoding parameter in your code. > =) And because of that Zen, the prototype to open is: open(..., encoding=None) instead of open(..., encoding='utf-8') or open(..., encoding=sys.encoding) This choice also breaks the key Unix principle of doing one thing well, because it is not the responsibility of the open() call to determine system encoding. Maybe sys.open() would be better for that? The subprocess precedent of overly complicated cross-platform logic is not surviving open source development, and it's a pity that it didn't serve as a lesson for language design decisions. > When i made the encoding mandatory in my test, more than 70% of calls to > open() used encoding="locale". So it's simpler to keep the current default > choice. > How many systems have you covered? Something makes me think that you had deterministic behavior for all your cases, because you run them on a single system. Most packages distributed from PyPI are designed to be cross-platform, and most of them use persistence schemes that are either pickled (speed) or system independent (portability). > The documentation can maybe be improved? > I doubt that it can be improved - simple Python functions are already complicated enough. I wish there was a reverse process of simplifying things back. Victor > Le 8 juin 2013 15:14, "anatoly techtonik" a écrit : > >> Without reading subject of this letter, what is your idea about which >> encoding Python 3 uses with open() calls on a text file? Please write in >> reply and then scroll down. >> >> >> Without cheating my opinion was cp1252 (latin-1), because it was the way >> Python 2 assumed all text files are. Or Python 2 uses ISO-8859-1?
>> >> But it appeared to be different way - >> http://docs.python.org/3/library/functions.html#open. No, it appeared >> here - https://bitbucket.org/techtonik/hexdump/pull-request/1/ and after >> a small lecture I realized how things are bad. >> >> open() in Python uses system encoding to read files by default. So, if >> Python script writes text file with some Cyrillic character on my Russian >> Windows, another Python script on English Windows or Greek Windows will not >> be able to read it. This is just what happened. >> >> The solution proposed is to specify encoding explicitly. That means I >> have to know it. Luckily, in this case the text file is my .py where I knew >> the encoding beforehand. In real world you can never know the encoding >> beforehand. >> >> So, what should Python do if it doesn't know the encoding of text file it >> opens: >> 1. Assume that encoding of text file is the encoding of your operating >> system >> 2. Assume that encoding of text file is ASCII >> 3. Assume that encoding of text file is UTF-8 >> >> Please write in reply and then scroll down. >> >> >> I propose three, because ASCII is a binary compatible subset of UTF-8. >> Choice one is the current behaviour, and it is very bad. Troubleshooting >> this issue, which should be very common, requires a lot of prior knowledge >> about encodings and awareness of different system defaults. For >> cross-platform work with text files this fact implicitly requires you to >> always use 'encoding' parameter for open(). >> >> >> Is it enough for a PEP? This stuff is rather critical IMO, so even if it >> will be rejected there will be a documented design decision. >> -- >> anatoly t. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stephen at xemacs.org Sun Jun 9 12:16:41 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 09 Jun 2013 19:16:41 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ip1n8u7q.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Jan Reedy writes: > Once a decision is made to convert a literal, conversion itself > (for integers) is not location dependent. Sure, but your restriction to integers is specious, because Python itself doesn't restrict interpretation of non-ASCII decimals to either integers or base 10. From mal at egenix.com Sun Jun 9 12:52:20 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 09 Jun 2013 12:52:20 +0200 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: Message-ID: <51B45E64.80302@egenix.com> On 09.06.2013 00:30, Guido van Rossum wrote: > [Diverting to python-ideas, since this isn't as clear-cut as you think.] > > Why exactly is that expected behavior? What's the use case? (Surely > you don't have a keyboard that generates \u2212 when you hit the minus > key? :-) > > Is there a Unicode standard for parsing numbers? IIRC there are a > variety of other things marked as "digits" in the Unicode standard -- > do we do anything with those? If we do anything we should be > consistent. For now, I think we *are* consistent -- we only support > the ASCII representation of numbers. (And that's the only > representation we generate as output as well -- think about symmetry > too.) > > This page scares me: http://en.wikipedia.org/wiki/Numerals_in_Unicode Python supports the Unicode decimal numeric type code points (Numeric_Type=Decimal) and uses the field 6 value from the UnicodeData.txt database as decimal digit value: http://www.unicode.org/reports/tr44/#UnicodeData.txt You can e.g. write numbers using many different scripts, not only ASCII. 
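[Editor's aside: a small sketch of what that UnicodeData.txt field looks like from Python, using the stdlib unicodedata module; digit "two" in a few scripts is used as the example.]

```python
import unicodedata

# unicodedata.decimal() exposes field 6 of UnicodeData.txt, the decimal
# digit value that int()/float() use when parsing a non-ASCII digit.
for ch in ('2', '\u0662', '\u0e52', '\u0968'):  # ASCII, Arabic-Indic, Thai, Devanagari
    print('U+%04X %-28s decimal=%d'
          % (ord(ch), unicodedata.name(ch), unicodedata.decimal(ch)))

# All four parse as the integer 2:
assert {int(ch) for ch in ('2', '\u0662', '\u0e52', '\u0968')} == {2}
```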
Python uses an (internal) decimal codec for converting string representations of decimals to numeric values. There's no support for interpretation of minus and plus signs, or decimal dots, except for the ASCII ones. Support for these would have to be added to number parsing code. Here's the code range for mathematical operators: http://www.unicode.org/charts/PDF/U2200.pdf > --Guido > > On Sat, Jun 8, 2013 at 2:49 PM, Łukasz Langa wrote: >> Expected behaviour: >>>>> float('\N{MINUS SIGN}12.34') >> -12.34 >> >> >> Current behaviour: >> Traceback (most recent call last): >> ... >> ValueError: could not convert string to float: '−12.34' >> >> >> Please note: '\N{MINUS SIGN}' == '\u2212' >> >> -- >> Best regards, >> Łukasz Langa >> >> WWW: http://lukasz.langa.pl/ >> Twitter: @llanga >> IRC: ambv on #python-dev >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 09 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 22 days to go 2013-07-16: Python Meeting Duesseldorf ... 37 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Sun Jun 9 13:09:34 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 9 Jun 2013 14:09:34 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> References: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On Sun, Jun 9, 2013 at 4:10 AM, Andrew Barnert wrote: > From: anatoly techtonik > Sent: Saturday, June 8, 2013 6:13 AM > > >open() in Python uses system encoding to read files by default. So, if > Python script writes text file with some Cyrillic character on my Russian > Windows, another Python script on English Windows or Greek Windows will not > be able to read it. This is just what happened. > > > True. But if I create a text file with "type >foo.txt" or Notepad on a > Russian system, I won't be able to open it on that English or Greek system > either -- but at least I'll be able to open it on Python on the same Russian > system. If you changed Python to ignore the locale, that would cause a new > problem (the latter would no longer be true) without fixing any existing > problem (the former would still not be true). I'd say that all popular user software on two language Windows that opens plain text files uses auto-detection of encoding. Unlike Linux, where such bugs are not important (I am speaking about GEdit in particular), because people are sure that I will open only ASCII files on my Ubuntu box. > >The solution proposed is to specify encoding explicitly. That means I > have to know it. Luckily, in this case the text file is my .py where I knew > the encoding beforehand. In real world you can never know the encoding > beforehand. > > This is an inherent problem that Python didn't cause, and can't solve.
As > long as there are text files in different encodings out there, you need to > pass the encoding out-of-band. If you're behind the process that creates > the files (whether it's a program you wrote, or options you set in > Notepad's Save As dialog), you can just make sure to use the same encoding > on every system, and you have no problem. But if you need to deal with > files that others have created, that won't work. Right. So this way or the other - you will inevitably face the problem that a user installed your Python program to open a file that is in a different encoding than expected. The major difference is in the procedure that you, as a software developer, will undergo to troubleshoot and cover the issue. In case of implicit system encoding you'll have to deal with the magic of detecting the user's system encoding, and will have to mock this magic in your tests. In case of an explicit 'utf-8' setting you will fail to open it right away and it will be system independent: -1 to headache. >So, what should Python do if it doesn't know the encoding of text file it > opens: >1. Assume that encoding of text file is the encoding of your operating > system >2. Assume that encoding of text file is ASCII >3. Assume that encoding of text file is UTF-8 > > >Please write in reply and then scroll down. > > That order happens to be exactly my preference. #1 helps for one very > common problem -- files created by other programs on the same machine. #2 is > generally at least safe, in that you'll get an error instead of mojibake. > #3 doesn't really help anything. > So far 5+ vs 1-, and 4 people without personal preference. IIRC, Python 2 on Windows open()ing text files with operating system encoding always results in error. As for mojibake you need to be very explicit in Python to get it. >I propose three, because ASCII is a binary compatible subset of UTF-8. > Choice one is the current behaviour, and it is very bad.
Troubleshooting > this issue, which should be very common, requires a lot of prior knowledge > about encodings and awareness of different system defaults. For > cross-platform work with text files this fact implicitly requires you to > always use 'encoding' parameter for open(). > > I'm not sure what you mean by "cross-platform" here. Most non-Windows > platforms nowadays set the locale to UTF-8 by default (and if you're using > an older *nix, or deliberately chose not to use UTF-8 even though it's the > default, you already know how to deal with these issues). So, it's really a > Windows problem, if anything. > This argument makes me assume that perhaps people who are not on Windows and not using a localized OS version do not fully realize the problem, because their native encoding is already UTF-8. Therefore they receive and commit files in UTF-8 and don't see when there is a file created on a different system with a different encoding, which will cause problems. Switching to UTF-8 would make it harder to read and write files created by > other programs on the same machine Sorry for breaking the quote, but this argument is not generally valid. There is a high percentage of software that by default, explicitly creates plain text files in UTF-8 encoding regardless of system defaults. I may even assume that the actual percentage of such programs is higher. > and it still wouldn't magically make you able to read and write files > created on other machines, unless you only care about files created on > recent *nix platforms. The only case it would help is making it easier to > read and write files created by _your program_ without worrying about the > local machine. While that isn't _nothing_, I don't think it's so important > that we can just dismiss dealing with files created by other programs. > I can not agree with your generic priorities in approach to application design, which are: 1. Program should be able to read 3rd-party files produced on the same system 2.
Program should be able to read its own files on any system My choice is 2 then 1. > After all, you're presumably using plain text files, rather than some > binary format or JSON or YAML or XML or whatever, because you want users to > be able to view and edit those files, right? > Vice versa. For user editable files I make sure that it is JSON, YAML or XML that can be easily validated against user errors. System configuration files are using UTF-8 compatible English ASCII set, all other files are source files that are checked out into version control system and by definition should be cross-platform compatible. My Python tools work with platform independent files, and follow the "In the face of ambiguity, refuse the temptation to guess." Zen by either detecting or requiring specific encoding standard. Their behavior was deterministic until I ported them to Python 3. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Sun Jun 9 13:13:01 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 09 Jun 2013 13:13:01 +0200 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B4633D.1070305@egenix.com> On 09.06.2013 08:21, Terry Jan Reedy wrote: > On 6/9/2013 1:30 AM, Guido van Rossum wrote: >> I'm beginning to feel that it was even a mistake to accept all those >> other Unicode decimal digits, because it leads to the mistaken belief >> that one can parse a number without knowing the locale. Python's >> position about this is very different from what the heuristics you >> quote seem to suggest: "refuse the temptation to guess" leads to our >> current, simple rule which only accepts '.' as the decimal indicator, >> and leaves localized parsing strictly to the application (or to some >> other library module). > > The referenced algorithm is about extracting number literals, especially integers, out of text.
Doing > that 'properly' depends on locale and purpose. int() is about converting a literal to its numeric > value, whether written directly or extracted. Once a decision is made to convert a literal, > conversion itself (for integers) is not locale dependent. > >> Still, I suppose I could defend the current behavior from the >> perspective of writing such a localized parser -- at some point you've >> got a digit and you need to know its numeric value, and it is >> convenient that int(c) does so. > > Exactly. I believe that is why it was extended to do that. I added this support as part of the Unicode integration in Python 1.6 in order to support the full range of Unicode digits - just like we support more characters and mappings for e.g. line breaks, lower/upper case, etc. in Unicode than in 8-bit character sets. Note that the decimal encoder does not support right-to-left scripts. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 09 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 22 days to go 2013-07-16: Python Meeting Duesseldorf ... 37 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From rosuav at gmail.com Sun Jun 9 13:24:08 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 9 Jun 2013 21:24:08 +1000 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On Sun, Jun 9, 2013 at 9:09 PM, anatoly techtonik wrote: > I can not agree with your generic priorities in approach to application > design, which are: > 1. Program should be able to read 3rd-party files produced on the same > system > 2. Program should be able to read its own files on any system > > My choice is 2 then 1. #2 is easily achieved: just provide a specific encoding. It's your program, so you can control what it writes. ChrisA From steve at pearwood.info Sun Jun 9 13:38:25 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 09 Jun 2013 21:38:25 +1000 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B46931.1080209@pearwood.info> On 09/06/13 15:30, Guido van Rossum wrote: > I'm beginning to feel that it was even a mistake to accept all those > other Unicode decimal digits, because it leads to the mistaken belief > that one can parse a number without knowing the locale. You can parse unsigned integers, at least, without knowing the locale. Decimal digits are unambiguously decimal digits, no matter the locale. "๒" \N{THAI DIGIT TWO} does not cease to be a character representing the number two just because you're not in Thailand, just as "2" does not cease to have the same meaning once you enter Thailand. People might not recognise those digits, but they are still unambiguous.
Even the order of digits is, as far as I know, always Big Endian (most significant digit on the left) even for right-to-left or bidirectional scripts. So as I see it, int() should certainly support non-ASCII decimal digits. float() is a bit trickier, because of course here you do need to know the locale to tell whether . or , or · \N{MIDDLE DOT} is a radix point or an error. And I'm not sure what conventions there are for exponents. But even if float() is not fully locale-compliant, it seems rather silly for it to be more restrictive than int() -- since int('๒') correctly returns 2, I think it is reasonable for float('๒.๑') to return 2.1. So +1 on the current behaviour for int, float and Decimal: I think they make the right compromises, without being excessively complex or unreasonably restrictive. As far as supporting non-ASCII plus and minus signs, I'm keen in principle but lukewarm in practice. I think it would be a Nice To Have, and if somebody did the work to identify which characters should be accepted, I'd support adding it as a new feature. But I don't think that the lack of support for non-ASCII numeric signs is a bug. -- Steven From stephen at xemacs.org Sun Jun 9 14:41:17 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 09 Jun 2013 21:41:17 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B46931.1080209@pearwood.info> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> Message-ID: <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > So as I see it, int() should certainly support non-ASCII decimal > digits. *Python* sure should support non-ASCII decimal digits. But for int(), which handles types other than Unicode, and bases other than 10, what that means is hardly obvious. Eg, is "๒" \N{THAI DIGIT TWO} an octal digit? What transliteration of "1B" into Thai would be a hexadecimal representation? Does "\x๒B" represent ASCII '+'?
(It doesn't.) > But even if float() is not fully locale-compliant, it seems rather > silly for it to be more restrictive than int() -- since int('๒') > correctly returns 2, I think it is reasonable for float('๒.๑') to > return 2.1. But Python goes much farther. float('๒.1') also returns 2.1 (not to mention that int('๒1') returns 21). The cat is apparently out of the bag already, but it seems to me that providing this extension in *builtins* was rash. From ubershmekel at gmail.com Sun Jun 9 15:13:10 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 9 Jun 2013 16:13:10 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On Sun, Jun 9, 2013 at 2:24 PM, Chris Angelico wrote: > On Sun, Jun 9, 2013 at 9:09 PM, anatoly techtonik > wrote: > > I can not agree with your generic priorities in approach to application > > design, which are: > > 1. Program should be able to read 3rd-party files produced on the same > > system > > 2. Program should be able to read its own files on any system > > > > My choice is 2 then 1. > > #2 is easily achieved: just provide a specific encoding. It's your > program, so you can control what it writes. > > Both are easily achieved. The discussion is about open()'s default, implicit behavior. I'm in favor of #2 with utf-8 because it's consistent, easing python collaboration across platforms, and does the right thing. Albeit an opinionated thing. Those who worship false encodings turn their backs on all utf-8's mercies... Yuval -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at pearwood.info Sun Jun 9 16:05:23 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 10 Jun 2013 00:05:23 +1000 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B48BA3.70502@pearwood.info> On 09/06/13 22:41, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > So as I see it, int() should certainly support non-ASCII decimal > > digits. > > *Python* sure should support non-ASCII decimal digits. > > But for int(), which handles types other than Unicode, and bases other > than 10, what that means is hardly obvious. Eg, is "๒" \N{THAI DIGIT > TWO} an octal digit? I don't see why not. Octal digits are a strict subset of decimal digits, they aren't different digits. Octal 2 and decimal 2 are the same 2. Why should octal ๒ be considered different from decimal ๒? > What transliteration of "1B" into Thai would be > a hexadecimal representation? I see where you are going with that question. You're suggesting that if we can translate ASCII digit 1 to Thai digit ๑, we should also be able to translate ASCII hexdigit B to ... something in Thai, but what? A nice thought, but no. Even if Thai has an equivalent to B, which it may not, unless it is flagged by the Unicode Consortium as a hex digit, it doesn't count. For example, Russian В is equivalent to the English B, but unlike B does not have the Hex_Digit property and is not treated as a hex digit: http://www.unicode.org/faq/hex-digit-values.txt So the answer to your question is: py> int('\N{THAI DIGIT ONE}B', 16) 27 > Does "\x๒B" represent ASCII '+'? (It doesn't.) And nor should it. Python is rightly conservative about the use of non-ASCII symbols in the language itself, and \x string escapes is part of Python's syntax, not data.
It has nothing to do with the question of what int() considers a valid numeric string. I'm very happy to have Python be more restricted in what it will accept as source than int, float and Decimal accept as data. For example, I do not suggest, and would not support, that Python accepts non-ASCII numeric *literals*. (If other languages wish to blaze that trail first, more power to them.) > > But even if float() is not fully locale-compliant, it seems rather > > silly for it to be more restrictive than int() -- since int('๒') > > correctly returns 2, I think it is reasonable for float('๒.๑') to > > return 2.1. > > But Python goes much farther. float('๒.1') also returns 2.1 (not to > mention that int('๒1') returns 21). Yes. And why is this a problem? There is no ambiguity. It might look untidy to be mixing Arab and Thai numerals in the same number, but it is still well-defined. The first digit has numeric value 2, the second has numeric value 1, and they both are decimal digits. It's no worse than having int('Aa', 16) return 170. If you don't like the look of mixed upper and lowercase hexdigits, don't mix them. Python shouldn't and doesn't enforce that sense of aesthetics. -- Steven From ncoghlan at gmail.com Sun Jun 9 16:10:03 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 10 Jun 2013 00:10:03 +1000 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: <1370740230.6093.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On 9 June 2013 23:13, Yuval Greenfield wrote: > On Sun, Jun 9, 2013 at 2:24 PM, Chris Angelico wrote: >> >> On Sun, Jun 9, 2013 at 9:09 PM, anatoly techtonik >> wrote: >> > I can not agree with your generic priorities in approach to application >> > design, which are: >> > 1. Program should be able to read 3rd-party files produced on the same >> > system >> > 2. Program should be able to read its own files on any system >> > >> > My choice is 2 then 1.
>> >> #2 is easily achieved: just provide a specific encoding. It's your >> program, so you can control what it writes. >> > > Both are easily achieved. > > The discussion is about open()'s default, implicit behavior. I'm in favor of > #2 with utf-8 because it's consistent, easing python collaboration across > platforms, and does the right thing. Albeit an opinionated thing. There is no discussion to be had, as the default is not going to change. The decision to favour local system compatibility over cross platform consistency was made long ago and is no longer open to negotiation. In a distributed environment where all the systems are correctly configured to use UTF-8, Python's default makes no difference (you will get UTF-8 either way). If systems are NOT configured to use UTF-8, then Python must choose between compatibility with other applications on that system and with Python installations on other systems that may be configured differently. If people don't want the system encoding, then they have to be explicit about the encoding they want, ensuring cross-platform consistency *regardless* of Python's default. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sun Jun 9 18:13:20 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 10 Jun 2013 01:13:20 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B48BA3.70502@pearwood.info> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> Message-ID: <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > > But Python goes much farther. float('٢.๑') also returns 2.1 (not to > > mention that int('٢๑') returns 21). > > Yes. And why is this a problem? There is no ambiguity. It might > look untidy to be mixing Arab and Thai numerals in the same number, > but it is still well-defined. To whom?
Unicode didacts, maybe, but I doubt there are any real users who would consider that well-defined. So the same arguments you made for not permitting non-ASCII numerals in Python source code apply here, although they are somewhat weaker when applied to numeric data expressed as text. In any case, there's not really that much use for this generality of numerals. On the one hand, I think these days anyone who uses information technology is fluent in ASCII numeration. On the other, if you want to allow people to write in other scripts, you probably are dealing with "naive" users who should be allowed to use grouping characters and the usual conventions for their locale, and int() and float() just aren't good enough anyway. From stephen at xemacs.org Sun Jun 9 18:18:41 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 10 Jun 2013 01:18:41 +0900 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> Yuval Greenfield writes: > Personally I favor the first because more often than not files > aren't encoded in the platform's chosen encoding, so it's better to > be explicit and consistent. I've been doing development of multilingual and multiscript software for two decades. As much as I'd like to agree with you, in my experience you're wrong by a large factor where it matters: text files where there's an issue of "guessing" the encoding. Those are far more often than not encoded in the platform's default encoding. If there's no guessing involved, then explicitly specifying the known encoding is an inconvenience, indeed. But is it really that big an inconvenience?
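(As a yardstick for that question, this is all that "specifying the known encoding" amounts to in Python 3; the temp-file setup and file name are only there to make the sketch self-contained:)

```python
import os
import tempfile

# A throwaway file; the directory and name are invented for the example.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

# Explicit encoding on both ends, instead of relying on the platform default:
with open(path, 'w', encoding='utf-8') as f:
    f.write('caf\u00e9\n')
with open(path, encoding='utf-8') as f:
    print(f.read())
```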
From lukasz at langa.pl Sun Jun 9 18:47:15 2013 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Sun, 9 Jun 2013 18:47:15 +0200 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B45E64.80302@egenix.com> References: <51B45E64.80302@egenix.com> Message-ID: <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> On 9 Jun 2013, at 12:52, M.-A. Lemburg wrote: > There's no support for interpretation of minus and plus signs, > or decimal dots, except for the ASCII ones. Support for these > would have to be added to number parsing code. > > Here's the code range for mathematical operators: > > http://www.unicode.org/charts/PDF/U2200.pdf The reason I expect those to be handled properly is that they are totally unambiguous. Unless anyone can point me to a case where \N{MINUS SIGN} should not be treated as a (duh) minus sign, we should go and try to make life easier for our users by adopting at least a few of such characters. We can flesh out which ones in Issue 6632 if there's general agreement that we can. I believe we do. More importantly, this is not a theoretical musing. Wikipedia started serving numerical data with \N{MINUS SIGN} instead of 0x2D, for instance on climate tables. I think we'll see increased usage of such characters in the wild. -- Best regards, Łukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Jun 9 19:25:15 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 10 Jun 2013 02:25:15 +0900 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B3DBA1.9060801@gmx.net> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> Message-ID: <87d2rv8adg.fsf@uwakimon.sk.tsukuba.ac.jp> Mathias Panzenböck writes: > On 06/08/2013 08:02 PM, Stephen J. Turnbull wrote: > > > > Unicode is a universal character set in the sense that it can > > encode all characters, > > I guess Japanese people beg to differ. There are some Japanese > symbols that aren't covered by Unicode, I've heard that said, but that's simply not the reality I have faced in my day job here in Tsukuba, Japan since 1990. People do use Shift JIS still, simply because there are a large number of legacy systems using it. But I've never heard complaints that Unicode lacks necessary characters.[1] I have seen it go the other way around, though, because the JIS standard unifies some traditional variant glyphs that some Unicode source distinguishes, and some people prefer them for their names. Footnotes: [1] Except emoticons. But even that has been fixed recently. From alexander.belopolsky at gmail.com Sun Jun 9 21:35:23 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 9 Jun 2013 15:35:23 -0400 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> Message-ID: On Sun, Jun 9, 2013 at 12:47 PM, Łukasz Langa wrote: > Wikipedia started serving numerical data with \N{MINUS SIGN} instead of > 0x2D, for instance on climate tables. I think we'll see increased usage of > such characters in the wild. While I don't think MINUS SIGN can be abused this way, you should be very careful when you copy numerical data from the web.
Consider this case: >>> float('123٠95') 123095.0 Depending on your font, '123٠95' (with \N{ARABIC-INDIC DIGIT ZERO} in the middle) may be indistinguishable from '123.95'. If you do research using numerical data published on the web, you will be well advised not to assume that anything that looks like a number to your eye can be fed to python's float(). -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Sun Jun 9 21:59:00 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 9 Jun 2013 22:59:00 +0300 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jun 9, 2013 at 7:18 PM, Stephen J. Turnbull wrote: > Yuval Greenfield writes: > > > Personally I favor the first because more often than not files > > aren't encoded in the platform's chosen encoding, so it's better to > > be explicit and consistent. > > I've been doing development of multilingual and multiscript software > for two decades. As much as I'd like to agree with you, in my > experience you're wrong by a large factor where it matters: text files > where there's an issue of "guessing" the encoding. Those are far more > often than not encoded in the platform's default encoding. > > I'm always glad to learn and agree to disagree. I'm only 8 years in the "development of multilingual and multiscript software". Living in Israel - Hebrew compatibility has been the nuisance and these are the encodings I had to fight: utf-8, ucs-2, utf-16, ucs-4, ISO-8859-8, ISO-8859-8-I, Windows-1255. It's plagued websites, browsers, email clients, adobe photoshop and premiere, excel, word, and powerpoint.
It's always been a guessing game when a friend would call for help proclaiming "all I'm getting is Chinese" which is the written gibberish euphemism used around here. Sometimes it's just the word or letter ordering that's messed up (Hebrew is an RTL language). Most Israelis have experienced and fear this phenomenon. If I were to try and fix a problem I'd either be using notepad with its heuristics or iterating through the above options. Sometimes the above encodings were the platform's (windows') default encoding, but in my experience it was mainly applications or websites that chose their encoding for whatever reasons. E.g. Windows Internals 4th edition promoted ucs-2 as the killer encoding that all windows applications should be implemented with. Though I remember a VBScript of a friend spawning a ucs-4 csv file that turned into Chinese when opened in Excel. I did not check which one of those if any was the system default encoding at the time. So I appreciate an app being consistent and promoting utf-8 more than being compliant with the operating system, which the apps I've used don't comply with. Another related annoyance I struggled with recently was that git gives you the platform's newline scheme, which means I can't have a git repository in a dropbox shared between Windows and Ubuntu without meddling with this stuff (the solution is a repo config file). > If there's no guessing involved, then explicitly specifying the known encoding is an inconvenience, indeed. But is it really that big an inconvenience? It's perfectly fine. Perhaps you guys are used to more os-encoding-abiding applications and value that quality. That kind of consistency indeed would have saved me from at least some heartache. I just wish we could get rid of these problems for good, and promoting utf-8 everywhere is one way to go about it. Yuval -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tismer at stackless.com Sun Jun 9 21:52:49 2013 From: tismer at stackless.com (Christian Tismer) Date: Sun, 09 Jun 2013 21:52:49 +0200 Subject: [Python-ideas] Line continuations with comments In-Reply-To: <519D51AF.1080505@gmail.com> References: <519D51AF.1080505@gmail.com> Message-ID: <51B4DD11.4030006@stackless.com> On 23.05.13 01:15, Ron Adam wrote: > > I'm not sure why some people dislike the back slash so much as a > continuation tool. Python is the language that avoids using {braces > around blocks}, so you'd think it would be the other way around. > > I really don't think the '\' will be over used. New programmers do > try a lot of strange things at first for making their own programs > easier to read, (I did), but usually they come around to what the > community practices and recommends. > I think the reason that people don't like the '\' is that it opens up so many bad memories: - having to use them inside C macros - having to deal with it on windows - felt unpythonic because it is a low-level trick to produce longer lines Especially the unpythonic feeling comes from the fact that '\' either has special meaning in strings (which is accepted) or has very restricted use outside of strings: there must not be anything but the line end after it. So extending '\' to allow comments after it turns it from an ugly left-over C feature into something useful that people _want_ to use for splitting the continuation of something from the comments. I think after introducing it, the bad prejudice will vanish. The mental change is from "a hack because the line doesn't fit" into "a well-formed structuring tool to split code and comment"... cheers - chris > >> >> >> The reason \# works, but not \ #, is when the comment comes directly >> after the back slash, it's removed and leaves a (backslash + >> new-line) pair.
>> >> >> Whether we require there be no space between \ and # or, conversely, >> require there be at least one whitespace character should not be >> based on >> the relative ease of patching the current code. Personally, I would >> prefer >> the latter as I believe requiring a space after \ will increase >> readability. > > My preference is to allow any number of spaces before and after the > '\'. and that any comments after the slash not change what it means. > > A built in single space rule would probably frustrate people who don't > want to do it that way. Like how you get a character after a > continuation error, even if it's only a single space. Yeah, it's how > it works, but it's still annoying to get that error in that situation. > > A comment of course, would still uses up the rest of the line. So a > '\' after '#' is just part of the comment. > > > Cheers, > Ron > > > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From abarnert at yahoo.com Sun Jun 9 22:56:18 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 13:56:18 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <636B9BDF-E1E6-4E51-8ADE-513B8B1851BB@yahoo.com> On Jun 9, 2013, at 9:13, "Stephen J. Turnbull" wrote: > Steven D'Aprano writes: > >>> But Python goes much farther. float('٢.๑') also returns 2.1 (not to >>> mention that int('٢๑') returns 21). >> >> Yes. And why is this a problem? There is no ambiguity. It might >> look untidy to be mixing Arab and Thai numerals in the same number, >> but it is still well-defined. > > To whom? Unicode didacts, maybe, but I doubt there are any real users > who would consider that well-defined. To anyone who would conceivably ever type or paste that into a program that expects an integer, there's only one meaning it could have. To anyone else, it can't ever matter. Also, consider how much more complicated parsing gets. Instead of just getting the digit value, you also have to get the group that the value comes from and make sure you haven't gotten any digits from an incompatible group yet. Why write, debug, and maintain that code? > So the same arguments you made > for not permitting non-ASCII numerals in Python source code apply > here, although they are somewhat weaker when applied to numeric data > expressed as text. > > In any case, there's not really that much use for this generality of > numerals. On the one hand, I think these days anyone who uses > information technology is fluent in ASCII numeration.
On the other, > if you want to allow people to write in other scripts, you probably > are dealing with "naive" users who should be allowed to use grouping > characters and the usual conventions for their locale, and int() and > float() just aren't good enough anyway. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From abarnert at yahoo.com Sun Jun 9 23:07:54 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 14:07:54 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> Message-ID: On Jun 9, 2013, at 12:35, Alexander Belopolsky wrote: > > On Sun, Jun 9, 2013 at 12:47 PM, Łukasz Langa wrote: >> Wikipedia started serving numerical data with \N{MINUS SIGN} instead of 0x2D, for instance on climate tables. I think we'll see increased usage of such characters in the wild. > > While I don't think MINUS SIGN can be abused this way, you should be very careful when you copy numerical data from the web. Consider this case: > > >>> float('123٠95') > 123095.0 > > Depending on your font, '123٠95' may be indistinguishable from '123.95'. > > If you do research using numerical data published on the web, you will be well advised not to assume that anything that looks like a number to your eye can be fed to python's float(). That's good general advice, but what's the specific advice in this case? You want data from a Wikipedia page, you've looked at it and verified that what looks like -123.45 actually is that float, even though the first character is a Unicode minus, so: you should write your own parser, or at least explicitly call x.replace('\N{MINUS SIGN}', '-') before you can feed x to float (or a numpy array constructor, or whatever)? If this were some nonstandard or ambiguous case, I'd say yes.
But if it has exactly one possible standardized meaning, and it really is (or is becoming) a common, standard way to represent negative floats in text, I don't see the harm in accepting it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sun Jun 9 23:34:19 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 9 Jun 2013 17:34:19 -0400 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> Message-ID: On Sun, Jun 9, 2013 at 5:07 PM, Andrew Barnert wrote: > On Jun 9, 2013, at 12:35, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > .. > If you do research using numerical data published on the web, you will be > well advised not to assume that anything that looks like a number to your > eye can be fed to python's float(). > > > That's good general advice, but what's the specific advice in this case? > You want data from a Wikipedia page, you've looked at it and verified that > what looks like -123.45 actually is that float, even though the first > character is a Unicode minus, so: you should write your own parser, or at > least explicitly call x.replace('\N{MINUS SIGN}', '-') before you can feed > x to float (or a numpy array constructor, or whatever)? > My specific advice would be to use a parser that would reject anything other than well-formatted numbers according to the specs for this particular data source. That parser should definitely reject non-ASCII digits and possibly even reject ASCII '-' because that may be an indication of vandalism. Note that Python's float() is a wrong choice for this task regardless of what we decide to do with '\N{MINUS SIGN}', but if we make float() more promiscuous, it will become more likely that it will be used naively with data scrubbed from web pages.
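(A minimal sketch of such a strict parser, for concreteness; the regular expression and the name strict_float are my own illustration, not something proposed in the thread:)

```python
import re

# Accept only an optional ASCII minus, ASCII digits, and an optional
# ASCII fraction. Note [0-9] deliberately matches only ASCII digits,
# unlike \d, which matches any Unicode decimal digit.
_NUMBER = re.compile(r'-?[0-9]+(\.[0-9]+)?\Z')

def strict_float(text):
    """Parse a number, rejecting non-ASCII digits and lookalike characters."""
    if _NUMBER.match(text) is None:
        raise ValueError('not a well-formatted number: %r' % text)
    return float(text)

print(strict_float('-123.45'))       # -123.45
try:
    strict_float('123\u066095')      # Arabic-Indic zero, looks like a dot
except ValueError as e:
    print(e)
```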
-------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun Jun 9 23:52:32 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 14:52:32 -0700 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <24890DE3-D1CB-451D-8DA1-6F8D5D0DEAC7@yahoo.com> On Jun 9, 2013, at 12:59, Yuval Greenfield wrote: > It's plagued websites, browsers, email clients, adobe photoshop and premiere, excel, word, and powerpoint. It's always been a guessing game when a friend would call for help proclaiming "all I'm getting is Chinese" which is the written gibberish euphemism used around here. Sometimes it's just the word or letter ordering that's messed up (Hebrew is an RTL language). Most Israelis have experienced and fear this phenomenon. > > If I were to try and fix a problem I'd either be using notepad with its heuristics or iterating through the above options. There's definitely a case to be made for implementing some kind of Notepad-like heuristics in Python. It would be great to be able to do this at the interactive interpreter:

    line = text.partition('\n')[0]
    for encoding in codecs.guess(text)[:10]:
        print(encoding, line.decode(encoding))

In fact, if you wrote that and pushed it to PyPI I'd start using it today, and maybe even lobby for its inclusion in the stdlib. But I wouldn't want open to use it, and I don't think you would either. > Sometimes the above encodings were the platform's (windows') default encoding, but in my experience it was mainly applications or websites that chose their encoding for whatever reasons. But open wouldn't affect those things anyway. You're dealing with urlopen or socket.makefile or a file you've opened as binary and extracted text from.
The fact that local text files benefit from assuming the default encoding, but nothing else does, is an argument for, not against, the 3.x status quo: open (in text mode) assumes the default encoding, even though nothing else does. > E.g. Windows Internals 4th edition promoted ucs-2 as the killer encoding that all windows applications should be implemented with. Yes, Microsoft strongly encouraged first UCS-2, then UTF-16, consistently for over 15 years, and their APIs are still all built around it. But note that the one exception they've always made is in text files. If you want to save a file in UTF-16, you don't use the "narrow" API functions and en/decode to UTF-16; you use the wide functions. The narrow functions are for writing in the OEM codepage. That's why programs like Notepad, going back to 95 and NT 3, gave you separate "Save As" options for "Text File" and "Unicode Text File", instead of a pulldown or checkbox to select an encoding. A (narrow) text file is a file in your OEM code page, period. For a while, Microsoft encouraged you to save files in UTF-8 with an explicit BOM, even though that's strongly discouraged by the standards. But they've never suggested just writing UTF-8 to text files, without the BOM, instead of using the OEM codepage. I'm not saying that this was a good decision by Microsoft, or that it doesn't have bad repercussions today. Just that the least bad answer for Python on Windows is what Python 3 already does. > Perhaps you guys are used to more os-encoding-abiding applications and value that quality. Yes, it has helped me numerous times in the past, with ini files and log files, files generated by DOS programs and ports from Unix, etc. I've been able to take Python code that I wrote on other platforms and use it on Windows and (not every time, but more often than not) it just worked. And in the cases where it didn't work, it's usually been because the Windows files were in UTF-16, so defaulting to UTF-8 wouldn't have helped anything.
> That kind of consistency indeed would have saved me from at least some heartache. I just wish we could get rid of these problems for good, and promoting utf-8 everywhere is one way to go about it. I agree wholeheartedly. But the only reasonable fix that will solve the problem for text files is Microsoft doing what Apple, Red Hat, Ubuntu, Google, etc. did: ship systems where the default encoding is UTF-8 in every region. And I don't think Python working poorly with local text files on Windows would be a significant stick beating them in that direction. From alexander.belopolsky at gmail.com Mon Jun 10 00:12:21 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 9 Jun 2013 18:12:21 -0400 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <636B9BDF-E1E6-4E51-8ADE-513B8B1851BB@yahoo.com> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <636B9BDF-E1E6-4E51-8ADE-513B8B1851BB@yahoo.com> Message-ID: On Sun, Jun 9, 2013 at 4:56 PM, Andrew Barnert wrote: > Also, consider how much more complicated parsing gets. Instead of just > getting the digit value, you also have to get the group that the value > comes from and make sure you haven't gotten any digits from an incompatible > group yet. Why write, debug, and maintain that code? This is not that hard because the latest Unicode standard guarantees that decimal digits are "encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range." (See 4.6 Numeric Value / Decimal Digits, <http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf>.) In other words, two digits belong to the same group if they have the same ord(x) - int(x) value. -------------- next part -------------- An HTML attachment was scrubbed...
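(That check can be sketched in a few lines; the function name is mine, and the sketch assumes every character passed in really is a decimal digit:)

```python
import unicodedata

def consistent_digits(s):
    """True if every decimal digit in s comes from the same script's block.

    Unicode stores each script's digits contiguously starting at its zero,
    so ord(c) - Numeric_Value(c) is the code point of that script's zero.
    """
    zeros = {ord(c) - unicodedata.decimal(c) for c in s}
    return len(zeros) <= 1

print(consistent_digits('\u0e52\u0e51'))  # Thai 2, Thai 1
print(consistent_digits('\u0662\u0e51'))  # Arabic-Indic 2, Thai 1
```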
URL: From rosuav at gmail.com Mon Jun 10 00:13:01 2013 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 10 Jun 2013 08:13:01 +1000 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <24890DE3-D1CB-451D-8DA1-6F8D5D0DEAC7@yahoo.com> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> <24890DE3-D1CB-451D-8DA1-6F8D5D0DEAC7@yahoo.com> Message-ID: On Mon, Jun 10, 2013 at 7:52 AM, Andrew Barnert wrote: > There's definitely a case to be made for implementing some kind of Notepad-like heuristics in Python. It would be great to be able to do this at the interactive interpreter: > > line = text.partition('\n')[0] > for encoding in codecs.guess(text)[:10]: > print(encoding, line.decode(encoding)) > > In fact, if you wrote that at pushed it to PyPI I'd start using it today, and maybe even lobbying for its inclusion in the stdlib. > > But I wouldn't want open to use it, and I don't think you would either. Hang on, you can't partition it on the Unicode string '\n' while it's still a bytes :) But I agree, this would be a neat feature. It ought to be able to guess ASCII or UTF-8 with near-certainty, UTF-16 if it has a BOM, and other things heuristically. Would help a lot when I'm trying to answer Nikos on python-list... ChrisA From abarnert at yahoo.com Mon Jun 10 00:15:09 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 15:15:09 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> Message-ID: On Jun 9, 2013, at 14:34, Alexander Belopolsky wrote: > > On Sun, Jun 9, 2013 at 5:07 PM, Andrew Barnert wrote: >> On Jun 9, 2013, at 12:35, Alexander Belopolsky wrote: >>> .. 
>>> If you do research using numerical data published on the web, you will be well advised not to assume that anything that looks like a number to your eye can be fed to python's float(). >> >> That's good general advice, but what's the specific advice in this case? You want data from a Wikipedia page, you've looked at it and verified that what looks like -123.45 actually is that float, even though the first character is a Unicode minus, so: you should write your own parser, or at least explicitly call x.replace('\N{MINUS SIGN}', '-') before you can feed x to float (or a numpy array constructor, or whatever)? > > > My specific advice would be to use a parser that would reject anything other than well-formatted numbers according to the specs for this particular data source. Seriously? That's going to be a couple orders of magnitude slower, and much, much more complicated (and therefore buggy) than just calling float. Even if you need validation for your use case, it's a lot simpler, and faster, to validate then call float, than to parse it manually. And the obvious definition of success for this code is that it returns the same thing that validate-and-float would. > That parser should definitely reject non-ASCII digits and possibly even reject ASCII '-' because that may be an indication of vandalism. Except that wikipedia doesn't transition all at once, and never will. There are pages with each minus sign, and even pages with both minus signs. Readers, except for a few zealots, don't care about the difference. People writing scrapers, except scrapers used as tools for helping the transition, don't care either. Do you really think that every time a Wikipedia page deviates from current (often recently-changed) standards, that's evidence of vandalism, and therefore all information on that page should be ignored?
And even sites that aren't continuously edited will have similar cases (e.g., all pages created before the flag day have one minus sign, those created after have the other), and possibly a few fall a couple days on the wrong side of the line because they were already in processing when the changeover happened. > Note that python float() is a wrong choice for this task regardless of what we decide to do with '\N{MINUS SIGN}', Why? Maybe you want Decimal instead of float, but then the same arguments apply there. Otherwise, in what way is float wrong for parsing floating point string representations into numbers? > but if we make float() more promiscuous, it will become more likely that it will be used naively with data scrubbed from web pages. Which makes it more likely that people will write those programs, which work, instead of failing to write anything. It's like arguing that BeautifulSoup is bad because it allows you to write HTML scraping code without understanding and dealing with the total HTML structure. It's not bad: besides allowing novices to write scraping code at all, it also allows experienced developers to write scraping code with less effort and fewer bugs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Mon Jun 10 00:20:10 2013 From: phd at phdru.name (Oleg Broytman) Date: Mon, 10 Jun 2013 02:20:10 +0400 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <24890DE3-D1CB-451D-8DA1-6F8D5D0DEAC7@yahoo.com> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> <24890DE3-D1CB-451D-8DA1-6F8D5D0DEAC7@yahoo.com> Message-ID: <20130609222010.GA14125@iskra.aviel.ru> On Sun, Jun 09, 2013 at 02:52:32PM -0700, Andrew Barnert wrote: > There's definitely a case to be made for implementing some kind of Notepad-like heuristics in Python.
It would be great to be able to do this at the interactive interpreter: > > line = text.partition('\n')[0] > for encoding in codecs.guess(text)[:10]: > print(encoding, line.decode(encoding)) > > In fact, if you wrote that and pushed it to PyPI I'd start using it today, and maybe even lobby for its inclusion in the stdlib. Chardet (and variants)? https://pypi.python.org/pypi?%3Aaction=search&term=chardet&submit=search Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From abarnert at yahoo.com Mon Jun 10 00:28:25 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 15:28:25 -0700 (PDT) Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <636B9BDF-E1E6-4E51-8ADE-513B8B1851BB@yahoo.com> Message-ID: <1370816905.74762.YahooMailNeo@web184704.mail.ne1.yahoo.com> From: Alexander Belopolsky Sent: Sunday, June 9, 2013 3:12 PM >On Sun, Jun 9, 2013 at 4:56 PM, Andrew Barnert wrote: > >Also, consider how much more complicated parsing gets. Instead of just getting the digit value, you also have to get the group that the value comes from and make sure you haven't gotten any digits from an incompatible group yet. Why write, debug, and maintain that code? >This is not that hard because the latest Unicode standard guarantees that decimal digits are "encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range." (See 4.6 Numeric Value / Decimal Digits.) In other words, two digits belong to the same group if they have the same ord(x) - int(x) value. It's not _hard_, but certainly it's a lot harder than not doing it:

parse_int_digit = int

def parse_int_1(s):
    i = 0
    for digit in map(parse_int_digit, s):
        i *= 10
        i += digit
    return i

def parse_int_2(s):
    i = 0
    first_digit_range = None
    for c in s:
        digit = parse_int_digit(c)
        i *= 10
        digit_range = ord(c) - digit
        if first_digit_range is None:
            first_digit_range = digit_range
        elif first_digit_range != digit_range:
            raise ValueError('Mixed digit ranges at {}: {} vs. {}'.format(
                digit, digit_range, first_digit_range))
        i += digit
    return i

I think I got them both right, but I'm less sure about the second one. It's clearly harder to read and understand. It's also likely to take about twice as long to execute. From tismer at stackless.com Mon Jun 10 00:55:13 2013 From: tismer at stackless.com (Christian Tismer) Date: Mon, 10 Jun 2013 00:55:13 +0200 Subject: [Python-ideas] namedtuple is not as good as it should be Message-ID: <51B507D1.1000001@stackless.com> Sorry Raymond if this offends you. But after some extensive use of namedtuple I think it needs a re-design.

The pros:
_________
- a namedtuple can be easily used in place of a tuple.
- it provides names for its fields and behaves like a tuple, otherwise.

Furthermore:
- a named tuple is the natural choice for modelling the records of a database.

The cons:
- All of the above becomes wrong if you use namedtuple as a real replacement for tuple.
- especially for databases it makes little sense as it is.

Reason: Pickling
________
- pickling of tuples: always possible if it contains built-in types. tuples are simply tuples. There is just _one_ type.
- pickling of namedtuple: sometimes possible, if your definition is static enough. namedtuple has a subtype per tuple layout, and you need to cope with that.

Just to be clear about that: Sure, it is possible to pickle named tuples, but you have to think about it!
And having to think about it trashes a lot of the fun of having those names for free. And typically this happens after you did your analysis of 20 GB of data: you cannot pickle your nicely formatted namedtuple instances after the fact. Actually, to save all the computation, you do a hack that turns all your live namedtuple instances back into ordinary tuples.

Silent implications introduced by namedtuple:
_____________________________________________
Without being very explicit, namedtuple makes you use it happily instead of tuples. But instead of using a native, simple type, you now use a not-so-simple, user-defined type. This type

- is not built in
- has a class definition
- needs a global, constant definition _somewhere_

Typically, you run a script interactively, and typically you need to pickle some __main__.somename namedtuple class instances. This is exactly what you don't want! You want to have some anonymous data in the pickle and don't want to set anything in stone.

namedtuple() Factory Function for Tuples with Named Fields
__________________________________________________________
It would be great if namedtuple were just this, as the documentation says. But instead, it

- creates a named class, i.e. forces me to name it
- makes you create instances of that specific class, not just tuples.

Usability for databases
______________________
For simple databases which enumerate (employee, salary, ...) or (shoesize, height, married) as example "databases", namedtuple is ok. As soon as you write a real database implementation with no fixed layout, you get into trouble.

Easy database approach: you define a dbtable as a collection of tuples, maybe as a dict with fast index keys. Not a problem with tuples, which are of type tuple. With named tuple, you suddenly see yourself creating namedtuple instead.
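A minimal sketch of this failure mode, assuming a hypothetical table loader whose row layout is only known at runtime (the Row layout below is invented for illustration):

```python
import pickle
from collections import namedtuple

def load_table():
    # The layout is discovered at runtime, as in a generic database
    # layer, so the namedtuple class is created on the fly here.
    Row = namedtuple('Row', ['employee', 'salary'])
    return [Row('alice', 50000), Row('bob', 60000)]

table = load_table()
try:
    pickle.dumps(table[0])
except pickle.PicklingError as e:
    # pickle stores only the class's qualified name; since Row is not
    # importable from any module, unpickling could never find it.
    print('cannot pickle:', e)
```

The equivalent plain tuples would pickle without any of this ceremony, which is exactly the asymmetry being complained about.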
But those namedtuple records cannot be pickled when used as a replacement for regular tuples, because they now have a dynamically created type, and extra actions are necessary to make it possible to pickle those.

From the documentation:

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

This implicitly suggests that namedtuple is the tool of choice to support databases in Python. Wrong! It supports simple, fixed data structures in Python. For instance, you can use it to define a fixed structural record that gives the general description of a database field and its attributes. For a database itself, though, this is very wrong. Nobody uses a fixed class definition to define the structure of some database tables. Instead, you use a generic approach with a row class that describes what a row is, but dynamically.

So why all this rant?
___________________
What I'm trying to explain is that namedtuple should be closer to tuple! A namedtuple should not be an explicit class with instances, but a generic subclass of tuple, for all namedtuples. Then, if the user decides to build his own class upon a namedtuple, fine. He then might want to do everything needed to support and pickle his class. But the namedtuple should be a single (maybe builtin) class that is just a tuple with field names, nothing more.

Implementation idea (roughly)
_____________________________
Whatever a namedtuple does, it should behave as closely as possible to a tuple, just providing attribute names. Pickling support should work so that the user does not need to know that a namedtuple has a special class. Actually, there should be only one generic class, and the namedtuple "class" is a template instance that just holds the names. Those names could go into some registry or whatever. The only interesting thing about a namedtuple is the set of names used. This set of names does not justify dragging in the whole import machinery, the associated problems etc.
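A rough sketch of that idea -- one generic tuple subclass whose instances carry their own field names, with __reduce__ pickling the names alongside the values. The class NT and every detail below are illustrative only, not a worked-out proposal:

```python
import pickle

class NT(tuple):
    """One generic tuple subclass for all "named" tuples: each instance
    carries its own field names, so no per-layout class is needed."""

    def __new__(cls, names, values):
        self = super().__new__(cls, values)
        self._names = tuple(names)
        return self

    def __getattr__(self, name):
        # Resolve attribute access by position in the stored names.
        try:
            return self[self._names.index(name)]
        except ValueError:
            raise AttributeError(name) from None

    def __reduce__(self):
        # Pickle the defining names together with the values, so the
        # receiving side needs only this one generic class.
        return (NT, (self._names, tuple(self)))

p = NT(('x', 'y'), (1, 2))
q = pickle.loads(pickle.dumps(p))
assert q == p and q.x == 1 and q.y == 2
```

Note that this still needs the single generic class to be importable, but it is *one* class for all layouts, not a new class per layout, which is the distinction the mail is driving at.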
The set of attribute names defines the namedtuple, and that's it. If it is necessary to have class instances like today, ok. But there is no need to search that class in a pickle! Instead, the defining set of attributes could be pickled (uniquely stored by tuple comparison), and the class could be re-created on-the-fly at unpickling time. Conclusion ___________ I love namedtuple, and I hate it. I want to get rid of the second half of this sentence. Let us invent one that does not enforce class behavior. I am thinking of a prototype... cheers - chris p.s.: there is a lot about database design not mentioned here. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 10 00:57:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 15:57:08 -0700 (PDT) Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <20130609222010.GA14125@iskra.aviel.ru> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> <24890DE3-D1CB-451D-8DA1-6F8D5D0DEAC7@yahoo.com> <20130609222010.GA14125@iskra.aviel.ru> Message-ID: <1370818628.89078.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: Oleg Broytman Sent: Sunday, June 9, 2013 3:20 PM > On Sun, Jun 09, 2013 at 02:52:32PM -0700, Andrew Barnert > wrote: >> There's definitely a case to be made for implementing some kind of > Notepad-like heuristics in Python. 
>> It would be great to be able to do this at the interactive interpreter:
>>
>> line = text.partition('\n')[0]
>> for encoding in codecs.guess(text)[:10]:
>>     print(encoding, line.decode(encoding))
>>
>> In fact, if you wrote that and pushed it to PyPI I'd start using it today, and maybe even lobbying for its inclusion in the stdlib.
>
> Chardet (and variants)?
> https://pypi.python.org/pypi?%3Aaction=search&term=chardet&submit=search

Since this is now the third such reply I've gotten, I'll reply to the list.

chardet2 is great. But chardet2 doesn't do Notepad-like heuristics. It doesn't consider your system and OEM charsets, it doesn't understand Microsoft's nonstandard UTF-8 BOM rules, it doesn't detect EBCDIC (or contain the helpful message "This version of OS/2 or MS-DOS does not support EBCDIC. Please contact IBM for support." -- but I don't think WRITE.EXE has that message either nowadays...), etc.

I sometimes use both chardet2 from the command line, and Notepad or Wordpad, on the same file when trying to puzzle things out. I'd like to have access to both from the interactive interpreter.

And I don't think it would be necessary, or likely reasonable, for the MS heuristics to get added to chardet2. And, if it were, I'm not sure what I'd like the API to look like (one function with options?). Which would actually be a problem if we got both into the stdlib, so I probably shouldn't have suggested that.

Anyway, sorry for not making that clear in the first place.
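The preview loop sketched above can be approximated today without the hypothetical codecs.guess(), by trying a hand-picked candidate list; preview_decodings and its default candidates below are invented for illustration, not an existing API:

```python
def preview_decodings(data: bytes,
                      candidates=('utf-8', 'utf-16', 'latin-1', 'cp1252')):
    """Show how the first line of raw bytes decodes under each candidate
    encoding, skipping candidates that fail to decode at all."""
    first_line = data.partition(b'\n')[0]
    results = []
    for enc in candidates:
        try:
            results.append((enc, first_line.decode(enc)))
        except (UnicodeDecodeError, LookupError):
            pass  # this candidate can't decode these bytes
    return results

for enc, preview in preview_decodings('héllo\nworld\n'.encode('utf-8')):
    print(enc, preview)
```

A real guesser would rank the candidates by plausibility instead of listing them all, which is the part chardet2 (and the Notepad heuristics) actually supply.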
From alexander.belopolsky at gmail.com Mon Jun 10 01:02:29 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 9 Jun 2013 19:02:29 -0400
Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions
In-Reply-To: 
References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl>
Message-ID: 

On Sun, Jun 9, 2013 at 6:15 PM, Andrew Barnert wrote:
>
> On Jun 9, 2013, at 14:34, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:
>
> > My specific advice would be to use a parser that would reject anything other than well-formatted numbers according to the specs for this particular data source.
>
> Seriously? That's going to be a couple orders of magnitude slower, and much, much more complicated (and therefore buggy) than just calling float. Even if you need validation for your use case, it's a lot simpler, and faster, to validate then call float, than to parse it manually.
>
> And the obvious definition of success for this code is that it returns the same thing that validate-and-float would.

Validate + float() is a fine strategy, but once you do validation, it is trivial to add .replace('\N{MINUS SIGN}', '-'). I would expect that there is more data on the web where negative numbers are shown in financial format using parentheses than using U+2212. Web scrubbing is just the wrong use case to consider when designing a core language feature.

Making int('\N{MINUS SIGN}1') valid may improve the language, but not because it will make copy and paste from web to python easier. I think more people are tripped up by copying code examples from the web into a python session than by copying numerical data. For example, Python's own reference manual has numerous unicode issues in its PDF version. (See <http://docs.python.org/3/download.html>.)
Ironically, the first problem shows up in the encoding section: "In addition, if the first bytes of the file are the UTF-8 byte-order mark (b’\xef\xbb\xbf’), the declared file encoding is UTF-8." If you copy the BOM string from the manual to a python session, you get:

>>> b’\xef\xbb\xbf’
  File "<stdin>", line 1
    b’\xef\xbb\xbf’
     ^
SyntaxError: invalid character in identifier

This problem is widespread on the web, but I don't think anyone would argue that python syntax should start accepting all kinds of quotation marks.

The proposal to allow int('\N{MINUS SIGN}1') should be considered on its own merit as a language feature and without referring to what can be found in web pages. I think it will do more harm than good because it will introduce more differences between what is accepted by int() and what is accepted by eval() and as a result cause more user confusion than it will resolve.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From raymond.hettinger at gmail.com Mon Jun 10 01:54:53 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 9 Jun 2013 18:54:53 -0500
Subject: [Python-ideas] namedtuple is not as good as it should be
In-Reply-To: <51B507D1.1000001@stackless.com>
References: <51B507D1.1000001@stackless.com>
Message-ID: 

On Jun 9, 2013, at 5:55 PM, Christian Tismer wrote:

> Without being very explicit, namedtuple makes you use it happily instead of tuples.
> But instead of using a native, simple type, you now use a not-so-simple,
> user-defined type.
> This type
>
> - is not built in
> - has a class definition
> - needs a global, constant definition _somewhere_
>
> Typically, you run a script interactively, and typically you need to pickle
> some __main__.somename namedtuple class instances.

Does your whole issue boil down to this? If you interactively define any function, class, named tuple, or enumeration, the pickle module will only store its name, not its definition. Then the unpickler can't find it later.
The essential problem being that data gets stored separately from its definition (similar to the relationship between a database and its ORM models).

I don't know if it will help in your case, but one thing I've seen done is to store the definitions and data together. The developments in Python 3 may help in this regard. In Py3.3, you can access (and possibly store) the definition using the _source attribute. And in Py3.4, you will be able to specify the module (see http://bugs.python.org/issue17941 ).

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From raymond.hettinger at gmail.com Mon Jun 10 06:06:07 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 9 Jun 2013 23:06:07 -0500
Subject: [Python-ideas] namedtuple is not as good as it should be
In-Reply-To: <51B507D1.1000001@stackless.com>
References: <51B507D1.1000001@stackless.com>
Message-ID: <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com>

On Jun 9, 2013, at 5:55 PM, Christian Tismer wrote:

> If it is necessary to have class instances like today, ok. But there is no
> need to search that class in a pickle! Instead, the defining set of attributes
> could be pickled (uniquely stored by tuple comparison), and the class could
> be re-created on-the-fly at unpickling time.

That is a reasonable wish :-) But, I'm not sure how you propose for it to work. What would you expect from:

>>> Soldier = namedtuple('Soldier', ['name', 'rank', 'serial_number'])
>>> johnny = Soldier('John Doe', 'Private', 12345)
>>> roger = Soldier('Roger', 'Corporal', 67890)
>>> fireteam = [johnny, roger]
>>> pickletools.dis(pickle.dumps(fireteam))

Would it re-specify the class for every instance (at a cost of both speed and space)?

[(namedtuple, 'Soldier', ['name', 'rank', 'serial_number'], ('John Doe', 'Private', 12345)),
 (namedtuple, 'Soldier', ['name', 'rank', 'serial_number'], ('Roger', 'Corporal', 67890))]

Or would you have a mechanism to specify the names just once?
[(namedtuple, 'Soldier', ['name', 'rank', 'serial_number']),
 (Soldier, ('John Doe', 'Private', 12345)),
 (Soldier, ('Roger', 'Corporal', 67890))]

What would you do with customized named tuples?

>>> class PrettySoldier(Soldier):
        def __repr__(self):
            return 'N:{0} R:{1} S:{2}'.format(*self)

How about enumerations? To pickle the value, Color.red, would you propose for that instance to store the definition of the enumeration as well as its value?

Raymond

P.S. This was supposed to be part of my previous email, but I had to board a flight.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ryan at ryanhiebert.com Mon Jun 10 06:56:41 2013
From: ryan at ryanhiebert.com (Ryan Hiebert)
Date: Sun, 9 Jun 2013 21:56:41 -0700
Subject: [Python-ideas] namedtuple is not as good as it should be
In-Reply-To: <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com>
References: <51B507D1.1000001@stackless.com> <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com>
Message-ID: <1BE57934-9D37-4C40-A36B-C4CEAAB24F92@ryanhiebert.com>

A namedtuple as an easy class has been great. But the way that I had figured namedtuple would work before I figured out how it actually worked was that it would just create it at instantiation. So, there would be no namedtuple template, just a call like this:

point = namedtuple(x=1, y=0)

And then access could be, in addition to point[0] and point[1], point.x and point.y

Since then, with the discussion about using an ordereddict for kwargs, I've realized that that wouldn't be possible, because there wouldn't be a way to get the order of these arguments. But, if it did work, it would make sense that a simple subclass of (my version of a) namedtuple could easily serve the purpose of how namedtuple is constructed, which is to name the fields once, rather than on instantiation.
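The create-at-instantiation call Ryan describes can be sketched on top of today's factory, assuming keyword-argument order is preserved (guaranteed in current CPython, though not at the time of this thread, which is exactly the concern Ryan raises). The name anon_namedtuple is invented for illustration:

```python
from collections import namedtuple

def anon_namedtuple(**kwargs):
    # Build a throwaway namedtuple class from the keyword names, then
    # instantiate it immediately. This is only well-defined because
    # **kwargs preserves call order in modern CPython.
    cls = namedtuple('anon', kwargs.keys())
    return cls(**kwargs)

point = anon_namedtuple(x=1, y=0)
print(point.x, point[1])  # prints: 1 0
```

Of course this recreates the class on every call, so it keeps none of the sharing (or pickling) properties of a module-level namedtuple definition; it only illustrates the surface syntax.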
I also see issues with coercing tuples or other sequences to namedtuples using that method, since my ideas for that involve using keyword arguments, but that would mean something different already.

As an aside, even though I know why -- because it would require using a list or real tuple to make a namedtuple instance -- it's been weird coercing a tuple into a namedtuple, because it doesn't use the typical convention that passing a sequence into the constructor makes a namedtuple with that sequence. For example, I'd expected this would work like coercion to a list:

>>> Point = namedtuple('Point', 'x y')
>>> coordinates = (1,0)
>>> point = Point(coordinates)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __new__() missing 1 required positional argument: 'y'
>>> point = Point(*coordinates)
>>> point
Point(x=1, y=0)
>>> list(coordinates)
[1, 0]

I'm not sure how much value I've added to the conversation, but these are the views of someone who just started actually using namedtuples.

Ryan

On Jun 9, 2013, at 9:06 PM, Raymond Hettinger wrote:

>
> On Jun 9, 2013, at 5:55 PM, Christian Tismer wrote:
>
>> If it is necessary to have class instances like today, ok. But there is no
>> need to search that class in a pickle! Instead, the defining set of attributes
>> could be pickled (uniquely stored by tuple comparison), and the class could
>> be re-created on-the-fly at unpickling time.
>
> That is a reasonable wish :-)
> But, I'm not sure how you propose for it to work.
> What would you expect from:
>
> >>> Soldier = namedtuple('Soldier', ['name', 'rank', 'serial_number'])
> >>> johnny = Soldier('John Doe', 'Private', 12345)
> >>> roger = Soldier('Roger', 'Corporal', 67890)
> >>> fireteam = [johnny, roger]
> >>> pickletools.dis(pickle.dumps(fireteam))
>
> Would it re-specify the class for every instance (at a cost of both speed and space)?
> > [(nametuple, 'Soldier', ['name', 'rank', 'serial_number'], ('John Doe', 'Private', 12345)), > (nametuple, 'Soldier', ['name', 'rank', 'serial_number'], ('Roger', 'Corporal', 67890))] > > Or would you have a mechanism to specify the names just once? > > [(nametuple, 'Soldier', ['name', 'rank', 'serial_number']) > (Soldier, ('John Doe', 'Private', 12345)), > (Soldier, ('Roger', 'Corporal', 67890))] > > What would you do with a customized named tuples? > > >>> class PrettySoldier(Soldier): > def __repr__(self): > return 'N:{0} R:{1} S:{2}'.format(*self) > > How about enumerations? To pickle the value, Color.red, would you propose > for that instance to store the definition of the enumeration as well as its value? > > > > Raymond > > > P.S. This was supposed to be part of my previous email, but I had to board a flight. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4142 bytes Desc: not available URL: From ncoghlan at gmail.com Mon Jun 10 07:14:29 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 10 Jun 2013 15:14:29 +1000 Subject: [Python-ideas] Ordering keyword dicts Message-ID: (migrating thread from python-dev to python-ideas) On 10 June 2013 13:13, Alexander Belopolsky wrote: > > On Sun, May 19, 2013 at 1:47 AM, Guido van Rossum wrote: >> >> I'm slow at warming up to the idea. My main concern is speed -- since >> most code doesn't need it and function calls are already slow (and >> obviously very common :-) it would be a shame if this slowed down >> function calls that don't need it noticeably. > > > Here is an idea that will not affect functions that don't need to know the > order of keywords: a special __kworder__ local variable. 
> The use of this variable inside the function will signal the compiler to
> generate additional bytecode to copy keyword names from the stack to a
> tuple and save it in __kworder__. With that feature, an OrderedDict
> constructor, for example, can be written as
>
> def odict(**kwargs):
>     return OrderedDict([(key, kwargs[key]) for key in __kworder__])

The problem is that this is too late to help.

To help folks understand the *technical* (rather than conceptual) limitations that are at issue in only *sometimes* ordering the keyword arguments, here's an overview of the way the binding of arguments to parameters currently works:

1. In the calling bytecode, the arguments are collected together on the stack as positional arguments and keyword arguments:

>>> def f():
...     return call(1, 2, *(3, 4), x=1, y=2, **{'z':3})
...
>>> dis(f)
  2           0 LOAD_GLOBAL              0 (call)
              3 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 (2)
              9 LOAD_CONST               3 ('x')
             12 LOAD_CONST               1 (1)
             15 LOAD_CONST               4 ('y')
             18 LOAD_CONST               2 (2)
             21 LOAD_CONST               8 ((3, 4))
             24 BUILD_MAP                1
             27 LOAD_CONST               5 (3)
             30 LOAD_CONST               7 ('z')
             33 STORE_MAP
             34 CALL_FUNCTION_VAR_KW   514 (2 positional, 2 keyword pair)
             37 RETURN_VALUE

The various "CALL_FUNCTION*" opcodes in CPython almost always end up passing through a snippet like the following in ceval.c [1,2] (there are a couple of exceptions related to optimisation of calls that only involve positional arguments and parameters):

if (PyCFunction_Check(func)) {
    PyThreadState *tstate = PyThreadState_GET();
    C_TRACE(result, PyCFunction_Call(func, callargs, kwdict));
}
else
    result = PyObject_Call(func, callargs, kwdict);

[1] http://hg.python.org/cpython/file/default/Python/ceval.c#l4389
[2] http://hg.python.org/cpython/file/default/Python/ceval.c#l4484

You can see a couple of things here:

* That *any* collection of arguments, regardless of syntax, is reduced to a positional argument tuple and a keyword argument dictionary before handing it over to the callable to handle the binding of arguments to parameters.
* That this applies even to the optimised fast path that lets functions implemented in C avoid some of the overhead associated with the method dispatch machinery. 2. In the called object, the supplied arguments are bound appropriately as parameters. There's a lot more variability in how this happens. C functions usually use PyArg_ParseTuple or PyArg_ParseTupleAndKeywords. Python functions handle it as an implicit part of the frame initialization based on the function metadata. Guido's speed concern is specifically with any approach which requires the calling code to *always* check callables to see whether they want an ordered dictionary or not. Before we proceed to considering more exotic design ideas that require an "ordered or not" decision at the call sites, this concern should be validated by someone trying it out and checking the macro benchmark suite for the consequences. This can be done without actually making it possible to define functions that require order preservation - since we're mostly interested in the performance consequences when such a flag *isn't* set on the callables, an extra redundant check against "PyCFunction_GET_FLAGS(func) & 0x8000" (for PyCFunction instances), falling back to something like "PyObject_GetAttrId(__name__)" (for everything else), for all code paths in ceval.c that lead to creation of a kwdict instance should suffice. It may be that the cost of the flag check is swamped by the cost of actually creating the keyword dictionary, in which case the runtime check would be a preferable design choice, since function *invocation* wouldn't need to change, only function declarations. Cheers, Nick. 
-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From stefan at drees.name Mon Jun 10 07:33:01 2013
From: stefan at drees.name (Stefan Drees)
Date: Mon, 10 Jun 2013 07:33:01 +0200
Subject: [Python-ideas] Line continuations with comments
In-Reply-To: <51B4DD11.4030006@stackless.com>
References: <519D51AF.1080505@gmail.com> <51B4DD11.4030006@stackless.com>
Message-ID: <51B5650D.2050708@drees.name>

On 2013-06-09 21:52, Christian Tismer wrote:
> On 23.05.13 01:15, Ron Adam wrote:
>>
>> I'm not sure why some people dislike the back slash so much as a
>> continuation tool. Python is the language that avoids using {braces
>> around blocks}, so you'd think it would be the other way around.
>>
>> I really don't think the '\' will be over used. New programmers do
>> try a lot of strange things at first for making their own programs
>> easier to read, (I did), but usually they come around to what the
>> community practices and recommends.
>
> I think the reason that people don't like the '\' is that it opens up
> so many bad memories:
>
> - having to use them inside C macros
> - having to deal with it on windows
> - felt unpythonic because it is a low-level trick to produce longer lines
>
> Especially the unpythonic feeling comes from the fact that '\' either has
> special meaning in strings (which is accepted) or has very restricted use
> outside of strings:
>
> there must not be anything but the line end after it.
>
> So lifting the usability of '\' to allow comments after it turns it from
> an ugly left-over C feature into something useful that people _want_ to use
> for splitting the continuation of something from the comments.
>
> I think after introducing it, the bad prejudice will vanish.
>
> The mental change is from "a hack because the line doesn't fit"
> into "a well-formed structuring tool to split code and comment"...
there are two other reasons that spring to mind:

- as continuation signal in the shell: who has not been bitten by an innocent space following the backslash? ... and it's not the space we dislike ;-)

- on my console I use three fingers to produce this one symbol "\", e.g. so I might even visit trigraph-land again ??/

All the best,
Stefan.

> cheers - chris
>
>>> The reason \# works, but not \ #, is when the comment comes directly
>>> after the back slash, it's removed and leaves a (backslash + new-line)
>>> pair.
>>>
>>> Whether we require there be no space between \ and # or, conversely,
>>> require there be at least one whitespace character should not be based on
>>> the relative ease of patching the current code. Personally, I would prefer
>>> the latter as I believe requiring a space after \ will increase readability.
>>
>> My preference is to allow any number of spaces before and after the
>> '\', and that any comments after the slash not change what it means.
>>
>> A built in single space rule would probably frustrate people who don't
>> want to do it that way. Like how you get a "character after a continuation"
>> error, even if it's only a single space. Yeah, it's how it works, but it's
>> still annoying to get that error in that situation.
>>
>> A comment of course would still use up the rest of the line. So a
>> '\' after '#' is just part of the comment.
>>
>> Cheers,
>> Ron
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>

From alexander.belopolsky at gmail.com Mon Jun 10 07:46:51 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 10 Jun 2013 01:46:51 -0400
Subject: [Python-ideas] Ordering keyword dicts
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jun 10, 2013 at 1:14 AM, Nick Coghlan wrote:
>
> The various "CALL_FUNCTION*" opcodes in CPython almost always end up
> passing through a snippet like the following in ceval.c [1,2] (there
> are a couple of exceptions related to optimisation of calls that only
> involve positional arguments and parameters):
>
> if (PyCFunction_Check(func)) {
>     PyThreadState *tstate = PyThreadState_GET();
>     C_TRACE(result, PyCFunction_Call(func, callargs, kwdict));
> }
> else
>     result = PyObject_Call(func, callargs, kwdict);

This is correct for calling functions implemented in C, but python function calls get special treatment:

if (PyFunction_Check(func))
    x = fast_function(func, pp_stack, n, na, nk);

Eventually, this gets to PyEval_EvalCodeEx, where keyword arguments are still ordered. I believe this would be the place where a new co_* flag can be checked and order preserved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Mon Jun 10 08:07:41 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 10 Jun 2013 16:07:41 +1000 Subject: [Python-ideas] Ordering keyword dicts In-Reply-To: References: Message-ID: On 10 June 2013 15:46, Alexander Belopolsky wrote: > > On Mon, Jun 10, 2013 at 1:14 AM, Nick Coghlan wrote: >> >> The various "CALL_FUNCTION*" opcodes in CPython almost always end up >> passing through a snippet like the following in ceval.c [1,2] (there >> are a couple of exceptions related to optimisation of calls that only >> involve positional arguments and parameters): >> >> if (PyCFunction_Check(func)) { >> PyThreadState *tstate = PyThreadState_GET(); >> C_TRACE(result, PyCFunction_Call(func, callargs, kwdict)); >> } >> else >> result = PyObject_Call(func, callargs, kwdict); > > This is correct for calling functions implemented in C, but python function > calls go through a special treatment: > > if (PyFunction_Check(func)) > x = fast_function(func, pp_stack, n, na, nk); > > Eventually, this gets to PyEval_EvalCodeEx, where keyword arguments are > still ordered. I believe this would be the place where a new co_* flag can > be checked and order preserved. There are *many* ways to invoke a Python function which won't go through fast_function. Either there's a flag on callables for the runtime to check to see if order needs to be preserved, or else there needs to be an explicit syntactic marker at the call site to indicate that order should be preserved. Whichever approach is being considered, the other problem with the sometimes-ordered-sometimes-not issue is how to avoid having wrappers implicitly discard the ordering, regardless of whether the request for ordering is "outside in" (requested at the call site), "inside out" (requested in the function definition) or both. A new language could easily just define keyword arguments as ordered and accept the resulting performance hit. 
Retrofitting ordered keyword arguments to an existing language without substantially hurting performance of an already slow operation, while still making the feature usable enough to be attractive is a much harder problem. "Just add an ordered dict literal instead" is tempting, but then nice spellings for that start looking like they could be made more general :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tismer at stackless.com Mon Jun 10 08:28:52 2013 From: tismer at stackless.com (Christian Tismer) Date: Mon, 10 Jun 2013 08:28:52 +0200 Subject: [Python-ideas] namedtuple is not as good as it should be In-Reply-To: References: <51B507D1.1000001@stackless.com> Message-ID: <51B57224.3070205@stackless.com> Hi Raymond, On 10.06.13 01:54, Raymond Hettinger wrote: > > On Jun 9, 2013, at 5:55 PM, Christian Tismer > wrote: > >> Without being very explicit, namedtuple makes you use it happily >> instead of tuples. >> But instead of using a native, simple type, you now use a not-so-simple, >> user-defined type. >> This type >> >> - is not built in >> - has a class definition >> - needs a global, constant definition _somewhrere_ >> >> Typically, you run a script interactively, and typically you need to >> pickle >> some __main__.somename namedtuple class instances. > > Does you whole issue boil down to this? If you interactively define > any function, class, named tuple, or enumeration, the pickle module > will only store its name, not its definition. Then the unpickler > can't find it later. > No, this is just part of it, and just distracting a bit: If you start to implement things, namedtuple is great. When you implement your structure definitions with namedtuple, all good. But when you continue and create tables, which are typically not having a static class definition and you come from the dynamic of tuples, the problem suddenly becomes very visible. 
The whole issue is that a dynamically usable structure like tuple
suddenly becomes statically defined, and you need to cope with that.

The problem is not per se: sure, I know how pickling works and what I
have to do to pickle anything. But names for tuples are such a small
thing that it is not worth having to dig into the whole pickling issue,
because all the reasons why pickling of classes is done by module lookup
do not apply here, at least not for a native named tuple: I don't need
to subclass and invent fancy methods in the first place.

When I derive a class from a tuple, well ok. I asked for it, and I need
to cope with it. But namedtuple as a tool pretty much pretends to be
almost a builtin and therefore should not introduce the pickling
problem, which is not there when you use tuples instead.

> The essential problem being that data gets stored separately
> from its definition (similar to the relationship
> between a database and its ORM models).
>
> I don't know if it will help in your case, but one thing I've seen done
> is to store the definitions and data together. The developments
> in Python 3 may help in this regard. In Py3.3, you can access
> (and possibly store) the definition using the _source attribute.
> And in Py3.4, you will be able to specify the module
> (see http://bugs.python.org/issue17941 ).

I was actually reading about that discussion of the separation issue,
the enum discussion as well, and that brought me back to the tuples.
Although I use Py3.3, the _source attribute was not obvious, yet.
Actually I hoped for the Py3.4 solution of
http://bugs.python.org/issue17941

It is also related to the sys._getframe() issue that needs to live on
for a while because of the implied need to take care of __module__ in
enum and namedtuple. I think the notion that we need to go this route at
all is a fundamental misconception and goes in the wrong direction.
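[Archive note: the failure mode being complained about here can be reproduced in a few lines; deleting the class simulates unpickling in a fresh process that never executed the namedtuple definition (the Point name is just illustrative):]

```python
import pickle
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(1, 2)

data = pickle.dumps(p)            # records only the qualified class name, not the fields
assert pickle.loads(data) == p    # works while the class is still importable

del Point  # simulate a fresh process that never saw the definition
try:
    pickle.loads(data)
    found = True
except AttributeError:
    # the unpickler looks the class up by name and cannot find it
    found = False
print(found)  # False
```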
I think defining constants like namedtuple or enum is not a matter for classes, but more like a _prototype_, conceptually. This prototype needs to live somewhere to be used as a blueprint. Reconstruction should not need to require a static class definition, but a simple prototype tuple, which can be pickled. (this is not solving the problem, but just trying to change how we think of it.) cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Mon Jun 10 08:33:22 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 10 Jun 2013 00:33:22 -0600 Subject: [Python-ideas] Ordering keyword dicts In-Reply-To: References: Message-ID: On Jun 9, 2013 11:15 PM, "Nick Coghlan" wrote: > Guido's speed concern is specifically with any approach which requires > the calling code to *always* check callables to see whether they want > an ordered dictionary or not. Before we proceed to considering more > exotic design ideas that require an "ordered or not" decision at the > call sites, this concern should be validated by someone trying it out > and checking the macro benchmark suite for the consequences. 
> > This can be done without actually making it possible to define > functions that require order preservation - since we're mostly > interested in the performance consequences when such a flag *isn't* > set on the callables, an extra redundant check against > "PyCFunction_GET_FLAGS(func) & 0x8000" (for PyCFunction instances), > falling back to something like "PyObject_GetAttrId(__name__)" (for > everything else), for all code paths in ceval.c that lead to creation > of a kwdict instance should suffice. > > It may be that the cost of the flag check is swamped by the cost of > actually creating the keyword dictionary, in which case the runtime > check would be a preferable design choice, since function *invocation* > wouldn't need to change, only function declarations. I'm actually doing some of this checking incidentally as part of testing my OrderedDict implementation. I'm using a decorator to set the flag. Don't have anything conclusive to say yet. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 10 08:47:46 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 9 Jun 2013 23:47:46 -0700 Subject: [Python-ideas] namedtuple is not as good as it should be In-Reply-To: <51B57224.3070205@stackless.com> References: <51B507D1.1000001@stackless.com> <51B57224.3070205@stackless.com> Message-ID: <8749021F-9A6F-4F9F-B937-4DE9FE43FB83@yahoo.com> On Jun 9, 2013, at 23:28, Christian Tismer wrote: > But when you continue and create tables, which are typically not having > a static class definition and you come from the dynamic of tuples, > the problem suddenly becomes very visible. > > The whole issue is that a dynamically usable structure like tuple suddenly > becomes statically defined, and you need to cope with that. If your structure is dynamic, shouldn't you be using mapping syntax rather than attribute syntax anyway? 
After all, you can't write row.last_name unless you know last_name
statically, and you wouldn't want to write getattr(row, namefield) in
place of row[namefield].

In other words, I think you just want OrderedDict here, or maybe
something like sqlite3.Row.

Also, tuples themselves are normally used for static structures - you
know you're getting back 4 values, and tup[1] is the last name, etc. Of
course you _can_ use them just as "frozen lists", but that's not the
paradigm case that we should be looking to extend to new types.

I realize that if you're building an ORM library, or a bridge library
like appscript or win32com, you need to create things dynamically that
will be used as static objects by the client code, even though you
don't know the structure inside the library until runtime. But I think
that's a special case.

From alexander.belopolsky at gmail.com  Mon Jun 10 09:09:10 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 10 Jun 2013 03:09:10 -0400
Subject: [Python-ideas] Ordering keyword dicts
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jun 10, 2013 at 2:33 AM, Eric Snow wrote:
>
> On Jun 9, 2013 11:15 PM, "Nick Coghlan" wrote:
> > It may be that the cost of the flag check is swamped by the cost of
> > actually creating the keyword dictionary, in which case the runtime
> > check would be a preferable design choice, since function *invocation*
> > wouldn't need to change, only function declarations.
>
> I'm actually doing some of this checking incidentally as part of testing my
> OrderedDict implementation. I'm using a decorator to set the flag. Don't
> have anything conclusive to say yet.

Let's not duplicate the effort. I started experimenting with a patch on
top of issue16991 which is essentially this:

--- a/Python/ceval.c Mon Jun 10 02:24:00 2013 -0400
+++ b/Python/ceval.c Mon Jun 10 02:58:38 2013 -0400
@@ -3378,7 +3378,7 @@
     /* Parse arguments. */
     if (co->co_flags & CO_VARKEYWORDS) {
-        kwdict = PyDict_New();
+        kwdict = PyODict_New();
         if (kwdict == NULL)
             goto fail;
         i = total_args;
@@ -3441,7 +3441,7 @@
                 keyword);
             goto fail;
         }
-        PyDict_SetItem(kwdict, keyword, value);
+        PyODict_SetItem(kwdict, keyword, value);
         continue;
       kw_found:
         if (GETLOCAL(j) != NULL) {

This was enough to get

>>> def f(**kw): return kw
...
>>> f(a=2, b=3)
OrderedDict([('a', 2), ('b', 3)])

In this particular code path, checking for another flag should not cost
anything in performance: even if the compiler will not optimize it away
completely, the flags will be in a register and at most the other check
will cost a few CPU cycles with no memory access.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: odict-kw.diff
Type: application/octet-stream
Size: 976 bytes
Desc: not available
URL: 

From tismer at stackless.com  Mon Jun 10 09:45:35 2013
From: tismer at stackless.com (Christian Tismer)
Date: Mon, 10 Jun 2013 09:45:35 +0200
Subject: [Python-ideas] namedtuple is not as good as it should be
In-Reply-To: <8749021F-9A6F-4F9F-B937-4DE9FE43FB83@yahoo.com>
References: <51B507D1.1000001@stackless.com> <51B57224.3070205@stackless.com> <8749021F-9A6F-4F9F-B937-4DE9FE43FB83@yahoo.com>
Message-ID: <51B5841F.5060706@stackless.com>

On 10.06.13 08:47, Andrew Barnert wrote:
> On Jun 9, 2013, at 23:28, Christian Tismer wrote:
>
>> But when you continue and create tables, which are typically not having
>> a static class definition and you come from the dynamic of tuples,
>> the problem suddenly becomes very visible.
>>
>> The whole issue is that a dynamically usable structure like tuple suddenly
>> becomes statically defined, and you need to cope with that.
> If your structure is dynamic, shouldn't you be using mapping syntax rather than attribute syntax anyway?
After all, you can't write row.last_name unless you know last_name statically, and you wouldn't want to write getattr(row, namefield) in place of row[namefield]. > > In other words, I think you just want OrderedDict here, or maybe something like sqlite3.Row. Sure there are alternatives. Surely not OrderedDict. I have hundreds of columns in a table with a billion rows. This is what still fits into memory for a very fast readonly database that I generate instead of using a real DB monster. namedtuple is just a convenience that I used to play interactively with some records, and it turned out to be a showstopper in the end. > Also, tuples themselves are normally used for static structures?you know you're getting back 4 values, and tup[1] is the last name, etc. Of course you _can_ use them just as "frozen lists", but that's not the paradigm case that we should be looking to extend to new types. I am talking of rows as tuples. That is the right way to use tuples. > I realize that if you're building an ORM library, or a bridge library like appscript or win32com, you need to create things dynamically that will be used as static objects by the client code, even though you don't know the structure inside the library until runtime. But I think that's a special case. I don't need advice how to build a database. I want the namedtuple to improve, and be able to switch their use on and off without having to think. That was my problem: I wanted to try alternative implementations, and there namedtuple helps very much at first sight but then creates a problem that you don't expect. ciao - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/

From victor.stinner at gmail.com  Mon Jun 10 11:34:45 2013
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 10 Jun 2013 11:34:45 +0200
Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts
In-Reply-To: 
References: 
Message-ID: 

2013/6/9 anatoly techtonik :
> For a cross-platform language, as a programmer, you're responsible to
> detect the particular feature of operating

That is not how Python is designed. Python tries to remove some minor
differences between operating systems, but it cannot remove all
differences. For example, the mmap module has an API different on UNIX
and on Windows:
http://docs.python.org/3/library/mmap.html#mmap.mmap

Python stays close to the operating system for best performance. If you
would like a really portable language with a well-defined behaviour, you
may develop libraries on top of Python and its stdlib. Slowly, Python
begins to integrate such libraries to have a higher-level API. shutil is
based on the os module for example, and provides a higher-level API.

>> Just one example: configure script generates a Makefile using the locale
>> encoding, Python gets data from Makefile. If you use a path with non-ascii
>> character, use utf-8 in python whereas the locale is iso-8859-1, python
>> cannot be compiled anymore or will refuse to start.
>
> I am not a C developer, but as SCons committer I don't know Python tools
> that directly work with Makefiles.

It is just one example. Please read the old thread on python-dev for
other examples.

> This choice also breaks the key Unix principle of doing one thing well, because
> it is not the responsibility of the open() call to determine the system encoding.

Most applications on all platforms use the locale encoding. Applications
do not have to "guess" the encoding; it's simple and reliable to get it
(ex: sys.getfilesystemencoding() in Python).
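[Archive note: concretely, the "locale encoding" referred to here is what open() falls back to when no encoding is passed; a quick probe, whose results are of course system-dependent:]

```python
import locale
import sys

# open() without an explicit encoding uses locale.getpreferredencoding(False);
# filesystem paths go through sys.getfilesystemencoding().
print(locale.getpreferredencoding(False))  # e.g. 'UTF-8'
print(sys.getfilesystemencoding())         # e.g. 'utf-8'
```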
>> When I made the encoding mandatory in my test, more than 70% of calls to
>> open() used encoding="locale". So it's simpler to keep the current default
>> choice.
>
> How many systems have you covered?

I ran tests on Mac OS X, FreeBSD, Windows, Solaris, Linux. I got errors
in the test suite, so there was no need to copy a file from one platform
to another to get issues.

I'm not sure that you understood the real problem: in short, using the
locale encoding provides the best compliance and causes fewer bugs
(mojibake) than any other encoding.

Anyway, as written in the python-dev thread, and as Nick repeated: the
default value of the encoding parameter of open() is not going to change
in Python 3.

Victor

From mal at egenix.com  Mon Jun 10 14:08:47 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 10 Jun 2013 14:08:47 +0200
Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions
In-Reply-To: <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl>
References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl>
Message-ID: <51B5C1CF.9030006@egenix.com>

On 09.06.2013 18:47, Łukasz Langa wrote:
> On 9 Jun 2013, at 12:52, M.-A. Lemburg wrote:
>
>> There's no support for interpretation of minus and plus signs,
>> or decimal dots, except for the ASCII ones. Support for these
>> would have to be added to number parsing code.
>>
>> Here's the code range for mathematical operators:
>>
>> http://www.unicode.org/charts/PDF/U2200.pdf
>
> The reason I expect those to be handled properly is that they are totally
> unambiguous. Unless anyone can point me to a case where \N{MINUS SIGN}
> should not be treated as a (duh) minus sign, we should go and try to make
> life easier for our users by adopting at least a few such characters. We
> can flesh out which ones in Issue 6632 if there's general agreement that
> we can. I believe we do.
>
> More importantly, this is not a theoretical musing.
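[Archive note: the gap being discussed is easy to demonstrate: int() accepts only the ASCII hyphen-minus as a sign, so data using U+2212 MINUS SIGN needs a translation step first (a workaround sketch, not the fix proposed in the thread):]

```python
s = '\N{MINUS SIGN}42'  # '−42', as served in e.g. Wikipedia climate tables
try:
    value = int(s)
except ValueError:
    # Until the parser learns U+2212, map it to the ASCII '-'
    value = int(s.replace('\N{MINUS SIGN}', '-'))
print(value)  # -42
```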
Wikipedia started serving numerical data with \N{MINUS SIGN} instead of 0x2D, for instance on climate tables. I think we'll see increased usage of such characters in the wild. FWIW, I'm +1 on adding support for the minus code point, since it's the correct correspondent to the plus code point in Unicode. The traditional ASCII "-" is a compromise between a mathematical minus and a hyphen. https://en.wikipedia.org/wiki/Hyphen-minus While we're at it, we should probably also include the FULLWIDTH PLUS SIGN (U+FF0B) and SUPERSCRIPT PLUS SIGN (U+207A) as alternative for the plus sign, and additionally the SUPERSCRIPT MINUS (U+207B) as alternative for the minus sign. http://en.wikipedia.org/wiki/Minus_sign -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 10 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 21 days to go 2013-07-16: Python Meeting Duesseldorf ... 36 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From tismer at stackless.com  Mon Jun 10 14:37:48 2013
From: tismer at stackless.com (Christian Tismer)
Date: Mon, 10 Jun 2013 14:37:48 +0200
Subject: [Python-ideas] namedtuple is not as good as it should be
In-Reply-To: <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com>
References: <51B507D1.1000001@stackless.com> <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com>
Message-ID: <51B5C89C.10404@stackless.com>

Hey Raymond,

On 10.06.13 06:06, Raymond Hettinger wrote:
>
> On Jun 9, 2013, at 5:55 PM, Christian Tismer wrote:
>
>> If it is necessary to have class instances like today, ok. But there is no
>> need to search that class in a pickle! Instead, the defining set of attributes
>> could be pickled (uniquely stored by tuple comparison), and the class could
>> be re-created on-the-fly at unpickling time.
>
> That is a reasonable wish :-)
> But, I'm not sure how you propose for it to work.

I'm not sure about this, yet. It just hit me because I'm fiddling with
DB issues, where I use identity dicts to fold billions of data records,
so I'm thinking a bit like: There could be a dict somewhere which holds
all the seen namedtuple names in a mapping that memorizes them all. This
is a bit similar to interning. You could have named tuple fields without
actually naming the tuple. But let us for now assume namedtuple has a
name, still. ;-)

> What would you expect from:
>
>     >>> Soldier = namedtuple('Soldier', ['name', 'rank', 'serial_number'])
>     >>> johnny = Soldier('John Doe', 'Private', 12345)
>     >>> roger = Soldier('Roger', 'Corporal', 67890)
>     >>> fireteam = [johnny, roger]
>     >>> pickletools.dis(pickle.dumps(fireteam))
>
> Would it re-specify the class for every instance (at a cost of both
> speed and space)?
>
>     [(nametuple, 'Soldier', ['name', 'rank', 'serial_number'], ('John Doe', 'Private', 12345)),
>      (nametuple, 'Soldier', ['name', 'rank', 'serial_number'], ('Roger', 'Corporal', 67890))]
>

No, of course not!

> Or would you have a mechanism to specify the names just once?
>
>     [(nametuple, 'Soldier', ['name', 'rank', 'serial_number']),
>      (Soldier, ('John Doe', 'Private', 12345)),
>      (Soldier, ('Roger', 'Corporal', 67890))]
>

Rough idea: The names would be specified once, like now. The defining
string tuple goes somewhere in a central store that keeps the named
tuples, identified by exactly the tuple of names. The name of a
namedtuple is just a handle. So the tuple of names can be used to find
its definition in a global store for namedtuple prototypes, and the
associated class can be looked up there. I think also the generated
class can be cached there. But this can all be done on demand, because
there is only one implementation for one tuple of names.

By having such a registry thing under the hood, the user sees just
tuples of names anonymously and does not need to think of a class at
all. This makes the namedtuple very much like a tuple, again.

> What would you do with customized named tuples?
>
>     >>> class PrettySoldier(Soldier):
>             def __repr__(self):
>                 return 'N:{0} R:{1} S:{2}'.format(*self)
>

If you want to build a custom named tuple, you would get the basic class
out of the prototype store and then derive your own methods. Sure, we
are then back on square one. ;-)

--- But: maybe the idea really can be extended, because we are still
defining something decorative for constants, just with some
modifications. Maybe namedtuple should have its own metaclass that copes
with registering the templates of some derived PrettyTuple.

> How about enumerations? To pickle the value, Color.red, would you propose
> for that instance to store the definition of the enumeration as well
> as its value?

I'm not sure about enums, yet.
Is an enum as dynamic as the field names of a generic database table?
Enum is different in that you want to have certain constants being of a
certain type, and two similar definitions are intentionally not
compatible by default. So I think the name of an enum is important.

Named tuples are still tuples and compatible. They are just a bit more
specific since they have optional names. They suffer right now from the
fact that their class-ness suddenly jumps into your face when you need
to store them.

Some implementation
------------------

The function namedtuple(name, fields) right now produces a namedtuple
class and returns that. Instead of creating the class immediately, the
class is looked up in a cache that stores the created namedtuple classes
once. Creation only happens if a tuple of names is not found in the
cache.

Pickling of a namedtuple class does not pickle the class but calls
namedtuple with arguments, as you wrote above:

    [(nametuple, 'Soldier', ['name', 'rank', 'serial_number']),
     (Soldier, ('John Doe', 'Private', 12345)),
     (Soldier, ('Roger', 'Corporal', 67890))]

We could maybe get further and have a "namelesstuple" which is simply
identified by tuple identity. The shape ('name', 'rank',
'serial_number') is made unique by having a dict like the interneds
dict. We have one such dict that holds our tuples, like I use:

    class _UniqeDict(dict):
        '''
        A dict that stores its records and returns a unique version of that key.

        Usage: str_val = unique[str_val]
        '''
        def __missing__(self, key):
            self[key] = key
            return key

    unique = _UniqeDict()

So namelesstuple(*seq) does

- check if the seq is already an interned tuple, which then can continue
  without any analysis, or does all the magic to create a namelesstuple
  class.
- a second dict maps from {tuple: namelessclass} where the keys are from
  unique.

Now we have tuples that can be identified just by their structure and
don't need a name.
    Soldier = namelesstuple('name', 'rank', 'serial_number')

You can now use Soldier to create many such instances.

    s1 = Soldier('John Doe', 'Private', 12345)
    s2 = Soldier('Roger', 'Corporal', 67890)

Then the repr:

    >>> s1
    (name='John Doe', rank='Private', serial_number=12345)
    >>> Soldier
    namelesstuple('name', 'rank', 'serial_number')

I think that approach also meets the idea of Ryan.

I have just written a very rough implementation as a proof of concept.
Will send it later.

cheers - chris

--
Christian Tismer :^) Software Consulting : Have a break! Take a ride on
Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776
fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B
C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stephen at xemacs.org  Mon Jun 10 14:40:22 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 10 Jun 2013 21:40:22 +0900
Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts
In-Reply-To: 
References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <878v2i87gp.fsf@uwakimon.sk.tsukuba.ac.jp>

Yuval Greenfield writes:

> Living in Israel - Hebrew compatibility has been the nuisance and
> these are the encodings I had to fight: utf-8, ucs-2, utf-16, ucs-4,
> ISO-8859-8, ISO-8859-8-I, Windows-1255. It's plagued websites,
> browsers, email clients, adobe photoshop and premiere, excel, word,
> and powerpoint.

You have my sympathies. Russia is just as bad, and Japan, well, Japan
*invented* mojibake. When I first got here in 1990, I was *triple*
booting to deal with the charset insanity.
At least for Hebrew, the encodings you're likely to encounter in plain
text divide into two groups (UTF-8 and ISO Hebrew-like), and the latter
can probably mostly be read with cat(1) (or DOS "type").

> Perhaps you guys are used to more os-encoding-abiding applications
> and value that quality.

I can't speak for others, but I live in the home country of charset
self-abuse, and have been dealing with it for more than 20 years. Even
today, *most* users here are in environments where everybody they share
files with has the same default encoding and it is *not* UTF-8 (mostly
Shift JIS, aka cp932). There's another big group (Mac users) who do use
UTF-8, plus the odd Linux/*BSD/whatever users, who mostly default to
UTF-8. The charset issues[1] have put a fair amount of pressure on the
Mac users. Problems are frequent, I just don't think it's a good idea
for Python to default to UTF-8 yet.

> I just wish we can get rid of these problems for good, and
> promoting utf-8 everywhere is one way to go about it.

I believe that attempting to promote it by making it Python's default
will have a much bigger (negative) effect on Python's popularity than it
will have a (positive) effect on UTF-8 usage.

UTF-8 *is* the future. Only Microsoft disagrees, and that doesn't really
matter because Microsoft's plan for world domination involves
proprietary binary file formats rather than text files in a standard
encoding. So if you use it wherever possible in your programs, explain
to your correspondents why you do that, and help them in the (rarer and
rarer) cases where it gives them a problem, you will be doing a great
service to promote UTF-8.

The problem with Python doing the same thing is that it's going to
embarrass programmers with deadlines to meet in front of their bosses,
and they won't have you, me, or Guido to hold their hands and explain to
the bosses. Neither the programmers nor the bosses are going to *raise*
their evaluations of Python in such cases.
On the other hand, problems with conflicting defaults across systems are
"business as usual", and nobody's going to blame Python for that.

Footnotes:
[1] And font issues; MS Office files tend to look and print poorly on
    the Mac due to differences in font rendering.

From ericsnowcurrently at gmail.com  Mon Jun 10 16:22:19 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 10 Jun 2013 08:22:19 -0600
Subject: [Python-ideas] Ordering keyword dicts
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jun 10, 2013 at 1:09 AM, Alexander Belopolsky wrote:
> I started experimenting with a patch on top
> of issue16991 which is essentially this:
>
> --- a/Python/ceval.c Mon Jun 10 02:24:00 2013 -0400
> +++ b/Python/ceval.c Mon Jun 10 02:58:38 2013 -0400
> @@ -3378,7 +3378,7 @@
>      /* Parse arguments. */
>      if (co->co_flags & CO_VARKEYWORDS) {
> -        kwdict = PyDict_New();
> +        kwdict = PyODict_New();
>          if (kwdict == NULL)
>              goto fail;
>          i = total_args;
> @@ -3441,7 +3441,7 @@
>                  keyword);
>              goto fail;
>          }
> -        PyDict_SetItem(kwdict, keyword, value);
> +        PyODict_SetItem(kwdict, keyword, value);
>          continue;
>        kw_found:
>          if (GETLOCAL(j) != NULL) {
>
> This was enough to get
>
>>>> def f(**kw): return kw
> ...
>>>> f(a=2, b=3)
> OrderedDict([('a', 2), ('b', 3)])

That is pretty cool.

>
> In this particular code path, checking for another flag should not cost
> anything in performance: even if compiler will not optimize it completely
> the flags will be in a register and at most the other check will cost a few
> CPU cycles and with no memory access.

Good. From what I could see, there would be 7 calls that would change:

PyEval_EvalCodeEx()
    PyDict_New()
    PyDict_SetItem()
update_keyword_args()
    PyDict_New()
    PyDict_Copy()
    PyDict_SetItem()
ext_do_call()
    PyDict_New()
    PyDict_Update()

Each of these would switch on the flag on the function object.
-eric

From tismer at stackless.com  Mon Jun 10 17:27:40 2013
From: tismer at stackless.com (Christian Tismer)
Date: Mon, 10 Jun 2013 17:27:40 +0200
Subject: [Python-ideas] namedtuple is not as good as it should be
In-Reply-To: <51B5C89C.10404@stackless.com>
References: <51B507D1.1000001@stackless.com> <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com> <51B5C89C.10404@stackless.com>
Message-ID: <51B5F06C.3010706@stackless.com>

Hi Raymond,

I have written a simple implementation of something like a
namelesstuple, as a POC. I did not make generic tuple classes, yet,
there is no class generator. It shows that the result of pickling is
quite compact and has only a single helper function to restore the
pickled objects.

What needs to be implemented is

- the unique dict that holds all fieldname tuples
- the class store that holds all generated classes.

And of course the features that make namedtuple nice. ;-)

What I wanted to see is how efficiently this stuff can be pickled.

------------------------------------------------------------
# namelesstuple
# proof of concept for nametuple implementation using
# the identity of the tuple of field names.
from operator import itemgetter as _itemgetter

_tuple = tuple
_repr_template = '{name}=%r'


class _nlt1(tuple):
    __slots__ = ()
    _fields = ('eins', 'zwei', 'drei')

    def __new__(_cls, args):
        return _tuple.__new__(_cls, args)

    eins = property(_itemgetter(0), doc='Alias for field number 0')
    zwei = property(_itemgetter(1), doc='Alias for field number 1')
    drei = property(_itemgetter(2), doc='Alias for field number 2')

    def __repr__(self):
        'Return a nicely formatted representation string'
        fmt = '{}=%r'
        return self._repr_fmt % self

    _repr_fmt = ', '.join(_repr_template.format(name=name)
                          for name in _fields)
    _repr_fmt = '(%s)' % _repr_fmt

    def __reduce__(self):
        return (rebuild_namelesstuple, (tuple(self), self._fields))


class _nlt2(tuple):
    __slots__ = ()
    _fields = ('alpha', 'beta', 'gamma')

    def __new__(_cls, args):
        return _tuple.__new__(_cls, args)

    alpha = property(_itemgetter(0), doc='Alias for field number 0')
    beta = property(_itemgetter(1), doc='Alias for field number 1')
    gamma = property(_itemgetter(2), doc='Alias for field number 2')

    def __repr__(self):
        'Return a nicely formatted representation string'
        fmt = '{}=%r'
        return self._repr_fmt % self

    _repr_fmt = ', '.join(_repr_template.format(name=name)
                          for name in _fields)
    _repr_fmt = '(%s)' % _repr_fmt

    def __reduce__(self):
        return (rebuild_namelesstuple, (tuple(self), self._fields))


def rebuild_namelesstuple(args, fields):
    # just a dummy for testing pickle
    if fields[0] == 'eins':
        return _nlt1(args)
    else:
        return _nlt2(args)


# quick check
insts = [
    _nlt1((2, 3, 5)),
    _nlt1((7, 8, 9)),
    _nlt2((2, 3, 5)),
    _nlt2((101, 102, 103)),
]

import pickletools, pickle
x = pickle.dumps(insts)
pickletools.dis(x)
pickle.loads(x)

--
Christian Tismer :^) Software Consulting : Have a break! Take a ride on
Python's Karl-Liebknecht-Str.
121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key ->
http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30)
700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3
BF04 whom do you want to sponsor today? http://www.stackless.com/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From grosser.meister.morti at gmx.net  Mon Jun 10 18:03:59 2013
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Mon, 10 Jun 2013 18:03:59 +0200
Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts
In-Reply-To: <1370762563.71270.YahooMailNeo@web184706.mail.ne1.yahoo.com>
References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> <1370762563.71270.YahooMailNeo@web184706.mail.ne1.yahoo.com>
Message-ID: <51B5F8EF.8000408@gmx.net>

On 06/09/2013 09:22 AM, Andrew Barnert wrote:
> From: Mathias Panzenböck
> Sent: Saturday, June 8, 2013 6:34 PM
>
>> On 06/08/2013 08:02 PM, Stephen J. Turnbull wrote:
>>>
>>> Unicode is a universal character set in the sense that it can encode
>>> all characters,
>>
>> I guess Japanese people beg to differ. There are some Japanese symbols
>> that aren't covered by Unicode, or at least not to the extent Japanese
>> people would like it to be. Which is why they use (Shift-)JIS a lot of
>> the time. Basically Shift-JIS <-> Unicode is not round trip safe.
>
> That's not true. Shift-JIS <-> Unicode 6.0 is completely round-trip
> safe. And there hasn't been a practical problem for UTF-8 or UTF-32
> since they were introduced in Unicode 2.0 in the mid-90s.
>
> The problem is with UTF-16. Many early Unicode apps were built around
> UCS-2, a fixed-width 16-bit encoding. UCS-2 didn't have room for the
> extra characters that Japanese (among other languages) needed, so it
> was replaced with UTF-16, a variable-width 16- or 32-bit encoding.
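[Archive note: the fixed- versus variable-width point in the quoted paragraph is easy to make concrete in Python 3.3+, where strings are sequences of code points regardless of build:]

```python
ch = '\U00020BB7'  # a kanji outside the Basic Multilingual Plane

assert len(ch) == 1                      # one code point
assert len(ch.encode('utf-16-le')) == 4  # UTF-16 needs a surrogate pair here
assert len(ch.encode('utf-8')) == 4      # UTF-8 uses 4 bytes above U+FFFF
print('ok')
```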
But historically, there's been a lot of software that treated UTF-16 as fixed-width (after all, you can test with hiragana and common kanji and it seems to work), which means it breaks if you give it any of the new characters added since the original version. This is sometimes still a problem today for Windows native apps. But again, it does not affect UTF-8, just UTF-16. > > Another reason people used Shift-JIS until a few years ago was emoji. But today, Unicode supports more emoji than Shift-JIS, and in fact people complain about only having the original 176 if they're forced to use Shift-JIS. > > > Some Japanese people still refuse to use Unicode because of the Unihan controversy. Briefly: Characters like 刃 (U+5203) are drawn differently in Japanese and Chinese, but Unicode considers them the same character (to get the Chinese variation, you have to use a Chinese font). This is a problem, but Shift-JIS has the exact same problem. > That's what I meant, but I thought Shift-JIS doesn't have this problem? I don't work with such encodings, I just read about those problems. See also "More Information" here: http://support.microsoft.com/kb/170559 ...which isn't where I read about this initially. I can't find where I first read about it. > Finally, for typical Japanese text, Shift-JIS takes the fewest bytes per character of any major charset. Shift-JIS takes 1 byte for ASCII, 2 bytes for everything else. UTF-8 takes 3 bytes for kana and kanji, so if you, e.g., download an article and store it as UTF-8, it'll get almost 50% bigger. UTF-16 solves that by making kana and most kanji 2 bytes (although uncommon kanji are 4), but it makes ASCII 2 bytes instead of 1, which means you double the size of many files. Shift-JIS is a pretty good compromise for compactness.
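Andrew's byte counts are easy to verify from a Python 3 prompt. A quick sketch (the all-kana sample string is an arbitrary choice, not from the thread), which also checks the round-trip claim for text Shift-JIS can encode:

```python
# Byte counts for a short all-kana string in the three encodings
# discussed above.  "konnichiwa", five hiragana characters:
kana = "\u3053\u3093\u306b\u3061\u306f"

print(len(kana.encode("shift_jis")))  # 10: 2 bytes per kana
print(len(kana.encode("utf-8")))      # 15: 3 bytes per kana
print(len(kana.encode("utf-16")))     # 12: 2 bytes per kana + 2-byte BOM

# ASCII text, by contrast, doubles in size under UTF-16:
print(len("hello".encode("utf-8")))   # 5
print(len("hello".encode("utf-16")))  # 12

# Round-tripping through Shift-JIS is lossless for text it can encode.
assert kana.encode("shift_jis").decode("shift_jis") == kana
```

The "almost 50% bigger" figure for UTF-8 follows directly from the 2-byte vs 3-byte ratio visible here.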
> From grosser.meister.morti at gmx.net Mon Jun 10 18:05:43 2013 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Mon, 10 Jun 2013 18:05:43 +0200 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <87d2rv8adg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> <87d2rv8adg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B5F957.9090209@gmx.net> On 06/09/2013 07:25 PM, Stephen J. Turnbull wrote: > Mathias Panzenb?ck writes: > > On 06/08/2013 08:02 PM, Stephen J. Turnbull wrote: > > > > > > Unicode is a universal character set in the sense that it can > > > encode all characters, > > > > I guess Japanese people beg to differ. There are some Japanese > > symbols that aren't covered by Unicode, > > I've heard that said, but that's simply not the reality I have faced > in my day job here in Tsukuba, Japan since 1990. People do use Shift > JIS still, simply because there are a large number of legacy systems > using it. But I've never heard complaints that Unicode lacks > necessary characters.[1] I have seen it go the other way around, > though, because the JIS standard unifies some traditional variant > glyphs that some Unicode source distinguishes, and some people prefer > them for their names. > > Footnotes: > [1] Except emoticons. But even that has been fixed recently. > > Hmm, maybe I made a mistake and read it exactly the wrong way around when I read up on this? From stephen at xemacs.org Mon Jun 10 19:24:36 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 11 Jun 2013 02:24:36 +0900 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B5F8EF.8000408@gmx.net> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> <1370762563.71270.YahooMailNeo@web184706.mail.ne1.yahoo.com> <51B5F8EF.8000408@gmx.net> Message-ID: <877gi198vf.fsf@uwakimon.sk.tsukuba.ac.jp> Mathias Panzenböck writes: > > Some Japanese people still refuse to use Unicode because of the > > Unihan controversy. Briefly: Characters like 刃 (U+5203) are > > drawn differently in Japanese and Chinese, but Unicode considers > > them the same character (to get the Chinese variation, you have > > to use a Chinese font). This is a problem, but Shift-JIS has the > > exact same problem. > > That's what I meant, but I thought Shift-JIS doesn't have this > problem? I don't work with such encodings, I just read about those > problems. It depends on the font format and font selection algorithm. 20 years ago Shift JIS was less likely to have the problem because a legacy-format font used Shift JIS directly as an index into the glyph table, and nobody who wasn't Japanese used Shift JIS, so you could bet on a Japanese font. 10 years ago, Type 1 CID fonts and TrueType fonts (which indirectly index by translating character codes to glyph indexes, then looking them up) became popular. Many times configuration was done poorly (for example, many Chinese fonts claim to be able to represent Japanese, which is true but ugly), and rendering engines often made poor choices, even though you could almost always make an accurate guess as to which language was being rendered from the character encoding.
Today rendering is slowly improving, but you still have the problem that because Japanese and Chinese prefer different styles in drawing the glyphs, some fonts are more appropriate for Japanese than for Chinese and vice versa, and systems aren't very often configured to make the fine distinctions automatically for multilingual users. I have heard that the same problem occurs in very nice fonts for Latin characters. Some languages consider umlauts and other diacritics to be part of the character, others consider them to be additions. The former languages tend to prefer fonts with less space between the base character and the diacritical mark than the latter. (So I have heard, but I've also heard it's B.S. ;-) Anyway, this is way OT. If you want to know more about Asian character encodings and related topics like fonts, Ken Lunde's _CJKV Information Processing_ is the bible. From stephen at xemacs.org Mon Jun 10 19:27:16 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 11 Jun 2013 02:27:16 +0900 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B5F957.9090209@gmx.net> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> <87d2rv8adg.fsf@uwakimon.sk.tsukuba.ac.jp> <51B5F957.9090209@gmx.net> Message-ID: <8761xl98qz.fsf@uwakimon.sk.tsukuba.ac.jp> Mathias Panzenböck writes: > Hmm, maybe I made a mistake and read it exactly the wrong way around > when I read up on this? No, there is a small but vocal group of Japanese (and I believe similar groups in Korea and China) who absolutely hate the Unihan. Of course there is some truth to it, see my other post. So I'm sure you read it right, but be aware that some of that stuff is disinformation.
:-) From abarnert at yahoo.com Mon Jun 10 20:50:33 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 10 Jun 2013 11:50:33 -0700 (PDT) Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <51B5F8EF.8000408@gmx.net> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> <1370762563.71270.YahooMailNeo@web184706.mail.ne1.yahoo.com> <51B5F8EF.8000408@gmx.net> Message-ID: <1370890233.33199.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: Mathias Panzenböck Sent: Monday, June 10, 2013 9:03 AM > On 06/09/2013 09:22 AM, Andrew Barnert wrote: >> Some Japanese people still refuse to use Unicode because of the Unihan >> controversy. Briefly: Characters like 刃 (U+5203) are drawn differently in >> Japanese and Chinese, but Unicode considers them the same character (to get the >> Chinese variation, you have to use a Chinese font). This is a problem, but >> Shift-JIS has the exact same problem. > > That's what I meant, but I thought Shift-JIS doesn't have this problem? > I don't work with such encodings, I just read about those problems. Just like Unicode, Shift-JIS only has one character for this kanji, and you have to use out-of-band meta-textual information to determine whether to display the Chinese or Japanese version. Of course in Unicode, it's a script tag or file metadata or user preference setting that controls which font is used; in Shift-JIS, the fact that nobody uses Shift-JIS for Chinese is generally all the information you need. But, either way, if you want to write "I could tell my pen-pal was a Chinese spy because she wrote 刃 instead of 刃", you can't. > See also "More Information" here: > http://support.microsoft.com/kb/170559 > ...which isn't where I read about this initially. I can't find where I first read about it.
Note that the "products this article applies to" list is "Microsoft Platform Software Development Kit-January 2000 Edition". The problem was mostly fixed in Unicode 2.0, but Windows ME and 2000 had only partial support for 2.0. While they could display SIP characters, their codepage maps weren't updated to make use of them. So, the Shift-JIS (and Big-5, etc.) mappings were ambiguous?two different Shift-JIS characters mapped to the same Unicode character. Microsoft fixed that in XP and 2003 by upgrading to Unicode 3.0 and implementing the correct mappings. If you still need to support Windows 2000 or 9x/ME or CE 3.0, or apps built for them, it still occasionally shows up today. Classic Mac OS and Palm OS had smaller problems, but nobody cares about those platforms anymore anyway. Pretty much every other platform either ignored Unicode until well after 2.0, or went for UCS-4 or UTF-8 from the start, making the Unicode 2.0 upgrade much easier. From jeanpierreda at gmail.com Mon Jun 10 21:13:27 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 10 Jun 2013 15:13:27 -0400 Subject: [Python-ideas] Line continuations with comments In-Reply-To: <51B5650D.2050708@drees.name> References: <519D51AF.1080505@gmail.com> <51B4DD11.4030006@stackless.com> <51B5650D.2050708@drees.name> Message-ID: On Mon, Jun 10, 2013 at 1:33 AM, Stefan Drees wrote: > - as continuation signal in shell: who has not been bitten by an > innocent space following the backslash? ... and it's not the space > we dislike ;-) Right, what we dislike is that \ takes invisible whitespace into account for no good reason. > - on my console I use three fingers to produce this one symbol "\" > eg. so I even might visit trigraph-land again ??/ This isn't a reason to oppose making "\" less painful to use. You can just as easily not use "\" after the change as before. 
-- Devin From rosuav at gmail.com Mon Jun 10 22:15:32 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 11 Jun 2013 06:15:32 +1000 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <1370890233.33199.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <51B3DBA1.9060801@gmx.net> <1370762563.71270.YahooMailNeo@web184706.mail.ne1.yahoo.com> <51B5F8EF.8000408@gmx.net> <1370890233.33199.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: On Tue, Jun 11, 2013 at 4:50 AM, Andrew Barnert wrote: > Of course in Unicode, it's a script tag or file metadata or user preference setting that controls which font is used; in Shift-JIS, the fact that nobody uses Shift-JIS for Chinese is generally all the information you need. But, either way, if you want to write "I could tell my pen-pal was a Chinese spy because she wrote ? instead of ?", you can't. You don't even need Japanese/Chinese confusion to get that. Sherlock Holmes, "A Study in Scarlet" - am I allowed to spoil a minor subpoint in something that's over a hundred years old? - has the discovery of the German word "RACHE" ( == "revenge") written in blood. But Holmes knew it to be an imitator, because the A was written the wrong way; we, reading the book, can't see that, because all we get is the letters. It takes out-of-band information - in this case, dialogue between various characters - to tell us about the difference between the German and the Latin way of writing the letter. http://en.wikisource.org/wiki/A_Study_in_Scarlet/Part_1/Chapter_3 http://en.wikisource.org/wiki/A_Study_in_Scarlet/Part_1/Chapter_4 Unicode represents the symbols, not how they're drawn. It's up to the application to take it the rest of the way. 
ChrisA From stefan at drees.name Mon Jun 10 22:47:53 2013 From: stefan at drees.name (Stefan Drees) Date: Mon, 10 Jun 2013 22:47:53 +0200 Subject: [Python-ideas] Line continuations with comments In-Reply-To: References: <519D51AF.1080505@gmail.com> <51B4DD11.4030006@stackless.com> <51B5650D.2050708@drees.name> Message-ID: <51B63B79.9050501@drees.name> On 2013-06-10 21:13 CEST, Devin Jeanpierre wrote: > On Mon, Jun 10, 2013 at 1:33 AM, Stefan Drees ... wrote: >> - as continuation signal in shell: who has not been bitten by an >> innocent space following the backslash? ... and it's not the space >> we dislike ;-) > > Right, what we dislike is that \ takes invisible whitespace into > account for no good reason. I really thought about not sending the above list item. We are so used to having explicitly visible whitespace character displays in our working environments flanked by linting away all politically incorrect whitespace (we even have a nanny for that ...), but sometimes, I simply enjoy the pure sight of wide empty space separating words, and then invisible WS strikes me after a \\ ... thanks for also blaming the ??/ >> - on my console I use three fingers to produce this one symbol "\" >> eg. so I even might visit trigraph-land again ??/ > > This isn't a reason to oppose making "\" less painful to use. You can > just as easily not use "\" after the change as before. of course that is right :-) I mean "not for me" and "less painful for the rest" is fully ok. I thought Chris was just exhaustively listing all known sources for "bad fee\ings" and I had one or two I did not spot on his shopping list. Stefan.
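The "invisible whitespace" complaint above applies to Python's own line continuation as well, and is easy to demonstrate with compile(). A small illustration (not part of anyone's proposal in the thread):

```python
# A backslash continues a logical line only when it is the very last
# character on the line; a single invisible trailing space turns the
# continuation into a syntax error.
ok = "total = 1 + \\\n        2\n"
broken = "total = 1 + \\ \n        2\n"   # note the space after the backslash

compile(ok, "<demo>", "exec")             # compiles fine
try:
    compile(broken, "<demo>", "exec")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```

The two sources differ by one character you cannot see in an editor, which is exactly the pain point being discussed.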
From tismer at stackless.com Tue Jun 11 02:15:34 2013 From: tismer at stackless.com (Christian Tismer) Date: Tue, 11 Jun 2013 02:15:34 +0200 Subject: [Python-ideas] namedtuple is not as good as it should be In-Reply-To: <51B5F06C.3010706@stackless.com> References: <51B507D1.1000001@stackless.com> <37D44DD6-4D59-41A2-A62A-BFB32E0603B0@gmail.com> <51B5C89C.10404@stackless.com> <51B5F06C.3010706@stackless.com> Message-ID: <51B66C26.4020000@stackless.com> Hi Raymond, I have now written a preliminary and rough, but almost complete implementation of namedtuple/namelesstuple. The result is: For simple tuples, the behavior is like I intended it to be: - definitions are only created once, and on demand - pickling works by value without class reference - tuple-like behavior therefore seems to be complete When you derive a class, the generated objects are aware of it and fall back to their original behavior. I.e. for derived classes, you need to write the pickling yourself. No idea if this can be extended, but it is sufficient to close this thread. cheers - chris https://bitbucket.org/ctismer/namelesstuple/ On 10.06.13 17:27, Christian Tismer wrote: > Hi Raymond, > > I have written a simple implementation of something like a > namelesstuple, as a POC. > > I did not make generic tuple classes yet; there is no class generator. > It shows that the result of pickling is quite compact and has only > a single helper function to restore the pickled objects. > > What needs to be implemented is > > - the unique dict that holds all fieldname tuples > - the class store that holds all generated classes. > > And of course the features that make namedtuple nice. ;-) > > What I wanted to see is how efficiently this stuff can be pickled. > > ------------------------------------------------------------ > > # namelesstuple > # proof of concept for namedtuple implementation using > # the identity of the tuple of field names.
>
> from operator import itemgetter as _itemgetter
>
> _tuple = tuple
> _repr_template = '{name}=%r'
>
> class _nlt1(tuple):
>     __slots__ = ()
>
>     _fields = ('eins', 'zwei', 'drei')
>
>     def __new__(_cls, args):
>         return _tuple.__new__(_cls, args)
>
>     eins = property(_itemgetter(0), doc='Alias for field number 0')
>     zwei = property(_itemgetter(1), doc='Alias for field number 1')
>     drei = property(_itemgetter(2), doc='Alias for field number 2')
>
>     def __repr__(self):
>         'Return a nicely formatted representation string'
>         fmt = '{}=%r'
>         return self._repr_fmt % self
>
>     _repr_fmt = ', '.join(_repr_template.format(name=name)
>                           for name in _fields)
>     _repr_fmt = '(%s)' % _repr_fmt
>
>     def __reduce__(self):
>         return (rebuild_namelesstuple, (tuple(self), self._fields))
>
>
> class _nlt2(tuple):
>     __slots__ = ()
>
>     _fields = ('alpha', 'beta', 'gamma')
>
>     def __new__(_cls, args):
>         return _tuple.__new__(_cls, args)
>
>     alpha = property(_itemgetter(0), doc='Alias for field number 0')
>     beta = property(_itemgetter(1), doc='Alias for field number 1')
>     gamma = property(_itemgetter(2), doc='Alias for field number 2')
>
>     def __repr__(self):
>         'Return a nicely formatted representation string'
>         fmt = '{}=%r'
>         return self._repr_fmt % self
>
>     _repr_fmt = ', '.join(_repr_template.format(name=name)
>                           for name in _fields)
>     _repr_fmt = '(%s)' % _repr_fmt
>
>     def __reduce__(self):
>         return (rebuild_namelesstuple, (tuple(self), self._fields))
>
>
> def rebuild_namelesstuple(args, fields):
>     # just a dummy for testing pickle
>     if fields[0] == 'eins':
>         return _nlt1(args)
>     else:
>         return _nlt2(args)
>
> # quick check
> insts = [
>     _nlt1((2, 3, 5)),
>     _nlt1((7, 8, 9)),
>     _nlt2((2, 3, 5)),
>     _nlt2((101, 102, 103)),
> ]
>
> import pickletools, pickle
> x = pickle.dumps(insts)
> pickletools.dis(x)
> pickle.loads(x)

-- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str.
121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From steve at pearwood.info Tue Jun 11 02:23:24 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 11 Jun 2013 10:23:24 +1000 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: <878v2i87gp.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> <878v2i87gp.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B66DFC.3000703@pearwood.info> On 10/06/13 22:40, Stephen J. Turnbull wrote: > UTF-8 *is* the future. Only Microsoft disagrees, and that doesn't > really matter because Microsoft's plan for world domination involves > proprietary binary file formats rather than text files in a standard > encoding. [...] > On the other hand, > problems with conflicting defaults across systems are "business as > usual", and nobody's going to blame Python for that. Well said. Compatibility with local files is more important than universal compatibility. When users on Windows systems cannot easily and reliably read their own text files created with other tools, it will be no comfort to tell them "Yes, but we've solved the problem of you reading files transferred from another machine! Well, almost solved. Provided the other machine is using UTF-8, and the file transfer program doesn't screw something up." 
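The trade-off Steven describes hinges on open() defaulting to the locale's preferred encoding, which differs across platforms. A small sketch of the difference between relying on that default and pinning an encoding explicitly (the temporary-file handling is just scaffolding for the example):

```python
import locale
import os
import tempfile

# This is what open() falls back to when no encoding= is given;
# typically something like "cp1252" on Windows and "UTF-8" elsewhere.
print(locale.getpreferredencoding(False))

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Pinning encoding="utf-8" on both the write and the read makes the
    # script behave identically on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")
    with open(path, encoding="utf-8") as f:
        assert f.read() == "caf\u00e9"
finally:
    os.remove(path)
```

Omitting encoding= on either side reintroduces exactly the cross-platform ambiguity the thread is about.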
-- Steven From steve at pearwood.info Tue Jun 11 02:53:02 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 11 Jun 2013 10:53:02 +1000 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> Message-ID: <51B674EE.3020103@pearwood.info> On 10/06/13 05:35, Alexander Belopolsky wrote: > On Sun, Jun 9, 2013 at 12:47 PM, ?ukasz Langa wrote: > >> Wikipedia started serving numerical data with \N{MINUS SIGN} instead of >> 0x2D, for instance on climate tables. I think we'll see increased usage of >> such characters in the wild. > > > While I don't think MINUS SIGN can be abused this way, you should be very > careful when you copy numerical data from the web. Consider this case: > >>>> float('123?95') > 123095.0 > > Depending on your font, '123?95' may be indistinguishable from '123.95'. Indistinguishable *by eye* maybe, but the same applies to ASCII, 365lO98, and there are plenty of ways to distinguish them other than by a careless glance at the screen. [Aside: I have seen users type I or O for digits, based on the fact that it works fine when using a typewriter, and I've read books from the 1970s that recommended that number parsers accept I, L and O for just that reason.] It would be a pretty awful font that made ? look like . But even if it did, what is the concern here? If somebody enters a mixed script number, presumably they have some reason for it. If they don't, the point is moot. It is hardly likely that mixed script numbers will form by accident. Postel's Law, or the Robustness Principle, supports the current behaviour: "Be conservative in what you send, be liberal in what you accept". str(number) is conservative, and emits only ASCII digits. int(string) and float(string) are liberal, and accept any valid digit as a digit. 
> If you do research using numerical data published on the web, you will be > well advised not to assume that anything that looks like a number to your > eye can be fed to python's float(). What exactly are you concerned about? If it is a syntactically valid number made up of nothing but digits (plus a sign and decimal point), Python will convert it, using the correct value for each digit. If it contains non-digits, Python will give you a traceback. -- Steven From stephen at xemacs.org Tue Jun 11 03:55:02 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 11 Jun 2013 10:55:02 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B674EE.3020103@pearwood.info> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> Message-ID: <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > It would be a pretty awful font that made ? look like . Or aging eyes. > But even if it did, what is the concern here? If somebody enters a > mixed script number, presumably they have some reason for it. Unicode Technical Report #36 explains the concerns. Mostly that the reason may be nefarious. I specifically draw your attention to section 2.7: 2.7 Numeric Spoofs Turning away from the focus on domain names for a moment, there is another area where visual spoofs can be used. Many scripts have sets of decimal digits that are different in shape from the typical European digits. For example, Bengali has {? ? ? ? ? ? ? ? ? ?}, while Oriya has {? ? ? ? ? ? ? ? ? ?}. Individual digits may have the same shapes as digits from other scripts, even digits of different values. For example, the Bengali string "??" is visually confusable with the European digits "89", but actually has the numeric value 42! 
* If software interprets the numeric value of a string of digits without * detecting that the digits are from different or inappropriate scripts, * such spoofs can be used. Emphasis (*) added. Noting that the number 42 is the answer to Life, the Universe, and Everything (including this thread), I conclude we're done! > Postel's Law, or the Robustness Principle, supports the current > behaviour: "Be conservative in what you send, be liberal in what > you accept". str(number) is conservative, and emits only ASCII > digits. int(string) and float(string) are liberal, and accept any > valid digit as a digit. The Postel Principle may apply to Python as a whole; I believe it does. But not every input with a plausible interpretation needs to be acceptable to *builtins*. For example, as with "universal newlines" we could have "universal decimal points", accepting any of . , ' as dividing the integer part from the fractional part. This would be unambiguous, since Python numbers do not admit grouping characters. Your version of the Postel Principle suggests that this is a strong candidate for addition to the float() builtin. WDYT? The builtins are in any case poorly suited for input conversion, since they should not be localized. From python at mrabarnett.plus.com Tue Jun 11 04:21:45 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 11 Jun 2013 03:21:45 +0100 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B689B9.4050709@mrabarnett.plus.com> On 11/06/2013 02:55, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > It would be a pretty awful font that made ? look like . > > Or aging eyes. > > > But even if it did, what is the concern here? 
If somebody enters a > > mixed script number, presumably they have some reason for it. > > Unicode Technical Report #36 explains the concerns. Mostly that the > reason may be nefarious. I specifically draw your attention to > section 2.7: > > 2.7 Numeric Spoofs > > Turning away from the focus on domain names for a moment, there is > another area where visual spoofs can be used. Many scripts have sets > of decimal digits that are different in shape from the typical > European digits. For example, Bengali has {? ? ? ? ? ? ? ? ? ?}, while > Oriya has {? ? ? ? ? ? ? ? ? ?}. Individual digits may have the same > shapes as digits from other scripts, even digits of different > values. For example, the Bengali string "??" is visually confusable > with the European digits "89", but actually has the numeric value 42! > * If software interprets the numeric value of a string of digits without > * detecting that the digits are from different or inappropriate scripts, > * such spoofs can be used. > > Emphasis (*) added. Noting that the number 42 is the answer to Life, > the Universe, and Everything (including this thread), I conclude we're > done! > In that case, float and int should accept different scripts, but not mixed scripts. > > Postel's Law, or the Robustness Principle, supports the current > > behaviour: "Be conservative in what you send, be liberal in what > > you accept". str(number) is conservative, and emits only ASCII > > digits. int(string) and float(string) are liberal, and accept any > > valid digit as a digit. > > The Postel Principle may apply to Python as a whole; I believe it > does. But not every input with a plausible interpretation needs to be > acceptable to *builtins*. For example, as with "universal newlines" > we could have "universal decimal points", accepting any of . , ' as > dividing the integer part from the fractional part. This would be > unambiguous, since Python numbers do not admit grouping characters. 
> Your version of the Postel Principle suggests that this is a strong > candidate for addition to the float() builtin. WDYT? > > The builtins are in any case poorly suited for input conversion, since > they should not be localized. > I think that it would be best to trial it on PyPI first! From alexander.belopolsky at gmail.com Tue Jun 11 04:28:24 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 10 Jun 2013 22:28:24 -0400 Subject: [Python-ideas] Mixed script numbers. Was: Unicode minus sign in numeric conversions Message-ID: I am changing the subject because the issue of mixing digits from different scripts is quite different from the issue of accepting MINUS SIGN. I left a comment on the original subject at the issue tracker: < http://bugs.python.org/issue6632#msg190881>. On Mon, Jun 10, 2013 at 8:53 PM, Steven D'Aprano wrote: > > On 10/06/13 05:35, Alexander Belopolsky wrote: >> >> ... Consider this case: >> >>>>> float('123?95') >> 123095.0 >> >> Depending on your font, '123?95' may be indistinguishable from '123.95'. > > > Indistinguishable *by eye* maybe, but the same applies to ASCII, > 365lO98, and there are plenty of ways to distinguish them other > than by a careless glance at the screen. > > [Aside: I have seen users type I or O for digits, based on the fact > that it works fine when using a typewriter, and I've read books from > the 1970s that recommended that number parsers accept I, L and O > for just that reason.] I am not sure why your example is relevant. There is little harm from accepting O for zero, but accepting it for say 8 would be a different story. > > It would be a pretty awful font that made ? look like . But even if it did, what is the concern here? > If somebody enters a mixed script number, presumably they have some reason for it. Sure, say someone who wants to sell you a $23.95 gadget for $23,095. :-)
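The spoof being debated is reproducible with the builtins as they stand: int() and float() accept any character in Unicode category Nd, even from mixed scripts. A quick illustration using the Bengali digits from the UTR #36 example, written as escapes so the text stays ASCII-clean:

```python
import unicodedata

# U+09EA and U+09E8 are BENGALI DIGIT FOUR and BENGALI DIGIT TWO.
bengali_42 = "\u09ea\u09e8"
print(int(bengali_42))     # 42, regardless of what the glyphs resemble
print(float(bengali_42))   # 42.0

# Mixed ASCII/Bengali strings are accepted too, which is exactly the
# UTR #36 concern about undetected mixed-script digit sequences.
print(int("4\u09e8"))      # 42 again

# unicodedata exposes the per-character values the parser relies on.
print([unicodedata.digit(c) for c in bengali_42])   # [4, 2]
```

So nothing in the current behavior distinguishes "all one script" from "a little of each", which is what the proposed restriction would change.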
From stephen at xemacs.org Tue Jun 11 05:56:00 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 11 Jun 2013 12:56:00 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B689B9.4050709@mrabarnett.plus.com> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> <51B689B9.4050709@mrabarnett.plus.com> Message-ID: <8738sp8fn3.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > In that case, float and int should accept different scripts, but not > mixed scripts. Possibly. I don't see any compelling use case for builtins accepting any numerical syntax that can't be used in a Python program. I doubt there's any computer[1] user in the world who doesn't understand "-1" as well as its representation in their local script, or isn't already used to using ASCII characters for numeric input. > > The builtins are in any case poorly suited for input conversion, since > > they should not be localized. > > > I think that it would be best to trial it on PyPI first! Unfortunately, we already have float and int that violate the UTR #36 recommendation, and eval("\uff11") raises an identifier syntax error, while int("\uff11") == 1. For web use, obviously we need to consider cut and paste and other situations where data is received in presentation format, and it's quite possible that in the future programs like Matlab and R will output \N{MINUS SIGN} instead of "-".[2] If I could do what I wanted to without worrying about backward compatibility, I'd move the current builtins to someplace like unicodedata.parsers.numeric.integer_insecure and so on, and revert the builtins to accepting only the output of str(int) and str(float). Footnotes: [1] Defining "computer" broadly to at least include 4 function calculators and telephones smart enough to text message. [2] This is a Unicode recommendation, I'd be willing to bet.
From alexander.belopolsky at gmail.com Tue Jun 11 06:43:49 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 11 Jun 2013 00:43:49 -0400 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <8738sp8fn3.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> <51B689B9.4050709@mrabarnett.plus.com> <8738sp8fn3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jun 10, 2013 at 11:56 PM, Stephen J. Turnbull wrote: > I don't see any compelling use case for builtins accepting > any numerical syntax that can't be used in a Python program. > TOOWTDI >>> float('?') 3.0 Is the more obvious way to extract numeric value than >>> unicodedata.numeric('?') 3.0 > I'd move the current builtins to someplace like > unicodedata.parsers.numeric.integer_insecure .. That would be overkill. You can easily write safe code using float(x.encode('ascii')), but if you remove non-ascii digits support from float, there won't be an obvious way to parse non-ascii numerals. I think we can drop support for mixed scripts, though. It is hard to justify accepting input that looks invalid to a human reader or, worse, is interpreted by humans differently.
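Alexander's float(x.encode('ascii')) trick works because float() also accepts ASCII bytes, so encoding first turns any non-ASCII lookalike digit into an error instead of a silently different number. A hedged sketch of an ASCII-only parser built on it (the helper name float_ascii is made up here):

```python
def float_ascii(s):
    """Parse a float, rejecting any non-ASCII character.

    float() accepts bytes objects as long as they are ASCII, so
    encoding first makes digits from other scripts raise instead of
    being converted by their Unicode numeric value.
    """
    return float(s.encode("ascii"))

print(float_ascii("23.95"))        # plain ASCII input parses normally

try:
    float_ascii("\u09ea\u09e8")    # Bengali digits are refused outright
except UnicodeEncodeError as e:
    print("rejected:", e.reason)
```

This keeps the liberal builtins available for callers that want them while giving security-sensitive code a one-line strict alternative.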
URL: From alexander.belopolsky at gmail.com Tue Jun 11 06:57:32 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 11 Jun 2013 00:57:32 -0400 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> <51B689B9.4050709@mrabarnett.plus.com> <8738sp8fn3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Here is another tracker item relevant to this discussion: < http://bugs.python.org/issue10581>. On Tue, Jun 11, 2013 at 12:43 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Mon, Jun 10, 2013 at 11:56 PM, Stephen J. Turnbull wrote: > >> I don't see any compelling use case for builtins accepting >> any numerical syntax that can't be used in a Python program. >> > > TOOWTDI > > >>> float('?') > 3.0 > > Is the more obvious way to extract numeric value than > > >>> unicodedata.numeric('?') > 3.0 > > > I'd move the current builtins to someplace like > > unicodedata.parsers.numeric.integer_insecure .. > > That would be an overkill. You can easily write safe code using > float(x.encode('ascii')), but if you remove non-ascii digits support from > float, there wont be an obvious way to parse non-ascii numerals. > > I think we can drop support for mixed scripts, though. It is hard to > justify accepting input that looks invalid to a human reader or worse is > interpreted by humans differently. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Jun 11 09:13:48 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 11 Jun 2013 09:13:48 +0200 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> <51B689B9.4050709@mrabarnett.plus.com> <8738sp8fn3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B6CE2C.3050108@egenix.com> On 11.06.2013 06:57, Alexander Belopolsky wrote: > Here is another tracker item relevant to this discussion: < > http://bugs.python.org/issue10581>. +1 on your idea to only accept one range of digits in a given numeric string. > On Tue, Jun 11, 2013 at 12:43 AM, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > >> >> On Mon, Jun 10, 2013 at 11:56 PM, Stephen J. Turnbull wrote: >> >>> I don't see any compelling use case for builtins accepting >>> any numerical syntax that can't be used in a Python program. >>> >> >> TOOWTDI >> >>>>> float('?') >> 3.0 >> >> Is the more obvious way to extract numeric value than >> >>>>> unicodedata.numeric('?') >> 3.0 >> >>> I'd move the current builtins to someplace like >>> unicodedata.parsers.numeric.integer_insecure .. >> >> That would be an overkill. You can easily write safe code using >> float(x.encode('ascii')), but if you remove non-ascii digits support from >> float, there wont be an obvious way to parse non-ascii numerals. >> >> I think we can drop support for mixed scripts, though. It is hard to >> justify accepting input that looks invalid to a human reader or worse is >> interpreted by humans differently. 
>> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Marc-Andre Lemburg Director Python Software Foundation http://www.python.org/psf/ -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From storchaka at gmail.com Tue Jun 11 13:21:49 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 14:21:49 +0300 Subject: [Python-ideas] Add "namereplace" error handler Message-ID: I propose to add a "namereplace" error handler which is similar to the "backslashreplace" error handler but uses \N escapes instead of \x/\u/\U escapes. >>> '−1'.encode('ascii', 'backslashreplace') b'\\u22121' >>> '−1'.encode('ascii', 'namereplace') b'\\N{MINUS SIGN}1' In some cases such a representation is more readable. What do you think about this? Are there suggestions for a better name?
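To restate the proposed behaviour with explicit escapes (the character before the '1' is U+2212 MINUS SIGN, which mail transport tends to mangle): the proposal was eventually accepted, so Python 3.5 and later ship a built-in handler under exactly this name, and the example runs as-is there.

```python
# '\u2212' is MINUS SIGN; backslashreplace spells it by code point,
# the namereplace handler spells it by Unicode name.
assert "\u22121".encode("ascii", "backslashreplace") == b"\\u22121"
assert "\u22121".encode("ascii", "namereplace") == b"\\N{MINUS SIGN}1"  # built in since 3.5
```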
From amauryfa at gmail.com Tue Jun 11 13:47:03 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 11 Jun 2013 13:47:03 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: It's easy to implement in Python code: import codecs, unicodedata def namereplace_errors(exc): replace = "\\N{{{}}}".format( unicodedata.name(exc.object[exc.start])) return replace, exc.start + 1 codecs.register_error('namereplace', namereplace_errors) 2013/6/11 Serhiy Storchaka > I propose to add "namereplace" error handler which is similar to > "backslashreplace" error handler but use \N escapes instead of \x/\u/\U > escapes. > > >>> '-1'.encode('ascii', 'backslashreplace') > b'\\u22121' > >>> '-1'.encode('ascii', 'namereplace') > b'\\N{MINUS SIGN}1' > > In some cases such representation is more readable. > > What are you think about this? Are there suggestions for better name? > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Jun 11 14:01:46 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 11 Jun 2013 14:01:46 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: 2013/6/11 Serhiy Storchaka : > I propose to add "namereplace" error handler which is similar to > "backslashreplace" error handler but use \N escapes instead of \x/\u/\U > escapes. > ... > What are you think about this? Are there suggestions for better name? I like your idea (+1). For the name, I propose "unicodename": str.encode('ascii', 'unicodename") But I like "namereplace". What is the complexity of unicodedata.lookup(char)? O(1)? From mal at egenix.com Tue Jun 11 14:34:48 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 11 Jun 2013 14:34:48 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: <51B71968.3030700@egenix.com> On 11.06.2013 13:21, Serhiy Storchaka wrote: > I propose to add "namereplace" error handler which is similar to "backslashreplace" error handler > but use \N escapes instead of \x/\u/\U escapes. > >>>> '?1'.encode('ascii', 'backslashreplace') > b'\\u22121' >>>> '?1'.encode('ascii', 'namereplace') > b'\\N{MINUS SIGN}1' > > In some cases such representation is more readable. > > What are you think about this? Are there suggestions for better name? Nice idea. I think 'namereplace' is fine. An alternative would be 'unicodenamereplace'. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Tue Jun 11 14:37:53 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 11 Jun 2013 22:37:53 +1000 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: On 11 June 2013 22:01, Victor Stinner wrote: > 2013/6/11 Serhiy Storchaka : >> I propose to add "namereplace" error handler which is similar to >> "backslashreplace" error handler but use \N escapes instead of \x/\u/\U >> escapes. >> ... >> What are you think about this? 
Are there suggestions for better name? > > I like your idea (+1). For the name, I propose "unicodename": > > str.encode('ascii', 'unicodename") > > But I like "namereplace". +1 from me for "namereplace" (it fits well with the existing "replace" naming scheme for the encoding-only error handlers: xmlcharrefreplace and backslashreplace) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Jun 11 15:01:31 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 11 Jun 2013 23:01:31 +1000 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: <51B71968.3030700@egenix.com> References: <51B71968.3030700@egenix.com> Message-ID: <51B71FAB.1040605@pearwood.info> On 11/06/13 22:34, M.-A. Lemburg wrote: > > > On 11.06.2013 13:21, Serhiy Storchaka wrote: >> I propose to add "namereplace" error handler which is similar to "backslashreplace" error handler >> but use \N escapes instead of \x/\u/\U escapes. >> >>>>> '?1'.encode('ascii', 'backslashreplace') >> b'\\u22121' >>>>> '?1'.encode('ascii', 'namereplace') >> b'\\N{MINUS SIGN}1' >> >> In some cases such representation is more readable. >> >> What are you think about this? Are there suggestions for better name? > > Nice idea. > > I think 'namereplace' is fine. An alternative would be > 'unicodenamereplace'. +1 on namereplace. unicodenamereplace is unnecessary, since strings in Python 3 are Unicode. Might as well say "stringnamereplace" :-) -- Steven From victor.stinner at gmail.com Tue Jun 11 15:32:02 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 11 Jun 2013 15:32:02 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: <51B71FAB.1040605@pearwood.info> References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> Message-ID: 2013/6/11 Steven D'Aprano : > +1 on namereplace. unicodenamereplace is unnecessary, since strings in > Python 3 are Unicode. 
Might as well say "stringnamereplace" :-) Names come from the *Unicode* standard and the *unicode*data module. Victor From mal at egenix.com Tue Jun 11 15:33:53 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 11 Jun 2013 15:33:53 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: <51B71FAB.1040605@pearwood.info> References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> Message-ID: <51B72741.60602@egenix.com> On 11.06.2013 15:01, Steven D'Aprano wrote: > On 11/06/13 22:34, M.-A. Lemburg wrote: >> >> >> On 11.06.2013 13:21, Serhiy Storchaka wrote: >>> I propose to add "namereplace" error handler which is similar to "backslashreplace" error handler >>> but use \N escapes instead of \x/\u/\U escapes. >>> >>>>>> '?1'.encode('ascii', 'backslashreplace') >>> b'\\u22121' >>>>>> '?1'.encode('ascii', 'namereplace') >>> b'\\N{MINUS SIGN}1' >>> >>> In some cases such representation is more readable. >>> >>> What are you think about this? Are there suggestions for better name? >> >> Nice idea. >> >> I think 'namereplace' is fine. An alternative would be >> 'unicodenamereplace'. > > > +1 on namereplace. unicodenamereplace is unnecessary, since strings in Python 3 are Unicode. Might > as well say "stringnamereplace" :-) The reference is to the "Unicode Name Property": https://en.wikipedia.org/wiki/Unicode_character_property#Name One detail such an error handler would have to define is what to use in case no name is defined. >>> unicodedata.lookup(unichr(13)) Traceback (most recent call last): File "", line 1, in KeyError: "undefined character name '\r'" Since the handler should probably be round-trip safe, I guess falling back to the \u backslash notation would be appropriate. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From storchaka at gmail.com Tue Jun 11 16:21:15 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 17:21:15 +0300 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: 11.06.13 14:47, Amaury Forgeot d'Arc wrote: > It's easy to implement in Python code: > > import codecs, unicodedata > > def namereplace_errors(exc): > replace = "\\N{{{}}}".format( > unicodedata.name(exc.object[exc.start])) > return replace, exc.start + 1 > > codecs.register_error('namereplace', namereplace_errors) Indeed. But namereplace_errors() should be a little more verbose: def namereplace_errors(exc): if not isinstance(exc, UnicodeEncodeError): raise exc try: replace = r'\N{%s}' % unicodedata.name(exc.object[exc.start]) except ValueError: return codecs.backslashreplace_errors(exc) return replace, exc.start + 1 Even if we do not register this handler from the start, it may be worth providing namereplace_errors() in the unicodedata module.
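Serhiy's handler, reassembled from the flattened digest text into runnable form (logic unchanged; U+0085 NEXT LINE is used below as an example of a character that both fails ASCII encoding and has no Unicode name, so it exercises the fallback):

```python
import codecs
import unicodedata

def namereplace_errors(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    try:
        # Spell the offending character as a \N{...} escape
        replace = r"\N{%s}" % unicodedata.name(exc.object[exc.start])
    except ValueError:
        # Unnamed characters (e.g. C1 controls) fall back to \x/\u/\U escapes
        return codecs.backslashreplace_errors(exc)
    return replace, exc.start + 1

codecs.register_error("namereplace", namereplace_errors)

assert "\u22121".encode("ascii", "namereplace") == b"\\N{MINUS SIGN}1"
assert "a\x85b".encode("ascii", "namereplace") == b"a\\x85b"
```

On Python 3.5 and later this registration shadows the built-in handler of the same name, which behaves the same way for these inputs.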
From storchaka at gmail.com Tue Jun 11 16:25:45 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 17:25:45 +0300 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: 11.06.13 15:37, Nick Coghlan wrote: > +1 from me for "namereplace" (it fits well with the existing > "replace" naming scheme for the encoding-only error handlers: > xmlcharrefreplace and backslashreplace) I just hesitate between "namereplace", "ucdnamereplace", "unicodenamereplace", etc. From masklinn at masklinn.net Tue Jun 11 16:40:37 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 11 Jun 2013 16:40:37 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> Message-ID: <6603741B-21EA-4DA7-8E8F-0548FBE8C3C2@masklinn.net> On 2013-06-11, at 15:32 , Victor Stinner wrote: > 2013/6/11 Steven D'Aprano : >> +1 on namereplace. unicodenamereplace is unnecessary, since strings in >> Python 3 are Unicode. Might as well say "stringnamereplace" :-) > > Names come from the *Unicode* standard and the *unicode*data module. "name" seems sufficiently unambiguous though, we're talking about the replacement for any codepoint in a (unicode) string which can't be encoded (in whatever encoding is requested), I'm not aware of other systems which would allow *naming* all of these codepoints outside of unicode. "namereplace" feels short, sweet and clear. From storchaka at gmail.com Tue Jun 11 16:49:51 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 17:49:51 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler Message-ID: I propose to add an "htmlcharrefreplace" error handler which is similar to the "xmlcharrefreplace" error handler but uses HTML entity names if possible. >>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace') b'&#8704; x&#8712;&#8476;' >>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace') b'&forall; x&isin;&real;' Possible implementation: import codecs from html.entities import codepoint2name def htmlcharrefreplace_errors(exc): if not isinstance(exc, UnicodeEncodeError): raise exc try: replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])] except KeyError: return codecs.xmlcharrefreplace_errors(exc) return replace, exc.start + 1 codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors) Even if we do not register this handler from the start, it may be worth providing htmlcharrefreplace_errors() in the html or html.entities module. From p.f.moore at gmail.com Tue Jun 11 16:53:27 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Jun 2013 15:53:27 +0100 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: On 11 June 2013 15:49, Serhiy Storchaka wrote: > I propose to add an "htmlcharrefreplace" error handler which is similar to > the "xmlcharrefreplace" error handler but uses HTML entity names if possible. +1. This is usually what I want when I use xmlcharrefreplace. The implementation is simple, but I was unaware of the ability to add my own error handlers, so having this in the stdlib would improve discoverability a lot. Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Jun 11 16:51:47 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 17:51:47 +0300 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: <6603741B-21EA-4DA7-8E8F-0548FBE8C3C2@masklinn.net> References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> <6603741B-21EA-4DA7-8E8F-0548FBE8C3C2@masklinn.net> Message-ID: 11.06.13 17:40, Masklinn wrote: > I'm not aware of other > systems which would allow *naming* all of these codepoints outside of > unicode. "namereplace" feels short, sweet and clear. HTML, (La)TeX?
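Serhiy's htmlcharrefreplace sketch, reassembled into runnable form. The example string '∀ x∈ℜ' is written with explicit escapes, and the expected outputs are restated in the assertions, since archive rendering tends to mangle both the non-ASCII characters and the entity references:

```python
import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_errors(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    try:
        # Prefer a named HTML entity where one exists...
        replace = "&%s;" % codepoint2name[ord(exc.object[exc.start])]
    except KeyError:
        # ...otherwise fall back to a numeric character reference
        return codecs.xmlcharrefreplace_errors(exc)
    return replace, exc.start + 1

codecs.register_error("htmlcharrefreplace", htmlcharrefreplace_errors)

s = "\u2200 x\u2208\u211c"  # FOR ALL, ELEMENT OF, BLACK-LETTER CAPITAL R
assert s.encode("ascii", "xmlcharrefreplace") == b"&#8704; x&#8712;&#8476;"
assert s.encode("ascii", "htmlcharrefreplace") == b"&forall; x&isin;&real;"
```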
From storchaka at gmail.com Tue Jun 11 16:54:48 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 17:54:48 +0300 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: <51B72741.60602@egenix.com> References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> <51B72741.60602@egenix.com> Message-ID: 11.06.13 16:33, M.-A. Lemburg ???????(??): > One detail such an error handler would have to define is > what to use in case no name is defined. > >>>> unicodedata.lookup(unichr(13)) > Traceback (most recent call last): > File "", line 1, in > KeyError: "undefined character name '\r'" > > Since the handler should probably be round-trip safe, I > guess falling back to the \u backslash notation would be > appropriate. "namereplace" produces output which can be interpreted as Python literal. Therefore fallback to "backslashreplace" looks reasonable. From ronaldoussoren at mac.com Tue Jun 11 16:54:02 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 11 Jun 2013 16:54:02 +0200 Subject: [Python-ideas] Hooks into super()'s __getattribute__ Message-ID: <30DA8E13-726D-4C94-ABDD-796C54E8A93B@mac.com> Hi, Super() currently does not have a way to hook into the behavior of attribute lookup, the __getattribute__ method of super peeks in the __dict__ of types along the MRO until it finds what it is looking for. That can be a problem when a class implements __getattribute__ and doesn't necessarily store (all) attributes in the type __dict__. PyObjC is an example where the current behavior causes problems: PyObjC defines proxy classes for classes in the Objective-C runtime. The __dict__ of those classes is filled on demand, whenever a method is called that isn't in the __dict__ yet PyObjC looks in the Objective-C runtime datastructures for the method and adds it to __dict__. 
This works fine for normal method resolution, but can fail with super: super will only return the correct value when the superclass method happens to be in __dict__ already. My current solution for this is a custom subclass of super that must be used with Cocoa subclasses, and that's something I'd like to get rid off. I'd therefore like to propose adding a slot to PyTypeObject that is called by super's __getattribute__ when present, instead of peeking in the type's tp_dict slot. Does this look like a sane solution? I've filed an issue about this that includes a proof of concept patch: http://bugs.python.org/issue18181. That patch is incomplete, but does make it possible to use the builtin super for Cocoa classes (with a suitably patched version of PyObjC). An earlier issue mentions that the current behavior of super can be inconsistent with the behavior of __getattribute__, see http://bugs.python.org/issue783528 (that issue is closed, but IMHO for the wrong reason). I'm appearently not the only one running into this problem ;-) Ronald From mal at egenix.com Tue Jun 11 17:04:03 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 11 Jun 2013 17:04:03 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: <51B73C63.6060407@egenix.com> On 11.06.2013 16:49, Serhiy Storchaka wrote: > I propose to add "htmlcharrefreplace" error handler which is similar to "xmlcharrefreplace" error > handler but use html entity names if possible. > >>>> '? x??'.encode('ascii', 'xmlcharrefreplace') > b'∀ x∈ℜ' >>>> '? 
x??'.encode('ascii', 'htmlcharrefreplace') > b'∀ x∈ℜ' > > Possible implementation: > > import codecs > from html.entities import codepoint2name > > def htmlcharrefreplace_errors(exc): > if not isinstance(exc, UnicodeEncodeError): > raise exc > try: > replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])] > except KeyError: > return codecs.xmlcharrefreplace_errors(exc) > return replace, exc.start + 1 > > codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors) > > Even if do not register this handler from the start, it may be worth to provide > htmlcharrefreplace_errors() in the html or html.entities module. +1 on that one as well :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From storchaka at gmail.com Tue Jun 11 17:29:45 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 18:29:45 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: 11.06.13 17:49, Serhiy Storchaka ???????(??): > I propose to add "htmlcharrefreplace" error handler which is similar to > "xmlcharrefreplace" error handler but use html entity names if possible. Or it should be named "htmlentityreplace"? 
From mal at egenix.com Tue Jun 11 17:38:06 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 11 Jun 2013 17:38:06 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: <51B7445E.8020607@egenix.com> On 11.06.2013 17:29, Serhiy Storchaka wrote: > 11.06.13 17:49, Serhiy Storchaka ???????(??): >> I propose to add "htmlcharrefreplace" error handler which is similar to >> "xmlcharrefreplace" error handler but use html entity names if possible. > > Or it should be named "htmlentityreplace"? Yes, since that's the more accurate and intuitive name. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From steve at pearwood.info Tue Jun 11 17:47:54 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Jun 2013 01:47:54 +1000 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> Message-ID: <51B746AA.8050605@pearwood.info> On 11/06/13 23:32, Victor Stinner wrote: > 2013/6/11 Steven D'Aprano : >> +1 on namereplace. unicodenamereplace is unnecessary, since strings in >> Python 3 are Unicode. 
Might as well say "stringnamereplace" :-) > > Names come from the *Unicode* standard and the *unicode*data module. Well I should hope so. Where else would they come from? The TRON[1] standard? *wink* My point is that there is no need for a long, verbose name in this instance. Take, for example, the backslashescape handler. We don't make it explicit that it is the *Python* backslash escape (rather than, say, Java backslash escapes[2]), because that is the obvious and expected system to use. Some day, if another handler is added that uses Java escapes, it should get the longer, more explicit name: javabackslashescape. Perl's Larry Wall is fond of talking about "Huffman coding" language features. Common features should be short: len rather than length. Uncommon features can be longer. Since we're more likely to want Python backslash escapes than Java ones, the Python system gets the shorter name. Contrariwise, we have a single character reference replacement handler. In this case, it isn't obvious which system of character references will be used: XML, HTML, TeX, something else? So it needs to be specified explicitly: xmlcharrefreplace. I believe that *Unicode* names is sufficiently obvious that it does not need to be explicitly stated in the handler name. If, some day, another set of name replacements (say, HTML character entity names) is added, that can be given the more verbose name. So I'm +1 on calling it simply "namereplace". But regardless of the handler name, this is a great suggestion and I will definitely find it useful. [1] https://en.wikipedia.org/wiki/TRON_(encoding) [2] I believe Java does not support \a, \v or \0. 
-- Steven From steve at pearwood.info Tue Jun 11 17:52:54 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Jun 2013 01:52:54 +1000 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51B7445E.8020607@egenix.com> References: <51B7445E.8020607@egenix.com> Message-ID: <51B747D6.2020605@pearwood.info> On 12/06/13 01:38, M.-A. Lemburg wrote: > On 11.06.2013 17:29, Serhiy Storchaka wrote: >> 11.06.13 17:49, Serhiy Storchaka ???????(??): >>> I propose to add "htmlcharrefreplace" error handler which is similar to >>> "xmlcharrefreplace" error handler but use html entity names if possible. >> >> Or it should be named "htmlentityreplace"? > > Yes, since that's the more accurate and intuitive name. Intuitive, perhaps, but I'm not sure about accurate. According to Wikipedia: [quote] Although in popular usage character references are often called "entity references" or even "entities", this usage is wrong.[citation needed] A character reference is a reference to a character, not to an entity. Entity reference refers to the content of a named entity. An entity declaration is created by using the syntax in a document type definition (DTD) or XML schema. [end quote] https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references -- Steven From python at mrabarnett.plus.com Tue Jun 11 18:11:22 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 11 Jun 2013 17:11:22 +0100 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B74C2A.3080908@mrabarnett.plus.com> On 09/06/2013 17:13, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > > But Python goes much farther. 
float('٢.๑') also returns 2.1 (not to > > > mention that int('٢๑') returns 21). > > > > Yes. And why is this a problem? There is no ambiguity. It might > > look untidy to be mixing Arab and Thai numerals in the same number, > > but it is still well-defined. > > To whom? Unicode didacts, maybe, but I doubt there are any real users > who would consider that well-defined. So the same arguments you made > for not permitting non-ASCII numerals in Python source code apply > here, although they are somewhat weaker when applied to numeric data > expressed as text. > > In any case, there's not really that much use for this generality of > numerals. On the one hand, I think these days anyone who uses > information technology is fluent in ASCII numeration. On the other, > if you want to allow people to write in other scripts, you probably > are dealing with "naive" users who should be allowed to use grouping > characters and the usual conventions for their locale, and int() and > float() just aren't good enough anyway. > I was thinking that we could also add a function for numeric translation/transliteration somewhere: >>> # Translate to Bengali >>> translate_number('0123456789', 'bengali') '০১২৩৪৫৬৭৮৯' >>> # Translate to Oriya >>> translate_number('0123456789', 'oriya') '୦୧୨୩୪୫୬୭୮୯' >>> # Defaults to translating to ASCII range >>> translate_number('০১২৩৪৫৬৭৮৯') '0123456789' Non-numeric strings and mixed scripts would raise an exception. From steve at pearwood.info Tue Jun 11 18:22:25 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Jun 2013 02:22:25 +1000 Subject: [Python-ideas] Add \e escape code Message-ID: <51B74EC1.4080508@pearwood.info> Should Python support \e in strings for the ESC control character, ASCII 0x1B (27)?
\e is supported by Perl, PHP, Ruby, and although it is not standard C, gcc: http://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Character-Escapes.html I have the Python Pocket Reference, by Mark Lutz, 1st Edition from 1998, which lists \e as a string escape code. I don't know if that was a mistake, or if \e used to be supported prior to 1.5 but was then removed. -- Steven From abarnert at yahoo.com Tue Jun 11 18:19:34 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 11 Jun 2013 09:19:34 -0700 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: <1A09ABC1-D063-4C97-B765-8EBF9595633D@yahoo.com> On Jun 11, 2013, at 4:21, Serhiy Storchaka wrote: > I propose to add "namereplace" error handler which is similar to "backslashreplace" error handler but use \N escapes instead of \x/\u/\U escapes. > > >>> '?1'.encode('ascii', 'backslashreplace') > b'\\u22121' > >>> '?1'.encode('ascii', 'namereplace') > b'\\N{MINUS SIGN}1' > > In some cases such representation is more readable. > > What are you think about this? Are there suggestions for better name? I believe \N escape sequences are called "name escape", and the official description of the semantics in the docs is "Character named name in the Unicode database", so "namereplace" seems like a perfect name for this error handler. From mal at egenix.com Tue Jun 11 18:33:22 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 11 Jun 2013 18:33:22 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51B747D6.2020605@pearwood.info> References: <51B7445E.8020607@egenix.com> <51B747D6.2020605@pearwood.info> Message-ID: <51B75152.5090708@egenix.com> On 11.06.2013 17:52, Steven D'Aprano wrote: > On 12/06/13 01:38, M.-A. 
Lemburg wrote: >> On 11.06.2013 17:29, Serhiy Storchaka wrote: >>> 11.06.13 17:49, Serhiy Storchaka wrote: >>>> I propose to add "htmlcharrefreplace" error handler which is similar to >>>> "xmlcharrefreplace" error handler but use html entity names if possible. >>> >>> Or it should be named "htmlentityreplace"? >> >> Yes, since that's the more accurate and intuitive name. > > Intuitive, perhaps, but I'm not sure about accurate. According to Wikipedia: > > [quote] > Although in popular usage character references are often called "entity references" or even > "entities", this usage is wrong.[citation needed] A character reference is a reference to a > character, not to an entity. Entity reference refers to the content of a named entity. An entity > declaration is created by using the <!ENTITY name "value"> syntax in a document type definition > (DTD) or XML schema. > [end quote] > > > https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references I think the HTML standard is the correct reference here, not some "citation needed" comment ;-) In HTML4, the official name is "character entity references". http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#h-5.3.2 In the HTML5 draft they are now called "named character references". http://www.w3.org/TR/html5/syntax.html#character-references The Python module is called html.entities, so let's stick with that. BTW: Just like with the Unicode names, a lot of code points outside the ASCII range do not have a character entity reference. I guess those should be replaced with numeric character references: http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#h-5.3.1 Note: It's not clear whether HTML allows numeric character references outside the base plane. In theory it should be possible, but whether browsers and other tools can actually handle non-BMP references such as &#x1D49E; is not obvious. It works in recent Firefox and SeaMonkey.
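On the Python side, at least, the existing xmlcharrefreplace handler already emits numeric references for non-BMP code points (a quick illustrative check, not part of any proposal):

```python
# xmlcharrefreplace emits a decimal numeric character reference for any
# unencodable code point, including non-BMP ones such as U+1D49E
# (MATHEMATICAL SCRIPT CAPITAL C, decimal 119966).
ref = '\U0001D49E'.encode('ascii', 'xmlcharrefreplace')
print(ref)  # b'&#119966;'
```

Whether the consuming browser accepts that reference is, as noted, a separate question.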
Some examples: http://stackoverflow.com/questions/5567249/what-are-the-most-common-non-bmp-unicode-characters-in-actual-use -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ethan at stoneleaf.us Tue Jun 11 18:35:13 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 11 Jun 2013 09:35:13 -0700 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B74EC1.4080508@pearwood.info> References: <51B74EC1.4080508@pearwood.info> Message-ID: <51B751C1.9060503@stoneleaf.us> On 06/11/2013 09:22 AM, Steven D'Aprano wrote: > Should Python support \e in strings for the ESC control character, ASCII 0x1B (27)? > > \e is supported by Perl, PHP, Ruby, and although it is not standard C, gcc: > > http://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Character-Escapes.html > > I have the Python Pocket Reference, by Mark Lutz, 1st Edition from 1998, which lists \e as a string escape code. I don't > know if that was a mistake, or if \e used to be supported prior to 1.5 but was then removed. +1 to have it (back?) 
-- ~Ethan~ From abarnert at yahoo.com Tue Jun 11 18:36:35 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 11 Jun 2013 09:36:35 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B74C2A.3080908@mrabarnett.plus.com> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> Message-ID: On Jun 11, 2013, at 9:11, MRAB wrote: > On 09/06/2013 17:13, Stephen J. Turnbull wrote: >> Steven D'Aprano writes: >> >> > > But Python goes much farther. float('٢.๑') also returns 2.1 (not to >> > > mention that int('٢๑') returns 21). >> > >> > Yes. And why is this a problem? There is no ambiguity. It might >> > look untidy to be mixing Arab and Thai numerals in the same number, >> > but it is still well-defined. >> >> To whom? Unicode didacts, maybe, but I doubt there are any real users >> who would consider that well-defined. So the same arguments you made >> for not permitting non-ASCII numerals in Python source code apply >> here, although they are somewhat weaker when applied to numeric data >> expressed as text. >> >> In any case, there's not really that much use for this generality of >> numerals. On the one hand, I think these days anyone who uses >> information technology is fluent in ASCII numeration. On the other, >> if you want to allow people to write in other scripts, you probably >> are dealing with "naive" users who should be allowed to use grouping >> characters and the usual conventions for their locale, and int() and >> float() just aren't good enough anyway. > I was thinking that we could also add a function for numeric > translation/transliteration somewhere: > >>>> # Translate to Bengali >>>> translate_number('0123456789', 'bengali') > '০১২৩৪৫৬৭৮৯'
>>>> # Translate to Oriya >>>> translate_number('0123456789', 'oriya') > '୦୧୨୩୪୫୬୭୮୯' > >>> # Defaults to translating to ASCII range >>>> translate_number('০১২৩৪৫৬৭৮৯') > '0123456789' > > Non-numeric strings and mixed scripts would raise an exception. I like this, but I'm not sure how to completely specify it, or how to describe it. What does translate_number('-1.2e+3', 'oriya') return? Or '2j'? What about translate_number('二万三十') or '2万3十'? Or does it only handle Arabic-style place-value numerals? If this is meant to solve the problem with "naive users", does it handle grouping characters as well, even though they can't be used with int or float? Or locale decimal points? Or parens for negatives? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Jun 11 18:17:27 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 11 Jun 2013 09:17:27 -0700 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: Message-ID: <51B74D97.6060701@stoneleaf.us> On 06/11/2013 04:21 AM, Serhiy Storchaka wrote: > I propose to add a "namereplace" error handler which is similar to the "backslashreplace" error handler but uses \N escapes > instead of \x/\u/\U escapes. > >>>> '−1'.encode('ascii', 'backslashreplace') > b'\\u22121' >>>> '−1'.encode('ascii', 'namereplace') > b'\\N{MINUS SIGN}1' > > In some cases such a representation is more readable. > > What do you think about this? Are there suggestions for a better name?
+1 for the idea and the name -- ~Ethan~ From masklinn at masklinn.net Tue Jun 11 18:50:55 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 11 Jun 2013 18:50:55 +0200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> <6603741B-21EA-4DA7-8E8F-0548FBE8C3C2@masklinn.net> Message-ID: <94837DA6-39FA-4B69-9C64-15FCD70D3741@masklinn.net> On 2013-06-11, at 16:51 , Serhiy Storchaka wrote: > 11.06.13 17:40, Masklinn wrote: >> I'm not aware of other >> systems which would allow *naming* all of these codepoints outside of >> unicode. "namereplace" feels short, sweet and clear. > > HTML, (La)TeX? There are only 252 HTML entity names, which is obviously insufficient. Named LaTeX symbols are more extensive (at 2826 according to the reference I found) but still woefully inadequate. Furthermore one would expect such domain-specific (but not python-specific) replacements to be patterned after xmlcharrefreplace. From ethan at stoneleaf.us Tue Jun 11 18:18:23 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 11 Jun 2013 09:18:23 -0700 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: <51B74DCF.2070806@stoneleaf.us> On 06/11/2013 07:49 AM, Serhiy Storchaka wrote: > I propose to add an "htmlcharrefreplace" error handler which is similar to the "xmlcharrefreplace" error handler but uses html > entity names if possible. > >>>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace') > b'&#8704; x&#8712;&#8476;' >>>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace') > b'&forall; x&isin;&real;' > > Possible implementation: > > import codecs > from html.entities import codepoint2name > > def htmlcharrefreplace_errors(exc): > if not isinstance(exc, UnicodeEncodeError): > raise exc > try: > replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])] > except KeyError: > return codecs.xmlcharrefreplace_errors(exc) > return replace, exc.start + 1 > > codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors) > > Even if we do not register this handler from the start, it may be worth providing htmlcharrefreplace_errors() in the html > or html.entities module. +1 for the idea and the name of 'htmlcharrefreplace'. -- ~Ethan~ From python at mrabarnett.plus.com Tue Jun 11 19:02:57 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 11 Jun 2013 18:02:57 +0100 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B74EC1.4080508@pearwood.info> References: <51B74EC1.4080508@pearwood.info> Message-ID: <51B75841.3060309@mrabarnett.plus.com> On 11/06/2013 17:22, Steven D'Aprano wrote: > Should Python support \e in strings for the ESC control character, > ASCII 0x1B (27)? > > \e is supported by Perl, PHP, Ruby, and although it is not standard > C, gcc: > > http://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Character-Escapes.html > > I have the Python Pocket Reference, by Mark Lutz, 1st Edition from > 1998, which lists \e as a string escape code. I don't know if that > was a mistake, or if \e used to be supported prior to 1.5 but was > then removed. > I wouldn't say no, but, on the other hand, it has been a very long time since I personally have used it. In other words, is there sufficient demand for it?
From steve at pearwood.info Tue Jun 11 19:10:35 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Jun 2013 03:10:35 +1000 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B75841.3060309@mrabarnett.plus.com> References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> Message-ID: <51B75A0B.6090001@pearwood.info> On 12/06/13 03:02, MRAB wrote: > On 11/06/2013 17:22, Steven D'Aprano wrote: >> Should Python support \e in strings for the ESC control character, >> ASCII 0x1B (27)? >> >> \e is supported by Perl, PHP, Ruby, and although it is not standard >> C, gcc: >> >> http://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Character-Escapes.html >> >> I have the Python Pocket Reference, by Mark Lutz, 1st Edition from >> 1998, which lists \e as a string escape code. I don't know if that >> was a mistake, or if \e used to be supported prior to 1.5 but was >> then removed. >> > I wouldn't say no, but, on the other hand, it has been a very long time > since I personally have used it. In other words, is there sufficient > demand for it? I use it often enough to miss having \e, but not often enough to remember what the hex or octal code for ESC is. -- Steven From storchaka at gmail.com Tue Jun 11 19:56:34 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 11 Jun 2013 20:56:34 +0300 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B75A0B.6090001@pearwood.info> References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> Message-ID: 11.06.13 20:10, Steven D'Aprano wrote: > On 12/06/13 03:02, MRAB wrote: >> I wouldn't say no, but, on the other hand, it has been a very long time >> since I personally have used it. In other words, is there sufficient >> demand for it? > > I use it often enough to miss having \e, but not often enough to > remember what the hex or octal code for ESC is. In such case perhaps \N{ESCAPE} is more helpful for you.
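For reference, the spellings that already work all denote the same character (a quick check, not part of any proposal):

```python
# ESC is ASCII 27: 0x1B in hex, 033 in octal, or by Unicode name.
assert '\x1b' == '\033' == '\N{ESCAPE}' == chr(27)

# A minimal ANSI-colored string, the use case driving this thread.
red_error = '\x1b[31m' + 'error' + '\x1b[0m'
print(repr(red_error))
```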
From alexander.belopolsky at gmail.com Tue Jun 11 20:12:58 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 11 Jun 2013 14:12:58 -0400 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B75A0B.6090001@pearwood.info> References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> Message-ID: On Tue, Jun 11, 2013 at 1:10 PM, Steven D'Aprano wrote: > > I use it often enough to miss having \e, but not often enough to remember > what the hex or octal code for ESC is. +1 ANSI colors at the shell prompt are getting popular again these days. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Jun 11 20:16:39 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 11 Jun 2013 20:16:39 +0200 Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> Message-ID: <51B76987.9080106@egenix.com> On 11.06.2013 20:12, Alexander Belopolsky wrote: > On Tue, Jun 11, 2013 at 1:10 PM, Steven D'Aprano wrote: > >> >> I use it often enough to miss having \e, but not often enough to remember >> what the hex or octal code for ESC is. > > > +1 > > ANSI colors at the shell prompt are getting popular again these days. How would you add a new escape character in a backwards compatible way ? Adding new escape characters is not easy, since Python defaults to passing them through as-is, e.g. '\e' == '\\e'. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 
20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From joshua.landau.ws at gmail.com Tue Jun 11 20:29:10 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Tue, 11 Jun 2013 19:29:10 +0100 Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> Message-ID: On 11 June 2013 19:12, Alexander Belopolsky wrote: > ANSI colors at the shell prompt are getting popular again these days. I just use: ### CODE ### """ Adapted from pygments.console Format colored console output. :copyright: Copyright 2006-2009 by the Pygments team, see AUTHORS. :license: BSD, see LICENSE for details. 
""" def load(): global codes codes = { "reset": "\N{ESCAPE}[39;49;00m", "bold": "\N{ESCAPE}[01m", "faint": "\N{ESCAPE}[02m", "standout": "\N{ESCAPE}[03m", "underline": "\N{ESCAPE}[04m", "blink": "\N{ESCAPE}[05m", "overline": "\N{ESCAPE}[06m" } dark_colors = "black", "darkred", "darkgreen", "brown", "darkblue", "purple", "teal", "lightgray" light_colors = "darkgray", "red", "green", "yellow", "blue", "fuchsia", "turquoise", "white" for i, (dark, light) in enumerate(zip(dark_colors, light_colors), 30): codes[dark] = "\N{ESCAPE}[{}m" .format(i) codes[light] = "\N{ESCAPE}[{};01m".format(i) codes["darkteal"] = codes["turquoise"] codes["darkyellow"] = codes["brown"] codes["fuscia"] = codes["fuchsia"] codes["white"] = codes["bold"] def unload(): global codes codes = {code:"" for code in codes} load() ### END CODE ### And then write stuff like: "{yellow}This is yellow!{reset} {bold}And{reset} this is not!".format(**codes) From rosuav at gmail.com Tue Jun 11 20:36:50 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 12 Jun 2013 04:36:50 +1000 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B76987.9080106@egenix.com> References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> <51B76987.9080106@egenix.com> Message-ID: On Wed, Jun 12, 2013 at 4:16 AM, M.-A. Lemburg wrote: > How would you add a new escape character in a backwards compatible > way ? > > Adding new escape characters is not easy, since Python defaults > to passing them through as-is, e.g. '\e' == '\\e'. My understanding of [1] is that the feature of keeping them unchanged is meant to be a debugging aid, not something you depend on. Use of unescaped backslashes in non-raw string literals is already dangerous to editing (if someone changes your Windows path name to c:\testing, your code is broken). If a new version breaks someone's non-raw literal "c:\everything", it was already broken. 
[1] http://docs.python.org/3/reference/lexical_analysis.html#literals ChrisA From python at mrabarnett.plus.com Tue Jun 11 20:49:48 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 11 Jun 2013 19:49:48 +0100 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> Message-ID: <51B7714C.7090506@mrabarnett.plus.com> On 11/06/2013 17:36, Andrew Barnert wrote: > > On Jun 11, 2013, at 9:11, MRAB wrote: > >> On 09/06/2013 17:13, Stephen J. Turnbull wrote: >>> Steven D'Aprano writes: >>> >>> > > But Python goes much farther. float('٢.๑') also returns 2.1 (not to >>> > > mention that int('٢๑') returns 21). >>> > >>> > Yes. And why is this a problem? There is no ambiguity. It might >>> > look untidy to be mixing Arab and Thai numerals in the same number, >>> > but it is still well-defined. >>> >>> To whom? Unicode didacts, maybe, but I doubt there are any real users >>> who would consider that well-defined. So the same arguments you made >>> for not permitting non-ASCII numerals in Python source code apply >>> here, although they are somewhat weaker when applied to numeric data >>> expressed as text. >>> >>> In any case, there's not really that much use for this generality of >>> numerals. On the one hand, I think these days anyone who uses >>> information technology is fluent in ASCII numeration. On the other, >>> if you want to allow people to write in other scripts, you probably >>> are dealing with "naive" users who should be allowed to use grouping >>> characters and the usual conventions for their locale, and int() and >>> float() just aren't good enough anyway.
>>> >> I was thinking that we could also add a function for numeric >> translation/transliteration somewhere: >> >>>>> # Translate to Bengali >>>>> translate_number('0123456789', 'bengali') >> '০১২৩৪৫৬৭৮৯' >>>>> # Translate to Oriya >>>>> translate_number('0123456789', 'oriya') >> '୦୧୨୩୪୫୬୭୮୯' >> >>> # Defaults to translating to ASCII range >>>>> translate_number('০১২৩৪৫৬৭৮৯') >> '0123456789' >> >> Non-numeric strings and mixed scripts would raise an exception. > I like this, but I'm not sure how to completely specify it, or how to > describe it. > > What does translate_number('-1.2e+3', 'oriya') return? Or '2j'? > > What about translate_number('二万三十') or '2万3十'? Or does it only > handle Arabic-style place-value numerals? > > If this is meant to solve the problem with "naive users", does it handle > grouping characters as well, even though they can't be used with int or > float? Or locale decimal points? Or parens for negatives? > I suppose that in that case it might be better to treat it as a form of encoding/decoding, though with locale-specific forms instead of bytes. You'd decode on input and encode on output. From joshua.landau.ws at gmail.com Tue Jun 11 20:59:07 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Tue, 11 Jun 2013 19:59:07 +0100 Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> <51B76987.9080106@egenix.com> Message-ID: On 11 June 2013 19:36, Chris Angelico wrote: > On Wed, Jun 12, 2013 at 4:16 AM, M.-A. Lemburg wrote: >> How would you add a new escape character in a backwards compatible >> way ? >> >> Adding new escape characters is not easy, since Python defaults >> to passing them through as-is, e.g. '\e' == '\\e'. > > My understanding of [1] is that the feature of keeping them unchanged > is meant to be a debugging aid, not something you depend on.
Use of > unescaped backslashes in non-raw string literals is already dangerous > to editing (if someone changes your Windows path name to c:\testing, > your code is broken). If a new version breaks someone's non-raw > literal "c:\everything", it was already broken. > > [1] http://docs.python.org/3/reference/lexical_analysis.html#literals Reading that I see no specific mention that an auto-escaped backslash is incorrect -- just that it is useful for debugging. I have also read that they were put in place largely to make it difficult to add new escapes to prevent an unwanted proliferation of them. I see no reason for a quick "\e" where "\N{ESCAPE}" works and whenever you'll want a lot there's a weird circumstance involved better served by a quick library (as I've done above). From jimjjewett at gmail.com Tue Jun 11 21:26:26 2013 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 11 Jun 2013 15:26:26 -0400 Subject: [Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts In-Reply-To: References: <51B3606E.6050305@mrabarnett.plus.com> <87r4gc8oro.fsf@uwakimon.sk.tsukuba.ac.jp> <87ehcb8dge.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jun 9, 2013 at 3:59 PM, Yuval Greenfield wrote: > On Sun, Jun 9, 2013 at 7:18 PM, Stephen J. Turnbull > wrote: >> >> Yuval Greenfield writes: >> > Personally I favor the first because more often than not files >> > aren't encoded in the platform's chosen encoding, so it's better to >> > be explicit and consistent. I'm guessing that the exceptions fit into two categories: (1) They came from some other system, likely as a saved web page. or (2) They were written by program X, which ignored the system default in favor of a "better" explicit choice. Which might actually be better, so long as you don't try to use them outside of that program. 
> Hebrew compatibility has been the nuisance and these are the encodings I had > to fight: > utf-8, ucs-2, utf-16, ucs-4, ISO-8859-8, ISO-8859-8-I, Windows-1255. > It's plagued websites, browsers, email clients, Unfortunately, even an explicitly declared language and character set is likely to be false. Best results were obtained by (sometimes) ignoring or overriding the explicit definitions, but the precise details on how to do this changed over time. The last time I had looked it up in the HTML5 draft, there were explicit "browser-specific heuristics" steps. Today, the majority of encoding determination has been split off into its own standard ( http://encoding.spec.whatwg.org/ -- last updated in February 2013) which warns that: "In violation of section 1.4 of Unicode Technical Standard #22 this is a much simpler and more restrictive matching algorithm, as that is found to be necessary to be compatible with deployed content." The main html standard does still define how to parse a meta charset element (because that is internal to a document) at http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#extracting-character-encodings-from-meta-elements but explicitly warns that this is slightly different from even the HTTP standard. ISO-8859-8 and ISO-8859-8-I even get a special mention. Which is a long-winded way of saying "Sometimes the encoding will be wrong." If you could enforce utf-8, you would be fine -- but if you could do that, then it would already have been the system default. > So I appreciate an app being consistent and promoting utf-8 more than being > compliant with the operating system, which the apps I've used don't comply > with. So go ahead and explicitly use utf-8 when writing a file, and then use it again when reading. And the explicit use will advertise the name "utf-8" as a good thing. 
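Concretely, that advice looks something like the following sketch (the file name and sample text here are invented for illustration):

```python
import os
import tempfile

# Sample text mixing ASCII with Hebrew, the scripts under discussion.
text = 'shalom \u05e9\u05dc\u05d5\u05dd'

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Write with an explicit encoding rather than the platform default...
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# ...and use the same explicit encoding when reading it back.
with open(path, encoding='utf-8') as f:
    assert f.read() == text

print('utf-8 round-trip ok')
```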
> Another related annoyance I struggled with recently was that git gives you > the platform's newline scheme, which means I can't have a git repository in > a dropbox shared between Windows and Ubuntu without meddling with this stuff > (the solution is a repo config file). Would you rather have spurious changes as the newline convention went back and forth, depending on who edited it last? -jJ From mal at egenix.com Tue Jun 11 22:21:48 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 11 Jun 2013 22:21:48 +0200 Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> <51B76987.9080106@egenix.com> Message-ID: <51B786DC.7080202@egenix.com> On 11.06.2013 20:36, Chris Angelico wrote: > On Wed, Jun 12, 2013 at 4:16 AM, M.-A. Lemburg wrote: >> How would you add a new escape character in a backwards compatible >> way ? >> >> Adding new escape characters is not easy, since Python defaults >> to passing them through as-is, e.g. '\e' == '\\e'. > > My understanding of [1] is that the feature of keeping them unchanged > is meant to be a debugging aid, not something you depend on. Use of > unescaped backslashes in non-raw string literals is already dangerous > to editing (if someone changes your Windows path name to c:\testing, > your code is broken). If a new version breaks someone's non-raw > literal "c:\everything", it was already broken. > > [1] http://docs.python.org/3/reference/lexical_analysis.html#literals I'm not saying that it's useful to rely on Python's behavior, only that any such change has the potential to break perfectly working code. The last time we changed the escape code was for the introduction of Unicode string literals. That was 13 years ago. And we carefully checked what other languages were using for Unicode at the time. 
Given that \e only saves you two keystrokes (\033 and \x1b are the usual ways to write ESC in ASCII strings), I think the ratio between usefulness and potential breakage is not in favor of an addition. BTW: pylint detects such unsupported escape codes: http://docs.pylint.org/features.html?highlight=w1401#string-constant-checker -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 20 days to go 2013-07-16: Python Meeting Duesseldorf ... 35 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From alexander.belopolsky at gmail.com Tue Jun 11 23:28:20 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 11 Jun 2013 17:28:20 -0400 Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> Message-ID: When I sent my +1, I was under the impression that \e was standardized by POSIX. (I only checked man printf.) It turns out, <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tagtcjh_2>, that \e is not on the list. While I wish I could copy my bash ANSI coloring codes directly into python, I can live with the available alternatives. Consider me +0.
On Tue, Jun 11, 2013 at 2:12 PM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote: > > On Tue, Jun 11, 2013 at 1:10 PM, Steven D'Aprano wrote: > >> >> I use it often enough to miss having \e, but not often enough to remember >> what the hex or octal code for ESC is. > > > +1 > > ANSI colors at the shell prompt are getting popular again these days. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Jun 12 00:41:17 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 11 Jun 2013 15:41:17 -0700 (PDT) Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> Message-ID: <1370990477.98119.YahooMailNeo@web184701.mail.ne1.yahoo.com> From: Alexander Belopolsky Sent: Tuesday, June 11, 2013 2:28 PM >While I wish I could copy my bash ANSI coloring codes directly into python, I can live with the available alternatives. Consider me +0. When copying and pasting a big mess of preformatted strings, you generally have to tack a .replace('\\e', '\033') onto the end (or, if it's a list of separate strings, map that over the list). Mildly annoying, but it doesn't come up often enough to be worth changing things. And for any case _other_ than copying and pasting a big mess of pre-formatted strings into Python code, you're much better off with a library or a format dict (like the one Joshua Landau just posted). So, I'm -0.
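The fixup described above looks like this (a sketch; the pasted string is invented):

```python
# A string pasted from a bash script: Python leaves the unrecognized
# \e escape as the two characters backslash + 'e'.
pasted = '\\e[31mwarning\\e[0m'

# Rewrite the literal backslash-e pairs into real ESC characters.
fixed = pasted.replace('\\e', '\033')

assert fixed == '\x1b[31mwarning\x1b[0m'
print(repr(fixed))
```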
From abarnert at yahoo.com Wed Jun 12 01:13:34 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 11 Jun 2013 16:13:34 -0700 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B7714C.7090506@mrabarnett.plus.com> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> <51B7714C.7090506@mrabarnett.plus.com> Message-ID: <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> On Jun 11, 2013, at 11:49, MRAB wrote: > On 11/06/2013 17:36, Andrew Barnert wrote: >> >> On Jun 11, 2013, at 9:11, MRAB wrote: >> >>> On 09/06/2013 17:13, Stephen J. Turnbull wrote: >>>> Steven D'Aprano writes: >>>> >>>> > > But Python goes much farther. float('٢.๑') also returns 2.1 (not to >>>> > > mention that int('٢๑') returns 21). >>>> > >>>> > Yes. And why is this a problem? There is no ambiguity. It might >>>> > look untidy to be mixing Arab and Thai numerals in the same number, >>>> > but it is still well-defined. >>>> >>>> To whom? Unicode didacts, maybe, but I doubt there are any real users >>>> who would consider that well-defined. So the same arguments you made >>>> for not permitting non-ASCII numerals in Python source code apply >>>> here, although they are somewhat weaker when applied to numeric data >>>> expressed as text. >>>> >>>> In any case, there's not really that much use for this generality of >>>> numerals. On the one hand, I think these days anyone who uses >>>> information technology is fluent in ASCII numeration. On the other, >>>> if you want to allow people to write in other scripts, you probably >>>> are dealing with "naive" users who should be allowed to use grouping >>>> characters and the usual conventions for their locale, and int() and >>>> float() just aren't good enough anyway.
>>> I was thinking that we could also add a function for numeric >>> translation/transliteration somewhere: >>> >>>>>> # Translate to Bengali >>>>>> translate_number('0123456789', 'bengali') >>> '০১২৩৪৫৬৭৮৯' >>>>>> # Translate to Oriya >>>>>> translate_number('0123456789', 'oriya') >>> '୦୧୨୩୪୫୬୭୮୯' >>> >>> # Defaults to translating to ASCII range >>>>>> translate_number('০১২৩৪৫৬৭৮৯') >>> '0123456789' >>> >>> Non-numeric strings and mixed scripts would raise an exception. >> >> I like this, but I'm not sure how to completely specify it, or how to >> describe it. >> >> What does translate_number('-1.2e+3', 'oriya') return? Or '2j'? >> >> What about translate_number('二万三十') or '2万3十'? Or does it only >> handle Arabic-style place-value numerals? >> >> If this is meant to solve the problem with "naive users", does it handle >> grouping characters as well, even though they can't be used with int or >> float? Or locale decimal points? Or parens for negatives? > I suppose that in that case it might be better to treat it as a form of > encoding/decoding, though with locale-specific forms instead of bytes. > You'd decode on input and encode on output. I like that. When you use those terms it sounds a lot better than "translate", and it also definitely helps encourage the familiar pattern of using simple/canonical data internally and en/decoding at the edges. But I don't think it solves most of the problems. How do you encode 1.2E3 or 2j to Oriya? If 二万三十, 2万3十, and 20030 all decode to 20030 from Japanese, which one do you encode to in the other direction? If you want to handle parens for negation, does that mean we have domain-specific locales like en-US-Accounting? In some cases, the answer is probably just "don't do that". But it's not obvious which ones. Coming up with a complete spec seems like a lot of work (and getting everyone to agree to it even more). If there's not an appropriate standard to adopt, I don't know that it's reasonable to do this.
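At least the decode direction is easy to sketch for the unambiguous place-value case, using the Unicode database (decode_digits here is a hypothetical helper, not a stdlib function, and it deliberately rejects everything the questions above raise):

```python
import unicodedata

def decode_digits(s):
    """Map a run of decimal digits in any one script to ASCII digits.

    Hypothetical helper: handles only Arabic-style place-value
    numerals, and raises on non-digits or mixed scripts.
    """
    out = []
    zeros = set()
    for ch in s:
        d = unicodedata.decimal(ch, None)
        if d is None:
            raise ValueError('non-digit character: %r' % ch)
        # The ten digits of each script are contiguous in Unicode, so
        # the code point of that script's zero identifies the script.
        zeros.add(ord(ch) - d)
        out.append(str(d))
    if len(zeros) > 1:
        raise ValueError('mixed scripts: %r' % s)
    return ''.join(out)

print(decode_digits('\u09e6\u09e7\u09e8\u09e9'))  # Bengali digits -> '0123'
```

The encode direction, signs, exponents, grouping, and non-place-value numerals are exactly the parts this sketch punts on.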
From stephen at xemacs.org Wed Jun 12 06:04:27 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 12 Jun 2013 13:04:27 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> <51B7714C.7090506@mrabarnett.plus.com> <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> Message-ID: <87vc5k6kl0.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > If ????, 2?3?, and 20030 all decode to 20030 from Japanese Only the third does in a non-locale-specific way. The characters for "man" and "juu" have numeric values, but not decimal ones. > In some cases, the answer is probably just "don't do that". I think (for builtins) that's the answer for all non-ASCII cases.<0.5 wink/> From abarnert at yahoo.com Wed Jun 12 07:30:55 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 11 Jun 2013 22:30:55 -0700 (PDT) Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <87vc5k6kl0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> <51B7714C.7090506@mrabarnett.plus.com> <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> <87vc5k6kl0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1371015055.70791.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Stephen J. Turnbull Sent: Tuesday, June 11, 2013 9:04 PM > Andrew Barnert writes: You're replying out of context. 
I realize the thread is long and twisty, so let me summarize so you (and others) don't have to delve through all of it: MRAB suggested that maybe int and friends shouldn't do transliteration at all; it would be better to have a new "translate_number" function ("somewhere", not necessarily in builtins, but presumably in the stdlib). After a few issues were raised, he suggested that maybe this should be moved to "a form of encoding/decoding, though with locale-specific forms. You'd decode on input and encode on output." In my email that you replied to, I was agreeing with that idea, but pointing out some examples that seem underspecified, and may be hard to specify. >> If ????, 2?3?, and 20030 all decode to 20030 from Japanese > > Only the third does in a non-locale-specific way. The characters for > "man" and "juu" have numeric values, but not decimal ones. I brought up Japanese as one of my original examples specifically because it has common numeric forms that aren't decimal. That's part of the reason MRAB suggested that this is locale-specific functionality, not builtin functionality, which I agreed with. The point of reusing this example is that there are three _different_ such forms in one locale. Locale-decoding all of them to '20030' is easy, but locale-encoding '20030' is then a problem. Is there a relevant standard which says which of the three forms is canonical? If not, how do we decide what the encode function does? The same is true for the other examples I raised (how do you encode scientific format to Oriya, etc.). >> In some cases, the answer is probably just "don't do that". > > I think (for builtins) that's the answer for all non-ASCII cases. <0.5 > wink/> In case it's not obvious: The point of adding locale-specific encode/decode functions is that builtins like int and float can then just deal with the traditional ASCII cases. My question is whether those locale-specific functions are well-specified. 
If they are, I'm +1 on adding them and keeping the builtins minimal. If they aren't, I'm -1 on trying to invent a brand-new specification for numeric representations as part of the Python stdlib. From storchaka at gmail.com Wed Jun 12 08:58:17 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 12 Jun 2013 09:58:17 +0300 Subject: [Python-ideas] Add \e escape code In-Reply-To: <51B786DC.7080202@egenix.com> References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> <51B76987.9080106@egenix.com> <51B786DC.7080202@egenix.com> Message-ID: 11.06.13 23:21, M.-A. Lemburg wrote: > I'm not saying that it's useful to rely on Python's behavior, > only that any such change has the potential to break perfectly > working code. > > The last time we changed the escape code was for the introduction > of Unicode string literals. That was 13 years ago. And we carefully > checked what other languages were using for Unicode at the time. Note that at the same time \u means "Titlecase next character" and \U means "Uppercase till \E" in Perl. The "conforming with other languages" argument is not very suitable in this particular case. I'm -0.5. On one hand, I'm not worried about backward compatibility -- the behavior of \e is not promised once and for all, and there are no good reasons to use it (unlike backslashing non-letters). On the other hand, it adds complexity (look at the mess of dozens of special escapes in Perl), the range of applications for this feature is narrow enough, and there are good alternatives (\x1b for shortness and \N{ESCAPE} for mnemonicity). From mal at egenix.com Wed Jun 12 09:24:32 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 12 Jun 2013 09:24:32 +0200 Subject: [Python-ideas] Add \e escape code In-Reply-To: References: <51B74EC1.4080508@pearwood.info> <51B75841.3060309@mrabarnett.plus.com> <51B75A0B.6090001@pearwood.info> <51B76987.9080106@egenix.com> <51B786DC.7080202@egenix.com> Message-ID: <51B82230.8070504@egenix.com> On 12.06.2013 08:58, Serhiy Storchaka wrote: > 11.06.13 23:21, M.-A. Lemburg wrote: >> I'm not saying that it's useful to rely on Python's behavior, >> only that any such change has the potential to break perfectly >> working code. >> >> The last time we changed the escape code was for the introduction >> of Unicode string literals. That was 13 years ago. And we carefully >> checked what other languages were using for Unicode at the time. > > Note that at the same time \u means "Titlecase next character" and \U means "Uppercase till \E" in > Perl. The "conforming with other languages" argument is not very suitable in this particular case. Perl was not available as a reference, since they had started thinking about these things at the same time we did. Since Java was the first major language to implement Unicode at the time, we followed their approach. Turned out to be a good choice, since C++ added the same notation in C++11. http://en.cppreference.com/w/cpp/language/escape > I'm -0.5. On one hand, I'm not worried about backward compatibility -- the behavior of \e is not > promised once and for all, and there are no good reasons to use it (unlike backslashing > non-letters). On the other hand, it adds complexity (look at the mess of dozens of special escapes in Perl), > the range of applications for this feature is narrow enough, and there are good alternatives (\x1b > for shortness and \N{ESCAPE} for mnemonicity). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 19 days to go 2013-07-16: Python Meeting Duesseldorf ... 34 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From greg.ewing at canterbury.ac.nz Wed Jun 12 03:47:04 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 12 Jun 2013 13:47:04 +1200 Subject: [Python-ideas] Add "namereplace" error handler In-Reply-To: References: <51B71968.3030700@egenix.com> <51B71FAB.1040605@pearwood.info> <6603741B-21EA-4DA7-8E8F-0548FBE8C3C2@masklinn.net> Message-ID: <51B7D318.8000605@canterbury.ac.nz> Serhiy Storchaka wrote: > 11.06.13 17:40, Masklinn wrote: > >> I'm not aware of other >> systems which would allow *naming* all of these codepoints outside of >> unicode. "namereplace" feels short, sweet and clear. > > HTML, (La)TeX? Blatantly hypergeneralising, namereplace("unicode") namereplace("html") namereplace("latex") etc. With an accompanying scheme for registering naming schemes. -- Greg From stephen at xemacs.org Wed Jun 12 10:53:36 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 12 Jun 2013 17:53:36 +0900 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <1371015055.70791.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> <51B7714C.7090506@mrabarnett.plus.com> <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> <87vc5k6kl0.fsf@uwakimon.sk.tsukuba.ac.jp> <1371015055.70791.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <87li6f7lrj.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > MRAB suggested that maybe int and friends shouldn't do > transliteration at all; it would be better to have a new > "translate_number" function ("somewhere", not necessarily in > builtins,but presumably in the stdlib). I'm pretty sure I suggested that first, and I know I've suggested it on two or more different occasions. I don't care about credit, but it's pretty clear that nobody is paying much attention to what anybody else is writing. :-( > In my email that you replied to, I was agreeing with that idea, My apologies. I read the whole thing twice, but failed to grasp that. > >> If ????, 2?3?, and 20030 all decode to 20030 from Japanese > > > > Only the third does in a non-locale-specific way.? The characters for > > "man" and "juu" have numeric values, but not decimal ones. > > I brought up Japanese as one of my original examples specifically > because it has common?numeric forms that aren't decimal. That wasn't clear to me. In any case, my point is that such forms are clearly irrelevant to the arguments MAL and Steven d'Aprano (among others whose names I admit I've forgotten) present for "promiscuous" builtins. Those arguments specifically rely on the lack of ambiguity of the *decimal* values for digits, and deliberately ignore the issue of roundtripping. 
> The point of reusing this example is that there are three > _different_ such forms in one locale. Locale-decoding all of them > to '20030' is easy, but locale-encoding '20030' is then a > problem. Hardly. '20030' will always be understood, so the builtin str() is useful and often sufficient. Picking one of the other forms is application- (and often user-) dependent, just as representing the integer 20030 as an ASCII string is ambiguous. (Look how many format characters we devote to that one task! Heck, POSIX time is an integer, so we could probably make a case for including most of strftime(3)!) > My question is whether those locale-specific functions are > well-specified. They don't exist yet, so that's not an answerable question. My suggestion is that, as with any translation, we ask the native speakers for help. As with any case of ambiguity, we refuse to guess -- instead we provide multiple styles as suggested by the native- speaking consultants or requested by users (resources permitting, of course). Yeah, I know, it's terribly mendokusai[1], but we really don't have an alternative except to tell the users to do it themselves, because that's exactly what they will do if they want a particular style we don't provide. It just seems to me that it would be useful to provide a registry of styles that other people have already written, maybe on PyPI, maybe in the stdlib. If commonly used, it could become quite flexible and robust in a fairly short period of time. Footnotes: [1] Literally, "smells troublesome" in Japanese. 
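A minimal sketch of the style registry Stephen describes (all names here are hypothetical -- nothing like this exists in the stdlib), loosely modeled on how the codecs module pairs named encoders and decoders and keeps lookup explicit rather than tied to a global locale:

```python
# Hypothetical registry of numeric styles: each style is a named pair of
# (encode: number -> str, decode: str -> number) callables.
_styles = {}

def register_style(name, encode, decode):
    _styles[name.lower()] = (encode, decode)

def encode_number(value, style):
    encode, _ = _styles[style.lower()]
    return encode(value)

def decode_number(text, style):
    _, decode = _styles[style.lower()]
    return decode(text)

# A trivial built-in style; real entries ('ja-kanji', 'en-US-accounting',
# ...) would be contributed by native speakers, as suggested above.
register_style("ascii", str, int)
```

The point of the registry shape is that the hard ambiguity questions (which of several forms is canonical for encoding?) are pushed into individual named styles, so each style can make its own choice instead of the stdlib guessing.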
From abarnert at yahoo.com Wed Jun 12 12:15:18 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 12 Jun 2013 03:15:18 -0700 (PDT) Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <87li6f7lrj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> <51B7714C.7090506@mrabarnett.plus.com> <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> <87vc5k6kl0.fsf@uwakimon.sk.tsukuba.ac.jp> <1371015055.70791.YahooMailNeo@web184705.mail.ne1.yahoo.com> <87li6f7lrj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1371032118.8452.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: Stephen J. Turnbull Sent: Wednesday, June 12, 2013 1:53 AM > Andrew Barnert writes: > >> MRAB suggested that maybe int and friends shouldn't do >> transliteration at all; it would be better to have a new >> "translate_number" function ("somewhere", not > necessarily in >> builtins,but presumably in the stdlib). > > I'm pretty sure I suggested that first, and I know I've suggested it > on two or more different occasions.? I don't care about credit, but > it's pretty clear that nobody is paying much attention to what anybody > else is writing. :-( I don't know who suggested it first. But apparently you, MRAB, and I (among others) all agree that the best way to add locale-specific numeric functionality is outside of int and float, and that it should be explicitly locale-specific. (Slightly ironically, I'm actually +0.5 on the _original_ topic of this thread, handling Unicode minus signs, because that's something that has some precedent and pre-existing work, and it's really not locale-specific because it's being done with English/default-locale text. 
But it's become a thread about adding functionality to int and float to handle every numeric representation used anywhere in the world, and I'm definitely -1 toward that.) >> My question is whether those locale-specific functions are >> well-specified. > > They don't exist yet, so that's not an answerable question. OK, let's say "feasibly well-specifiable". The reason I ask is simple: if we could offer MRAB's locale-based functions for numeric forms, it would make it a lot easier to say that int and float don't need, and shouldn't get, any of those features. However, I'm not sure it actually is feasible to design those functions, much less write them. It's a much larger problem than it initially appears. The work of gathering information from native users and domain specialists and putting together a design will be far more than the benefit. I could be wrong about that -- maybe a lot of people are clamoring to be able to use Oriya numerals and I just never talk to the right people -- but I don't _think_ I am. If a reasonable spec already exists as part of Unicode or another standard, or if someone builds a PyPI library that gains traction, great. If not, I think it's worth backing off the idea and making the tougher argument that it's worth leaving int and float alone even though the locale-based functions aren't even envisioned yet, and won't be in the near future. > My suggestion is that, as with any translation, we ask the native > speakers for help. As with any case of ambiguity, we refuse to guess > -- instead we provide multiple styles as suggested by the native- > speaking consultants or requested by users (resources permitting, of > course). Sure, implementing the "is_reasonable(spec)" check that I alluded to above would be largely about asking native speakers and domain specialists and so on, and so would designing a spec from scratch. 
> Yeah, I know, it's terribly mendokusai[1], but we really don't have an > alternative except to tell the users to do it themselves, because > that's exactly what they will do if they want a particular style we > don't provide. It just seems to me that it would be useful to provide > a registry of styles that other people have already written, maybe on > PyPI, maybe in the stdlib. If commonly used, it could become quite > flexible and robust in a fairly short period of time. > > Footnotes: > [1] Literally, "smells troublesome" in Japanese. Sounds good. If there are already a bunch of pre-existing solutions to parts of this problem, it would be worth gathering them up, factoring them out, and designing a consistent framework for a style registry. So... are there such solutions? From techtonik at gmail.com Wed Jun 12 21:32:14 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 12 Jun 2013 22:32:14 +0300 Subject: [Python-ideas] Suprocess functionality partitioning Message-ID: Something needs to be done about the subprocess API, which got overly complicated. The idea is to have: shutil.run() - run command through shell, unsafe sys.run() - run command directly in operating system, safe Both should be API compatible (unblocking stdin/stdout read etc.). Currently, subprocess calls are unreadable in any Python code - many conditions make its behaviour hard to predict and memorize. By partitioning shell and out-of-shell execution, the documentation will be easier to grasp and reference. Maybe it will even be possible to add some 2D table of various subprocess states that affect behavior. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua.landau.ws at gmail.com Wed Jun 12 21:44:55 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 12 Jun 2013 20:44:55 +0100 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On 12 June 2013 20:32, anatoly techtonik wrote: > Something needs to be done about subprocess API, which got overly > complicated. > > The idea is to have: > shutil.run() - run command through shell, unsafe > sys.run() - run command directly in operating system, ?safe > > Both should be API compatible (unblocking stdin/stdout read etc.). > Currently, subprocess calls are unreadable in any Python code - many > conditions makes its behaviour hard to predict and memorize. By > partitioning shell and out-of-shell execution, the documentation will be > easier to grasp and reference to. Maybe it will be even possible to add > some 2D table of various subprocess states that affect behavior. Ranger (the best file manager ;P - http://ranger.nongnu.org/) has shown that the current method is rather useful - a simple flag to run in shell vs not is much easier to handle. I also, in my experience, appreciate the quick toggling we have for utilising the two modes. I also cannot agree that it would be easier to memorize the two functions than the one with the flag. So I'd be inclined to disagree with the proposal. From pyideas at rebertia.com Wed Jun 12 22:38:12 2013 From: pyideas at rebertia.com (Chris Rebert) Date: Wed, 12 Jun 2013 13:38:12 -0700 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On Wed, Jun 12, 2013 at 12:44 PM, Joshua Landau wrote: > On 12 June 2013 20:32, anatoly techtonik wrote: >> Something needs to be done about subprocess API, which got overly >> complicated. 
>> >> The idea is to have: >> shutil.run() - run command through shell, unsafe >> sys.run() - run command directly in operating system, ?safe >> >> Both should be API compatible (unblocking stdin/stdout read etc.). >> Currently, subprocess calls are unreadable in any Python code - many >> conditions makes its behaviour hard to predict and memorize. By >> partitioning shell and out-of-shell execution, the documentation will be >> easier to grasp and reference to. Maybe it will be even possible to add >> some 2D table of various subprocess states that affect behavior. > > Ranger (the best file manager ;P - http://ranger.nongnu.org/) has > shown that the current method is rather useful - a simple flag to run > in shell vs not is much easier to handle. I also, in my experience, > appreciate the quick toggling we have for utilising the two modes. I > also cannot agree that it would be easier to memorize the two > functions than the one with the flag. How do you deal with the differences? It takes a lot more than just simply toggling the boolean to switch between the two approaches. I fully support separating the approaches. The cross-product of `shell` and `args` possibilities include nonsensical and deceptive "happens to work in some cases" combinations that cause much user confusion which could be avoided by separating them. shell=True, isinstance(args, str) ==> works normally (can be unsafe, but the user has explicitly requested it) shell=False, isinstance(args, str) ==> Works only if no arguments are being passed to the subprocess. Users, not reading the docs and assuming that Python designed things intuitively, very commonly get confused by this. "/usr/bin/foo" works; "/usr/bin/foo -a bar" doesn't work, and, at least on *nix, the error message in the latter case is approximately "OSError: [Errno 2] No such file or directory", which doesn't make the cause quite obvious enough and leaves the user confused, because /usr/bin/foo really does exist. 
shell=False, isinstance(args, list) ==> works normally (so long as the user gets the tokenization right, which isn't that hard, but plenty of people seem to mess it up, but that is all a separate issue from the present one) shell=True, isinstance(args, list) ==> not really useful; no known use case Essentially an implementation artifact that's arguably a bug: http://bugs.python.org/issue6760#msg98732 Again, confuses the user. And Popen()'s `executable` argument just complicates this analysis even further. Cheers, Chris From steve at pearwood.info Thu Jun 13 01:24:12 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Jun 2013 09:24:12 +1000 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <1371032118.8452.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <87mwr0856x.fsf@uwakimon.sk.tsukuba.ac.jp> <51B46931.1080209@pearwood.info> <87hah78niq.fsf@uwakimon.sk.tsukuba.ac.jp> <51B48BA3.70502@pearwood.info> <87fvwr8dpb.fsf@uwakimon.sk.tsukuba.ac.jp> <51B74C2A.3080908@mrabarnett.plus.com> <51B7714C.7090506@mrabarnett.plus.com> <875F02B9-3E63-412E-96E9-349C8A04FF95@yahoo.com> <87vc5k6kl0.fsf@uwakimon.sk.tsukuba.ac.jp> <1371015055.70791.YahooMailNeo@web184705.mail.ne1.yahoo.com> <87li6f7lrj.fsf@uwakimon.sk.tsukuba.ac.jp> <1371032118.8452.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <51B9031C.8000705@pearwood.info> On 12/06/13 20:15, Andrew Barnert wrote: > From: Stephen J. Turnbull > > Sent: Wednesday, June 12, 2013 1:53 AM > > >> Andrew Barnert writes: >> >>> MRAB suggested that maybe int and friends shouldn't do >>> transliteration at all; it would be better to have a new >>> "translate_number" function ("somewhere", not >> necessarily in >>> builtins,but presumably in the stdlib). >> >> I'm pretty sure I suggested that first, and I know I've suggested it >> on two or more different occasions. 
I don't care about credit, but >> it's pretty clear that nobody is paying much attention to what anybody >> else is writing. :-( > > I don't know who suggested it first. But apparently you, MRAB, and I (among others) all agree that the best way to add locale-specific numeric functionality is outside of int and float, and that it should be explicitly locale-specific. If you want locale-specific functions, we already have them in the locale module. locale.atof and friends can be used for locale-specific conversions. The advantage of int(), float() and Decimal() is that they are *independent* of the locale. I don't need to make an application-wide global change, potentially affecting my entire application, to tell it that when the user enters '??' she wants forty-two. A more realistic example, though, would be non-interactive data processing. If I process a file that contains fullwidth characters like ??, there's no locale to deal with. They're not even from a different language. But they are not ASCII digits, and an ASCII-only int() would fail, *for no good reason*. By the way, in our comfortable almost-ASCII-only world, it's easy to think that other numeral systems don't matter. They do: http://stackoverflow.com/questions/16631753/convert-unicode-string-made-up-of-culture-specific-digits-to-integer-value http://stackoverflow.com/questions/6141255/parse-a-non-ascii-unicode-number-string-as-integer-in-net The answer to these two questions in Python is "call int()". Do you think that there are native users of other numeral systems that prefer jumping through hoops, as C# and .NET make you do, instead of Python's beautifully elegant solution? Python is rapidly becoming the best language for Unicode I have seen, even better than more recent languages like Go that don't have the historical baggage of Python. 
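The behavior in question is easy to check directly (escape sequences used here, since the digits themselves got mangled above): int() accepts any Unicode decimal digits, with no locale machinery involved.

```python
# Arabic-Indic digits FOUR (U+0664) and TWO (U+0662) -- what a user of
# that numeral system would actually type:
assert int("\u0664\u0662") == 42

# Fullwidth digits, common in East Asian text -- not a different
# language at all, just a different presentation form of 4 and 2:
assert int("\uff14\uff12") == 42
```

locale.atof-style parsing, by contrast, only changes behavior after an application-wide locale switch.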
> (Slightly ironically, I'm actually +0.5 on the _original_ topic of this thread, handling Unicode minus signs, because that's something that has some precedent and pre-existing work, and it's really not locale-specific because it's being done with English/default-locale text. But it's become a thread about adding functionality to int and float to handle every numeric representation used anywhere in the world, and I'm definitely -1 toward that.) It certainly has not become anything even close to that exaggeration. int, float and Decimal have a very simple rule: if a character is a decimal digit, then it is a valid digit. That's not "every numeric representation used anywhere in the world". -- Steven From ncoghlan at gmail.com Thu Jun 13 01:29:36 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 09:29:36 +1000 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: This is why there are a myriad of alternative shell access APIs on PyPI. Secure and convenient subprocess invocation is a hard problem, it will only be improved by people doing the hard work of creating their own libraries and iterating on them. It may not even be solvable cleanly without an invocation syntax that only accepts string literals rather than arbitrary objects. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pyideas at rebertia.com Thu Jun 13 01:42:46 2013 From: pyideas at rebertia.com (Chris Rebert) Date: Wed, 12 Jun 2013 16:42:46 -0700 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On Wed, Jun 12, 2013 at 12:32 PM, anatoly techtonik wrote: > Something needs to be done about subprocess API, which got overly > complicated. 
> > The idea is to have: > shutil.run() - run command through shell, unsafe There's already a bug to add something like that: http://bugs.python.org/issue13238 > sys.run() - run command directly in operating system, ?safe I don't think `sys` is the right place for it (leave it in `subprocess` and refactor Popen()'s API, IMO). Anyone know if there's an existing bug for streamlining/splitting subprocess's API? Cheers, Chris From abarnert at yahoo.com Thu Jun 13 03:56:25 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 12 Jun 2013 18:56:25 -0700 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On Jun 12, 2013, at 12:32, anatoly techtonik wrote: > Something needs to be done about subprocess API, which got overly complicated. > > The idea is to have: > shutil.run() - run command through shell, unsafe > sys.run() - run command directly in operating system, ?safe What does "run" mean? Is it equivalent to call, check_call, check_output, or something else? I find myself needing both check_call (to pass output straight through) and check_output (to process it) very often. I'd be happy with simplifying those. But not with simplifying one and losing the other. Also, how often do you really need shell=True? I think providing a simpler way to do that will be more of an attractive nuisance than a help. (Many of the subprocess questions on Stack Overflow come down to people using shell=True, not knowing why they used it, and not knowing how to deal with it.) > Both should be API compatible (unblocking stdin/stdout read etc.). You're suggesting that these actually provide a Popen-like object that the caller then has to communicate() with or equivalent? Because that's already way more complicated than what I need for simple cases, and won't be that much simpler than just using Popen is today. Really, the only thing you can eliminate that way is the shell flag. 
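For reference, the two patterns meant here -- passing the child's output straight through versus capturing it for processing -- are both one-liners with today's API:

```python
import subprocess
import sys

# check_call: the child's stdout passes straight through to ours;
# raises CalledProcessError on a nonzero exit status.
subprocess.check_call([sys.executable, "-c", "print('passes through')"])

# check_output: capture the child's stdout (as bytes) for processing.
out = subprocess.check_output([sys.executable, "-c", "print('captured')"])
assert out.strip() == b"captured"
```

Any simplified replacement would need to preserve both of these, not just one.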
> Currently, subprocess calls are unreadable in any Python code - many > conditions makes its behaviour hard to predict and memorize. By > partitioning shell and out-of-shell execution, the documentation will be > easier to grasp and reference to. Maybe it will be even possible to add > some 2D table of various subprocess states that affect behavior. > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From abarnert at yahoo.com Thu Jun 13 03:59:19 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 12 Jun 2013 18:59:19 -0700 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On Jun 12, 2013, at 13:38, Chris Rebert wrote: > shell=False, isinstance(args, str) ==> > Works only if no arguments are being passed to the subprocess. That's only true on POSIX. On Windows, this not only works, it's the most straightforward way to do it. > shell=False, isinstance(args, list) ==> works normally Except that on Windows it has to build a string out of your args, attempting to craft the right string that the child will then parse back into the specified args. Which _usually_ works, but when you start doing complicated things with quoting it doesn't. From steve at pearwood.info Thu Jun 13 05:15:31 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Jun 2013 13:15:31 +1000 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51B93953.6020602@pearwood.info> On 11/06/13 11:55, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > It would be a pretty awful font that made ? look like . > > Or aging eyes. 
> > > But even if it did, what is the concern here? If somebody enters a > > mixed script number, presumably they have some reason for it. > > Unicode Technical Report #36 explains the concerns. Mostly that the > reason may be nefarious. I specifically draw your attention to > section 2.7: Here's the URL: http://www.unicode.org/reports/tr36/ > 2.7 Numeric Spoofs > > Turning away from the focus on domain names for a moment, there is > another area where visual spoofs can be used. Many scripts have sets > of decimal digits that are different in shape from the typical > European digits. For example, Bengali has {? ? ? ? ? ? ? ? ? ?}, while > Oriya has {? ? ? ? ? ? ? ? ? ?}. Individual digits may have the same > shapes as digits from other scripts, even digits of different > values. For example, the Bengali string "??" is visually confusable > with the European digits "89", but actually has the numeric value 42! > * If software interprets the numeric value of a string of digits without > * detecting that the digits are from different or inappropriate scripts, > * such spoofs can be used. > > Emphasis (*) added. Noting that the number 42 is the answer to Life, > the Universe, and Everything (including this thread), I conclude we're > done! There is a vast gulf between "they look similar, and we're drawing this to your attention" and "here's an actual exploit". I'd be more impressed if they demonstrated a concrete exploit. Spoofing digits in a URL is a concrete exploit -- if you expect a URL like http://foo??.com, then someone might be able to fool you into clicking http://foo89.com instead. That's a real risk, but not unique to Unicode. paypa1.com vs paypal.com anyone? But coming up with a relevant exploit involving int() is harder. Earlier, Alexander Belopolsky wrote about potential vandalism of Wikipedia when screen-scraping data. 
Presumably he had something in mind like this:

# Actual data
"Average number of eggs eaten in a month = 89"

# Vandalised data:
"Average number of eggs eaten in a month = ৪২"

And lo and behold, the vandal has succeeded in hiding the fact of his vandalism, provided the reader happens to be relatively unobservant and has font support for Bengali digits. And then the unsuspecting Python programmer scrapes the data, calls int(), and gets the value 42 instead of 89. The vandal's dastardly plan succeeds. I suggest that this is rather more likely:

# Vandalised data:
"Average number of eggs eaten in a month = 42"

There are practical exploits where the bad guy can exploit the visual similarity of certain digits to other digits, but they don't have anything to do with int(). The Unicode consortium has done the right thing by mentioning this, but we can get a rough idea of the practical risk involved: there are about ten pages of discussion of various URL spoofing attacks, and six lines on numeric spoofs. -- Steven From ronaldoussoren at mac.com Thu Jun 13 08:13:50 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 13 Jun 2013 08:13:50 +0200 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: <2083BD50-EFD6-483D-8363-FC2737A9F0BD@mac.com> On 12 Jun, 2013, at 21:32, anatoly techtonik wrote: > Something needs to be done about subprocess API, which got overly complicated. > > The idea is to have: > shutil.run() - run command through shell, unsafe > sys.run() - run command directly in operating system, safe > > Both should be API compatible (unblocking stdin/stdout read etc.). > Currently, subprocess calls are unreadable in any Python code - many > conditions makes its behaviour hard to predict and memorize. By > partitioning shell and out-of-shell execution, the documentation will be > easier to grasp and reference to. 
Maybe it will be even possible to add > some 2D table of various subprocess states that affect behavior. I'm far from convinced that splitting the functionality in this way is an improvement over the subprocess API; there is a significant risk that you'll end up with two APIs that are awfully similar and you'd still have to know about both. Most of the complexity of the subprocess API is not due to the "shell" keyword argument, but is mostly caused by just having an awful lot of features that are all needed for a generic solution. The best way forward with your proposal is IMHO to start a project on PyPI to make it possible to iterate fast with community feedback. Possible stdlib inclusion can then be discussed when that project is stable and has a significant userbase. With the current packaging ecosystem, and upcoming improvements, it might not even be necessary to add it to the stdlib at that point. Ronald From pyideas at rebertia.com Thu Jun 13 08:23:53 2013 From: pyideas at rebertia.com (Chris Rebert) Date: Wed, 12 Jun 2013 23:23:53 -0700 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On Jun 12, 2013 6:59 PM, "Andrew Barnert" wrote: > On Jun 12, 2013, at 13:38, Chris Rebert wrote: > > > shell=False, isinstance(args, str) ==> > > Works only if no arguments are being passed to the subprocess. > > That's only true on POSIX. On Windows, this not only works, it's the most straightforward way to do it. > > > shell=False, isinstance(args, list) ==> works normally > > Except that on Windows it has to build a string out of your args, attempting to craft the right string that the child will then parse back into the specified args. Which _usually_ works, but when you start doing complicated things with quoting it doesn't. Ah, Windows. Always gotta be different. I suppose the subprocess module may have bitten off more than it can chew in trying to have a single completely cross-platform constructor. 
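The Windows string-building Andrew describes is visible from Python itself: subprocess converts an args list with its list2cmdline() helper (the function CPython actually uses on Windows; it is only lightly documented, so treating it as a stable public API is an assumption). A minimal sketch of where quoting gets tricky:

```python
import subprocess

# subprocess.list2cmdline() joins an args list into the single command-line
# string a Windows child process actually receives (MS C runtime rules).
def show_cmdline(args):
    """Return the command line Windows would see for this args list."""
    return subprocess.list2cmdline(args)

# Plain arguments pass through unchanged:
print(show_cmdline(["prog", "arg1", "arg2"]))   # prog arg1 arg2

# Arguments containing spaces are quoted:
print(show_cmdline(["prog", "a b"]))            # prog "a b"

# Embedded quotes are backslash-escaped; this is where round-tripping can
# break for child programs that parse their command line differently:
print(show_cmdline(["prog", 'say "hi"']))       # prog "say \"hi\""
```

The inverse direction (string back to list) has no single answer on Windows, which is exactly why the list form can only ever be a best effort there.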
Cheers, Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jun 13 10:26:33 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 13 Jun 2013 01:26:33 -0700 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: <2083BD50-EFD6-483D-8363-FC2737A9F0BD@mac.com> References: <2083BD50-EFD6-483D-8363-FC2737A9F0BD@mac.com> Message-ID: <721395E1-DD98-4B76-AC69-1E0077E1E0FD@yahoo.com> On Jun 12, 2013, at 23:13, Ronald Oussoren wrote: > The best way forward with your proposal is IMHO to start a project on PyPI > to make it possible to iterate fast with community feedback. Possible stdlib > inclusion can then be discussed when that project is stable and has a > significant userbase. IMHO, even before that, it's worth surveying the other PyPI libraries in the same space. http://bugs.python.org/issue13238 links to many of them, but there are definitely more than those. Some of them are pretty similar to this proposal; others are radically different. It's also worth looking at the interfaces in node.js, ruby, etc. Most languages' stdlibs don't have the fancy exec wrappers that python has, and only have semi-fancy os.system replacements. But that might mean they've put more thought into those os.system replacements, or taken advantage of the freedom that comes from not having to design an interface that makes sense for both shell=True and shell=False. A proposal that took the best from all of those, and had a rationale that included a survey and explained what was wrong with all of the existing alternatives, instead of just asserting that subprocess is too complicated, would be a lot more compelling, at least to me. In fact, I wonder if having links to those libraries in the subprocess docs (together with the upcoming packaging improvements) would remove much of the recurring demand for adding new subprocess wrappers to the stdlib. 
I don't know if there's a policy on that, but it does seem to be pretty rare (and mainly used for links to implementations either as code examples or for backporting). So maybe it's not considered appropriate? -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 10:50:21 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 08:50:21 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods Message-ID: Dear all, currently - and referring to Python 3 - the write methods of the different io module objects work on bytes and str objects only. The built-in functions print() and bytes(), on the other hand, use an arbitrary object's __str__ and __bytes__ methods to compute the str and bytes they should work with. Wouldn't it be more consistent and pythonic if the io write methods behaved the same way? Best, Wolfgang From shibturn at gmail.com Thu Jun 13 12:53:30 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Thu, 13 Jun 2013 11:53:30 +0100 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On 13/06/2013 2:59am, Andrew Barnert wrote: >> shell=False, isinstance(args, list) ==> works normally > > Except that on Windows it has to build a string out of your args, > attempting to craft the right string that the child will then parse > back into the specified args. Which _usually_ works, but when you > start doing complicated things with quoting it doesn't. I was under the impression that this worked correctly (for executables compiled with Visual C which just use argv [1]). But os.spawnv() is a total mess where you *do* need to worry about quoting. [1] Nowadays other C compilers use the same rules. But I think cmd may use different rules. 
-- Richard From ncoghlan at gmail.com Thu Jun 13 13:18:49 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 21:18:49 +1000 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: <51B93953.6020602@pearwood.info> References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> <51B93953.6020602@pearwood.info> Message-ID: On 13 June 2013 13:15, Steven D'Aprano wrote: > There are practical exploits where the bad guy can exploit the visual > similarity of certain digits to other digits, but they doesn't have anything > to do with int(). The Unicode consortium has done the right thing by > mentioning this, but we can get a rough idea of the practical risk involved: > there are about ten pages of discussion of various URL spoofing attacks, and > six lines on numeric spoofs. Just as significantly, we push validation of untrusted data out to the boundaries of applications for a reason: the interpreter has no way of knowing whether data is trusted or untrusted. Establishing trustworthiness is a complex topic, and there are limits to the assistance the language can offer. We can remove obviously-unsafe-in-retrospect features (like the old Python 2.x input-with-implicit-eval), but glyph confusion in Unicode is well outside what we decided to worry about when designing Python 3's unicode features. It's similar to the way we don't second guess the user if they decided to set "shell=True" on a subprocess call. You can contrast that with the packaging metadata 2.0 design though, where not only are we continuing to disallow arbitrary Unicode in package names (with the vulnerability to glyph-confusion based exploits being one of the major considerations), but even allowing index servers to enforce the TR36 glyph confusability restrictions that apply to ASCII characters (specifically "lI1" and "0O"). 
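To ground the glyph-confusion point in what int() already does, a minimal sketch (behaviour checked against CPython 3: any Unicode decimal digit is accepted, but only the ASCII hyphen-minus as a sign):

```python
import unicodedata

# int() already accepts any Unicode decimal digit (category Nd)...
assert int("\u09ea\u09e8") == 42   # Bengali digits FOUR, TWO: TR36's "42"
assert int("\u0664\u0662") == 42   # Arabic-Indic digits FOUR, TWO

# ...but U+2212 MINUS SIGN is rejected; only the ASCII hyphen-minus works:
assert int("-1") == -1
try:
    int("\N{MINUS SIGN}1")
except ValueError:
    pass  # this is the behaviour the thread proposes to change

# NFKC normalization, one candidate fix, folds fullwidth variants to ASCII
# but leaves U+2212 itself untouched:
assert unicodedata.normalize("NFKC", "\N{FULLWIDTH PLUS SIGN}1") == "+1"
assert unicodedata.normalize("NFKC", "\N{MINUS SIGN}") == "\N{MINUS SIGN}"
```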
I think this whole thread has gone pretty far afield, though. The original question was whether 'int("-1") == int("\N{MINUS SIGN}1")' should hold, and I agree with Lukasz and MAL that it should. I'm only +0 on the other characters MAL mentioned, though (FULLWIDTH PLUS SIGN, SUPERSCRIPT PLUS SIGN, SUPERSCRIPT MINUS). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 13 13:28:29 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 21:28:29 +1000 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: References: Message-ID: On 13 June 2013 11:56, Andrew Barnert wrote: > Also, how often do you really need shell=True? I think providing a simpler way to do that will be more of an attractive nuisance than a help. (Many of the subprocess questions on Stack Overflow come down to people using shell=True, not knowing why they used it, and not knowing how to deal with it.) For the record, one of the main reasons I stopped working on shell-command is because the application that originally inspired it was much improved once I rewrote it to avoid the dependency on the shell. "Application was improved by being rewritten to no longer use the style this library encourages" was a bit of a red flag :P Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Thu Jun 13 13:30:27 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 13 Jun 2013 13:30:27 +0200 Subject: [Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions In-Reply-To: References: <51B45E64.80302@egenix.com> <827956C6-5CEF-4060-83C4-EB34F1A3DD31@langa.pl> <51B674EE.3020103@pearwood.info> <874nd58l8p.fsf@uwakimon.sk.tsukuba.ac.jp> <51B93953.6020602@pearwood.info> Message-ID: <51B9AD53.8000100@egenix.com> On 13.06.2013 13:18, Nick Coghlan wrote: > I think this whole thread has gone pretty far afield, though. 
The > original question was whether 'int("-1") == int("-\{MINUS SIGN}1")' > should hold, and I agree with Lukasz and MAL that it should. I'm only > +0 on the other characters MAL mentioned, though (FULLWIDTH PLUS SIGN, > SUPERSCRIPT PLUS SIGN, SUPERSCRIPT MINUS). FYI: The discussion of the original request is now continuing on this ticket: http://bugs.python.org/issue10581 I like Alexander's idea with using normalization. We'd only have to solve the potential performance issue for the much more common ASCII case. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 13 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 18 days to go 2013-07-16: Python Meeting Duesseldorf ... 33 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 13:30:40 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 11:30:40 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Wolfgang Maier writes: > > Dear all, > currently - and referring to Python 3 - the write methods of the different > io module objects work on bytes and str objects only. The built-in functions > print() and bytes(), on the other hand, use an arbitrary object's __str__ > and __bytes__ methods to compute the str and bytes they should work with. 
> Wouldn't it be more consistent and pythonic if the io write methods behaved > the same way? > Best, > Wolfgang > Actually, I just played around with the .__bytes__ method and the bytes() built-in function a bit and ran into a very strange behavior, which I would call a bug, but maybe there is a reason for it? When you define your own class inheriting from object and provide it with a __bytes__ method, everything works fine, i.e. bytes() uses the method to get the bytestring representation. However, if you decide to inherit from str or int, then bytes() completely ignores the __bytes__ method and sticks to the superclass behavior instead, i.e. requiring an encoding for str and creating a bytestring of the length of an int. Example:

class byteablestr(str):
    def __bytes__(self):
        return self.encode()

s = byteablestr('abcd')
bytes(s)

Traceback (most recent call last):
  File "C:/Python33/bug.py", line 6, in <module>
    bytes(s)
TypeError: string argument without an encoding

I find it very strange that you cannot override the superclass behavior, when you can override e.g. the __str__ method in any subclass of str. Best, Wolfgang From ncoghlan at gmail.com Thu Jun 13 13:39:05 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 21:39:05 +1000 Subject: [Python-ideas] Suprocess functionality partitioning In-Reply-To: <721395E1-DD98-4B76-AC69-1E0077E1E0FD@yahoo.com> References: <2083BD50-EFD6-483D-8363-FC2737A9F0BD@mac.com> <721395E1-DD98-4B76-AC69-1E0077E1E0FD@yahoo.com> Message-ID: On 13 June 2013 18:26, Andrew Barnert wrote: > A proposal that took the best from all of those, and had a rationale that > included a survey and explained what was wrong with all of the existing > alternatives, instead of just how subprocess is too complicated, would be a > lot more compelling, at least to me. 
Julia's command invocation support is gorgeous (http://docs.julialang.org/en/latest/manual/running-external-programs/) and the primary inspiration for http://shell-command.readthedocs.org/en/latest/ (suitably adjusted to handle the lack of syntactic support). Wrapping around an *actual* shell the way shell-command does is dangerous, though - better to do something that avoids depending on an external shell program in the first place. > In fact, I wonder if having links to those libraries in the subprocess docs > (together with the upcoming packaging improvements) would remove much of the > recurring demand for adding new subprocess wrappers to the stdlib. I don't > know if there's a policy on that, but it does seem to be pretty rare (and > mainly used for links to implementations either as code examples or for > backporting). So maybe it's not considered appropriate? Generally not considered appropriate, except for modules that are actually backports of stdlib functionality. It's mostly a duty of care issue (once we link to an external resource, then it carries a certain promise that it's up to the standards of stdlib inclusion), but also a "we don't want to play favourites" issue. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 13 14:24:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 22:24:47 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: On 13 June 2013 21:30, Wolfgang Maier wrote: > I find it very strange that you cannot override the superclass behavior, > when you can override e.g. the __str__ method in any subclass of str. If your type is acceptable input to operator.index(), you'll get the "initialised array of bytes" behaviour, and the custom str handling is triggered off a simple isinstance() check. It's part of the bytes/bytearray constructor definitions. Cheers, Nick. 
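Parts of that dispatch can be observed directly; a minimal sketch reproducing Wolfgang's report (behaviour verified on CPython 3; the exact ordering of the int special case relative to __bytes__ has shifted across 3.x versions, so only the str case is shown):

```python
# bytes() rejects any str instance without an encoding before it ever
# consults __bytes__, so a str subclass cannot override the behaviour.
assert bytes(4) == b"\x00\x00\x00\x00"   # int: zero-initialised buffer

class ByteableStr(str):
    def __bytes__(self):
        return self.encode()

try:
    bytes(ByteableStr("abcd"))           # isinstance(str) check wins...
except TypeError:
    pass                                 # "string argument without an encoding"

class Byteable:                          # ...but a plain object's __bytes__
    def __bytes__(self):                 # is honoured
        return b"abcd"

assert bytes(Byteable()) == b"abcd"
```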
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 14:46:15 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 12:46:15 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Nick Coghlan writes: > > On 13 June 2013 21:30, Wolfgang Maier > wrote: > > I find it very strange that you cannot override the superclass behavior, > > when you can override e.g. the __str__ method in any subclass of str. > > If your type is acceptable input to operator.index(), you'll get the > "initialised array of bytes" behaviour, and the custom str handling is > triggered off a simple isinstance() check. It's part of the > bytes/bytearray constructor definitions. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at ... | Brisbane, Australia > ok, that's a nice technical explanation, but is that reasonable behavior? Shouldn't Python check first to see if the object defines __bytes__() before defaulting to the other options? Besides, you don't have that problem with iterables (only tested inheritance from list). A __bytes__ method seems to take priority over the default behaviour, which is to expect the iterable to yield integers that are then used to initialize an array of bytes. Wolfgang From ncoghlan at gmail.com Thu Jun 13 15:04:36 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 23:04:36 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: On 13 June 2013 22:46, Wolfgang Maier wrote: > ok, that's a nice technical explanation, but is that reasonable behavior? > Shouldn't Python check first to see if the object defines __bytes__() before > defaulting to the other options? It doesn't matter whether it's reasonable at this point, it can't be changed without breaking backwards compatibility. Those things are special cased, and it's not worth the hassle of changing it. 
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 15:09:30 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 13:09:30 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Nick Coghlan writes: > > On 13 June 2013 22:46, Wolfgang Maier > wrote: > > ok, that's a nice technical explanation, but is that reasonable behavior? > > Shouldn't Python check first to see if the object defines __bytes__() before > > defaulting to the other options? > > It doesn't matter whether it's reasonable at this point, it can't be > changed without breaking backwards compatibility. Those things are > special cased, and it's not worth the hassle of changing it. > > Cheers, > Nick. > I'm confused. Why would it break backwards compatibility, when the default behaviour is the same? As far as I can see, it would only affect code that defines a __bytes__ method for subclasses of str or int, but then relies on them never actually getting called ?? Best, Wolfgang From ncoghlan at gmail.com Thu Jun 13 15:23:38 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 13 Jun 2013 23:23:38 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: On 13 June 2013 23:09, Wolfgang Maier wrote: > Nick Coghlan writes: > >> >> On 13 June 2013 22:46, Wolfgang Maier >> wrote: >> > ok, that's a nice technical explanation, but is that reasonable behavior? >> > Shouldn't Python check first to see if the object defines __bytes__() before >> > defaulting to the other options? >> >> It doesn't matter whether it's reasonable at this point, it can't be >> changed without breaking backwards compatibility. Those things are >> special cased, and it's not worth the hassle of changing it. >> >> Cheers, >> Nick. >> > > I'm confused. 
Why would it break backwards compatibility, when the default > behaviour is the same? As far as I can see, it would only affect code that > defines a __bytes__ method for subclasses of str or int, but then relies on > them never actually getting called ?? A good point. That reduces it to arguing for slowing down the common cases to check for an incredibly niche case, which is still a hard sell (just not as hard a sell as breaking backwards compatibility). Inheriting from builtins is generally a bad idea, and this kind of quirky result is one of the main reasons why. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 15:46:52 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 13:46:52 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Nick Coghlan writes: > > A good point. That reduces it to arguing for slowing down the common > cases to check for an incredibly niche case, which is still a hard > sell (just not as hard a sell as breaking backwards compatibility). > > Inheriting from builtins is generally a bad idea, and this kind of > quirky result is one of the main reasons why. > > Cheers, > Nick. ok, I get the point, and you're probably right. But since I like consistency: why are things different with built-in iterables then? From oscar.j.benjamin at gmail.com Thu Jun 13 15:57:01 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 13 Jun 2013 14:57:01 +0100 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: On 13 June 2013 13:24, Nick Coghlan wrote: > If your type is acceptable input to operator.index(), you'll get the > "initialised array of bytes" behaviour I only recently discovered this. What was the rationale for that change? 
$ py -2.7 -c 'print(repr(bytes(4)))' '4' $ py -3.3 -c 'print(repr(bytes(4)))' b'\x00\x00\x00\x00' I can't really see why anyone would want the latter behaviour (when you can already do b'\x00' * 4). Oscar From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 16:13:16 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 14:13:16 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Oscar Benjamin writes: > > On 13 June 2013 13:24, Nick Coghlan wrote: > > If your type is acceptable input to operator.index(), you'll get the > > "initialised array of bytes" behaviour > > I only recently discovered this. What was the rationale for that change? > > $ py -2.7 -c 'print(repr(bytes(4)))' > '4' > > $ py -3.3 -c 'print(repr(bytes(4)))' > b'\x00\x00\x00\x00' > > I can't really see why anyone would want the latter behaviour (when > you can already do b'\x00' * 4). > > Oscar > It's funny you mention that difference since that was how I came across my issue. I was looking for a way to get back the Python 2.7 behaviour bytes('1234') '1234' in Python3. The __bytes__ method does not offer an easy solution for this though. I could only think of str(1234).encode(), which feels ridiculous. Any better ways? Wolfgang From ncoghlan at gmail.com Thu Jun 13 16:34:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 14 Jun 2013 00:34:43 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: On 14 June 2013 00:13, Wolfgang Maier wrote: > Oscar Benjamin writes: > >> >> On 13 June 2013 13:24, Nick Coghlan wrote: >> > If your type is acceptable input to operator.index(), you'll get the >> > "initialised array of bytes" behaviour >> >> I only recently discovered this. What was the rationale for that change? 
>> >> $ py -2.7 -c 'print(repr(bytes(4)))' >> '4' >> >> $ py -3.3 -c 'print(repr(bytes(4)))' >> b'\x00\x00\x00\x00' >> >> I can't really see why anyone would want the latter behaviour (when >> you can already do b'\x00' * 4). >> >> Oscar >> > > It's funny you mention that difference since that was how I came across my > issue. I was looking for a way to get back the Python 2.7 behaviour > bytes('1234') > '1234' You mean other than using the bytes literal b'1234' instead of a string literal? Bytes and text are different things in Python 3, whereas the 2.x "bytes" was just an alias for "str". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wolfgang.maier at biologie.uni-freiburg.de Thu Jun 13 16:41:34 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 13 Jun 2013 14:41:34 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Nick Coghlan writes: > > > Oscar Benjamin ...> writes: > >> > >> On 13 June 2013 13:24, Nick Coghlan ...> wrote: > >> > If your type is acceptable input to operator.index(), you'll get the > >> > "initialised array of bytes" behaviour > >> > >> I only recently discovered this. What was the rationale for that change? > >> > >> $ py -2.7 -c 'print(repr(bytes(4)))' > >> '4' > >> > >> $ py -3.3 -c 'print(repr(bytes(4)))' > >> b'\x00\x00\x00\x00' > >> > >> I can't really see why anyone would want the latter behaviour (when > >> you can already do b'\x00' * 4). > >> > >> Oscar > >> > > > > It's funny you mention that difference since that was how I came across my > > issue. I was looking for a way to get back the Python 2.7 behaviour > > bytes('1234') > > '1234' > > You mean other than using the bytes literal b'1234' instead of a > string literal? Bytes and text are different things in Python 3, > whereas the 2.x "bytes" was just an alias for "str". 
> Well, I was illustrating the case with a literal integer, but, of course, I was thinking of cases with references: a=1234 str(a).encode() # gives b'1234' in Python3, but converting your int to str first, just to encode it again to bytes seems weird Best, Wolfgang From ezio.melotti at gmail.com Fri Jun 14 01:37:39 2013 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Fri, 14 Jun 2013 02:37:39 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: Hi, On Tue, Jun 11, 2013 at 5:49 PM, Serhiy Storchaka wrote: > I propose to add "htmlcharrefreplace" error handler which is similar to > "xmlcharrefreplace" error handler but use html entity names if possible. > >>>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace') > b'&#8704; x&#8712;&#8476;' >>>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace') > b'&forall; x&isin;&real;' > Do you have any use cases for this, or is it just for completeness since we already have xmlcharrefreplace? IMHO character references (named or numerical) should never be used in HTML (with the exception of &quot;, &gt; and &lt;). They exist mainly for three reasons: 1) provide a way to include characters that are not available in the used encoding (e.g. if you are using an obsolete encoding like windows-1252 but still want to use "fancy" characters); 2) to keep the HTML source ASCII-only; 3) to specify a character by name if it's not possible to enter it directly (e.g. 
you don't know the key combinations); 1) is not a problem if you are using the UTF encodings, and if you aren't (and you have unencodable chars) you are doing it wrong; 2) might still be valid for some situations, but in 2014 I would expect software to deal decently with non-ASCII text; 3) is not a concern for this case, since we already have the character we want and we aren't entering them manually; I would therefore prefer to leave this to specific functions in the html package, rather than adding a new error handler, so I'm -0.5 on this (I would be -1 if it wasn't for the fact that if we want this to work with any encoding, an error handler is indeed the simpler solution). I also want to avoid the situation where users don't know what they are doing and start putting entities everywhere just to be "safe" (since this will offer a convenient way to do it), and they might also stick with obsolete encodings just because they can use this "workaround". Best Regards, Ezio Melotti

> Possible implementation:
>
> import codecs
> from html.entities import codepoint2name
>
> def htmlcharrefreplace_errors(exc):
>     if not isinstance(exc, UnicodeEncodeError):
>         raise exc
>     try:
>         replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])]
>     except KeyError:
>         return codecs.xmlcharrefreplace_errors(exc)
>     return replace, exc.start + 1
>
> codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)
>
> Even if we do not register this handler from the start, it may be worth to
> provide htmlcharrefreplace_errors() in the html or html.entities module.
>
From steve at pearwood.info Fri Jun 14 02:08:44 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Jun 2013 10:08:44 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: <51BA5F0C.1040909@pearwood.info> On 14/06/13 00:41, Wolfgang Maier wrote: >>> It's funny you mention that difference since that was how I came across my >>> issue. 
I was looking for a way to get back the Python 2.7 behaviour >>> bytes('1234') >>> '1234' >> >> You mean other than using the bytes literal b'1234' instead of a >> string literal? Bytes and text are different things in Python 3, >> whereas the 2.x "bytes" was just an alias for "str". >> > > Well, I was illustrating the case with a literal integer, but, of course, I > was thinking of cases with references: > a=1234 > str(a).encode() # gives b'1234' in Python3, but converting your int to str > first, just to encode it again to bytes seems weird On the contrary, it is the most natural way to do it. Converting objects directly to bytes is not conceptually obvious. I can think of at least TWELVE obvious ways in which the int 4 might convert to bytes (displaying in all hex, rather than the more compact but less consistent forms):

# Treat it as an 8-bit, 16-bit, 32-bit or 64-bit integer:
b'\x04'
b'\x00\x04'
b'\x04\x00'
b'\x00\x00\x00\x04'
b'\x04\x00\x00\x00'
b'\x00\x00\x00\x00\x00\x00\x00\x04'
b'\x04\x00\x00\x00\x00\x00\x00\x00'

# Convert it to the string '4' first, then encode to bytes
# as UTF-8, UTF-16, or UTF-32:
b'\x34'
b'\x00\x34'
b'\x34\x00'
b'\x34\x00\x00\x00'
b'\x00\x00\x00\x34'

The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider to be neither obvious nor especially useful. If bytes were mutable, then bytes(4) would be a useful way to initialise a block of four bytes for later modification. But they aren't, so I don't really see the point. The obvious way to get four NUL bytes is surely b'\0'*4, so it's also redundant. That you can't even subclass int and override it, like you can override every other dunder method (__str__, __repr__, __add__, __mul__, etc.), strikes me as astonishingly weird and in violation of the Zen: Special cases aren't special enough to break the rules. 
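A few of those candidate conversions, spelled out with the standard struct module (a sketch; nothing here is this thread's proposal, just the existing tools):

```python
import struct

n = 4
# Fixed-width integer encodings (big- and little-endian):
assert struct.pack(">i", n) == b"\x00\x00\x00\x04"                  # 32-bit BE
assert struct.pack("<i", n) == b"\x04\x00\x00\x00"                  # 32-bit LE
assert struct.pack(">q", n) == b"\x00\x00\x00\x00\x00\x00\x00\x04"  # 64-bit BE

# Via the string '4', then a text encoding:
assert str(n).encode("utf-8") == b"\x34"
assert str(n).encode("utf-16-be") == b"\x00\x34"

# And what bytes() actually does is none of the above:
assert bytes(n) == b"\x00\x00\x00\x00"
```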
I imagine that the code for the bytes builtin looks something like this in pseudo-code: if isinstance(arg, int): special case int elif isinstance(arg, str): special case str else: call __bytes__ method I don't think it would affect performance very much, if at all, if it were changed to: if type(arg) is int: special case int elif type(arg) is str: special case str else: call __bytes__ method ints and strs will have to grow a dunder method in order to support inheritance, but the implementation could be as simple as: def __bytes__(self): return bytes(int(self)) def __bytes__(self, encoding): return bytes(str(self), encoding) Of course, I may have missed some logic for the current behaviour. -- Steven From abarnert at yahoo.com Fri Jun 14 02:14:59 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 13 Jun 2013 17:14:59 -0700 (PDT) Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: <1371168899.36922.YahooMailNeo@web184704.mail.ne1.yahoo.com> From: Wolfgang Maier Sent: Thursday, June 13, 2013 7:41 AM > Nick Coghlan writes: > >> >> > Oscar Benjamin ...> writes: >> >> >> >> On 13 June 2013 13:24, Nick Coghlan > ...> wrote: >> >> > If your type is acceptable input to operator.index(), > you'll get the >> >> > "initialised array of bytes" behaviour >> >> >> >> I only recently discovered this. What was the rationale for that > change? >> >> >> >> $ py -2.7 -c 'print(repr(bytes(4)))' >> >> '4' >> >> >> >> $ py -3.3 -c 'print(repr(bytes(4)))' >> >> b'\x00\x00\x00\x00' >> >> >> >> I can't really see why anyone would want the latter behaviour > (when >> >> you can already do b'\x00' * 4). >> >> >> >> Oscar >> >> >> > >> > It's funny you mention that difference since that was how I came > across my >> > issue. I was looking for a way to get back the Python 2.7 behaviour >> > bytes('1234') >> > '1234' >> >> You mean other than using the bytes literal b'1234' instead of a >> string literal? 
Bytes and text are different things in Python 3, >> whereas the 2.x "bytes" was just an alias for "str". >> > > Well, I was illustrating the case with a literal integer, but, of course, I > was thinking of cases with references: > a=1234 > str(a).encode() # gives b'1234' in Python3, but converting your int to > str > first, just to encode it again to bytes seems weird Conceptually, it makes perfect sense. b'1234' isn't a string with the canonical numeral representation of 1234, it's a sequence of bytes, which happens to be a particular (unspecified) encoding of a string with the canonical numeral representation of 1234. The docs (http://docs.python.org/3.3/library/functions.html#bytes) explicitly say a bytes object: > is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray Practically, you often want to use bytes as "ASCII strings", and you often can get away with it. It works for literals, some but not all methods, and of course everything that strings inherit from sequences (concatenation, slicing, etc.). But often you can't get away with it. It doesn't work for formatting, anything strings do differently from sequences (notably indexing), some methods, most functions that special-case on strings, type-checking (there's no basestring in 3.x), etc. Likewise, the bytes() constructor doesn't work quite like str(), and there's no bytes equivalent of repr(). Obviously, there's a tradeoff behind all of those decisions. It wouldn't have been hard to put bytes.__mod__, bytes.format, basestring, etc. into Python 3, or to make b'a'[0] return b'a' instead of 97, or to make bytes(x) work more like str(x), or to add a brepr or similar function, etc. But it would make bytes less useful as a sequence of 8-bit integers. 
And, more importantly, it would be an attractive nuisance, making a lot of common errors more common (as they were in 2.x). As the docs (http://docs.python.org/3.3/library/stdtypes.html#bytes) put it: > This is done deliberately to emphasise that while many binary formats include ASCII based elements and can be usefully manipulated with some text-oriented algorithms, this is not generally the case for arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not ASCII compatible will usually lead to data corruption). Anyway, why do you actually want a bytes here? Maybe there's a better design for what you're trying to do that would make this whole issue irrelevant to your code. From greg.ewing at canterbury.ac.nz Fri Jun 14 02:30:31 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Jun 2013 12:30:31 +1200 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: <51BA6427.1090202@canterbury.ac.nz> Wolfgang Maier wrote: > the write methods of the different > io module objects work on bytes and str objects only. The built-in functions > print() and bytes(), on the other hand, use an arbitrary object's __str__ > and __bytes__ methods to compute the str and bytes they should work with. > Wouldn't it be more consistent and pythonic if the io write methods behaved > the same way? There's a difference: print() and bytes() are just single functions, but write() is an interface implemented by many objects. Requiring write() to apply __str__ or __bytes__ would place a burden on all implementations of I/O objects. 
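[Editor's note: Greg's point can be made concrete — instead of changing every write() implementation, the coercion can live in one caller-side helper. A hypothetical sketch: the name write_any and the utf-8 choice for binary streams are my assumptions, not anything proposed in the thread:]

```python
import io

def write_any(stream, obj):
    """Coerce obj roughly the way print()/bytes() would, then delegate to write()."""
    if isinstance(stream, io.TextIOBase):
        # Text streams accept str; fall back to str(obj), as print() does.
        return stream.write(obj if isinstance(obj, str) else str(obj))
    if isinstance(obj, (bytes, bytearray, memoryview)):
        # Binary streams accept bytes-like objects unchanged.
        return stream.write(obj)
    # Assumed encoding for the bytes rendering -- utf-8 is a choice, not a rule.
    return stream.write(str(obj).encode('utf-8'))

text = io.StringIO()
write_any(text, 1234)       # text.getvalue() becomes '1234'
binary = io.BytesIO()
write_any(binary, 1234)     # binary.getvalue() becomes b'1234'
```

[This leaves the io classes untouched, which is exactly why Greg prefers not to push the conversion into the write() interface itself.]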
-- Greg From ncoghlan at gmail.com Fri Jun 14 07:44:14 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 14 Jun 2013 15:44:14 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: <51BA5F0C.1040909@pearwood.info> References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 14 June 2013 10:08, Steven D'Aprano wrote: > The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider to > be neither obvious nor especially useful. If bytes were mutable, then > bytes(4) would be a useful way to initialise a block of four bytes for later > modification. But they aren't, so I don't really see the point. The obvious > way to get four NUL bytes is surely b'\0'*4, so it's also redundant. My (vague) recollection is that the intended use case was for bytearray (i.e. exactly the "initialize a section of memory for subsequent modification" use case you mention), and the bytes constructor just supports it for consistency reasons. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From amauryfa at gmail.com Fri Jun 14 08:14:31 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 14 Jun 2013 08:14:31 +0200 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: 2013/6/14 Nick Coghlan > On 14 June 2013 10:08, Steven D'Aprano wrote: > > The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider > to > > be neither obvious nor especially useful. If bytes were mutable, then > > bytes(4) would be a useful way to initialise a block of four bytes for > later > > modification. But they aren't, so I don't really see the point. The > obvious > > way to get four NUL bytes is surely b'\0'*4, so it's also redundant. > > My (vague) recollection is that the intended use case was for > bytearray (i.e. 
exactly the "initialize a section of memory for > subsequent modification" use case you mention), and the bytes > constructor just supports it for consistency reasons. In early 3.0, bytes were mutable. So I guess this bytes(4) behavior was just forgotten. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jun 14 08:52:39 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 14 Jun 2013 16:52:39 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 14 June 2013 16:14, Amaury Forgeot d'Arc wrote: > > 2013/6/14 Nick Coghlan >> >> On 14 June 2013 10:08, Steven D'Aprano wrote: >> > The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider >> > to >> > be neither obvious nor especially useful. If bytes were mutable, then >> > bytes(4) would be a useful way to initialise a block of four bytes for >> > later >> > modification. But they aren't, so I don't really see the point. The >> > obvious >> > way to get four NUL bytes is surely b'\0'*4, so it's also redundant. >> >> My (vague) recollection is that the intended use case was for >> bytearray (i.e. exactly the "initialize a section of memory for >> subsequent modification" use case you mention), and the bytes >> constructor just supports it for consistency reasons. > > > In early 3.0, bytes were mutable. So I guess this bytes(4) behavior was just > forgotten. Ah, I had forgotten that part of the story. Yeah, you're probably right - it's quite likely that when it was split into bytes (immutable) and bytearray (mutable), we simply didn't give the idea of changing either constructor signature any thought. Cheers, Nick. P.S. Trawling the python-3000 list archives would probably reveal the exact reasoning, but this is a plausible version of history. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Fri Jun 14 09:44:09 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 14 Jun 2013 09:44:09 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: <51BAC9C9.2070100@egenix.com> On 14.06.2013 01:37, Ezio Melotti wrote: > Hi, > > On Tue, Jun 11, 2013 at 5:49 PM, Serhiy Storchaka wrote: >> I propose to add "htmlcharrefreplace" error handler which is similar to >> "xmlcharrefreplace" error handler but use html entity names if possible. >> >>>>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace') >> b'&#8704; x&#8712;&#8476;' >>>>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace') >> b'&forall; x&isin;&real;' >> > > Do you have any use cases for this, or is it just for completeness > since we already have xmlcharrefreplace? The purpose is the same, but in a different, also very common context. As for use cases, you already pointed out quite a few below and I'm adding a few more. > IMHO character references (named or numerical) should never be used in > HTML (with the exception of " > and <). > They exist mainly for three reasons: > 1) provide a way to include characters that are not available in the > used encoding (e.g. if you are using an obsolete encoding like > windows-1252 but still want to use "fancy" characters); > 2) to keep the HTML source ASCII-only; This is the main reason for using them. HTML's default encoding is Latin-1, unlike XML. > 3) to specify a character by name if it's not possible to enter it > directly (e.g. you don't know the keys combinations); They exist for the same reason you have named Unicode characters: to make it obvious which character you are using without having to rely on a specific encoding. Another reason to use them is that a user might not have the needed fonts to display the characters in question. And in some cases, you also need to use the references to escape certain characters from being interpreted using their HTML meaning, e.g. 
& and the ones you've given above. But that's not the use case for the error handler. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 17 days to go 2013-07-16: Python Meeting Duesseldorf ... 32 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From wolfgang.maier at biologie.uni-freiburg.de Fri Jun 14 09:54:28 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 14 Jun 2013 07:54:28 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA6427.1090202@canterbury.ac.nz> Message-ID: Greg Ewing writes: > > Wolfgang Maier wrote: > > the write methods of the different > > io module objects work on bytes and str objects only. The built-in functions > > print() and bytes(), on the other hand, use an arbitrary object's __str__ > > and __bytes__ methods to compute the str and bytes they should work with. > > Wouldn't it be more consistent and pythonic if the io write methods behaved > > the same way? > > There's a difference: print() and bytes() are just single > functions, but write() is an interface implemented by many > objects. Requiring write() to apply __str__ or __bytes__ > would place a burden on all implementations of I/O objects. > Hi Greg, aren't the I/O objects in io inheriting from each other anyway? So changes in the appropriate base classes would be reflected in subclasses? 
Best, Wolfgang From wolfgang.maier at biologie.uni-freiburg.de Fri Jun 14 10:35:02 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 14 Jun 2013 08:35:02 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <1371168899.36922.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: Andrew Barnert writes: > > From: Wolfgang Maier > > > Well, I was illustrating the case with a literal integer, but, of course, I > > was thinking of cases with references: > > a=1234 > > str(a).encode() # gives b'1234' in Python3, but converting your int to > > str > > first, just to encode it again to bytes seems weird > > Conceptually, it makes perfect sense. b'1234' isn't a string with the canonical numeral representation > of 1234, it's a sequence of bytes, which happens to be a particular (unspecified) encoding of a > string with the canonical numeral representation of 1234. > > The docs (http://docs.python.org/3.3/library/functions.html#bytes) explicitly say a bytes object: > > > is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray > > Practically, you often want to use bytes as "ASCII strings", and you often can get away with it. It works > for literals, some but not all methods, and of course everything that strings inherit from sequences > (concatenation, slicing, etc.). > > But often you can't get away with it. It doesn't work for formatting, anything strings do differently > from sequences (notably indexing), some methods, most functions that special-case on strings, > type-checking (there's no basestring in 3.x), etc. > > Likewise, the bytes() constructor doesn't work quite like str(), and there's no bytes equivalent of repr(). > > Obviously, there's a tradeoff behind all of those decisions. It wouldn't have been hard to put > bytes.__mod__, bytes.format, basestring, etc. 
into Python 3, or to make b'a'[0] return b'a' instead of > 97, or to make bytes(x) work more like str(x), or to add a brepr or similar function, etc. But it would make > bytes less useful as a sequence of 8-bit integers. And, more importantly, it would be an attractive > nuisance, making a lot of common errors more common (as they were in 2.x). As the docs > (http://docs.python.org/3.3/library/stdtypes.html#bytes) put it: > > > This is done deliberately to emphasise that while many binary formats include ASCII based elements and > can be usefully manipulated with some text-oriented algorithms, this is not generally the case for > arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not > ASCII compatible will usually lead to data corruption). > I have to say I'm not too enthusiastic about the bytes type in Python 3. The tradeoffs you're mentioning cause bytes to be sort of a hybrid between strings and numbers trying to combine aspects of both. This makes them very different from all other types in Python and is the reason behind much confusion. Personally, I would have preferred a clear decision to make bytes a sequence of 8-bit characters *or* integers (or have two separate types bytestring and byteint). Still, the current design has been discussed among people who understand the topic much better than me, so I'm not trying to argue about it, but to arrange with the status quo. > Anyway, why do you actually want a bytes here? Maybe there's a better design for what you're trying to do that > would make this whole issue irrelevant to your code. > The actual problem here is that I'm reading bytes from a text file (it's a huge file, so I/O speed matters and working in text mode is no option). Then I'm extracting numeric values from the file that I need for calculations, so I'm converting bytes to int here. While that's fine, I then want to write the result along with other parts of the original file to a new file. 
Now the result is an integer, while the rest of the data is bytes already, so I have to convert my integer to bytes to .join it with the rest, then write it. Here's the (simplified) problem: an input line from my file: b'somelinedescriptor\t100\t500\tmorestuffhere\n' what I need is calculate the difference between the numbers (500-100), then write this to a new file: b'somelinedescriptor\t400\tmorestuffhere\n' Currently I solve this by splitting on '\t', converting elements 1 and 2 of the resulting list to int, then (in slightly abstracted code) b'\t'.join((element0, str(subtraction_result).encode(), element3)), then writing. So, in essence, I'm going through this int -> str -> bytes conversion scheme for a million lines in my file, which just doesn't feel right. What's missing is a direct way for int -> bytes. Any suggestions are welcome. Best, Wolfgang From solipsis at pitrou.net Fri Jun 14 10:49:31 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 10:49:31 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> Message-ID: <20130614104931.23a917b5@fsol> On Fri, 14 Jun 2013 09:44:09 +0200 "M.-A. Lemburg" wrote: > > > IMHO character references (named or numerical) should never be used in > > HTML (with the exception of " > and <). > > They exist mainly for three reasons: > > 1) provide a way to include characters that are not available in the > > used encoding (e.g. if you are using an obsolete encoding like > > windows-1252 but still want to use "fancy" characters); > > 2) to keep the HTML source ASCII-only; > > This is the main reason for using them. HTML's default encoding > is Latin-1, unlike XML. I'd like to know which good reasons there are to not use utf-8 for HTML pages in 2013. "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't warrant special support in Python's codec error handlers. Regards Antoine. 
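[Editor's note: Wolfgang's line-rewriting pipeline above can be sketched end to end. The field positions come from his example line; the explicit 'ascii' argument to encode() is my addition:]

```python
line = b'somelinedescriptor\t100\t500\tmorestuffhere\n'

fields = line.rstrip(b'\n').split(b'\t')
# int() accepts ASCII digits in bytes directly, so no decode is needed here.
diff = int(fields[2]) - int(fields[1])
# The int -> str -> bytes round-trip he describes:
out = b'\t'.join((fields[0], str(diff).encode('ascii'), fields[3])) + b'\n'
# out is now b'somelinedescriptor\t400\tmorestuffhere\n'
```

[A later addition to the language, PEP 461 in Python 3.5, provides bytes interpolation — b'%d' % diff — for exactly this middle step.]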
From solipsis at pitrou.net Fri Jun 14 10:53:07 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 10:53:07 +0200 Subject: [Python-ideas] duck typing for io write methods References: Message-ID: <20130614105307.680866a0@fsol> On Thu, 13 Jun 2013 08:50:21 +0000 (UTC) Wolfgang Maier wrote: > Dear all, > currently - and referring to Python 3 - the write methods of the different > io module objects work on bytes and str objects only. The built-in functions > print() and bytes(), on the other hand, use an arbitrary object's __str__ > and __bytes__ methods to compute the str and bytes they should work with. > Wouldn't it be more consistent and pythonic if the io write methods behaved > the same way? No, it wouldn't. print() is meant to print *any* object's string representation, write() is meant to write a str-like or bytes-like object to an output stream. It isn't the same thing and therefore there is no "consistency" to look after here. Having write() convert implicitly all arguments to bytes or str would not be "Pythonic", it would be PHP-ic. Regards Antoine. From wolfgang.maier at biologie.uni-freiburg.de Fri Jun 14 11:00:16 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 14 Jun 2013 09:00:16 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA5F0C.1040909@pearwood.info> Message-ID: Steven D'Aprano writes: > > On 14/06/13 00:41, Wolfgang Maier wrote: > > >>> It's funny you mention that difference since that was how I came across my > >>> issue. I was looking for a way to get back the Python 2.7 behaviour > >>> bytes('1234') > >>> '1234' > >> > >> You mean other than using the bytes literal b'1234' instead of a > >> string literal? Bytes and text are different things in Python 3, > >> whereas the 2.x "bytes" was just an alias for "str". 
> >> > > > > Well, I was illustrating the case with a literal integer, but, of course, I > > was thinking of cases with references: > > a=1234 > > str(a).encode() # gives b'1234' in Python3, but converting your int to str > > first, just to encode it again to bytes seems weird > > On the contrary, it is the most natural way to do it. Converting objects directly to bytes is not > conceptually obvious. I can think of at least TWELVE obvious ways which the int 4 might convert to bytes > (displaying in all hex, rather than the more compact but less consistent forms): > > # Treat it as a 8-bit, 16-bit, 32-bit or 64-bit integer: > b'\x04' this is what's currently happening with: bytes([4]) > b'\x00\x04' > b'\x04\x00' > b'\x00\x00\x00\x04' > b'\x04\x00\x00\x00' > b'\x00\x00\x00\x00\x00\x00\x00\x04' > b'\x04\x00\x00\x00\x00\x00\x00\x00' > these would be ways to make bytes([seq of ints]) work with numbers > 255, which is currently not possible. Maybe bytes([seq of ints]) could take an additional encoding argument that specifies how many bytes to reserve per int. > # Convert it to the string '4' first, then encode to bytes > # as UTF-8, UTF-16, or UTF-32: > b'\x34' > b'\x00\x34' > b'\x34\x00' > b'\x34\x00\x00\x00' > b'\x00\x00\x00\x34' > this is what str(int).encode() does, but is quite complicated, since it actually generates a full-blown Python string object first, then encodes this to bytes again. What should be done, I think, is that a int_to_byte() function or method converts each digit of an int to its ascii code and turns this into bytes. Of course, this would be done in C, so the only high-level object ever generated would be the final bytes object. > The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider to be neither obvious nor > especially useful. If bytes were mutable, then bytes(4) would be a useful way to initialise a block of four > bytes for later modification. But they aren't, so I don't really see the point. 
The obvious way to get four > NUL bytes is surely b'\0'*4, so it's also redundant. > > That you can't even subclass int and override it, like you can override every other dunder method (__str__, > __repr__, __add__, __mul__, etc.) strikes me as astonishingly weird and in violation of the Zen: > > Special cases aren't special enough to break the rules. > > I imagine that the code for the bytes builtin looks something like this in pseudo-code: > > if isinstance(arg, int): > special case int > elif isinstance(arg, str): > special case str > else: > call __bytes__ method > > I don't think it would effect performance very much, if at all, if it were changed to: > > if type(arg) is int: > special case int > elif type(arg) is str: > special case str > else: > call __bytes__ method > > ints and strs will have to grow a dunder method in order to support inheritance, but the implication could be > as simple as: > > def __bytes__(self): > return bytes(int(self)) > > def __bytes__(self, encoding): > return bytes(str(self), encoding) > > Of course, I may have missed some logic for the current behaviour. > I find the current implementation very disturbing, too, and would very much favour a solution like yours. Nick argued that it would slow down native str and bytes unduly, and I'm in no position to argue against this. He's probably thought it through more deeply than we could, but, yes, the current way is against the Zen. Wolfgang From steve at pearwood.info Fri Jun 14 11:06:55 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 14 Jun 2013 19:06:55 +1000 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614104931.23a917b5@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> Message-ID: <51BADD2F.1090505@pearwood.info> On 14/06/13 18:49, Antoine Pitrou wrote: > "Keeping the HTML source ASCII-only" is just silly IMO, Surely no sillier than "keep the Python std lib source ASCII-only". 
> and it doesn't > warrant special support in Python's codec error handlers. We're talking about this as if it were a major change. Doesn't this count as a trivial addition? The only question in my mind is, "Are the HTML char ref rules different enough from the XML rules that Python should provide both?" -- Steven From pyideas at rebertia.com Fri Jun 14 11:10:52 2013 From: pyideas at rebertia.com (Chris Rebert) Date: Fri, 14 Jun 2013 02:10:52 -0700 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <1371168899.36922.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: On Jun 14, 2013 1:42 AM, "Wolfgang Maier" < wolfgang.maier at biologie.uni-freiburg.de> wrote: > Andrew Barnert writes: > > From: Wolfgang Maier > > Anyway, why do you actually want a bytes here? Maybe there's a better > design for what you're trying to do that > > would make this whole issue irrelevant to your code. > > The actual problem here is that I'm reading bytes from a text file (it's a > huge file, so I/O speed matters and working in text mode is no option). Then > I'm extracting numeric values from the file that I need for calculations, so > I'm converting bytes to int here. While that's fine, I then want to write > the result along with other parts of the original file to a new file. Now > the result is an integer, while the rest of the data is bytes already, so I > have to convert my integer to bytes to .join it with the rest, then write it. > Here's the (simplified) problem: > an input line from my file: > b'somelinedescriptor\t100\t500\tmorestuffhere\n' > what I need is calculate the difference between the numbers (500-100), then > write this to a new file: > b'somelinedescriptor\t400\tmorestuffhere\n' > > Currently I solve this by splitting on '\t', converting elements 1 and 2 of > the resulting list to int, then (in slightly abstracted code) > b'\t'.join((element0, str(subtraction_result).encode(), element3)), then > writing. 
So, in essence, I'm going through this int -> str -> bytes > conversion scheme for a million lines in my file, which just doesn't feel > right. What's missing is a direct way for int -> bytes. Any suggestions are > welcome. http://docs.python.org/3/library/stdtypes.html#int.to_bytes If it was a snake it would have bit ya; Guido's time machine strikes again; etc... Cheers, Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Fri Jun 14 11:16:23 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 14 Jun 2013 09:16:23 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA5F0C.1040909@pearwood.info> Message-ID: Wolfgang Maier writes: > > Steven D'Aprano ...> writes: > > > b'\x00\x04' > > b'\x04\x00' > > b'\x00\x00\x00\x04' > > b'\x04\x00\x00\x00' > > b'\x00\x00\x00\x00\x00\x00\x00\x04' > > b'\x04\x00\x00\x00\x00\x00\x00\x00' > > > > these would be ways to make bytes([seq of ints]) work with numbers > 255, > which is currently not possible. Maybe bytes([seq of ints]) could take an > additional encoding argument that specifies how many bytes to reserve per int. > Chris Rebert just pointed out the new Python 3.2 int methods .to_bytes and .from_bytes, which do exactly this. Does anybody know, why .from_bytes was implemented as an int method instead of a bytes method .to_int ?? 
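[Editor's note: for reference, the Python 3.2 method pair discussed here behaves like this; the values are my own illustration:]

```python
n = 1023

# int -> bytes: the caller picks the width and byte order explicitly.
raw = n.to_bytes(2, 'big')          # b'\x03\xff'

# bytes -> int: a classmethod on int, so it is not tied to the bytes type --
# it accepts any buffer exporter, which is one reason it is not bytes.to_int().
assert int.from_bytes(raw, 'big') == 1023
assert int.from_bytes(bytearray(raw), 'big') == 1023
assert int.from_bytes(memoryview(raw), 'big') == 1023
```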
Wolfgang From solipsis at pitrou.net Fri Jun 14 11:22:45 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 11:22:45 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> Message-ID: <20130614112245.24c44ed6@fsol> On Fri, 14 Jun 2013 19:06:55 +1000 Steven D'Aprano wrote: > On 14/06/13 18:49, Antoine Pitrou wrote: > > "Keeping the HTML source ASCII-only" is just silly IMO, > > Surely no sillier than "keep the Python std lib source ASCII-only". Or than drawing stupid analogies. Do you understand the difference between source code and hypertext documents? > > and it doesn't > > warrant special support in Python's codec error handlers. > > We're talking about this as if it were a major change. Doesn't this count as a trivial addition? The only question in my mind is, "Are the HTML char ref rules different enough from the XML rules that Python should provide both?" It's not trivial, it's additional C code in an important part of the language (unicode and codecs). And I haven't seen you propose a patch (when was your last patch, by the way?). Regards Antoine. From masklinn at masklinn.net Fri Jun 14 11:25:21 2013 From: masklinn at masklinn.net (Masklinn) Date: Fri, 14 Jun 2013 11:25:21 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614104931.23a917b5@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> Message-ID: <0B8CA9A2-6241-4429-ABDB-84EB72BBDB41@masklinn.net> On 2013-06-14, at 10:49 , Antoine Pitrou wrote: > On Fri, 14 Jun 2013 09:44:09 +0200 > "M.-A. Lemburg" wrote: >> >>> IMHO character references (named or numerical) should never be used in >>> HTML (with the exception of " > and <). >>> They exist mainly for three reasons: >>> 1) provide a way to include characters that are not available in the >>> used encoding (e.g. 
if you are using an obsolete encoding like >>> windows-1252 but still want to use "fancy" characters); >>> 2) to keep the HTML source ASCII-only; >> >> This is the main reason for using them. HTML's default encoding >> is Latin-1, unlike XML. > > I'd like to know which good reasons there are to not use utf-8 for HTML > pages in 2013. > "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't > warrant special support in Python's codec error handlers. As far as I know M.A. is technically wrong, there is no such thing as a default HTML encoding (browsers have their own possibly configurable[0] defaults with "proprietary" heuristics, but no HTML spec defines any kind of default only a sequence of encoding extraction before falling back on heuristics). Most browsers tend to fall back on windows-1252 (not ASCII and not latin1, in fact they'll usually coerce explicit ascii or latin1 requests to windows-1252 internally) because that's what is often encountered (historically anyway) when no encoding is specified anywhere at all. A UTF-8 default is a stupid idea (for browsers) if it breaks more content than it makes available. [0] in Firefox's settings, Content > Fonts [Advanced] > Default Character Encoding From ncoghlan at gmail.com Fri Jun 14 11:30:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 14 Jun 2013 19:30:43 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 14 Jun 2013 19:23, "Wolfgang Maier" < wolfgang.maier at biologie.uni-freiburg.de> wrote: > > Wolfgang Maier writes: > > > > > Steven D'Aprano ...> writes: > > > > > b'\x00\x04' > > > b'\x04\x00' > > > b'\x00\x00\x00\x04' > > > b'\x04\x00\x00\x00' > > > b'\x00\x00\x00\x00\x00\x00\x00\x04' > > > b'\x04\x00\x00\x00\x00\x00\x00\x00' > > > > > > > these would be ways to make bytes([seq of ints]) work with numbers > 255, > > which is currently not possible. 
Maybe bytes([seq of ints]) could take an > > additional encoding argument that specifies how many bytes to reserve per int. > > > > Chris Rebert just pointed out the new Python 3.2 int methods .to_bytes and > .from_bytes, which do exactly this. Does anybody know, why .from_bytes was > implemented as an int method instead of a bytes method .to_int ?? Because it accepts arbitrary buffer exporters (bytearray, array.array, memoryview, mmap, ctypes, ndarray, etc), not just bytes. (You can't write buffer exporters in pure Python at this point - there's an open issue for that somewhere). Cheers, Nick. > Wolfgang > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Fri Jun 14 11:33:44 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 14 Jun 2013 09:33:44 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <1371168899.36922.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: Chris Rebert writes: > > > On Jun 14, 2013 1:42 AM, "Wolfgang Maier" wrote: > > Andrew Barnert ...> writes: > > > From: Wolfgang Maier ...> > > > > Anyway, why do you actually want a bytes here? Maybe there's a better > > design for what you're trying to do that > > > would make this whole issue irrelevant to your code. > > > > The actual problem here is that I'm reading bytes from a text file (it's a > > huge file, so I/O speed matters and working in text mode is no option). Then > > I'm extracting numeric values from the file that I need for calculations, so > > I'm converting bytes to int here. While that's fine, I then want to write > > the result along with other parts of the original file to a new file. 
Now > > the result is an integer, while the rest of the data is bytes already, so I > > have to convert my integer to bytes to .join it with the rest, then write it. > > Here's the (simplified) problem: > > an input line from my file: > > b'somelinedescriptor\t100\t500\tmorestuffhere\n' > > what I need is calculate the difference between the numbers (500-100), then > > write this to a new file: > > b'somelinedescriptor\t400\tmorestuffhere\n' > > > > Currently I solve this by splitting on '\t', converting elements 1 and 2 of > > the resulting list to int, then (in slightly abstracted code) > > b'\t'.join((element0, str(subtraction_result).encode(), element3)), then > > writing. So, in essence, I'm going through this int -> str -> bytes > > conversion scheme for a million lines in my file, which just doesn't feel > > right. What's missing is a direct way for int -> bytes. Any suggestions are > > welcome. > http://docs.python.org/3/library/stdtypes.html#int.to_bytes > If it was a snake it would have bit ya; Guido's time machine strikes again; etc... > Cheers, > Chris > Hi Chris, this isn't exactly what I'm looking for. The .to_bytes does a 'numeric' conversion, but what I'm looking for is a 'character' conversion (I'm dealing with a text file), essentially converting every digit of the int to its ascii code, then turning these into a bytes sequence. Still, I hadn't come across int.to_bytes and int.from_bytes yet, and they might be very useful at some point, so thanks a lot for this suggestion. Best, Wolfgang From mal at egenix.com Fri Jun 14 11:36:50 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 14 Jun 2013 11:36:50 +0200 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <1371168899.36922.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: <51BAE432.6030308@egenix.com> On 14.06.2013 10:35, Wolfgang Maier wrote: > The actual problem here is that I'm reading bytes from a text file (it's a > huge file, so I/O speed matters and working in text mode is no option). Then > I'm extracting numeric values from the file that I need for calculations, so > I'm converting bytes to int here. While that's fine, I then want to write > the result along with other parts of the original file to a new file. Now > the result is an integer, while the rest of the data is bytes already, so I > have to convert my integer to bytes to .join it with the rest, then write it. > Here's the (simplified) problem: > an input line from my file: > b'somelinedescriptor\t100\t500\tmorestuffhere\n' > what I need is calculate the difference between the numbers (500-100), then > write this to a new file: > b'somelinedescriptor\t400\tmorestuffhere\n' > > Currently I solve this by splitting on '\t', converting elements 1 and 2 of > the resulting list to int, then (in slightly abstracted code) > b'\t'.join((element0, str(subtraction_result).encode(), element3)), then > writing. So, in essence, I'm going through this int -> str -> bytes > conversion scheme for a million lines in my file, which just doesn't feel > right. What's missing is a direct way for int -> bytes. Any suggestions are > welcome. I think you'd be better off, reading the data as text based on the encoding used in the data, using int() for parsing and the math, creating text again for the output and then writing everything back using the same encoding you used for the import, i.e. data -> text -> math -> text -> data -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2013) >>> Python Projects, Consulting and Support ... 
http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 17 days to go 2013-07-16: Python Meeting Duesseldorf ... 32 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Jun 14 11:38:46 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 14 Jun 2013 11:38:46 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614104931.23a917b5@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> Message-ID: <51BAE4A6.20507@egenix.com> On 14.06.2013 10:49, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 09:44:09 +0200 > "M.-A. Lemburg" wrote: >> >>> IMHO character references (named or numerical) should never be used in >>> HTML (with the exception of " > and <). >>> They exist mainly for three reasons: >>> 1) provide a way to include characters that are not available in the >>> used encoding (e.g. if you are using an obsolete encoding like >>> windows-1252 but still want to use "fancy" characters); >>> 2) to keep the HTML source ASCII-only; >> >> This is the main reason for using them. HTML's default encoding >> is Latin-1, unlike XML. > > I'd like to know which good reasons there are to not use utf-8 for HTML > pages in 2013. > "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't > warrant special support in Python's codec error handlers. Ezio and I gave reasons, but you've cut them away ;-) Note that error handlers can be registered in the codec registry. 
You don't need to add support for them to each and every codec, so the added code is minimal. -- Marc-Andre Lemburg From solipsis at pitrou.net Fri Jun 14 11:43:41 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 11:43:41 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> Message-ID: <20130614114341.678c57b6@fsol> On Fri, 14 Jun 2013 11:38:46 +0200 "M.-A. Lemburg" wrote: > On 14.06.2013 10:49, Antoine Pitrou wrote: > > On Fri, 14 Jun 2013 09:44:09 +0200 > > "M.-A. Lemburg" wrote: > >> > >>> IMHO character references (named or numerical) should never be used in > >>> HTML (with the exception of " > and <). > >>> They exist mainly for three reasons: > >>> 1) provide a way to include characters that are not available in the > >>> used encoding (e.g. if you are using an obsolete encoding like > >>> windows-1252 but still want to use "fancy" characters); > >>> 2) to keep the HTML source ASCII-only; > >> > >> This is the main reason for using them. HTML's default encoding > >> is Latin-1, unlike XML. 
> > > > I'd like to know which good reasons there are to not use utf-8 for HTML > > pages in 2013. > > "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't > > warrant special support in Python's codec error handlers. > > Ezio and I gave reasons, but you've cut them away ;-) Uh, no, you cut Ezio's own rebuttals to those reasons. Ezio's point still stands: named HTML character references have a use for *manual* entering of HTML text (though of course they are cumbersome), but that doesn't warrant a codec error handler which by construction is used for *automatic* generation of HTML text. Regards Antoine. From stefan at drees.name Fri Jun 14 11:37:28 2013 From: stefan at drees.name (Stefan Drees) Date: Fri, 14 Jun 2013 11:37:28 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614112245.24c44ed6@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> Message-ID: <51BAE458.8040900@drees.name> On 2013-06-14 11:22 CEST, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 19:06:55 +1000 > Steven D'Aprano ... wrote: >> On 14/06/13 18:49, Antoine Pitrou wrote: >>> "Keeping the HTML source ASCII-only" is just silly IMO, >> >> Surely no sillier than "keep the Python std lib source ASCII-only". > > the difference > between source code and hypertext documents? still in 2013, if you upload documents to at least one standardizing organization and you use utf-8 as author you are fine, as long it only uses ASCII characters ;-) Any umlaut or other typographically utf-8'd slipping in, ends up as broken latin-1 rendering. It will take many more years I presume until the chain of submitted documents and servers serving the received versions is really utf-8 safe. >>> and it doesn't >>> warrant special support in Python's codec error handlers. >> > We're talking about this as if it were a major change. Doesn't > this count as a trivial addition? 
The only question in my mind is, "Are the > HTML char ref rules different enough from the XML rules that Python > should provide both?" > > It's not trivial, it's additional C code in an important part of the > language (unicode and codecs). > > And I haven't seen you propose a patch . could we try to refrain from some b.t.w.'s \? (using trigraph-safe question mark encoding, in case some tool has trigraphs still turned on :-?) All the best, Stefan. From mal at egenix.com Fri Jun 14 11:58:44 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 14 Jun 2013 11:58:44 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <0B8CA9A2-6241-4429-ABDB-84EB72BBDB41@masklinn.net> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <0B8CA9A2-6241-4429-ABDB-84EB72BBDB41@masklinn.net> Message-ID: <51BAE954.4080300@egenix.com> On 14.06.2013 11:25, Masklinn wrote: > On 2013-06-14, at 10:49 , Antoine Pitrou wrote: >> On Fri, 14 Jun 2013 09:44:09 +0200 >> "M.-A. Lemburg" wrote: >>> >>>> IMHO character references (named or numerical) should never be used in >>>> HTML (with the exception of " > and <). >>>> They exist mainly for three reasons: >>>> 1) provide a way to include characters that are not available in the >>>> used encoding (e.g. if you are using an obsolete encoding like >>>> windows-1252 but still want to use "fancy" characters); >>>> 2) to keep the HTML source ASCII-only; >>> >>> This is the main reason for using them. HTML's default encoding >>> is Latin-1, unlike XML. >> >> I'd like to know which good reasons there are to not use utf-8 for HTML >> pages in 2013. >> "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't >> warrant special support in Python's codec error handlers. > > As far as I know M.A. 
is technically wrong, there is no such thing as > a default HTML encoding (browsers have their own possibly configurable[0] > defaults with "proprietary" heuristics, but no HTML spec defines > any kind of default only a sequence of encoding extraction before > falling back on heuristics). AFAIK, this was first defined in HTML 2.0, perhaps even earlier: http://tools.ietf.org/html/draft-ietf-html-spec-05#section-6.1 http://tools.ietf.org/html/draft-ietf-html-spec-05#section-9.5 It's still part of HTML 4.0: http://www.w3.org/TR/html401/sgml/intro.html HTTP also uses Latin-1 as default: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1 But this is getting off-topic. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 17 days to go 2013-07-16: Python Meeting Duesseldorf ... 32 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Fri Jun 14 12:07:15 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 12:07:15 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> <51BAE458.8040900@drees.name> Message-ID: <20130614120715.0b27b25d@fsol> On Fri, 14 Jun 2013 11:37:28 +0200 Stefan Drees wrote: > > We're talking about this as if it were a major change. 
Doesn't > > this count as a trivial addition? The only question in my mind is, "Are the > > HTML char ref rules different enough from the XML rules that Python > > should provide both?" > > > > It's not trivial, it's additional C code in an important part of the > > language (unicode and codecs). > > > > And I haven't seen you propose a patch . > > could we try to refrain from some b.t.w.'s \? (using trigraph-safe > question mark encoding, in case some tool has trigraphs still turned on > :-?) We could, but in this case, this was pretty much warranted. Steven suggested that a change was "trivial", so it's only fair to wonder on which grounds he can cast such a judgement (e.g. what his authority is). python-ideas may sometimes feel like a nice soapbox, but the end goal is still to have code (or docs, PEPs, etc.) to check in. People will naturally be judged, though mostly tacitly, on their contribution track record (or absence thereof). Regards Antoine. From mal at egenix.com Fri Jun 14 12:11:01 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 14 Jun 2013 12:11:01 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614114341.678c57b6@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> Message-ID: <51BAEC35.8070401@egenix.com> On 14.06.2013 11:43, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 11:38:46 +0200 > "M.-A. Lemburg" wrote: >> On 14.06.2013 10:49, Antoine Pitrou wrote: >>> On Fri, 14 Jun 2013 09:44:09 +0200 >>> "M.-A. Lemburg" wrote: >>>> >>>>> IMHO character references (named or numerical) should never be used in >>>>> HTML (with the exception of " > and <). >>>>> They exist mainly for three reasons: >>>>> 1) provide a way to include characters that are not available in the >>>>> used encoding (e.g. 
if you are using an obsolete encoding like >>>>> windows-1252 but still want to use "fancy" characters); >>>>> 2) to keep the HTML source ASCII-only; >>>> >>>> This is the main reason for using them. HTML's default encoding >>>> is Latin-1, unlike XML. >>> >>> I'd like to know which good reasons there are to not use utf-8 for HTML >>> pages in 2013. >>> "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't >>> warrant special support in Python's codec error handlers. >> >> Ezio and I gave reasons, but you've cut them away ;-) > > Uh, no, you cut Ezio's own rebuttals to those reasons. > Ezio's point still stands: named HTML character references have a use > for *manual* entering of HTML text (though of course they are > cumbersome), but that doesn't warrant a codec error handler which by > construction is used for *automatic* generation of HTML text. I'm not sure I follow. I've definitely had use cases for the proposed error handler in the past and have written my own set of tools to do such conversions. Now instead of everyone writing their own little helper, it's better to have a single implementation in the stdlib. I think you are forgetting that the output of such a codec is not necessarily always meant for sending over the wire to some browser. It may well be used for creating data which then has to be manipulated by other tools or humans. One of the reasons we keep the Python stdlib (mostly) ASCII is exactly that: to not create problems when editing source files in editors having different character set configurations. The same notion can be applied to HTML text. -- Marc-Andre Lemburg From stefan at drees.name Fri Jun 14 12:37:44 2013 From: stefan at drees.name (Stefan Drees) Date: Fri, 14 Jun 2013 12:37:44 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614120715.0b27b25d@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> <51BAE458.8040900@drees.name> <20130614120715.0b27b25d@fsol> Message-ID: <51BAF278.7040206@drees.name> On 2013-06-14 12:07 CEST, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 11:37:28 +0200 > Stefan Drees wrote: >>> We're talking about this as if it were a major change. Doesn't >>> this count as a trivial addition? The only question in my mind is, "Are the >>> HTML char ref rules different enough from the XML rules that Python >>> should provide both?" >>> >>> It's not trivial, it's additional C code in an important part of the >>> language (unicode and codecs). >>> >>> And I haven't seen you propose a patch . >> >> could we try to refrain from some b.t.w.'s \? (using trigraph-safe >> question mark encoding, in case some tool has trigraphs still turned on >> :-?) > > We could, but in this case, this was pretty much warranted. Steven > suggested that a change was "trivial", so it's only fair to wonder on > which grounds he can cast such a judgement (e.g. what his authority is). 
me, with the sun shining outside and the summer finally arriving (again) I suggest, that for such a purpose (I won't judge on it!) and in my opinion and experience the first part "And I haven't seen you propose a patch" would have been fully sufficient, wouldn't it? Additional bad feelings possibly rooted in former experiences, behaviors and inside different areas might also be better handled in a short friendly private mail exchange, I guess. > python-ideas may sometimes feel like a nice soapbox, but the end goal > is still to have code (or docs, PEPs, etc.) to check in. People will > naturally be judged, though mostly tacitly, on their contribution > track record (or absence thereof). Well, this is not python-dev, right :-?) Now for something completely different and coming back to an anti-relevance claim, the one that challenged the use case of "even automates constructing HTML need to resort to ASCII" I think I gave a nice anecdotal counter example[1] out of the wild, where the producer has not sufficient control over the final nodes of the publication chain. References: [1]: http://mail.python.org/pipermail/python-ideas/2013-June/021399.html Now back to my soapbox - the kids are already far down the hill ... ;-) All the best, Stefan. From p.f.moore at gmail.com Fri Jun 14 12:43:08 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 14 Jun 2013 11:43:08 +0100 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51BAEC35.8070401@egenix.com> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> Message-ID: On 14 June 2013 11:11, M.-A. Lemburg wrote: > I'm not sure I follow. I've definitely had use cases for the > proposed error handler in the past and have written my own > set of tools to do such conversions. > Just as an extra data point, I have also had need for this functionality in the past. 
It is sometimes possible to use xmlcharrefreplace as an alternative, but having the "named" entities in the output is often useful for debugging, if nothing else. The technicalities of HTML/HTTP encodings are not so much the issue here. Much of the output of programs that would use this functionality, while ultimately intended for consumption on the web, is often read in a text editor as part of debugging and review, if nothing else. For that purpose, readable output is very useful. And sticking to ASCII, while not essential, certainly helps in an environment like Windows where UTF-8 is *not* universal (whether it should be is really not the point here). Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at drees.name Fri Jun 14 12:57:26 2013 From: stefan at drees.name (Stefan Drees) Date: Fri, 14 Jun 2013 12:57:26 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> Message-ID: <51BAF716.2040204@drees.name> On 2013-06-14 12:43 CEST, Paul Moore wrote: > On 14 June 2013 11:11, M.-A. Lemburg ...wrote: > > I'm not sure I follow. I've definitely had use cases for the > proposed error handler in the past and have written my own > set of tools to do such conversions. > > > Just as an extra data point, I have also had need for this functionality > in the past. It is sometimes possible to use xmlcharrefreplace as an > alternative, but having the "named" entities in the output is often > useful for debugging, if nothing else. > > The technicalities of HTML/HTTP encodings are not so much the issue > here. Much of the output of programs that would use this functionality, > while ultimately intended for consumption on the web, is often read in a > text editor as part of debugging and review, if nothing else. 
For that > purpose, readable output is very useful. And sticking to ASCII, while > not essential, certainly helps in an environment like Windows where > UTF-8 is *not* universal (whether it should be is really not the point > here). just to add to this: I have grown a hard wired reflex when handing over program source files to admins for deployment in windows operating system driven HTML/HTTP environments to: Ensure the admin has an editor at hand to check that the utf-8 clean encoded text files she received do not suddenly become BOM-ed under the radar just because the admin changed some local file path in a config file or the like and subsequently stored it "subconsciously". The time otherwise lost in hunting mystery effects counts in days but feels like weeks ... And yes, I often have to deliver utf-8 files to "ease" the HTML/HTTP handling chain, but in debugging situations IMO it seems to be good to easily resort a pure ASCII representation without writing extra routines for it. All the best, Stefan. From alexander.belopolsky at gmail.com Fri Jun 14 13:17:00 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 14 Jun 2013 07:17:00 -0400 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51BAEC35.8070401@egenix.com> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> Message-ID: On Fri, Jun 14, 2013 at 6:11 AM, M.-A. Lemburg wrote: > I think you are forgetting that the output of such a codec > is not necessarily always meant for sending over the wire > to some browser. It may well be used for creating data which > then has to be manipulated by other tools or humans. > +1 On top of that, even HTML that is sent over the wire to a browser may end up being read by a human. It is for a good reason that every browser has a view source option more or less readily available. 
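[Editor's example] The handler being debated in this thread can already be prototyped in pure Python via the codec registry, as M.-A. Lemburg points out upthread. A minimal sketch (the handler name and the numeric-reference fallback are illustrative, not an actual stdlib API):

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Encode-error handler: emit a named HTML character reference
    where one exists, else a numeric character reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    refs = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        refs.append('&%s;' % name if name else '&#%d;' % ord(ch))
    # The returned str is re-encoded by the codec; it is pure ASCII here.
    return ''.join(refs), exc.end


codecs.register_error('htmlcharrefreplace', htmlcharrefreplace)

print('caf\u00e9 r\u00e9sum\u00e9'.encode('ascii', 'htmlcharrefreplace'))
# b'caf&eacute; r&eacute;sum&eacute;'
```

Once registered, the handler works with any encoding, so "keep the output ASCII-only" and "keep the output latin-1 plus references" are both a one-argument change to `.encode()`.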
-------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Jun 14 13:20:59 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 13:20:59 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> Message-ID: <20130614132059.6a1c5fa5@fsol> On Fri, 14 Jun 2013 07:17:00 -0400 Alexander Belopolsky wrote: > On Fri, Jun 14, 2013 at 6:11 AM, M.-A. Lemburg wrote: > > > I think you are forgetting that the output of such a codec > > is not necessarily always meant for sending over the wire > > to some browser. It may well be used for creating data which > > then has to be manipulated by other tools or humans. > > > > +1 > > On top of that, even HTML that is sent over the wire to a browser may end > up being read by a human. It is for a good reason that every browser has a > view source option more or less readily available. If you want to *read* HTML (not write it), then you certainly want the original Unicode characters, not the garbled HTML entities meant to represent them. Regards Antoine; From stefan at drees.name Fri Jun 14 13:31:43 2013 From: stefan at drees.name (Stefan Drees) Date: Fri, 14 Jun 2013 13:31:43 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614132059.6a1c5fa5@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> Message-ID: <51BAFF1F.5060608@drees.name> On 2013-06-14.06 13:20, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 07:17:00 -0400 > Alexander Belopolsky...wrote: >> On Fri, Jun 14, 2013 at 6:11 AM, M.-A. Lemburg ... 
wrote: >> >>> I think you are forgetting that the output of such a codec >>> is not necessarily always meant for sending over the wire >>> to some browser. It may well be used for creating data which >>> then has to be manipulated by other tools or humans. >>> >> >> +1 >> >> On top of that, even HTML that is sent over the wire to a browser may end >> up being read by a human. It is for a good reason that every browser has a >> view source option more or less readily available. > > If you want to *read* HTML (not write it), then you certainly want the > original Unicode characters, not the garbled HTML entities meant to > represent them. yes when everything just works and as a consumer, but then as the producers we are :-) in the midst of a review session a debugging attempt or when seeking a workaround, the view ascii source level of about any platform comes in quite handy ... All the best, Stefan. From alexander.belopolsky at gmail.com Fri Jun 14 13:33:53 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 14 Jun 2013 07:33:53 -0400 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614132059.6a1c5fa5@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> Message-ID: On Fri, Jun 14, 2013 at 7:20 AM, Antoine Pitrou wrote: > > On Fri, 14 Jun 2013 07:17:00 -0400 > Alexander Belopolsky > wrote: > .. > > On top of that, even HTML that is sent over the wire to a browser may end > > up being read by a human. .. > > If you want to *read* HTML (not write it), then you certainly want the > original Unicode characters, not the garbled HTML entities meant to > represent them. Not necessarily. More often than not the reason to reach for the "View Source" menu item is that the page you are looking at is garbled. 
In this case it is frustrating to see similarly garbled source or a stream of #NNNNs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Jun 14 13:35:24 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 13:35:24 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> Message-ID: <20130614133524.0df7341a@fsol> On Fri, 14 Jun 2013 13:31:43 +0200 Stefan Drees wrote: > On 2013-06-14.06 13:20, Antoine Pitrou wrote: > > On Fri, 14 Jun 2013 07:17:00 -0400 > > Alexander Belopolsky...wrote: > >> On Fri, Jun 14, 2013 at 6:11 AM, M.-A. Lemburg ... wrote: > >> > >>> I think you are forgetting that the output of such a codec > >>> is not necessarily always meant for sending over the wire > >>> to some browser. It may well be used for creating data which > >>> then has to be manipulated by other tools or humans. > >>> > >> > >> +1 > >> > >> On top of that, even HTML that is sent over the wire to a browser may end > >> up being read by a human. It is for a good reason that every browser has a > >> view source option more or less readily available. > > > > If you want to *read* HTML (not write it), then you certainly want the > > original Unicode characters, not the garbled HTML entities meant to > > represent them. > > yes when everything just works and as a consumer, but then as the > producers we are :-) in the midst of a review session a debugging > attempt or when seeking a workaround, the view ascii source level of > about any platform comes in quite handy ... Perhaps it does, but that's not a reason to add an error handler to Python. If you want debug output, you should write your own debug routines (or, you can simply display the HTML's repr()). 
So I still agree with Ezio: the function may be useful as part of the stdlib, but it doesn't have to be an encoding error handler. Regards Antoine. From stefan at drees.name Fri Jun 14 13:53:11 2013 From: stefan at drees.name (Stefan Drees) Date: Fri, 14 Jun 2013 13:53:11 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614133524.0df7341a@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> Message-ID: <51BB0427.7090902@drees.name> On 2013-06-14 13:35 CEST, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 13:31:43 +0200 > Stefan Drees ... wrote: > >> On 2013-06-14.06 13:20, Antoine Pitrou wrote: >>> On Fri, 14 Jun 2013 07:17:00 -0400 >>> Alexander Belopolsky...wrote: >>>> On Fri, Jun 14, 2013 at 6:11 AM, M.-A. Lemburg ... wrote: >>>> >>>>> I think you are forgetting that the output of such a codec >>>>> is not necessarily always meant for sending over the wire >>>>> to some browser. It may well be used for creating data which >>>>> then has to be manipulated by other tools or humans. >>>>> >>>> >>>> +1 >>>> >>>> On top of that, even HTML that is sent over the wire to a browser may end >>>> up being read by a human. It is for a good reason that every browser has a >>>> view source option more or less readily available. >>> >>> If you want to *read* HTML (not write it), then you certainly want the >>> original Unicode characters, not the garbled HTML entities meant to >>> represent them. >> >> yes when everything just works and as a consumer, but then as the >> producers we are :-) in the midst of a review session a debugging >> attempt or when seeking a workaround, the view ascii source level of >> about any platform comes in quite handy ... > > Perhaps it does, but that's not a reason to add an error handler to > Python. 
If you want debug output, you should write your own debug > routines (or, you can simply display the HTML's repr()). > > So I still agree with Ezio: the function may be useful as part of the > stdlib, but it doesn't have to be an encoding error handler. +1 based on that summarizing evaluation ... surprisingly I will have to continue writing my own debug routines ;-) All the best, Stefan. From alexander.belopolsky at gmail.com Fri Jun 14 14:09:49 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 14 Jun 2013 08:09:49 -0400 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614133524.0df7341a@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> Message-ID: On Fri, Jun 14, 2013 at 7:35 AM, Antoine Pitrou wrote: > > So I still agree with Ezio: the function may be useful as part of the > stdlib, but it doesn't have to be an encoding error handler. I don't understand why this functionality should be implemented as anything but an encoding error handler. It can still be implemented in the html package, which would either register it itself or export a handler that applications would need to register by calling codecs.register_error(). A more user-friendly solution would be to pre-register a light-weight handler that would not import html.entities and possibly most of its own implementation until the first use. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From solipsis at pitrou.net Fri Jun 14 14:21:42 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 14:21:42 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> Message-ID: <20130614142142.4287678c@fsol> On Fri, 14 Jun 2013 08:09:49 -0400 Alexander Belopolsky wrote: > On Fri, Jun 14, 2013 at 7:35 AM, Antoine Pitrou wrote: > > > > So I still agree with Ezio: the function may be useful as part of the > > stdlib, but it doesn't have to be an encoding error handler. > > I don't understand why this functionality should be implemented as anything > but an encoding error handler. It can still be implemented in the html > package which would either register it itself or export a handler that > applications would need to register by calling codecs.register_error(). Making registration manual would indeed be a better fit for the intended use cases, IMO. I don't think such a specialized function belongs to the built-in set of error handlers. Regards Antoine. From amauryfa at gmail.com Fri Jun 14 14:39:39 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 14 Jun 2013 14:39:39 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614142142.4287678c@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> Message-ID: 2013/6/14 Antoine Pitrou > Making registration manual would indeed be a better fit for the > intended use cases, IMO. 
I don't think such a specialized function > belongs to the built-in set of error handlers. > By the way, why is it necessary to register? Since an error handler is defined by its callback function, we could allow functions for the "errors" parameter. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From wrr at mixedbit.org Thu Jun 13 20:06:10 2013 From: wrr at mixedbit.org (Jan Wrobel) Date: Thu, 13 Jun 2013 20:06:10 +0200 Subject: [Python-ideas] Elixir inspired pipe to apply a series of functions Message-ID: Hello, I've recently stumbled upon a blog post by Joe Armstrong (of Erlang) that praises the Elixir pipe operator: http://joearms.github.io/2013/05/31/a-week-with-elixir.html The operator allows one to nicely structure code that applies a series of functions to transform an input value to some output. I often end up writing code like:

pkcs7_unpad(
    reduce(lambda result, block: result.append(block),
        map(decrypt_block,
            pairwise([iv] + secret_blocks))))

Which is dense, and needs to be read backwards (the last operation is written first), but as Joe notes, the alternative is also not very compelling:

decrypted_blocks = map(decrypt_block, pairwise([iv] + secret_blocks))
combined_blocks = reduce(lambda result, block: result.append(block))
return pkcs7_unpad(combined_blocks)

The pipe operator nicely separates subsequent operations and allows reading them in a natural order without the need for temporary variables. Something like:

[iv] + secret_blocks |> pairwise |> map, decrypt_block |> \
    reduce, lambda result, block: result.append(block) |> \
    pkcs7_unpad

I'm not sure introducing pipes like this at the Python level would be a good idea. Is there already library-level support for such constructs? If not, what would be a good way to express them?
I've tried a bit and figured out the following API (https://gist.github.com/wrr/5775808):

Pipe([iv] + secret_blocks)\
    (pairwise)\
    (map, decrypt_block)\
    (reduce, lambda result, block: result.append(block))\
    (pkcs7_unpad)\
    ()

The API is more verbose than the language-level operator. I initially tried to overload `>>`, but it doesn't allow for additional arguments. It is also somewhat smelly, because `()` returns a different type of value if it is invoked without an argument. Any suggestions for how this could be improved? Best regards, Jan From solipsis at pitrou.net Fri Jun 14 14:57:18 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 14:57:18 +0200 Subject: [Python-ideas] Elixir inspired pipe to apply a series of functions References: Message-ID: <20130614145718.6acdb461@fsol> On Thu, 13 Jun 2013 20:06:10 +0200 Jan Wrobel wrote: > > Which is dense, and needs to be read backwards (last operation is > written first), but as Joe notes, the alternative is also not very > compelling: > > decrypted_blocks = map(decrypt_block, pairwise([iv] + secret_blocks)) > combined_blocks = reduce(lambda result, block: result.append(block)) > return pkcs7_unpad(combined_blocks) Perhaps if you stopped wanting to use map / reduce your code would be more readable? How about list comprehensions or simple loops? > The pipe operator nicely separates subsequent operations and allows to > read them in a natural order without the need for temporary variables. > Something like: > > [iv] + secret_blocks |> pairwise |> map, decrypt_block |> \ > reduce, lambda result, block: result.append(block) |> \ > pkcs7_unpad Perhaps it's natural to you, but it's unreadable to me :-o Regards Antoine.
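[Editor's note: for readers following along, the chaining API Jan describes can be approximated in a few lines. This is a rough sketch of the idea only -- the `Pipe` class below is illustrative and is not Jan's actual gist code.]

```python
class Pipe:
    """Sketch of a chaining pipeline; extra args are passed before the value."""
    def __init__(self, value):
        self.value = value

    def __call__(self, fn=None, *args):
        if fn is None:          # a bare () call ends the chain
            return self.value
        # (map, decrypt_block) means map(decrypt_block, value)
        return Pipe(fn(*args, self.value))

# Example: sort a list of ints, stringify each, and collect the result.
result = Pipe([3, 1, 2])(sorted)(map, str)(list)()
print(result)  # ['1', '2', '3']
```

This keeps the left-to-right reading order Jan wants, at the cost of the smelly terminating `()` he himself points out.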
From alexander.belopolsky at gmail.com Fri Jun 14 15:20:33 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 14 Jun 2013 09:20:33 -0400 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> Message-ID: On Fri, Jun 14, 2013 at 8:39 AM, Amaury Forgeot d'Arc wrote: > By the way, why is it necessary to register? > Since an error handler is defined by its callback function, > we could allow functions for the "errors" parameter. +1 In fact, it is not necessary to register the codecs either. We could allow any namespace that defines CodecInfo attributes or a getregentry function. The users would then be able to write

from encodings import ascii
x.encode(ascii)

instead of x.encode("ascii"). The benefit is that most IDEs would provide auto-completion and as-you-type error checking, and the resulting program will not have a hidden import masquerading as a builtin call. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Jun 14 15:55:52 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 14 Jun 2013 15:55:52 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> Message-ID: <51BB20E8.7060304@egenix.com> On 14.06.2013 14:39, Amaury Forgeot d'Arc wrote: > 2013/6/14 Antoine Pitrou > >> Making registration manual would indeed be a better fit for the >> intended use cases, IMO.
I don't think such a specialized function >> belongs to the built-in set of error handlers. >> > > By the way, why is it necessary to register? > Since an error handler is defined by its callback function, > we could allow functions for the "errors" parameter. For the same reason we register modules in sys.modules: to be able to reference them by name, rather than by object. Also note that codecs expect to get the error parameter as string to keep the API simple and to make short-cuts easy to implement in the code (esp. in the C implementations). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 17 days to go 2013-07-16: Python Meeting Duesseldorf ... 32 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Jun 14 15:57:59 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 14 Jun 2013 15:57:59 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51BB20E8.7060304@egenix.com> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> <51BB20E8.7060304@egenix.com> Message-ID: <51BB2167.3020704@egenix.com> On 14.06.2013 15:55, M.-A. 
Lemburg wrote: > On 14.06.2013 14:39, Amaury Forgeot d'Arc wrote: >> 2013/6/14 Antoine Pitrou >> >>> Making registration manual would indeed be a better fit for the >>> intended use cases, IMO. I don't think such a specialized function >>> belongs to the built-in set of error handlers. >>> >> >> By the way, why is it necessary to register? >> Since an error handler is defined by its callback function, >> we could allow functions for the "errors" parameter. > > For the same reason we register modules in sys.modules: > to be able to reference them by name, rather than by object. > > Also note that codecs expect to get the error parameter as string > to keep the API simple and to make short-cuts easy to implement > in the code (esp. in the C implementations). Here's the PEP: http://www.python.org/dev/peps/pep-0293/ -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-07-01: EuroPython 2013, Florence, Italy ... 17 days to go 2013-07-16: Python Meeting Duesseldorf ... 32 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From amauryfa at gmail.com Fri Jun 14 16:31:19 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 14 Jun 2013 16:31:19 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51BB2167.3020704@egenix.com> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> <51BB20E8.7060304@egenix.com> <51BB2167.3020704@egenix.com> Message-ID: 2013/6/14 M.-A. Lemburg > >> By the way, why is it necessary to register? > >> Since an error handler is defined by its callback function, > >> we could allow functions for the "errors" parameter. > > > > For the same reason we register modules in sys.modules: > > to be able to reference them by name, rather than by object. > > > > Also note that codecs expect to get the error parameter as string > > to keep the API simple and to make short-cuts easy to implement > > in the code (esp. in the C implementations). > > Here's the PEP: http://www.python.org/dev/peps/pep-0293/ yes, I can understand the argument: "As this requires changes to lots of C prototypes, this approach was rejected." A callable "errors" would have avoided this whole discussion: Implement some htmlcharrefreplace function in htmllib.py, don't register it at all, and let users do .encode('ascii', htmllib.htmlcharrefreplace) or implement their own without any global change to the codecs registry. import.c was once rewritten to accept PyObject everywhere, maybe unicode codecs could have a double API as well? Yes, it's a lot of work. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Fri Jun 14 17:00:17 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 14 Jun 2013 18:00:17 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: Message-ID: 14.06.13 02:37, Ezio Melotti wrote: > On Tue, Jun 11, 2013 at 5:49 PM, Serhiy Storchaka wrote: >> I propose to add "htmlcharrefreplace" error handler which is similar to >> "xmlcharrefreplace" error handler but uses html entity names if possible. >> >>>>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace') >> b'&#8704; x&#8712;&#8476;' >>>>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace') >> b'&forall; x&isin;&real;' >> > > Do you have any use cases for this, or is it just for completeness > since we already have xmlcharrefreplace? In fact, there is no *need* for the "htmlentityreplace" error handler. "xmlcharrefreplace" is enough in most cases, it is faster, and its scope is wider. "htmlentityreplace" is only desired for more human-readable html. Perhaps it is not worth registering this error handler by default, but I see some people desire it in the stdlib. With regard to non-utf-8 encodings of html, of course there are reasons for their use. From storchaka at gmail.com Fri Jun 14 17:09:16 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 14 Jun 2013 18:09:16 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614104931.23a917b5@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> Message-ID: 14.06.13 11:49, Antoine Pitrou wrote: > I'd like to know which good reasons there are to not use utf-8 for HTML > pages in 2013. Russian text requires 2 bytes per character in utf-8 (not counting spaces, punctuation and markup) and only 1 byte per character in any special encoding (cp1251/cp866/koi8-r). Same for other European non-latin-based alphabets.
Some old databases contain data in one of these 8-bit encodings, and generating the html page in the same encoding does not require encoding/decoding at all. > "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't > warrant special support in Python's codec error handlers. "xmlcharrefreplace" is as good as "htmlentityreplace" and even better for this purpose. From steve at pearwood.info Fri Jun 14 17:20:15 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Jun 2013 01:20:15 +1000 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614112245.24c44ed6@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> Message-ID: <51BB34AF.6000903@pearwood.info> On 14/06/13 19:22, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 19:06:55 +1000 > Steven D'Aprano wrote: >> On 14/06/13 18:49, Antoine Pitrou wrote: >>> "Keeping the HTML source ASCII-only" is just silly IMO, >> >> Surely no sillier than "keep the Python std lib source ASCII-only". > > Or than drawing stupid analogies. Do you understand the difference > between source code and hypertext documents? Of course I do. I don't believe that the differences are as important as the similarities. Both are text. Both are expected to be read by human beings, at least sometimes. Both may be edited in an editor, or otherwise passed through some tool, that does not handle non-ASCII text correctly, causing corruption. Both may contain characters which the author has no way of entering directly. The similarities are far more important than the differences. >>> and it doesn't >>> warrant special support in Python's codec error handlers. >> >> We're talking about this as if it were a major change. Doesn't this count as a trivial addition? The only question in my mind is, "Are the HTML char ref rules different enough from the XML rules that Python should provide both?"
> > It's not trivial, it's additional C code in an important part of the > language (unicode and codecs). Or, it's 17 lines of Python. Something like this is a good start:

import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_errors(exc):
    c = exc.object[exc.start]
    try:
        entity = codepoint2name[ord(c)]
    except KeyError:
        n = ord(c)
        if n <= 0xFFFF:
            replace = "\\u%04x"
        else:
            replace = "\\U%08x"
        replace = replace % n
    else:
        replace = "&{};".format(entity)
    return replace, exc.start + 1

codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)

Is this the point where someone now argues that it's too trivial to bother putting in the standard library? This is not new syntax. It's not a new builtin. Even if it is written in C, the code itself is not likely to be significantly more complex than the existing xmlcharrefreplace error handler, which is under 100 lines of C. (The hard part is likely to be keeping the list of entities.) There are no backwards compatibility issues to worry about. It doesn't add a new programming idiom to the standard library. There's unlikely to be much in the way of bike-shedding about either functionality or syntax. It's merely a new error handler, with well-defined semantics and an obvious name. That's what I meant by "a trivial addition". > And I haven't seen you propose a patch (when was your last patch, by > the way?). Does it matter? Do you think that *only* those who have contributed patches are capable of recognising a good, useful piece of functionality when they see it? Putting people down because they have not contributed to the std lib as often as you is not open, considerate or respectful, nor is it welcoming to newcomers. Even those who are not prolific at submitting patches can contribute good ideas, and the ability of someone to write C code does not necessarily mean that they can judge good or bad ideas. Just look at PHP.
-- Steven From solipsis at pitrou.net Fri Jun 14 17:25:37 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 17:25:37 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> Message-ID: <20130614172537.5928403f@fsol> On Fri, 14 Jun 2013 18:09:16 +0300 Serhiy Storchaka wrote: > 14.06.13 11:49, Antoine Pitrou ???????(??): > > I'd like to know which good reasons there are to not use utf-8 for HTML > > pages in 2013. > > Russian text requires 2 bytes per character in utf-8 (not counting > spaces, punctuation and markup) and only 1 byte per character in any > special encoding (cp1251/cp866/koi8-r). Same for other European non > latin-based alphabets. And even latin-based (e.g. latin-1), but if you really care about this, it's certainly more efficient to compress your HTTP response than trying to save space at the character level. > Some old databases contains data in one of this > 8-bit encoding and generating html page in the same encoding does not > requires encoding/decoding at all. If it doesn't require encoding/decoding, how are you going to specify an encoding error handler? Regards Antoine. From steve at pearwood.info Fri Jun 14 17:32:40 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Jun 2013 01:32:40 +1000 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> Message-ID: <51BB3798.7000803@pearwood.info> On 14/06/13 22:39, Amaury Forgeot d'Arc wrote: > 2013/6/14 Antoine Pitrou > >> Making registration manual would indeed be a better fit for the >> intended use cases, IMO. 
I don't think such a specialized function >> belongs to the built-in set of error handlers. >> > > By the way, why is it necessary to register? > Since an error handler is defined by its callback function, > we could allow functions for the "errors" parameter. In another post, I wrote: "There's unlikely to be much in the way of bike-shedding about either functionality or syntax." I spoke too soon :-( -- Steven From storchaka at gmail.com Fri Jun 14 17:37:10 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 14 Jun 2013 18:37:10 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <51BB34AF.6000903@pearwood.info> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> <51BB34AF.6000903@pearwood.info> Message-ID: 14.06.13 18:20, Steven D'Aprano wrote: > On 14/06/13 19:22, Antoine Pitrou wrote: >> It's not trivial, it's additional C code in an important part of the >> language (unicode and codecs). > > Or, it's 17 lines of Python. Something like this is a good start: > > > import codecs > from html.entities import codepoint2name > > def htmlcharrefreplace_errors(exc): > c = exc.object[exc.start] > try: > entity = codepoint2name[ord(c)] > except KeyError: > n = ord(c) > if n <= 0xFFFF: > replace = "\\u%04x" > else: > replace = "\\U%08x" > replace = replace % n Actually '&#%d;' % n. See also my sample implementation in the original post, which reuses xmlcharrefreplace_errors.
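[Editor's note: Serhiy's approach of reusing xmlcharrefreplace_errors can be sketched as follows. This is a reconstruction from the discussion, not his exact patch; the handler falls back to numeric references when no named entity exists.]

```python
import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_errors(exc):
    # Use a named HTML entity when one exists; otherwise defer to the
    # built-in numeric-reference handler.
    try:
        name = codepoint2name[ord(exc.object[exc.start])]
    except KeyError:
        return codecs.xmlcharrefreplace_errors(exc)
    return '&%s;' % name, exc.start + 1

codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)

print('\u2200 x\u2208\u211c'.encode('ascii', 'htmlcharrefreplace'))
# b'&forall; x&isin;&real;'
```

A character with no HTML 4 entity name, such as '\u0142' (ł), falls through to the numeric form b'&#322;'.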
From steve at pearwood.info Fri Jun 14 17:50:51 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 15 Jun 2013 01:50:51 +1000 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> <51BB34AF.6000903@pearwood.info> Message-ID: <51BB3BDB.4060403@pearwood.info> On 15/06/13 01:37, Serhiy Storchaka wrote: > 14.06.13 18:20, Steven D'Aprano wrote: >> On 14/06/13 19:22, Antoine Pitrou wrote: >>> It's not trivial, it's additional C code in an important part of the >>> language (unicode and codecs). >> >> Or, it's 17 lines of Python. Something like this is a good start: [...] > Actually '&#%d;' % n. See also my sample implementation in original post which reuses xmlcharrefreplace_errors. So you did. I'm sorry for the noise, I missed your original implementation. -- Steven From solipsis at pitrou.net Fri Jun 14 17:54:29 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 14 Jun 2013 17:54:29 +0200 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BADD2F.1090505@pearwood.info> <20130614112245.24c44ed6@fsol> <51BB34AF.6000903@pearwood.info> Message-ID: <20130614175429.0956e968@fsol> On Sat, 15 Jun 2013 01:20:15 +1000 Steven D'Aprano wrote: > On 14/06/13 19:22, Antoine Pitrou wrote: > > On Fri, 14 Jun 2013 19:06:55 +1000 > > Steven D'Aprano wrote: > >> On 14/06/13 18:49, Antoine Pitrou wrote: > >>> "Keeping the HTML source ASCII-only" is just silly IMO, > >> > >> Surely no sillier than "keep the Python std lib source ASCII-only". > > > > Or than drawing stupid analogies. Do you understand the difference > > between source code and hypertext documents? > > Of course I do. I don't believe that the differences are as important > as the similarities. Both are text.
Both are expected to be read by > human beings, at least sometimes. HTML is expected to be viewed through a browser. Reading raw HTML is the exception, not the norm. Moreover, CPython's source code is supposed to be written and commented in English, meaning there's no opportunity for non-ASCII characters. However, note that *arbitrary* Python code can happily contain non-ASCII characters (including in identifiers). > Both may be edited in an editor, or otherwise passed through some > tool, that does not handle non-ASCII text correctly, causing > corruption. Well, I'm personally ok with letting users of such incompetent tools deal with it on their own. Python needn't fix all problems in the computing world. > Is this the point where someone now argues that it's too trivial > to bother putting in the standard library? I'm not arguing against putting it in the standard library, I'm arguing against making it a built-in error handler. (and IMO it's not too trivial) > > And I haven't seen you propose a patch (when was your last patch, by > the way?). > > Does it matter? In an open source project which is ultimately driven by code contributions, yes, it does matter quite a bit. Also, in contrast with *other* open source projects, users of Python don't have the excuse of being non-programmers to block them from contributing. > Do you think that *only* those who have contributed patches are > capable of recognising a good, useful piece of functionality when they > see it? No, but certainly they are better able to judge whether something is "trivial" or not; and how desirable it is *for them* to accept the additional maintenance burden (since you aren't the one doing any maintenance, again). Regards Antoine.
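[Editor's note: a concrete illustration of the manual-registration model discussed in this thread. The handler and its name below are made-up examples, not stdlib code: codecs.register_error binds a callback to a string name, and the errors argument must then be that string -- which is M.-A. Lemburg's point about keeping the codec API string-based.]

```python
import codecs

def dash_replace(exc):
    # Replace every unencodable character in the error range with '-'.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return ('-' * (exc.end - exc.start), exc.end)

# Nothing happens globally until an application opts in by registering
# the handler under a name of its choosing:
codecs.register_error('dashreplace', dash_replace)

print('na\u00efve'.encode('ascii', 'dashreplace'))  # b'na-ve'
```

Passing the function object itself as the errors argument is not accepted; the registry name is the only spelling the codec machinery understands, which is exactly the constraint Amaury questions above.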
From tjreedy at udel.edu Fri Jun 14 18:30:40 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 14 Jun 2013 12:30:40 -0400 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 6/14/2013 5:00 AM, Wolfgang Maier wrote: > this is what str(int).encode() does, but is quite complicated, since it > actually generates a full-blown Python string object first, then encodes > this to bytes again. In 3.3+, it is not as complicated as you seem to think, since the string of ascii digit chars uses one byte per char and the 'encoding' is just a copy. On my machine, with i = 123456, the two calls take about .3 and .2 microseconds. The extra call is noise compared to the time to read, split into 4 bytes, convert 2 bytes to ints, subtract, and after the conversion of the difference to bytes, join and write the line.

from timeit import repeat

def f():
    b = b'somelinedescriptor\t100\t500\tmorestuffhere\n'
    b = b.split(b'\t')
    i = int(b[2]) - int(b[1])
    b'\t'.join((b[0], str(i).encode(), b[3]))

print(repeat('f()', 'from __main__ import f'))

>>> [2.584412482335259, 2.614494724632941, 2.6133167166162155] + read/write time -- Terry Jan Reedy From benjamin at python.org Fri Jun 14 21:56:07 2013 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 14 Jun 2013 19:56:07 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Wolfgang Maier writes: > However, if you decide to inherit from str or int, then bytes() completely > ignores the __bytes__ method and sticks to the superclass behavior instead, > i.e. requiring an encoding for str and creating a bytestring of the length > of an int. int is fixed in 3.3.
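[Editor's note: as a footnote to Terry's timing discussion, the str(i).encode() round-trip can be checked directly, and later Python versions (3.5+, PEP 461) added %-interpolation for bytes, which skips the intermediate str object entirely. A small sketch:]

```python
i = 123456

# The 3.3+ path Terry times: str -> ascii bytes is effectively a copy.
assert str(i).encode() == b'123456'

# Since Python 3.5 (PEP 461), bytes support %-interpolation directly:
assert b'%d' % i == b'123456'

# The join from Terry's example, without ever building a str:
line = b'\t'.join((b'somelinedescriptor', b'%d' % i, b'morestuffhere'))
print(line)  # b'somelinedescriptor\t123456\tmorestuffhere'
```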
From abarnert at yahoo.com Sat Jun 15 00:00:42 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 14 Jun 2013 15:00:42 -0700 (PDT) Subject: [Python-ideas] Elixir inspired pipe to apply a series of functions In-Reply-To: References: Message-ID: <1371247242.43132.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Jan Wrobel Sent: Thursday, June 13, 2013 11:06 AM > I've recently stumbled upon a Joe Armstrong's (of Erlang) blog post > that praises an Elixir pipe operator: > > http://joearms.github.io/2013/05/31/a-week-with-elixir.html > > The operator allows to nicely structure code that applies a series of > functions to transform an input value to some output. > > I often end up writing code like: > > pkcs7_unpad( >   reduce(lambda result, block: result.append(block), >     map(decrypt_block, >       pairwise([iv] + secret_blocks)))) > > Which is dense, and needs to be read backwards (last operation is > written first), but as Joe notes, the alternative is also not very > compelling: > >   decrypted_blocks = map(decrypt_block, pairwise([iv] + secret_blocks)) >   combined_blocks = reduce(lambda result, block: result.append(block)) >   return pkcs7_unpad(combined_blocks) I don't see why some people think naming intermediate results makes things less readable. But, if you do, you can always give them short throwaway names like _ or x. Also, if you're concerned with readability, throwing in unnecessary lambdas doesn't exactly help. If you know the type of result, just use the unbound method; if you need it to be generic, you probably need it more than once, so write a named appender function. Also, it's very weird (and definitely not in the functional spirit you're going for) to call reduce on a function that mutates an argument and returns None, and I can't figure out what exactly you're trying to accomplish, but I'll ignore that.
So:

    _ = map(decrypt_block, pairwise([iv] + secret_blocks))
    _ = reduce(list.append, _)
    return pkcs7_unpad(_)

Is that really hard to understand? If you just want everything to be an expression... well, that's silly (the "return" shows that this is clearly already a function, and the function call will already be an expression no matter how you implement the internals) -- but, more importantly, you're using the wrong language. Many of Python's readability strengths derive from the expression-statement divide and the corresponding clean statement syntax; if you spend all your time fighting that, you might be happier using a language that doesn't fight back. But Python does actually have a way to write things like this in terms of expressions. Just use a comprehension or generator expression instead of calling map and friends. When you're mapping a pre-existing function over an iterator with no filtering or anything else going on, map is great; when you want to map an expression that's hard to describe as a function, use a comprehension. (And when you want to iterate mutating code, don't use either.) That's the same rule of thumb people use in Haskell, so it would be hard to argue that it's not "functional" enough. Meanwhile, most of what you want is just a reverse-compose operator and a partial operator, so you can write in reverse point-free style. Let's see it without operators first:

    from functools import partial, reduce, wraps

    def compose(f1, f2):
        @wraps(f1)
        def composed(arg):
            return f1(f2(arg))
        return composed

    def rcompose(f1, f2):
        return compose(f2, f1)

    def rapply(arg, f):
        return f(arg)

    return rapply([iv] + secret_blocks,
                  rcompose(partial(map, decrypt_block),
                           rcompose(partial(reduce, list.append),
                                    pkcs7_unpad)))

Now call rcompose, compose, partial, and rapply, say, FunctionType.__lshift__, __rshift__, __getitem__, and __rmod__:
    return ([iv] + secret_blocks) % (map[decrypt_block] >> reduce[list.append] >> pkcs7_unpad)

This looks nothing at all like Python, and it's far less readable than the three-liner version. It saves a grand total of 12/103 keystrokes. And of course it can't be implemented without significant changes to the function and builtin-function implementations. > I'm not sure introducing pipes like this at the Python level would be > a good idea. Is there already a library level support for such > constructs? partial is in functools. compose is not, because it was considered so trivial that it wasn't worth adding ("anyone who wants this can build it faster than he can look it up"). rcompose is just as trivial. And a reverse-apply wrapper is almost as simple. > If not, what would be a good way to express them? I've > tried a bit and figured out a following API > (https://gist.github.com/wrr/5775808): >
>     Pipe([iv] + secret_blocks)\
>       (pairwise)\
>       (map, decrypt_block)\
>       (reduce, lambda result, block: result.append(block))\
>       (pkcs7_unpad)\
>       ()

The biggest problem here is that the model isn't clear without thinking about it. If you're going to use classes, think about it in OO terms: what object in your mental model does a Pipe represent? It's sort of an applicator with partial currying. Is there a simpler model that you could use? Sure: functions. In a functional language, I think people would either write this in normal point-free style:

    pkcs7_unpad . reduce append . map decrypt_block $ [iv] + secret_blocks

... or as an explicit chain of reverse-applies:

    [iv] + secret_blocks :- map decrypt_block :- reduce append :- pkcs7_unpad

... rather than the reverse point-free you're going for:

    import Control.Arrow
    [iv] + secret_blocks :- (map decrypt_block >>> reduce append >>> pkcs7_unpad)

And part of the reason for that is that the normal point-free version is blatantly obviously just defining a normal function and then calling it. In fact, the language -- whether Haskell or Python -- can even see that at the syntactic level. Instead of this (sorry for the hybrid syntax):

    def decrypt(secret_blocks):
        return pkcs7_unpad . reduce append . map decrypt_block $ [iv] + secret_blocks

You can just do this:

    decrypt = pkcs7_unpad . reduce append . map decrypt_block

Also, the way you're hiding partialization makes it unclear what's going on at first read. Normally, people don't think of (map, decrypt_block) as meaning to call map with decrypt_block. That makes sense in Lisp (where that's what function calling already looks like) or in Haskell (where currying means partialization is always implicit), but not so much in Python, where it looks completely different from calling map with decrypt_block. Second, your code is significantly longer than the obvious Pythonic three-liner -- even after replacing your unnecessary lambda, it's twice as many lines, more extraneous symbols, and more keystrokes. And it's clearly going to be harder to debug. If something goes wrong anywhere in the chain, it's going to be hard to tell where. Compare the traceback you'd get through a chain of Pipe.__call__ methods to what you'd get in the explicitly-sequenced version, where it goes right to the single-line statement where something went wrong. It also just looks ugly -- backslash continuations, what look like (but aren't) unnecessary parens, etc. > The API is more verbose than the language level operator. I initially > tried to overload `>>`, but it doesn't allow for additional arguments. If you got rid of the implicit partials, you could use it:

    (Pipe([iv] + secret_blocks) >>
     pairwise >>
     partial(map, decrypt_block) >>
     partial(reduce, list.append) >>
     pkcs7_unpad)()

It's a lot less ugly this way. But I definitely wouldn't use it. And if you used it, and I had to read your code, I'd have to either reason it through, or translate it in my head to Haskell (where I could reason it through more quickly and figure out what you're really up to), rather than just reading it and understanding it. From abarnert at yahoo.com Sat Jun 15 02:13:48 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 14 Jun 2013 17:13:48 -0700 (PDT) Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> <51BB20E8.7060304@egenix.com> <51BB2167.3020704@egenix.com> Message-ID: <1371255228.159.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: Amaury Forgeot d'Arc Sent: Friday, June 14, 2013 7:31 AM >2013/6/14 M.-A. Lemburg > >>> By the way, why is it necessary to register? >>>> Since an error handler is defined by its callback function, >>>> we could allow functions for the "errors" parameter. >>> >>> For the same reason we register modules in sys.modules: >>> to be able to reference them by name, rather than by object. >>> >>> Also note that codecs expect to get the error parameter as string >>> to keep the API simple and to make short-cuts easy to implement >>> in the code (esp. in the C implementations). The simplicity argument is pretty clear. Everywhere the docs/docstrings/comments explain how errors strings work, they'd also have to explain that it can be a callable instead, and that callables don't have to be passed to PyCodec_LookupError/codecs.lookup_error but can (which will return the argument as-is), and ...
Less seriously, it would make the analogy between the codec registry and the error handler registry weaker (therefore a bit more to learn), and it would make it a bit harder to distinguish in code between the pre-looked-up string-or-callable PyObject * and the post-looked-up callable PyObject * (something you don't even have to think about today). But I'm not sure it really saved any effort in implementing codecs. Conceivably, someone could take advantage of the string value of the errors, but everything I can find in a quick skim of _codecsmodule.c and unicodeobject.c and everything I could find online does one of three things: (a) ignore it, (b) if (error) handler = PyCodec_LookupError(error), or (c) pass error along untouched to another function which does one of the above. So really, almost all code both in the stdlib and out would be the same, except that the ones implemented in C would be parsing an "O" arg instead of a "z". >>Here's the PEP: http://www.python.org/dev/peps/pep-0293/ The PEP doesn't actually explain the rationale for why it doesn't use a more complicated string-or-callable API like the one I described above. Which is perfectly reasonable. Nobody asked for it until more than a decade later, and I'm not sure how good an idea it is. Borrowing a time machine to add code people will ask for years later is impressive; borrowing a time machine to add explanations for why they won't be able to have it when they ask years later would just be silly. >import.c was once rewritten to accept PyObject everywhere, >maybe unicode codecs could have a double API as well? >Yes, it's a lot of work. I don't think changing PyCodec*/_codecs/codecs is that much work. (M.-A. Lemburg can correct me if I'm wrong.) The big problem is the fact that the API that every codec -- including third-party codecs -- must implement has to change.
Which means you end up needing two different codec interfaces, two different registries (or one dual-type registry), etc. And I think that parallel system might have to stick around until Py4k, or at least for quite a few 3.x versions. Plus, you have to think through the API. Does Python or C-API code need to be able to distinguish old-style and new-style codecs? (If not, what happens when you pass an error by callable to what turns out to be an old-style codec? "TypeError" seems like the obvious answer, but then it's not really true that you can pass a callable as an error handler, unless you have some out-of-band knowledge about the codec you're going to be using.) Also: while nearly any third-party codec written in Python would just magically work as a new-API codec, "nearly" isn't good enough. And there's no way to test. Which means all such existing codecs have to be treated as old-API codecs, which sucks. In other words, even though I don't think it would actually take much work, and I like the idea, I can't see any way of fleshing out the idea that wouldn't make me hate it. Except for the obvious one: wait until py4k and just break the PyCodec* and codec-implementation interfaces. From ncoghlan at gmail.com Sat Jun 15 06:20:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 15 Jun 2013 14:20:35 +1000 Subject: [Python-ideas] Elixir inspired pipe to apply a series of functions In-Reply-To: References: Message-ID: On 14 June 2013 04:06, Jan Wrobel wrote: > Hello, > > I've recently stumbled upon a Joe Armstrong's (of Erlang) blog post > that praises an Elixir pipe operator: > > http://joearms.github.io/2013/05/31/a-week-with-elixir.html > > The operator allows to nicely structure code that applies a series of > functions to transform an input value to some output.
> > I often end up writing code like: > > pkcs7_unpad( > reduce(lambda result, block: result.append(block), > map(decrypt_block, > pairwise([iv] + secret_blocks)))) > > Which is dense, and needs to be read backwards (last operation is > written first), but as Joe notes, the alternative is also not very > compelling: > > decrypted_blocks = map(decrypt_block, pairwise([iv] + secret_blocks)) > combined_blocks = reduce(lambda result, block: result.append(block)) > return pkcs7_unpad(combined_blocks) It's a whole lot clearer if you store some state on the heap instead of insisting on using the stack:

    decrypted = []
    for block in itertools.chain([iv], secret_blocks):
        decrypted.append(decrypt_block(decrypted, block))
    return pkcs7_unpad(decrypted)

Inside every reduce call is a for loop trying to get out :P Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Sat Jun 15 08:06:07 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 15 Jun 2013 09:06:07 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> Message-ID: 14.06.13 15:39, Amaury Forgeot d'Arc wrote: > By the way, why is it necessary to register? > Since an error handler is defined by its callback function, > we could allow functions for the "errors" parameter. Could you please open a new topic for this discussion? Sometimes I regret that callables cannot be used as error handlers, but I understand that there are reasons for that.
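The register-by-name machinery under discussion looks like this in practice; 'hexreplace' and its handler function are invented here for illustration, not an existing stdlib handler:

```python
import codecs

def hexreplace(exc):
    # Replace unencodable characters with \xNN escapes of their UTF-8 bytes.
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = ''.join('\\x%02x' % b for b in bad.encode('utf-8'))
        # Return the replacement string and the position to resume encoding at.
        return replacement, exc.end
    raise exc

# Handlers must be registered under a string name...
codecs.register_error('hexreplace', hexreplace)

# ...because the errors argument is a string, looked up at encode time:
print('café'.encode('ascii', errors='hexreplace'))  # b'caf\\xc3\\xa9'
```

This is exactly the name-based indirection Amaury is questioning: the codec never sees the function object, only the registered name.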
From storchaka at gmail.com Sat Jun 15 08:16:44 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 15 Jun 2013 09:16:44 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614172537.5928403f@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <20130614172537.5928403f@fsol> Message-ID: 14.06.13 18:25, Antoine Pitrou wrote: > On Fri, 14 Jun 2013 18:09:16 +0300 > Serhiy Storchaka > wrote: >> 14.06.13 11:49, Antoine Pitrou wrote: >>> I'd like to know which good reasons there are to not use utf-8 for HTML >>> pages in 2013. >> >> Russian text requires 2 bytes per character in utf-8 (not counting >> spaces, punctuation and markup) and only 1 byte per character in any >> special encoding (cp1251/cp866/koi8-r). Same for other European non >> latin-based alphabets. > > And even latin-based (e.g. latin-1), but if you really care about this, > it's certainly more efficient to compress your HTTP response than > trying to save space at the character level. In languages with a latin-based alphabet, usually only a small part of the characters are non-ASCII, so UTF-8 adds only 5-10% to the size. >> Some old databases contains data in one of this >> 8-bit encoding and generating html page in the same encoding does not >> requires encoding/decoding at all. > > If it doesn't require encoding/decoding, how are you going to specify > an encoding error handler? The main part of the page can be generated without encoding, but a small part can contain encoded text.
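A small sketch of the mixed case Serhiy describes: a one-byte-per-character legacy encoding for the Cyrillic text, with the existing xmlcharrefreplace handler covering characters the encoding lacks (the sample strings are invented):

```python
text = 'Привет \u2192'  # Cyrillic greeting plus an arrow, U+2192, not in cp1251

# Each Cyrillic letter costs 1 byte in cp1251 versus 2 bytes in UTF-8...
assert len('Привет'.encode('cp1251')) == len('Привет')
assert len('Привет'.encode('utf-8')) == 2 * len('Привет')

# ...and anything cp1251 can't represent becomes an HTML character reference:
print(text.encode('cp1251', errors='xmlcharrefreplace'))  # ends with b'&#8594;'
```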
From storchaka at gmail.com Sat Jun 15 08:20:19 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 15 Jun 2013 09:20:19 +0300 Subject: [Python-ideas] Add "htmlcharrefreplace" error handler In-Reply-To: <20130614142142.4287678c@fsol> References: <51BAC9C9.2070100@egenix.com> <20130614104931.23a917b5@fsol> <51BAE4A6.20507@egenix.com> <20130614114341.678c57b6@fsol> <51BAEC35.8070401@egenix.com> <20130614132059.6a1c5fa5@fsol> <51BAFF1F.5060608@drees.name> <20130614133524.0df7341a@fsol> <20130614142142.4287678c@fsol> Message-ID: 14.06.13 15:21, Antoine Pitrou wrote: > Making registration manual would indeed be a better fit for the > intended use cases, IMO. I don't think such a specialized function > belongs to the built-in set of error handlers. I agree with you. Making the interpreter core depend on the html.entities module doesn't look very good. From wolfgang.maier at biologie.uni-freiburg.de Sat Jun 15 16:45:43 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Sat, 15 Jun 2013 14:45:43 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA5F0C.1040909@pearwood.info> Message-ID: Terry Reedy writes: > > On 6/14/2013 5:00 AM, Wolfgang Maier wrote: > > > this is what str(int).encode() does, but is quite complicated, since it > > actually generates a full-blown Python string object first, then encodes > > this to bytes again. > > In 3.3+, it is not as complicated as you seem to think since the string > of ascii digit chars uses one byte per char and the 'encoding' is just a > copy. On my machine, with i = 123456, the two calls take about .3 and .2 > microseconds. The extra call is noise compared to time to read, split > into 4 bytes, convert 2 bytes to ints, subtract, and after the > conversion of the difference to bytes, join and write the line.
> from timeit import repeat
>
> def f():
>     b = b'somelinedescriptor\t100\t500\tmorestuffhere\n'
>     b = b.split(b'\t')
>     i = int(b[2]) - int(b[1])
>     b'\t'.join((b[0], str(i).encode(), b[3]))
>
> print(repeat('f()', 'from __main__ import f'))
>
> >>> [2.584412482335259, 2.614494724632941, 2.6133167166162155]
> + read/write time

This sounds pretty good! I have to say I haven't timed it yet (was going to do so after the weekend). As I was saying, I simply felt uncomfortable with the double-conversion. Two questions though: you're saying in 3.3+. Does that mean the behaviour has changed with 3.3 or that you checked it only for that version (I'm currently using 3.2)? Second, is that one byte optimization special for str() from int or is it happening elsewhere too (like in string literals without non-english characters)? Where can I find that documented? Oh, and thanks for this really constructive post. Best, Wolfgang From p.f.moore at gmail.com Sat Jun 15 17:52:05 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 15 Jun 2013 16:52:05 +0100 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 15 June 2013 15:45, Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: > Two questions though: you're saying in 3.3+. Does that mean the behaviour > has changed with 3.3 or that you checked it only for that version (I'm > currently using 3.2)? > Second, is that one byte optimization special for str() from int or is it > happening elsewhere too (like in string literals without non-english > characters)? Where can I find that documented? > Basically, it's new in Python 3.3. See the What's New document at http://docs.python.org/3/whatsnew/3.3.html#pep-393 and PEP 393 (http://www.python.org/dev/peps/pep-0393/) What happened is that the internal representation of strings changed so that strings are held in 1, 2 or 4-byte form depending on the actual data.
So all-ASCII data (such as the numbers you are interested in) are held in 1-byte form, and encoding to and from bytes can be done by just copying the bytes (assuming you're using an ascii-compatible encoding). The same code works in earlier versions, but it will be slower (how much depends on your application) because bytestrings will need to be converted to and from wide character strings. Paul. From stephen at xemacs.org Sat Jun 15 17:59:40 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 16 Jun 2013 00:59:40 +0900 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: <87r4g35pqr.fsf@uwakimon.sk.tsukuba.ac.jp> Wolfgang Maier writes: > Second, is that one byte optimization special for str() from int or is it > happening elsewhere too (like in string literals without non-english > characters)? Where can I find that documented? http://www.python.org/dev/peps/pep-0393/ From wolfgang.maier at biologie.uni-freiburg.de Sat Jun 15 18:51:32 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Sat, 15 Jun 2013 16:51:32 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA5F0C.1040909@pearwood.info> Message-ID: Paul Moore writes: > On 15 June 2013 15:45, Wolfgang Maier biologie.uni-freiburg.de> wrote: > > Two questions though: you're saying in 3.3+. Does that mean the behaviour > > has changed with 3.3 or that you checked it only for that version (I'm > > currently using 3.2)? > > Second, is that one byte optimization special for str() from int or is it > > happening elsewhere too (like in string literals without non-english > > characters)? Where can I find that documented? > Basically, it's new in Python 3.3.
See the What's New document at http://docs.python.org/3/whatsnew/3.3.html#pep-393 and PEP 393 (http://www.python.org/dev/peps/pep-0393/) > > What happened is that the internal representation of strings changed so that strings are held in 1, 2 or 4-byte form depending on the actual data. So all-ASCII data (such as the numbers you are interested in) are held in 1-byte form, and encoding to and from bytes can be done by just copying the bytes (assuming you're using an ascii-compatible encoding). > > The same code works in earlier versions, but it will be slower (how much depends on your application) because bytestrings will need to be converted to and from wide character strings. > > Paul. > That sounds like a really good argument for moving to Python 3.3! Thanks a lot, Paul and Stephen, for this feedback. So if I understand the PEP correctly, then, theoretically, text mode file IO objects could be implemented to declare that all they'll ever need is 1 byte strings (if the encoding is ASCII-compatible)? Then converting incoming bytes from a file would also be reduced to copying and would eliminate much of the speed difference between 'r' and 'rb' modes? Is that done already, or are there problems with such an approach? Wolfgang From p.f.moore at gmail.com Sat Jun 15 19:12:59 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 15 Jun 2013 18:12:59 +0100 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 15 June 2013 17:51, Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: > So if I understand the PEP correctly, then, theoretically, text mode file > IO > objects could be implemented to declare that all they'll ever need is 1 > byte > strings (if the encoding is ASCII-compatible)? > There's no need to "declare" anything - if the file does not contain code points outside the 1-byte range, it just works. No declaration needed.
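The 1-, 2-, or 4-byte storage being discussed can be observed directly with sys.getsizeof; the exact object overheads vary across CPython versions, so only the relative sizes matter in this sketch:

```python
import sys

ascii_s = 'x' * 1000            # all code points < 128    -> 1 byte each
bmp_s = '\u20ac' * 1000         # EURO SIGN (U+20AC)       -> 2 bytes each
astral_s = '\U0001F600' * 1000  # emoji (U+1F600)          -> 4 bytes each

print(sys.getsizeof(ascii_s), sys.getsizeof(bmp_s), sys.getsizeof(astral_s))

# For the all-ASCII case, encoding really is just a copy of the 1-byte buffer:
assert ascii_s.encode('ascii') == b'x' * 1000
```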
The changes in the PEP are entirely transparent to the user, they just magically make your code faster if possible :-) Paul From abarnert at yahoo.com Sun Jun 16 00:19:42 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 15 Jun 2013 15:19:42 -0700 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On Jun 15, 2013, at 10:12, Paul Moore wrote: > On 15 June 2013 17:51, Wolfgang Maier wrote: >> So if I understand the PEP correctly, then, theoretically, text mode file IO >> objects could be implemented to declare that all they'll ever need is 1 byte >> strings (if the encoding is ASCII-compatible)? > > There's no need to "declare" anything - if the file does not contain code points outside the 1-byte range, it just works. No declaration needed. > > The changes in the PEP are entirely transparent to the user, they just magically make your code faster if possible :-) Of course if you want to _ensure_ that you're getting the optimization, just specify encoding='ascii' when opening the file. Then, instead of slowing down or wasting space, it will raise an exception, which you can deal with as you see fit. From victor.stinner at gmail.com Sun Jun 16 03:01:30 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 16 Jun 2013 03:01:30 +0200 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: 2013/6/15 Wolfgang Maier : > So if I understand the PEP correctly, then, theoretically, text mode file IO > objects could be implemented to declare that all they'll ever need is 1 byte > strings (if the encoding is ASCII-compatible)?
> Then converting incoming > bytes from a file would also be reduced to copying and would eliminate much > of the speed difference between 'r' and 'rb' modes? > Is that done already, or are there problems with such an approach? Many functions in the Python core now have a "fast path" for pure ASCII data, or sometimes latin1 data. It is possible because a Unicode string now has a flag indicating whether it only contains ASCII characters. The optimization you suggest is not implemented because FileIO.read() returns a bytes object, and there is no way to convert a bytes object to a Unicode object without copying the content. It cannot be implemented because bytes strings and Unicode strings are each made of one unique memory block: the object header and the content are in the same block, the headers of bytes and Unicode strings have different sizes, and bytes and str are immutable. I don't think that converting bytes to str is the bottleneck when you read a long text file... (Reading data from disk is known to be *slow*.) Victor From wolfgang.maier at biologie.uni-freiburg.de Sun Jun 16 07:42:09 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Sun, 16 Jun 2013 05:42:09 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA5F0C.1040909@pearwood.info> Message-ID: Victor Stinner writes: > > I don't think that converting bytes to str is the bottleneck when you > read a long text file... (Reading data from disk is known to be > *slow*.) > From the io module docs (Python 3.3): "Text I/O over a binary storage (such as a file) is significantly slower than binary I/O over the same storage, because it requires conversions between unicode and binary data using a character codec. This can become noticeable handling huge amounts of text data like large log files."
Wolfgang From steve at pearwood.info Sun Jun 16 07:57:47 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Jun 2013 15:57:47 +1000 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: <51BD53DB.6090602@pearwood.info> On 16/06/13 15:42, Wolfgang Maier wrote: > Victor Stinner writes: >> >> I don't think that converting bytes to str is the bottleneck when you >> read a long text file... (Reading data from disk is known to be >> *slow*.) >> > From the io module docs (Python 3.3): > "Text I/O over a binary storage (such as a file) is significantly slower > than binary I/O over the same storage, because it requires conversions > between unicode and binary data using a character codec. This can become > noticeable handling huge amounts of text data like large log files." "this can become noticeable" != "this is the bottleneck in your code". In my recent tests on my PC (Python 3.3 on a 1GB machine), I have found that when reading medium-sized pure ASCII files, the text IO objects are as little as 2-3 times slower than binary, which may be unnoticeable for a real-world application. (Who cares about the difference between 0.03 second versus 0.01 second in a script that takes a total of 0.2 second to run?) On the other hand, given a 400MB avi file, reading it as UTF-8 with errors='ignore' is up to EIGHTY times slower than reading it as a binary file. (Hardly surprising, given the vast number of UTF-8 errors that are likely to be found.) My gut feeling is that if your file is actually ASCII, and you read it line-by-line rather than all at once, there may be a small speedup from reading it as a binary file and working with bytes directly, but probably not as much as you expect. But I wouldn't be confident without actually profiling your code. 
As always, if you optimize based on a wild guess as to what you need to optimize, then you risk wasting your time and effort, or worse, risk actually ending up with even slower code. -- Steven From wolfgang.maier at biologie.uni-freiburg.de Sun Jun 16 09:41:25 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Sun, 16 Jun 2013 07:41:25 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: <51BA5F0C.1040909@pearwood.info> <51BD53DB.6090602@pearwood.info> Message-ID: Steven D'Aprano writes: > > On 16/06/13 15:42, Wolfgang Maier wrote: > > Victor Stinner ...> writes: > >> > >> I don't think that converting bytes to str is the bottleneck when you > >> read a long text file... (Reading data from disk is known to be > >> *slow*.) > >> > > From the io module docs (Python 3.3): > > "Text I/O over a binary storage (such as a file) is significantly slower > > than binary I/O over the same storage, because it requires conversions > > between unicode and binary data using a character codec. This can become > > noticeable handling huge amounts of text data like large log files." > > "this can become noticeable" != "this is the bottleneck in your code". > > In my recent tests on my PC (Python 3.3 on a 1GB machine), I have found that when reading medium-sized pure > ASCII files, the text IO objects are as little as 2-3 times slower than binary, which may be unnoticeable > for a real-world application. (Who cares about the difference between 0.03 second versus 0.01 second in a > script that takes a total of 0.2 second to run?) > > On the other hand, given a 400MB avi file, reading it as UTF-8 with errors='ignore' is up to EIGHTY times > slower than reading it as a binary file. (Hardly surprising, given the vast number of UTF-8 errors that are > likely to be found.) 
> > My gut feeling is that if your file is actually ASCII, and you read it line-by-line rather than all at once, > there may be a small speedup from reading it as a binary file and working with bytes directly, but probably > not as much as you expect. But I wouldn't be confident without actually profiling your code. As always, if > you optimize based on a wild guess as to what you need to optimize, then you risk wasting your time and > effort, or worse, risk actually ending up with even slower code. > well, yes, some real timing data from my machine. While in initial tests I had found text IO to be quite a bit slower than binary IO, it turned out that this is only true when files are small enough for OS IO caching, which happens of course if you try to time your code repeatedly. Here I found, much like Steven, a ~ 100% speed difference. However, when I now repeated this with a file larger than my system's memory (effectively wiping out the cache between repeated trials) text and binary mode are *equal* (and about 20x slower than with caching). Summary: *Victor is right* (except for the caching case, but then, as Steven says, under these conditions speed differences aren't that important anyway.). Oh, and I tested under Python 3.2 and 3.3 and their behavior is identical. Best, Wolfgang From tjreedy at udel.edu Sun Jun 16 12:11:07 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 16 Jun 2013 06:11:07 -0400 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: <51BA5F0C.1040909@pearwood.info> Message-ID: On 6/15/2013 1:12 PM, Paul Moore wrote: > On 15 June 2013 17:51, Wolfgang Maier > > > wrote: > > So if I understand the PEP correctly, then, theoretically, text mode > file IO > objects could be implemented to declare that all they'll ever need > is 1 byte > strings (if the encoding is ASCII-compatible)? > > > There's no need to "declare" anything - if the file does not contain > code points outside the 1-byte range, it just works. 
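A minimal version of the text-versus-binary comparison described above can be sketched as follows; the file size and line format are arbitrary choices for illustration, and on repeated runs the numbers reflect the OS cache, exactly the caveat Wolfgang raises:

```python
import os
import tempfile
import time

# Build a sample ASCII file in the thread's tab-separated line format.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='ascii') as f:
    for i in range(100000):
        f.write('somelinedescriptor\t%d\t%d\tmorestuffhere\n' % (i, i + 400))

def count_lines(mode, **kwargs):
    # Time one full line-by-line pass over the file in the given mode.
    start = time.perf_counter()
    with open(path, mode, **kwargs) as f:
        n = sum(1 for _ in f)
    return n, time.perf_counter() - start

n_bin, t_bin = count_lines('rb')
n_txt, t_txt = count_lines('r', encoding='ascii')
print('binary: %.3fs  text: %.3fs' % (t_bin, t_txt))
```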
No declaration needed. > > The changes in the PEP are entirely transparent to the user, they just > magically make your code faster if possible :-) Some string operations are faster, some slower. Everyone will see either a big space saving or perhaps a space saving and an end to 'bugs' due to the use of surrogates. Everyone will see the same string behavior on all platforms. -- Terry Jan Reedy From wolfgang.maier at biologie.uni-freiburg.de Sun Jun 16 22:51:06 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Sun, 16 Jun 2013 20:51:06 +0000 (UTC) Subject: [Python-ideas] duck typing for io write methods References: Message-ID: Benjamin Peterson writes: > > Wolfgang Maier ...> writes: > > > However, if you decide to inherit from str or int, then bytes() completely > > ignores the __bytes__ method and sticks to the superclass behavior instead, > > i.e. requiring an encoding for str and creating a bytestring of the length > > of an int. > > int is fixed in 3.3. > Ah right, I found this now as Issue17309 at bugs.python.org, and you're saying there that it's been fixed. I just tested int with Python3.3.0 on Windows and it still ignores __bytes__. Which exact version are you referring to? class bint (int): def __bytes__(self): return b'now bytes' a=bint(4) bytes(a) -> b'\x00\x00\x00\x00' Best, Wolfgang From python at mrabarnett.plus.com Sun Jun 16 22:58:36 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 16 Jun 2013 21:58:36 +0100 Subject: [Python-ideas] duck typing for io write methods In-Reply-To: References: Message-ID: <51BE26FC.8030306@mrabarnett.plus.com> On 16/06/2013 21:51, Wolfgang Maier wrote: > Benjamin Peterson writes: > >> >> Wolfgang Maier ...> writes: >> >> > However, if you decide to inherit from str or int, then bytes() completely >> > ignores the __bytes__ method and sticks to the superclass behavior instead, >> > i.e. requiring an encoding for str and creating a bytestring of the length >> > of an int. 
>> >> int is fixed in 3.3. >> > > Ah right, I found this now as Issue17309 at bugs.python.org, and you're > saying there that it's been fixed. I just tested int with Python3.3.0 on > Windows and it still ignores __bytes__. Which exact version are you > referring to? > class bint (int): > def __bytes__(self): > return b'now bytes' > a=bint(4) > bytes(a) > -> b'\x00\x00\x00\x00' > Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. >>> class bint (int): def __bytes__(self): return b'now bytes' >>> a=bint(4) >>> bytes(a) b'now bytes' From g.rodola at gmail.com Mon Jun 17 02:32:24 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Mon, 17 Jun 2013 02:32:24 +0200 Subject: [Python-ideas] unittest and warnings Message-ID: One of the features I like the most in the new unittest2 module is the possibility to skip tests and how they are included in the final result (e.g. FAILED (errors=2, failures=3, *skipped=5*)). After http://bugs.python.org/issue10093 it is not rare that different ResourceWarnings appear while running tests. Personally I consider these warnings as something which needs to be fixed so after I run tests I often scroll my console window back up to see whether there were warnings. Would it make sense for unittest module to keep track of them so that they get included in the final test result as it currently happens for skipped tests? Side note: unittest provides some "skip-related" APIs such unittest.skip*, TestResult.skipped and others but I don't think something similar would be necessary except maybe a TestResult.warnings list similar to TestResult.skipped. Regards, - Giampaolo https://code.google.com/p/pyftpdlib https://code.google.com/p/psutil https://code.google.com/p/pysendfile -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From masklinn at masklinn.net Mon Jun 17 08:22:09 2013 From: masklinn at masklinn.net (Masklinn) Date: Mon, 17 Jun 2013 08:22:09 +0200 Subject: [Python-ideas] unittest and warnings In-Reply-To: References: Message-ID: <6EF96F10-0954-48CE-81CA-A5A673411AEE@masklinn.net> On 2013-06-17, at 02:32 , Giampaolo Rodola' wrote: > One of the features I like the most in the new unittest2 module is the > possibility to skip tests and how they are included in the final result > (e.g. FAILED (errors=2, failures=3, *skipped=5*)). > After http://bugs.python.org/issue10093 it is not rare that different > ResourceWarnings appear while running tests. > Personally I consider these warnings as something which needs to be fixed > so after I run tests I often scroll my console window back up to see > whether there were warnings. > Would it make sense for unittest module to keep track of them so that they > get included in the final test result as it currently happens for skipped > tests? > > Side note: unittest provides some "skip-related" APIs such unittest.skip*, > TestResult.skipped and others but I don't think something similar would be > necessary except maybe a TestResult.warnings list similar to > TestResult.skipped. Why not just use -Werror? Alternatively, capture warnings[0] and report them as whatever you wish to your test system. 
[0] http://docs.python.org/2/library/warnings.html#warnings.catch_warnings From wrr at mixedbit.org Mon Jun 17 11:39:08 2013 From: wrr at mixedbit.org (Jan Wrobel) Date: Mon, 17 Jun 2013 11:39:08 +0200 Subject: [Python-ideas] Elixir inspired pipe to apply a series of functions In-Reply-To: <1371247242.43132.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <1371247242.43132.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: On Sat, Jun 15, 2013 at 12:00 AM, Andrew Barnert wrote: > From: Jan Wrobel > > Sent: Thursday, June 13, 2013 11:06 AM > > >> I've recently stumbled upon a Joe Armstrong's (of Erlang) blog post >> that praises an Elixir pipe operator: >> >> http://joearms.github.io/2013/05/31/a-week-with-elixir.html >> >> The operator allows to nicely structure code that applies a series of >> functions to transform an input value to some output. >> >> I often end up writing code like: >> >> pkcs7_unpad( >> reduce(lambda result, block: result.append(block), >> map(decrypt_block, >> pairwise([iv] + secret_blocks)))) >> >> Which is dense, and needs to be read backwards (last operation is >> written first), but as Joe notes, the alternative is also not very >> compelling: >> >> decrypted_blocks = map(decrypt_block, pairwise([iv] + secret_blocks)) >> combined_blocks = reduce(lambda result, block: result.append(block)) >> return pkcs7_unpad(combined_blocks) > > I don't see why some people think naming intermediate results makes things less readable. But, if you do, you can always give them short throwaway names like _ or x. > > Also, if you're concerned with readability, throwing in unnecessary lambdas doesn't exactly help. If you know the type of result, just use the unbound method; if you need it to be generic, you probably need it more than once, so write a named appender function. 
Also, it's very weird (and definitely not in the functional spirit you're going for) to call reduce on a function that mutates an argument and returns None, and I can't figure out what exactly you're trying to accomplish, but I'll ignore that. This was just an example to illustrate a pattern ('result' is not a list but an object of a custom class, for which append returns 'this' to allow chaining, but this is not important). > > So: > > _ = map(decrypt_block, pairwise([iv] + secret_blocks)) > _ = reduce(list.append, _) > return pkcs7_unpad(_) > > Is that really hard to understand? > > If you just want everything to be an expression -- well, that's silly (the "return" shows that this is clearly already a function, and the function call will already be an expression no matter how you implement the internals) -- but, more importantly, you're using the wrong language. Many of Python's readability strengths derive from the expression-statement divide and the corresponding clean statement syntax; if you spend all your time fighting that, you might be happier using a language that doesn't fight back. I came to Python from C/C++, so initially my Python code was very C-like. Gradually, I've learned to use more functional constructs, which sometimes require forcing and fighting, but most of the time, the outcome is positive. For anyone with a C background, writing two nested loops is often the most natural and fastest solution. It is often a pain to break such code into separate steps/functions and combine results with some higher-order function, but it is worth the trouble. I don't have an impression that Python fights back and promotes imperative style. > But Python does actually have a way to write things like this in terms of expressions. Just use a comprehension or generator expression instead of calling map and friends. 
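The map-versus-comprehension equivalence being discussed can be sketched as follows (a toy example; decrypt_block here is a hypothetical stand-in, since the poster's real function isn't shown in the thread):

```python
# Toy stand-in for the thread's decrypt_block; the real one is not shown.
def decrypt_block(block):
    return bytes(b ^ 0xFF for b in block)

blocks = [b"\x00\x01", b"\x02\x03"]

# Mapping a pre-existing function: map is fine here.
mapped = list(map(decrypt_block, blocks))

# Mapping an expression: a comprehension avoids a throwaway lambda.
comprehended = [bytes(b ^ 0xFF for b in block) for block in blocks]

assert mapped == comprehended
```

Both produce the same list, which is the rule of thumb at issue: map for an existing function, a comprehension for an inline expression.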
When you're mapping a pre-existing function over an iterator with no filtering or anything else going on, map is great; when you want to map an expression that's hard to describe as a function, use a comprehension. (And when you want to iterate mutating code, don't use either.) That's the same rule of thumb people use in Haskell, so it would be hard to argue that it's not "functional" enough. > > Meanwhile, most of what you want is just a reverse-compose operator and a partial operator, so you can write in reverse point-free style. Let's see it without operators first: > > def compose(f1, f2): > @wraps(f1) > def composed(arg): > return f1(f2(arg)) > return composed > > def rcompose(f1, f2): > return compose(f2, f1) > > def rapply(arg, f): > return f(arg) > > return rapply([iv] + secret_blocks, > rcompose(partial(map, decrypt_block), > rcompose(partial(reduce, list.append), > pkcs7_unpad))) > > Now call rcompose, compose, partial, and rapply, say, FunctionType.__lshift__, __rshift__, __getitem__, and __rmod__: > > return ([iv] + secret_blocks) % (map[decrypt_block] >> reduce[list.append] >> pkcs7_unpad) > > This looks nothing at all like Python, and it's far less readable than the three-liner version. It saves a grand total of 12/103 keystrokes. And of course it can't be implemented without significant changes to the function and builtin-function implementations. If altering FunctionType was possible, probably the best option would be to just define Function.__rrshift__ and use partial explicitly: return ([iv] + secret_blocks) >> partial(map, decrypt_block) >> partial(reduce, list.append) >> pkcs7_unpad >> I'm not sure introducing pipes like this at the Python level would be > >> a good idea. Is there already a library level support for such >> constructs? > > partial is in functools. compose is not, because it was considered so trivial that it wasn't worth adding ("anyone who wants this can build it faster than he can look it up"). rcompose is just as trivial. 
And a reverse-apply wrapper is almost as simple. > >> If not, what would be a good way to express them? I've >> tried a bit and figured out the following API >> (https://gist.github.com/wrr/5775808): >> >> Pipe([iv] + secret_blocks)\ >> (pairwise)\ >> (map, decrypt_block)\ >> (reduce, lambda result, block: result.append(block))\ >> (pkcs7_unpad)\ >> () > > The biggest problem here is that the model isn't clear without thinking about it. If you're going to use classes, think about it in OO terms: what object in your mental model does a Pipe represent? It's sort of an applicator with partial currying. Is there a simpler model that you could use? Sure: functions. But a pure-functions-based API (without language-level support) requires nesting. As in your example: rapply([iv] + secret_blocks, rcompose(partial(map, decrypt_block), rcompose(partial(reduce, list.append), pkcs7_unpad))) The order is right, but the last operation is nested in all previous operations. A unix-like pipe: ... | decrypt_blocks | pkcs7_unpad has a clean, flat structure. Each operation takes input from a single previous operation, so it shouldn't be nested in all previous operations. > In a functional language, I think people would either write this in normal point-free style: > > map decrypt_block . reduce append . pkcs7_unpad $ [iv] + secret_blocks > > > ... or as an explicit chain of reverse-applies: > > [iv] + secret_blocks :- pkcs7_unpad :- (reduce append) :- (map decrypt_block) I was actually trying to achieve this. Pipe(foo) applies foo to an existing result of a pipe and returns `this`; it doesn't first compose all functions and then apply them all to an input value. 
In fact, the language -- whether Haskell or Python -- can even see that at the syntactic level. Instead of this (sorry for the hybrid syntax): > > def decrypt(secret_blocks): > return map decrypt_block . reduce append . pkcs7_unpad $ [iv] + secret_blocks > > You can just do this: > > decrypt = map decrypt_block . reduce append . pkcs7_unpad > > Also, the way you're hiding partialization makes it unclear what's going on at first read. Normally, people don't think of (map, decrypt_block) as meaning to call map with decrypt_block. That makes sense in Lisp (where that's what function calling already looks like) or in Haskell (where currying means partialization is always implicit), but not so much in Python, where it looks completely different from calling map with decrypt_block. This is true. Elixir syntax allows applying a function [iv] + secret_blocks |> pairwise() |> map(decrypt_block) which for Python would also be natural, but which is not doable at the library level. > Second, your code is significantly longer than the obvious Pythonic three-liner -- even after replacing your unnecessary lambda, it's twice as many lines, more extraneous symbols, and more keystrokes. > > And it's clearly going to be harder to debug. If something goes wrong anywhere in the chain, it's going to be hard to tell where. Compare the traceback you'd get through a chain of Pipe.__call__ methods to what you'd get in the explicitly-sequenced version, where it goes right to the single-line statement where something went wrong. > > It also just looks ugly -- backslash continuations, what look like (but aren't) unnecessary parens, etc. > >> The API is more verbose than the language level operator. I initially >> tried to overload `>>`, but it doesn't allow for additional arguments. 
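For reference, the kind of `>>` overloading being discussed can be done with a small wrapper class; a minimal sketch (illustrative only, not the actual code from the gist):

```python
from functools import partial

class Pipe:
    """Thread a value through a chain of one-argument callables with >>."""
    def __init__(self, value):
        self.value = value

    def __rshift__(self, func):
        # Each >> applies func to the current value and wraps the result,
        # so a chain of >> reads left to right like a shell pipeline.
        return Pipe(func(self.value))

result = (Pipe([1, 2, 3])
          >> partial(map, lambda x: x * 2)
          >> list
          >> sum).value
assert result == 12
```

The limitation Jan mentions -- `>>` takes exactly one right-hand operand -- is why any extra arguments have to be bound up front, here with functools.partial.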
> > If you got rid of the implicit partials, you could use it: > > (Pipe([iv] + secret_blocks) >> > pairwise >> > partial(map, decrypt_block) >> > partial(reduce, list.append) >> > pkcs7_unpad)() > > It's a lot less ugly this way. But I definitely wouldn't use it. > > And if you used it, and I had to read your code, I'd have to either reason it through, or translate it in my head to Haskell (where I could reason it through more quickly and figure out what you're really up to), rather than just reading it and understanding it. Thank you for the very informative response. I was not aware of partial(), which is very useful. Readability is a tricky concept, because familiar constructs will always be more readable than anything new. I must agree, though, that for an established language like Python, introducing a new construct for such a basic thing could be confusing, and the overall readability outcome could be negative. I still like the pipe operator (the Elixir version, not the results of our attempts to reproduce it on top of the existing Python API), but it is rather something that needs to be introduced to the language early, so it is familiar to everyone. Thanks, Jan From barry at python.org Mon Jun 17 18:59:45 2013 From: barry at python.org (Barry Warsaw) Date: Mon, 17 Jun 2013 12:59:45 -0400 Subject: [Python-ideas] unittest and warnings References: <6EF96F10-0954-48CE-81CA-A5A673411AEE@masklinn.net> Message-ID: <20130617125945.2cb55a0d@anarchist> On Jun 17, 2013, at 08:22 AM, Masklinn wrote: > >On 2013-06-17, at 02:32 , Giampaolo Rodola' wrote: > >> One of the features I like the most in the new unittest2 module is the >> possibility to skip tests and how they are included in the final result >> (e.g. FAILED (errors=2, failures=3, *skipped=5*)). >> After http://bugs.python.org/issue10093 it is not rare that different >> ResourceWarnings appear while running tests. 
>> Personally I consider these warnings as something which needs to be fixed >> so after I run tests I often scroll my console window back up to see >> whether there were warnings. >> Would it make sense for unittest module to keep track of them so that they >> get included in the final test result as it currently happens for skipped >> tests? >> >> Side note: unittest provides some "skip-related" APIs such unittest.skip*, >> TestResult.skipped and others but I don't think something similar would be >> necessary except maybe a TestResult.warnings list similar to >> TestResult.skipped. > >Why not just use -Werror? That doesn't really work so well. Those errors don't get turned into exceptions that cause tests to fail, so they still don't show up in the test suite's summary. ResourceWarnings are particularly difficult to track down, as described in my previous message to this list. +1 for Giampaolo's original suggestion that skipped tests be described in the summary. I also have to scroll back to find the ResourceWarnings, and I find that additional pain in trying to debug these things. >Alternatively, capture warnings[0] and report them as whatever you wish >to your test system. Yeah, not so much. They can be quite unpredictable, so there's no good place to set up the captures. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From abarnert at yahoo.com Mon Jun 17 19:59:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 17 Jun 2013 10:59:08 -0700 Subject: [Python-ideas] Elixir inspired pipe to apply a series of functions In-Reply-To: References: <1371247242.43132.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <573BCEBD-76FD-488B-A864-9B01608B6421@yahoo.com> On Jun 17, 2013, at 2:39, Jan Wrobel wrote: > This was just an example to illustrate a pattern ('result' is not a > list but an object of a custom class, for which append returns 'this' > to allow chaining, but this is not important). Actually, this is very important--in fact, it's the root of the problem. But I'll take the full discussion off-list, because I don't think it's relevant to improving Python anymore. From g.rodola at gmail.com Mon Jun 17 20:42:11 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Mon, 17 Jun 2013 20:42:11 +0200 Subject: [Python-ideas] unittest and warnings In-Reply-To: <20130617125945.2cb55a0d@anarchist> References: <6EF96F10-0954-48CE-81CA-A5A673411AEE@masklinn.net> <20130617125945.2cb55a0d@anarchist> Message-ID: 2013/6/17 Barry Warsaw > > That doesn't really work so well. Those errors don't get turned into > exceptions that cause tests to fail, so they still don't show up in the test > suite's summary. Exactly. Speaking of which, maybe there's some space for introducing some APIs after all. By default unittest should just show warnings in the summary but there might also be an option (settable both from cmdline and / or from code) which turns those warnings into actual exceptions / errors. In an ideal world CPython tests might even have that enabled by default. > +1 for Giampaolo's original suggestion that skipped tests be described in the > summary. I guess here you meant "tests producing warnings", right? 
> ResourceWarnings are particularly difficult to track down, > as described in my previous message to this list. They are indeed. Very often the traceback message gives no clue at all. Perhaps there's also space for improvements here (make the traceback longer / more detailed). - Giampaolo https://code.google.com/p/pyftpdlib https://code.google.com/p/psutil https://code.google.com/p/pysendfile -------------- next part -------------- An HTML attachment was scrubbed... URL: From vernondcole at gmail.com Tue Jun 18 13:55:58 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Tue, 18 Jun 2013 05:55:58 -0600 Subject: [Python-ideas] unittest and warnings Message-ID: +1 for being able to get warnings counted, and optionally included with traceback. I am working on the django-mssql project, but I don't know django itself all that well. Two warnings are produced during the run of 4900 tests. (No, I did not key that incorrectly -- the test suite runs for hours on Windows.) I want to repair the code which causes the warnings -- but it is pretty hard to identify what caused them. Being able to get a full traceback, rather than a printout in the midst of the ...... would be really nice. -- Vernon Cole -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Tue Jun 18 15:20:52 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 18 Jun 2013 13:20:52 +0000 (UTC) Subject: [Python-ideas] unittest and warnings References: Message-ID: Giampaolo Rodola' writes: > Would it make sense for unittest module to keep track of [warnings] so > that they get included in the final test result as it currently happens > for skipped tests? I agree this would be nice. In the meantime/earlier Python versions, it may be possible to manage with something like I've implemented here: https://gist.github.com/5805215 This demonstrates that you can pinpoint where a ResourceWarning occurred. 
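A rough sketch of that kind of pinpointing, using only the stdlib warnings machinery (a simplified illustration, not the code from the gist above):

```python
import warnings

def leaky():
    # Stand-in for code under test that triggers a warning.
    warnings.warn("resource left open", ResourceWarning)

# Record warnings raised during the "test", then report each one
# with the file and line it was attributed to.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # defeat the once-per-location filter
    leaky()

for w in caught:
    print("%s:%d: %s: %s" % (w.filename, w.lineno, w.category.__name__, w.message))

assert len(caught) == 1 and caught[0].category is ResourceWarning
```

catch_warnings(record=True) swaps out the global filter state, which is exactly why this approach collides with code under test that monkey-patches or filters warnings itself, as noted above.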
One problem with the warnings module is that it doesn't allow you to have multiple handlers, filter lists, so the approach might be problematic when the code under test is monkey-patching the warnings module itself, or filtering warnings. Regards, Vinay Sajip From barry at python.org Tue Jun 18 18:41:33 2013 From: barry at python.org (Barry Warsaw) Date: Tue, 18 Jun 2013 12:41:33 -0400 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest Message-ID: <20130618124133.4de46219@anarchist> (Maybe this should better go to the TIP mailing list?) One of the last things from zope.testrunner that I really rely on is its highly flexible support for test selection: https://pypi.python.org/pypi/zope.testrunner/4.4.0#test-selection Michael and I talked briefly about adding something like this to unittest's discover, where --pattern is a pale ghost of related functionality. Has anybody looked into this before? Is it something that we could feasibly add to Python 3.4? (and/or unittest2) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From techtonik at gmail.com Fri Jun 21 10:10:23 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 21 Jun 2013 11:10:23 +0300 Subject: [Python-ideas] argparse: --help-explain [options] ... Message-ID: Inspired by QEMU: qemu -L . -m 128 -hda ReactOS.vmdk -net nic,model=ne2k_pci -net user -serial file:CON Can anybody tell what -L option does? QEMU help output is 6+ pages, and while it is not impossible to search through the content, I guess the first try for everybody is to try to lookup options visually. Going with every option one by one is rather tedious. So, the idea is that for such applications with extensive command line API, optparse (argparse or docopt) could provide --help-explain (or just --explain) option that parses command line and explains what it found. 
--help-explain [options] ... [options] ... --help-explain The option is position-independent to append it to existing command line. The explanation logic could be extensible, but that's a topic about state machine of valid option combinations. I am not sure argparse allows to manage that. However, docopt may be the saviour. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jun 21 18:15:11 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 21 Jun 2013 09:15:11 -0700 Subject: [Python-ideas] argparse: --help-explain [options] ... In-Reply-To: References: Message-ID: <55A70540-2F29-4181-936B-3E99B059643C@yahoo.com> On Jun 21, 2013, at 1:10, anatoly techtonik wrote: > So, the idea is that for such applications with extensive command line API, optparse (argparse or docopt) could provide --help-explain (or just --explain) option that parses command line and explains what it found. You mean it shows just the help strings for the options given (in the order given?) and nothing else? That sounds like it could be pretty handy. And if it only works with valid option combinations it sounds like it would be pretty easy too. From greg at krypto.org Sat Jun 22 01:39:22 2013 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 21 Jun 2013 16:39:22 -0700 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest In-Reply-To: <20130618124133.4de46219@anarchist> References: <20130618124133.4de46219@anarchist> Message-ID: On Tue, Jun 18, 2013 at 9:41 AM, Barry Warsaw wrote: > (Maybe this should better go to the TIP mailing list?) > > One of the last things from zope.testrunner that I really rely on is its > highly flexible support for test selection: > > https://pypi.python.org/pypi/zope.testrunner/4.4.0#test-selection > > Michael and I talked briefly about adding something like this to unittest's > discover, where --pattern is a pale ghost of related functionality. 
> > Has anybody looked into this before? Is it something that we could > feasibly > add to Python 3.4? (and/or unittest2) > That seems like a good idea. Today's test selection is rather... bare bones: unittest/main.py has this to say about it: elif len(args) > 0: self.testNames = args Granted I'm at the point where I generally don't bother selecting tests and just run an entire unittest file, commenting things I don't need out of the test file while working on a specific issue if its a big one. I'd stop the stupid commenting out trick if i could select from the command line with much less typing or pasting. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Jun 22 02:00:24 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 22 Jun 2013 02:00:24 +0200 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest References: <20130618124133.4de46219@anarchist> Message-ID: <20130622020024.63e2e870@fsol> On Tue, 18 Jun 2013 12:41:33 -0400 Barry Warsaw wrote: > (Maybe this should better go to the TIP mailing list?) > > One of the last things from zope.testrunner that I really rely on is its > highly flexible support for test selection: > > https://pypi.python.org/pypi/zope.testrunner/4.4.0#test-selection That doesn't sound very usable to me. Why are there multiple options for the same thing? I'd like to specify tests by name or pattern regardless of whether the matched thing is a package, module, class or method. Regards Antoine. From greg at krypto.org Sat Jun 22 02:14:05 2013 From: greg at krypto.org (Gregory P. 
Smith) Date: Fri, 21 Jun 2013 17:14:05 -0700 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest In-Reply-To: <20130622020024.63e2e870@fsol> References: <20130618124133.4de46219@anarchist> <20130622020024.63e2e870@fsol> Message-ID: On Fri, Jun 21, 2013 at 5:00 PM, Antoine Pitrou wrote: > On Tue, 18 Jun 2013 12:41:33 -0400 > Barry Warsaw wrote: > > (Maybe this should better go to the TIP mailing list?) > > > > One of the last things from zope.testrunner that I really rely on is its > > highly flexible support for test selection: > > > > https://pypi.python.org/pypi/zope.testrunner/4.4.0#test-selection > > That doesn't sound very usable to me. Why are there multiple options > for the same thing? I'd like to specify tests by name or pattern > regardless of whether the matched thing is a package, module, class or > method. > Agreed. I'm +1 on the idea of more powerful easier to use test selection. NOT on zope.testrunner's interface. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertc at robertcollins.net Sat Jun 22 02:17:12 2013 From: robertc at robertcollins.net (Robert Collins) Date: Sat, 22 Jun 2013 12:17:12 +1200 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest In-Reply-To: <20130622020024.63e2e870@fsol> References: <20130618124133.4de46219@anarchist> <20130622020024.63e2e870@fsol> Message-ID: On 22 June 2013 12:00, Antoine Pitrou wrote: > On Tue, 18 Jun 2013 12:41:33 -0400 > Barry Warsaw wrote: >> (Maybe this should better go to the TIP mailing list?) >> >> One of the last things from zope.testrunner that I really rely on is its >> highly flexible support for test selection: >> >> https://pypi.python.org/pypi/zope.testrunner/4.4.0#test-selection > > That doesn't sound very usable to me. Why are there multiple options > for the same thing? I'd like to specify tests by name or pattern > regardless of whether the matched thing is a package, module, class or > method. 
Yup, I've found the following things to be very useful: - discovery and selection are separate concerns : don't conflate them - regex matching on test id is super useful (and can do just about anything) - being able to load a list of precise matches from a file is very useful for automation I'd be happy to put a patch up for unittest for the above - I've been meaning to do so anyhow. -Rob -- Robert Collins Distinguished Technologist HP Cloud Services From barry at python.org Sat Jun 22 18:02:04 2013 From: barry at python.org (Barry Warsaw) Date: Sat, 22 Jun 2013 12:02:04 -0400 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest References: <20130618124133.4de46219@anarchist> <20130622020024.63e2e870@fsol> Message-ID: <20130622120204.59728fe4@anarchist> On Jun 21, 2013, at 05:14 PM, Gregory P. Smith wrote: >Agreed. I'm +1 on the idea of more powerful easier to use test selection. > NOT on zope.testrunner's interface. I personally only ever use -t and that seems sufficient, since it matches test name, class, module, or path. The key thing is that it should be possible to use multiple -t options. (It doesn't have to be -t). -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From boxed at killingar.net Sat Jun 22 12:27:52 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sat, 22 Jun 2013 12:27:52 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts Message-ID: Keyword arguments are great for increasing readability and making code more robust but in my opinion they are underused compared to the gains they can provide. You often end up with code like: foo(bar=bar, baz=baz, foobaz=foobaz) which is less readable than the ordered argument version when the names of the variables and the keywords match. 
( Here's another guy pointing out the same thing: http://stackoverflow.com/questions/7041752/any-reason-not-to-always-use-keyword-arguments#comment8553765_7041986 ) I have a suggestion that I believe can enable more usage of keyword arguments while still retaining almost all the brevity of ordered arguments: if the variable name to the right of the equal sign equals the keyword argument ("foo=foo") make it optional to just specify the name once ("=foo"). For completeness I suggest also make the same change for dictionaries: {'foo': foo} -> {:foo}. This change would turn the following code: a = 1 b = 2 c = 3 d = {'a':a, 'b':b, 'c':c} foo(a=a, b=b, c=c) into: a = 1 b = 2 c = 3 d = {:a, :b, :c} foo(=a, =b, =c) This should be compatible with existing code bases since the new forms are syntax errors in current python. What do you think? / Anders Hovmöller -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Jun 22 20:26:54 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 22 Jun 2013 19:26:54 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: Message-ID: <51C5EC6E.9050501@mrabarnett.plus.com> On 22/06/2013 11:27, Anders Hovmöller wrote: > Keyword arguments are great for increasing readability and making code > more robust but in my opinion they are underused compared to the gains > they can provide. You often end up with code like: > > foo(bar=bar, baz=baz, foobaz=foobaz) > > which is less readable than the ordered argument version when the names 
( Here's another guy pointing > out the same thing: > http://stackoverflow.com/questions/7041752/any-reason-not-to-always-use-keyword-arguments#comment8553765_7041986) > > I have a suggestion that I believe can enable more usage of keyword > arguments while still retaining almost all the brevity of ordered > arguments: if the variable name to the right of the equal sign equals > the keyword argument ("foo=foo") make it optional to just specify the > name once ("=foo"). For completeness I suggest also make the same change > for dictionaries: {'foo': foo} -> {:foo}. This change would turn the > following code: > > a = 1 > b = 2 > c = 3 > d = {'a':a, 'b':b, 'c':c} > foo(a=a, b=b, c=c) > > into: > > a = 1 > b = 2 > c = 3 > d = {:a, :b, :c} Shouldn't that mean: d = {a:a, b:b, c:c} > foo(=a, =b, =c) > > > This should be compatible with existing code bases since the new forms > are syntax errors in current python. > > What do you think? > I'm not convinced. From boxed at killingar.net Sat Jun 22 21:23:03 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sat, 22 Jun 2013 21:23:03 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C5EC6E.9050501@mrabarnett.plus.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: Well no, because: foo(**dict(a=a, b=b, c=c)) foo(**{'a':a, 'b':b, 'c':c}) foo(a=a, b=b, c=c) are all the same thing. So the short forms should match in the same way: foo(**dict(=a, =b, =c)) foo(**{:a, :b, :c}) foo(=a, =b, =c) On Sat, Jun 22, 2013 at 8:26 PM, MRAB wrote: > On 22/06/2013 11:27, Anders Hovm?ller wrote: > >> Keyword arguments are great for increasing readability and making code >> more robust but in my opinion they are underused compared to the gains >> they can provide. You often end up with code like: >> >> foo(bar=bar, baz=baz, foobaz=foobaz) >> >> which is less readable than the ordered argument version when the names >> of the variables and the keywords match. 
( Here's another guy pointing >> out the same thing: >> http://stackoverflow.com/questions/7041752/any-reason-not-to-always-use-keyword-arguments#comment8553765_7041986 >> ) >> >> I have a suggestion that I believe can enable more usage of keyword >> arguments while still retaining almost all the brevity of ordered >> arguments: if the variable name to the right of the equal sign equals >> the keyword argument ("foo=foo") make it optional to just specify the >> name once ("=foo"). For completeness I suggest also make the same change >> for dictionaries: {'foo': foo} -> {:foo}. This change would turn the >> following code: >> >> a = 1 >> b = 2 >> c = 3 >> d = {'a':a, 'b':b, 'c':c} >> foo(a=a, b=b, c=c) >> >> into: >> >> a = 1 >> b = 2 >> c = 3 >> d = {:a, :b, :c} >> > > Shouldn't that mean: > > > d = {a:a, b:b, c:c} > > foo(=a, =b, =c) >> >> >> This should be compatible with existing code bases since the new forms >> are syntax errors in current python. >> >> What do you think? >> >> I'm not convinced. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Sat Jun 22 21:33:23 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 22 Jun 2013 20:33:23 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: On 22 June 2013 20:23, Anders Hovmöller wrote: > Well no, because: > > foo(**dict(a=a, b=b, c=c)) > foo(**{'a':a, 'b':b, 'c':c}) > foo(a=a, b=b, c=c) > > are all the same thing. So the short forms should match in the same way: > > foo(**dict(=a, =b, =c)) > foo(**{:a, :b, :c}) > foo(=a, =b, =c) What about: class Foo: bar = bar to class Foo: = bar ? I'm not convinced either.
I like the idea, but it's not that big a deal and I don't like your proposed implementation. There are so many more cases to cover and this doesn't fill them, nor its original one nicely. From boxed at killingar.net Sat Jun 22 21:51:06 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sat, 22 Jun 2013 21:51:06 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: Hmm, I wasn't aware that doing class Foo: bar = bar was even valid Python, blech. But that's a pretty contrived example. I'm not suggesting something huge and radical like totally redefining how assignment works :P I'm just suggesting a small change that I believe would have repercussions far above the weight class of the change itself. "It's the little things" and all that. > > I'm not convinced either. > I like the idea, but it's not that big a deal and I don't like your > proposed implementation. > Well I think it is a big deal. I think Objective-C code bases are much easier to maintain because they have a superior syntax for calling methods. I don't like it when other languages do something as simple as calling functions better than my otherwise favorite language :P > There are so many more cases to cover and this doesn't fill them, Like what? At least name one so we can have a discussion about it! > nor its original one nicely. > Ok. Why? I'm not saying you're wrong, I just want to know what the reasons are so I can understand why I was mistaken so I can forget about this idea :P best regards, Anders -------------- next part -------------- An HTML attachment was scrubbed...
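[Editorially, for readers following along: the brevity the proposal is after can be approximated in today's Python with a small helper that builds the keyword dict from names, so each name is written once. This is only an illustrative sketch; the helper name `pick` is hypothetical, not an existing stdlib function.]

```python
# A rough current-Python approximation of the proposed foo(=a, =b, =c)
# shorthand: build the keyword dict from names once, instead of writing
# each name twice. "pick" is a hypothetical helper, not a stdlib function.
def pick(namespace, *names):
    return {name: namespace[name] for name in names}

def foo(a, b, c):
    return a + b + c

a, b, c = 1, 2, 3
ns = {"a": a, "b": b, "c": c}  # in real code this could come from vars()

assert pick(ns, "a", "b", "c") == {"a": 1, "b": 2, "c": 3}
assert foo(**pick(ns, "a", "b", "c")) == 6  # same as foo(a=a, b=b, c=c)
```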
URL: From joshua.landau.ws at gmail.com Sat Jun 22 22:07:01 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 22 Jun 2013 21:07:01 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: On 22 June 2013 20:51, Anders Hovm?ller wrote: > Hmm, I wasn't aware that doing > > class Foo: > bar = bar > > was even valid python, blech. But that's a pretty contrived example. I'm not > suggesting something huge and radical like totally redefining how assignment > works :P I'm just suggesting a small change that I believe would have > repercussions far above the weight class of the change itself. "It's the > little things" and all that. Yes, but consistency, y'know. Why "bar = bar ? = bar" here and "bar = bar !? = bar" there? >> I'm not convinced either. >> I like the idea, but it's not that big a deal and I don't like your >> proposed implementation. > > > Well I think it is a big deal. I think Objective-C code bases are much > easier to maintain because they have a superior syntax for calling methods. > I don't like it when other languages do something as simple as calling > functions better than my otherwise favorite language :P I've skimmed a bit but I'm still unsure; why? How does Objective-C deal with this? >> There are so many more cases to cover and this doesn't fill them, > > Like what? At least name one so we can have a discussion about it! One? Well, the one above! I agree that classes seem a bit far-fetched (personally I dislike that syntax) but what about: def function(arg=arg): ... def function(arg): self.arg = arg thing = dictionary[thing] and so on, which are all of the same form, albeit with different "surroundings". We can't just double the whole syntax of Python for this! >> nor it's original one nicely. > > Ok. Why? 
I'm not saying you're wrong, I just want to know what the reasons > are so I can understand why I was mistaken so I can forget about this idea > :P Does foo(=bar) not bug you? Really? From fuzzyman at gmail.com Sat Jun 22 22:18:53 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Sat, 22 Jun 2013 22:18:53 +0200 Subject: [Python-ideas] Adding zope.testrunner test selections to unittest In-Reply-To: <20130622120204.59728fe4@anarchist> References: <20130618124133.4de46219@anarchist> <20130622020024.63e2e870@fsol> <20130622120204.59728fe4@anarchist> Message-ID: On 22 June 2013 18:02, Barry Warsaw wrote: > On Jun 21, 2013, at 05:14 PM, Gregory P. Smith wrote: > > >Agreed. I'm +1 on the idea of more powerful easier to use test selection. > > NOT on zope.testrunner's interface. > > I personally only ever use -t and that seems sufficient, since it > matches test name, class, module, or path. The key thing is that it > should be > possible to use multiple -t options. (It doesn't have to be -t). > I'd be very happy with this being added to test discovery. It's not even very difficult, testing it is more of a pain than implementing it! So yes, patches very welcome. Michael > > -Barry > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From boxed at killingar.net Sat Jun 22 23:01:56 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sat, 22 Jun 2013 23:01:56 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: > Yes, but consistency, y'know. Why "bar = bar ? = bar" here and "bar = > bar !? = bar" there? Keyword arguments aren't the same thing as assignment. The consistency ship has sailed and we're not on it :P > I've skimmed a bit but I'm still unsure; why? How does Objective-C > deal with this? They deal with it with the nuclear option: all methods are 100% of the time ordered AND named. Unfortunately this has the side effect of a lot of method calls like "[foo setX:x y:y width:width height:height font:font]" which is almost exactly the same as the worst case in python: "foo.setStuff(x=x, y=y, width=width, height=height, font=font)". This rather brutish approach forces a rather verbose style of writing, which is annoying when you just want to bang out a prototype but extremely valuable when having to maintain large code bases down the line. One of the main points about my suggestion is that there should be almost no overhead to using keyword arguments when just passing arguments along or when variable names are nice and descriptive. This would create a lower barrier to use, which in turn leads to more solid and more readable code (I hope!). > >> There are so many more cases to cover and this doesn't fill them, > > > > Like what? At least name one so we can have a discussion about it! > > One? Well, the one above! > > I agree that classes seem a bit far-fetched (personally I dislike that > syntax) but what about: > > def function(arg=arg): > ... > > def function(arg): > self.arg = arg (Where is self defined?) > > thing = dictionary[thing] > > and so on, which are all of the same form, albeit with different > "surroundings". 
We can't just double the whole syntax of Python for > this! I'm not suggesting that though. It seems to me like you're taking my suggestion ad absurdum. > Does foo(=bar) not bug you? > Really? > Compared to foo(bar=bar)? No. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jun 22 23:24:26 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 22 Jun 2013 14:24:26 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> On Jun 22, 2013, at 14:01, Anders Hovmöller wrote: >> I've skimmed a bit but I'm still unsure; why? How does Objective-C >> deal with this? > > They deal with it with the nuclear option: all methods are 100% of the time ordered AND named. Unfortunately this has the side effect of a lot of method calls like "[foo setX:x y:y width:width height:height font:font]" which is almost exactly the same as the worst case in python: "foo.setStuff(x=x, y=y, width=width, height=height, font=font)". > > This rather brutish approach forces a rather verbose style of writing, which is annoying when you just want to bang out a prototype but extremely valuable when having to maintain large code bases down the line. You initially said that ObjC deals with this problem better than Python, and now you say that it's better because it forces you to use the keyword names (actually they're part of the method name, but let's ignore that) _always_, which Python only forces you to do when not using them positionally. I don't understand why you're making this argument in support of a proposal that would make Python even less explicit about keyword names, less like ObjC, and, by your analysis, harder to maintain and therefore worse. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Sun Jun 23 03:26:22 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Jun 2013 11:26:22 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: On 23 June 2013 06:07, Joshua Landau wrote: > On 22 June 2013 20:51, Anders Hovm?ller wrote: >> Ok. Why? I'm not saying you're wrong, I just want to know what the reasons >> are so I can understand why I was mistaken so I can forget about this idea >> :P > > Does foo(=bar) not bug you? > Really? Indeed, any kind of "implied LHS" notation isn't going to happen in the foreseeable future for Python. That said, the repetitiveness of passing several local variables as keyword arguments *is* mildly annoying. A potential more fruitful avenue to explore may be to find a clean notation for extracting a subset of keys (along with their values) into a dictionary. With current syntax, the original example can already be written as something like this: >>> def submap(original, *names): ... return type(original)((name, original[name]) for name in names) ... >>> a, b, c = 1, 2, 3 >>> submap(locals(), "a", "b", "c") {'a': 1, 'b': 2, 'c': 3} >>> f(**submap(locals(), "a", "b", "c")) {'a': 1, 'b': 2, 'c': 3} >>> from collections import OrderedDict >>> o = OrderedDict.fromkeys((1, 2, 3, 4, 5)) >>> o OrderedDict([(1, None), (2, None), (3, None), (4, None), (5, None)]) >>> submap(o, 1, 3, 5) OrderedDict([(1, None), (3, None), (5, None)]) (There are also plenty of possible variants on that idea, including making it a class method of the container, rather than deriving the output type from the input type) This is potentially worth pursuing, as Python currently only supports "operator.itemgetter" and comprehensions/generator expressions as a mechanism for retrieving multiple items from a container in a single expression. 
A helper function in collections, or a new classmethod on collections.Mapping aren't outside the realm of possibility. For the question of "How do I enlist the compiler's help in ensuring a string is an identifier?", you can actually already do that with a simple helper class: >>> class Identifiers: ... def __getattr__(self, attr): ... return attr ... >>> ident = Identifiers() >>> ident.a 'a' >>> ident.b 'b' >>> ident.c 'c' Combining the two lets you write things like: >>> submap(locals(), ident.a, ident.b, ident.c) {'a': 1, 'b': 2, 'c': 3} Personally, I'd like to see more exploration of what the language *already supports* in this area, before we start talking about adding dedicated syntax. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Sun Jun 23 03:51:29 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 23 Jun 2013 11:51:29 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: Message-ID: <51C654A1.5020704@pearwood.info> On 22/06/13 20:27, Anders Hovmöller wrote: > Keyword arguments are great for increasing readability and making code more > robust but in my opinion they are underused compared to the gains they can > provide. You often end up with code like: > > foo(bar=bar, baz=baz, foobaz=foobaz) "Often"? In my experience, it's more like "very occasionally", and even then usually only one or two arguments: foo(spam, "yummy meatlike substance", foobaz=eggs) > which is less readable than the ordered argument version when the names of > the variables and the keywords match. ( Here's another guy pointing out the > same thing: > http://stackoverflow.com/questions/7041752/any-reason-not-to-always-use-keyword-arguments#comment8553765_7041986 > ) All very well and good, but you'll notice that in the entire discussion about keywords, with many, many examples given, there was *one* comment about matching names.
In practice I don't believe this happens often enough to be more than an occasional nuisance. Not even a nuisance really. There is a moment of micro-surprise "oh look, a name is repeated" but that's all. > I have a suggestion that I believe can enable more usage of keyword > arguments while still retaining almost all the brevity of ordered > arguments: if the variable name to the right of the equal sign equals the > keyword argument ("foo=foo") make it optional to just specify the name once > ("=foo"). For completeness I suggest also make the same change for > dictionaries: {'foo': foo} -> {:foo}. This change would turn the following > code: > > a = 1 > b = 2 > c = 3 > d = {'a':a, 'b':b, 'c':c} > foo(a=a, b=b, c=c) > > into: > > a = 1 > b = 2 > c = 3 > d = {:a, :b, :c} > foo(=a, =b, =c) Ewww, that's hideous, and far worse than the (non-)problem you are trying to solve. But even if you disagree about the ugliness of code written in that form, consider what a special case you are looking at: given a function that takes an argument "eggs", there is an infinite number of potential arguments you *could* give. You can give arguments in positional form or keyword form, as literals, expressions, or as a single name. Just the names alone, there is a (near enough to) infinite number of possibilities: function(eggs=a) function(eggs=b) function(eggs=x) function(eggs=counter) function(eggs=flag) function(eggs=sequence) function(eggs=value) function(eggs=spam) function(eggs=ham) function(eggs=foo) function(eggs=bar) Out of this vast number of possible names that might be given, you wish to introduce new syntax to cover just one single special case: spam(eggs=eggs) => spam(=eggs) The Zen of Python has not one but two koans that cover this idea: py> import this [...] Explicit is better than implicit. Special cases aren't special enough to break the rules. 
You would invent new syntax, which requires more complexity in the parser, more tests, more documentation, yet another thing for people to learn, for such an utterly tiny and marginal decrease in typing that even if we agreed it was worthwhile in that special case, it is (in my opinion) hardly worth the effort. So: - I don't believe this is a problem that needs to be solved; - Even if it is a problem, I don't believe your proposal is a good solution. -- Steven From boxed at killingar.net Sun Jun 23 10:22:26 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sun, 23 Jun 2013 10:22:26 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> Message-ID: > You initially said that ObjC deals with this problem better than Python, > and now you say that it's better because it forces you to use the keyword > names (actually they're part of the method name, but let's ignore that) > _always_, which Python only forces you to do it when not using them > positionally. > > I don't understand why you're making this argument in support of a > proposal that would make Python even less explicit about keyword names, > less like ObjC, and, by your analysis, harder to maintain and therefore > worse. > I think you and I are talking about different things when talking about "this problem". For me the problem is to avoid stuff like "foo(1, 'foo', None, 9, 'baz')", not avoid repeating names. I just believe that python has syntax that promotes positional arguments even when it makes the code worse. My suggestion might on the surface look like just a way to type less, but that misses the point. It's about shifting the balance towards keyword arguments. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Sun Jun 23 10:39:28 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 23 Jun 2013 20:39:28 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: <51C6B440.4000508@canterbury.ac.nz> Joshua Landau wrote: > What about: > > class Foo: > bar = bar That would be going too far, I think. I can't remember *ever* needing to write code like that in a class. On the other hand, passing function arguments received from a caller on to another function under the same names is very common. Also, it's a somewhat dubious thing to write anyway, since it relies on name lookups in a class scope working dynamically. While they currently do in CPython, I wouldn't like to rely on that always remaining the case. I'm not sure about the dictionary case. It's not strictly necessary, since if you have it for keyword arguments, you can do dict(=a, =b, =c). So I'm +1 on allowing this for function arguments, -0 for dicts, and -1 on anything else. -- Greg From steve at pearwood.info Sun Jun 23 11:21:06 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 23 Jun 2013 19:21:06 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> Message-ID: <51C6BE02.8020008@pearwood.info> On 23/06/13 18:22, Anders Hovm?ller wrote: > I think you and I are talking about different things when talking about > "this problem". For me the problem is to avoid stuff like "foo(1, 'foo', > None, 9, 'baz')", not avoid repeating names. Your suggestion doesn't do a thing to avoid code like the above, since all the arguments are literals. 
-- Steven From abarnert at yahoo.com Sun Jun 23 11:41:13 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 23 Jun 2013 02:41:13 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> Message-ID: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> On Jun 23, 2013, at 1:22, Anders Hovm?ller wrote: > >> You initially said that ObjC deals with this problem better than Python, and now you say that it's better because it forces you to use the keyword names (actually they're part of the method name, but let's ignore that) _always_, which Python only forces you to do it when not using them positionally. >> >> I don't understand why you're making this argument in support of a proposal that would make Python even less explicit about keyword names, less like ObjC, and, by your analysis, harder to maintain and therefore worse. > > I think you and I are talking about different things when talking about "this problem". For me the problem is to avoid stuff like "foo(1, 'foo', None, 9, 'baz')", not avoid repeating names. But your suggestion wouldn't affect that at all, as not a single one of the arguments is a variable, much less a variable with the same name as a keyword parameter. And I don't think it's a coincidence that you came up with a bad example--I think good examples are very rare. > I just believe that python has syntax that promotes positional arguments even when it makes the code worse. My suggestion might on the surface look like just a way to type less, but that misses the point. It's about shifting the balance towards keyword arguments. I don't think it does. It shifts the balance toward creating unnecessary local variables instead of explicit keyword names. 
Let's look at your example again, five different ways: foo(1, 'foo', None, 9, 'baz') foo(bar=1, baz='foo', qux=None, spam=9, eggs='baz') bar, baz, qux, spam, eggs = 1, 'foo', None, 9, 'baz' foo(bar, baz, qux, spam, eggs) bar, baz, qux, spam, eggs = 1, 'foo', None, 9, 'baz' foo(bar=bar, baz=baz, qux=qux, spam=spam, eggs=eggs) bar, baz, qux, spam, eggs = 1, 'foo', None, 9, 'baz' foo(=bar, =baz, =qux, =spam, =eggs) I'll agree that the 5th is better than the 4th. But the 2nd and 3rd are also much better than the 4th, and the 5th. In particular, if your arguments are already in variables with the same name as the parameters, adding the keyword names doesn't add anything. That may not be so obvious with this silly example, so let's take a real example: In an expression like "Barrier(4, f, 5)", it's completely unclear what the arguments mean without reading the help. Even with "Barrier(len(threads), callback, 5)" it's not very clear. But with "Barrier(parties, action, timeout)" there's no confusion at all. Your suggestion would do nothing to encourage the use of keywords in the first two cases, where they're essential, but only in the last case, where they don't add any information. On top of that, if the most natural names for your variables do not match the keywords, your change would encourage renaming them just to access the syntactic sugar. The only common case where I see this being useful is in the construction of dict (and other mappings), and I think the alternative ideas described by (IIRC) Nick Coghlan are much more interesting for that use case. But just because I can't imagine it doesn't mean it's not real. If you can show some real-life code, or even realistic fake code, where there are variables that match the parameter names, and the author either used keywords leading to overly verbose code, or should have used them but didn't leading to confusing code, please offer up the examples. 
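[Editorially: the Barrier illustration above can be made concrete. The real signature in the stdlib is threading.Barrier(parties, action=None, timeout=None), so the keywords are optional but self-documenting.]

```python
import threading

# Positional: at the call site it is unclear what 4 and 5 mean.
b1 = threading.Barrier(4, None, 5)

# With keywords the call documents itself.
b2 = threading.Barrier(parties=4, action=None, timeout=5)

assert b1.parties == b2.parties == 4
```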
-------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun Jun 23 11:46:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 23 Jun 2013 02:46:08 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> Message-ID: <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> In fact, going back to the SO comment you linked in your original answer, it argues against your idea for the same reason: > Indeed, keyword arguments are really useful when passing literals to a function. However, if the arguments are variables with clear enough names, it becomes very noisy. Consider: create_user(first_name=first_name, last_name=last_name, contact_email=contact_email, ...). He's clearly saying that you should use keywords when they add information, and not use them when they add nothing but noise. Your suggestion would make them add _less_ noise in one particular case where they add nothing but noise, but that's not a problem that needs to be solved, because Python already has a solution: don't use keywords in that case. Adding the extra = before each parameter just makes things less readable without giving any new information, so why should we add syntax to encourage it? Sent from a random iPhone On Jun 23, 2013, at 2:41, Andrew Barnert wrote: > On Jun 23, 2013, at 1:22, Anders Hovm?ller wrote: > >> >>> You initially said that ObjC deals with this problem better than Python, and now you say that it's better because it forces you to use the keyword names (actually they're part of the method name, but let's ignore that) _always_, which Python only forces you to do it when not using them positionally. 
>>> >>> I don't understand why you're making this argument in support of a proposal that would make Python even less explicit about keyword names, less like ObjC, and, by your analysis, harder to maintain and therefore worse. >> >> I think you and I are talking about different things when talking about "this problem". For me the problem is to avoid stuff like "foo(1, 'foo', None, 9, 'baz')", not avoid repeating names. > > But your suggestion wouldn't affect that at all, as not a single one of the arguments is a variable, much less a variable with the same name as a keyword parameter. > > And I don't think it's a coincidence that you came up with a bad example--I think good examples are very rare. > >> I just believe that python has syntax that promotes positional arguments even when it makes the code worse. My suggestion might on the surface look like just a way to type less, but that misses the point. It's about shifting the balance towards keyword arguments. > > I don't think it does. It shifts the balance toward creating unnecessary local variables instead of explicit keyword names. Let's look at your example again, five different ways: > > foo(1, 'foo', None, 9, 'baz') > > foo(bar=1, baz='foo', qux=None, spam=9, eggs='baz') > > bar, baz, qux, spam, eggs = 1, 'foo', None, 9, 'baz' > foo(bar, baz, qux, spam, eggs) > > bar, baz, qux, spam, eggs = 1, 'foo', None, 9, 'baz' > foo(bar=bar, baz=baz, qux=qux, spam=spam, eggs=eggs) > > bar, baz, qux, spam, eggs = 1, 'foo', None, 9, 'baz' > foo(=bar, =baz, =qux, =spam, =eggs) > > I'll agree that the 5th is better than the 4th. But the 2nd and 3rd are also much better than the 4th, and the 5th. > > In particular, if your arguments are already in variables with the same name as the parameters, adding the keyword names doesn't add anything. 
> > That may not be so obvious with this silly example, so let's take a real example: > > In an expression like "Barrier(4, f, 5)", it's completely unclear what the arguments mean without reading the help. Even with "Barrier(len(threads), callback, 5)" it's not very clear. But with "Barrier(parties, action, timeout)" there's no confusion at all. > > Your suggestion would do nothing to encourage the use of keywords in the first two cases, where they're essential, but only in the last case, where they don't add any information. > > On top of that, if the most natural names for your variables do not match the keywords, your change would encourage renaming them just to access the syntactic sugar. > > The only common case where I see this being useful is in the construction of dict (and other mappings), and I think the alternative ideas described by (IIRC) Nick Coghlan are much more interesting for that use case. > > But just because I can't imagine it doesn't mean it's not real. If you can show some real-life code, or even realistic fake code, where there are variables that match the parameter names, and the author either used keywords leading to overly verbose code, or should have used them but didn't leading to confusing code, please offer up the examples. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From boxed at killingar.net Sun Jun 23 13:17:07 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sun, 23 Jun 2013 13:17:07 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> Message-ID: > He's clearly saying that you should use keywords when they add > information, and not use them when they add nothing but noise. Your > suggestion would make them add _less_ noise in one particular case where > they add nothing but noise, but that's not a problem that needs to be > solved, because Python already has a solution: don't use keywords in that > case. Adding the extra = before each parameter just makes things less > readable without giving any new information, so why should we add syntax to > encourage it? > Because keyword arguments are less brittle. Consider the case you replied to: create_user(first_name=first_name, last_name=last_name, contact_email=contact_email) if you change it to: create_user(first_name, last_name, contact_email) it is more readable but it doesn't actually mean the same thing. The first piece of code will break in a nice way when the keyword argument list is changed in the definition of create_user. The second will probably fail, but later and in some not so nice way like suddenly you have emails in your database where you should've had addresses. Objective-C is better in this case because it strongly enforces something like keyword argument always. What I'm saying is that it'd be nice to be able to write code that uses keyword arguments 100% of the time for all function calls without making the readability worse. -------------- next part -------------- An HTML attachment was scrubbed... 
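[Editorially: the robustness point above can be demonstrated directly. If a signature is later reordered, keyword calls keep working while positional calls silently misbind. The create_user functions below are hypothetical stand-ins for the example in the message, not real API.]

```python
# Sketch of the brittleness argument: keyword calls fail loudly or keep
# working when a signature changes; positional calls misbind silently.
def create_user(first_name, last_name, contact_email):
    return {"name": first_name + " " + last_name, "email": contact_email}

# After a (hypothetical) refactor that reorders the parameters:
def create_user_v2(contact_email, first_name, last_name):
    return {"name": first_name + " " + last_name, "email": contact_email}

first_name, last_name, contact_email = "Ada", "Lovelace", "ada@example.com"

ok = create_user(first_name=first_name, last_name=last_name,
                 contact_email=contact_email)

# The keyword call still binds correctly against the reordered signature:
assert create_user_v2(first_name=first_name, last_name=last_name,
                      contact_email=contact_email) == ok

# The positional call now quietly puts the name into the email field:
broken = create_user_v2(first_name, last_name, contact_email)
assert broken["email"] == "Ada"
```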
URL: From steve at pearwood.info Sun Jun 23 14:22:16 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 23 Jun 2013 22:22:16 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> Message-ID: <51C6E878.1020003@pearwood.info> On 23/06/13 21:17, Anders Hovm?ller wrote: > Objective-C is better in this case because it strongly enforces something > like keyword argument always. What I'm saying is that it'd be nice to be > able to write code that uses keyword arguments 100% of the time for all > function calls without making the readability worse. Nobody is stopping you from using keyword arguments 100% of the time (except for built-in functions that don't accept keyword arguments, but most of them take only one or two arguments). Go right ahead. I love keyword arguments! But this discussion isn't about the pros and cons of keyword arguments. This discussion is about adding magic syntax for implicitly specifying the keyword parameter name when it happens to match an argument which is an expression consisting of a single name. 
With your suggestion, you can abbreviate this special case:

    create_user(first_name=first_name, last_name=last_name, contact_email=contact_email)

with this:

    create_user(=first_name, =last_name, =contact_email)

(which I consider too ugly for words), but it does absolutely nothing for:

    create_user(first_name=record[3], last_name=record[2], contact_email=record.email)
    create_user(first_name=personal_name, last_name=family_name, contact_email=email_address)
    create_user(first_name="Steven", last_name="D'Aprano", contact_email="steve at example.com")
    create_user(first_name=first_name.title(), last_name=last_name.title(), contact_email=validate_and_clean(contact_email))

The special case "parameter name matches exactly argument expression" is far too special, and the benefit far too minor, to deserve special syntax. Oh, one last thing... your suggestion is also brittle. If you refactor the variable name, or change the function parameter name, code using this shortcut will break. Parameter names are part of the function API and shouldn't change, but variable names are not, and should be free to change. With your suggestion, they can't. -- Steven From ncoghlan at gmail.com Sun Jun 23 15:05:49 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Jun 2013 23:05:49 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> Message-ID: On 23 Jun 2013 18:23, "Anders Hovmöller" wrote: > > >> You initially said that ObjC deals with this problem better than Python, and now you say that it's better because it forces you to use the keyword names (actually they're part of the method name, but let's ignore that) _always_, which Python only forces you to do when not using them positionally.
>> >> I don't understand why you're making this argument in support of a proposal that would make Python even less explicit about keyword names, less like ObjC, and, by your analysis, harder to maintain and therefore worse. > > > I think you and I are talking about different things when talking about "this problem". For me the problem is to avoid stuff like "foo(1, 'foo', None, 9, 'baz')", not avoid repeating names. I just believe that python has syntax that promotes positional arguments even when it makes the code worse. My suggestion might on the surface look like just a way to type less, but that misses the point. It's about shifting the balance towards keyword arguments. Then use Python 3 and declare your functions with keyword-only arguments. That means your APIs can no longer be invoked with positional arguments. You can do the same in Python 2 by accepting arbitrary kwargs and unpacking them with an inner function or retrieving them directly from the dictionary. We do this ourselves in the standard library for APIs where we expect it to significantly improve clarity at call sites (consider the "key" and "reverse" arguments to sorted and list.sort). Cheers, Nick. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
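Nick's Python 3 suggestion, keyword-only parameters declared after a bare *, can be sketched like this (create_user and its fields are illustrative names, not an API from the thread):

```python
# Python 3: parameters after the bare * can only be passed by keyword.
def create_user(*, first_name, last_name, contact_email):
    return (first_name, last_name, contact_email)

# Keyword calls work as usual:
result = create_user(first_name="Ada", last_name="Lovelace",
                     contact_email="ada@example.com")
assert result == ("Ada", "Lovelace", "ada@example.com")

# Positional calls are rejected outright with a TypeError:
try:
    create_user("Ada", "Lovelace", "ada@example.com")
    raised = False
except TypeError:
    raised = True
assert raised
```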
URL: From boxed at killingar.net Sun Jun 23 15:30:09 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Sun, 23 Jun 2013 15:30:09 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C6E878.1020003@pearwood.info> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> Message-ID: > (which I consider too ugly for words), but it does absolutely nothing for: > > > create_user(first_name=record[3], last_name=record[2], > contact_email=record.email) > > create_user(first_name=personal_name, last_name=family_name, > contact_email=email_address) > > create_user(first_name="Steven", last_name="D'Aprano", contact_email=" > steve at example.com ") > > create_user(first_name=first_name.title(), last_name=last_name.title(), > contact_email=validate_and_clean(contact_email)) > > The special case "parameter name matches exactly argument expression" is > far too special, and the benefit far too minor, to deserve special syntax. > I disagree. I think small things can have big impacts because the design of a system shapes the usage of the system. > Oh, one last thing... your suggestion is also brittle. If you refactor the > variable name, or change the function parameter name, code using this > shortcut will break. Let's go through that statement. Refactoring variable names: yes, if you search/replace without checking the diff or using a tool that doesn't understand the syntax that'd probably screw it up. Which of course is true whenever you use the wrong tool for the wrong job and you're sloppy about it. The code will still break by pointing out that there's no such argument to the function which is better than positional arguments, and if you're sloppy about it you'd screw up "foo(bar=bar)" when trying to rename "bar".
So that argument is pretty clearly moot. If you change the function parameter name: then all calls using keyword arguments will fail with a pretty good error message. This is 100% the same between python code today and with my shortcut. So again, moot. > Parameter names are part of the function API and shouldn't change, but > variable names are not, and should be free to change. With your suggestion, > they can't. The transformation to the code when changing variables names will in some cases be bigger yes. Saying that variable names aren't free to change is hyperbole though. As for parameter names being part of an API, well yes, that's true, but code changes. Just saying that we should never change the parameter names of any function after it has been called once isn't what you meant right? -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Sun Jun 23 15:42:48 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 23 Jun 2013 14:42:48 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C6B440.4000508@canterbury.ac.nz> References: <51C5EC6E.9050501@mrabarnett.plus.com> <51C6B440.4000508@canterbury.ac.nz> Message-ID: On 23 June 2013 09:39, Greg Ewing wrote: > Joshua Landau wrote: >> >> What about: >> >> class Foo: >> bar = bar > > That would be going too far, I think. I can't remember *ever* > needing to write code like that in a class. I have. Like, once, though. > Also, it's a somewhat dubious thing to write anyway, since it > relies on name lookups in a class scope working dynamically. > While they currently do in CPython, I wouldn't like to rely on > that always remaining the case. Is this not a defined behaviour? I wouldn't expect this to change before 4.0, and that's a different ballgame. Does it break in some other implementations? 
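The class-body behaviour Joshua and Greg are discussing is easy to demonstrate; a minimal sketch of `bar = bar` in a class body picking up an enclosing binding (the names are invented for illustration):

```python
bar = 42  # a binding in the enclosing (module) scope

class Foo:
    # Name lookups in a class body are dynamic: `bar` is not yet bound
    # in the class namespace when the right-hand side is evaluated, so
    # the lookup falls through to the enclosing scope. This works in
    # CPython, per the discussion above.
    bar = bar

assert Foo.bar == 42
```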
From joshua.landau.ws at gmail.com Sun Jun 23 15:49:00 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 23 Jun 2013 14:49:00 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: On 22 June 2013 22:01, Anders Hovmöller wrote: > >> Yes, but consistency, y'know. Why "bar = bar ? = bar" here and "bar = >> bar !? = bar" there? > > Keyword arguments aren't the same thing as assignment. The consistency ship > has sailed and we're not on it :P Fair 'nuf, except that you wanted {:this, :ugly, :not, :a, :set, :but, :looks, :like, :one} too. Why "we need to be consistent here" but "we don't here, btw"? >> I've skimmed a bit but I'm still unsure; why? How does Objective-C >> deal with this? > > They deal with it with the nuclear option: all methods are 100% of the time > ordered AND named. Unfortunately this has the side effect of a lot of method > calls like "[foo setX:x y:y width:width height:height font:font]" which is > almost exactly the same as the worst case in python: "foo.setStuff(x=x, y=y, > width=width, height=height, font=font)". > > This rather brutish approach forces a rather verbose style of writing, which > is annoying when you just want to bang out a prototype but extremely > valuable when having to maintain large code bases down the line. One of the > main points about my suggestion is that there should be almost no overhead > to using keyword arguments when just passing arguments along or when > variable names are nice and descriptive. This would create a lower barrier > to use, which in turn leads to more solid and more readable code (I hope!). As other people have pointed out, you've sort'a just contradicted yourself. >> >> There are so many more cases to cover and this doesn't fill them, >> > >> > Like what? At least name one so we can have a discussion about it! >> One? Well, the one above!
>> >> I agree that classes seem a bit far-fetched (personally I dislike that >> syntax) but what about: >> >> def function(arg=arg): >> ... >> >> def function(arg): >> self.arg = arg > > (Where is self defined?) Yeah, you know what I meant. >> thing = dictionary[thing] >> >> and so on, which are all of the same form, albeit with different >> "surroundings". We can't just double the whole syntax of Python for >> this! > > I'm not suggesting that though. It seems to me like you're taking my > suggestion in absurdum. Hang on; you asked me to point out cases this *didn't* cover. I'd be very surprised if you managed to both not-propose and propose any of these. >> Does foo(=bar) not bug you? >> Really? > > Compared to foo(bar=bar)? No. We'll just have to disagree. Strongly. From techtonik at gmail.com Sun Jun 23 18:49:01 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 23 Jun 2013 19:49:01 +0300 Subject: [Python-ideas] Python execution progress counter Message-ID: In x86 assembly there is a concept of instruction pointer (IP) that indicates where a computer is in its program sequence (c) http://en.wikipedia.org/wiki/Program_counter I wonder if it is possible to implement a similar thing in Python? Not the instruction pointer, which points to the linear memory address space (and therefore is a 1D structure), but a 2D Execution Progress Counter, which counts position in program sequence for every stack level? -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sun Jun 23 20:34:55 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 23 Jun 2013 21:34:55 +0300 Subject: [Python-ideas] argparse: --help-explain [options] ... 
In-Reply-To: <55A70540-2F29-4181-936B-3E99B059643C@yahoo.com> References: <55A70540-2F29-4181-936B-3E99B059643C@yahoo.com> Message-ID: On Fri, Jun 21, 2013 at 7:15 PM, Andrew Barnert wrote: > On Jun 21, 2013, at 1:10, anatoly techtonik wrote: > > > So, the idea is that for such applications with extensive command line > API, optparse (argparse or docopt) could provide --help-explain (or just > --explain) option that parses command line and explains what it found. > > You mean it shows just the help strings for the options given (in the > order given?) and nothing else? > Exactly. Although the logic can probably be extended to help users further. Like checking rules when input option combination is invalid, or should produce a warning, for example when using deprecated option. But right now I don't see how it can be implemented easily. That sounds like it could be pretty handy. And if it only works with valid > option combinations it sounds like it would be pretty easy too. I guess so. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun Jun 23 20:38:33 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 23 Jun 2013 11:38:33 -0700 Subject: [Python-ideas] Python execution progress counter In-Reply-To: References: Message-ID: On Jun 23, 2013, at 9:49, anatoly techtonik wrote: > In x86 assembly there is a concept of instruction pointer (IP) that indicates where a computer is in its program sequence (c) http://en.wikipedia.org/wiki/Program_counter > > > I wonder if it is possible to implement a similar thing in Python? Not the instruction pointer, which points to the linear memory address space (and therefore is a 1D structure), but a 2D Execution Progress Counter, which counts position in program sequence for every stack level? If I understand what you're asking here, the position for every stack level is just the return pointers on the stack. 
So, are you asking for sys._getframe to be changed from CPython implementation detail to part of the language? Also, what do you want this for? Are you unhappy with the debugger, or tracebacks, or with how hard it is to build alternatives to them? Looking to do something like generator suspension in a regular function? Hoping to suspend and save interpreter state to resume later, like some Smalltalk and Lisp interpreters? -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sun Jun 23 21:42:51 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 23 Jun 2013 22:42:51 +0300 Subject: [Python-ideas] Get the class that *defines* a method Message-ID: Currently, the only way to get the name of the class that *defines* a method is to get the chain of parent classes from inspect.getmro() and look into every class's dict until a given name is found. Knowing that dict contains not only methods, and knowing that it can be modified at run-time, this doesn't seem too reliable to me (unless Python itself does the same). At first I wanted to propose an enhancement to the runtime skeleton of Python, which is a 2D lookup tree for objects and the containers that define them. But then I realized that it may not reflect the model I need. For example, classes need to provide their parent classes, but in my model classes need to provide the module name (or function name, or method name) in which they are defined. And while writing this I realized that *definition* scope may be different from *run-time* scope, and Python doesn't make it clear:

>>> def lll():
...     class A(object):
...         pass
...     a=A()
...     return a
...
>>> lll()
<__main__.A object at 0x948252c>
>>> __main__.A
Traceback (most recent call last):
  File "", line 1, in
NameError: name '__main__' is not defined

The A object is said to be in the __main__ namespace, but it seems to be a run-time namespace local to the function, and it seems like Python loses this information.
There is no information about the function that defined the class (owner, parent or .?.), and hence no info about the container of the function, which makes it hard to assume the scope of variables for this class at run-time. So, the above is a generalization of a simple idea - store the "structure reference" of the class that *defines* a method inside this method. "structure reference" here is the address in the nested scopes formed by Python definitions. The specific action items for you here are:

1. will that stuff be useful (for me it brings some much needed consistency into the chaos of run-time Python object space)?
2. what is the best way to define/cache the reference to the class defining the method?
3. what is the best way to define/cache the reference to the scope defining the method?
4. what is the best way to organize storing of this static scope structure information at run-time?

See method.im_class note at http://docs.python.org/2/library/inspect.html#types-and-members "Namespaces are one honking great idea -- let's do more of those!" (c) import this -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From amauryfa at gmail.com Sun Jun 23 22:23:57 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sun, 23 Jun 2013 22:23:57 +0200 Subject: [Python-ideas] Get the class that *defines* a method In-Reply-To: References: Message-ID: Hi, 2013/6/23 anatoly techtonik > Currently, the only way to get the name of the class that *defines* a > method is to get chain of parent classes from inspect.getmro() and look > into every class's dict until a given name is found. Knowing that dict > contains not only methods, and knowing that it can be modified at run-time, > this doesn't seem too reliable for me (unless Python itself does the same).
> Python 3.3 has __qualname__, which may be very useful in your case: http://docs.python.org/3/whatsnew/3.3.html#pep-3155-qualified-name-for-classes-and-functions At first I wanted to propose an enhancement to runtime skeleton of Python, > which is 2D lookup tree for objects and containers that define them. But > then I realized that it will may not reflect the model I need. For example, > classes need to provide their parent classes, but in my model classes need > to provide module name (or function name, or method name) in which they are > defined. > > And while writing this I realized that *definition* scope may be different > from *run-time* scope, and Python doesn't make it clear: > > >>> def lll(): > ... class A(object): > ... pass > ... a=A() > ... return a > ... > >>> lll() > <__main__.A object at 0x948252c> > >>> __main__.A > Traceback (most recent call last): > File "", line 1, in > NameError: name '__main__' is not defined > > The A object is said to be in __main__ namespace, but it seems to be a > run-time namespace local to function and it seems like Python loses this > information. There is no information about the function that defined the > class (owner, parent or .?.), and hence no info about container of the > function, which makes it hard to assume the scope of variables for this > class at run-time. > > > So, the above is a generalization of a simple idea - store the "structure > reference" of the class that *defines* a method inside this method. > "structure reference" here is the address in the nested scopes formed by > Python definitions. > > The specific action items for you here are: > 1. is that stuff will be useful (for me it brings some much needed > consistency into the chaos of run-time Python object space) > 2. what is the best way to define/cache the reference to the class > defining the method? > 3. what is the best way to define/cache the reference to the scope > defining the method? > 4. 
what is the best way to organize storing of this static scope structure > information at run-time? > > See method.im_class note at > http://docs.python.org/2/library/inspect.html#types-and-members > > "Namespaces are one honking great idea -- let's do more of those!" (c) > import this > -- > anatoly t. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jun 23 22:25:39 2013 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 23 Jun 2013 21:25:39 +0100 Subject: [Python-ideas] Python execution progress counter In-Reply-To: References: Message-ID: On 2013-06-23 17:49, anatoly techtonik wrote: > In x86 assembly there is a concept of instruction pointer (IP) that indicates > where a computer is in its program sequence (c) > http://en.wikipedia.org/wiki/Program_counter > > I wonder if it is possible to implement a similar thing in Python? Not the > instruction pointer, which points to the linear memory address space (and > therefore is a 1D structure), but a 2D Execution Progress Counter, which counts > position in program sequence for every stack level? The `f_lasti` attribute of frame objects records the index of the last bytecode instruction that was executed in that frame. http://docs.python.org/3/reference/datamodel.html#frame-objects This is, of course, an implementation detail of CPython. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
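As a small illustration of the frame attributes Robert mentions (this relies on sys._getframe, f_lasti and f_lineno, all CPython implementation details, and the helper name is invented here):

```python
import sys

def current_position():
    # Grab the caller's frame. f_lasti is the offset of the last
    # executed bytecode instruction in that frame; f_lineno is the
    # current source line number.
    frame = sys._getframe(1)
    return frame.f_lasti, frame.f_lineno

offset, lineno = current_position()
assert isinstance(offset, int) and offset >= 0   # bytecode offset in caller
assert isinstance(lineno, int) and lineno > 0    # caller's current line
```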
-- Umberto Eco From robert.kern at gmail.com Sun Jun 23 22:54:37 2013 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 23 Jun 2013 21:54:37 +0100 Subject: [Python-ideas] Get the class that *defines* a method In-Reply-To: References: Message-ID: On 2013-06-23 20:42, anatoly techtonik wrote: > Currently, the only way to get the name of the class that *defines* a method is > to get chain of parent classes from inspect.getmro() and look into every class's > dict until a given name is found. Knowing that dict contains not only methods, > and knowing that it can be modified at run-time, this doesn't seem too reliable > for me (unless Python itself does the same). I think you can most robustly replicate what Python does by walking up the MRO and checking for the method by calling each class's __getattribute__() until it stops giving you the same method. The last class to give you the same method that type(the_instance) gives you is the class that defines the method. > At first I wanted to propose an enhancement to runtime skeleton of Python, which > is 2D lookup tree for objects and containers that define them. But then I > realized that it will may not reflect the model I need. For example, classes > need to provide their parent classes, but in my model classes need to provide > module name (or function name, or method name) in which they are defined. > > And while writing this I realized that *definition* scope may be different from > *run-time* scope, and Python doesn't make it clear: > > >>> def lll(): > ... class A(object): > ... pass > ... a=A() > ... return a > ... > >>> lll() > <__main__.A object at 0x948252c> > >>> __main__.A > Traceback (most recent call last): > File "", line 1, in > NameError: name '__main__' is not defined > > The A object is said to be in __main__ namespace, but it seems to be a run-time > namespace local to function and it seems like Python loses this information. 
Classes defined in functions get assigned the name of the module that they are in, in this case `__main__`, which is a real module recorded under that name in sys.modules. The example you have given doesn't show what you think it shows, just that the __main__ module (which the interpreter executes its code in) doesn't have a reference to itself. In general, modules don't include themselves in their namespace. You can get the class of an instance in the usual manner: type(). It is true that you cannot access that class *by name alone*. This is not fixable, even in principle. > There is no information about the function that defined the class (owner, parent > or .?.), and hence no info about container of the function, which makes it hard > to assume the scope of variables for this class at run-time. If you had actually defined a method and actually used a variable from the local namespace (the only time where this information matters), the relevant information will be recorded in the `func_closure` attribute of the function object underneath the method.

[~]
|29> def foo(x):
...>     class A(object):
...>         def foo(self):
...>             print x
...>     return A()
...>

[~]
|30> a = foo(10)

[~]
|31> foo_method = type(a).foo

[~]
|32> foo_method

[~]
|33> foo_method.im
foo_method.im_class  foo_method.im_func  foo_method.im_self

[~]
|33> foo_method.im_func

[~]
|34> foo_method.im_func.func_closure
(,)

[~]
|35> foo_method.im_func.func_closure[0].cell_contents
10

You don't need to know about the function where the class definition is contained to get this information. The function objects carry it around with them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
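A minimal sketch of the MRO walk Robert describes; note this simplified variant checks each class's own __dict__ via vars() rather than calling __getattribute__ as he suggests, so it can be fooled by the run-time mutation anatoly worries about (defining_class, Base and Child are invented names):

```python
def defining_class(cls, name):
    # Walk the MRO; the first class whose own namespace contains
    # `name` is the one that defines it for instances of `cls`.
    for base in cls.__mro__:
        if name in vars(base):
            return base
    raise AttributeError(name)

class Base(object):
    def ping(self):
        return "ping"

class Child(Base):
    pass

assert defining_class(Child, "ping") is Base       # inherited from Base
assert defining_class(Child, "__init__") is object # falls back to object
```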
-- Umberto Eco From ron3200 at gmail.com Mon Jun 24 00:56:42 2013 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 23 Jun 2013 17:56:42 -0500 Subject: [Python-ideas] Function calling options [was: Short form for keyword arguments and dicts] In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> Message-ID: On 06/22/2013 08:26 PM, Nick Coghlan wrote: > This is potentially worth pursuing, as Python currently only supports > "operator.itemgetter" and comprehensions/generator expressions as a > mechanism for retrieving multiple items from a container in a single > expression. A helper function in collections, or a new classmethod on > collections.Mapping aren't outside the realm of possibility. > > For the question of "How do I enlist the compiler's help in ensuring a > string is an identifier?", you can actually already do that with a > simple helper class:
>
> >>> class Identifiers:
> ...     def __getattr__(self, attr):
> ...         return attr
> ...
> >>> ident = Identifiers()
> >>> ident.a
> 'a'
> >>> ident.b
> 'b'
> >>> ident.c
> 'c'
>
> Combining the two lets you write things like:
>
> >>> submap(locals(), ident.a, ident.b, ident.c)
> {'a': 1, 'b': 2, 'c': 3}

That's interesting. I noticed there isn't a way to easily split a dictionary with a list of keys, other than using a loop. Any syntax that reduces name=name to a single name would look to me like a pass-by-reference notation. I don't think that would be good. (?) What I see in examples like the above is the amount of additional work the CPU needs to do. I think that is more of an issue than name=name. > Personally, I'd like to see more exploration of what the language > *already supports* in this area, before we start talking about adding > dedicated syntax. It may be easier to experiment with stuff like this if we could make the following possible. (*Not taking into account some things, like closures, etc... to keep the idea clear.) Where:

    result = f(...)    # normal function call

is the same as:

    args, kwds = func_parse_signature(f, ...)
    name_space = func_make_name_space(f, args, kwds)
    result = func_call_with_args(f, args, kwds)
    result = func_call_code(f, name_space)

These functions create a way to reuse a function's parts in new ways. But it doesn't go the full step of being able to take functions apart and reassemble them. That's harder to do and keep everything working. Some of this is currently doable, but it involves hacking a function object/type or using exec. For example, the signature could be parsed by a decorator, with added features. Then the decorator could skip the function's normal signature parsing step, and call the function with the args and kwds instead. And possibly a name space could be reused with a function, which would have the effect of it having static variables, or a class with attributes used for that purpose. (Yes, care would be needed.) These functions might also be usable as stacked decorators. (It'll need some experimenting to make this work.)

    @func_call_code
    @func_make_name_space
    @func_parse_signature
    def foo(...):
        ...

Would just break up calling a function into sub steps. Additional decorators could be stuck between those to check or alter the intermediate results. Something like that might be useful for some types of testing. It's probably easier to check the values in the created name space before it's used than it is to check the arguments before they are parsed by the signature. If these could be C functions and corresponded to their own byte codes, they might enable some interesting internal optimisations.
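Ron's func_parse_signature and friends are hypothetical, but the closest existing analogue to the "parse the call against the signature" step is inspect.signature's bind machinery (inspect.signature is available from Python 3.3; apply_defaults arrived in a later 3.x release):

```python
import inspect

def f(a, b=2, *, c=3):
    return a + b + c

sig = inspect.signature(f)
bound = sig.bind(1, c=10)   # parse the call against the signature
bound.apply_defaults()      # fill in defaults, roughly "make the name space"

assert dict(bound.arguments) == {"a": 1, "b": 2, "c": 10}
# Replaying the parsed call gives the same result as calling f directly:
assert f(*bound.args, **bound.kwargs) == 13
```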
Just a few initial thoughts to follow up on, Ron From ncoghlan at gmail.com Mon Jun 24 01:33:21 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 24 Jun 2013 09:33:21 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <51C6B440.4000508@canterbury.ac.nz> Message-ID: On 23 Jun 2013 23:44, "Joshua Landau" wrote: > > On 23 June 2013 09:39, Greg Ewing wrote: > > Joshua Landau wrote: > >> > >> What about: > >> > >> class Foo: > >> bar = bar > > > > That would be going too far, I think. I can't remember *ever* > > needing to write code like that in a class. > > I have. Like, once, though. > > > Also, it's a somewhat dubious thing to write anyway, since it > > relies on name lookups in a class scope working dynamically. > > While they currently do in CPython, I wouldn't like to rely on > > that always remaining the case. > > Is this not a defined behaviour? I wouldn't expect this to change > before 4.0, and that's a different ballgame. > Does it break in some other implementations? It's defined behaviour. It's only function scopes which force assignment targets to be purely local. Cheers, Nick. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Mon Jun 24 01:54:04 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 24 Jun 2013 11:54:04 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <51C6B440.4000508@canterbury.ac.nz> Message-ID: <51C78A9C.1050506@canterbury.ac.nz> Joshua Landau wrote: >>> class Foo: >>> bar = bar > > On 23 June 2013 09:39, Greg Ewing wrote: > >>Also, it's a somewhat dubious thing to write anyway, since it >>relies on name lookups in a class scope working dynamically. >>While they currently do in CPython, I wouldn't like to rely on >>that always remaining the case. > > Is this not a defined behaviour? According to the Language Reference, section 4.1, "Naming and binding": If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. No distinction is made there between function bodies and other kinds of block. Also in that section, A class definition is an executable statement that may use and define names. These references follow the normal rules for name resolution. So I would conclude that the above code is technically illegal. -- Greg From greg.ewing at canterbury.ac.nz Mon Jun 24 01:34:49 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 24 Jun 2013 11:34:49 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C6E878.1020003@pearwood.info> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> Message-ID: <51C78619.3010405@canterbury.ac.nz> Steven D'Aprano wrote: > The special case "parameter name matches exactly argument expression" is > far too special, and the benefit far too minor, to deserve special syntax. 
It occurs more often than you might think, because taking parameters that you've been passed and passing them on to another function is a common pattern. > Oh, one last thing... your suggestion is also brittle. If you refactor > the variable name, or change the function parameter name, code using > this shortcut will break. Parameter names are part of the function API > and shouldn't change, but variable names are not But it's not just any variable, it's a parameter to your function, so it's not likely to change its name either. -- Greg From abarnert at yahoo.com Mon Jun 24 09:21:29 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 24 Jun 2013 00:21:29 -0700 Subject: [Python-ideas] Get the class that *defines* a method In-Reply-To: References: Message-ID: <7179B567-3BA3-4B2C-A249-EBEF8DC4D36B@yahoo.com> If you have the result of lll() and you want to get its type... Just call type on it:

    a = lll()
    A = type(a)

It doesn't matter where the class was defined; this always works. If you're trying to access the type based on the information in an instance's repr, you're doing it wrong. The fact that it doesn't work in this case is irrelevant. It also doesn't work for any class that defines a __str__ or __repr__ or inherits from another class that does so. (It may not work even for a class deliberately designed to use the default object.__repr__ when run under different interpreters, because the particular format of that repr is just an implementation detail of CPython.) Let's give a very simple example:

>>> class Foo(list): pass
>>> a = Foo()
>>> a
[]

You can't get the type from the repr. (Actually, your example is doubly irrelevant. The error you got is just because __main__ doesn't have a reference to itself; it has absolutely nothing to do with classes. But if you'd written sys.modules['__main__'] instead, that would have shown what I think you wanted to show.)
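Andrew's point about the repr versus type() can be shown in a few lines (Foo is an illustrative name):

```python
class Foo(list):
    pass

a = Foo()
# The repr is inherited from list and says nothing about Foo:
assert repr(a) == "[]"
# type() always recovers the actual class, wherever it was defined:
assert type(a) is Foo
assert type(a).__name__ == "Foo"
```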
Sent from a random iPhone On Jun 23, 2013, at 12:42, anatoly techtonik wrote: > Currently, the only way to get the name of the class that *defines* a method is to get chain of parent classes from inspect.getmro() and look into every class's dict until a given name is found. Knowing that dict contains not only methods, and knowing that it can be modified at run-time, this doesn't seem too reliable for me (unless Python itself does the same). > > > At first I wanted to propose an enhancement to runtime skeleton of Python, which is 2D lookup tree for objects and containers that define them. But then I realized that it may not reflect the model I need. For example, classes need to provide their parent classes, but in my model classes need to provide module name (or function name, or method name) in which they are defined. > > And while writing this I realized that *definition* scope may be different from *run-time* scope, and Python doesn't make it clear: > > >>> def lll(): > ... class A(object): > ... pass > ... a=A() > ... return a > ... > >>> lll() > <__main__.A object at 0x948252c> > >>> __main__.A > Traceback (most recent call last): > File "", line 1, in > NameError: name '__main__' is not defined > > The A object is said to be in __main__ namespace, but it seems to be a run-time namespace local to function and it seems like Python loses this information. There is no information about the function that defined the class (owner, parent or .?.), and hence no info about container of the function, which makes it hard to assume the scope of variables for this class at run-time. > > > So, the above is a generalization of a simple idea - store the "structure reference" of the class that *defines* a method inside this method. "structure reference" here is the address in the nested scopes formed by Python definitions. > > The specific action items for you here are: > 1.
is that stuff will be useful (for me it brings some much needed consistency into the chaos of run-time Python object space) > 2. what is the best way to define/cache the reference to the class defining the method? > 3. what is the best way to define/cache the reference to the scope defining the method? > 4. what is the best way to organize storing of this static scope structure information at run-time? > > See method.im_class note at > http://docs.python.org/2/library/inspect.html#types-and-members > > "Namespaces are one honking great idea -- let's do more of those!" (c) import this > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 24 09:37:58 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 24 Jun 2013 00:37:58 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C78619.3010405@canterbury.ac.nz> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> Message-ID: On Jun 23, 2013, at 16:34, Greg Ewing wrote: > Steven D'Aprano wrote: >> The special case "parameter name matches exactly argument expression" is far too special, and the benefit far too minor, to deserve special syntax. > > It occurs more often than you might think, because taking > parameters that you've been passed and passing them on to > another function is a common pattern Yes, I do that all the time. But I can't think of a single case where there's any benefit to using keyword arguments. When you're forwarding your parameters exactly, the keywords are pure noise. 
Reducing the noise a little bit isn't nearly as good as just not creating it in the first place. Let's look at a typical such case: a class that encapsulates and delegates to another class--say, str. You'll have a bunch of methods like this: def encode(self, encoding, errors): return self.wrapped_str.encode(encoding, errors) What would be gained by changing it to: def encode(self, encoding, errors): return self.wrapped_str.encode(encoding=encoding, errors=errors) Or: def encode(self, encoding, errors): return self.wrapped_str.encode(=encoding, =errors) The reason for using keyword arguments is that often the meaning of positional arguments is unclear without looking up the function. That clearly isn't the case here. The meaning of the encoding and errors arguments is exactly as obvious without the keywords as with them. So, while I'll agree that the third version may be better than the second, it's still worse than the first, and Python already allows the first. This is an attempt to solve a problem that doesn't exist, and it doesn't even succeed in the attempt. From spaghettitoastbook at gmail.com Mon Jun 24 06:23:24 2013 From: spaghettitoastbook at gmail.com (SpaghettiToastBook .) Date: Mon, 24 Jun 2013 00:23:24 -0400 Subject: [Python-ideas] Fwd: Iterable unpacking within containers In-Reply-To: References: Message-ID: I think it would be very convenient if the * syntax was extended to allow unpacking iterables inside containers. For example: >>> x = range(1, 4) >>> y = [-2, *(-1, 0), *x, 4, 5] >>> print(y) [-2, -1, 0, 1, 2, 3, 4, 5] This would be especially convenient if two lists needed to be combined, but one list's contents needed to be in the middle of the other one. I don't think any code would break because the suggested syntax currently raises a SyntaxError.
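[For the record: the syntax proposed above was later accepted as PEP 448, "Additional Unpacking Generalizations", and is valid from Python 3.5 onward, so on a modern interpreter the example runs as written. A short sketch:]

```python
# PEP 448 (Python 3.5+): iterable unpacking inside displays.
x = range(1, 4)
y = [-2, *(-1, 0), *x, 4, 5]
assert y == [-2, -1, 0, 1, 2, 3, 4, 5]

# The generalization also covers set, dict and tuple displays:
assert {*x, 0} == {0, 1, 2, 3}
assert {**{'a': 1}, 'b': 2} == {'a': 1, 'b': 2}
assert (*x, 4) == (1, 2, 3, 4)
```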
From jonathan at slenders.be Mon Jun 24 13:51:06 2013 From: jonathan at slenders.be (Jonathan Slenders) Date: Mon, 24 Jun 2013 13:51:06 +0200 Subject: [Python-ideas] Fwd: Iterable unpacking within containers In-Reply-To: References: Message-ID: What about: >>> y = [-2] + list((-1, 0)) + list(x) + [4, 5] or itertools.chain? 2013/6/24 SpaghettiToastBook . > I think it would be very convenient if the * syntax was extended to > allow unpacking iterables inside containers. For example: > > >>> x = range(1, 4) > >>> y = [-2, *(-1, 0), *x, 4, 5] > >>> print(y) > [-2, -1, 0, 1, 2, 3, 4, 5] > > This would be especially convenient if two lists needed to be > combined, but one list's contents needed to be in the middle of the > other one. I don't think any code would break because the suggested > syntax currently raises a SyntaxError. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Mon Jun 24 13:57:26 2013 From: michelelacchia at gmail.com (Michele Lacchia) Date: Mon, 24 Jun 2013 13:57:26 +0200 Subject: [Python-ideas] Fwd: Iterable unpacking within containers In-Reply-To: References: Message-ID: Well, honestly the * syntax is a lot more readable and immediate to understand, given the fact that it is already used in function arguments. On Mon, Jun 24, 2013 at 1:51 PM, Jonathan Slenders wrote: > What about: > > >>> y = [-2] + list((-1, 0)) + list(x) + [4, 5] > > or itertools.chain? > > > > > 2013/6/24 SpaghettiToastBook . > > I think it would be very convenient if the * syntax was extended to >> allow unpacking iterables inside containers.
For example: >> >> >>> x = range(1, 4) >> >>> y = [-2, *(-1, 0), *x, 4, 5] >> >>> print(y) >> [-2, -1, 0, 1, 2, 3, 4, 5] >> >> This would be especially convenient if two lists needed to be >> combined, but one list's contents needed to be in the middle of the >> other one. I don't think any code would break because the suggested >> syntax currently raises a SyntaxError. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Michele Lacchia -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Mon Jun 24 13:59:24 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 24 Jun 2013 12:59:24 +0100 Subject: [Python-ideas] Fwd: Iterable unpacking within containers In-Reply-To: References: Message-ID: On 24 June 2013 05:23, SpaghettiToastBook . wrote: > I think it would be very convenient if the * syntax was extended to > allow unpacking iterables inside containers. For example: > >>>> x = range(1, 4) >>>> y = [-2, *(-1, 0), *x, 4, 5] >>>> print(y) > [-2, -1, 0, 1, 2, 3, 4, 5] > > This would be especially convenient if two lists needed to be > combined, but one list's contents needed to be in the middle of the > other one. I don't think any code would break because the suggested > syntax currently raises a SyntaxError. You might well want to read all of http://bugs.python.org/issue2292. I'm a huge fan, but it's not here yet ;).
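[Jonathan's two 2013-era alternatives from earlier in the thread, spelled out so the trade-off is visible — concatenation builds an intermediate list for each piece, while itertools.chain iterates lazily and materializes the result once:]

```python
from itertools import chain

x = range(1, 4)

# Concatenation: each list() call and each + builds an intermediate list.
y1 = [-2] + list((-1, 0)) + list(x) + [4, 5]

# chain: one pass over all the pieces, one resulting list.
y2 = list(chain([-2], (-1, 0), x, [4, 5]))

assert y1 == y2 == [-2, -1, 0, 1, 2, 3, 4, 5]
```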
From ncoghlan at gmail.com Mon Jun 24 14:16:44 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 24 Jun 2013 22:16:44 +1000 Subject: [Python-ideas] Fwd: Iterable unpacking within containers In-Reply-To: References: Message-ID: On 24 June 2013 21:59, Joshua Landau wrote: > On 24 June 2013 05:23, SpaghettiToastBook . > wrote: >> I think it would be very convenient if the * syntax was extended to >> allow unpacking iterables inside containers. For example: >> >>>>> x = range(1, 4) >>>>> y = [-2, *(-1, 0), *x, 4, 5] >>>>> print(y) >> [-2, -1, 0, 1, 2, 3, 4, 5] >> >> This would be especially convenient if two lists needed to be >> combined, but one list's contents needed to be in the middle of the >> other one. I don't think any code would break because the suggested >> syntax currently raises a SyntaxError. > > You might well want to read all of http://bugs.python.org/issue2292. > > I'm a huge fan, but it's not here yet ;). Yeah, what's needed is for a sufficiently motivated individual to take the existing patch, update it to target 3.4, and write up a PEP that details the exact changes proposed (not necessarily in that order). The core developer reaction has generally been mildly positive, it's just something that's on our "nice to have" lists rather than our respective "I want this enough to work on it myself" lists :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Mon Jun 24 17:40:43 2013 From: barry at python.org (Barry Warsaw) Date: Mon, 24 Jun 2013 11:40:43 -0400 Subject: [Python-ideas] Short form for keyword arguments and dicts References: Message-ID: <20130624114043.6955b6a7@anarchist> On Jun 22, 2013, at 12:27 PM, Anders Hovmöller wrote: >Keyword arguments are great for increasing readability and making code more >robust but in my opinion they are underused compared to the gains they can >provide.
You often end up with code like: > >foo(bar=bar, baz=baz, foobaz=foobaz) The DRY motivation for this proposal reminds me of PEP 292 ($-strings) and flufl.i18n. In some of the earlier i18n work I did, the repetition was overwhelmingly inconvenient. Generally, it doesn't bother me, but I really hated doing things like: real_name = get_real_name() email_address = get_email_address() message = _('Hi $real_name <$email_address>').safe_substitute( real_name=real_name, email_address=email_address) not to mention the high probability of typos, and the added noise making the source harder to read. flufl.i18n then, shortens this to: real_name = get_real_name() email_address = get_email_address() message = _('Hi $real_name <$email_address>') The locals and globals are collected into the substitution dictionary, to be applied after the source string is translated. Yes, the implementation uses the dreaded sys._getframe(), but it's worth it. http://tinyurl.com/kqy3hbz http://tinyurl.com/lalxjaf Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From python at mrabarnett.plus.com Mon Jun 24 17:55:36 2013 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 24 Jun 2013 16:55:36 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130624114043.6955b6a7@anarchist> References: <20130624114043.6955b6a7@anarchist> Message-ID: <51C86BF8.3050902@mrabarnett.plus.com> On 24/06/2013 16:40, Barry Warsaw wrote: > On Jun 22, 2013, at 12:27 PM, Anders Hovm?ller wrote: > >>Keyword arguments are great for increasing readability and making code more >>robust but in my opinion they are underused compared to the gains they can >>provide. You often end up with code like: >> >>foo(bar=bar, baz=baz, foobaz=foobaz) > > The DRY motivation for this proposal reminds me of PEP 292 ($-strings) and > flufl.i18n. 
In some of the earlier i18n work I did, the repetition was > overwhelmingly inconvenient. Generally, it doesn't bother me, but I really > hated doing things like: > > real_name = get_real_name() > email_address = get_email_address() > message = _('Hi $real_name <$email_address>').safe_substitute( > real_name=real_name, email_address=email_address) > > not to mention the high probability of typos, and the added noise making the > source harder to read. flufl.i18n then, shortens this to: > > real_name = get_real_name() > email_address = get_email_address() > message = _('Hi $real_name <$email_address>') > > The locals and globals are collected into the substitution dictionary, to be > applied after the source string is translated. > > Yes, the implementation uses the dreaded sys._getframe(), but it's worth it. > > http://tinyurl.com/kqy3hbz > http://tinyurl.com/lalxjaf > Do you have to use sys._getframe()? Although it's a little longer: import re def _(message, variables): return re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], message) You can then do this: >>> m = 'Hi $real_name <$email_address>' >>> real_name = "REAL NAME" >>> email_address = "EMAIL ADDRESS" >>> _(m, globals()) 'Hi REAL NAME <EMAIL ADDRESS>' >>> _(m, locals()) 'Hi REAL NAME <EMAIL ADDRESS>' >>> def test(): real_name = "LOCAL REAL NAME" email_address = "LOCAL EMAIL ADDRESS" print(_(m, locals())) From amcnabb at mcnabbs.org Mon Jun 24 18:23:56 2013 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 24 Jun 2013 11:23:56 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C78619.3010405@canterbury.ac.nz> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> Message-ID: <20130624162356.GJ22763@mcnabbs.org> On Mon, Jun 24, 2013 at 11:34:49AM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >The special case "parameter name matches exactly argument > >expression" is
far too special, and the benefit far too minor, to > >deserve special syntax. > > It occurs more often than you might think, because taking > parameters that you've been passed and passing them on to > another function is a common pattern. Another use case where this would come in handy is with string formatting: >>> print('{spam} and {eggs}'.format(spam=spam, eggs=eggs)) I've seen people use an awful workaround for this: >>> print('{spam} and {eggs}'.format(locals())) While it looks a little magical, the proposed syntax would be an improvement (especially when there are many arguments): >>> print('{spam} and {eggs}'.format(=spam, =eggs)) I'm not sure if the proposed solution is necessarily the best, but it's not as contrived as some commenters have made it out to be. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From stephen at xemacs.org Mon Jun 24 19:50:27 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 25 Jun 2013 02:50:27 +0900 Subject: [Python-ideas] [Suspected Spam] Re: Short form for keyword arguments and dicts In-Reply-To: <20130624162356.GJ22763@mcnabbs.org> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew McNabb writes: > I've seen people use an awful workaround for this: > > >>> print('{spam} and {eggs}'.format(locals())) > > While it looks a little magical, the proposed syntax would be an > improvement (especially when there are many arguments): > > >>> print('{spam} and {eggs}'.format(=spam, =eggs)) You're proposing that the "awful" workaround be made magical, builtin, and available to be used in any situation whether appropriate or not? I'll take the explicit use of locals any time. 
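[The flufl.i18n-style helper Barry describes earlier in the thread can be sketched in a few lines. This is a hypothetical minimal version for illustration only — the real library also runs the string through a translation catalog before substituting:]

```python
import string
import sys

def _(template):
    # Collect the caller's globals and locals into one substitution
    # dict (locals win), then apply them to a PEP 292 $-string.
    frame = sys._getframe(1)
    substitutions = dict(frame.f_globals)
    substitutions.update(frame.f_locals)
    return string.Template(template).safe_substitute(substitutions)

def greet():
    real_name = 'Anne Person'
    email_address = 'anne@example.com'
    return _('Hi $real_name <$email_address>')

print(greet())  # -> Hi Anne Person <anne@example.com>
```

safe_substitute (rather than substitute) keeps a missing name from raising, which matches the "leave unknown placeholders alone" behavior you want at a translation call site.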
From amcnabb at mcnabbs.org Mon Jun 24 19:58:17 2013 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 24 Jun 2013 12:58:17 -0500 Subject: [Python-ideas] [Suspected Spam] Re: Short form for keyword arguments and dicts In-Reply-To: <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130624175817.GL22763@mcnabbs.org> On Tue, Jun 25, 2013 at 02:50:27AM +0900, Stephen J. Turnbull wrote: > Andrew McNabb writes: > > > I've seen people use an awful workaround for this: > > > > >>> print('{spam} and {eggs}'.format(locals())) > > > > While it looks a little magical, the proposed syntax would be an > > improvement (especially when there are many arguments): > > > > >>> print('{spam} and {eggs}'.format(=spam, =eggs)) > > You're proposing that the "awful" workaround be made magical, builtin, > and available to be used in any situation whether appropriate or not? No, I'm not. That would look like this: >>> print('{spam} and {eggs}'.format()) And it would be an extraordinarily bad idea. However, the OP proposed something else, so I'm not sure how relevant this is. I'm not even sure I like it, but many of the responses have denied the existence of the use case rather than criticizing the solution. > I'll take the explicit use of locals any time. I don't think anyone likes the idea of magically passing locals into all function calls. 
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From tjreedy at udel.edu Mon Jun 24 20:03:58 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 24 Jun 2013 14:03:58 -0400 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130624162356.GJ22763@mcnabbs.org> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: On 6/24/2013 12:23 PM, Andrew McNabb wrote: > Another use case where this would come in handy is with string > formatting: > >>>> print('{spam} and {eggs}'.format(spam=spam, eggs=eggs)) > > I've seen people use an awful workaround for this: > >>>> print('{spam} and {eggs}'.format(locals())) That should be print('{spam} and {eggs}'.format(**locals())) Why do you see an intended usage as an 'awful workaround'? If it is the inefficiency of unpacking and repacking a dict, that could be fixed. One possibility is for ** to just pass the mapping when the function only reads it, as is the case with .format (but how to know?). A direct solution for .format is to add a keyword-only mapping parameter: print('{spam} and {eggs}'.format(map=locals())) or mapping= names= or reps= (replacements) or strs= (strings) or dic= or ???=. > While it looks a little magical, the proposed syntax would be an > improvement (especially when there are many arguments): > >>>> print('{spam} and {eggs}'.format(=spam, =eggs)) It looks pretty hideous to me ;-). And it still has repetition ;-). 
-- Terry Jan Reedy From joshua.landau.ws at gmail.com Mon Jun 24 20:13:15 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 24 Jun 2013 19:13:15 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: On 24 June 2013 19:03, Terry Reedy wrote: > On 6/24/2013 12:23 PM, Andrew McNabb wrote: > >> Another use case where this would come in handy is with string >> formatting: >> >>>>> print('{spam} and {eggs}'.format(spam=spam, eggs=eggs)) >> >> >> I've seen people use an awful workaround for this: >> >>>>> print('{spam} and {eggs}'.format(locals())) > > > That should be > print('{spam} and {eggs}'.format(**locals())) > > Why do you see an intended usage as an 'awful workaround'? > > If it is the inefficiency of unpacking and repacking a dict, that could be > fixed. One possibility is for ** to just pass the mapping when the function > only reads it, as is the case with .format (but how to know?). A direct > solution for .format is to add a keyword-only mapping > parameter: > > print('{spam} and {eggs}'.format(map=locals())) > > or mapping= names= or reps= (replacements) or strs= (strings) or dic= or > ???=. Oh look! It's Guido's time machine! "Look, it's {what_you_wanted}!".format_map(locals()) Note that your suggestion would disallow: "{mapping}".format(mapping="HA") >> While it looks a little magical, the proposed syntax would be an >> improvement (especially when there are many arguments): >> >>>>> print('{spam} and {eggs}'.format(=spam, =eggs)) > > > It looks pretty hideous to me ;-). > And it still has repetition ;-). From eric at trueblade.com Mon Jun 24 20:16:26 2013 From: eric at trueblade.com (Eric V. 
Smith) Date: Mon, 24 Jun 2013 14:16:26 -0400 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: <51C88CFA.9000006@trueblade.com> On 6/24/2013 2:03 PM, Terry Reedy wrote: > On 6/24/2013 12:23 PM, Andrew McNabb wrote: > >> Another use case where this would come in handy is with string >> formatting: >> >>>>> print('{spam} and {eggs}'.format(spam=spam, eggs=eggs)) >> >> I've seen people use an awful workaround for this: >> >>>>> print('{spam} and {eggs}'.format(locals())) > > That should be > print('{spam} and {eggs}'.format(**locals())) or: print('{spam} and {eggs}'.format_map(locals())) which solves the inefficiency problem mentioned below. > Why do you see an intended usage as an 'awful workaround'? > > If it is the inefficiency of unpacking and repacking a dict, that could > be fixed. One possibility is for ** to just pass the mapping when the > function only reads it, as is the case with .format (but how to know?). > A direct solution for .format is to add a keyword-only mapping > parameter: > > print('{spam} and {eggs}'.format(map=locals())) Today that would be: print('{map[spam]} and {map[eggs]}'.format(map=locals())) -- Eric.
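[The working spellings from this subthread, side by side (a sketch). Note that when the mapping is passed as a single argument, the index form {m[spam]} is needed — {m.spam} would do attribute lookup, which fails on a plain dict:]

```python
spam, eggs = 'SPAM', 'EGGS'

s1 = '{spam} and {eggs}'.format(spam=spam, eggs=eggs)  # explicit keywords
s2 = '{spam} and {eggs}'.format(**locals())            # unpack the namespace
s3 = '{spam} and {eggs}'.format_map(locals())          # no dict repacking (3.2+)
s4 = '{m[spam]} and {m[eggs]}'.format(m=locals())      # mapping as one argument

assert s1 == s2 == s3 == s4 == 'SPAM and EGGS'
```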
From amcnabb at mcnabbs.org Mon Jun 24 20:23:14 2013 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 24 Jun 2013 13:23:14 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: <20130624182314.GM22763@mcnabbs.org> On Mon, Jun 24, 2013 at 02:03:58PM -0400, Terry Reedy wrote: > >>>>print('{spam} and {eggs}'.format(locals())) > > That should be > print('{spam} and {eggs}'.format(**locals())) Yes, you're right. > Why do you see an intended usage as an 'awful workaround'? Mainly because I think it's magical and ugly and reduces readability. It's subjective, but I always use the also-ugly but less magical: > print('{spam} and {eggs}'.format(spam=spam, eggs=eggs)) > >>>>print('{spam} and {eggs}'.format(=spam, =eggs)) > > It looks pretty hideous to me ;-). I agree. :) > And it still has repetition ;-). True, though it can be helpful to look at the function call and actually see what arguments are being passed in. Anyway, I do have plenty of code that suffers from function calls with f(x=x,y=y,z=z). Most of it is of the form that Greg Ewing pointed out, where one function passes arguments along to another function. I'm not sure if this is the right solution, and I agree that it's ugly, but there are use cases where it might help. I'm still probably -0 on it. 
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From barry at python.org Mon Jun 24 20:45:40 2013 From: barry at python.org (Barry Warsaw) Date: Mon, 24 Jun 2013 14:45:40 -0400 Subject: [Python-ideas] Short form for keyword arguments and dicts References: <20130624114043.6955b6a7@anarchist> <51C86BF8.3050902@mrabarnett.plus.com> Message-ID: <20130624144540.371a9a48@anarchist> On Jun 24, 2013, at 04:55 PM, MRAB wrote: >Do you have to use sys._getframe()? Of course, this being Python I don't *have* to use anything. ;) But after a ton of experimentation on a huge body of code, I found that keeping the _() call sites really simple (i.e. essentially just the source strings) made for the most readable API. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From stephen at xemacs.org Mon Jun 24 20:55:31 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 25 Jun 2013 03:55:31 +0900 Subject: [Python-ideas] [Suspected Spam] Re: [Suspected Spam] Re: Short form for keyword arguments and dicts In-Reply-To: <20130624175817.GL22763@mcnabbs.org> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> Message-ID: <87ehbr1gpo.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew McNabb writes: > > You're proposing that the "awful" workaround be made magical, builtin, > > and available to be used in any situation whether appropriate or not? > > No, I'm not. That would look like this: > > >>> print('{spam} and {eggs}'.format()) Ah, OK, that's right. 
Just goes to show that foo(=spam, =eggs) is really too confusing to be used. ;-) > > I'll take the explicit use of locals any time. > > I don't think anyone likes the idea of magically passing locals into all > function calls. My apologies, I didn't really think anybody wants "'{foo}'.format()" to DWIM. The intended comparison was to the proposed syntax, which I think is confusing and rather ugly. From boxed at killingar.net Mon Jun 24 21:13:20 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Mon, 24 Jun 2013 21:13:20 +0200 Subject: [Python-ideas] [Suspected Spam] Re: [Suspected Spam] Re: Short form for keyword arguments and dicts In-Reply-To: <87ehbr1gpo.fsf@uwakimon.sk.tsukuba.ac.jp> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <87ehbr1gpo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jun 24, 2013 at 8:55 PM, Stephen J. Turnbull wrote: > Andrew McNabb writes: > > > > You're proposing that the "awful" workaround be made magical, builtin, > > > and available to be used in any situation whether appropriate or not? > > > > No, I'm not. That would look like this: > > > > >>> print('{spam} and {eggs}'.format()) > > Ah, OK, that's right. Just goes to show that foo(=spam, =eggs) is > really too confusing to be used. ;-) > I think you've just been reading all the mails in this thread of people claiming I eat children and worship satan :P > > > > I'll take the explicit use of locals any time. > > > > I don't think anyone likes the idea of magically passing locals into all > > function calls. > > My apologies, I didn't really think anybody wants "'{foo}'.format()" > to DWIM. 
The intended comparison was to the proposed syntax, which I > think is confusing and rather ugly. Yet obviously people DO do stuff like: _('{foo}') which walks the stack to find the locals and then puts them in there. I think this shows there is some room for a middle ground that might disincentivize people from going to those extremes :P Again, it's not about the exact syntax I suggested, it's about that middle ground. -------------- next part -------------- An HTML attachment was scrubbed... URL: From boxed at killingar.net Mon Jun 24 20:43:04 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Mon, 24 Jun 2013 20:43:04 +0200 Subject: [Python-ideas] [Suspected Spam] Re: Short form for keyword arguments and dicts In-Reply-To: <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I'll take the explicit use of locals any time. Meaning you prefer "foo(bar=bar)" over "foo(**locals())" right? Because that seems to be the suggested solution here and I think that's pretty bad :P My suggestion isn't about introducing more magic, just a little bit of convenience for two common use cases: passing along variables with the same name to another function and throwing a bunch of variables into a dict (which is basically the same thing since "dict(foo=foo)" == "{'foo': foo}"). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From samuel.littley at toastwaffle.com Mon Jun 24 21:13:12 2013 From: samuel.littley at toastwaffle.com (Samuel Littley) Date: Mon, 24 Jun 2013 20:13:12 +0100 Subject: [Python-ideas] Catching of multiple exceptions without halting execution Message-ID: <51C89A48.9020008@toastwaffle.com> I find I often have to run several tests on data input, making a list of which tests succeed, which fail, and somehow determine where to go after running all the tests. Obviously, the standard way of showing that a test has failed is to raise an exception, caught by try/except/finally, however this would only allow one test to be flagged as failing, requiring multiple runs to correct every fault that may exist. I propose an alternative to try/except, as follows: validate: // Code to run tests, raising exceptions if tests fail accept: // Code to run if all tests pass (i.e. no exceptions) reject es: // Code to handle each failed test except: // Code to handle non-test related exceptions finally: // Code to be always executed The difference to a normal try/except is a different type of exception, which, rather than halting execution, is added to the list `es`, which the reject block could then loop through to display error messages, require re-entry, etc. Standard exceptions could be raised and caught by the except block. Alternatively, the except block could not be a part of this, and instead all exceptions are caught in the reject block, which could then raise exceptions itself to be caught by a try/except/finally around the validate/accept/reject/finally The use case I thought of is validating data entry (from web forms for example), where each exception creates an error message displayed on the form, however I'm pretty sure there would be other uses for this.
Samuel Littley From masklinn at masklinn.net Mon Jun 24 21:48:20 2013 From: masklinn at masklinn.net (Masklinn) Date: Mon, 24 Jun 2013 21:48:20 +0200 Subject: [Python-ideas] Catching of multiple exceptions without halting execution In-Reply-To: <51C89A48.9020008@toastwaffle.com> References: <51C89A48.9020008@toastwaffle.com> Message-ID: <33D88BBF-80BE-444A-9B62-E85CD9E7AF92@masklinn.net> On 2013-06-24, at 21:13 , Samuel Littley wrote: > I find I often have to run several tests on data input, making a list of > which tests succeed, which fail, and somehow determine where to go after > running all the tests. Obviously, the standard way of showing that a > test as failed is to raise an exception, caught by try/except/finally, > however this would only allow one test to be flagged as failing, > requiring multiple runs to correct every fault that may exist. > > I propose an alternative to try/except, as follows: > > validate: > // Code to run tests, raising exceptions if tests fail > accept: > // Code to run if all tests pass (i.e. no exceptions) > reject es: > // Code to handle each failed test > except: > // Code to handle non-test related exceptions > finally: > // Code to be always executed > > The difference to a normal try/except is a different type of exception, > which, rather than halting execution, is added to the list `es`, which > the reject block could then loop through to display error messages, > require re-entry, etc. Standard exceptions could be raised and caught by > the except block. Alternatively, the except block could not be a part of > this, and instead all exceptions are caught in the reject block, which > could then raise exceptions itself to be caught by a try/except/finally > around the validate/accept/reject/finally > > The use case I thought of is validating data entry (from web forms for > example), where each exception creates an error message displayed on the > form, however I'm pretty sure there would be other uses for this. 
Why not something along the lines of: errors = # validation code() if not errors: # no problem else: # problems ? Another option if the validation system has significant stack depth is to pass in an error handler callback as parameter (note: no matter how good it looks at first glance, I strongly recommend not using warnings.catch_warnings, it's fairly brittle and not thread-safe) Exceptions are a tool, not the only one. Although if you still *want* an exception-type system, I'd suggest considering the extension of exceptions into Signals[0]/Conditions[1] (as in Smalltalk or Common Lisp) instead: try: validate_stuff() except (ASignal, AnOtherSignal): # handle test failure resume # tells validation to keep running except: # usual fatal errors finally: # keep running (note: success is domain-dependent and should be handled through interactions of the various blocks: some tests failing may not be an issue, or the system may still be considered successful under a certain failure threshold, …) I believe this is a more general concept, a rather natural (if there's such a thing) super-set of existing exceptions and an already known and studied solution to the issue. It can also be used to solve a number of other problems. 
[0] http://www.gnu.org/software/smalltalk/manual/html_node/Handling-exceptions.html#Handling-exceptions http://live.exept.de/doc/online/english/programming/exceptions.html [1] http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html From python at mrabarnett.plus.com Mon Jun 24 22:00:01 2013 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 24 Jun 2013 21:00:01 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: <51C8A541.5030602@mrabarnett.plus.com> On 24/06/2013 19:13, Joshua Landau wrote: [snip] > Oh look! It's Guido's time machine! > > "Look, it's {what_you_wanted}!".format_map(locals()) > Thanks for pointing it out, Joshua. I'd never noticed it! :-) From boxed at killingar.net Mon Jun 24 22:13:24 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Mon, 24 Jun 2013 22:13:24 +0200 Subject: [Python-ideas] Catching of multiple exceptions without halting execution In-Reply-To: <51C89A48.9020008@toastwaffle.com> References: <51C89A48.9020008@toastwaffle.com> Message-ID: Why not just put those exceptions in a list instead of raising them? So instead of raise Exception(...) errors.append(Exception(....)) On Mon, Jun 24, 2013 at 9:13 PM, Samuel Littley < samuel.littley at toastwaffle.com> wrote: > I find I often have to run several tests on data input, making a list of > which tests succeed, which fail, and somehow determine where to go after > running all the tests. 
Obviously, the standard way of showing that a > test as failed is to raise an exception, caught by try/except/finally, > however this would only allow one test to be flagged as failing, > requiring multiple runs to correct every fault that may exist. > > I propose an alternative to try/except, as follows: > > validate: > // Code to run tests, raising exceptions if tests fail > accept: > // Code to run if all tests pass (i.e. no exceptions) > reject es: > // Code to handle each failed test > except: > // Code to handle non-test related exceptions > finally: > // Code to be always executed > > The difference to a normal try/except is a different type of exception, > which, rather than halting execution, is added to the list `es`, which > the reject block could then loop through to display error messages, > require re-entry, etc. Standard exceptions could be raised and caught by > the except block. Alternatively, the except block could not be a part of > this, and instead all exceptions are caught in the reject block, which > could then raise exceptions itself to be caught by a try/except/finally > around the validate/accept/reject/finally > > The use case I thought of is validating data entry (from web forms for > example), where each exception creates an error message displayed on the > form, however I'm pretty sure there would be other uses for this. > > Samuel Littley > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ron3200 at gmail.com Mon Jun 24 22:49:38 2013 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 24 Jun 2013 15:49:38 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> Message-ID: On 06/24/2013 01:03 PM, Terry Reedy wrote: >> While it looks a little magical, the proposed syntax would be an >> improvement (especially when there are many arguments): >> >>>>> print('{spam} and {eggs}'.format(=spam, =eggs)) > > It looks pretty hideous to me ;-). > And it still has repetition ;-). In larger programs, data like this would probably be in a dictionary or some other structure. There may be a large dictionary of many foods. And we would just use .format_map and not have any repetition where those are used. print('{spam} and {eggs}'.format_map(foods)) I think it would also be useful to get a sub_view of a dictionary. breakfast = ['spam', 'eggs'] print('{spam} and {eggs}'.format_map(foods.sub_view(breakfast))) The food dictionary might also have lunch items and dinner items in it, and we don't want to serve dinner for breakfast by mistake. 
;-) Cheers, Ron From steve at pearwood.info Tue Jun 25 00:41:27 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Jun 2013 08:41:27 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130624175817.GL22763@mcnabbs.org> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> Message-ID: <20130624224127.GA10419@ando> On 25/06/13 03:58, Andrew McNabb wrote: > I'm not even sure I like it, but many of the responses have denied the > existence of the use case rather than criticizing the solution. I haven't seen anyone deny that it is possible to write code like spam(ham=ham, eggs=eggs, toast=toast) What I've seen is people deny that it happens *often enough* to deserve dedicated syntax to "fix" it. (I use scare quotes here because I don't actually think that repeating the name that way is a problem that needs fixing.) People have criticized the solution, for its lack of explicitness, for being focused on such a narrow special case, for its (subjective) ugliness, and for its fragility. Refactoring a variable name shouldn't require you to change the way you pass it to a function, but with this proposal, it does. If you refactor the name "ham" to "spam" in func(=ham, eggs=SCRAMBLED, toast=None, coffee='white') you also need to change the implicit keyword syntax back to ordinary explicit keyword syntax, or it will break. (Worse than breaking, if func happens to have a parameter "spam" as well, it will silently do the wrong thing.) That's a "feature smell", like a code smell but for features. I cannot think of any other feature, in any other language, where changing a variable's name requires you to change the syntax you can use on it. 
The OP seems to believe that the existence of an occasional function call with a spam=spam parameter is a major factor in discouraging the use of keyword arguments everywhere. I do not believe this is the case, but even if it were, I say, oh well. We shouldn't want keyword arguments *everywhere*, but only where they add clarity rather than mere verbosity. -- Steven From abarnert at yahoo.com Tue Jun 25 00:42:40 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 24 Jun 2013 15:42:40 -0700 Subject: [Python-ideas] Catching of multiple exceptions without halting execution In-Reply-To: References: <51C89A48.9020008@toastwaffle.com> Message-ID: <4941A751-2B5C-4414-B790-E1AC090A2890@yahoo.com> On Jun 24, 2013, at 13:13, Anders Hovmöller wrote: > Why not just put those exceptions in a list instead of raising them? So instead of > > raise Exception(...) > > errors.append(Exception(....)) And if you're trying to call code that might raise, just wrap each call/iteration/whatever. Something like this: errors = [] for test in tests: try: do_test(test) except Exception as e: errors.append(e) You can also easily wrap the try-and-append up as a function (either a closure, or a method on an object that holds a self.errors), or even a decorator. There doesn't seem to be any need for new syntax here. > On Mon, Jun 24, 2013 at 9:13 PM, Samuel Littley wrote: >> I find I often have to run several tests on data input, making a list of >> which tests succeed, which fail, and somehow determine where to go after >> running all the tests. Obviously, the standard way of showing that a >> test as failed is to raise an exception, caught by try/except/finally, >> however this would only allow one test to be flagged as failing, >> requiring multiple runs to correct every fault that may exist. 
no exceptions) >> reject es: >> // Code to handle each failed test >> except: >> // Code to handle non-test related exceptions >> finally: >> // Code to be always executed >> >> The difference to a normal try/except is a different type of exception, >> which, rather than halting execution, is added to the list `es`, which >> the reject block could then loop through to display error messages, >> require re-entry, etc. Standard exceptions could be raised and caught by >> the except block. Alternatively, the except block could not be a part of >> this, and instead all exceptions are caught in the reject block, which >> could then raise exceptions itself to be caught by a try/except/finally >> around the validate/accept/reject/finally >> >> The use case I thought of is validating data entry (from web forms for >> example), where each exception creates an error message displayed on the >> form, however I'm pretty sure there would be other uses for this. >> >> Samuel Littley >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 25 00:46:32 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Jun 2013 08:46:32 +1000 Subject: [Python-ideas] Catching of multiple exceptions without halting execution In-Reply-To: <51C89A48.9020008@toastwaffle.com> References: <51C89A48.9020008@toastwaffle.com> Message-ID: Python 3.4 adds subtest support to the unittest module, allowing data driven test cases to easily capture exceptions as failures before moving on to check other inputs. The feature can also be used to capture the results of multiple assertions. Cheers, Nick. 
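The try-and-append wrapper suggested earlier in the thread can also be packaged as a context manager, so each check runs even when an earlier one fails. This is only a sketch of that idea; `collecting` and `check_positive` are illustrative names, not anything from the stdlib:

```python
from contextlib import contextmanager

errors = []

@contextmanager
def collecting(errors):
    # Catch one check's failure and record it instead of propagating,
    # so the remaining checks still run.
    try:
        yield
    except Exception as e:
        errors.append(e)

def check_positive(n):
    if n <= 0:
        raise ValueError("not positive: %r" % n)

for value in (3, -1, 0):
    with collecting(errors):
        check_positive(value)

# After the loop, `errors` holds the ValueError for -1 and for 0,
# and the caller can report every failure at once.
```

No new syntax is required: the with-block plays the role of the proposed `validate:` block, and inspecting `errors` afterwards plays the role of `reject es:`.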
-------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Jun 25 01:41:37 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Jun 2013 09:41:37 +1000 Subject: [Python-ideas] Catching of multiple exceptions without halting execution In-Reply-To: <51C89A48.9020008@toastwaffle.com> References: <51C89A48.9020008@toastwaffle.com> Message-ID: <20130624234137.GC10635@ando> On Mon, Jun 24, 2013 at 08:13:12PM +0100, Samuel Littley wrote: > I find I often have to run several tests on data input, making a list of > which tests succeed, which fail, and somehow determine where to go after > running all the tests. Sounds like you need a test framework, rather than new syntax. Have you looked at unittest, or outside of the standard library, nose? > Obviously, the standard way of showing that a > test as failed is to raise an exception, caught by try/except/finally, > however this would only allow one test to be flagged as failing, > requiring multiple runs to correct every fault that may exist. Both doctest and unittest in the standard library collect multiple exceptions in a single run. You could look at how they do it. > I propose an alternative to try/except, as follows: The bar to getting new syntax is very high. Even when a proposed feature is a good idea, it may not be introduced with new syntax. E.g. Enums, warnings. For example, there is *masses* of code out there that uses "validate" as a function name or other variable. Turning it into a keyword, as you suggest, would break people's code. There needs to be a really, really good reason to break people's code. 
no exceptions) > reject es: > // Code to handle each failed test > except: > // Code to handle non-test related exceptions > finally: > // Code to be always executed > > The difference to a normal try/except is a different type of exception, > which, rather than halting execution, is added to the list `es`, which > the reject block could then loop through to display error messages, > require re-entry, etc. Standard exceptions could be raised and caught by > the except block. Alternatively, the except block could not be a part of > this, and instead all exceptions are caught in the reject block, which > could then raise exceptions itself to be caught by a try/except/finally > around the validate/accept/reject/finally This can be trivially performed with existing syntax: exceptions = [] try: # instead of "validate" test_code() except (ValueError, TypeError) as err: # "expected exceptions", do nothing pass except KeyboardInterrupt: # Exceptions to allow through untouched raise except (ZeroDivisionError, UnicodeEncodeError) as err: # instead of "reject" # Expected failures you don't wish to ignore exceptions.append(err) except: # Catch all for everything else do_other_error() else: # instead of "accept" do_no_errors() finally: cleanup() So that's three new keywords we don't need, and ten thousand programs that won't be broken by this change :-) Put the whole thing inside a loop, and you have the beginnings of a test framework. 
-- Steven From greg.ewing at canterbury.ac.nz Tue Jun 25 01:49:56 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 25 Jun 2013 11:49:56 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> Message-ID: <51C8DB24.8050504@canterbury.ac.nz> Andrew Barnert wrote: > On Jun 23, 2013, at 16:34, Greg Ewing wrote: > >>It occurs more often than you might think, because taking >>parameters that you've been passed and passing them on to >>another function is a common pattern > > Yes, I do that all the time. But I can't think of a single case where > there's any benefit to using keyword arguments. That's puzzling, because the benefits are the same as with any other call that benefits from keyword arguments. Do you doubt the usefulness of keyword arguments in general? -- Greg From ncoghlan at gmail.com Tue Jun 25 02:39:15 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Jun 2013 10:39:15 +1000 Subject: [Python-ideas] Catching of multiple exceptions without halting execution In-Reply-To: References: <51C89A48.9020008@toastwaffle.com> Message-ID: On 25 June 2013 08:46, Nick Coghlan wrote: > Python 3.4 adds subtest support to the unittest module, allowing data driven > test cases to easily capture exceptions as failures before moving on to > check other inputs. The feature can also be used to capture the results of > multiple assertions. Specifically, the subtest support is based on a new test case method that returns a context manager and looks like this: >>> import unittest >>> class SubTestExample(unittest.TestCase): ... def test_values_even(self): ... values = range(6) ... for i in values: ... with self.subTest(i=i): ... self.assertEqual(i % 2, 0) ... 
>>> unittest.main() ====================================================================== FAIL: test_values_even (__main__.SubTestExample) (i=1) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 6, in test_values_even AssertionError: 1 != 0 ====================================================================== FAIL: test_values_even (__main__.SubTestExample) (i=3) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 6, in test_values_even AssertionError: 1 != 0 ====================================================================== FAIL: test_values_even (__main__.SubTestExample) (i=5) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 6, in test_values_even AssertionError: 1 != 0 ---------------------------------------------------------------------- Ran 1 test in 0.001s FAILED (failures=3) See http://docs.python.org/dev/library/unittest#distinguishing-test-iterations-using-subtests for more details. The first 3.4 alpha is scheduled for August, with the final release anticipated in February next year (see http://www.python.org/dev/peps/pep-0429/) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Tue Jun 25 02:40:34 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 25 Jun 2013 12:40:34 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130624224127.GA10419@ando> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> Message-ID: <51C8E702.6000509@canterbury.ac.nz> Steven D'Aprano wrote: > If you refactor the name "ham" to "spam" in > > func(=ham, eggs=SCRAMBLED, toast=None, coffee='white') > > you also need to change the implicit keyword syntax back to ordinary > explicit keyword syntax, or it will break. I don't see that as a major problem. If you change the name of a keyword argument, you have to review all the places it's used anyway. > I cannot think of any other feature, in any other language, > where changing a variable's name requires you to change the syntax you > can use on it. That can happen already. If you're accepting a ** argument and passing it on, and the names change in such a way that the incoming and outgoing names no longer match, that whole strategy will stop working. The change required in that case is much bigger than just replacing '=foo' with 'foo=blarg'. > We shouldn't want keyword arguments *everywhere*, but only where > they add clarity rather than mere verbosity. I agree that wanting to using keyword arguments everywhere is excessive. But I do sympathise with the desire to improve DRY in this area. While the actual number of occasions I've encountered this sort of thing mightn't be very high, they stick in my mind as being particularly annoying. 
-- Greg From abarnert at yahoo.com Tue Jun 25 02:58:10 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 24 Jun 2013 17:58:10 -0700 (PDT) Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C8DB24.8050504@canterbury.ac.nz> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> Message-ID: <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Greg Ewing Sent: Monday, June 24, 2013 4:49 PM > Andrew Barnert wrote: >> On Jun 23, 2013, at 16:34, Greg Ewing > wrote: >> >>> It occurs more often than you might think, because taking >>> parameters that you've been passed and passing them on to >>> another function is a common pattern >> >> Yes, I do that all the time. But I can't think of a single case where >> there's any benefit to using keyword arguments. > > That's puzzling, because the benefits are the same as with > any other call that benefits from keyword arguments. No they aren't. The main benefit that a call gets from keyword arguments is that, from the parameter names, you can tell what the arguments mean. When the arguments are already variables (parameters or locals) with the same name, you already have the same information, and there is no benefit from getting it twice. Compare: url = get_appdir_url(False, None, True) url = get_appdir_url(per_user=False, for_domain=None, create=True) This is a change I made earlier today in someone else's code. The improvement is, I hope, unarguable. But here's another function call I _didn't_ change: def get_appdir_url(per_user, for_domain, create): return get_special_url('appdir', per_user, for_domain, create) Would this be at all improved by adding keywords? 
return get_special_url(special='appdir', per_user=per_user, for_domain=for_domain, create=create) That just adds noise, making it less readable, rather than more. Of course there's another benefit: Sometimes, using keywords lets you reorder the arguments to a way that makes more sense for your use case and/or skip defaulted parameters that you don't care about: gzip.GzipFile(fileobj=f, compressionlevel=1) Obviously, I'm all for using the keywords there as well. But the proposal doesn't affect cases like that. > Do you doubt the usefulness of keyword arguments in general? Not at all. I only doubt the idea of encouraging people to use keyword arguments _everywhere_. Keywords make code more readable in some cases, less readable in others. Blindly using keywords everywhere would make code overall less readable, just as blindly avoiding keywords everywhere. Most Python programmers today seem to do a decent job finding the right balance, and the language and library do a decent job helping them. Could that be better? Sure. But encouraging keywords everywhere would not make it better. And a proposal that's specifically intended to encourage using keywords in cases where they add nothing but noise would definitely not make it better. I've already granted that dict construction is a special case where this may not be true. (When is dict construction not a special case when it comes to keyword arguments?) But, as I said before, I don't think it's the right solution for that case, and, since then, at least two people have offered other ideas for that special case. 
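The ** pass-through pattern Greg Ewing refers to earlier in the thread can be sketched against the same example. The function bodies below are hypothetical stand-ins; only the signatures come from the discussion:

```python
# Hypothetical stand-in for the real get_special_url discussed above.
def get_special_url(special, per_user, for_domain, create):
    return "{}:{}:{}:{}".format(special, per_user, for_domain, create)

# Forwarding with **kwargs avoids repeating each parameter name, at the
# cost of hiding the real signature from the reader and from tools.
def get_appdir_url(**kwargs):
    return get_special_url('appdir', **kwargs)

print(get_appdir_url(per_user=False, for_domain=None, create=True))
```

This also illustrates Greg's caveat: if the parameter names of `get_special_url` ever change, every caller of `get_appdir_url` breaks, a bigger change than rewriting a few `foo=foo` pairs.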
From python at mrabarnett.plus.com Tue Jun 25 03:22:21 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 25 Jun 2013 02:22:21 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <51C8F0CD.1080106@mrabarnett.plus.com> On 25/06/2013 01:58, Andrew Barnert wrote: > From: Greg Ewing > > Sent: Monday, June 24, 2013 4:49 PM > > >> Andrew Barnert wrote: >>> On Jun 23, 2013, at 16:34, Greg Ewing >> wrote: >>> >>>> It occurs more often than you might think, because taking >>>> parameters that you've been passed and passing them on to >>>> another function is a common pattern >>> >>> Yes, I do that all the time. But I can't think of a single case where >>> there's any benefit to using keyword arguments. >> >> That's puzzling, because the benefits are the same as with >> any other call that benefits from keyword arguments. > > No they aren't. > > The main benefit that a call gets from keyword arguments is that, from the parameter names, you can tell what the arguments mean. > > When the arguments are already variables (parameters or locals) with the same name, you already have the same information, and there is no benefit from getting it twice. > > Compare: > > url = get_appdir_url(False, None, True) > url = get_appdir_url(per_user=False, for_domain=None, create=True) > > This is a change I made earlier today in someone else's coded. The improvement is, I hope, unarguable. 
> > But here's another function call I _didn't_ change: > > def get_appdir_url(per_user, for_domain, create): > return get_special_url('appdir', per_user, for_domain, create) > > Would this be at all improved by adding keywords? > > return get_special_url(special='appdir', per_user=per_user, for_domain=for_domain, create=create) > > That just adds noise, making it less readable, rather than more. > > Of course there's another benefit: Sometimes, using keywords lets you reorder the arguments to a way that makes more sense for your use case and/or skip defaulted parameters that you don't care about: > > gzip.GzipFile(fileobj=f, compressionlevel=1) > > Obviously, I'm all for using the keywords there as well. But the proposal doesn't affect cases like that. > >> Do you doubt the usefulness of keyword arguments in general? > > Not at all. I only doubt the idea of encouraging people to use keyword arguments _everywhere_. > > Keywords make code more readable in some. cases, less readable in others. Blindly using keywords everywhere would make code overall less readable, just as blindly avoiding keywords everywhere. > > Most Python programmers today seem to do a decent job finding the right balance, and the language and library do a decent job helping them. Could that be better? Sure. But encouraging keywords everywhere would not make it better. > > And a proposal that's specifically intended to encourage using keywords in cases where they add nothing but noise would definitely not make it better. > > I've already granted that dict construction is a special case where this may not be true. (When is dict construction not a special case when it comes to keyword arguments?) But, as I said before, I don't think it's the right solution for that case?and, since then, at least two people have offered other ideas for that special case. 
> Why not just add a single marker followed by the names, something like this: return get_special_url(special='appdir', =, per_user, for_domain, create) From abarnert at yahoo.com Tue Jun 25 04:51:04 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 24 Jun 2013 19:51:04 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C8F0CD.1080106@mrabarnett.plus.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <51C8F0CD.1080106@mrabarnett.plus.com> Message-ID: <281F247D-DAEF-45CC-872E-2C954D33AD0D@yahoo.com> On Jun 24, 2013, at 18:22, MRAB wrote: > On 25/06/2013 01:58, Andrew Barnert wrote: >> From: Greg Ewing >> >> Sent: Monday, June 24, 2013 4:49 PM >> >> >>> Andrew Barnert wrote: >>>> On Jun 23, 2013, at 16:34, Greg Ewing >>> wrote: >>>> >>>>> It occurs more often than you might think, because taking >>>>> parameters that you've been passed and passing them on to >>>>> another function is a common pattern >>>> >>>> Yes, I do that all the time. But I can't think of a single case where >>>> there's any benefit to using keyword arguments. >>> >>> That's puzzling, because the benefits are the same as with >>> any other call that benefits from keyword arguments. >> >> No they aren't. >> >> The main benefit that a call gets from keyword arguments is that, from the parameter names, you can tell what the arguments mean. >> >> When the arguments are already variables (parameters or locals) with the same name, you already have the same information, and there is no benefit from getting it twice. 
>> >> Compare: >> >> url = get_appdir_url(False, None, True) >> url = get_appdir_url(per_user=False, for_domain=None, create=True) >> >> This is a change I made earlier today in someone else's coded. The improvement is, I hope, unarguable. >> >> But here's another function call I _didn't_ change: >> >> def get_appdir_url(per_user, for_domain, create): >> return get_special_url('appdir', per_user, for_domain, create) >> >> Would this be at all improved by adding keywords? >> >> return get_special_url(special='appdir', per_user=per_user, for_domain=for_domain, create=create) >> >> That just adds noise, making it less readable, rather than more. >> >> Of course there's another benefit: Sometimes, using keywords lets you reorder the arguments to a way that makes more sense for your use case and/or skip defaulted parameters that you don't care about: >> >> gzip.GzipFile(fileobj=f, compressionlevel=1) >> >> Obviously, I'm all for using the keywords there as well. But the proposal doesn't affect cases like that. >> >>> Do you doubt the usefulness of keyword arguments in general? >> >> Not at all. I only doubt the idea of encouraging people to use keyword arguments _everywhere_. >> >> Keywords make code more readable in some. cases, less readable in others. Blindly using keywords everywhere would make code overall less readable, just as blindly avoiding keywords everywhere. >> >> Most Python programmers today seem to do a decent job finding the right balance, and the language and library do a decent job helping them. Could that be better? Sure. But encouraging keywords everywhere would not make it better. >> >> And a proposal that's specifically intended to encourage using keywords in cases where they add nothing but noise would definitely not make it better. >> >> I've already granted that dict construction is a special case where this may not be true. (When is dict construction not a special case when it comes to keyword arguments?) 
But, as I said before, I don't think it's the right solution for that case, and, since then, at least two people have offered other ideas for that special case. > Why not just add a single marker followed by the names, something like this: > > return get_special_url(special='appdir', =, per_user, for_domain, create) That's less ugly, but I still don't see what the benefit is. The code that already works in Python 3.3 (and even 2.x) has the same amount of information (again, it's already obvious what the parameters mean without using keywords), and it's even less ugly, and it doesn't require any new syntax. I suppose if someone later reorders the parameters of get_special_url without making any other change to the API, my code will still work. But how often does that happen? And I'm not sure a language change that encourages such API changes is a good thing. From ryan at ryanhiebert.com Tue Jun 25 05:07:56 2013 From: ryan at ryanhiebert.com (Ryan Hiebert) Date: Mon, 24 Jun 2013 20:07:56 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <281F247D-DAEF-45CC-872E-2C954D33AD0D@yahoo.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <51C8F0CD.1080106@mrabarnett.plus.com> <281F247D-DAEF-45CC-872E-2C954D33AD0D@yahoo.com> Message-ID: <24744A9D-79CA-42E0-82B4-4896EA56C6BA@ryanhiebert.com> On Jun 24, 2013, at 7:51 PM, Andrew Barnert wrote: > > I suppose if someone later reorders the parameters of get_special_url without making any other change to the API, my code will still work. But how often does that happen? And I'm not sure a language change that encourages such API changes is a good thing.
A library implementor can already write code that allows him to have keyword-only arguments, so he can avoid polluting the compatibility he must maintain in his positional arguments. It's less about allowing him to reorder them, and more about separating the positional and keyword arguments. I've not seen a solution I'm totally sold on yet, but supporting keyword argument shorthand seems like a good idea to me. From jared.grubb at gmail.com Tue Jun 25 05:37:52 2013 From: jared.grubb at gmail.com (Jared Grubb) Date: Mon, 24 Jun 2013 20:37:52 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C8E702.6000509@canterbury.ac.nz> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> Message-ID: <57060714-7C39-4A75-93D6-A6CA67EBB8B9@gmail.com> On Jun 24, 2013, at 17:40, Greg Ewing wrote: > I agree that wanting to use keyword arguments everywhere > is excessive. But I do sympathise with the desire to improve > DRY in this area. While the actual number of occasions I've > encountered this sort of thing mightn't be very high, they > stick in my mind as being particularly annoying. Brandon Rhodes gave a nice presentation at Pycon this year on naming where he argued that "well-factored nouns" can help make code more readable... and enabling the above pattern would make it easier to do what he suggests sometimes.
(For those that missed it: http://pyvideo.org/video/1676/the-naming-of-ducks-where-dynamic-types-meet-sma, starting about 08:20 through 11:00 or so, although I think the whole talk was great). I'm not saying that Python needs syntax for this, but I agree: the few times where I've had it come up, it is annoying. Jared From steve at pearwood.info Tue Jun 25 05:57:10 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Jun 2013 13:57:10 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C8E702.6000509@canterbury.ac.nz> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> Message-ID: <20130625035710.GD10635@ando> On Tue, Jun 25, 2013 at 12:40:34PM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >If you refactor the name "ham" to "spam" in > > > >func(=ham, eggs=SCRAMBLED, toast=None, coffee='white') > > > >you also need to change the implicit keyword syntax back to ordinary > >explicit keyword syntax, or it will break. > > I don't see that as a major problem. If you change the name > of a keyword argument, you have to review all the places it's > used anyway. I'm not talking about changing the name of the parameter, I'm talking about changing the name of the argument passed to the function. That is, the "ham" on the *right* of ham=ham, not the left. Of course, it takes far more faith in refactoring tools than I have to blindly run an automated tool over code changing names. But even if you review the code, it's very easy to miss that =ham was valid but after refactoring =spam is not.
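To make the refactoring hazard concrete, here is a minimal sketch; the function and variable names are invented for illustration, and the proposed `=name` shorthand appears only in comments, since it is not real Python:

```python
def serve(ham, eggs=None):
    # Hypothetical callee; the parameter name 'ham' is part of its API.
    return ('served', ham, eggs)

def order_before_rename():
    ham = 'glazed'
    # The proposed shorthand would be: serve(=ham)
    return serve(ham=ham)

def order_after_rename():
    # The local was renamed ham -> spam; the explicit keyword call only
    # needs its right-hand side updated, and the syntax stays the same:
    spam = 'glazed'
    return serve(ham=spam)
    # Under the proposal, serve(=ham) would now raise NameError, and
    # serve(=spam) would silently target the wrong keyword.

assert order_before_rename() == order_after_rename() == ('served', 'glazed', None)
```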
Whereas the explicit func(ham=ham) => func(ham=spam) continues to work perfectly. This proposed change leads to fragile code, where a trivial renaming will break function calls. > > I cannot think of any other feature, in any other language, > >where changing a variable's name requires you to change the syntax you > >can use on it. > > That can happen already. If you're accepting a ** argument > and passing it on, and the names change in such a way that > the incoming and outgoing names no longer match, that whole > strategy will stop working. The change required in that case > is much bigger than just replacing '=foo' with 'foo=blarg'. But you don't have to change the *syntax*. func(a, b, c, **kwargs) keeps the same syntax, whether you refactor the name "kwargs" to "kw" or "extras" or any other name, or whatever you do to the keys inside it. I cannot think of any other syntax, certainly not in Python, which relies on a name being precisely one value rather than another in order to work. Wait, no, I have just thought of one: the magic treatment of super() inside methods in Python 3: def method(self, arg): super().method(arg) # works s = super; s().method(arg) # doesn't work But that's arguably because super() actually should be a keyword rather than just a regular builtin. In any case, I don't think that super() is a precedent for this proposal. > >We shouldn't want keyword arguments *everywhere*, but only where > > they add clarity rather than mere verbosity. > > I agree that wanting to use keyword arguments everywhere > is excessive. But I do sympathise with the desire to improve > DRY in this area. While the actual number of occasions I've > encountered this sort of thing mightn't be very high, they > stick in my mind as being particularly annoying. I really wish people would stop referring to every trivially repeated token as "DRY". It is not. x = a + b + c Oh noes! Two plus signs! It's a DRY violation! Not.
I exaggerate a little for effect, but it really does seem that many people think that any piece of code, no matter how trivially small, that is repeated is a DRY violation. But that's not what DRY is about. "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." https://en.wikipedia.org/wiki/Don%27t_Repeat_Yourself which has nothing to do with writing spam=spam in a function call. It's no more of a DRY violation than "spam = -spam" would be. -- Steven From boxed at killingar.net Tue Jun 25 08:05:53 2013 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Tue, 25 Jun 2013 08:05:53 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130624224127.GA10419@ando> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> Message-ID: <2BA25556-5EA7-4CAC-956F-D2C9CECB2B96@killingar.net> > I cannot think of any other feature, in any other language, where changing a variable's name requires you to change the syntax you can use on it. Let me give you an example then! OCaml has exactly the feature I propose. 
It looks like this: foo ~bar And to be clear, that is the same as the Python foo(bar=bar) From boxed at killingar.net Tue Jun 25 08:25:45 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Tue, 25 Jun 2013 08:25:45 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: > The main benefit that a call gets from keyword arguments is that, from the > parameter names, you can tell what the arguments mean. > Agreed. > When the arguments are already variables (parameters or locals) with the > same name, you already have the same information, and there is no benefit > from getting it twice. > Absolutely not. At that point you have to make a pretty big assumption that this is the case. In order to KNOW you need to go look up the function and compare two lists of names. And if it changes, keyword arguments will throw an error upon invocation, positional arguments will not. > Keywords make code more readable in some cases, less readable in others. > Blindly using keywords everywhere would make code overall less readable, > just as blindly avoiding keywords everywhere. > Currently a big part of why it makes code less readable is the repetition. It'd be cool to be able to get the advantages without making any significant dent in readability. > Most Python programmers today seem to do a decent job finding the right > balance, and the language and library do a decent job helping them. Could > that be better? Sure. But encouraging keywords everywhere would not make it > better.
> Just want to point out that you have no way of knowing that. Making religious assertions doesn't strengthen your case. You could have pointed out that the current Python culture is one that generally produces some of the best and most usable code already, making changes to that dynamic risky. THAT I would have agreed to. > And a proposal that's specifically intended to encourage using keywords in > cases where they add nothing but noise would definitely not make it better. > Except of course the little detail that with my suggestion the added noise would be almost insignificant. 1 extra character per variable name. Only for really horrible code with variable names of 1 or 2 characters will that be a significant increase in "noise". From greg.ewing at canterbury.ac.nz Tue Jun 25 08:47:57 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 25 Jun 2013 18:47:57 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130625035710.GD10635@ando> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> Message-ID: <51C93D1D.1080103@canterbury.ac.nz> Steven D'Aprano wrote: >>The change required in that case >>is much bigger than just replacing '=foo' with 'foo=blarg'. > > But you don't have to change the *syntax*. It's still a bigger change than any automated tool will be able to cope with, and if you're changing it manually, who cares if the syntax changes or not? > I cannot think of any other syntax, certainly not in Python, which relies > on a name being precisely one value rather than another in order to > work.
Pardon? *Every* time you use a name, you're relying on it being the same as at least one other name somewhere else. > I exaggerate a little for effect, but it really does seem that many > people think that any piece of code, no matter how trivially small, that > is repeated is a DRY violation. But that's not what DRY is about. You seem to think I'm using a form of argument by authority: "It's DRY, therefore it's undesirable." But that's not what I'm saying. Rather, I'm saying that I feel it's undesirable, and I'm using the term DRY to *describe* what I think is undesirable about it. Maybe I'm not using the term quite the way it was originally meant, but that has no bearing on how I feel about the matter and my reasons for it. > "Every piece of knowledge must have a single, unambiguous, authoritative > representation within a system." > > https://en.wikipedia.org/wiki/Don%27t_Repeat_Yourself > > which has nothing to do with writing spam=spam in a function call. I'm not so sure about that. I've just been watching this: http://pyvideo.org/video/1676/the-naming-of-ducks-where-dynamic-types-meet-sma and going by the principles advocated there, if you change the name of the parameter in the called function, you *should* change the corresponding parameter of your function to match, for the sake of consistency. So '=spam' will continue to work fine after the change. It will also help you to keep your parameter names consistent by breaking if they're not! -- Greg From turnbull at sk.tsukuba.ac.jp Tue Jun 25 09:15:18 2013 From: turnbull at sk.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Tue, 25 Jun 2013 16:15:18 +0900 Subject: [Python-ideas] [Suspected Spam] Re: Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <877ghi1x15.fsf@uwakimon.sk.tsukuba.ac.jp> Anders Hovmöller writes: >> I'll take the explicit use of locals any time. I meant "locals" the function. I should have written "locals()". > My suggestion isn't about introducing more magic, just a little bit > of convenience It's syntax, which is always magic, or, if you prefer, "has arbitrary semantics which must be memorized". As syntax, it's ugly (IMO) and nonintuitive (by the rationale that "=" is an infix binary relation or assignment operator, not a unary prefix operator). Not to forget redundant, given the existence of locals() and positional arguments. > for two common use cases: passing along variables with the > same name to another function If the two functions were designed with similar signatures, using positional arguments is the obvious way to indicate this, at a saving of one "=" per argument. If they weren't and you don't feel like looking up the signature of the function you're calling (or are worried that the signature might change in a future version), **locals() wins at the expense of bringing in (possibly) a bunch of junk you don't want. But it's way shorter than a sequence of even *three* =-prefixed variable names (unless they're "x", "y", and "z"). I don't see a win here big enough to justify syntax, let alone *this* syntax. > and throwing a bunch of variables into a dict (which is basically > the same thing since "dict(foo=foo)" == "{'foo': foo}").
This is a little more plausible, since you can't pun on the names of the parameters when using positional arguments, since dicts don't have positional arguments. Still, adding syntax is a high hurdle. From stephen at xemacs.org Tue Jun 25 09:51:07 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 25 Jun 2013 16:51:07 +0900 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C93D1D.1080103@canterbury.ac.nz> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> Message-ID: <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > > I cannot think of any other syntax, certainly not in Python, which relies > > on a name being precisely one value rather than another in order to > > work. > > Pardon? *Every* time you use a name, you're relying on it > being the same as at least one other name somewhere else. Ah, but here we have the case of two *different* names that are spelled the same[1], and what Steven is pointing out is that for this syntax to work, these different names that are spelled the same must stay in sync. If one's spelling changes, the other must change spelling, too. Footnotes: [1] That is, they are names in disjoint namespaces. Amusingly enough, there is a Stephen J. Turnbull who is a well-known expert on Japanese history, especially military history. Nice guy, no relation, never met in real space, so no confusion for us, no need for =Stephen syntax. But it confuses the heck out of ninja fanatics. 
;-) From steve at pearwood.info Tue Jun 25 10:14:35 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Jun 2013 18:14:35 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C93D1D.1080103@canterbury.ac.nz> References: <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> Message-ID: <20130625081435.GE10635@ando> On Tue, Jun 25, 2013 at 06:47:57PM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >>The change required in that case > >>is much bigger than just replacing '=foo' with 'foo=blarg'. > > > >But you don't have to change the *syntax*. > > It's still a bigger change than any automated tool will > be able to cope with, and if you're changing it manually, > who cares if the syntax changes or not? Because having to change the syntax is *one more thing* to worry about when you change a variable name, or change a parameter. Because forgetting to change the syntax is *one more thing* to go wrong. This makes this proposed feature *more* fragile rather than less, and I don't think we should be adding syntax that encourages fragile code for marginal benefit, particularly when the proposal goes against two of the Zen and, frankly, has nothing much going for it except a little less typing in a fraction of all function calls. It doesn't make code more readable, except in the sense that there's less to read. If anything, it's *less* readable because it looks like you've forgotten the parameter name: func(a, b, spam=None, ham="green", =eggs, extra=False, another=42*x) It's less explicit. It privileges a special case that really isn't that important.
Out of the infinite number of possible combinations of parameter=expression, what's so special about the one where the expression happens to be precisely the same string as the parameter? Even if it is special, it's not special *enough* to justify special syntax. > >I cannot think of any other syntax, certainly not in Python, which relies > >on a name being precisely one value rather than another in order to > >work. > > Pardon? *Every* time you use a name, you're relying on it > being the same as at least one other name somewhere else. Either I'm not explaining myself, or you're not understanding me. Let me give you some examples. "x = 2*y", if I rename y => z the statement remains "x = 2*z", I don't have to change it to "x = x.__mul__(z)". "mylist.append(None)", if I rename mylist => alist the statement remains "alist.append(None), I don't have to change it to "alist.extend([None])". "if seq:", if I rename seq => items the statement remains "if items:", I don't have to change it to "if bool(items):". BUT the proposed syntax is different from everything else in Python: "func(=spam)", but if I rename spam => ham, I must also change the way I call the function to "func(spam=ham)". The point isn't that it's hard to make that edit. The point is that I shouldn't need to make that edit. > >I exaggerate a little for effect, but it really does seem that many > >people think that any piece of code, no matter how trivially small, that > >is repeated is a DRY violation. But that's not what DRY is about. > > You seem to think I'm using a form of argument by authority: > "It's DRY, therefore it's undesirable." But that's not what > I'm saying. Rather, I'm saying that I feel it's undesirable, > and I'm using the term DRY to *describe* what I think is > undesirable about it. Maybe I'm not using the term quite the > way it was originally meant, but that has no bearing on how > I feel about the matter and my reasons for it. 
This isn't a question of "original meaning", you are misusing DRY to describe something which has nothing to do with the principles behind DRY, also known as "Single Point of Truth". You might as well describe your dislike of func(spam=spam) as "a violation of the Liskov Substitution Principle", then say "well that's not what Barbara Liskov meant by the LSP, but it's what I mean". I'm sorry if it annoys you to be told that your understanding of Don't Repeat Yourself is mistaken, but it is. It is not "any repetition of code is a bad thing". > >"Every piece of knowledge must have a single, unambiguous, authoritative > >representation within a system." > > > >https://en.wikipedia.org/wiki/Don%27t_Repeat_Yourself > > > >which has nothing to do with writing spam=spam in a function call. > > I'm not so sure about that. I've just been watching this: > > http://pyvideo.org/video/1676/the-naming-of-ducks-where-dynamic-types-meet-sma > > and going by the principles advocated there, if you change the > name of the parameter in the called function, you *should* > change the corresponding parameter of your function to match, > for the sake of consistency. The called function may be called from a thousand places. Do you really think that because I decide I don't like my local variable to be called (say) "first_name", and want to call it "personal_name" instead, that I "should" force every other caller to change their local variable too, "for the sake of consistency"? I'm sure you know the proverb about foolish consistency. 
-- Steven From greg.ewing at canterbury.ac.nz Tue Jun 25 10:15:35 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 25 Jun 2013 20:15:35 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51C951A7.9050707@canterbury.ac.nz> Stephen J. Turnbull wrote: > Ah, but here we have the case of two *different* names that are > spelled the same[1], and what Steven is pointing out is that for this > syntax to work, these different names that are spelled the same must > stay in sync. I dispute that they're different names. In the use cases I have in mind, it's no accident that the two names are spelled the same, because conceptually they represent the very same thing. Giving them different names would be confusing and probably indicate an error. If the names were only accidentally the same, I would probably want to rename one of them to avoid giving the misleading impression that they were related. > Amusingly > enough, there is a Stephen J. Turnbull who is a well-known expert on > Japanese history, especially military history. Nice guy, no relation, > never met in real space, so no confusion for us, no need for =Stephen > syntax. But it confuses the heck out of ninja fanatics. ;-) That's quite a different situation -- these two Stephens really are different, and if they ever had to coexist in the same namespace, they would need to be given different names, e.g. StephenTheHistorian and StephenTheNinjaConfuser. 
And you would want to avoid assigning one to the other. -- Greg From boxed at killingar.net Tue Jun 25 10:58:45 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Tue, 25 Jun 2013 10:58:45 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130625081435.GE10635@ando> References: <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <20130625081435.GE10635@ando> Message-ID: > Because having to change the syntax is *one more thing* to worry about > when you change a variable name, or change a parameter. Because > forgetting to change the syntax is *one more thing* to go wrong. This > makes this proposed feature *more* fragile rather than less, and I don't > think we should be adding syntax that encourages fragile code for > marginal benefit, particularly when the proposal goes against two of the > Zen and, frankly, has nothing much going for it except a little less > typing in a fraction of all function calls. > I agree with everything except the part of the last sentence after the comma where you misrepresent the entire issue at hand. Again. > It doesn't make code more readable, except in the sense that there's > less to read. If anything, it's *less* readable because it looks like > you've forgotten the parameter name: > Again you're conflating my suggested syntax with the underlying idea. The syntax could be "$foo" instead of "=foo" for example. This has the advantage that it'd be the same in a dict: "{$foo}", but it has the disadvantage of not being similar to the existing syntax. Again, the idea itself is a separate thing from the syntax I spent literally 2 seconds to come up with :P > It's less explicit. You mean "more". 
It's absolutely more explicit than just using the position. > It privileges a special case that really isn't that > important. Out of the infinite number of possible combinations of > parameter=expression, what's so special about the one where the > expression happens to be precisely the same string as the parameter? > It's so special because one can be derived from the other. Which is impossible in all the other cases. > Even if it is special, it's not special *enough* to justify special > syntax. In your opinion. In my opinion we could have an open ended brain storm about these types of things, and we can acknowledge that it's a matter of trade offs. Either I'm not explaining myself, or you're not understanding me. Let me > give you some examples. > He was using your hyperbole against you :P > The point isn't that it's hard to make that edit. The point is that I > shouldn't need to make that edit. If you use **locals(), suggested as an alternative, you're in tons and tons more trouble than just having to make that edit. Or even worse with the _() magic that goes through the stack and digs out the locals. *shudder* I understand you don't like my suggestion, but I hope we can both agree that locals() for anything but a string formatter with a literal, or rummaging through the stack is much much worse? > I'm sorry if it annoys you to be told that your understanding of Don't > Repeat Yourself is mistaken, but it is. It is not "any repetition of > code is a bad thing". Words are defined not by dictionaries but by how they are used. Words change meaning over time. > > and going by the principles advocated there, if you change the > name of the parameter in the called function, you *should* > change the corresponding parameter of your function to match, > for the sake of consistency. > The called function may be called from a thousand places.
Do you really > think that because I decide I don't like my local variable to be called > (say) "first_name", and want to call it "personal_name" instead, that I > "should" force every other caller to change their local variable too, > "for the sake of consistency"? > Again with the straw man. That wasn't what he said. He said that if 'personal_name' is *a better name for what it actually is* then it is a good idea to change it in the entire code base. It's always a good idea to make the names of stuff in your code better. This has nothing to do with this thread though! From abarnert at yahoo.com Tue Jun 25 11:14:01 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 25 Jun 2013 02:14:01 -0700 (PDT) Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <1372151641.29336.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Anders Hovmöller Sent: Monday, June 24, 2013 11:25 PM Before getting to the specific replies, I'd like to repeat my call for some examples of real (or at least realistic) code that would actually benefit from this change. I've already agreed that the two special cases of dict construction and str.format could benefit, but those are special cases. Nick Coghlan and others have already suggested they could be better handled with different improvements. Maybe they're wrong; maybe those special cases are important enough, and there is no better solution for them.
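For reference, the two conceded special cases look like this in current Python (a minimal sketch; the variable names are invented for illustration):

```python
name = 'Alice'
age = 30

# dict construction repeats every name once on each side:
d = dict(name=name, age=age)
assert d == {'name': 'Alice', 'age': 30}

# str.format has exactly the same shape:
s = '{name} is {age}'.format(name=name, age=age)
assert s == 'Alice is 30'

# The usual workaround is **locals(), at the cost of passing every
# name in scope rather than just the two that are actually wanted:
assert '{name} is {age}'.format(**locals()) == s
```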
But you're presenting this as a broader fix than that, and I don't see it. The only other real examples I've seen are my own cases of delegating functions that forward their parameters exactly, and I don't think those need to be improved. Maybe there's a lot more to your idea than I and others are seeing. You're clearly a knowledgeable and sane programmer, not some crank, and I believe that you do have real code you'd like to improve. But I can't imagine it, so I'd like you to demonstrate that real code. >>When the arguments are already variables (parameters or locals) with the same name, you already have the same information, and there is no benefit from getting it twice. > >Absolutely not. At that point you have to make a pretty big assumption that this is the case. An assumption? Sure. A big one? No. When I'm reading this code: url = get_appdir_url(per_user, for_domain, create) I have to assume that per_user is some kind of flag value that selects between a per-user appdir and a local-system appdir, not an integer that specifies the primary group ID. Just as I have to assume that get_appdir_url returns a URL to an appdir, not a pandas table full of the populations of world capitals. In other words, I have to assume that whoever wrote the code I'm reading isn't being malicious or psychotic when coming up with names. And again, this is totally different from the paradigm case for keyword arguments: url = get_appdir_url(False, None, True) Here, maybe I can guess that one of those two boolean values might be a per-user/local-system flag, but I have no idea which one, or whether False means local-system or per-user. So, using keywords makes a _huge_ difference. But in the previous case, I always know what per_user means. And, in fact, adding keywords doesn't affect what I have to assume. When I see this: url = get_appdir_url(per_user=per_user, for_domain=for_domain, create=create) I _still_ have to assume that per_user means what I think it means.
It's the same words, with the same meaning. And adding an "=" prefix to each argument wouldn't add any meaning either; it's just a meaningless symbol that I have to pass over to read the actual meaning of the code.

> In order to KNOW you need to go look up the function and compare two lists of names.

No, in order to KNOW I'd need to pore over the implementation of the function, and either prove that it does what I expect, or test it sufficiently to my satisfaction.

Fortunately, I don't usually need that kind of knowledge when reading someone's code. If I want to understand what your script does, or expand its functionality, I read it with the assumption that each expression means what it says. I may come back and look at some of them more carefully if I find a bug, or a clever hack that I don't understand, but that's not the usual case when reading code.

> And if it changes keyword arguments will throw an error upon invocation, positional arguments will not.

How many times has it happened that some function changed the order of its parameters, but didn't change anything else that would break your code? Maybe once or twice in your lifetime as a programmer? Is it really common enough to be worth making your code less readable, even a little bit, to protect against it?

>>Most Python programmers today seem to do a decent job finding the right balance, and the language and library do a decent job helping them. Could that be better? Sure. But encouraging keywords everywhere would not make it better.
>
>Just want to point out that you have no idea of knowing that.

Sure I do. And so do you. If you didn't have some sense, based on experience reading and writing lots of code in Python and other languages, about how well Python programmers take advantage of the freedom to choose between positional and keyword arguments, you wouldn't have proposed a change in the first place.
And this is the key question. You're supporting the general principle that people should use keyword arguments whenever possible -- replacing flexibility with TOOWTDI dogma. That can be a good thing when the flexibility has been poorly used (e.g., Python no longer lets you choose whether to explicitly or implicitly decode byte strings), or it can be a bad thing when the flexibility leads to better code. A big part of the reason both you and I use Python is that people have argued out these kinds of questions, instead of just assuming that there's no way of knowing and choosing arbitrarily.

>Except of course the little detail that with my suggestion the added noise would be almost insignificant. 1 extra character per variable name. Only for really horrible code with variable names of 1 or 2 characters will that be a significant increase in "noise".

I realize this is almost like Godwin's Law here, but...

By the exact same argument, perl sigils are insignificant. But everyone knows they're not. When I see "$remaining = shift(@ARGV);" instead of just "remaining = shift(ARGV);", it disturbs the flow of reading code, despite only being 1 extra character per variable name. That's what people mean when they say it's "ugly" -- it's not about typographical beauty, it's about being able to look at code and quickly understand what it's doing.

Consider the fact that Python doesn't require parens around conditions, or that it does require colons in block-introducing statements. Those are even tinier -- one character for the whole statement -- and yet they make a huge difference in readability.

There's always a cost in adding extra noise, and just saying "It's not that bad" isn't a good argument. The question is whether the benefit outweighs the cost. The **kw syntax is a stumbling block that every immigrant from JavaScript or C++ runs into at some point -- first they have to figure out what it means, then they have to get used to it.
But it's so hugely useful that the cost is obviously worth taking. Your =keyword syntax would have the same cost. Would it have a similarly large benefit?

From abarnert at yahoo.com Tue Jun 25 11:43:55 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 25 Jun 2013 02:43:55 -0700 (PDT)
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: <51C951A7.9050707@canterbury.ac.nz>
References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz>
Message-ID: <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com>

From: Greg Ewing
Sent: Tuesday, June 25, 2013 1:15 AM

> I dispute that they're different names. In the use cases
> I have in mind, it's no accident that the two names are
> spelled the same, because conceptually they represent the
> very same thing.

I've already asked Anders this, but let me ask you as well: What are the use cases you have in mind?

As I see it, there are three cases (barring coincidences, which are obviously irrelevant) where this syntax could make a difference:

1. dict constructor
2. str.format
3. forwarding functions (like my example with get_appdir_url)

#1 and #2 are definitely special cases. So, is it #3, or is there some broader use case here? And, if it is just #3, do you have the same argument that (I think) Anders has, or a different one?

Again, let's try to use a realistic example instead of toy expressions that have no meaning no matter how they're written:

    def split(self, sep=None, maxsplit=-1):
        return self.__class__(self.wrapped_str.split(sep, maxsplit))

    def split(self, *args, **kwargs):
        return self.__class__(self.wrapped_str.split(*args, **kwargs))

Anders' position seems to be that people _should_ be writing it as:

    def split(self, sep=None, maxsplit=-1):
        return self.__class__(self.wrapped_str.split(sep=sep, maxsplit=maxsplit))

... and the only reason we don't all write that is the repetition. I think he's wrong; there are perfectly good reasons to prefer the more common alternatives, so getting rid of the repetition would make very little difference.

So, is this your case, and your argument? Sorry to belabor this, but I believe that if I'm missing the point, it's quite possible that Stephen and most of the other people against the idea are missing it in the same way.

> That's quite a different situation -- these two Stephens
> really are different, and you would want to avoid assigning
> one to the other.

But you might want to pass one in place of the other.

When Tokugawa Ieyasu said "Let your step be slow and steady, that you stumble not," was he talking about being cautious and moderate in changing the Python grammar file, or just about besieging castles? How will we know without an expert?
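The *args/**kwargs forwarding style from the example above can be exercised end to end. A small self-contained sketch (the class name WrappedStr and the wrap-each-part behavior are assumptions for illustration, not from the thread):

```python
class WrappedStr:
    """Toy wrapper type, assumed here only to demonstrate forwarding."""
    def __init__(self, wrapped_str):
        self.wrapped_str = wrapped_str

    def split(self, *args, **kwargs):
        # Forward everything unchanged: no parameter names to repeat,
        # and no signature to keep in sync with str.split.
        return [self.__class__(part)
                for part in self.wrapped_str.split(*args, **kwargs)]

parts = WrappedStr("a,b,c").split(",")
print([p.wrapped_str for p in parts])  # ['a', 'b', 'c']
```

The trade-off being debated: this version never repeats a name, but it also no longer documents sep and maxsplit in its own signature.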
From boxed at killingar.net Tue Jun 25 11:54:34 2013
From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=)
Date: Tue, 25 Jun 2013 11:54:34 +0200
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: <1372151641.29336.YahooMailNeo@web184705.mail.ne1.yahoo.com>
References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372151641.29336.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: 

Thank you for that long and thoughtful message. Unfortunately the big hairy examples where I believe something like my suggestion would be good are stuff I'm sitting and looking at right now which of course is closed source so I can't share it with you :(

Maybe it's just this code base that's pathological or maybe I'm just being paranoid about stuff not matching up. I'm gonna think about these possibilities for a while and give up on convincing you guys for now at least :P

On Tue, Jun 25, 2013 at 11:14 AM, Andrew Barnert wrote:
> Maybe there's a lot more to your idea than I and others are seeing. You're
> clearly a knowledgeable and sane programmer, not some crank, and I believe
> that you do have real code you'd like to improve. But I can't imagine it,
> so I'd like you to demonstrate that real code.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com Tue Jun 25 12:04:53 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 25 Jun 2013 20:04:53 +1000
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: 
References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372151641.29336.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: 

On Tue, Jun 25, 2013 at 7:54 PM, Anders Hovmöller wrote:
> Maybe it's just this code base that's pathological or maybe I'm just being
> paranoid about stuff not matching up. I'm gonna think about these
> possibilities for a while and give up on convincing you guys for now at
> least :P

Just a side idea, maybe what you want is not new syntax but a linter? Knock together a script that runs through your code and tells you about any oddities it finds, thus guaranteeing that your stuff does indeed match up. As an added bonus, you could plug that into your source control system so you get an alert before you can commit - not sure how you do that in Mercurial but I'm sure you can (I've only ever done it with git).
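A minimal sketch of such a linter is possible with the stdlib ast module, here flagging the name=name keyword calls that the proposed short form would abbreviate (the function name and the reporting format are made up for illustration):

```python
import ast

def find_redundant_keywords(source):
    """Return (lineno, name) pairs for every keyword argument of the
    form name=name in the given source text."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                # kw.arg is None for **kwargs unpacking; skip those.
                if (kw.arg is not None
                        and isinstance(kw.value, ast.Name)
                        and kw.value.id == kw.arg):
                    findings.append((node.lineno, kw.arg))
    return findings

code = "url = get_appdir_url(per_user=per_user, create=True)"
print(find_redundant_keywords(code))  # [(1, 'per_user')]
```

Hooked into a pre-commit hook as suggested, a script along these lines could report every call site the short form would affect, without any new syntax.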
Makes it really easy to catch problems.

ChrisA

From steve at pearwood.info Tue Jun 25 12:16:51 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 25 Jun 2013 20:16:51 +1000
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: 
References: <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <20130625081435.GE10635@ando>
Message-ID: <20130625101651.GF10635@ando>

By the way Anders, you're dropping attribution of the person you are quoting. That's a little impolite and makes it hard to know who said what when.

On Tue, Jun 25, 2013 at 10:58:45AM +0200, Anders Hovmöller wrote:
> > Because having to change the syntax is *one more thing* to worry about
> > when you change a variable name, or change a parameter. Because
> > forgetting to change the syntax is *one more thing* to go wrong. This
> > makes this proposed feature *more* fragile rather than less, and I don't
> > think we should be adding syntax that encourages fragile code for
> > marginal benefit, particularly when the proposal goes against two of the
> > Zen and, frankly, has nothing much going for it except a little less
> > typing in a fraction of all function calls.
>
> I agree with everything except the part of the last sentence after the
> comma where you misrepresent the entire issue at hand. Again.

I'm not misrepresenting anything. I'm making a statement that, in my judgement, your proposal for implicit parameter names has no advantage except that you sometimes get to type less. You might not accept my judgement on this, but it is an honest one.

> > It doesn't make code more readable, except in the sense that there's
> > less to read.
> > If anything, it's *less* readable because it looks like
> > you've forgotten the parameter name:
>
> Again you're conflating my suggested syntax with the underlying idea. The
> syntax could be "$foo" instead of "=foo" for example.

I don't think I should be judged badly for responding to your actual proposal instead of the infinite number of things you might have proposed but didn't.

[...]

> > It's less explicit.
>
> You mean "more". It's absolutely more explicit than just using the
> position.

I'm not comparing it to positional arguments. I'm comparing:

    func(spam=spam)  # parameter name is explicitly given
    func(=spam)      # parameter name is implicit

> > It privileges a special case that really isn't that
> > important. Out of the infinite number of possible combinations of
> > parameter=expression, what's so special about the one where the
> > expression happens to be precisely the same string as the parameter?
>
> It's so special because one can be derived from the other. Which is
> impossible in all the other cases.

That's incorrect. We could run wild with implied parameters, if we so cared:

    funct(=3*spam, =eggs+1, =cheese[0])

could be expanded to

    funct(spam=3*spam, eggs=eggs+1, cheese=cheese[0])

according to the rule, "if an argument consists of an expression with exactly one name, then the parameter name is derived from that name". I'm not suggesting this as a good idea. I think it is a terrible idea. But it is conceivable that some language might derive parameter names from expressions that are more complex than a single name. I believe that it's a bad idea even if the parameter is derived from a single name.

> > The point isn't that it's hard to make that edit. The point is that I
> > shouldn't need to make that edit.
>
> If you use the alternative of **locals() suggested as an alternative you're
> in tons and tons more trouble than just having to make that edit.

Why would I do that?
Unless I knew the called function would silently ignore unrecognised keyword args, I wouldn't use the **locals() trick. And even if I did, I probably wouldn't use the trick anyway.

-- 
Steven

From boxed at killingar.net Tue Jun 25 12:19:46 2013
From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=)
Date: Tue, 25 Jun 2013 12:19:46 +0200
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: 
References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372151641.29336.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: 

> Just a side idea, maybe what you want is not new syntax but a linter?
> Knock together a script that runs through your code and tells you
> about any oddities it finds, thus guaranteeing that your stuff does
> indeed match up.

I've thought about that but rejected it because then I'd have to change all these functions to be keyword arguments. In an example I just scrolled to randomly this changes a function call from 368 characters to 627! I believe there are several worse examples :(

With my suggestion at least that function call would only go up to 595 without changing local variable names. If I also add the feature to my suggestion that "foo(bar=something.bar)" == "foo(=something.bar)" (which is
And there'a few superflous Nones that I can get rid of if I use keyword arguments, but only 5, which in this case doesn't change much. Writing these numbers gives me the feeling that it's indeed this code base that is pathological :P If you know of an automated tool to convert functions from positional arguments to keyword arguments that'd be a fun experiment to run :P -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 25 13:40:24 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Jun 2013 21:40:24 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <51C5EC6E.9050501@mrabarnett.plus.com> <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372151641.29336.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: On 25 June 2013 20:19, Anders Hovm?ller wrote: > >> Just a side idea, maybe what you want is not new syntax but a linter? >> Knock together a script that runs through your code and tells you >> about any oddities it finds, thus guaranteeing that your stuff does >> indeed match up. As an added bonus, you could plug that into your >> source control system so you get an alert before you can commit - not >> sure how you do that in Mercurial but I'm sure you can (I've only ever >> done it with git). Makes it really easy to catch problems. > > > I've thought about that but rejected it because then I'd have to change all > these functions to be keyword arguments. In an example I just scrolled to > randomly this changes a function call from 368 characters to 627! 
I believe > there are several worse examples :( > > With my suggestion at least that function call would only go up to 595 > without changing local variable names. If I also add the feature to my > suggestion that "foo(bar=something.bar)" == "foo(=something.bar)" (which is > pretty horrible!), I can get it down to 500 and still use keyword arguments. > And there'a few superflous Nones that I can get rid of if I use keyword > arguments, but only 5, which in this case doesn't change much. > > Writing these numbers gives me the feeling that it's indeed this code base > that is pathological :P Unfortunately, it sounds like that may be the case. The subprocess.Popen constructor is probably the most pathological "Swiss army function" in the standard library, and even that would struggle to hit 300 characters for a single function call (maybe if you had some long variable names to pass in, or wrote a long command line in place as a list or string literal). With function signatures like that, you may even want to build the keyword argument mapping programmatically, and then call the end result as: function_with_crazy_signature(**kwargs) It's not *that* uncommon for even subprocess.Popen to be called that way (or else for it to be wrapped in a helper class or closure that supplies some of the parameters). Building up to the final call with functools.partial is another way to potentially manage interacting with that kind of complicated API. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From zak.mc.kraken at libero.it Tue Jun 25 14:33:50 2013 From: zak.mc.kraken at libero.it (ZeD) Date: Tue, 25 Jun 2013 12:33:50 +0000 (UTC) Subject: [Python-ideas] interactive sqlite3 module Message-ID: Hi. some python modules contains a "console" usage (for example the module timeit). It would be useful (for the times when you have python installed but not a sqlite executable... think windows...) to add a minimal interactive sqlite interpreter... 
a (really) basic implementation is something like the source below... teoretically the idea is to to $ python -msqlite3 'my_data.db' or $ python -msqlite3 ':memory:' and have a simple sqlite interpreter do you think it whould be useful? ------------------------- #!/usr/bin/env python # ATM I'm on python2... from __future__ import print_function try: input = raw_input except: pass # ideally this file is part of sqlite from sqlite3 import * def main(): from sys import argv if len(argv) < 2: raise SystemExit("usage: %s DATABASE" % (argv[0], )) with connect(argv[1]) as conn: while True: command = input("> ") if command == '.exit': break try: c = conn.cursor() for row in c.execute(command): print(row) conn.commit() except Exception as e: print(e) if __name__ == '__main__': main() -- Vito 'ZeD' De Tullio From phd at phdru.name Tue Jun 25 14:56:33 2013 From: phd at phdru.name (Oleg Broytman) Date: Tue, 25 Jun 2013 16:56:33 +0400 Subject: [Python-ideas] interactive sqlite3 module In-Reply-To: References: Message-ID: <20130625125632.GA13868@iskra.aviel.ru> On Tue, Jun 25, 2013 at 12:33:50PM +0000, ZeD wrote: > It would be useful (for the times when you have python installed but not a > sqlite executable... think windows...) sqlite3.exe is so small and is so easily obtainable (distributed as public domain) -- why not just copy it? It doesn't require any installation or configuration, doesn't have any external dependencies. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From jimjhb at aol.com Tue Jun 25 15:35:52 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 09:35:52 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> Message-ID: <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Syntax: for X in ListY while conditionZ: The 'for' loop would proceed as long as conditionZ remains true. The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit end condition check, etc.) and at the same time avoiding a 'break' from the 'for'. (NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert the clean 'for' into a less-clean 'while'. Or they just let the 'for' run out. You can argue against this teaching practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.) [People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. Takewhile from itertools also works, but that's clumsy and wordy as well.] I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort. Please note that I don't feel the answer to this is 'just use break'. Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it. Comments and Questions welcome. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Tue Jun 25 15:59:19 2013 From: dholth at gmail.com (Daniel Holth) Date: Tue, 25 Jun 2013 09:59:19 -0400 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ:
In-Reply-To: <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com>
Message-ID: <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com>

Syntax:

    for X in ListY while conditionZ:

The 'for' loop would proceed as long as conditionZ remains true.

The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit end condition check, etc.) and at the same time avoiding a 'break' from the 'for'.

(NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert the clean 'for' into a less-clean 'while'. Or they just let the 'for' run out. You can argue against this teaching practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.)

[People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. Takewhile from itertools also works, but that's clumsy and wordy as well.]

I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort.

Please note that I don't feel the answer to this is 'just use break'. Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it.

Comments and Questions welcome.

Thanks.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dholth at gmail.com Tue Jun 25 15:59:19 2013
From: dholth at gmail.com (Daniel Holth)
Date: Tue, 25 Jun 2013 09:59:19 -0400
Subject: [Python-ideas] Is this PEP-able?
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From amauryfa at gmail.com Tue Jun 25 16:05:14 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 25 Jun 2013 16:05:14 +0200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: Hi, 2013/6/25 > Syntax: > > for X in ListY while conditionZ: > > The 'for' loop would proceed as long as conditionZ remains true. > It seems that itertools.takewhile does the same thing: http://docs.python.org/2/library/itertools.html#itertools.takewhile It will probably require a lambd > > The motivation is to be able to make use of all the great aspects of the > python 'for' (no indexing or explicit > end condition check, etc.) and at the same time avoiding a 'break' from > the 'for'. > > (NOTE: Many people are being taught to avoid 'break' and 'continue' at > all costs, so they instead convert > the clean 'for' into a less-clean 'while'. Or they just let the 'for' run > out. You can argue against this teaching > practice (at least for Python) but that doesn't mean it's not prevalent > and prevailing.) > Ouch! do you have pointers? Maybe we should upgrade these teaching practices (or teachers) instead. > [People who avoid the 'break' by functionalizing an inner portion of the > loop are just kidding themselves and making their own code worse, IMO. > Takewhile from itertools also works, but that's clumsy and wordy as well.] > > I'm not super familiar with CPython, but I'm pretty sure I could get > this up and working without too much effort. > > Please note that I don't feel the answer to this is 'just use break'. 
> Programmers are now being taught to avoid 'break' and 'continue' as if > they were 'goto's. The result (now) is that people are avoiding the 'for' > (with its GREAT properties) because they can't break out of it. > > Comments and Questions welcome. > > Thanks. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Jun 25 16:16:29 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 25 Jun 2013 07:16:29 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: There was a lengthy discussion on a related topic within the last couple of months. I've attached the seed email for that discussion. A long discussion ensued but it may be easier to follow if you look it up in the archives as I don't want to accidentally spam the list by attaching the whole chain... On Jun 25, 2013, at 7:05 AM, Amaury Forgeot d'Arc wrote: > Hi, > > 2013/6/25 > Syntax: > > for X in ListY while conditionZ: > > The 'for' loop would proceed as long as conditionZ remains true. > > It seems that itertools.takewhile does the same thing: > http://docs.python.org/2/library/itertools.html#itertools.takewhile > It will probably require a lambda > > > The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit > end condition check, etc.) and at the same time avoiding a 'break' from the 'for'. > > (NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert > the clean 'for' into a less-clean 'while'. 
Or they just let the 'for' run out. You can argue against this teaching > practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.) > > Ouch! do you have pointers? > Maybe we should upgrade these teaching practices (or teachers) instead. > > > [People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. Takewhile from itertools also works, but that's clumsy and wordy as well.] > > I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort. > > Please note that I don't feel the answer to this is 'just use break'. Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it. > > Comments and Questions welcome. > > Thanks. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > Amaury Forgeot d'Arc > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded message was scrubbed... From: "Wolfgang Maier" Subject: [Python-ideas] while conditional in list comprehension ?? Date: Mon, 28 Jan 2013 14:33:45 +0100 Size: 6097 URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjhb at aol.com Tue Jun 25 16:51:53 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 10:51:53 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: <8D03FCAAA28AAF9-1864-1C1FF@webmail-m103.sysops.aol.com> -----Original Message----- From: Amaury Forgeot d'Arc To: jimjhb Cc: Python-Ideas Sent: Tue, Jun 25, 2013 10:05 am Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: Hi, 2013/6/25 Syntax: for X in ListY while conditionZ: The 'for' loop would proceed as long as conditionZ remains true. It seems that itertools.takewhile does the same thing: http://docs.python.org/2/library/itertools.html#itertools.takewhile It will probably require a lambda Yes, it does. The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit end condition check, etc.) and at the same time avoiding a 'break' from the 'for'. (NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert the clean 'for' into a less-clean 'while'. Or they just let the 'for' run out. You can argue against this teaching practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.) >Ouch! do you have pointers? >Maybe we should upgrade these teaching practices (or teachers) instead. I think I figured it out. The MISRA-C 1998 standards forbid the use of gotos, breaks and continues. (You could use breaks in switch statements, though). The MISRA-C 2004 standards now allow for at most one break in a loop. But it looks like the damage has been done..... :( I'm thinking Comp. Sci. professors used the MISRA standards as a guideline for 'good' coding practices, applying them to languages other than C as well. [People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. 
Takewhile from itertools also works, but that's clumsy and wordy as well.] I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort. Please note that I don't feel the answer to this is 'just use break'. Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it. Comments and Questions welcome. Thanks. _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjhb at aol.com Tue Jun 25 16:46:06 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 10:46:06 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> Message-ID: <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> You shouldn't have to invoke takewhile and a lambda just to break out of for loop. >http://docs.python.org/2/library/itertools.html#itertools.takewhile > >for item in takewhile(lambda x: x < 5, range(10)): > pass >> >> [People who avoid the 'break' by functionalizing an inner portion of the >> loop are just kidding themselves and making their own code worse, IMO. >> Takewhile from itertools also works, but that's clumsy and wordy as well.] >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jun 25 17:03:42 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 26 Jun 2013 00:03:42 +0900 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C951A7.9050707@canterbury.ac.nz> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> Message-ID: <871u7q1bch.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Stephen J. Turnbull wrote: > > Ah, but here we have the case of two *different* names that are > > spelled the same[1], and what Steven is pointing out is that for this > > syntax to work, these different names that are spelled the same must > > stay in sync. > > I dispute that they're different names. In the use cases > I have in mind, Which are? If you've already explained them, or somebody else has, at least hint, please. I can abstractly imagine the kind of thing you're talking about, but I've never experienced a case where "foo(bar=bar)" bothered me because the identity of the object named was so strong as to offend my sensibility when writing the name twice. > it's no accident that the two names are spelled the same, because > conceptually they represent the very same thing. In such a case, I would almost certainly design the API (one or the other) to enable passing this name (that really lives in a super- namespace) as a positional argument. Or (as the OP posited, though you haven't) in a case where I have a bunch of such "no accident" variable names, I'd spend some time thinking about whether I should factor out a class here so I could pass a single "no accident" object name (preferably positionally). 
Bottom line, I still have trouble with the idea that this is a big enough problem to deserve a syntactic solution. From jimjhb at aol.com Tue Jun 25 17:17:18 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 11:17:18 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: <8D03FCE37277582-1864-1C47C@webmail-m103.sysops.aol.com> http://www.csee.umbc.edu/courses/201/spring10/standards.shtml Ugh! -Jim -----Original Message----- From: Amaury Forgeot d'Arc To: jimjhb Cc: Python-Ideas Sent: Tue, Jun 25, 2013 10:05 am Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: Hi, 2013/6/25 Syntax: for X in ListY while conditionZ: The 'for' loop would proceed as long as conditionZ remains true. It seems that itertools.takewhile does the same thing: http://docs.python.org/2/library/itertools.html#itertools.takewhile It will probably require a lambda The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit end condition check, etc.) and at the same time avoiding a 'break' from the 'for'. (NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert the clean 'for' into a less-clean 'while'. Or they just let the 'for' run out. You can argue against this teaching practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.) Ouch! do you have pointers? Maybe we should upgrade these teaching practices (or teachers) instead. [People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. Takewhile from itertools also works, but that's clumsy and wordy as well.] 
I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort. Please note that I don't feel the answer to this is 'just use break'. Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it. Comments and Questions welcome. Thanks. _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Jun 25 17:42:55 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 25 Jun 2013 17:42:55 +0200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FCE37277582-1864-1C47C@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FCE37277582-1864-1C47C@webmail-m103.sysops.aol.com> Message-ID: <51C9BA7F.4050202@egenix.com> On 25.06.2013 17:17, jimjhb at aol.com wrote: > for X in ListY while conditionZ: > > > The 'for' loop would proceed as long as conditionZ remains true. It is not clear to me at what point in the for-loop you'd run and check conditionZ. IMO, this is much more readable and straight forward to understand: for X in ListY: if not conditionZ: break pass (and it should be to your CS teachers as well ;-)) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2013-06-18: Released mxODBC Django DE 1.2.0 ... http://egenix.com/go47 2013-07-01: EuroPython 2013, Florence, Italy ... 6 days to go 2013-07-16: Python Meeting Duesseldorf ... 21 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mertz at gnosis.cx Tue Jun 25 18:55:12 2013 From: mertz at gnosis.cx (David Mertz) Date: Tue, 25 Jun 2013 09:55:12 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: I'm not quite certain if this is what the OP is proposing, but I think that extending comprehensions to allow a 'while' clause would be intuitive and somewhat useful. It's true that itertools.takewhile() basically gets us the same thing, but actual syntax would be nice, and also more straightforward for comprehensions other than generator comprehensions. E.g. attendees = {guest:guest.plus_N for guest in waiting_list while not room_full()} This would actually produce the same result as: attendees = {guest:guest.plus_N for guest in waiting_list if not room_full()} But it would save the extra looping over a bunch of final False values of 'room_full()'. On Tue, Jun 25, 2013 at 6:35 AM, wrote: > Syntax: > > for X in ListY while conditionZ: > > The 'for' loop would proceed as long as conditionZ remains true. > > The motivation is to be able to make use of all the great aspects of the > python 'for' (no indexing or explicit > end condition check, etc.) and at the same time avoiding a 'break' from > the 'for'. 
> > (NOTE: Many people are being taught to avoid 'break' and 'continue' at > all costs, so they instead convert > the clean 'for' into a less-clean 'while'. Or they just let the 'for' run > out. You can argue against this teaching > practice (at least for Python) but that doesn't mean it's not prevalent > and prevailing.) > > [People who avoid the 'break' by functionalizing an inner portion of the > loop are just kidding themselves and making their own code worse, IMO. > Takewhile from itertools also works, but that's clumsy and wordy as well.] > > I'm not super familiar with CPython, but I'm pretty sure I could get > this up and working without too much effort. > > Please note that I don't feel the answer to this is 'just use break'. > Programmers are now being taught to avoid 'break' and 'continue' as if > they were 'goto's. The result (now) is that people are avoiding the 'for' > (with its GREAT properties) because they can't break out of it. > > Comments and Questions welcome. > > Thanks. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Tue Jun 25 19:08:05 2013 From: ned at nedbatchelder.com (Ned Batchelder) Date: Tue, 25 Jun 2013 13:08:05 -0400 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> Message-ID: <51C9CE75.5030300@nedbatchelder.com> You don't have to: use the break statement, that's what it's for. About people teaching students not to use it: the existence of bad teachers teaching silly ideas is not a reason to add syntax to Python. --Ned. On 6/25/2013 10:46 AM, jimjhb at aol.com wrote: > > You shouldn't have to invoke takewhile and a lambda just to break out > of for loop. > >http://docs.python.org/2/library/itertools.html#itertools.takewhile > > > >for item in takewhile(lambda x: x < 5, range(10)): > > pass > > >> > >> [People who avoid the 'break' by functionalizing an inner portion of the > >> loop are just kidding themselves and making their own code worse, IMO. > >> Takewhile from itertools also works, but that's clumsy and wordy as well.] > >> > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Jun 25 19:12:22 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 25 Jun 2013 10:12:22 -0700 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <51C9CE75.5030300@nedbatchelder.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> Message-ID: <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> [x for x in l if x < 10 else break]? On Jun 25, 2013, at 10:08 AM, Ned Batchelder wrote: > You don't have to: use the break statement, that's what it's for. About people teaching students not to use it: the existence of bad teachers teaching silly ideas is not a reason to add syntax to Python. > > --Ned. > > On 6/25/2013 10:46 AM, jimjhb at aol.com wrote: >> >> You shouldn't have to invoke takewhile and a lambda just to break out of for loop. >> >http://docs.python.org/2/library/itertools.html#itertools.takewhile >> > >> >for item in takewhile(lambda x: x < 5, range(10)): >> > pass >> >> >> >> >> [People who avoid the 'break' by functionalizing an inner portion of the >> >> loop are just kidding themselves and making their own code worse, IMO. >> >> Takewhile from itertools also works, but that's clumsy and wordy as well.] >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjhb at aol.com Tue Jun 25 20:03:52 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 14:03:52 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> Message-ID: <8D03FE57C22D5A9-1864-1D5B8@webmail-m103.sysops.aol.com> That's a break.... -----Original Message----- From: Shane Green To: Ned Batchelder Cc: python-ideas Sent: Tue, Jun 25, 2013 1:13 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: [x for x in l if x < 10 else break]? On Jun 25, 2013, at 10:08 AM, Ned Batchelder wrote: You don't have to: use the break statement, that's what it's for. About people teaching students not to use it: the existence of bad teachers teaching silly ideas is not a reason to add syntax to Python. --Ned. On 6/25/2013 10:46 AM, jimjhb at aol.com wrote: You shouldn't have to invoke takewhile and a lambda just to break out of for loop. >http://docs.python.org/2/library/itertools.html#itertools.takewhile > >for item in takewhile(lambda x: x < 5, range(10)): > pass >> >> [People who avoid the 'break' by functionalizing an inner portion of the >> loop are just kidding themselves and making their own code worse, IMO. >> Takewhile from itertools also works, but that's clumsy and wordy as well.] 
>> _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjhb at aol.com Tue Jun 25 20:11:58 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 14:11:58 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51C9CE75.5030300@nedbatchelder.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> Message-ID: <8D03FE69E036110-1864-1D692@webmail-m103.sysops.aol.com> I tend to agree with you in theory. To recap: MISRA-C 1998 [No gotos, no breaks (except in switches), no continues] MISRA-C 2004 [No gotos, no continues, one break per loop (and also the switches)] MISRA-C 2012 [Some gotos are now allowed!] C is not python, but this translated to the teaching community and here we are. Carried forward into python: http://www.csee.umbc.edu/courses/201/spring13/standards.shtml In practice, many many students are being told to avoid break/continue. Maybe something from the Python leadership saying breaks are fine? Given their lack of support for gotos it's easy to see how others might feel breaks and continues are bad as well, even in Python. 
-----Original Message----- From: Ned Batchelder To: python-ideas Sent: Tue, Jun 25, 2013 1:08 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: You don't have to: use the break statement, that's what it's for. About people teaching students not to use it: the existence of bad teachers teaching silly ideas is not a reason to add syntax to Python. --Ned. On 6/25/2013 10:46 AM, jimjhb at aol.com wrote: You shouldn't have to invoke takewhile and a lambda just to break out of for loop. >http://docs.python.org/2/library/itertools.html#itertools.takewhile > >for item in takewhile(lambda x: x < 5, range(10)): > pass >> >> [People who avoid the 'break' by functionalizing an inner portion of the >> loop are just kidding themselves and making their own code worse, IMO. >> Takewhile from itertools also works, but that's clumsy and wordy as well.] >> _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull at sk.tsukuba.ac.jp Tue Jun 25 20:12:24 2013 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 26 Jun 2013 03:12:24 +0900 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> Message-ID: <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> Shane Green writes: > [x for x in l if x < 10 else break]? That's currently invalid syntax: break is a statement. I think a while clause (as suggested by David Mertz) would be a more plausible extension of syntax. I do think extending generator/comprehension syntax is much more plausible than extending for loop syntax (for one thing, "just use break" is not an answer here!) From boxed at killingar.net Tue Jun 25 20:15:29 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Tue, 25 Jun 2013 20:15:29 +0200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FE57C22D5A9-1864-1D5B8@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <8D03FE57C22D5A9-1864-1D5B8@webmail-m103.sysops.aol.com> Message-ID: So is your suggestion, but it has a different name. If your teacher is this easily fooled, get a new teacher. On Tue, Jun 25, 2013 at 8:03 PM, wrote: > That's a break.... 
> > > > -----Original Message----- > From: Shane Green > To: Ned Batchelder > Cc: python-ideas > Sent: Tue, Jun 25, 2013 1:13 pm > Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while > conditionZ: > > [x for x in l if x < 10 else break]? > > > > > On Jun 25, 2013, at 10:08 AM, Ned Batchelder > wrote: > > You don't have to: use the break statement, that's what it's for. About > people teaching students not to use it: the existence of bad teachers > teaching silly ideas is not a reason to add syntax to Python. > > --Ned. > > On 6/25/2013 10:46 AM, jimjhb at aol.com wrote: > > > You shouldn't have to invoke takewhile and a lambda just to break out of > for loop. > > >http://docs.python.org/2/library/itertools.html#itertools.takewhile > > > >for item in takewhile(lambda x: x < 5, range(10)): > > pass > >> > >> [People who avoid the 'break' by functionalizing an inner portion of the > >> loop are just kidding themselves and making their own code worse, IMO. > >> Takewhile from itertools also works, but that's clumsy and wordy as well.] > >> > > > > > _______________________________________________ > Python-ideas mailing listPython-ideas at python.orghttp://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing listPython-ideas at python.orghttp://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Tue Jun 25 20:17:19 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Tue, 25 Jun 2013 19:17:19 +0100 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> Message-ID: On 25 June 2013 18:12, Shane Green wrote: > [x for x in l if x < 10 else break]? Humorously, this works:

    def brk(): raise StopIteration

    (x if x < 10 else brk() for x in range(100))
    list(x if x < 10 else brk() for x in range(100))

Quirkily, but obvious with thought, this does not:

    [x if x < 10 else brk() for x in range(100)]

From jimjhb at aol.com Tue Jun 25 20:22:12 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 14:22:12 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> Yeah, that's basically it. Except it's clear the other members of this list understand subtle python grammar MUCH better than I do. Main issue seems to be that programmers "shouldn't" shy away from 'break' (I can kind of see the argument both ways) but they do. So a lot of people are making their code more confusing... I think for deserves a 'while' because it is really sparing in its implementation (no index, no explicit bounds check) and so deserves a clean way to prematurely terminate the loop if needed. Break is a little harsh (for forbidden by many), itertools.takewhile is too wordy, and anything else is worse.... 
I think prematurely terminating a for loop is a very common activity as well. -Jim -----Original Message----- From: David Mertz To: python-ideas Sent: Tue, Jun 25, 2013 12:55 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: I'm not quite certain if this is what the OP is proposing, but I think that extending comprehensions to allow a 'while' clause would be intuitive and somewhat useful. It's true that itertools.takewhile() basically gets us the same thing, but actual syntax would be nice, and also more straightforward for comprehensions other than generator comprehensions. E.g. attendees = {guest:guest.plus_N for guest in waiting_list while not room_full()} This would actually produce the same result as: attendees = {guest:guest.plus_N for guest in waiting_list if not room_full()} But it would save the extra looping over a bunch of final False values of 'room_full()'. On Tue, Jun 25, 2013 at 6:35 AM, wrote: Syntax: for X in ListY while conditionZ: The 'for' loop would proceed as long as conditionZ remains true. The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit end condition check, etc.) and at the same time avoiding a 'break' from the 'for'. (NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert the clean 'for' into a less-clean 'while'. Or they just let the 'for' run out. You can argue against this teaching practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.) [People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. Takewhile from itertools also works, but that's clumsy and wordy as well.] I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort. Please note that I don't feel the answer to this is 'just use break'. 
Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it. Comments and Questions welcome. Thanks. _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From boxed at killingar.net Tue Jun 25 20:22:10 2013 From: boxed at killingar.net (=?ISO-8859-1?Q?Anders_Hovm=F6ller?=) Date: Tue, 25 Jun 2013 20:22:10 +0200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FE69E036110-1864-1D692@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <8D03FE69E036110-1864-1D692@webmail-m103.sysops.aol.com> Message-ID: > MISRA-C 1998 [No gotos, no breaks (except in switches), no continues] > MISRA-C 2004 [No gotos, no continues, one break per loop (and also the > switches)] > MISRA-C 2012 [Some gotos are now allowed!] > > C is not python, but this translated to the teaching community and here > we are. 
> > Carried forward into python: > http://www.csee.umbc.edu/courses/201/spring13/standards.shtml > > In practice, many, many students are being told to avoid break/continue. > > Maybe something from the Python leadership saying breaks are fine? Given > their lack of > support for gotos it's easy to see how others might feel breaks and > continues are bad as well, > even in Python. > What more support do we need to show than the fact that the keywords break and continue exist in the language and goto does not? Would some code written by Guido suffice? On his blog http://neopythonic.blogspot.com there's an example of code he has written with a break without even having to search through the archive. -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Tue Jun 25 20:44:44 2013 From: barry at python.org (Barry Warsaw) Date: Tue, 25 Jun 2013 14:44:44 -0400 Subject: [Python-ideas] [Suspected Spam] Re: [Suspected Spam] Re: Short form for keyword arguments and dicts References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <87ehbr1gpo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130625144444.1bc514a3@anarchist> On Jun 24, 2013, at 09:13 PM, Anders Hovmöller wrote: >Yet obviously people DO do stuff like: > >_('{foo}') > >which walks the stack to find the locals and then puts them in there. I think >this shows there is some room for a middle ground that might disincentivize >people from going to those extremes :P > >Again, it's not about the exact syntax I suggested, it's about that middle >ground. One difference here is that the above magic is tucked inside a library.
If you don't like magic (or you feel it's incomprehensible), don't use the library. A syntactic equivalent impacts the entire language. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From mertz at gnosis.cx Tue Jun 25 20:58:37 2013 From: mertz at gnosis.cx (David Mertz) Date: Tue, 25 Jun 2013 11:58:37 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: Oh... just to clarify my comment. I think readers will recognize the intent. Obviously I didn't mean that a (hypothetical) "while" clause in a comprehension would *always* be equivalent to an "if" clause (but faster). I just meant to illustrate what seems like a common use case of some test that *becomes* true after a number of checks. On Tue, Jun 25, 2013 at 9:55 AM, David Mertz wrote: > I'm not quite certain if this is what the OP is proposing, but I think > that extending comprehensions to allow a 'while' clause would be intuitive > and somewhat useful. It's true that itertools.takewhile() basically gets > us the same thing, but actual syntax would be nice, and also more > straightforward for comprehensions other than generator comprehensions. > E.g. > > attendees = {guest:guest.plus_N for guest in waiting_list while not > room_full()} > > This would actually produce the same result as: > > attendees = {guest:guest.plus_N for guest in waiting_list if not > room_full()} > > But it would save the extra looping over a bunch of final False values of > 'room_full()'. > > On Tue, Jun 25, 2013 at 6:35 AM, wrote: > >> Syntax: >> >> for X in ListY while conditionZ: >> >> The 'for' loop would proceed as long as conditionZ remains true. 
>> >> The motivation is to be able to make use of all the great aspects of >> the python 'for' (no indexing or explicit >> end condition check, etc.) and at the same time avoiding a 'break' from >> the 'for'. >> >> (NOTE: Many people are being taught to avoid 'break' and 'continue' at >> all costs, so they instead convert >> the clean 'for' into a less-clean 'while'. Or they just let the 'for' >> run out. You can argue against this teaching >> practice (at least for Python) but that doesn't mean it's not prevalent >> and prevailing.) >> >> [People who avoid the 'break' by functionalizing an inner portion of >> the loop are just kidding themselves and making their own code worse, >> IMO. Takewhile from itertools also works, but that's clumsy and wordy as >> well.] >> >> I'm not super familiar with CPython, but I'm pretty sure I could get >> this up and working without too much effort. >> >> Please note that I don't feel the answer to this is 'just use break'. >> Programmers are now being taught to avoid 'break' and 'continue' as if >> they were 'goto's. The result (now) is that people are avoiding the 'for' >> (with its GREAT properties) because they can't break out of it. >> >> Comments and Questions welcome. >> >> Thanks. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. 
Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jun 25 21:07:44 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 26 Jun 2013 04:07:44 +0900 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> Message-ID: <87txkmypof.fsf@uwakimon.sk.tsukuba.ac.jp> jimjhb at aol.com writes: > Main issue seems to be that programmers "shouldn't" shy away from > 'break' (I can kind of see the argument both ways) but they do. I guess I just hang around in the wrong company, because I don't see it. Where can I find these programmers who shy away from these constructs? > I think for deserves a 'while' because it is really sparing in its > implementation (no index, no explicit bounds check) and so deserves > a clean way to prematurely terminate the loop if needed. But this idea lacks coherence, in that 'break' statement syntax is far more expressive than a 'while' clause in the 'for' statement can be. The 'while' condition needs to satisfy expression syntax and be a function of the loop variable. But 'break' can be used at the top of the loop, bottom, or elsewhere while the new syntax can only implement one of those. And the 'break' condition need not be a function of the loop variable. On the other hand, in list comprehensions and generator expressions, the syntactic and functional restrictions on the condition need to be satisfied anyway. 
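Turnbull's expressiveness point can be made concrete with a sketch (a hypothetical example, not from the thread): the exit test below sits in the middle of the loop body and depends on accumulated state rather than on the loop variable alone, so no single 'while' clause attached to the 'for' line could express it:

```python
def take_within_budget(items, budget):
    """Collect items until the running total exhausts the budget."""
    chosen = []
    for item in items:
        chosen.append(item)   # work happens before the exit test
        budget -= item        # exit depends on accumulated state, not 'item' alone
        if budget <= 0:
            break             # mid-body exit; a top-of-loop test fires too early or too late
        # further per-item work could follow here
    return chosen

print(take_within_budget([3, 4, 5], 6))  # [3, 4]
```

Note that the last appended item is the one that overruns the budget; a 'for ... while' condition checked before the body could only ever exclude it.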
By the way: > David Mertz writes: >> attendees = {guest:guest.plus_N for guest in waiting_list while >> not room_full()} >> >> This would actually produce the same result as: >> >> attendees = {guest:guest.plus_N for guest in waiting_list if not >> room_full()} Not necessarily. waiting_list might be an infinite iterator. Steve

From jimjhb at aol.com Tue Jun 25 21:22:27 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 15:22:27 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <87txkmypof.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <87txkmypof.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8D03FF076718B1B-1864-1DE85@webmail-m103.sysops.aol.com> >-----Original Message----- >From: Stephen J. Turnbull >To: jimjhb >Cc: mertz ; python-ideas >Sent: Tue, Jun 25, 2013 3:07 pm >Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: >jimjhb at aol.com writes: >> Main issue seems to be that programmers "shouldn't" shy away from >> 'break' (I can kind of see the argument both ways) but they do. >I guess I just hang around in the wrong company, because I don't see >it. Where can I find these programmers who shy away from these >constructs? Here: http://www.csee.umbc.edu/courses/201/spring13/standards.shtml >> I think for deserves a 'while' because it is really sparing in its >> implementation (no index, no explicit bounds check) and so deserves >> a clean way to prematurely terminate the loop if needed. >But this idea lacks coherence, in that 'break' statement syntax is far >more expressive than a 'while' clause in the 'for' statement can be. >The 'while' condition needs to satisfy expression syntax and be a >function of the loop variable.
I didn't think the while had to be a function of the loop variable.

end_condition = False
for X in ListY while end_condition is False:

>But 'break' can be used at the top of >the loop, bottom, or elsewhere while the new syntax can only implement >one of those. And the 'break' condition need not be a function of the >loop variable. >On the other hand, in list comprehensions and generator expressions, >the syntactic and functional restrictions on the condition need to be >satisfied anyway. By the way: > David Mertz writes: >> attendees = {guest:guest.plus_N for guest in waiting_list while >> not room_full()} >> >> This would actually produce the same result as: >> >> attendees = {guest:guest.plus_N for guest in waiting_list if not >> room_full()} Not necessarily. waiting_list might be an infinite iterator. Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Tue Jun 25 21:37:46 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 25 Jun 2013 14:37:46 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C951A7.9050707@canterbury.ac.nz> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> Message-ID: On 06/25/2013 03:15 AM, Greg Ewing wrote: > Stephen J. Turnbull wrote: >> Ah, but here we have the case of two *different* names that are >> spelled the same[1], and what Steven is pointing out is that for this >> syntax to work, these different names that are spelled the same must >> stay in sync.
> > I dispute that they're different names. In the use cases > I have in mind, it's no accident that the two names are > spelled the same, because conceptually they represent the > very same thing. Giving them different names would be > confusing and probably indicate an error. > > If the names were only accidentally the same, I would > probably want to rename one of them to avoid giving the > misleading impression that they were related. I think some of the misunderstandings might be about whether or not we are talking about function definitions or function calls, and/or other blocks that might be reused in different locations. If a function definition were to set the calling syntax, then yes, it would be an issue because it would force the use of a particular name at all the call sites. But I don't think that is what is being proposed. But if 'name=' was just syntactic sugar for 'name=name', then it really wouldn't make any difference. Just use it at any call site in place of a keyword argument pair where both names are the same. The compiler would just generate the same code as if you did use the full 'name=name' notation. And all the same rules would apply. But I'm still -1. It looks too much like a pass by reference to me, which Python doesn't currently do. And I don't like the '=' with nothing on the right. The 'spam=spam' pair that is being discussed is binding an object bound to the spam on the right to a new name in a new location. I think the shorter syntax will make that harder to see and understand for new users. What the notation is really doing is..

future_local name = name

I can't think of a good alternative syntax for that which is as clear as 'f(spam=spam)'. But I do find these discussions interesting because they can stimulate new ideas I otherwise wouldn't think of. Cheers, Ron

From mertz at gnosis.cx Tue Jun 25 21:41:36 2013 From: mertz at gnosis.cx (David Mertz) Date: Tue, 25 Jun 2013 12:41:36 -0700 Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: <87txkmypof.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <87txkmypof.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > David Mertz writes: > > >> attendees = {guest:guest.plus_N for guest in waiting_list while > >> not room_full()} > >> > >> This would actually produce the same result as: > >> > >> attendees = {guest:guest.plus_N for guest in waiting_list if not > >> room_full()} > > Not necessarily. waiting_list might be an infinite iterator . > Sure. Of course. I posted a followup which observed that 'room_full()' doesn't in principle have to become True and stay that way either. I was just trying to make an intuitive example where the two forms would turn out the same, but obviously there are a number of cases where they might be different.... and hence a motivation for what I think would be nice as an added syntax in comprehensions. Actually, your example is an even stronger case for why we might want a comprehension while clause. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Jun 25 21:45:47 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 25 Jun 2013 12:45:47 -0700 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> Message-ID: <008F1081-2BDD-430C-84B2-EB5BCB4E9109@umbrellacode.com>

Even though they declare 'break' is bad, they don't actually mean break is bad. It's when it's used to control the flow (as in causing flow to skip sections of code, etc.) of an application that it's primarily considered bad. Hence goto's banning is more widely accepted than break's. Multiple return statements are similarly frowned upon.

On Jun 25, 2013, at 11:22 AM, jimjhb at aol.com wrote: > Yeah, that's basically it. Except it's clear the other members of this list understand subtle Python grammar MUCH better than I do. > > Main issue seems to be that programmers "shouldn't" shy away from 'break' (I can kind of see the argument both ways) but they do. So a lot of people are making their code more confusing... > > I think for deserves a 'while' because it is really sparing in its implementation (no index, no explicit bounds check) and so deserves a clean way to prematurely terminate the loop if needed. Break is a little harsh (and forbidden by many), itertools.takewhile is too wordy, and anything else is worse.... > > I think prematurely terminating a for loop is a very common activity as well. > > -Jim > > > -----Original Message----- > From: David Mertz > To: python-ideas > Sent: Tue, Jun 25, 2013 12:55 pm > Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: > > I'm not quite certain if this is what the OP is proposing, but I think that extending comprehensions to allow a 'while' clause would be intuitive and somewhat useful.
It's true that itertools.takewhile() basically gets us the same thing, but actual syntax would be nice, and also more straightforward for comprehensions other than generator comprehensions. E.g. > > attendees = {guest:guest.plus_N for guest in waiting_list while not room_full()} > > This would actually produce the same result as: > > attendees = {guest:guest.plus_N for guest in waiting_list if not room_full()} > > But it would save the extra looping over a bunch of final False values of 'room_full()'. > > On Tue, Jun 25, 2013 at 6:35 AM, wrote: > Syntax: > > for X in ListY while conditionZ: > > The 'for' loop would proceed as long as conditionZ remains true. > > The motivation is to be able to make use of all the great aspects of the python 'for' (no indexing or explicit > end condition check, etc.) and at the same time avoiding a 'break' from the 'for'. > > (NOTE: Many people are being taught to avoid 'break' and 'continue' at all costs, so they instead convert > the clean 'for' into a less-clean 'while'. Or they just let the 'for' run out. You can argue against this teaching > practice (at least for Python) but that doesn't mean it's not prevalent and prevailing.) > > [People who avoid the 'break' by functionalizing an inner portion of the loop are just kidding themselves and making their own code worse, IMO. Takewhile from itertools also works, but that's clumsy and wordy as well.] > > I'm not super familiar with CPython, but I'm pretty sure I could get this up and working without too much effort. > > Please note that I don't feel the answer to this is 'just use break'. Programmers are now being taught to avoid 'break' and 'continue' as if they were 'goto's. The result (now) is that people are avoiding the 'for' (with its GREAT properties) because they can't break out of it. > > Comments and Questions welcome. > > Thanks. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL:

From shane at umbrellacode.com Tue Jun 25 21:50:57 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 25 Jun 2013 12:50:57 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FF076718B1B-1864-1DE85@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <87txkmypof.fsf@uwakimon.sk.tsukuba.ac.jp> <8D03FF076718B1B-1864-1DE85@webmail-m103.sysops.aol.com> Message-ID: <0D43450B-841E-4243-A9BC-FC563E55FA50@umbrellacode.com>

Create 'stop', which is basically an alias for 'break' but always used in the way that's not considered bad practice. [x for x in l if x < 100 else stop]

On Jun 25, 2013, at 12:22 PM, jimjhb at aol.com wrote: > > > > >-----Original Message----- > >From: Stephen J. Turnbull > >To: jimjhb > >Cc: mertz ; python-ideas > >Sent: Tue, Jun 25, 2013 3:07 pm > >Subject: Re: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: > > >jimjhb at aol.com writes: > > >> Main issue seems to be that programmers "shouldn't" shy away from > >> 'break' (I can kind of see the argument both ways) but they do. > > >I guess I just hang around in the wrong company, because I don't see > >it. Where can I find these programmers who shy away from these > >constructs? > Here: > http://www.csee.umbc.edu/courses/201/spring13/standards.shtml > > >> I think for deserves a 'while' because it is really sparing in its > >> implementation (no index, no explicit bounds check) and so deserves > >> a clean way to prematurely terminate the loop if needed. > > >But this idea lacks coherence, in that 'break' statement syntax is far > >more expressive than a 'while' clause in the 'for' statement can be. > >The 'while' condition needs to satisfy expression syntax and be a > >function of the loop variable. > I didn't think the while had to be a function of the loop variable. > end_condition = False > for X in ListY while end_condition is False: > >But 'break' can be used at the top of > >the loop, bottom, or elsewhere while the new syntax can only implement > >one of those. And the 'break' condition need not be a function of the > >loop variable. > > >On the other hand, in list comprehensions and generator expressions, > >the syntactic and functional restrictions on the condition need to be > >satisfied anyway. > > By the way: > > > David Mertz writes: > > >> attendees = {guest:guest.plus_N for guest in waiting_list while > >> not room_full()} > >> > >> This would actually produce the same result as: > >> > >> attendees = {guest:guest.plus_N for guest in waiting_list if not > >> room_full()} > > Not necessarily. waiting_list might be an infinite iterator. 
> > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL:

From wolfgang.maier at biologie.uni-freiburg.de Tue Jun 25 22:51:06 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 25 Jun 2013 20:51:06 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Is_this_PEP-able=3F_for_X_in_ListY_while?= =?utf-8?q?=09conditionZ=3A?= References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> Message-ID:

Hi, I suggested the very same 'while in a comprehension/generator expression' back in January: http://mail.python.org/pipermail/python-ideas/2013-January/018969.html There were many very useful responses suggesting alternative syntax (that I'm using now). The proposal itself was dismissed with the logic that comprehensions can currently be translated directly into explicit for loops, e.g.:

[x for x in items if x!=0]

equals:

result=[]
for x in items:
    if x!=0:
        result.append(x)

This equivalence is considered *very* important and the while statement would break it:

[x for x in items while x>0]

does *not* translate into:

for x in items:
    while x>0:
        result.append(x)

So, as long as you can't come up with syntax that translates properly, there's no chance of getting it accepted. Best, Wolfgang

From jimjhb at aol.com Tue Jun 25 22:58:07 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 16:58:07 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> Message-ID: <8D03FFDD3D8B494-1864-1E9AB@webmail-m103.sysops.aol.com> I saw your thread. Thank you for summarizing the result! It looks like the best alternative is itertools.takewhile, but I don't like it.... :( -Jim -----Original Message----- From: Wolfgang Maier To: python-ideas Sent: Tue, Jun 25, 2013 4:51 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: Hi, I suggested the very same 'while in a comprehension/generator expression' back in January: http://mail.python.org/pipermail/python-ideas/2013-January/018969.html There were many very useful responses suggesting alternative syntax (that I'm using now). The proposal itself was dismissed with the logic that comprehensions can currently be translated directly into explicit for loops, e.g.: [x for x in items if x!=0] equals: result=[] for x in items: if x!=0: result.append(x) This equivalence is considered *very* important and the while statement would break it: [x for x in items while x>0] does *not* translate into: for x in item: while x>0: result.append(x) So, as long as you can't come up with syntax that translates properly, there's no chance of getting it accepted. Best, Wolfgang _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Tue Jun 25 22:59:51 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 25 Jun 2013 14:59:51 -0600 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> Message-ID: On Tue, Jun 25, 2013 at 10:55 AM, David Mertz wrote: > I'm not quite certain if this is what the OP is proposing, but I think that > extending comprehensions to allow a 'while' clause would be intuitive and > somewhat useful. It's true that itertools.takewhile() basically gets us the > same thing, but actual syntax would be nice, and also more straightforward > for comprehensions other than generator comprehensions. While I agree somewhat, the biggest problem is that comprehensions are already complicated syntax when using the full capabilities. The simple form is pretty straightforward to learn and read. The if clause can be confusing at first if not read in context. It can make the expression longer than you can take in with a glance. Splitting the comprehension across multiple lines helps with this, so it's manageable. However, throw "nested" comprehensions into your expression and you're going to lose just about anyone trying to read your code. I think comprehensions are already at the limit of a reasonable level of complexity. Adding in another supported clause will only make them harder to learn and harder to read. -eric From jimjhb at aol.com Tue Jun 25 23:08:43 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 17:08:43 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> Message-ID: <8D03FFF4F0D9F22-1864-1EB0E@webmail-m103.sysops.aol.com> That would bring back my previous notion of fwhile. fwhile X in ListY and conditionZ: It was pointed out that the 'and' is ambiguous (though perhaps not fatally problematic) and adding a new keyword is really bad as well. I guess: for X in ListY and conditionZ: would just have the ambiguity. [Or maybe not, as long as the condition is a bool, it's not iterable and thus not confused with list.] -----Original Message----- From: Wolfgang Maier To: python-ideas Sent: Tue, Jun 25, 2013 4:51 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: Hi, I suggested the very same 'while in a comprehension/generator expression' back in January: http://mail.python.org/pipermail/python-ideas/2013-January/018969.html There were many very useful responses suggesting alternative syntax (that I'm using now). The proposal itself was dismissed with the logic that comprehensions can currently be translated directly into explicit for loops, e.g.: [x for x in items if x!=0] equals: result=[] for x in items: if x!=0: result.append(x) This equivalence is considered *very* important and the while statement would break it: [x for x in items while x>0] does *not* translate into: for x in item: while x>0: result.append(x) So, as long as you can't come up with syntax that translates properly, there's no chance of getting it accepted. 
Best, Wolfgang _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Tue Jun 25 23:15:47 2013 From: ned at nedbatchelder.com (Ned Batchelder) Date: Tue, 25 Jun 2013 17:15:47 -0400 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <8D03FFDD3D8B494-1864-1E9AB@webmail-m103.sysops.aol.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <8D03FFDD3D8B494-1864-1E9AB@webmail-m103.sysops.aol.com> Message-ID: <51CA0883.1050708@nedbatchelder.com> On 6/25/2013 4:58 PM, jimjhb at aol.com wrote: > I saw your thread. Thank you for summarizing the result! It looks > like the best alternative is itertools.takewhile, but I don't like > it.... :( > Ugh! The best alternative is "break." The people advocating not using break are doing so for "readability" reasons. I would be very surprised if they thought takewhile was more readable than a break statement. Slavishly following mindless rules is not the way to write good programs, and it isn't the way to teach people to write good programs. Resist! :) --Ned. > -Jim > > -----Original Message----- > From: Wolfgang Maier > To: python-ideas > Sent: Tue, Jun 25, 2013 4:51 pm > Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while > conditionZ: > > Hi, > I suggested the very same 'while in a comprehension/generator expression' > back in January: > http://mail.python.org/pipermail/python-ideas/2013-January/018969.html > > There were many very useful responses suggesting alternative syntax (that > I'm using now). 
> The proposal itself was dismissed with the logic that comprehensions can > currently be translated directly into explicit for loops, e.g.: > > [x for x in items if x!=0] > > equals: > > result=[] > for x in items: > if x!=0: > result.append(x) > > This equivalence is considered *very* important and the while statement > would break it: > > [x for x in items while x>0] > > does *not* translate into: > > for x in item: > while x>0: > result.append(x) > > So, as long as you can't come up with syntax that translates properly, > there's no chance of getting it accepted. > > Best, > Wolfgang > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjhb at aol.com Tue Jun 25 23:25:15 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Tue, 25 Jun 2013 17:25:15 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51CA0883.1050708@nedbatchelder.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <8D03FFDD3D8B494-1864-1E9AB@webmail-m103.sysops.aol.com> <51CA0883.1050708@nedbatchelder.com> Message-ID: <8D040019E00B224-1864-1ED36@webmail-m103.sysops.aol.com> Ned, I'm sorry. I totally agree. What I meant was that takewhile is the best alternative if you don't use break. MISRA-C 2004 allows one break in a loop, so even MISRA is on board. But their 1998 edict polluted a lot of minds.... 
-Jim -----Original Message----- From: Ned Batchelder To: python-ideas Sent: Tue, Jun 25, 2013 5:17 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: On 6/25/2013 4:58 PM, jimjhb at aol.com wrote: I saw your thread. Thank you for summarizing the result! It looks like the best alternative is itertools.takewhile, but I don't like it.... :( Ugh! The best alternative is "break." The people advocating not using break are doing so for "readability" reasons. I would be very surprised if they thought takewhile was more readable than a break statement. Slavishly following mindless rules is not the way to write good programs, and it isn't the way to teach people to write good programs. Resist! :) --Ned. -Jim -----Original Message----- From: Wolfgang Maier To: python-ideas Sent: Tue, Jun 25, 2013 4:51 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: Hi, I suggested the very same 'while in a comprehension/generator expression' back in January: http://mail.python.org/pipermail/python-ideas/2013-January/018969.html There were many very useful responses suggesting alternative syntax (that I'm using now). The proposal itself was dismissed with the logic that comprehensions can currently be translated directly into explicit for loops, e.g.: [x for x in items if x!=0] equals: result=[] for x in items: if x!=0: result.append(x) This equivalence is considered *very* important and the while statement would break it: [x for x in items while x>0] does *not* translate into: for x in item: while x>0: result.append(x) So, as long as you can't come up with syntax that translates properly, there's no chance of getting it accepted. 
Best, Wolfgang _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Jun 26 00:09:42 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Jun 2013 10:09:42 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: <51CA1526.9000002@canterbury.ac.nz> Andrew Barnert wrote: > I've already asked Anders this, but let me ask you as well: > What are the use cases you have in mind? > > As I see it, there are three cases ... > 1. dict constructor > 2. str.format > 3. forwarding functions (like my example with get_appdir_url) Mine are all 3. Here's a simplified version of something that crops up in various places in my GUI libraries. I have a widget, let's say a Button, and I want to calculate a default size for it during construction. 
To do that I need to know its font, which is one of the things that can potentially be passed as an argument to the constructor. So I want to "peek" at the font argument as it goes by:

    class Button(Widget):

        def __init__(self, text = "Hello", font = system_font, **kwds):
            default_size = calc_default_size(text, font)
            Widget.__init__(self, size = default_size, font = font, **kwds)

(I should also point out that my Widget constructor accepts a very large set of potential keyword arguments (essentially any of its assignable properties) so passing them positionally is not an option.)

Now, it's not so bad when there are only one or two passed-on arguments, but if you have a handful it starts to feel a bit wanky. I think I've figured out why this seemingly-small amount of redundancy bothers me so much. It's not just that I feel I'm saying the same thing twice, but that I feel I'm not saying *anything at all*. If I were to write this as an ordinary assignment:

    font = font

then it would be completely redundant. Now I know full well that it's different with a keyword argument, because the two sides live in different namespaces. But some part of my brain still tries to tell me that I'm doing something silly -- to the extent that sometimes I'm tempted to deliberately give the two sides *different* names just so that it doesn't look so useless.

> And, if it is just #3, do you have the same argument that (I think) Anders has,
> or a different one?

A different one. I'm not on any kind of crusade to push the use of keyword arguments for everything.
-- Greg

From abarnert at yahoo.com Wed Jun 26 00:51:04 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 25 Jun 2013 15:51:04 -0700 (PDT)
Subject: [Python-ideas] interactive sqlite3 module
In-Reply-To:
References:
Message-ID: <1372200664.82712.YahooMailNeo@web184701.mail.ne1.yahoo.com>

From: ZeD
Sent: Tuesday, June 25, 2013 5:33 AM

> It would be useful (for the times when you have python installed but not a
> sqlite executable... think windows...) to add a minimal interactive sqlite
> interpreter... a (really) basic implementation is something like the source
> below...

I think it would be more useful if you added this:

    if len(sys.argv) > 2:
        for command in sys.argv[2:]:
            # execute the command and display the rows
        sys.exit(0)

Personally, I've often found myself wanting to run a simple query against my Python program's database, but I was on Windows. If I could do this:

    python -m sqlite3 foo.db 'SELECT * FROM Employees WHERE joined >= "2013"'

I definitely might do it. But for anything more complicated than a trivial single select, I'd probably still just go download a sqlite3 binary.
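[Editorial note: a rough, self-contained sketch of the one-shot query mode Andrew describes. The script layout and the helper name `run_queries` are invented for illustration; they are not part of any actual proposal or of the `sqlite3` module.]

```python
import sqlite3
import sys

def run_queries(db_path, queries):
    """Run each SQL string against db_path and print the resulting rows.

    Hypothetical helper sketching the suggested command-line mode.
    """
    with sqlite3.connect(db_path) as conn:
        for sql in queries:
            for row in conn.execute(sql):
                print(row)

if __name__ == "__main__" and len(sys.argv) > 2:
    # e.g.  python query.py foo.db 'SELECT * FROM Employees WHERE joined >= "2013"'
    run_queries(sys.argv[1], sys.argv[2:])
    sys.exit(0)
```

This covers the trivial-select case from the message; anything interactive (prompting, multi-line statements, dot-commands) would still call for a real sqlite3 binary.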
From ethan at stoneleaf.us Wed Jun 26 00:22:26 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 25 Jun 2013 15:22:26 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51C8F0CD.1080106@mrabarnett.plus.com> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <51C8DB24.8050504@canterbury.ac.nz> <1372121890.62039.YahooMailNeo@web184705.mail.ne1.yahoo.com> <51C8F0CD.1080106@mrabarnett.plus.com> Message-ID: <51CA1822.1080505@stoneleaf.us> On 06/24/2013 06:22 PM, MRAB wrote: > > Why not just add a single marker followed by the names, something like this: > > return get_special_url(special='appdir', =, per_user, for_domain, create) Interesting idea. Similar to using the '*' to only allow keyword arguments. Rather than having '*' in the `def` and '=' at the call-site, why not use the '*' in both places? Then it would mean 'only keyword-arguments allowed' and 'these names are the keyword arguments'. Because the variable names are the same as the keyword names we're not losing anything. -- ~Ethan~ From abarnert at yahoo.com Wed Jun 26 01:05:54 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 25 Jun 2013 16:05:54 -0700 (PDT) Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ:
In-Reply-To: <8D03FE69E036110-1864-1D692@webmail-m103.sysops.aol.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <8D03FE69E036110-1864-1D692@webmail-m103.sysops.aol.com>
Message-ID: <1372201554.79307.YahooMailNeo@web184703.mail.ne1.yahoo.com>

From: "jimjhb at aol.com"
Sent: Tuesday, June 25, 2013 11:11 AM

> Maybe something from the Python leadership saying breaks are fine? Given their lack of
> support for gotos it's easy to see how others might feel breaks and continues are bad as well,
> even in Python.

$ grep break cpython/Lib/*py |wc -l
551
$ grep continue cpython/Lib/*py |wc -l
280

A handful of those are actually things like "setcbreak", but I'd guess at least 90% are breaking out of loops. Then there are the examples all over the official tutorial, module reference docs, FAQs, etc. that use break and continue. And Python's loop else clauses would clearly not exist if break weren't idiomatic. If all of that isn't enough to convince people, I doubt a statement from Guido undersigned by a dozen other key leaders would make a difference.

And really, the larger problem is that people are trying to apply standards for idioms from C, C++, Java, C#, etc. to Python, not that this particular one is wrong. This is the same mentality that convinces people to use special error return values and check-before-open instead of exceptions, and all kinds of other things that are just plain wrong. If they haven't figured out yet that C and Python are not the same language, I don't know what anyone can say or do, short of not hiring them, protesting working alongside them, and exposing them to public ridicule.
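[Editorial note: Andrew's grep estimate can be tightened to whole-word matches, excluding identifiers such as `setcbreak`, without leaving Python. A small sketch; the directory path and any counts are purely illustrative:]

```python
import re
from pathlib import Path

def count_word(word, directory):
    """Count whole-word occurrences of `word` across *.py files in a directory.

    \b boundaries exclude matches inside larger identifiers like 'setcbreak'.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(len(pattern.findall(p.read_text(errors="ignore")))
               for p in Path(directory).glob("*.py"))

# e.g. count_word("break", "cpython/Lib")  -- the exact total depends on checkout
```

This counts occurrences rather than matching lines, so the figure is comparable to, but slightly more precise than, `grep ... | wc -l`.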
From abarnert at yahoo.com Wed Jun 26 01:33:27 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 25 Jun 2013 16:33:27 -0700 (PDT)
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: <51CA1526.9000002@canterbury.ac.nz>
References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com> <51CA1526.9000002@canterbury.ac.nz>
Message-ID: <1372203207.15960.YahooMailNeo@web184703.mail.ne1.yahoo.com>

From: Greg Ewing
Sent: Tuesday, June 25, 2013 3:09 PM

> Andrew Barnert wrote:
>
>> What are the use cases you have in mind?
>
> Here's a simplified version of something that crops up in various places
> in my GUI libraries. I have a widget, let's say a Button, and I want to
> calculate a default size for it during construction. To do that I need
> to know its font, which is one of the things that can potentially be
> passed as an argument to the constructor. So I want to "peek" at the
> font argument as it goes by:
>
>     class Button(Widget):
>
>         def __init__(self, text = "Hello", font = system_font, **kwds):
>             default_size = calc_default_size(text, font)
>             Widget.__init__(self, size = default_size, font = font, **kwds)

OK, I recognize this kind of thing. And I also recognize your uneasy feeling with it. Notice that in this case, you don't actually need to pull out font and pass it along:

        def __init__(self, text = "Hello", **kwds):
            default_size = calc_default_size(text, kwds.get('font', system_font))
            Widget.__init__(self, size = default_size, **kwds)

That also removes the need to copy the default value from the base class, and I think it gets across the idea of "peeking at the kwargs" more clearly than pulling one out into a positional-or-keyword arg. However, I'm not sure it's actually more readable this way. And it's certainly more verbose, not less, and all those get calls would add up if I were peeking at a lot of values.

Also, that doesn't help if you want to do something with the value:

        def __init__(self, text = "Hello", font = system_font, **kwds):
            if os.sep in font: font = load_font_file(font)
            default_size = calc_default_size(text, font)
            Widget.__init__(self, size = default_size, font = font, **kwds)

> Now, it's not so bad when there are only one or two passed-on
> arguments, but if you have a handful it starts to feel a bit
> wanky.

I think the issue isn't how many passed-on arguments you have, but how many you're peeking at. If you've got 20 arguments to pass along, but only need to peek at 1 or 2, your code is fine; it's when you've got 6 arguments to pass along and need to peek at all 6 that it gets nasty.

> I think I've figured out why this seemingly-small amount of
> redundancy bothers me so much. It's not just that I feel I'm
> saying the same thing twice, but that I feel I'm not saying
> *anything at all*. If I were to write this as an ordinary
> assignment:
>
>     font = font
>
> then it would be completely redundant. Now I know full well
> that it's different with a keyword argument, because the two
> sides live in different namespaces. But some part of my brain
> still tries to tell me that I'm doing something silly

Actually, I think it's a different problem for me. My brain initially wonders why I'm trying to use the default-value hack:

    filters = [lambda x, modulus=modulus: x % modulus == 0 for modulus in range(2, 6)]

on a function call instead of a function definition.

Anyway, now that I see your example, I'm definitely more sympathetic to the problem. But I still don't like the proposed syntax, even if the problem were nearly compelling enough to add more syntax -- which I don't think it is.

From ethan at stoneleaf.us Wed Jun 26 01:32:13 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 25 Jun 2013 16:32:13 -0700
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com>
References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID: <51CA287D.6010906@stoneleaf.us>

On 06/25/2013 02:43 AM, Andrew Barnert wrote:
> From: Greg Ewing:
>
>> I dispute that they're different names. In the use cases
>> I have in mind, it's no accident that the two names are
>> spelled the same, because conceptually they represent the
>> very same thing.
>
> I've already asked Anders this, but let me ask you as well: What are the use cases you have in mind?
>
> As I see it, there are three cases (barring coincidences, which are obviously irrelevant) where this syntax could make a difference:
>
> 1. dict constructor
> 2. str.format
> 3. forwarding functions (like my example with get_appdir_url)

For me it's mostly #3, and in the cases where I use it, passing positionally is not an option because the arguments are keyword only.
If something like this comes to pass, I would argue for the syntax that MRAB suggested:

    some_func(pos1, pos2, =, keyword, keyword, keyword)

or possibly:

    some_func(pos1, pos2, *, keyword, keyword, keyword)

to mirror the function definition.

-- ~Ethan~

From steve at pearwood.info Wed Jun 26 03:40:40 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 26 Jun 2013 11:40:40 +1000
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: <008F1081-2BDD-430C-84B2-EB5BCB4E9109@umbrellacode.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <008F1081-2BDD-430C-84B2-EB5BCB4E9109@umbrellacode.com>
Message-ID: <51CA4698.4070502@pearwood.info>

On 26/06/13 05:45, Shane Green wrote:
> Even though they declare 'break' is bad, they don't actually mean break is bad.

I'm pretty sure they mean exactly what they say. Why else would they say it? The coding standard in question is as clear as day:

[quote]
Using break and continue is not allowed in any of your code for this class. Using these statements damages the readability of your code. Readability is a quality necessary for easy code maintenance.
[end quote]

http://www.csee.umbc.edu/courses/201/spring10/standards.shtml

This is a foolish standard, since using break and continue actually improves readability over the alternatives such as:

    skip = False
    for x in seq:
        if skip:
            pass
        elif cond(x):
            skip = True
        else:
            do_this()
            do_that()

(or worse!) instead of:

    for x in seq:
        if cond(x):
            break
        do_this()
        do_that()

But Python's syntax shouldn't be driven by educators with dumb ideas.

> It's when it's used to control the flow -- as in causing flow to skip sections of code, etc. -- of an application that it's primarily considered bad.

I don't understand what this even means.
When is break anything other than a flow control statement? And what difference does it make whether it is used in an "application", "script", "library", or something else? -- Steven From steve at pearwood.info Wed Jun 26 04:18:50 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Jun 2013 12:18:50 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51CA4F8A.303@pearwood.info> On 26/06/13 04:12, Stephen J. Turnbull wrote: > Shane Green writes: > > > [x for x in l if x < 10 else break]? > > That's currently invalid syntax: break is a statement. I think a > while clause (as suggested by David Mertz) would be a more plausible > extension of syntax. > > I do think extending generator/comprehension syntax is much more > plausible than extending for loop syntax (for one thing, "just use > break" is not an answer here!) Comprehensions in Clojure have this feature. http://clojuredocs.org/clojure_core/clojure.core/for ;; :when continues through the collection even if some have the ;; condition evaluate to false, like filter user=> (for [x (range 3 33 2) :when (prime? x)] x) (3 5 7 11 13 17 19 23 29 31) ;; :while stops at the first collection element that evaluates to ;; false, like take-while user=> (for [x (range 3 33 2) :while (prime? 
x)]
x)
(3 5 7)

(expr for x in seq while cond) is not expandable into a for loop in quite the same way as (expr for x in seq if cond) is:

    result = []
    for x in seq:
        if cond:
            result.append(expr)

vs

    result = []
    for x in seq:
        if cond:
            result.append(expr)
        else:
            break

but I think it is a natural extension to the generator syntax that fills a real need (or is at least frequently requested), is understandable, and has precedent in at least one other language. But Nick Coghlan has ruled it's not going to happen, although I don't understand what he had against it.

-- Steven

From ron3200 at gmail.com Wed Jun 26 04:43:54 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 25 Jun 2013 21:43:54 -0500
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: <8D03FFF4F0D9F22-1864-1EB0E@webmail-m103.sysops.aol.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <8D03FFF4F0D9F22-1864-1EB0E@webmail-m103.sysops.aol.com>
Message-ID:

On 06/25/2013 04:08 PM, jimjhb at aol.com wrote:
> That would bring back my previous notion of fwhile.
>
> fwhile X in ListY and conditionZ:
>
> It was pointed out that the 'and' is ambiguous (though perhaps not fatally
> problematic) and adding a new keyword is really bad as well.
>
> I guess:
>
> for X in ListY and conditionZ: would just have the ambiguity.
>
> [Or maybe not, as long as the condition is a bool, it's not iterable and
> thus not confused with list.]

What about allowing break and continue in if/else expressions? In a loop it might look like...

    break if n>=42 else continue

In a comprehension it would be...

    [ (x if x<10 else break) for x in range(20) ]

There has been some interest in adding 'break if condition'. But I think this would be better.
Cheers, Ron From ncoghlan at gmail.com Wed Jun 26 06:45:54 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Jun 2013 14:45:54 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51CA4F8A.303@pearwood.info> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> Message-ID: On 26 Jun 2013 12:20, "Steven D'Aprano" wrote: > > On 26/06/13 04:12, Stephen J. Turnbull wrote: >> >> Shane Green writes: >> >> > [x for x in l if x < 10 else break]? >> >> That's currently invalid syntax: break is a statement. I think a >> while clause (as suggested by David Mertz) would be a more plausible >> extension of syntax. >> >> I do think extending generator/comprehension syntax is much more >> plausible than extending for loop syntax (for one thing, "just use >> break" is not an answer here!) > > > > Comprehensions in Clojure have this feature. > > http://clojuredocs.org/clojure_core/clojure.core/for > > ;; :when continues through the collection even if some have the > ;; condition evaluate to false, like filter > user=> (for [x (range 3 33 2) :when (prime? x)] > x) > (3 5 7 11 13 17 19 23 29 31) > > ;; :while stops at the first collection element that evaluates to > ;; false, like take-while > user=> (for [x (range 3 33 2) :while (prime? 
x)] > x) > (3 5 7) > > > (expr for x in seq while cond) is not expandable into a for loop in quite the same way as (expr for x in seq if cond) is: > > > result = [] > for x in seq: > if cond: > result.append(expr) > > > vs > > > result = [] > for x in seq: > if cond: > result.append(expr) > else: > break > > > but I think it is a natural extension to the generator syntax that fills a real need (or is at least frequently requested), is understandable, and has precedent in at least one other language. But Nick Coghlan has ruled it's not going to happen, although I don't understand what he had against it. Comprehensions are currently just syntactic sugar for particular kinds of explicit loop, with a relatively straightforward mechanical translation from the expression form to the statement form. That's an essential property that helps keep Python's expression level and suite level flow control constructs from diverging into two independent languages that happen to share some keywords. I would vastly prefer implementing PEP 403 to allow the iterable in a comprehension to be a full generator function over any of the proposals to add yet *more* statement like functionality to expressions. Cheers, Nick. > > > > -- > Steven > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Jun 26 07:44:11 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 25 Jun 2013 22:44:11 -0700 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ:
In-Reply-To: <51CA4698.4070502@pearwood.info>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <008F1081-2BDD-430C-84B2-EB5BCB4E9109@umbrellacode.com> <51CA4698.4070502@pearwood.info>
Message-ID:

On Jun 25, 2013, at 6:40 PM, Steven D'Aprano wrote:
> On 26/06/13 05:45, Shane Green wrote:
>> Even though they declare 'break' is bad, they don't actually mean break is bad.
>
> I'm pretty sure they mean exactly what they say. Why else would they say it? The coding standard in question is as clear as day:

They say 'break' because its typical usage violates, or simply makes it very easy to violate, rules they want to abide by. It was taboo before Python really even came about. I guess I'm a lot closer to questioning authority than I am to believing educators are psychic :-)

>> It's when it's used to control the flow -- as in causing flow to skip sections of code, etc. -- of an application that it's primarily considered bad.
>
> I don't understand what this even means.

Yeah, that was a shite explanation; sorry about that. Luckily, I realized what I meant while writing the last point: don't loops have their own scope in C? Well, I think much of the backlash may come from (poorly) applied break statements altering the block scope: they can alter variable declaration, initialization, etc.

> I don't understand what this even means. When is break anything other than a flow control statement?

Its usage in [x for x in l if x else break] doesn't actually determine which code gets run, *only* how many times or, more importantly, the number of items in the output. But these are honestly wild-ass guesses. The one thing I can say for certain is that anyone who says 'break' is always bad, is wrong.
Most professors are perfectly happy accepting code that leverages features they generally tell kids not to use, as long as it's not used rampantly or incorrectly and the student has a solid grasp of it and can clearly articulate the reason for using it.

On Jun 25, 2013, at 6:40 PM, Steven D'Aprano wrote:
> [quote]
> Using break and continue is not allowed in any of your code for this class. Using these statements damages the readability of your code. Readability is a quality necessary for easy code maintenance.
> [end quote]
>
> http://www.csee.umbc.edu/courses/201/spring10/standards.shtml
>
> This is a foolish standard, since using break and continue actually improves readability over the alternatives such as:
>
>     skip = False
>     for x in seq:
>         if skip:
>             pass
>         elif cond(x):
>             skip = True
>         else:
>             do_this()
>             do_that()
>
> (or worse!) instead of:
>
>     for x in seq:
>         if cond(x):
>             break
>         do_this()
>         do_that()
>
> But Python's syntax shouldn't be driven by educators with dumb ideas.
>
>> It's when it's used to control the flow -- as in causing flow to skip sections of code, etc. -- of an application that it's primarily considered bad.
>
> I don't understand what this even means. When is break anything other than a flow control statement? And what difference does it make whether it is used in an "application", "script", "library", or something else?
>
> --
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stephen at xemacs.org Wed Jun 26 08:06:04 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 26 Jun 2013 15:06:04 +0900
Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: <51CA4698.4070502@pearwood.info> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <008F1081-2BDD-430C-84B2-EB5BCB4E9109@umbrellacode.com> <51CA4698.4070502@pearwood.info> Message-ID: <87mwqdz9rn.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I'm pretty sure they mean exactly what they say. Why else would > they say it? The coding standard in question is as clear as day: > > [quote] > Using break and continue is not allowed in any of your code for > this class. > [end quote] Note that here they wrote "*this* class", while the rest of the rules have no such clarification. It's quite possible that the exercises for CSMC 201 are such that "break" or "continue" would rarely be good style, and a more general rule will be introduced in more advanced classes. I suspect that the faculty and TAs would revolt if they frequently had to plow through the kind of example you presented. From shane at umbrellacode.com Wed Jun 26 08:09:32 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 25 Jun 2013 23:09:32 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <87mwqdz9rn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FE80B9B68E6-1864-1D7A2@webmail-m103.sysops.aol.com> <008F1081-2BDD-430C-84B2-EB5BCB4E9109@umbrellacode.com> <51CA4698.4070502@pearwood.info> <87mwqdz9rn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <608734EA-BB2C-4AA5-8621-94215B185CC2@umbrellacode.com> Um, yeah, I missed that and was just talking a bunch of BS... 
Sorry, and thanks Steven for managing to point that out gently :-) On Jun 25, 2013, at 11:06 PM, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > >> I'm pretty sure they mean exactly what they say. Why else would >> they say it? The coding standard in question is as clear as day: >> >> [quote] >> Using break and continue is not allowed in any of your code for >> this class. >> [end quote] > > Note that here they wrote "*this* class", while the rest of the rules > have no such clarification. It's quite possible that the exercises > for CSMC 201 are such that "break" or "continue" would rarely be good > style, and a more general rule will be introduced in more advanced > classes. I suspect that the faculty and TAs would revolt if they > frequently had to plow through the kind of example you presented. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Jun 26 09:48:54 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Jun 2013 19:48:54 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> Message-ID: <51CA9CE6.1080402@canterbury.ac.nz> Ron Adam wrote: > And I don't like the '=' with nothing on the right. 
Expanding on the suggestion someone made of having a single marker of some kind in the argument list, I came up with this: def __init__(self, text, font = system_font, style = 'plain'): default_size = calc_button_size(text, font, style) Widget.__init__(self, size = default_size, pass font, style) -- Greg From ethan at stoneleaf.us Wed Jun 26 10:04:00 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 26 Jun 2013 01:04:00 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CA9CE6.1080402@canterbury.ac.nz> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> Message-ID: <51CAA070.1040306@stoneleaf.us> On 06/26/2013 12:48 AM, Greg Ewing wrote: > Ron Adam wrote: >> And I don't like the '=' with nothing on the right. > > Expanding on the suggestion someone made of having a > single marker of some kind in the argument list, I > came up with this: > > def __init__(self, text, font = system_font, style = 'plain'): > default_size = calc_button_size(text, font, style) > Widget.__init__(self, size = default_size, pass font, style) I don't care for it. A word doesn't stand out like a character does, plus this usage of pass is completely different from its normal usage. We're already used to interpreting '*' as a coin with two sides, let's stick with it: def apply_map(map, target, *, frobble): # '*' means frobble is keyword only ... and later: frobble = some_funny_stuff_here() . . . 
apply_map(map=kansas, target=toto, *, frobble) # '*' means frobble maps to keyword frobble -- ~Ethan~ From lukasz at langa.pl Wed Jun 26 10:47:39 2013 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Wed, 26 Jun 2013 10:47:39 +0200 Subject: [Python-ideas] PEP 315: do-while Message-ID: The PEP is deferred because it seems that while True: if condition: break is good enough. I agree. We should reject the PEP and summarise the status. Alternatively, the only way I think we can improve on the syntax above is something like this: do: if condition: break or without a new keyword: while: if condition: break The empty-predicate variant would let Python check whether there's actually a break in the body of the loop. -- Best regards, ?ukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev From greg.ewing at canterbury.ac.nz Wed Jun 26 10:51:45 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Jun 2013 20:51:45 +1200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <1372203207.15960.YahooMailNeo@web184703.mail.ne1.yahoo.com> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <1372153435.80548.YahooMailNeo@web184701.mail.ne1.yahoo.com> <51CA1526.9000002@canterbury.ac.nz> <1372203207.15960.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: <51CAABA1.1060702@canterbury.ac.nz> Andrew Barnert wrote: > Notice that in this case, you don't actually need to pull out font and pass > it along: > > def __init__(self, text = "Hello", **kwds): 
default_size = > calc_default_size(text, kwds.get('font', system_font)) > Widget.__init__(self, size = default_size, **kwds) That's what I ended up doing in many cases. But using get() with a constant name has something of a nasty whiff about it as well. > I think the issue isn't how many passed-on arguments you have, but how many > you're peeking at. Yes, that's what I meant. -- Greg From greg.ewing at canterbury.ac.nz Wed Jun 26 11:24:48 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Jun 2013 21:24:48 +1200 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: Message-ID: <51CAB360.4080202@canterbury.ac.nz> ?ukasz Langa wrote: Alternatively, the only way I think we can improve on > the syntax above is something like this: > > do: > > if condition: > break One of the problems with a 'while' clause as suggested is that the comprehension version of it would complicate the equivalence rules between comprehensions and statements. Currently, a comprehension such as [... for a in x if c] corresponds to for a in x: if c: ... But if we take a comprehension with a 'while' clause, [... for a in x while c] and try to expand it the same way, we get for a in x: while c: ... which is not the same thing at all! To make it work, we need to expand it as for a in x: if not c: break But this only works if there is only one for-loop. If we have two nested loops, [... for a in x for b in y while c] the following expansion doesn't work, for a in x: for b in y: if not c: break because the break only exits the innermost loop. The proposed 'while' clause has a similar problem. In for a in x: for b in y while c: ... the 'while' would presumably break only the innermost loop. If we try to write it like this instead, for a in x while c: for b in y: ... we have a problem if the condition c involves both a and b. I'm not sure what the lesson is from all this. 
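To make the expansions above concrete (an editorial sketch using only the stdlib; the names nums and pairs are illustrative, not from the thread): the single-loop form of the hypothetical [... for a in x while c] is spellable today with itertools.takewhile, and the nested case shows why a plain break expansion exits only the innermost loop.

```python
from itertools import takewhile

nums = [1, 2, 5, 3, 1]

# Hypothetical [n * n for n in nums while n < 4], spelled today:
squares = [n * n for n in takewhile(lambda n: n < 4, nums)]

# The statement expansion that matches it (if not c: break):
expected = []
for n in nums:
    if not n < 4:
        break
    expected.append(n * n)

assert squares == expected == [1, 4]

# With two nested loops a bare break only exits the inner loop,
# so the outer loop keeps going -- the expansion no longer matches
# [... for a in x for b in y while c]:
pairs = []
for a in [0, 1]:
    for b in [0, 1, 2]:
        if not (a + b < 2):
            break               # exits only the inner loop
        pairs.append((a, b))

assert pairs == [(0, 0), (0, 1), (1, 0)]   # (1, 0) still gets through
```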
I think it's that any form of early-exit construct, including 'break' and the proposed 'while' clause, is a hack of limited usefulness without some way of scoping the amount of stuff to be broken out of. Comprehensions provide a natural way of specifying that scope, whereas the looping statements don't. So, I would probably support adding a 'while' clause to comprehensions, but not to the for-loop statement. -- Greg From robert.kern at gmail.com Wed Jun 26 12:57:30 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Jun 2013 11:57:30 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <1372201554.79307.YahooMailNeo@web184703.mail.ne1.yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <8D03FE69E036110-1864-1D692@webmail-m103.sysops.aol.com> <1372201554.79307.YahooMailNeo@web184703.mail.ne1.yahoo.com> Message-ID: On 2013-06-26 00:05, Andrew Barnert wrote: > From: "jimjhb at aol.com" > Sent: Tuesday, June 25, 2013 11:11 AM > > >> Maybe something from the Python leadership saying breaks are fine? Given their lack of >> support for gotos it's easy to see how others might feel breaks and continues are bad as well, >> even in Python. 
> > $ grep break cpython/Lib/*py |wc -l > 551 > > $ grep continue cpython/Lib/*py |wc -l > 280 These numbers should be a bit more accurate by recursing into packages (and tests, which one may not want, but I'd count them) and only counting actual uses of the keywords in code: [cpython/Lib]$ grinpython.py --without-filename -p '\bbreak\b' | wc -l 740 [cpython/Lib]$ grinpython.py --without-filename -p '\bcontinue\b' | wc -l 607 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lukasz at langa.pl Wed Jun 26 13:44:19 2013 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Wed, 26 Jun 2013 13:44:19 +0200 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <51CAB360.4080202@canterbury.ac.nz> References: <51CAB360.4080202@canterbury.ac.nz> Message-ID: <022E48BD-24BD-4BF4-A0D4-5F5A14BAD5F5@langa.pl> On 26 cze 2013, at 11:24, Greg Ewing wrote: > ?ukasz Langa wrote: > Alternatively, the only way I think we can improve on >> the syntax above is something like this: >> do: >> >> if condition: >> break > > But if we take a comprehension with a 'while' clause, > > [... for a in x while c] Correct me if I'm wrong but it seems you mixed up two independent threads on the list. I'm discussing an equivalent of the C `do {} while();` loop. This doesn't touch comprehensions at all. -- Best regards, ?ukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Wed Jun 26 13:53:13 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 26 Jun 2013 14:53:13 +0300 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: Message-ID: 26.06.13 11:47, Łukasz Langa wrote: > The empty-predicate variant would let Python check whether there's > actually a break in the body of the loop. 1. There are infinite loops. 2. There are exceptions. From joshua.landau.ws at gmail.com Wed Jun 26 16:46:26 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 26 Jun 2013 15:46:26 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CAA070.1040306@stoneleaf.us> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> Message-ID: On 26 June 2013 09:04, Ethan Furman wrote: > On 06/26/2013 12:48 AM, Greg Ewing wrote: >> >> Ron Adam wrote: >>> >>> And I don't like the '=' with nothing on the right. >> >> >> Expanding on the suggestion someone made of having a >> single marker of some kind in the argument list, I >> came up with this: >> >> def __init__(self, text, font = system_font, style = 'plain'): >> default_size = calc_button_size(text, font, style) >> Widget.__init__(self, size = default_size, pass font, style) > > > I don't care for it. > > A word doesn't stand out like a character does, plus this usage of pass is > completely different from its normal usage.
> > We're already used to interpreting '*' as a coin with two sides, let's stick > with it: > > def apply_map(map, target, *, frobble): # '*' means frobble is keyword > only > ... > > and later: > > frobble = some_funny_stuff_here() > . > . > . > apply_map(map=kansas, target=toto, *, frobble) # '*' means frobble maps > to keyword frobble Whilst Greg Ewing has made me also much more sympathetic to this view, I feel that: 1) This is nearly unreadable - it does not say what it does in the slightest 2) It's added syntax - that's a high barrier. I'm not convinced it's worth it yet. 3) It still feels like hackery; I might prefer something explicitly hackery like this: apply_map(map=kansas, target=toto, **locals("frobble")) where locals is: def locals(*args): if args: return {arg:original_locals()[arg] for arg in args} else: return original_locals() For Greg's he'd use: def __init__(self, text="Hello", font=system_font, **kwds): default_size = calc_default_size(text, font) Widget.__init__(self, size=default_size, **locals("font"), **kwds) or even def __init__(self, text = "Hello", font = system_font, **kwds): default_size = calc_default_size(text, font) Widget.__init__(self, **locals("size", "font"), **kwds) under the asumption that http://bugs.python.org/issue2292 does get implemented first. For reference, he is using (respaced for consistency): def __init__(self, text="Hello", font=system_font, **kwds): default_size = calc_default_size(text, font) Widget.__init__(self, size=default_size, font=font, **kwds) Note that this is only a way to suggest that *there might be another way*; maybe something involving objects. 3 cont.) The reason I think it feels like hackery is simply that I don't feel like Python is ever "reference by name"; objects don't know what they are called (some know what they *were* named; but they have no guarantee it's true) it feels *very wrong* to give "foobar" to function and have that function somehow extract the name! 
I know it's not doing that, but it's ever-so-close. However; maybe: class AttrDict(dict): def __init__(self, mapping, **defaults): super().__init__(defaults, **mapping) self.__dict__ = self def __init__(self, **kwds): kwds = AttrDict(kwds, text="Hello", font=system_font) kwds.size = calc_default_size(kwds.text, kwds.font) Widget.__init__(self, **kwds) and a decorator could even make this: @dynamic_kwds(text="Hello", font=system_font) def __init__(self, kwds): kwds.size = calc_default_size(kwds.text, kwds.font) Widget.__init__(self, **kwds) which would have the same run-time effect as the original (the second keeps re-evaluating the signature). Again; I mention these only to show people not to have a restricted idea on what the solution looks like - don't just try and fix the symptoms of the problem. From ron3200 at gmail.com Wed Jun 26 16:59:54 2013 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 26 Jun 2013 09:59:54 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CA9CE6.1080402@canterbury.ac.nz> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> Message-ID: On 06/26/2013 02:48 AM, Greg Ewing wrote: > Ron Adam wrote: >> And I don't like the '=' with nothing on the right.
> > Expanding on the suggestion someone made of having a > single marker of some kind in the argument list, I > came up with this: > > def __init__(self, text, font = system_font, style = 'plain'): > default_size = calc_button_size(text, font, style) > Widget.__init__(self, size = default_size, pass font, style) Using the '*' in the call sites as suggested earlier because of it's consistency with function definitions is probably a better choice. In the above example, would the '*' (or pass) be in the same place as the it is in the function definition? Widget.__init__(self, *, size=default_size, font, style) Or: Widget.__init__(self, size=default_size, *, font, style) I think it should, because then there would be a clearer separation between what are positional and keyword arguments throughout a program. _Ron From guido at python.org Wed Jun 26 17:03:28 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Jun 2013 08:03:28 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: Message-ID: Please reject the PEP. More variations along these lines won't make the language more elegant or easier to learn. They'd just save a few hasty folks some typing while making others who have to read/maintain their code wonder what it means. --Guido On Wed, Jun 26, 2013 at 1:47 AM, ?ukasz Langa wrote: > The PEP is deferred because it seems that > > while True: > > if condition: > break > > is good enough. I agree. We should reject the PEP and summarise > the status. Alternatively, the only way I think we can improve on > the syntax above is something like this: > > do: > > if condition: > break > > or without a new keyword: > > while: > > if condition: > break > > The empty-predicate variant would let Python check whether there's > actually a break in the body of the loop. 
> > -- > Best regards, > ?ukasz Langa > > WWW: http://lukasz.langa.pl/ > Twitter: @llanga > IRC: ambv on #python-dev > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Wed Jun 26 17:13:10 2013 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 26 Jun 2013 10:13:10 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> Message-ID: On 06/26/2013 09:46 AM, Joshua Landau wrote: > However; maybe: > > class AttrDict(dict): > def __init__(self, mapping, **defaults): > super().__init__(defaults, **mapping) > self.__dict__ = self > > def __init__(self, **kwds): > kwds = AttrDict(kwds, text="Hello", font=system_font) > > kwds.size = calc_default_size(kwds.text, kwds,font) > > Widget.__init__(self, **kwds) > > and a decorator could even make this: > > @dynamic_kwds(text="Hello", font=system_font) > def __init__(self, kwds): > kwds.size = calc_default_size(kwds.text, kwds,font) > Widget.__init__(self, **kwds) > > which would have the same run-time effect as the original (the second > keeps re-evaluating the signature). 
> > Again; I mention these only to show people not to have a restricted > idea on what the solution looks like - don't just try and fix the > symptoms of the problem. It's always good advice to keep an open mind. I think the participants of this thread know they can use functions and decorators as helpers, but those always have some cost in processing time and memory usage. It's much harder to find a clean solution that doesn't use those. Cheers, Ron From lukasz at langa.pl Wed Jun 26 17:39:44 2013 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Wed, 26 Jun 2013 17:39:44 +0200 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: Message-ID: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> On 26 cze 2013, at 17:03, Guido van Rossum wrote: > Please reject the PEP. More variations along these lines won't make the language more elegant or easier to learn. They'd just save a few hasty folks some typing while making others who have to read/maintain their code wonder what it means. Done. PEP 315 has been rejected. http://hg.python.org/peps/rev/21deefe50c51 -- Best regards, Łukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ethan at stoneleaf.us Wed Jun 26 17:24:05 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 26 Jun 2013 08:24:05 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> Message-ID: <51CB0795.2000508@stoneleaf.us> On 06/26/2013 07:46 AM, Joshua Landau wrote: > On 26 June 2013 09:04, Ethan Furman wrote: >> On 06/26/2013 12:48 AM, Greg Ewing wrote: >>> >>> Ron Adam wrote: >>>> >>>> And I don't like the '=' with nothing on the right. >>> >>> >>> Expanding on the suggestion someone made of having a >>> single marker of some kind in the argument list, I >>> came up with this: >>> >>> def __init__(self, text, font = system_font, style = 'plain'): >>> default_size = calc_button_size(text, font, style) >>> Widget.__init__(self, size = default_size, pass font, style) >> >> >> I don't care for it. >> >> A word doesn't stand out like a character does, plus this usage of pass is >> completely different from its normal usage. >> >> We're already used to interpreting '*' as a coin with two sides, let's stick >> with it: >> >> def apply_map(map, target, *, frobble): # '*' means frobble is keyword >> only >> ... >> >> and later: >> >> frobble = some_funny_stuff_here() >> . >> . >> . 
>> apply_map(map=kansas, target=toto, *, frobble) # '*' means frobble maps >> to keyword frobble > > Whilst Greg Ewing has made me also much more sympathetic to this view, > I feel that: > > 1) This is nearly unreadable - it does not say what it does in the slightest And the '*' and '**' in function defintions do? > 2) It's added syntax - that's a high barrier. I'm not convinced it's > worth it yet. It is a high barrier; but this does add a bit of symmetry to the new '*'-meaning-keyword-only symbol. > 3) It still feels like hackery; I might prefer something explicitly > hackery like this: You'll get used to it. ;) -- ~Ethan~ From boxed at killingar.net Wed Jun 26 17:51:19 2013 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Wed, 26 Jun 2013 17:51:19 +0200 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> Message-ID: <36F09C49-8E3C-4535-AD4C-1FC60B5A2816@killingar.net> The *locals(...) thing looks pretty nice and would clearly take away a lot of the pain I was talking about! Thanks > On 26 Jun 2013, at 16:46, Joshua Landau wrote: > >> On 26 June 2013 09:04, Ethan Furman wrote: >>> On 06/26/2013 12:48 AM, Greg Ewing wrote: >>> >>> Ron Adam wrote: >>>> >>>> And I don't like the '=' with nothing on the right. 
>>> >>> >>> Expanding on the suggestion someone made of having a >>> single marker of some kind in the argument list, I >>> came up with this: >>> >>> def __init__(self, text, font = system_font, style = 'plain'): >>> default_size = calc_button_size(text, font, style) >>> Widget.__init__(self, size = default_size, pass font, style) >> >> >> I don't care for it. >> >> A word doesn't stand out like a character does, plus this usage of pass is >> completely different from its normal usage. >> >> We're already used to interpreting '*' as a coin with two sides, let's stick >> with it: >> >> def apply_map(map, target, *, frobble): # '*' means frobble is keyword >> only >> ... >> >> and later: >> >> frobble = some_funny_stuff_here() >> . >> . >> . >> apply_map(map=kansas, target=toto, *, frobble) # '*' means frobble maps >> to keyword frobble > > Whilst Greg Ewing has made me also much more sympathetic to this view, > I feel that: > > 1) This is nearly unreadable - it does not say what it does in the slightest > > 2) It's added syntax - that's a high barrier. I'm not convinced it's > worth it yet. > > 3) It still feels like hackery; I might prefer something explicitly > hackery like this: > > apply_map(map=kansas, target=toto, **locals("frobble")) > > where locals is: > > def locals(*args): > if args: > return {arg:original_locals()[arg] for arg in args} > else: > return original_locals() > > For Greg's he'd use: > > def __init__(self, text="Hello", font=system_font, **kwds): > default_size = calc_default_size(text, font) > Widget.__init__(self, size=default_size, **locals("font"), **kwds) > > or even > > def __init__(self, text = "Hello", font = system_font, **kwds): > default_size = calc_default_size(text, font) > Widget.__init__(self, **locals("size", "font"), **kwds) > > under the asumption that http://bugs.python.org/issue2292 does get > implemented first. 
> > For reference, he is using (respaced for consistency): > > def __init__(self, text="Hello", font=system_font, **kwds): > default_size = calc_default_size(text, font) > Widget.__init__(self, size=default_size, font=font, **kwds) > > Note that this is only a way to suggest that *there might be another > way*; maybe something involving objects. > > 3 cont.) The reason I think it feels like hackery is simply that I > don't feel like Python is ever "reference by name"; objects don't know > what they are called (some know what they *were* named; but they have > no guarantee it's true) it feels *very wrong* to give "foobar" to > function and have that function somehow extract the name! I know it's > not doing that, but it's ever-so-close. > > However; maybe: > > class AttrDict(dict): > def __init__(self, mapping, **defaults): > super().__init__(defaults, **mapping) > self.__dict__ = self > > def __init__(self, **kwds): > kwds = AttrDict(kwds, text="Hello", font=system_font) > > kwds.size = calc_default_size(kwds.text, kwds,font) > > Widget.__init__(self, **kwds) > > and a decorator could even make this: > > @dynamic_kwds(text="Hello", font=system_font) > def __init__(self, kwds): > kwds.size = calc_default_size(kwds.text, kwds,font) > Widget.__init__(self, **kwds) > > which would have the same run-time effect as the original (the second > keeps re-evaluating the signature). > > > Again; I mention these only to show people not to have a restricted > idea on what the solution looks like - don't just try and fix the > symptoms of the problem. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From jsbueno at python.org.br Wed Jun 26 17:43:53 2013 From: jsbueno at python.org.br (Joao S. O. 
Bueno) Date: Wed, 26 Jun 2013 12:43:53 -0300 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <20130624224127.GA10419@ando> References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> Message-ID: On 24 June 2013 19:41, Steven D'Aprano wrote: > On 25/06/13 03:58, Andrew McNabb wrote: >> >> I'm not even sure I like it, but many of the responses have denied the >> existence of the use case rather than criticizing the solution. > > > I haven't seen anyone deny that it is possible to write code like > > spam(ham=ham, eggs=eggs, toast=toast) > > What I've seen is people deny that it happens *often enough* to deserve > dedicated syntax to "fix" it. (I use scare quotes here because I don't > actually think that repeating the name that way is a problem that needs > fixing.) > Sorry for being silent while agreeing with the original proposal - but indeed - I think it does happen often enough to ask for some change - doubly so with string/template formatting calls (it is not fun to have to maintain 3rd party code doing format(**locals()), and just writing every used constant twice - and _then_ having to justify to others why that, awful as it looks, is way better than using the **locals() call. (and yes, it _does_ happen) , I think that there could possibly be something better than the proposed syntax - and we could get to it - but as it is, it would be good enough for me.
js -><- From ethan at stoneleaf.us Wed Jun 26 17:48:48 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 26 Jun 2013 08:48:48 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> Message-ID: <51CB0D60.20408@stoneleaf.us> On 06/26/2013 07:59 AM, Ron Adam wrote: > > > On 06/26/2013 02:48 AM, Greg Ewing wrote: >> Ron Adam wrote: >>> And I don't like the '=' with nothing on the right. >> >> Expanding on the suggestion someone made of having a >> single marker of some kind in the argument list, I >> came up with this: >> >> def __init__(self, text, font = system_font, style = 'plain'): >> default_size = calc_button_size(text, font, style) >> Widget.__init__(self, size = default_size, pass font, style) > > > Using the '*' in the call sites as suggested earlier because of it's consistency with function definitions is probably a > better choice. > > In the above example, would the '*' (or pass) be in the same place as the it is in the function definition? > > Widget.__init__(self, *, size=default_size, font, style) > > Or: > > Widget.__init__(self, size=default_size, *, font, style) > > > I think it should, because then there would be a clearer separation between what are positional and keyword arguments > throughout a program. Not at all. 
Just like we can use keyword arguments even when the '*'-keyword-only symbol has not been used, we should be able to use the '*'-variable-name-is-keyword whenever we have keyword arguments available -- which is most of the time. -- ~Ethan~ From tjreedy at udel.edu Wed Jun 26 20:16:39 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 26 Jun 2013 14:16:39 -0400 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> Message-ID: On 6/26/2013 12:45 AM, Nick Coghlan wrote: > Comprehensions are currently just syntactic sugar for particular kinds > of explicit loop, with a relatively straightforward mechanical > translation from the expression form to the statement form. That's an > essential property that helps keep Python's expression level and suite > level flow control constructs from diverging into two independent > languages that happen to share some keywords. I think there are two points about comprehensions that people miss when they use them as justification to propose that their personal favorite composition be built into the language. 1. Comprehensions compose at least three statements, and not just two. 2. Comprehensions embody and implement a basic concept and tool of thought -- defining a class or collection by rule, which is complementary to defining such by roster. The former is probably more common in everyday life and definitely so in law. A statement form of itertools.takewhile is not in the same ballpark. 
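Both points above can be made concrete in one short sketch (not code from the thread): the "mechanical translation" Nick describes, and the fact that the proposed `for x in nums while x < 4` is already expressible with itertools.takewhile:

```python
from itertools import takewhile

# A comprehension and its mechanical statement-form translation:
squares = [i * i for i in range(10) if i % 2 == 0]
unrolled = []
for i in range(10):
    if i % 2 == 0:
        unrolled.append(i * i)
print(squares == unrolled)  # True

# "for x in nums while x < 4" spelled with takewhile: iteration
# stops at the first item that fails the predicate (the 5 here).
nums = [1, 2, 5, 3, 1]
print(list(takewhile(lambda x: x < 4, nums)))  # [1, 2]
```

Note how takewhile differs from the `if` clause: the `if` filter skips items but keeps iterating, while takewhile terminates the loop entirely.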
-- Displays are also 'just syntactic sugar for particular kinds of ... loop' -- but with the loop written in the interpreter implementation language. nums = [1,2,3] is equivalent to nums = [] nums.append(1) nums.append(2) nums.append(3) and in CPython implemented as the internal equivalent of nums = [i for i in <1,2,3>] where <1,2,3> is a sequence of stack values. Tuples are preallocated to their final size and filled in by index. > I would vastly prefer implementing PEP 403 to allow the iterable in a > comprehension to be a full generator function I do not understand what you mean here. PEP 403 is about a new decorator clause, whereas the iterable in a comprehension can already be the return from a generator function. -- Terry Jan Reedy From jimjhb at aol.com Wed Jun 26 22:25:28 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Wed, 26 Jun 2013 16:25:28 -0400 (EDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> Message-ID: <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> Just to clarify, PEP 315 had to do with a do-while concept (to make common use of pre-while code that often exists) and not the (dubious) issue of: for X in listY while conditionZ: or for X in listY and conditionZ: (fwhile) Right? There seemed to be some confusion; maybe they are related, but I'm not sure how.... -Jim -----Original Message----- From: Łukasz Langa To: Guido van Rossum Cc: Python-Ideas Sent: Wed, Jun 26, 2013 11:40 am Subject: Re: [Python-ideas] PEP 315: do-while On 26 cze 2013, at 17:03, Guido van Rossum wrote: Please reject the PEP. More variations along these lines won't make the language more elegant or easier to learn. They'd just save a few hasty folks some typing while making others who have to read/maintain their code wonder what it means. Done. PEP 315 has been rejected. 
http://hg.python.org/peps/rev/21deefe50c51 -- Best regards, Łukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jun 26 22:41:09 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Jun 2013 13:41:09 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> Message-ID: FWIW I'm against adding anything along these lines (the motivation is the same as what I wrote before). On Wed, Jun 26, 2013 at 1:25 PM, wrote: > Just to clarify, PEP 315 had to do with a do-while concept (to make common > use of pre-while code that > often exists) and not the (dubious) issue of: > > for X in listY while conditionZ: > > or > > for X in listY and conditionZ: > > (fwhile) > > Right? > > There seemed to be some confusion, but maybe they are related, but I'm not > sure how.... > > -Jim > > > -----Original Message----- > From: Łukasz Langa > To: Guido van Rossum > Cc: Python-Ideas > Sent: Wed, Jun 26, 2013 11:40 am > Subject: Re: [Python-ideas] PEP 315: do-while > > On 26 cze 2013, at 17:03, Guido van Rossum wrote: > > Please reject the PEP. More variations along these lines won't make the > language more elegant or easier to learn. They'd just save a few hasty folks > some typing while making others who have to read/maintain their code wonder > what it means. > > > Done. PEP 315 has been rejected. 
> > http://hg.python.org/peps/rev/21deefe50c51 > > -- > Best regards, > ?ukasz Langa > > WWW: http://lukasz.langa.pl/ > Twitter: @llanga > IRC: ambv on #python-dev > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From jimjhb at aol.com Wed Jun 26 22:48:49 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Wed, 26 Jun 2013 16:48:49 -0400 (EDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> Message-ID: <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> Sounds good to me. Just let people know it's OK to use breaks!!! MISRA-C 1998 (I know it's C, but people extrapolated that to other languages) banned continues and breaks. MISRA-C 2004 allowed for one break in a loop. But by that point the damage had been done. Now there are lots of folks out there that have been trained to NOT use break and continue (ever), even in Python. So the notion was an attempt to deal with that perhaps unreasonable, but still actual, reality. -Jim -----Original Message----- From: Guido van Rossum To: jimjhb Cc: lukasz ; python-ideas Sent: Wed, Jun 26, 2013 4:41 pm Subject: Re: [Python-ideas] PEP 315: do-while FWIW I'm against adding anything along these lines (the motivation is the same as what I wrote before). On Wed, Jun 26, 2013 at 1:25 PM, wrote: > Just to clarify, PEP 315 had to do with a do-while concept (to make common > use of pre-while code that > often exists) and not the (dubious) issue of: > > for X in listY while conditionZ: > > or > > for X in listY and conditionZ: > > (fwhile) > > Right? 
> > There seemed to be some confusion, but maybe they are related, but I'm not > sure how.... > > -Jim > > > -----Original Message----- > From: ?ukasz Langa > To: Guido van Rossum > Cc: Python-Ideas > Sent: Wed, Jun 26, 2013 11:40 am > Subject: Re: [Python-ideas] PEP 315: do-while > > On 26 cze 2013, at 17:03, Guido van Rossum wrote: > > Please reject the PEP. More variations along these lines won't make the > language more elegant or easier to learn. They'd just save a few hasty folks > some typing while making others who have to read/maintain their code wonder > what it means. > > > Done. PEP 315 has been rejected. > > http://hg.python.org/peps/rev/21deefe50c51 > > -- > Best regards, > ?ukasz Langa > > WWW: http://lukasz.langa.pl/ > Twitter: @llanga > IRC: ambv on #python-dev > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jun 26 22:54:41 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Jun 2013 13:54:41 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> Message-ID: What misery. :-) --Guido van Rossum (sent from Android phone) On Jun 26, 2013 1:48 PM, wrote: > Sounds good to me. Just let people know it's OK to use breaks!!! > > MISRA-C 1998 (I know it's C, but people extrapolated that to other > languages) banned continues and breaks. > MISRA-C 2004 allowed for one break in a loop. 
> > But by that point the damage had been done. Now there are lots of folks > out there that have been trained to NOT use break and continue (ever), even > in Python. > > So the notion was an attempt to deal with that perhaps unreasonable, but > still actual, reality. > > -Jim > > -----Original Message----- > From: Guido van Rossum > To: jimjhb > Cc: lukasz ; python-ideas > Sent: Wed, Jun 26, 2013 4:41 pm > Subject: Re: [Python-ideas] PEP 315: do-while > > FWIW I'm against adding anything along these lines (the motivation is > the same as what I wrote before). > > On Wed, Jun 26, 2013 at 1:25 PM, wrote: > > Just to clarify, PEP 315 had to do with a do-while concept (to make common > > use of pre-while code that > > often exists) and not the (dubious) issue of: > > > > for X in listY while conditionZ: > > > > or > > > > for X in listY and conditionZ: > > > > (fwhile) > > > > Right? > > > > There seemed to be some confusion, but maybe they are related, but I'm not > > sure how.... > > > > -Jim > > > > > > -----Original Message----- > > From: ?ukasz Langa > > To: Guido van Rossum > > Cc: Python-Ideas > > Sent: Wed, Jun 26, 2013 11:40 am > > Subject: Re: [Python-ideas] PEP 315: do-while > > > > On 26 cze 2013, at 17:03, Guido van Rossum wrote: > > > > Please reject the PEP. More variations along these lines won't make the > > language more elegant or easier to learn. They'd just save a few hasty folks > > some typing while making others who have to read/maintain their code wonder > > what it means. > > > > > > Done. PEP 315 has been rejected. 
> > > > http://hg.python.org/peps/rev/21deefe50c51 > > > > -- > > Best regards, > > ?ukasz Langa > > > > WWW: http://lukasz.langa.pl/ > > Twitter: @llanga > > IRC: ambv on #python-dev > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > -- > --Guido van Rossum (python.org/~guido) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Jun 27 00:39:23 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 26 Jun 2013 18:39:23 -0400 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> Message-ID: I can do this with generators now: >>> def stop(): ... raise StopIteration ... >>> list(i for i in range(10) if i < 3 or stop()) [0, 1, 2] Can't the same be allowed in compehensions? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Thu Jun 27 00:56:34 2013 From: mertz at gnosis.cx (David Mertz) Date: Wed, 26 Jun 2013 15:56:34 -0700 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> Message-ID: On Wed, Jun 26, 2013 at 3:39 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > I can do this with generators now: > > >>> def stop(): > ... raise StopIteration > ... > >>> list(i for i in range(10) if i < 3 or stop()) > [0, 1, 2] > > Can't the same be allowed in comprehensions? > This will only work in generator comprehensions. The reasons are obvious if you think about it. But it isn't too ugly as a technique, IMO. I still like the 'while' clause in generators (notwithstanding the fact it doesn't translate straightforwardly to an "unrolled block"), but this is sort of nice (and maybe I'll start using it): >>> dict((i,i*2) for i in range(10) if i < 3 or stop()) {0: 0, 1: 2, 2: 4} Even though this happens: >>> {i:i*2 for i in range(10) if i<3 or stop()} Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <dictcomp> File "<stdin>", line 1, in stop StopIteration -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
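A note from later Python history, as an aside: the stop() trick above depends on a StopIteration escaping a generator expression, which PEP 479 (the default from Python 3.7) converts into a RuntimeError, so the hack no longer works. itertools.takewhile gives the same early exit without raising StopIteration by hand:

```python
from itertools import takewhile

# Same results as the stop() examples, runnable on modern Python:
trimmed = list(takewhile(lambda i: i < 3, range(10)))
print(trimmed)  # [0, 1, 2]

doubled = dict((i, i * 2) for i in takewhile(lambda i: i < 3, range(10)))
print(doubled)  # {0: 0, 1: 2, 2: 4}
```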
URL: From abarnert at yahoo.com Thu Jun 27 01:17:37 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 26 Jun 2013 16:17:37 -0700 (PDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> Message-ID: <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: "jimjhb at aol.com" Sent: Wednesday, June 26, 2013 1:48 PM >Sounds good to me. Just let people know it's OK to use breaks!!! > > >MISRA-C 1998 (I know it's C, but people extrapolated that to other languages) banned continues and breaks. >MISRA-C 2004 allowed for one break in a loop. In case it's not obvious how ridiculous it is to extrapolate MISRA-C to other languages: I went through the 127 rules in MISRA-C 1998. About 52 of them could be extrapolated to Python. Of those, 20 make some sense; the other 32 would be either ridiculous or disastrous to apply in Python. (In fact, much the same is true even for more closely-related languages like C++ or JavaScript.) If you're curious about the details, see http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html. From joshua.landau.ws at gmail.com Thu Jun 27 01:11:45 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 27 Jun 2013 00:11:45 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <04D48BC2-E2E2-4C6E-BAC0-CD6A3186845A@yahoo.com> <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> Message-ID: On 26 June 2013 16:43, Joao S. O. 
Bueno wrote: > On 24 June 2013 19:41, Steven D'Aprano wrote: >> On 25/06/13 03:58, Andrew McNabb wrote: >>> >>> I'm not even sure I like it, but many of the responses have denied the >>> existence of the use case rather than criticizing the solution. >> >> >> I haven't seen anyone deny that it is possible to write code like >> >> spam(ham=ham, eggs=eggs, toast=toast) >> >> What I've seen is people deny that it happens *often enough* to deserve >> dedicated syntax to "fix" it. (I use scare quotes here because I don't >> actually think that repeating the name that way is a problem that needs >> fixing.) >> > > Sorry for being silent for agreeing with the original proposal - but indeed - > I think it does happen often enough to ask for some change - doubly so > with stgin/template formatting calls (It is not fun to have to > maintain 3rd party code > doing format(**locals() ) , and just writting every used constant > twice - and _then_ having to justify to others why that, awfull as it > looks, is way better than using the **locals() call. (and yes, it > _does_ happen) Is it really that awful? I mean, assuming you're sane and use .format_map. But, as was said before, most of the time it doesn't make much sense to do .format_map(locals()) as the variable you want should be part of some more-naturally accessed namespace. > , I think that there could possibly be something better than the > proposed syntax -and we culd get to it - but as it is, it would eb > good enough for me. From ncoghlan at gmail.com Thu Jun 27 01:20:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Jun 2013 09:20:43 +1000 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> Message-ID: On 27 Jun 2013 08:57, "David Mertz" wrote: > > On Wed, Jun 26, 2013 at 3:39 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: >> >> I can do this with generators now: >> >> >>> def stop(): >> ... raise StopIteration >> ... >> >>> list(i for i in range(10) if i < 3 or stop()) >> [0, 1, 2] >> >> Can't the same be allowed in compehensions? > > > This will only work in generator comprehensions. The reasons are obvious if you think about it. But it isn't too ugly as a technique, IMO. I still like the 'while' clause in generators (notwithstanding the fact it doesn't translate straightforwardly to an "unrolled block"), but this is sort of nice (and maybe I'll start using it): > > >>> dict((i,i*2) for i in range(10) if i < 3 or stop()) > {0: 0, 1: 2, 2: 4} I'm personally not opposed to allowing "else break" after the if clause as a way of terminating iteration early in comprehensions and generator expressions, as it's consistent with the existing translation to an explicit loop: {i:i*2 for i in range(10) if i<3 else break} I'd even be OK with the use of an "else" clause to define alternative entries to use when the condition is false: {i:i*2 for i in range(10) if i<3 else i:i} A reasonable constraint on the complexity may be to disallow mixing this extended conditional form with the nested loop form. I suspect Guido still won't be a fan either way, though :) Cheers, Nick. 
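Nick's second variant -- alternative entries when the condition is false -- can already be approximated today with a conditional expression in the value position (this is a sketch of the existing spelling, not the proposed syntax):

```python
# Proposed (hypothetical): {i: i*2 for i in range(10) if i < 3 else i: i}
# Today's spelling, using a conditional expression in the value:
d = {i: i * 2 if i < 3 else i for i in range(10)}
print(d)  # {0: 0, 1: 2, 2: 4, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
```

What the conditional expression cannot do is terminate the loop early -- that is the part the "else break" proposal adds.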
> > Even though this happens: > > >>> {i:i*2 for i in range(10) if i<3 or stop()} > Traceback (most recent call last): > File "", line 1, in > File "", line 1, in > File "", line 1, in stop > StopIteration > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Thu Jun 27 01:15:26 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 27 Jun 2013 00:15:26 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CB0795.2000508@stoneleaf.us> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> Message-ID: On 26 June 2013 16:24, Ethan Furman wrote: > On 06/26/2013 07:46 AM, Joshua Landau wrote: >> >> On 26 June 2013 09:04, Ethan Furman wrote: >>> >>> A word doesn't stand out like a character does, plus this usage of pass >>> is >>> completely different from its normal usage. 
>>> >>> We're already used to interpreting '*' as a coin with two sides, let's >>> stick >>> with it: >>> >>> def apply_map(map, target, *, frobble): # '*' means frobble is >>> keyword >>> only >>> ... >>> >>> and later: >>> >>> frobble = some_funny_stuff_here() >>> . >>> . >>> . >>> apply_map(map=kansas, target=toto, *, frobble) # '*' means frobble >>> maps >>> to keyword frobble >> >> >> Whilst Greg Ewing has made me also much more sympathetic to this view, >> I feel that: >> >> 1) This is nearly unreadable - it does not say what it does in the >> slightest > > > And the '*' and '**' in function defintions do? Yes. The "*" symbol means "unpack" across a very large part of python, and the lone "*" was a simple extension to what it already did. There was no leap; I could've guessed what it did. It does not mean "magically make an object know what its name is and then unpack both of those -- implicitly over all of the following args!". >> 2) It's added syntax - that's a high barrier. I'm not convinced it's >> worth it yet. > > It is a high barrier; but this does add a bit of symmetry to the new > '*'-meaning-keyword-only symbol. I don't think it does - there's no symmetry as they have completely different functions. >> 3) It still feels like hackery; I might prefer something explicitly >> hackery like this: > > > You'll get used to it. ;) I bet you I won't :P. 
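The existing meanings of '*' and '**' that this exchange leans on can be collected in one sketch:

```python
def f(a, *args, **kwargs):   # header: pack extra positionals and keywords
    return a, args, kwargs

def g(a, *, b):              # bare '*' in a header: b is keyword-only
    return a, b

first, *rest = [1, 2, 3]     # assignment target: unpack the remainder

print(f(1, 2, x=3))      # (1, (2,), {'x': 3})
print(g(1, b=2))         # (1, 2)
print(g(1, **{'b': 2}))  # (1, 2) -- call site: unpack a mapping
print(rest)              # [2, 3]
```

The thread's disagreement is whether a bare '*' at a call site, meaning "pass the following names as same-named keywords", is a natural fourth member of this family or something else entirely.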
From joshua.landau.ws at gmail.com Thu Jun 27 01:25:26 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 27 Jun 2013 00:25:26 +0100 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: On 27 June 2013 00:17, Andrew Barnert wrote: > > In case it's not obvious how ridiculous it is to extrapolate MISRA-C to other languages? > > I went through the 127 rules in MISRA-C 1998. About 52 of them could be extrapolated to Python. Of those, 20 make some sense; the other 32 would be either ridiculous or disastrous to apply in Python. (In fact, much the same is true even for more closely-related languages like C++ or JavaScript.) > > If you're curious about the details, see http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html. "67. rcm: Don't rebind the iterator in a for loop." That sounds quite sane to me... From oscar.j.benjamin at gmail.com Thu Jun 27 01:41:46 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 27 Jun 2013 00:41:46 +0100 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> Message-ID: On 27 June 2013 00:20, Nick Coghlan wrote: > > On 27 Jun 2013 08:57, "David Mertz" wrote: >> >> On Wed, Jun 26, 2013 at 3:39 PM, Alexander Belopolsky >> wrote: >>> >>> I can do this with generators now: >>> >>> >>> def stop(): >>> ... raise StopIteration >>> ... >>> >>> list(i for i in range(10) if i < 3 or stop()) >>> [0, 1, 2] >>> >>> Can't the same be allowed in compehensions? For reference this was discussed 6 months ago in the thread starting here: http://mail.python.org/pipermail/python-ideas/2013-January/018969.html >> This will only work in generator comprehensions. The reasons are obvious >> if you think about it. But it isn't too ugly as a technique, IMO. I still >> like the 'while' clause in generators (notwithstanding the fact it doesn't >> translate straightforwardly to an "unrolled block"), but this is sort of >> nice (and maybe I'll start using it): >> >> >>> dict((i,i*2) for i in range(10) if i < 3 or stop()) >> {0: 0, 1: 2, 2: 4} > > I'm personally not opposed to allowing "else break" after the if clause as a > way of terminating iteration early in comprehensions and generator > expressions, as it's consistent with the existing translation to an explicit > loop: > > {i:i*2 for i in range(10) if i<3 else break} If I remember there was an objection to using "else break" in preference for "else return" as it was deemed (by some) that it wasn't clear which loop the break applied to in the case where there were nested loops. 
The "else return" version still makes sense when unrolled if the intention is to terminate all the loops. > I'd even be OK with the use of an "else" clause to define alternative > entries to use when the condition is false: > > {i:i*2 for i in range(10) if i<3 else i:i} You can use ternary if/else in the expression rather than the if clause: {i:i*2 if i < 3 else i for i in range(10)} I think that's clearer in this case. This always works for list/set comprehensions and for generators. For dict comprehensions it works but it's less good if you need to use it for both key and value, i.e. {i*2: i*3 for i in range(10) if i < 3 else i:i} would become {i*2 if i < 3 else i: i*3 if i < 3 else i for i in range(10)} > A reasonable constraint on the complexity may be to disallow mixing this > extended conditional form with the nested loop form. I would welcome an addition along these lines with that constraint. I rarely use nested loops in comprehensions and often find myself writing simple while loops in situations that would be covered by a comprehension that used "else break". Oscar From ethan at stoneleaf.us Thu Jun 27 01:40:52 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 26 Jun 2013 16:40:52 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <51CB7C04.8080708@stoneleaf.us> On 06/26/2013 04:25 PM, Joshua Landau wrote: > On 27 June 2013 00:17, Andrew Barnert wrote: >> >> In case it's not obvious how ridiculous it is to extrapolate MISRA-C to other languages? >> >> I went through the 127 rules in MISRA-C 1998. About 52 of them could be extrapolated to Python. Of those, 20 make some sense; the other 32 would be either ridiculous or disastrous to apply in Python. 
(In fact, much the same is true even for more closely-related languages like C++ or JavaScript.) >> >> If you're curious about the details, see http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html. > > "67. rcm: Don't rebind the iterator in a for loop." > > That sounds quite sane to me... 

def NameCase(*names):
    '''names should already be stripped of whitespace'''
    if not any(names):
        return names
    final = []
    for name in names:
        pieces = name.lower().split()
        result = []
        for i, piece in enumerate(pieces):
            if '-' in piece:
                piece = ' '.join(piece.replace('-',' ').split())
                piece = '-'.join(NameCase(piece).split())
            elif alpha_num(piece) in ('i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix', 'x'):
                piece = piece.upper()
            elif piece in ('and', 'de', 'del', 'der', 'el', 'la', 'van', 'of'):
                pass
            elif piece[:2] == 'mc':
                piece = 'Mc' + piece[2:].title()
            else:
                possible = mixed_case_names.get(piece, None)
                if possible is not None:
                    piece = possible
                else:
                    piece = piece.title()
            if piece[-2:].startswith("'"):
                piece = piece[:-1] + piece[-1].lower()
            result.append(piece)
        if result[0] == result[0].lower():
            result[0] = result[0].title()
        if result[-1] == result[-1].lower():
            result[-1] = result[-1].title()
        final.append(' '.join(result))
    return final

-- ~Ethan~ From ethan at stoneleaf.us Thu Jun 27 03:16:59 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 26 Jun 2013 18:16:59 -0700 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> 
<51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> Message-ID: <51CB928B.2030706@stoneleaf.us> On 06/26/2013 04:15 PM, Joshua Landau wrote: > On 26 June 2013 16:24, Ethan Furman wrote: >> On 06/26/2013 07:46 AM, Joshua Landau wrote: >>> >>> On 26 June 2013 09:04, Ethan Furman wrote: >>>> >>>> A word doesn't stand out like a character does, plus this usage of pass >>>> is >>>> completely different from its normal usage. >>>> >>>> We're already used to interpreting '*' as a coin with two sides, let's >>>> stick >>>> with it: >>>> >>>> def apply_map(map, target, *, frobble): # '*' means frobble is >>>> keyword >>>> only >>>> ... >>>> >>>> and later: >>>> >>>> frobble = some_funny_stuff_here() >>>> . >>>> . >>>> . >>>> apply_map(map=kansas, target=toto, *, frobble) # '*' means frobble >>>> maps >>>> to keyword frobble >>> >>> >>> Whilst Greg Ewing has made me also much more sympathetic to this view, >>> I feel that: >>> >>> 1) This is nearly unreadable - it does not say what it does in the >>> slightest >> >> >> And the '*' and '**' in function defintions do? > > Yes. The "*" symbol means "unpack" across a very large part of python, > and the lone "*" was a simple extension to what it already did. There > was no leap; I could've guessed what it did. It does not mean > "magically make an object know what its name is and then unpack both > of those -- implicitly over all of the following args!". Until recently the '*' meant 'pack' if it was in a function header, and 'unpack' if it was in a function call, and wasn't usable anywhere else. Now it also means 'unpack' in assignments, as well as 'keywords only after this spot' in function headers. Likewise with '**' (except for the assignments part). I don't know about you, but the first time I saw * and ** I had no idea what they did and had to learn it. >>> 2) It's added syntax - that's a high barrier. I'm not convinced it's >>> worth it yet. 
>> It is a high barrier; but this does add a bit of symmetry to the new '*'-meaning-keyword-only symbol.
>
> I don't think it does - there's no symmetry as they have completely different functions.

Like pack and unpack are completely different? ;)

I see it as:

function header: '*' means only keywords accepted after this point
function call: '*' okay, here's my keywords ;)

>>> 3) It still feels like hackery; I might prefer something explicitly hackery like this:
>>
>> You'll get used to it. ;)
>
> I bet you I won't :P.

heheh

--
~Ethan~

From abarnert at yahoo.com Thu Jun 27 04:41:19 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 26 Jun 2013 19:41:19 -0700
Subject: [Python-ideas] PEP 315: do-while
In-Reply-To:
References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID:

On Jun 26, 2013, at 16:25, Joshua Landau wrote:

> On 27 June 2013 00:17, Andrew Barnert wrote:
>>
>> In case it's not obvious how ridiculous it is to extrapolate MISRA-C to other languages...
>>
>> I went through the 127 rules in MISRA-C 1998. About 52 of them could be extrapolated to Python. Of those, 20 make some sense; the other 32 would be either ridiculous or disastrous to apply in Python. (In fact, much the same is true even for more closely-related languages like C++ or JavaScript.)
>>
>> If you're curious about the details, see http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html.
>
> "67. rcm: Don't rebind the iterator in a for loop."
>
> That sounds quite sane to me...

Why? In C, the loop iterator is a mutable variable; assigning to it changes the loop flow.

    for (i = 0; i != 10; ++i) {
        if (i % 2) {
            i = 0;
        }
        printf("%d\n", i);
    }

That looks like it should alternate even numbers with 0's, up to 10. But it actually produces an infinite loop of 0's. Very confusing.
In Python, the loop iterator is just a name for a value; assigning to it does not change that value or affect the flow in any way; it just gives you a convenient name for a different value.

    for i in range(10):
        if i % 2:
            i = 0
        print i

That looks like it should alternate even numbers with 0's, up to 10. And that's exactly what it does.

Obviously this is a silly toy example, but there's plenty of real code that does this kind of thing.

From ron3200 at gmail.com Thu Jun 27 04:44:40 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 26 Jun 2013 21:44:40 -0500
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To:
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info>
Message-ID:

On 06/26/2013 06:20 PM, Nick Coghlan wrote:
> I'm personally not opposed to allowing "else break" after the if clause as a way of terminating iteration early in comprehensions and generator expressions, as it's consistent with the existing translation to an explicit loop:

The 'else' in these cases is tricky... And may make the whole expression look too much like an if-else expression. [See last example]

> {i:i*2 for i in range(10) if i<3 else break}
>
> I'd even be OK with the use of an "else" clause to define alternative entries to use when the condition is false:
>
> {i:i*2 for i in range(10) if i<3 else i:i}

This already works.
>>> {i:(i*2 if i<3 else i) for i in range(10)}
{0: 0, 1: 2, 2: 4, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}

And this:

>>> [x if x<5 else 0 for x in range(10)]
[0, 1, 2, 3, 4, 0, 0, 0, 0, 0]

These take care of all one-to-one mappings of values.

The 'if' after the iterator handles the selection of values. Given that, it makes sense to allow flow control statements after the last if, but not expressions affecting the values. (Not ... else i:i)

Here are some examples of how I'm thinking about this...

This includes values less than limit.

    [x for x in range(n) if x<limit]

And this breaks out of the loop at the limit.

    [x for x in range(n) if x>=limit break]

If we included else break, it would need to be spelled...

    [x for x in range(n) if x<limit else break]

Resembles an if-else expression too much...

Parentheses for clarity.

    result = [(x for x in range(n)) if (x<limit) else break]

> A reasonable constraint on the complexity may be to disallow mixing this extended conditional form with the nested loop form.
>
> I suspect Guido still won't be a fan either way, though :)
>
> Cheers,
> Nick.

From ron3200 at gmail.com Thu Jun 27 05:01:20 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 26 Jun 2013 22:01:20 -0500
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: <51CB928B.2030706@stoneleaf.us>
References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us>
Message-ID:

On 06/26/2013 08:16 PM, Ethan Furman wrote:
> I don't know about you, but the first time I saw * and ** I had
> no idea what they did and had to learn it.

One of my first thoughts when I first learned Python was that it would have made things clearer if they used two different symbols for pack and unpack.

    def foo(*args, **kwds):
        return bar(^args, ^^kwds)

There isn't any issue for the computer having those the same, but as a human, the visual difference would have helped make it easier to learn.

But it's very hard to change something like this. And even harder to change how people think once they are used to something.

Cheers,
Ron

From abarnert at yahoo.com Thu Jun 27 07:13:28 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 26 Jun 2013 22:13:28 -0700 (PDT)
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To:
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info>
Message-ID: <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com>

From: Ron Adam
Sent: Wednesday, June 26, 2013 7:44 PM

> The 'if' after the iterator handles the selection of values. Given that, it makes sense to allow flow control statements after the last if, but not expressions affecting the values.

First, why restrict break to the last if clause? Saying it can't go in a for clause or in the controlled expression I understand, but currently there's nothing special about the last clause (or any other clause, except for the first, in genexps only), and it seems odd to change that.

But meanwhile, your description made me realize exactly what feels weird about the whole idea.

A comprehension doesn't have flow control statements. It has _clauses_.
Obviously they're related things; there's a mapping that's rigorously definable and intuitively clear. But they don't have a colon and a suite. (They don't even take the same set of expressions, but that's not important here.)

Intuitively, that's because the whole point of a comprehension is that the rest of the comprehension, especially including the core expression, is the suite. Adding a suite breaks that.

Forget that break means the flow control is no longer always downward-local. It also completely turns the meaning of the controlling if statement around. Instead of "if this: the rest of the expression" it's "if this: the suite, else: the rest of the expression".

On top of that, the clauses map to nested statements. You can't nest break statements.

All of this becomes more obvious if you try to describe things rigorously. Let's rewrite http://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries to explain the semantics of break.

In order to make the issues more obvious, I'm going to ignore continue, use your "only at the end" restriction, and not add a rule saying that the break has to be controlled by an if. (If you want to write [x for subiter in iter for x in subiter break], fine, you get back [] - this actually won't be interpretable for a genexp with only one clause, but let that be an error because it translates to an error.) I don't think changing any of those would solve any of the problems, they'd just make things more complicated and harder to see.

First, add the new syntax. There are two obvious ways to do it:

    comp_if ::= "if" comp_if_body expression_nocond ["break" | comp_iter]

Or:

    comp_iter ::= comp_for | comp_if | comp_break
    comp_break ::= "break"

If you keep the existing semantics, this:

    x = [value for value in iter if pred(value) break]

maps to:

    x = []
    for value in iter:
        if pred(value):
            break
            x.append(value)

Or maybe the append is a sibling of the break rather than a child? Either way, it makes no sense. It has to map to something like this:

    x = []
    for value in iter:
        if pred(value):
            break
        x.append(value)

Which means the clauses are no longer nested, so the simple explanation no longer works. Instead, you need something more convoluted, like:

The comprehension consists of a single expression followed by at least one for clause, zero or more for or if clauses, and zero or one break clause. In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and the break clause, if present, a simple statement, and evaluating the expression to produce an element each time the innermost block that does not directly contain a break is entered and a block directly containing a break, if any, is not entered.

I think you can make it a little less convoluted by using the concept of exiting a block (which is well-defined, thanks to with statements), but I think that just makes it even less intuitive:

The comprehension consists of a single expression followed by at least one for clause, zero or more for or if clauses, and zero or one break clause. In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and the break clause, if present, a simple statement, and evaluating the expression to produce an element each time the innermost block is exited without executing a break.

There's no way to explain this simply, because it's no longer simple.

As a side note:

> [x for x in range(n) if x<limit else break]

This is already hard to distinguish from a ternary expression, even without the other stuff you bring in later. Compare:

> [x for x in range(n) if n<10 else range(10)]

I don't think that's valid (although I'm not sure without testing). And I'm not sure if it would be as hard for the parser to deal with as it is for a human.
But really, if a human can't parse it, it's meaningless.

From stephen at xemacs.org Thu Jun 27 07:55:49 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 27 Jun 2013 14:55:49 +0900
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: <87zjucxfkq.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:

> Compare:
>
> [x for x in range(n) if n<10 else range(10)]
>
> I don't think that's valid (although I'm not sure without testing).

It's not. Just add parentheses to get a ternary expression:

>>> n = 11
>>> [x for x in range(n) if n<10 else range(10)]
  File "<stdin>", line 1
    [x for x in range(n) if n<10 else range(10)]
                                 ^
SyntaxError: invalid syntax
>>> [x for x in (range(n) if n<10 else range(10))]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

Python 3.2.5 on Mac OS X

From abarnert at yahoo.com Thu Jun 27 09:28:24 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 27 Jun 2013 00:28:24 -0700 (PDT)
Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ:
In-Reply-To: <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com>

Let me try to gather together all of the possibilities that have been discussed in this and the two previous threads, plus a couple of obvious ones nobody's mentioned.

Unless I'm missing a good idea, or someone can explain why one of these isn't as bad as it seems, I don't like any of them. Some of them are ugly right off the bat. The rest are deceptively appealing until you think them through, but then they're even worse than the obviously bad ones.

I'll try to put them in order from least bad to most horrid. (I'm +0.25 on #1, but not until Python 4; -0 on #2; -1 on #3, and it's all downhill from there.)

1. Redefine comprehensions on top of generator expressions instead of defining them in terms of nested blocks.

    def stop(): raise StopIteration

    x = [value if pred(value) else stop() for value in iterable]

This would make the implementation of Python simpler.

It also makes the language conceptually simpler. The subtle differences between [x for x in foo] and list(x for x in foo) are gone.

And it's actually a pretty small change to the official semantics. Just replace the last two paragraphs of 6.2.4 with "The comprehension consists of a single expression followed by at least one for clause and zero or more for or if clauses.
In this case, the expression and clauses are interpreted as if they were a generator expression, and the elements of the new container are those yielded by that generator expression consumed to completion." (It also makes it easier to fix the messy definition of dict comprehensions, if anyone cares.)

Unlike #5, 6, 7, 8, and 10, but like #2, 3, 4, and 9, this only allows you to break out of one for clause, not any. But that's exactly the same as break only being able to break out of one for loop. Nobody complains that Python doesn't have "break 2" or "break :label", right?

The real downside is that this is a very radical change. People may sometimes rely on the differences between listcomps and genexps, maybe even without realizing they're doing so. Code that's worked unchanged from 2.0 to 3.3 might break.

Also, the obvious implementation would make listcomps about 25-50% slower, and trying to tweak the existing optimized comprehension code to work as-if wrapping a generator (and writing tests for all of the edge cases) sounds like a lot of work. And at best it would still be at least a _little_ slower (handling StopIteration, handling errors in the outer for clause differently from other clauses, etc. aren't free).

Still, I could get behind this for Python 4.0.

2. Just require comprehensions to handle StopIteration.

The main cost and benefit are the same as #1.

However, it makes the language and implementation more complex, rather than simpler.

Also, the effects of this less radical change (you can no longer pass StopIteration through a comprehension) seem like they might be harder to explain to people than the more radical one. And, worse, the less radical effects would probably cause more subtle bugs. Which implies a few versions where passing StopIteration through a comprehension triggers a warning.

3. Turn break from a statement into an expression.
    x = [value if pred(value) else break for value in iterable]

The break expression, when evaluated, would have the exact same effect as executing the statement today (including being an error if it's not nested syntactically under a loop or inside a comprehension).

This would be trivial to implement, document, and teach. But I don't think anyone wants an expression that inherently has no value. Python doesn't have any such expressions today. (Yes, you can write a function call that never returns normally - like sys.exit() - but that still has a return, it just never gets there.)

Also, today, Python gets a lot of mileage out of separating statements and expressions. You can see the flow at a glance, and there's exactly one side-effect per line. If we're going to lose that, I'd rather get something a lot cooler in exchange.

4. Add a break expression that's only allowed inside a comprehension expression.

Basically the same as #3 with less wide-reaching effects - but harder to describe. (How would you write the grammar? Duplicate every expression node: comp_conditional_expression, comp_lambda_form, ...? Allow break as an expression syntactically, but make it an error semantically except directly in an expression_stmt or comprehension?) And of course it makes the language less consistent.

I'd rather have #3, and just say "don't use break as an expression except in comprehensions" in PEP 8 and in linters.

5. Add a new until statement, and a corresponding until clause to comprehensions.

    x = [value for value in iterable until pred(value)]

It's intuitively obvious what it means. And it's easy to define:

    comp_iter ::= comp_for | comp_if | comp_until
    comp_until ::= "until" expression [comp_iter]

Then just s/for or if/for, if, or until/g in 6.2.4.

Of course you have to define an until statement:

    until expression: suite

equivalent to:

    if expression: break
    else: suite

with syntax:

    until_stmt ::= "until" expression ":" suite
and semantics:

> until may only occur syntactically nested in a for or while loop, but not nested in a function or class definition within that loop. If the expression is found to be true, it terminates the nearest enclosing loop; otherwise, the suite is executed.

and nobody will ever use that statement. Which is a hell of an argument against adding it to the language.

Obviously you could use a different new keyword, with the same sense or the opposite. But I don't think anything is any better. A couple of people suggested when, but that's even worse - besides sounding like it means "if" rather than "while" here, it actually _does_ mean "if" in most languages that have it: Clojure, CoffeeScript, Racket, etc.

6. Add the until clause without the statement.

Basically the same as #5, but interpreted as-if an until statement existed, which it won't.

To me, this is much worse than #5. It adds the same complexity to the language, and makes it inconsistent to boot. To interpret a comprehension, you'll have to map it to a nested block that contains statements that don't exist - you have no experience with, and no way to gain experience with them, and you can't even run the resulting code.

7. Add a "magic" while clause that's basically until with the opposite sense.

    x = [value for value in iterable while pred(value)]

This reads pretty nicely (at least in trivial comprehensions), it parallels takewhile and friends, and it matches a bunch of other languages (most of the languages where "when" means "if", "while" means this).

But it has a completely different meaning from while statements, and in fact a barely-related one. In particular, it's obviously not this:

    x = []
    for value in iterable:
        while pred(value):
            x.append(value)

What it actually means is:

    x = []
    for value in iterable:
        if pred(value):
            x.append(value)
        else:
            break

Imagine trying to teach that to a novice.
Or trying to write the formal definition in 6.2.4. This makes the language much more inconsistent than #6.

8. Allow else break in comp_if clauses.

    x = [value for value in iterable if pred(value) else break]

This one is pretty easy to define rigorously, since it maps to exactly what the while attempt maps to with a slight change to the existing rules.

But to me, it makes the code a confusing mess. I'm immediately reading "iterable if pred(value) else break", and that's wrong. Also, while it's pretty easy to convert this to a nested block in your head, it's not as easy to just read out in line, because the rest of the comprehension and the main expression no longer just nest under the end of the clause; instead, they nest under the middle of it.

Also, an else clause that can't be used for anything but break is very weird. But it doesn't make sense to put anything else there.

9. Add a special comp_break clause.

    x = [value for value in iterable if pred(value) break]

The syntax is intuitively simple (break clauses nest just like if and for clauses, and the fact that you can't nest anything underneath it is obvious), and also trivial to define (see my last email).

But this reverses the sense of the controlling if. In fact, each time I read it, the meaning seems to flip back and forth until I finally get it. And there's no way to make the semantics reasonable if they're defined in terms of nested blocks (as they are today) without lots of special-case language to deal with break. (See my last email.)

10. Allow break in comp_if clauses.

    x = [value for value in iterable if pred(value) break]

The syntax is almost as simple as #9, and it gets you more flexibility (because there's nothing stopping you from putting a break under any if statement, not just the last).

However, intuitively this is exactly the same as #9. An if statement with a break means something completely different, and nearly opposite, to one without a break.
And trying to define the meaning is even more complicated.

From ncoghlan at gmail.com Thu Jun 27 10:43:07 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 27 Jun 2013 18:43:07 +1000
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com>
Message-ID:

On 27 June 2013 17:28, Andrew Barnert wrote:
> 8. Allow else break in comp_if clauses.
>
> x = [value for value in iterable if pred(value) else break]
>
> This one is pretty easy to define rigorously, since it maps to exactly what the while attempt maps to with a slight change to the existing rules.
>
> But to me, it makes the code a confusing mess. I'm immediately reading "iterable if pred(value) else break", and that's wrong.

Ouch, I hadn't noticed that parallel. Yeah, it's a bad idea :)

Someone was asking earlier why I thought PEP 403 was at all relevant to these discussions. It's because it makes it easy to define a one shot generator function to precisely control the iterable in a comprehension.
If we assume a world where itertools.takewhile doesn't exist, you could write an equivalent inline:

    @in x = [transform(value) for value in my_takewhile(pred, iterable)]
    def my_takewhile(pred, itr):
        for x in itr:
            if not pred(x):
                break
            yield x

All the important inputs are still visible in the header line, and the definition of "takewhile" is right there, rather than sending the reader off to another part of the code.

It's an intermediate step between "This callable needs to be a function or generator to get access to full statement syntax, but we only use it in this one place" and "This is a useful piece of independent functionality, let's make it available as a function or method".

Would such syntax be a net win for a language? Highly arguable, which is why the PEP is still deferred.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From oscar.j.benjamin at gmail.com Thu Jun 27 12:16:58 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 27 Jun 2013 11:16:58 +0100
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com>
Message-ID:

On 27 June 2013 08:28, Andrew Barnert wrote:
> Let me try to gather together all of the possibilities that have been discussed in this and the two previous threads, plus a couple of obvious ones nobody's mentioned.
You've missed out having "else return" in comprehensions. I like this
less than a while clause but it was preferred by some as it unrolls
perfectly in the case that the intention is to break out of all loops
e.g.:

    [f(x, y) for x in xs for y in ys if g(x, y) else return]

becomes

    _tmplist = []
    def _tmpfunc():
        for x in xs:
            for y in ys:
                if g(x, y):
                    _tmplist.append(f(x, y))
                else:
                    return
    _tmpfunc()

> 1. Redefine comprehensions on top of generator expressions instead of
> defining them in terms of nested blocks.
>
>     def stop(): raise StopIteration
>
>     x = [value if pred(value) else stop() for value in iterable]

I prefer

    x = [value for value in iterable if pred(value) or stop()]

so that the flow control is all on the right hand side of the "in".

> This would make the implementation of Python simpler.
>
> It also makes the language conceptually simpler. The subtle differences
> between [x for x in foo] and list(x for x in foo) are gone.
>
> And it's actually a pretty small change to the official semantics. Just
> replace the last two paragraphs of 6.2.4 with "The comprehension
> consists of a single expression followed by at least one for clause and
> zero or more for or if clauses. In this case, the expression and
> clauses are interpreted as if they were a generator expression, and the
> elements of the new container are those yielded by that generator
> expression consumed to completion." (It also makes it easier to fix the
> messy definition of dict comprehensions, if anyone cares.)
>
> Unlike #5, 6, 7, 8, and 10, but like #2, 3, 4, and 9, this only allows
> you to break out of one for clause, not any. But that's exactly the
> same as break only being able to break out of one for loop. Nobody
> complains that Python doesn't have "break 2" or "break :label", right?

I'm not sure why you expect that it would only break out of one for
clause; I expect it to break out of all of them.
That's how it works with generator expressions:

    Python 2.7.3 (default, Sep 26 2012, 21:51:14)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> list(x + y for x in 'abc' for y in '123')
    ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
    >>> def stop(): raise StopIteration
    ...
    >>> list(x + y for x in 'abc' if x == 'a' or stop() for y in '123')
    ['a1', 'a2', 'a3']
    >>> list(x + y for x in 'abc' for y in '123' if y == '1' or stop())
    ['a1']

It also works that way if you spell it the way that you did:

    >>> list(x + y if y == '1' else stop() for x in 'abc' for y in '123')
    ['a1']

> 2. Just require comprehensions to handle StopIteration.
>
> The main cost and benefit are the same as #1.
>
> However, it makes the language and implementation more complex, rather
> than simpler.
>
> Also, the effects of this less radical change (you can no longer pass
> StopIteration through a comprehension) seem like they might be harder
> to explain to people than the more radical one.

I think that the current behaviour is harder to explain.

> And, worse, the less radical effects would probably cause more subtle
> bugs.

Which implies a few versions where passing StopIteration through a
comprehension triggers a warning. I would be happy if it triggered a
warning anyway. I can't imagine a reasonable situation where that isn't
a bug.

> 7. Add a "magic" while clause that's basically until with the opposite
> sense.
>
>     x = [value for value in iterable while pred(value)]
>
> This reads pretty nicely (at least in trivial comprehensions), it
> parallels takewhile and friends, and it matches a bunch of other
> languages (most of the languages where "when" means "if", "while" means
> this).
>
> But it has a completely different meaning from while statements, and in
> fact a barely-related one.
>
> In particular, it's obviously not this:
>
>     x = []
>
>     for value in iterable:
>         while pred(value):
>             x.append(value)
>
> What it actually means is:
>
>     x = []
>
>     for value in iterable:
>         if pred(value):
>             x.append(value)
>         else:
>             break
>
> Imagine trying to teach that to a novice.

I can definitely imagine teaching it to a novice. I have taught Python
to groups of students who are entirely new to programming and also to
groups with prior experience of other languages. I would not teach list
comprehensions by unrolling them unless it was a more advanced Python
programming course. To explain the list comprehension with while clauses
I imagine having the following conversation and interactive session:

'''
Okay so a list comprehension is a way of making a new list out of an
existing list. Let's say we have a list called numbers_list like

    >>> numbers_list = [1,2,3,4,5,4,3,2,1]
    >>> numbers_list
    [1, 2, 3, 4, 5, 4, 3, 2, 1]

Now we want to create a new list called squares_list containing the
square of each of the numbers in numbers_list. We can do this very
easily with a list comprehension and it looks like

    >>> squares_list = [n ** 2 for n in numbers_list]
    >>> squares_list
    [1, 4, 9, 16, 25, 16, 9, 4, 1]

The list comprehension loops through all the numbers in numbers_list
and, calling the current number n, computes n squared (n ** 2). As it
does this it puts all the n squared numbers into a new list in the same
order.

We can also add an "if clause" to choose which elements from
numbers_list we will use to make the new list. To make a list that is
the square of all the numbers from numbers_list that are less than 4 we
can do

    >>> [n ** 2 for n in numbers_list if n < 4]
    [1, 4, 9, 9, 4, 1]

Now the comprehension includes n ** 2 in the new list only if n < 4;
otherwise n is ignored and the comprehension moves on to the next number
from numbers_list.
Also if we want the list comprehension to stop looping over the numbers
in numbers_list after, for example, seeing a particular number we can
use a "while clause" instead of an "if clause". If we want the
comprehension to read numbers from numbers_list only while all of the
numbers seen are less than 4 then we could do

    >>> [n ** 2 for n in numbers_list while n < 4]
    [1, 4, 9]

In this case what happens is that as soon as the comprehension finds the
number 4 from numbers_list the while condition isn't true any more so it
stops reading numbers from numbers_list. This means it doesn't find the
other numbers that are also less than 4 at the end of numbers_list
(unlike the if clause).
'''

> 8. Allow else break in comp_if clauses.
>
>     x = [value for value in iterable if pred(value) else break]
>
> This one is pretty easy to define rigorously, since it maps to exactly
> what the while attempt maps to with a slight change to the existing
> rules.
>
> But to me, it makes the code a confusing mess. I'm immediately reading
> "iterable if pred(value) else break", and that's wrong.

You wouldn't have that confusion with "else return".

Oscar

From abarnert at yahoo.com Thu Jun 27 13:28:36 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 27 Jun 2013 04:28:36 -0700 (PDT)
Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: Oscar Benjamin Sent: Thursday, June 27, 2013 3:16 AM > On 27 June 2013 08:28, Andrew Barnert wrote: >> Let me try to gather together all of the possibilities that have been >> discussed >> in this and the two previous threads, plus a couple of obvious ones >> nobody's >> mentioned. > > You've missed out having "else return" in comprehensions. For simplicity, I was just dealing with the "break" solutions, not "continue" and "return" (and "raise" or anything else anyone might want). I should have said that, but I didn't; sorry. So, this goes under case 8. But it's worth noting that return adds additional mental load on top of break. Comprehensions aren't functions, either conceptually or practically, but it will force you to think of them as functions and carry that cognitive dissonance around as you read them. Comprehensions _are_ loops, so break doesn't have that problem. > I like this > less than a while clause but it was preferred by some as it unrolls > perfectly in the case that the intention is to break out of all loops Some of the ideas can only break out of the last loop, some can only break out of all of them, some can break out of any one loop.?I don't think it will make a difference that often in comprehensions. 
A comprehension with two for clauses and an if is already pushing the
limits of readability; throwing in a break or return as well just seems
like asking for trouble. But you're right, that is the difference
between else break and else return.

>> 1. Redefine comprehensions on top of generator expressions instead of
>> defining them in terms of nested blocks.
>>
>>     def stop(): raise StopIteration
>>
>>     x = [value if pred(value) else stop() for value in iterable]
>
> I prefer
>
>     x = [value for value in iterable if pred(value) or stop()]
>
> so that the flow control is all on the right hand side of the "in".

I suppose this solution allows either. Personally, I think using or for
non-trivial flow control is more obscure than helpful. Would you write
this?

    for value in iterable:
        if pred(value) or stop():
            yield value

Of course not; you'd write:

    for value in iterable:
        if pred(value):
            yield value
        else:
            stop()

>> Unlike #5, 6, 7, 8, and 10, but like #2, 3, 4, and 9, this only allows
>> you to break out of one for clause, not any. But that's exactly the
>> same as break only being able to break out of one for loop. Nobody
>> complains that Python doesn't have "break 2" or "break :label", right?
>
> I'm not sure why you expect that it would only break out of one for
> clause; I expect it to break out of all of them. That's how it works
> with generator expressions:

My fault again, I wasn't clear here. What I meant is that #1, 2, 3, 4,
and 9 don't give you the option of where or how far to break; #5, 6, 7,
8, and 10 do. The important bit was that I don't think it's that
important.

>> 2. Just require comprehensions to handle StopIteration.
>>
>> The main cost and benefit are the same as #1.
>>
>> However, it makes the language and implementation more complex, rather
>> than simpler.
>> >> Also, the effects of this less radical change (you can no longer pass >> StopIteration through a comprehension) seem like they might be harder to explain >> to people than the more radical one. >? > I think that the current behaviour is harder to explain. What's hard to explain about the current behavior? StopIteration passes through comprehensions the same way it does through for loops. >> 7. Add a "magic" while clause that's basically until with the > opposite sense. >> >> ? ? x = [value for value in iterable while pred(value)] >> >> This reads pretty nicely (at least in trivial comprehensions), it parallels >> takewhile and friends, and it matches a bunch of other languages (most of the >> languages where "when" means "if", "while" means >> this). >> >> But it has a completely different meaning from while statements, and in >> fact a barely-related one. ? >> Imagine trying to teach that to a novice. > > I can definitely imagine teaching it to a novice. I have taught Python > to groups of students who are entirely new to programming and also to > groups with prior experience of other languages. I would not teach > list comprehensions by unrolling them unless it was a more advanced > Python programming course. This makes perfect sense today. Going from novice to intermediate understanding of list comprehensions, and many other areas of Python, is almost trivial, and that's part of what makes teaching Python so much easier than teaching, say, C++. I've seen hundreds of?people show up on StackOverflow and similar places completely baffled by a complex list comprehension in some code they've run into. As soon as you show them how to unroll it, they immediately get it. If that were no longer true, how would you get people over that step? Also, in my experience, devs coming over from other languages, at least the good ones, want to understand the abstractions, not just use them. 
They'll want to know why a for clause is just like a for statement and
an if clause is just like an if statement but a while clause is nothing
like a while statement. Partly this is because many of them are still
spending 80% of their time writing in JavaScript or C# or whatever and
only 15% in Python, and anything unique about Python that they can't
understand, they're going to forget. Comprehensions are flow control
expressions that map to flow control statements in a way that's not just
simple, but obvious: once you get it, you can't forget it. But if we
break the abstraction, that will no longer be true.

>> 8. Allow else break in comp_if clauses.
>>
>>     x = [value for value in iterable if pred(value) else break]
>>
>> This one is pretty easy to define rigorously, since it maps to exactly
>> what the while attempt maps to with a slight change to the existing
>> rules.
>>
>> But to me, it makes the code a confusing mess. I'm immediately reading
>> "iterable if pred(value) else break", and that's wrong.
>
> You wouldn't have that confusion with "else return".

Why not? They're both single-keyword flow-control statements. Why would
anyone's brain read else return any differently from else break?

From abarnert at yahoo.com Thu Jun 27 13:49:53 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 27 Jun 2013 04:49:53 -0700 (PDT)
Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ:
In-Reply-To: 
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com>
 <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com>
 <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com>
 <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com>
 <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com>
 <51C9CE75.5030300@nedbatchelder.com>
 <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com>
 <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp>
 <51CA4F8A.303@pearwood.info>
 <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com>
 <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com>
Message-ID: <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com>

From: Nick Coghlan
Sent: Thursday, June 27, 2013 1:43 AM

> Someone was asking earlier why I thought PEP 403 was at all relevant
> to these discussions. It's because it makes it easy to define a one
> shot generator function to precisely control the iterable in a
> comprehension. If we assume a world where itertools.takewhile doesn't
> exist, you could write an equivalent inline:
>
>     @in x = [transform(value) for value in my_takewhile(pred, iterable)]
>     def my_takewhile(pred, itr):
>         for x in itr:
>             if not pred(x):
>                 break
>             yield x

I'm still not sure how it's relevant here. Obviously takewhile already
exists in the stdlib, and if I wanted stop I'd probably want it
frequently enough to define it and reuse it. So, for the current
discussion, are you just suggesting using it to break up an
overly-verbose or -complex takewhile solution, as an answer to "Yeah, I
know about takewhile, but it's too verbose or complex so I don't want to
use it"?

If so, I don't think it answers that. If someone wants this:

    x = [value for value in iterable if value > 0 while value < 10]

and rejects this:

    x = [value for value in takewhile(lambda x: x < 10, iterable) if value > 0]

I don't think they'd be happier with this:

    @in x = [value for value in takewhile(under10, iterable) if value > 0]
    def under10(value):
        return value < 10

And as someone who _wouldn't_ reject the takewhile, if I want to break
it up, I'd be happier going in the opposite direction:

    iterable_to_10 = takewhile(lambda value: value < 10, iterable)
    x = [value for value in iterable_to_10 if value > 0]

That puts it as a sequence of simple transformations to an iterable,
rather than a bunch of simple functions plugged into different places in
one big expression. (Although admittedly my stupid toy example, with its
total of one transformation or function, isn't the best demonstration of
that.)

Where your idea would really be useful is, exactly, as you say, "... to
get access to full statement syntax, but we only use it in this one
place". I don't think the kinds of things people are envisioning putting
in while clauses need statement syntax. The only reason statement syntax
even really came up is the idea of break and/or a more-statement-like if
as an alternative to while. Having a real function wouldn't help with
break, because you're not allowed to break across function boundaries.

In fact, I think what people really want is the opposite: to write an
expression, not a function. That's a big part of the appeal of using
comprehensions over map and filter: if you don't already have a
ready-made function, no problem (and no lambdas or partials), just use a
comprehension and write the expression in-place. People want a similar
answer to make takewhile at least as unnecessary as map and filter.
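A runnable sketch of that "opposite direction" pipeline style, with a stand-in iterable (the values are made up for illustration):

```python
from itertools import takewhile

iterable = [2, -1, 5, 9, 12, 3]  # 12 ends the takewhile; the trailing 3 is never seen

# First limit the iterable, then filter it: two simple transformations
iterable_to_10 = takewhile(lambda value: value < 10, iterable)
x = [value for value in iterable_to_10 if value > 0]

print(x)  # [2, 5, 9]
```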
From joshua.landau.ws at gmail.com Thu Jun 27 14:10:56 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 27 Jun 2013 13:10:56 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CB928B.2030706@stoneleaf.us> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: On 27 June 2013 02:16, Ethan Furman wrote: > On 06/26/2013 04:15 PM, Joshua Landau wrote: >> >> On 26 June 2013 16:24, Ethan Furman wrote: >>> >>> On 06/26/2013 07:46 AM, Joshua Landau wrote: >>>> >>>> I feel that: >>>> >>>> 1) This is nearly unreadable - it does not say what it does in the >>>> slightest >>> >>> >>> And the '*' and '**' in function defintions do? >> >> Yes. The "*" symbol means "unpack" across a very large part of python, >> and the lone "*" was a simple extension to what it already did. There >> was no leap; I could've guessed what it did. It does not mean >> "magically make an object know what its name is and then unpack both >> of those -- implicitly over all of the following args!". > > Until recently the '*' meant 'pack' if it was in a function header, and > 'unpack' if it was in a function call, and wasn't usable anywhere else. Now > it also means 'unpack' in assignments, as well as 'keywords only after this > spot' in function headers. 
True; but "unpack" and "pack" are synonymous operations; and as I said the "keywords-only" wasn't a new feature, really. It was just a *restricted* version of its previous usage; packing. It didn't actually add anything but a useful subset of what we already had. I say "pack" and "unpack" are synonymous in the same way I think that these are synonymous uses of "+": a + 5 = 10 # So a = 5 a = 5 + 10 # So a = 15 You do the "same thing" but on different sides of the equation, or function call. The definition you seem to want to add is completely different. > Likewise with '**' (except for the assignments part). > > I don't know about you, but the first time I saw * and ** I had no idea what > they did and had to learn it. Yeah, true. *However*, and this is the point I'm trying to make: after I knew what def func(first_arg, *rest_of_args): pass did, I was able to guess (approximately) what: first_arg, *rest_of_args = iterable did. If you cannot, you probably need more trust in the developers. I would *never* have thought that: func(*, arg): pass meant func(arg=arg): pass despite that I sort of get the *very slight* (but not really even that) symmetry (now that you've explained it). >>>> 2) It's added syntax - that's a high barrier. I'm not convinced it's >>>> worth it yet. >>> >>> >>> It is a high barrier; but this does add a bit of symmetry to the new >>> '*'-meaning-keyword-only symbol. >> >> >> I don't think it does - there's no symmetry as they have completely >> different functions. > > Like pack and unpack are completely different? ;) No, they are symmetrical; like ? and ? are symmetrical. They are different, but the *same shape*. > I see it as: > > function header: '*' means only keywords accepted after this point > > function call: '*' okay, here's my keywords ;) But you don't give keywords, they magically appear. My "gut reaction" would be that: foo(*, blah) causes an error ("only keywords allowed after *"). 
I know it's pointless thing to do, but it's what it looks like. I would then go and "make up" several things to make it seem non-useless like: foo(*, mapping1, mapping2) == foo(**mapping1, **mapping2) but again it makes no sense. I feel no real symmetry from this. Something like: foo(*=blah, *=foo) would make a lot more sense and actually be symetrical, but I agree it's not as clean (nor am I a fan of it; this was just an example). Please remember that: def foo(*, a) != def foo(*, a=b) despite that both are valid -- you are seeing, I think, a false symmetry. I don't think I've really made my point clear in this last section, but I tried. From ncoghlan at gmail.com Thu Jun 27 14:19:19 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Jun 2013 22:19:19 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: On 27 June 2013 21:28, Andrew Barnert wrote: > From: Oscar Benjamin > > Sent: Thursday, June 27, 2013 3:16 AM > > >> On 27 June 2013 08:28, Andrew Barnert wrote: >>> Let me try to gather together all of the possibilities that have been >>> discussed >>> in this and the two previous threads, plus a couple of obvious ones >>> nobody's >>> mentioned. >> >> You've missed out having "else return" in comprehensions. 
>
> For simplicity, I was just dealing with the "break" solutions, not
> "continue" and "return" (and "raise" or anything else anyone might
> want). I should have said that, but I didn't; sorry. So, this goes
> under case 8.
>
> But it's worth noting that return adds additional mental load on top
> of break. Comprehensions aren't functions, either conceptually or
> practically, but it will force you to think of them as functions and
> carry that cognitive dissonance around as you read them. Comprehensions
> _are_ loops, so break doesn't have that problem.

FWIW, while I actually agree with you that "else return" doesn't fit
because people *think* of comprehensions and generators as loops rather
than as nested functions, they *are* defined as following the scoping
rules of a nested function and CPython actually implements them that
way:

    >>> from dis import dis
    >>> dis("[x for x in (1, 2, 3)]")
      1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f45710efd00, file "<string>", line 1>)
                  3 LOAD_CONST               1 ('<listcomp>')
                  6 MAKE_FUNCTION            0
                  9 LOAD_CONST               5 ((1, 2, 3))
                 12 GET_ITER
                 13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 16 RETURN_VALUE

This is one of the things we cleaned up in Python 3 in order to stop
leaking the iteration variable for comprehensions into the surrounding
scope - we tried a few other options, but ultimately declaring them to
be implicit nested functions proved to be the simplest approach.

If you're willing, I'm actually thinking this may be one of those
discussions that's worth summarising in a PEP, even if it's just to
immediately mark it Rejected. Similar to PEP 315 and a few other PEPs,
it can help to have a document that clearly spells out the objective
(which I think you summarised nicely as "trying to find a syntactic
replacement for itertools.takewhile, just as comprehensions replaced
many uses of map and filter"), even if no acceptable solution has been
found.
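The scoping cleanup Nick describes is easy to check in Python 3: because the comprehension body runs in an implicit nested function, the iteration variable no longer leaks into the surrounding scope:

```python
x = 99
squares = [x * x for x in range(5)]  # this x lives in the comprehension's own scope

assert squares == [0, 1, 4, 9, 16]
assert x == 99  # unchanged in Python 3; a Python 2 list comp would leave x == 4
```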
That phrasing of the objective also highlights a counter argument I hadn't considered before: if we don't consider takewhile a common enough use case to make it a builtin, why on *earth* are we even discussing the possibility of giving it dedicated syntax? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From oscar.j.benjamin at gmail.com Thu Jun 27 14:19:33 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 27 Jun 2013 13:19:33 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: On 27 June 2013 12:28, Andrew Barnert wrote: > From: Oscar Benjamin > > Some of the ideas can only break out of the last loop, some can only break out of all of them, some can break out of any one loop. I don't think it will make a difference that often in comprehensions. A comprehension with two for clauses and an if is already pushing the limits of readability; throwing in a break or return as well just seems like asking for trouble. So what about Nick's suggestion that a syntax could be limited to cases where there is only one for clause? 
>>> def stop(): raise StopIteration
>>>
>>> x = [value if pred(value) else stop() for value in iterable]
>>
>> I prefer
>>
>>     x = [value for value in iterable if pred(value) or stop()]
>>
>> so that the flow control is all on the right hand side of the "in".
>
> I suppose this solution allows either. Personally, I think using or for
> non-trivial flow control is more obscure than helpful. Would you write
> this?
>
>     for value in iterable:
>         if pred(value) or stop():
>             yield value

No, but I find it readable in the comprehension.

>>> 2. Just require comprehensions to handle StopIteration.
>>>
>>> The main cost and benefit are the same as #1.
>>>
>>> However, it makes the language and implementation more complex,
>>> rather than simpler.
>>>
>>> Also, the effects of this less radical change (you can no longer pass
>>> StopIteration through a comprehension) seem like they might be harder
>>> to explain to people than the more radical one.
>>
>> I think that the current behaviour is harder to explain.
>
> What's hard to explain about the current behavior? StopIteration passes
> through comprehensions the same way it does through for loops.

It's hard to explain the difference between list(generator expression)
and a list comprehension in this respect as it requires thinking about
the mechanics of StopIteration. You can see my previous attempt here:
http://mail.python.org/pipermail/python-ideas/2013-January/019051.html

>>> 7. Add a "magic" while clause that's basically until with the
>>> opposite sense.
>>>
>>>     x = [value for value in iterable while pred(value)]
>>>
>>> This reads pretty nicely (at least in trivial comprehensions), it
>>> parallels takewhile and friends, and it matches a bunch of other
>>> languages (most of the languages where "when" means "if", "while"
>>> means this).
>>>
>>> But it has a completely different meaning from while statements, and
>>> in fact a barely-related one.
>>>
>>> Imagine trying to teach that to a novice.
>> >> I can definitely imagine teaching it to a novice. I have taught Python >> to groups of students who are entirely new to programming and also to >> groups with prior experience of other languages. I would not teach >> list comprehensions by unrolling them unless it was a more advanced >> Python programming course. > > This makes perfect sense today. Going from novice to intermediate understanding of list comprehensions, and many other areas of Python, is almost trivial, and that's part of what makes teaching Python so much easier than teaching, say, C++. > > I've seen hundreds of people show up on StackOverflow and similar places completely baffled by a complex list comprehension in some code they've run into. As soon as you show them how to unroll it, they immediately get it. If that were no longer true, how would you get people over that step? I think that this is really a style problem. The emphasis on unrolling comprehensions makes it seem acceptable to write complex comprehensions that can only be understood after being unrolled. To me, a comprehension is good when you can look at it and immediately know what it does. If it needs to be mentally unrolled before it can be understood then it would probably be better if it was actually unrolled in the source and then everyone can look directly at the unrolled code. >>> x = [value for value in iterable if pred(value) else break] >>> >>> This one is pretty easy to define rigorously, since it maps to exactly what >> the while attempt maps to with a slight change to the existing rules. >>> >>> But to me, it makes the code a confusing mess. I'm immediately reading >> "iterable if pred(value) else break", and that's wrong. >> >> You wouldn't have that confusion with "else return". > > Why not? They're both single-keyword flow-control statements. Why would anyone's brain read else return any differently from else break? My bad, you're right. 
I was thinking of the break as an expression because of the way you
split it out but it's not a valid expression (just like return).

Oscar

From joshua.landau.ws at gmail.com Thu Jun 27 14:43:59 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Thu, 27 Jun 2013 13:43:59 +0100
Subject: [Python-ideas] PEP 315: do-while
In-Reply-To: 
References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl>
 <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com>
 <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com>
 <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: 

On 27 June 2013 03:41, Andrew Barnert wrote:
> On Jun 26, 2013, at 16:25, Joshua Landau wrote:
>
>> On 27 June 2013 00:17, Andrew Barnert wrote:
>>>
>>> In case it's not obvious how ridiculous it is to extrapolate MISRA-C
>>> to other languages ...
>>>
>>> I went through the 127 rules in MISRA-C 1998. About 52 of them could
>>> be extrapolated to Python. Of those, 20 make some sense; the other 32
>>> would be either ridiculous or disastrous to apply in Python. (In
>>> fact, much the same is true even for more closely-related languages
>>> like C++ or JavaScript.)
>>>
>>> If you're curious about the details, see
>>> http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html.
>>
>> "67. rcm: Don't rebind the iterator in a for loop."
>>
>> That sounds quite sane to me...
>
> Why?
>
> In C, the loop iterator is a mutable variable; assigning to it changes
> the loop flow.
>
>     for (i=0; i!=10; ++i) {
>         if (i % 2) { i = 0; }
>         printf("%d\n", i);
>     }
>
> That looks like it should alternate even numbers with 0's, up to 10.
> But it actually produces an infinite loop of 0's. Very confusing.
>
> In Python, the loop iterator is just a name for a value; assigning to
> it does not change that value or affect the flow in any way; it just
> gives you a convenient name for a different value.
>
>     for i in range(10):
>         if i % 2: i = 0
>         print i
>
> That looks like it should alternate even numbers with 0's, up to 10.
And that's exactly what it does. > > Obviously this is a silly toy example, but there's plenty of real code that does this kind of thing. My response applies to Ethan as well; it'll be pointless to post it twice. You are not rebinding the *iterator* but the, um, iteree? Rebinding the iterator, to me, would look more like: >>> looper = range(10) >>> for value in looper: ... looper = range(value-2) ... print(value, end="") ... 0123456789 Which is just a confusing thing to do, or (even worse): >>> looper = list(range(10)) >>> for value in looper: ... looper.append(value) ... print(value, end="") ... 01234567890123456789012345678901234567890123456... Which can be very bad if you do it on hashed things (when you could), and is really quite confusing. There are so very few reasons to do this that "just avoid it" is a good response, à mon avis. From steve at pearwood.info Thu Jun 27 14:44:57 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Jun 2013 22:44:57 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: <51CC33C9.3040602@pearwood.info> On 27/06/13 13:01, Ron Adam wrote: > > > On 06/26/2013 08:16 PM, Ethan Furman wrote: >> >> I don't know about you, but the first time I saw * and ** I had no idea >> what they did and had to learn it.
> > One of my first thoughts when I first learned Python was that it would have made things clearer if they used two different symbols for pack and unpack. > > def foo(*args, **kwds): > return bar(^args, ^^kwds) > > There isn't any issue for the computer having those the same, but as a human, the visual difference would have helped make it easier to learn. I don't think it would be easier. I think that means you just have to learn two things instead of one, and have the added source of errors when you get them mixed up: def foo(^args, ^^kwds): return bar(*args, **kwds) We don't seem to have problems with this: items[x] = items[y] rather than: items{x} = items[y] Or should it be the other way around? -- Steven From ncoghlan at gmail.com Thu Jun 27 14:55:49 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Jun 2013 22:55:49 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: On 27 June 2013 22:10, Joshua Landau wrote: > I would *never* have thought that: > > func(*, arg): pass > > meant > > func(arg=arg): pass It doesn't mean that, since a bare keyword-only argument *must* be passed in when calling the function (as it has no default value). > despite that I sort of get the *very slight* (but not really even > that) symmetry (now that you've explained it).
The "*" in the keyword-only parameter syntax still refers to positional argument packing. It just leaves the destination undefined to say "no extra positional arguments are allowed" rather than "any extra positional arguments go here". The parameters that appear after that entry are keyword only solely by a process of elimination - if all the positional arguments have been consumed (or disallowed), then any parameters that are left *have* to be passed as keywords, since positional and keyword are the only two argument passing options we offer. On the calling side, we've actually toyed with the idea of removing the argument ordering restrictions, because they imply a parallel with the parameter definitions that doesn't actually exist. We haven't though, as it's significant extra work for no compelling benefit. As far as this whole thread goes, though, we're still in the situation where the simplest mechanisms Python currently has to extract a submap from a mapping are a dict comprehension and operator.itemgetter. These proposed changes to function calling syntax then target a very niche case of that broader problem, where the mapping is specifically the current local variables and the destination for the submap is a function call with like named parameters. This is *not* the calibre of problem that prompts us to make Python harder to learn by adding new syntax. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From joshua.landau.ws at gmail.com Thu Jun 27 15:05:15 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 27 Jun 2013 14:05:15 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: On 27 June 2013 13:55, Nick Coghlan wrote: > On 27 June 2013 22:10, Joshua Landau wrote: >> I would *never* have thought that: >> >> func(*, arg): pass >> >> meant >> >> func(arg=arg): pass > > It doesn't mean that, since a bare keyword-only argument *must* be > passed in when calling the function (as it has no default value). > >> despite that I sort of get the *very slight* (but not really even >> that) symmetry (now that you've explained it). > > The "*" in the keyword-only parameter syntax still refers to > positional argument packing. It just leaves the destination undefined > to say "no extra positional arguments are allowed" rather than "any > extra positional arguments go here". > > The parameters that appear after that entry are keyword only solely by > a process of elimination - if all the positional arguments have been > consumed (or disallowed), then any parameters that are left *have* to > be passed as keywords, since positional and keyword are the only two > argument passing options we offer. 
> > On the calling side, we've actually toyed with the idea of removing > the argument ordering restrictions, because they imply a parallel with > the parameter definitions that doesn't actually exist. We haven't > though, as it's significant extra work for no compelling benefit. I... don't understand. Did you perhaps misread my post? To reiterate, in shorthand, I am arguing against (I think -- if you are right then I've no idea what's actually happening).: foo(*, bar, spam) meaning foo(bar=bar, spam=spam) partially by pointing out that it isn't symmetrical with def foo(*, bar, spam): ... which means nothing of the sort. Of course, if I'm the one confused then I'm afraid that I just have no idea what I'm on about. > As far as this whole thread goes, though, we're still in the situation > where the simplest mechanisms Python currently has to extract a submap > from a mapping are a dict comprehension and operator.itemgetter. Thank you for stating that root of the problem that I've had on the tip of my tongue. Perhaps: "dict & set" or even "dict & iter" would work, although it looks unwieldy, and I'm not sure I like it. > These > proposed changes to function calling syntax then target a very niche > case of that broader problem, where the mapping is specifically the > current local variables and the destination for the submap is a > function call with like named parameters. This is *not* the calibre of > problem that prompts us to make Python harder to learn by adding new > syntax. 
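[Editor's note: the two existing submap mechanisms Nick mentions above (a dict comprehension and operator.itemgetter) can be sketched as follows; the mapping `d` and the key tuple `wanted` are illustrative names, not from the thread.]

```python
import operator

d = {"bar": 1, "spam": 2, "eggs": 3}
wanted = ("bar", "spam")

# Dict comprehension: builds a new mapping restricted to the wanted keys
submap = {k: d[k] for k in wanted}
print(submap)  # {'bar': 1, 'spam': 2}

# operator.itemgetter: extracts just the values, in the order the keys were given
getter = operator.itemgetter(*wanted)
print(getter(d))  # (1, 2)
```

Note that itemgetter returns values rather than a mapping, so the dict comprehension is usually the closer fit when a submap (e.g. for a **kwargs call) is wanted back.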
From jimjhb at aol.com Thu Jun 27 15:26:02 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Thu, 27 Jun 2013 09:26:02 -0400 (EDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> (You have an extra period on your link....) Andrew, I'm not doubting what you are saying, but I am informing the python community of the reality out there. Say what you will, but Python's lack of gotos (if they are bad, then one can just avoid them, right?) is a form of social engineering on Python's part as well. I think that's a good decision, but one can see how social engineering of programmers can go awry as well. (MISRA-C 2012 now allows for limited gotos, so times change.) If I take the viewpoint of the no break/continue folks, they have a point. They can often be avoided, and throwing them in (a lot) is often a sign of sloppy code. Breaks (especially) are very easy to avoid with the C for, because the conditional is explicit and can easily be expanded. So for C, educating folks with these rules doesn't really have much net effect. The PROBLEM is that you can't do that with a Python for. So all this rule indoctrination is much more consequential (to the code). One can argue that adding a conditional SHOULDN'T be necessary, but that might not reflect our current practical reality. -----Original Message----- From: Andrew Barnert To: jimjhb ; guido Cc: python-ideas Sent: Wed, Jun 26, 2013 7:17 pm Subject: Re: [Python-ideas] PEP 315: do-while From: "jimjhb at aol.com" Sent: Wednesday, June 26, 2013 1:48 PM >Sounds good to me. Just let people know it's OK to use breaks!!!
> > >MISRA-C 1998 (I know it's C, but people extrapolated that to other languages) banned continues and breaks. >MISRA-C 2004 allowed for one break in a loop. In case it's not obvious how ridiculous it is to extrapolate MISRA-C to other languages? I went through the 127 rules in MISRA-C 1998. About 52 of them could be extrapolated to Python. Of those, 20 make some sense; the other 32 would be either ridiculous or disastrous to apply in Python. (In fact, much the same is true even for more closely-related languages like C++ or JavaScript.) If you're curious about the details, see http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jun 27 15:27:18 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Jun 2013 23:27:18 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: On 27 June 2013 23:05, Joshua Landau wrote: > I... don't understand. Did you perhaps misread my post? Sorry, I should have been clearer - I was agreeing with you :) Ethan had indicated that he saw the use of a bare "*" to introduce keyword only argument parameter declarations as some magic new syntax unrelated to tuple packing. 
It isn't - it's still a tuple packing declaration like "*args", it's just one *without a named destination*, so we can put more stuff after it in the parameter list without allowing an arbitrary number of positional arguments. The other places where we allow tuple packing in a binding target are positional only, so there's no need for such a notation. It's certainly one of the more obscure aspects of Python's syntax, but there's still an underlying logic to it. (PEP 3102 has the details, but be aware that PEP uses the word "argument" in a few places where it should say "parameter" - PEP 362 better covers the differences, although a useful shorthand is "as part of a function call, parameter names are bound to argument values") Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From joshua.landau.ws at gmail.com Thu Jun 27 15:34:46 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 27 Jun 2013 14:34:46 +0100 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> Message-ID: On 27 June 2013 14:26, wrote: > Andrew, > > I'm not doubting what you are saying, but I am informing the python > community of the reality out there. Say what you will, but Python's > lack of gotos (if they are bad, then one can just avoid them, right?) is a > form of social engineering on Python's part as well. I think that's > a good decision, but one can see how social engineering of programmers can > go awry as well. (MISRA-C 2012 now allows for limited gotos, so times > change.) > > If I take the viewpoint of the no break/continue folks, they have a point. 
> They can often be avoided, and throwing them in (a lot) is often a sign of > sloppy code. Breaks (especially) are very easy to avoid with the C for, > because the conditional is explicit and can easily be expanded. So for C, > educating folks with these rules don't really have much net effect. > > The PROBLEM is that you can't do that with a Python for. So all this rule > indoctrination is much more consequential (to the code). > > If can argue that adding a conditional SHOULDN'T be necessary, but that > might not reflect our current practical reality. As the rest of us have said, this is irrelevant. These people you know are not the norm and no-one is going to budge for them. That's it; and no matter how much you reiterate this is not going to convince even those people who like your suggestion (of which I am not one). Please, just let this point rest. From ron3200 at gmail.com Thu Jun 27 16:24:36 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 27 Jun 2013 09:24:36 -0500 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CC33C9.3040602@pearwood.info> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> <51CC33C9.3040602@pearwood.info> Message-ID: On 06/27/2013 07:44 AM, Steven D'Aprano wrote: > We don't seem to have problems with this: > > items[x] = items[y] > > rather than: > > items{x} = items[y] > > Or should it be the other way around? 
Yes, that was my thoughts when I first started to see *args in different places. (>10 years ago). I'm mostly over it ;-) Cheers, Ron From jimjhb at aol.com Thu Jun 27 16:48:25 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Thu, 27 Jun 2013 10:48:25 -0400 (EDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> Message-ID: <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> -----Original Message----- From: Joshua Landau To: jimjhb Cc: Andrew Barnert ; python-ideas Sent: Thu, Jun 27, 2013 9:35 am Subject: Re: [Python-ideas] PEP 315: do-while On 27 June 2013 14:26, wrote: > Andrew, > > I'm not doubting what you are saying, but I am informing the python > community of the reality out there. Say what you will, but Python's > lack of gotos (if they are bad, then one can just avoid them, right?) is a > form of social engineering on Python's part as well. I think that's > a good decision, but one can see how social engineering of programmers can > go awry as well. (MISRA-C 2012 now allows for limited gotos, so times > change.) > > If I take the viewpoint of the no break/continue folks, they have a point. > They can often be avoided, and throwing them in (a lot) is often a sign of > sloppy code. Breaks (especially) are very easy to avoid with the C for, > because the conditional is explicit and can easily be expanded. So for C, > educating folks with these rules don't really have much net effect. > > The PROBLEM is that you can't do that with a Python for. So all this rule > indoctrination is much more consequential (to the code). > > If can argue that adding a conditional SHOULDN'T be necessary, but that > might not reflect our current practical reality. 
As the rest of us have said, this is irrelevant. These people you know are not the norm and no-one is going to budge for them. That's it; and no matter how much you reiterate this is not going to convince even those people who like your suggestion (of which I am not one). Please, just let this point rest. =================== Joshua, You make a good point. I have no idea how prevalent the problem really is, and that's obviously relevant. (I haven't taken any polls, and neither have you.) Bottom line is most other languages allow early termination of for loops without breaking out of them. Python does not. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Thu Jun 27 18:40:06 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 27 Jun 2013 19:40:06 +0300 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: 27.06.13 02:17, Andrew Barnert wrote: > In case it's not obvious how ridiculous it is to extrapolate MISRA-C to other languages? > > I went through the 127 rules in MISRA-C 1998. About 52 of them could be extrapolated to Python. Of those, 20 make some sense; the other 32 would be either ridiculous or disastrous to apply in Python. (In fact, much the same is true even for more closely-related languages like C++ or JavaScript.) > > If you're curious about the details, see http://stupidpythonideas.blogspot.com/2013/06/misra-c-and-python.html. Thank you. It's an amusing post and an interesting blog.
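[Editor's note: picking up the * / ** packing discussion a few messages back, here is a hedged sketch of the symmetry being debated; the names `foo` and `bar` are made up for illustration, not from the thread.]

```python
# The same symbols pack extra arguments at the definition site
# and unpack sequences/mappings at the call site.
def foo(*args, **kwds):
    # args is a tuple of extra positional arguments,
    # kwds is a dict of extra keyword arguments
    return bar(*args, **kwds)

def bar(a, b, c=0):
    return (a, b, c)

print(foo(1, 2, c=3))            # (1, 2, 3)
print(foo(*[1, 2], **{"c": 3}))  # the call site unpacks the same way
```

The hypothetical `^args` / `^^kwds` spelling would split packing and unpacking across different symbols; Python instead reuses one pair of symbols for both directions, which is the trade-off Steven and Ron are weighing.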
From ethan at stoneleaf.us Thu Jun 27 17:55:15 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 27 Jun 2013 08:55:15 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> Message-ID: <51CC6063.1060800@stoneleaf.us> On 06/27/2013 07:48 AM, jimjhb at aol.com wrote: > > Bottom line is most other languages allow early termination of for loops > without breaking out of them. Python does not. If they are terminating early, then they most certainly are breaking out of them, regardless of whether the word 'break' is used. -- ~Ethan~ From jimjhb at aol.com Thu Jun 27 18:53:25 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Thu, 27 Jun 2013 12:53:25 -0400 (EDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <51CC6063.1060800@stoneleaf.us> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> <51CC6063.1060800@stoneleaf.us> Message-ID: <8D0416DF98D80F7-1864-29F3A@webmail-m103.sysops.aol.com> -----Original Message----- From: Ethan Furman To: python-ideas Sent: Thu, Jun 27, 2013 12:43 pm Subject: Re: [Python-ideas] PEP 315: do-while On 06/27/2013 07:48 AM, jimjhb at aol.com wrote: > > Bottom line is most other languages allow early termination of for loops > without breaking out of them. Python does not. 
If they are terminating early, then they most certainly are breaking out of them, regardless of whether the word 'break' is used. Yes, but the control flow (and location of the control) is different. All this "don't use breaks" stuff can be traced back to E.W Dijkstra and structured programming. Structured programming remains in a lot of mindsets today. -- ~Ethan~ _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jun 27 19:51:42 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 27 Jun 2013 10:51:42 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D0416DF98D80F7-1864-29F3A@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> <51CC6063.1060800@stoneleaf.us> <8D0416DF98D80F7-1864-29F3A@webmail-m103.sysops.aol.com> Message-ID: <217D6207-DCA4-4A75-BDF7-C51080322AD3@yahoo.com> On Jun 27, 2013, at 9:53, jimjhb at aol.com wrote: > -----Original Message----- > From: Ethan Furman > To: python-ideas > Sent: Thu, Jun 27, 2013 12:43 pm > Subject: Re: [Python-ideas] PEP 315: do-while > > On 06/27/2013 07:48 AM, jimjhb at aol.com wrote: > > > > Bottom line is most other languages allow early termination of for loops > > without breaking out of them. Python does not. > > If they are terminating early, then they most certainly are breaking out of > them, regardless of whether the word 'break' > is used. > Yes, but the control flow (and location of the control) is different. 
All this "don't use breaks" stuff can be traced back to E.W Dijkstra and structured programming. Structured programming remains in a lot of mindsets today. Structured programming is all about how to write maintainable programs in a low-level Algol-like language, and that's what C is. But that's not even close to what Python is. People trying to program Python as if it were C are going to write bad Python. Bending over backward to accommodate them will only make things worse. The best thing we can do is use as many generators as possible--anyone who thinks break doesn't belong in Python will suffer from head-asplode once they finally realize what yield does to control flow, and then they'll no longer be a problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jun 27 19:54:50 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 27 Jun 2013 10:54:50 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <3605BF1A-AB31-4C8E-90C3-EB6B8A858C97@yahoo.com> On Jun 27, 2013, at 5:19, Nick Coghlan wrote: > FWIW, while I actually agree with you that "else return" doesn't fit > because people *think* of comprehensions and generator as loops rather > than as nested functions, they *are* defined as following the scoping > 
rules of a nested function and CPython actually implements them that > way: I thought it was decided to explicitly add comprehensions as a new place that defines a scope, to avoid the confusion of calling them functions? I may be remembering wrong; I'll read over the docs later. From abarnert at yahoo.com Thu Jun 27 20:26:57 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 27 Jun 2013 11:26:57 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <9F8420CC-31A1-40FE-9DAE-38D3CC9A3E86@yahoo.com> Sorry for the split reply; hit the wrong button. On Jun 27, 2013, at 5:19, Nick Coghlan wrote: > If you're willing, I'm actually thinking this may be one of those > discussions that's worth summarising in a PEP, even if it's just to > immediately mark it Rejected. Similar to PEP 315 and a few other PEPs, > it can help to have a document that clearly spells out the objective > (which I think you summarised nicely as "trying to find a syntactic > replacement for itertools.takewhile, just as comprehensions replaced > many uses of map and filter"), even if no acceptable solution has been > found. The ideas are pretty broad-ranging. In particular, #1 got two somewhat supportive responses that had nothing to do with the comprehension termination idea. 
Do side issues like that need to be discussed first/separately before referencing them in a while clause PEP? Also, we seem pretty far from a consensus on what the actual tradeoffs are for most of the options. For example, is the definition of comps in terms of loops the core idea behind the abstraction, or a minor triviality that's only useful in understanding bad code? Are the differences between comps and genexps a bug or a feature? Finally, how does this connect up to the original idea of this thread, which was to draft a PEP for while clauses in loop statements rather than in comprehensions? Obviously if that existed, it would change the options for comp syntax. Do we need to include that idea in the discussion? (I completely forgot about it while digging up all of the ideas that spun off from it...) > That phrasing of the objective also highlights a counter argument I > hadn't considered before: if we don't consider takewhile a common > enough use case to make it a builtin, why on *earth* are we even > discussing the possibility of giving it dedicated syntax? Besides early termination for comps, the only good use case I know of is in code that already makes heavy use of itertools, and making one of the functions a builtin wouldn't change anything. And if early termination for comps is a special case, it doesn't seem too unreasonable to consider other ways to handle it. But you're right that "make takewhile a builtin" should probably be considered among the alternatives. From jimjhb at aol.com Thu Jun 27 21:40:19 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Thu, 27 Jun 2013 15:40:19 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <9F8420CC-31A1-40FE-9DAE-38D3CC9A3E86@yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372332516.17577.YahooMailNeo@web184702.mail.ne1.yahoo.com> <9F8420CC-31A1-40FE-9DAE-38D3CC9A3E86@yahoo.com> Message-ID: <8D0418549948BD1-1864-2BFC6@webmail-m103.sysops.aol.com> -----Original Message----- From: Andrew Barnert To: Nick Coghlan Cc: python-ideas Sent: Thu, Jun 27, 2013 2:28 pm Subject: Re: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: Sorry for the split reply; hit the wrong button. On Jun 27, 2013, at 5:19, Nick Coghlan wrote: > If you're willing, I'm actually thinking this may be one of those > discussions that's worth summarising in a PEP, even if it's just to > immediately mark it Rejected. Similar to PEP 315 and a few other PEPs, > it can help to have a document that clearly spells out the objective > (which I think you summarised nicely as "trying to find a syntactic > replacement for itertools.takewhile, just as comprehensions replaced > many uses of map and filter"), even if no acceptable solution has been > found. The ideas are pretty broad-ranging. In particular, #1 got two somewhat supportive responses that had nothing to do with the comprehension termination idea. Do side issues like that need to be discussed first/separately before referencing them in a while clause PEP? 
Also, we seem pretty far from a consensus on what the actual tradeoffs are for most of the options. For example, is the definition of comps in terms of loops the core idea behind the abstraction, or a minor triviality that's only useful in understanding bad code? Are the differences between comps and genexps a bug or a feature? Finally, how does this connect up to the original idea of this thread, which was to draft a PEP for while clauses in loop statements rather than in comprehensions? Obviously if that existed, it would change the options for comp syntax. Do we need to include that idea in the discussion? (I completely forgot about it while digging up all of the ideas that spun off from it...) > That phrasing of the objective also highlights a counter argument I > hadn't considered before: if we don't consider takewhile a common > enough use case to make it a builtin, why on *earth* are we even > discussing the possibility of giving it dedicated syntax? Besides early termination for comps, the only good use case I know of is in code that already makes heavy use of itertools, and making one of the functions a builtin wouldn't change anything. And if early termination for comps is a special case, it doesn't seem too unreasonable to consider other ways to handle it. But you're right that "make takewhile a builtin" should probably be considered among the alternatives. ===================== So combining this and what Shane says, "trying to find a syntactic replacement for itertools.takewhile, just as comprehensions replaced many uses of map and filter", which is also usable in comprehensions? (Maybe that was already implied) You should probably include the Wolfgang Maier thread in Jan. 2013 on the same subject. _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shane at umbrellacode.com Fri Jun 28 01:34:06 2013 From: shane at umbrellacode.com (Shane Green) Date: Thu, 27 Jun 2013 16:34:06 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> Message-ID: I actually meant to say the reason the idea has gotten some legs... :-) By "legs" I meant lively discussion... On Jun 27, 2013, at 12:21 PM, Shane Green wrote: > Opinions about break statements are really irrelevant, IMHO. The reason this idea hadn't taken legs isn't because of concern about abiding by educators' blanket rules. > > List comprehensions are concise, intuitive, and efficient; proper use can improve readability *and* performance significantly. > > The number of proper applications is limited by list comprehension limitations. Adding the ability to configure termination in the comprehension will increase that number. > > > On Jun 27, 2013, at 7:48 AM, jimjhb at aol.com wrote: >> >> >> -----Original Message----- >> From: Joshua Landau >> To: jimjhb >> Cc: Andrew Barnert ; python-ideas >> Sent: Thu, Jun 27, 2013 9:35 am >> Subject: Re: [Python-ideas] PEP 315: do-while >> >> On 27 June 2013 14:26, wrote: >> > Andrew, >> > >> > I'm not doubting what you are saying, but I am informing the python >> > community of the reality out there. Say what you will, but Python's >> > lack of gotos (if they are bad, then one can just avoid them, right?) is a >> > form of social engineering on Python's part as well. I think that's >> > a good decision, but one can see how social engineering of programmers can >> > go awry as well. (MISRA-C 2012 now allows for limited gotos, so times >> > change.)
>> > >> > If I take the viewpoint of the no break/continue folks, they have a point. >> > They can often be avoided, and throwing them in (a lot) is often a sign of >> > sloppy code. Breaks (especially) are very easy to avoid with the C for, >> > because the conditional is explicit and can easily be expanded. So for C, >> > educating folks with these rules doesn't really have much net effect. >> > >> > The PROBLEM is that you can't do that with a Python for. So all this rule >> > indoctrination is much more consequential (to the code). >> > >> > One can argue that adding a conditional SHOULDN'T be necessary, but that >> > might not reflect our current practical reality. >> >> As the rest of us have said, this is irrelevant. >> >> These people you know are not the norm and no-one is going to budge >> for them. That's it; and no matter how much you reiterate this is not >> going to convince even those people who like your suggestion (of which >> I am not one). Please, just let this point rest. >> =================== >> Joshua, >> >> >> >> You make a good point. I have no idea how prevalent the problem really is, and that's obviously relevant. (I haven't taken any polls, and neither have you.) Bottom line is most other languages allow early termination of for loops without breaking out of them. Python does not. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shane at umbrellacode.com Thu Jun 27 21:21:15 2013 From: shane at umbrellacode.com (Shane Green) Date: Thu, 27 Jun 2013 12:21:15 -0700 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> Message-ID: Opinions about break statements are really irrelevant, IMHO. The reason this idea hadn't taken legs isn't because of concern about abiding by educators' blanket rules. List comprehensions are concise, intuitive, and efficient; proper use can improve readability *and* performance significantly. The number of proper applications is limited by list comprehension limitations. Adding the ability to configure termination in the comprehension will increase that number. On Jun 27, 2013, at 7:48 AM, jimjhb at aol.com wrote: > > > -----Original Message----- > From: Joshua Landau > To: jimjhb > Cc: Andrew Barnert ; python-ideas > Sent: Thu, Jun 27, 2013 9:35 am > Subject: Re: [Python-ideas] PEP 315: do-while > > On 27 June 2013 14:26, wrote: > > Andrew, > > > > I'm not doubting what you are saying, but I am informing the python > > community of the reality out there. Say what you will, but Python's > > lack of gotos (if they are bad, then one can just avoid them, right?) is a > > form of social engineering on Python's part as well. I think that's > > a good decision, but one can see how social engineering of programmers can > > go awry as well. (MISRA-C 2012 now allows for limited gotos, so times > > change.) > > > > If I take the viewpoint of the no break/continue folks, they have a point. 
> > They can often be avoided, and throwing them in (a lot) is often a sign of > > sloppy code. Breaks (especially) are very easy to avoid with the C for, > > because the conditional is explicit and can easily be expanded. So for C, > > educating folks with these rules doesn't really have much net effect. > > > > The PROBLEM is that you can't do that with a Python for. So all this rule > > indoctrination is much more consequential (to the code). > > > > One can argue that adding a conditional SHOULDN'T be necessary, but that > > might not reflect our current practical reality. > > As the rest of us have said, this is irrelevant. > > These people you know are not the norm and no-one is going to budge > for them. That's it; and no matter how much you reiterate this is not > going to convince even those people who like your suggestion (of which > I am not one). Please, just let this point rest. > =================== > Joshua, > > > > You make a good point. I have no idea how prevalent the problem really is, and that's obviously relevant. (I haven't taken any polls, and neither have you.) Bottom line is most other languages allow early termination of for loops without breaking out of them. Python does not. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri Jun 28 08:59:02 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 28 Jun 2013 15:59:02 +0900 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> Message-ID: <87obaqyb49.fsf@uwakimon.sk.tsukuba.ac.jp> Shane Green writes: > List comprehensions are concise, intuitive, and efficient; proper > use can improve readability *and* performance significantly. Truth, the whole truth, and nothing but the truth. > The number of proper applications is limited by list comprehension > limitations. Truth, but not the whole truth. > Adding the ability to configure termination in the comprehension > will increase that number. Not necessarily. In fact, any programming construct is limited by *human* comprehension limitations. As has been pointed out by people who think about these issues deeply and push these constructs to their limits, *because* a comprehension[1] is an *expression*, nested comprehensions (nested in higher-level expressions) are possible. Adding clauses to the construct increases the complexity, and comprehensions are already pretty complex.[2] Personally, *because* iterables can be infinite, I think the lack of a stop clause in comprehensions is ugly and adds some unobvious complexity to comprehensions and displays. The programmer needs to also keep in mind the possibility that some iterables are infinite, and deal with that. The convenience value of an internal "takewhile()" is also undeniable. 
Increased complexity in the basic construct might push them over the limit, and they would *decrease* usage.[3] Nick et al clearly are concerned about that end of the spectrum, and I personally would place those needs over mine in this case. (YMMV, that's just my opinion about *me*.) Footnotes: [1] Here and below "comprehension" includes generators and displays that take arbitrary iterables. [2] As an objectively verifiable measure (probably pretty imperfect, but relatively objective) of that complexity, at present a fully elaborated comprehension looks like

    [               # we have a comprehension
      function(x)   # we need to know what function does
      for x         # loop variable we need to track
      in iterable   # we need to know what iterable generates
      if            # optional so need to track "dangling if"
      condition(x)  # we need to know what satisfies condition
    ]

So a comprehension necessarily requires a minimum of 5 facts (each comment describes a fact), sometimes 6, and context (iterable = nested comprehension, etc) will usually add more. So most comprehensions are already at the median of the "7 plus/minus 2" facts that psychologists say that typical humans can operate on simultaneously. Adding another optional clause puts it close to the upper bound of 9, even before considering context and nested comprehensions. [3] True, if such a person is designing from scratch, they can simply make a rule of thumb to avoid the new takewhile feature when they're going to have complex nested comprehensions. But if they're maintaining existing code that already uses the feature, that rule of thumb might cause them to eschew the more extensive changes required to convert to a concise expression in terms of nested comprehensions, or similar. From wolfgang.maier at biologie.uni-freiburg.de Fri Jun 28 10:20:00 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 28 Jun 2013 10:20:00 +0200 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: Message-ID: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> On Jun 27, 2013, at 5:19, Nick Coghlan wrote: >> If you're willing, I'm actually thinking this may be one of those >> discussions that's worth summarising in a PEP, even if it's just to >> immediately mark it Rejected. Similar to PEP 315 and a few other PEPs, >> it can help to have a document that clearly spells out the objective >> (which I think you summarised nicely as "trying to find a syntactic >> replacement for itertools.takewhile, just as comprehensions replaced >> many uses of map and filter"), even if no acceptable solution has been >> found. > >The ideas are pretty broad-ranging. In particular, #1 got two somewhat >supportive responses that had nothing to do with the comprehension termination >idea. Do side issues like that need to be discussed first/separately before >referencing them in a while clause PEP? > >Also, we seem pretty far from a consensus on what the actual tradeoffs are for >most of the options. For example, is the definition of comps in terms of loops >the core idea behind the abstraction, or a minor triviality that's only useful >in understanding bad code? Are the differences between comps and genexps a bug >or a feature? > >Finally, how does this connect up to the original idea of this thread, which was >to draft a PEP for while clauses in loop statements rather than in >comprehensions? Obviously if that existed, it would change the options for comp >syntax. Do we need to include that idea in the discussion? (I completely forgot >about it while digging up all of the ideas that spun off from it...) I don't think while clauses in loop statements are a big gain for the language. There's break and despite all the programming-style discussions going on in this part of the thread, it has been working well for many years and most people find it intuitive to use. 
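[Editorial note: the break-based idiom Wolfgang defends — and the "reading a small header from large files" use case he mentions next — can be sketched as follows. This is an illustrative example only; the header convention (lines starting with '#') and the helper name are invented here, not taken from the thread.]

```python
def read_header(lines):
    """Collect leading header lines, stopping at the first body line.

    'lines' is any iterable of strings; header lines are assumed to
    start with '#' (a hypothetical convention for this sketch).
    """
    header = []
    for line in lines:
        if not line.startswith('#'):
            break  # the plain break Wolfgang refers to: stop at the first non-header line
        header.append(line)
    return header

lines = ['# name: demo', '# version: 1', 'data starts here', '# not a header']
print(read_header(lines))  # ['# name: demo', '# version: 1']
```

Note that the trailing '# not a header' line is never examined: break ends the loop at the first body line, which is exactly the takewhile-style behaviour the thread is discussing.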
>> That phrasing of the objective also highlights a counter argument I >> hadn't considered before: if we don't consider takewhile a common >> enough use case to make it a builtin, why on *earth* are we even >> discussing the possibility of giving it dedicated syntax? > >Besides early termination for comps, the only good use case I know of is in code >that already makes heavy use of itertools, and making one of the functions a >builtin wouldn't change anything. > >And if early termination for comps is a special case, it doesn't seem too >unreasonable to consider other ways to handle it. > >But you're right that "make takewhile a builtin" should probably be considered >among the alternatives. But a builtin takewhile would still not come with nicer and easier to read syntax! I guess the use cases are not that rare, it's just that right now people switch to explicit loops when they need early termination because it keeps things readable. I'm encountering this situation quite regularly (reading a small header from large files is the most common example, but there are others). Let me suggest one more solution although it requires a new keyword: introduce *breakif* condition and define its translation as if condition: break . You can now write (x for x in iterable breakif x < 0) and I don't see a way how that could possibly be misread by anyone. Also it would translate unambiguously to the explicit:

    for x in iterable:
        breakif x < 0  # itself translating to: if x < 0: break
        yield x

It would work with genexps, comprehensions and explicit loops alike (with very little benefit for the latter, though maybe it increases readability even there by making it clear from the start of the line what the purpose of the condition test is). 
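[Editorial note: breakif is of course not valid Python, but the behaviour Wolfgang describes can be reproduced today with itertools.takewhile by negating the condition — a sketch of the existing workaround, not of the proposed syntax:]

```python
from itertools import takewhile

iterable = [3, 1, 4, -1, 5, 9]

# Proposed spelling:  (x for x in iterable breakif x < 0)
# Today's spelling: keep taking items while the *negated* condition holds.
g = (x for x in takewhile(lambda x: not (x < 0), iterable))

print(list(g))  # [3, 1, 4] -- iteration stops at the first negative value
```

The double negation (take while not (x < 0)) is precisely the readability wart the breakif proposal is trying to remove.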
Best, Wolfgang From jimjhb at aol.com Fri Jun 28 16:22:39 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Fri, 28 Jun 2013 10:22:39 -0400 (EDT) Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <87obaqyb49.fsf@uwakimon.sk.tsukuba.ac.jp> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> <87obaqyb49.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8D04222146CE2DB-1864-301AF@webmail-m103.sysops.aol.com> Stephen Turnbull writes: >Shane Green writes: >> List comprehensions are concise, intuitive, and efficient; proper >> use can improve readability *and* performance significantly. >Truth, the whole truth, and nothing but the truth. >> The number of proper applications is limited by list comprehension >> limitations. >Truth, but not the whole truth. >> Adding the ability to configure termination in the comprehension >> will increase that number. >Not necessarily. In fact, any programming construct is limited by >*human* comprehension limitations. As has been pointed out by people >who think about these issues deeply and push these constructs to their >limits, *because* a comprehension[1] is an *expression*, nested >comprehensions (nested in higher-level expressions) are possible. >Adding clauses to the construct increases the complexity, and >comprehensions are already pretty complex.[2] >Personally, *because* iterables can be infinite, I think the lack of a >stop clause in comprehensions is ugly and adds some unobvious >complexity to comprehensions and displays. The programmer needs to >also keep in mind the possibility that some iterables are infinite, >and deal with that. The convenience value of an internal >"takewhile()" is also undeniable. 
I'd like to see the feature added. >But I definitely see this as a tradeoff, and I don't know how costly >the increased complexity of comprehensions would be to folks who push >the limits of comprehension usage already. Increased complexity in >the basic construct might push them over the limit, and they would >*decrease* usage.[3] Nick et al clearly are concerned about that end >of the spectrum, and I personally would place those needs over mine in >this case. (YMMV, that's just my opinion about *me*.) >Footnotes: The takewhile syntax is kind of god awful. The lambda thing always gives me pause, though I now 'grok' the whole thing to be a conditional. I don't think the issue is about added complexity to the comprehension (as the added clause would be optional anyway) but the twists and turns needed internally to get it working, and the open question about whether the syntactical implications can be adequately 'contained'. -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Fri Jun 28 17:32:17 2013 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Fri, 28 Jun 2013 11:02:17 -0430 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <87obaqyb49.fsf@uwakimon.sk.tsukuba.ac.jp> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> <87obaqyb49.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 28, 2013 at 2:29 AM, Stephen J. Turnbull wrote: > So most comprehensions are > already at the median of the "7 plus/minus 2" facts that psychologists > say that typical humans can operate on simultaneously. 
Adding another > optional clause puts it close to the upper bound of 9, even before > considering context and nested comprehensions. > That's bogus. One doesn't have to apply all of the "facts" at the same time. More likely one will apply one to three, and refactor if more seem to be needed. One can write "obfuscated Python" with Python as it is today, and that doesn't mean the language should be cut down to "training wheels" state. Cheers, -- Juancarlo *Añez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jun 28 18:44:11 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 28 Jun 2013 09:44:11 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: <25EC756A-B11D-485C-974C-D5FEEFAB0208@yahoo.com> On Jun 28, 2013, at 1:20, "Wolfgang Maier" wrote: > Let me suggest one more solution although it requires a new keyword: > introduce > > *breakif* condition > > and define its translation as if condition: break . This is basically the same idea as the until statement (with the opposite truth sense, but all of these are easy to invert)--except that it doesn't introduce a block. This means you can't nest anything underneath it, which means it doesn't work in comprehensions without changing what comprehensions mean. So, it combines the disadvantages of the until solution, because it introduces a new statement that no one will ever use just to provide a meaning for a comp clause that people might, and the various break solutions (other than "make break an expression"), because it requires complicating the definition of what comprehensions do. That being said, it clearly reads differently from an until clause or an else break clause intuitively. 
I think that gives me a glimmer of a better way to organize the choices into a few (not completely independent) ideas organized in a tree, with different options (e.g., choice of keyword or syntax) for most of them. I'll try to write it up later, after letting it stew for a while, but the basics are something like:

* No syntax change, just make existing things easier to use (e.g., builtin takewhile)
* Breaking expressions (e.g., break as expression)
* Making comprehensions StopIteration-able (e.g., redefine as list(genexp))
* Breaking clauses
  * that fit the nesting rule as-is
    * by also adding a new statement (until)
    * by explicitly defining what they map to instead of just "the equivalent statement" (magic while)
  * that require redefining comprehensions (else break)
    * by also adding a new statement or modifying an existing one (ifbreak)

We'd still need to list the specific versions of each category, because they definitely read differently, especially in simple comprehensions (a simple expression, one for, and the new clause)--which, as people have pointed out, are the most important ones (you can teach novices how to use and read simple comps without explaining the nesting rule, and that should still be true). > You can now write > (x for x in iterable breakif x < 0) > and I don't see a way how that could possibly be misread by anyone. > Also it would translate unambiguously to the explicit: > > for x in iterable: > breakif x<0 # itself translating to if x<0: break > yield x > > It would work with genexps, comprehensions and explicit loops alike (with > very > little benefit for the latter, though maybe it increases readability even > there > by making it clear from the start of the line what the purpose of the > condition > test is). 
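[Editorial note for modern readers: the "StopIteration-able" branch above refers to the stop() trick shown elsewhere in this thread. It behaved as demonstrated on the Python of 2013, but PEP 479 — the default behaviour from Python 3.7 — converts a StopIteration raised inside a generator into RuntimeError, so today the itertools.takewhile spelling is the one that still works. A minimal sketch of both:]

```python
from itertools import takewhile

def stop():
    raise StopIteration

data = [1, 2, 3, -1, 4]

# The trick from the thread: on Python 3.7+ this raises RuntimeError,
# because PEP 479 stops StopIteration from escaping a generator frame.
try:
    list(x for x in data if x > 0 or stop())
except RuntimeError:
    print("PEP 479: StopIteration no longer terminates a genexp")

# The spelling that still works: stop at the first non-positive value.
print(list(takewhile(lambda x: x > 0, data)))  # [1, 2, 3]
```

This is exactly why the "redefine comprehensions on top of genexps" option would need more than a documentation change in today's Python.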
From zuo at chopin.edu.pl Sat Jun 29 00:28:18 2013 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 29 Jun 2013 00:28:18 +0200 Subject: [Python-ideas] =?utf-8?q?=22Iteration_stopping=22_syntax_=5BWas?= =?utf-8?q?=3A_Is_this_PEP-able=3F_for_X_in_ListY_while_conditionZ=3A=5D?= In-Reply-To: <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> 27.06.2013 13:49, Andrew Barnert napisał: [...] > If someone wants this: > >     x = [value for value in iterable if value > 0 while value < 10] > > ... and rejects this: > >     x = [value for value in takewhile(lambda x: x < 10, iterable) if > value > 0] > > ... I don't think they'd be happier with this: > >     @in x = [value for value in takewhile(under10, iterable) if value > > 0] >     def under10(value): >         return value < 10 [...] > In fact, I think what people really want is the opposite: to write an > expression, not a function. That's a big part of the appeal of using > comprehensions over map and filter: if you don't already have a > ready-made function, no problem (and no lambdas or partials), just > use > a comprehension and write the expression in-place. People want a > similar answer to make takewhile at least as unnecessary as map and > filter. +1! 
Maybe it could be achieved with a new separate generator expression syntax, being orthogonal to the existing generator-expr/comprehension syntaxes? Maybe such as: (<expression> from <iterable> while <condition>) E.g.: g = (line from myfile while line.strip()) ...which would be equivalent to:

    def _temp(_myfile):
        iterator = iter(_myfile)
        line = next(iterator)
        while line.strip():
            yield line
            line = next(iterator)
    g = _temp(myfile)

Unlike the existing generator-expr/comprehension syntaxes this syntax is focused on predicate-based iteration stopping (like itertools.takewhile) rather than on any processing of the iterated values. Obviously, it could be combined with the existing syntaxes, e.g.: processed = [2 * x for x in (x from seq while x < 100)] ...or probably better (a matter of taste): items = (x from seq while x < 100) processed = [2 * x for x in items] Cheers. *j From shane at umbrellacode.com Sat Jun 29 00:41:55 2013 From: shane at umbrellacode.com (Shane Green) Date: Fri, 28 Jun 2013 15:41:55 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: Resending with the mailing list copied this time... I don't think the use of "while" is appropriate. A thing about list comprehension is that everything inside the brackets works around a single iteration; it's stateless, basically. You cannot (cleanly) define a condition around the length of the output, for example. 
"While", on the other hand, implies repetition, but its usage is again stateless and agnostic to it's application or position within the comprehension. [x until condition(x) for x in l if x] Note that "until" suffers from the same implied duration issue that "while" did, but does benefit from not officially being a looping mechanicsm already. In this approach the condition is not evaluated for values not returned, and it is run against the result of any If/else clause that manipulates return values. On Jun 27, 2013 3:17 AM, "Oscar Benjamin" wrote: On 27 June 2013 08:28, Andrew Barnert wrote: > Let me try to gather together all of the possibilities that have been discussed > in this and the two previous threads, plus a couple of obvious ones nobody's > mentioned. You've missed out having "else return" in comprehensions. I like this less than a while clause but it was preferred by some as it unrolls perfectly in the case that the intention is to break out of all loops e.g.: [f(x, y) for x in xs for y in ys if g(x, y) else return] becomes _tmplist = [] def _tmpfunc(): for x in xs: for y in ys: if g(x, y): _tmplist.append(f(x, y)) else: return _tmpfunc() > 1. Redefine comprehensions on top of generator expressions instead of defining them in terms of nested blocks. > > def stop(): raise StopIteration > > x = [value if pred(value) else stop() for value in iterable] I prefer x = [value for value in iterable if pred(value) or stop()] so that the flow control is all on the right hand side of the "in". > > This would make the implementation of Python simpler. > > It also makes the language conceptually simpler. The subtle differences between [x for x in foo] and list(x for x in foo) are gone. > > And it's actually a pretty small change to the official semantics. Just replace the last two paragraphs of 6.2.4 with "The comprehension consists of a single expression followed by at least one for clause and zero or more for or if clauses. 
In this case, the expression and clauses are interpreted as if they were a generator expression, and the elements of the new container are those yielded by that generator expression consumed to completion." (It also makes it easier to fix the messy definition of dict comprehensions, if anyone cares.) > > Unlike #5, 6, 7, 8, and 10, but like #2, 3, 4, and 9, this only allows you to break out of one for clause, not any. But that's exactly the same as break only being able to break out of one for loop. Nobody complains that Python doesn't have "break 2" or "break :label", right? I'm not sure why you expect that it would only break out of one for clause; I expect it to break out of all of them. That's how it works with generator expressions:

    Python 2.7.3 (default, Sep 26 2012, 21:51:14)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> list(x + y for x in 'abc' for y in '123')
    ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
    >>> def stop(): raise StopIteration
    ...
    >>> list(x + y for x in 'abc' if x == 'a' or stop() for y in '123')
    ['a1', 'a2', 'a3']
    >>> list(x + y for x in 'abc' for y in '123' if y == '1' or stop())
    ['a1']

It also works that way if you spell it the way that you did:

    >>> list(x + y if y == '1' else stop() for x in 'abc' for y in '123')
    ['a1']

> 2. Just require comprehensions to handle StopIteration. > > The main cost and benefit are the same as #1. > > However, it makes the language and implementation more complex, rather than simpler. > > Also, the effects of this less radical change (you can no longer pass StopIteration through a comprehension) seem like they might be harder to explain to people than the more radical one. I think that the current behaviour is harder to explain. > And, worse, the less radical effects would probably cause more subtle bugs. Which implies a few versions where passing StopIteration through a comprehension triggers a warning. 
I would be happy if it triggered a warning anyway. I can't imagine a reasonable situation where that isn't a bug.

> 7. Add a "magic" while clause that's basically until with the opposite sense.
>
> x = [value for value in iterable while pred(value)]
>
> This reads pretty nicely (at least in trivial comprehensions), it parallels takewhile and friends, and it matches a bunch of other languages (most of the languages where "when" means "if", "while" means this).
>
> But it has a completely different meaning from while statements, and in fact a barely-related one.
>
> In particular, it's obviously not this:
>
> x = []
>
> for value in iterable:
>     while pred(value):
>         x.append(value)
>
> What it actually means is:
>
> x = []
>
> for value in iterable:
>     if pred(value):
>         x.append(value)
>     else:
>         break
>
> Imagine trying to teach that to a novice.

I can definitely imagine teaching it to a novice. I have taught Python to groups of students who are entirely new to programming and also to groups with prior experience of other languages. I would not teach list comprehensions by unrolling them unless it was a more advanced Python programming course. To explain the list comprehension with while clauses I imagine having the following conversation and interactive session:

''' Okay so a list comprehension is a way of making a new list out of an existing list. Let's say we have a list called numbers_list like

>>> numbers_list = [1,2,3,4,5,4,3,2,1]
>>> numbers_list
[1, 2, 3, 4, 5, 4, 3, 2, 1]

Now we want to create a new list called squares_list containing the square of each of the numbers in numbers_list. We can do this very easily with a list comprehension and it looks like

>>> squares_list = [n ** 2 for n in numbers_list]
>>> squares_list
[1, 4, 9, 16, 25, 16, 9, 4, 1]

The list comprehension loops through all the numbers in numbers_list and, calling the current number n, computes n squared (n ** 2). 
As it does this it puts all the n squared numbers into a new list in the same order. We can also add an "if clause" to choose which elements from numbers_list we will use to make the new list. To make a list that is the square of all the numbers from numbers_list that are less than 4 we can do

>>> [n ** 2 for n in numbers_list if n < 4]
[1, 4, 9, 9, 4, 1]

Now the comprehension includes n ** 2 in the new list only if n < 4; otherwise n is ignored and the comprehension moves on to the next number from numbers_list. Also if we want the list comprehension to stop looping over the numbers in numbers_list after, for example, seeing a particular number we can use a "while clause" instead of an "if clause". If we want the comprehension to read numbers from numbers_list only while all of the numbers seen are less than 4 then we could do

>>> [n ** 2 for n in numbers_list while n < 4]
[1, 4, 9]

In this case what happens is that as soon as the comprehension finds the number 4 from numbers_list the while condition isn't true any more so it stops reading numbers from numbers_list. This means it doesn't find the other numbers that are also less than 4 at the end of numbers_list (unlike the if clause). '''

> 8. Allow else break in comp_if clauses.
>
> x = [value for value in iterable if pred(value) else break]
>
> This one is pretty easy to define rigorously, since it maps to exactly what the while attempt maps to with a slight change to the existing rules.
>
> But to me, it makes the code a confusing mess. I'm immediately reading "iterable if pred(value) else break", and that's wrong.

You wouldn't have that confusion with "else return". Oscar _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shane at umbrellacode.com Sat Jun 29 01:38:44 2013 From: shane at umbrellacode.com (Shane Green) Date: Fri, 28 Jun 2013 16:38:44 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> Message-ID: This closely ties to the discussion with subject that starts with "is this PEP-able? [...]" What about [x until condition for x in l ...] or [x for x in l until condition] Condition checked with output values only On Jun 28, 2013, at 3:28 PM, Jan Kaliszewski wrote: > 27.06.2013 13:49, Andrew Barnert napisał: > [...] >> If someone wants this: >> >> x = [value for value in iterable if value > 0 while value < 10] >> >> ... and rejects this: >> >> x = [value for value in takewhile(lambda x: x < 10, iterable) if value > 0] >> >> ... I don't think they'd be happier with this: >> >> @in x = [value for value in takewhile(under10, iterable) if value > 0] >> def under10(value): >> return value < 10 > [...] >> In fact, I think what people really want is the opposite: to write an >> expression, not a function.
That's a big part of the appeal of using >> comprehensions over map and filter: if you don't already have a >> ready-made function, no problem (and no lambdas or partials), just use >> a comprehension and write the expression in-place. People want a >> similar answer to make takewhile at least as unnecessary as map and >> filter. > > +1! > > Maybe it could be achieved with a new separate generator expression > syntax, being orthogonal to the existing generator-expr/comprehension > syntaxes? > > Maybe such as: > > (<expr> from <iterable> while <condition>) > > E.g.: > > g = (line from myfile while line.strip()) > > ...which would be equivalent to: > > def _temp(_myfile): > iterator = iter(_myfile) > line = next(iterator) > while line.strip(): > yield line > line = next(iterator) > g = _temp(myfile) > > Unlike the existing generator-expr/comprehension syntaxes this > syntax is focused on predicate-based iteration stopping (like > itertools.takewhile) rather than on any processing of the > iterated values. > > Obviously, it could be combined with the existing syntaxes, e.g.: > > processed = [2 * x for x in (x from seq while x < 100)] > > ...or probably better (a matter of taste): > > items = (x from seq while x < 100) > processed = [2 * x for x in items] > > > Cheers. > *j > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Jun 29 01:50:51 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 28 Jun 2013 19:50:51 -0400 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able?
for X in ListY while conditionZ:] In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> Message-ID: On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote: > .. > [x until condition for x in l ...] or > [x for x in l until condition] > Just to throw in one more variation: [expr for item in iterable break if condition] (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b") -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Sat Jun 29 03:50:42 2013 From: shane at umbrellacode.com (Shane Green) Date: Fri, 28 Jun 2013 18:50:42 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able?
for X in ListY while conditionZ:] In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> Message-ID: <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> Yes, but it only works for generator expressions and not comprehensions. My opinion of that workaround is that it's also a step backward in terms of readability. I suspect if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO. On Jun 28, 2013, at 6:38 PM, Andrew Carter wrote: > Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do > > def stop(): > raise StopIteration > list(i for i in range(100) if i < 50 or stop()) > > it seems to me that this would provide syntax that doesn't require lambdas. > > > > > > > On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky wrote: > > > > On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote: > .. > [x until condition for x in l ...]
or > [x for x in l until condition] > > Just to throw in one more variation: > > [expr for item in iterable break if condition] > > (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b") > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Sat Jun 29 03:54:27 2013 From: shane at umbrellacode.com (Shane Green) Date: Fri, 28 Jun 2013 18:54:27 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> Message-ID: <9F9ADA6E-46C9-4F92-91E7-EED89E9F4575@umbrellacode.com> Those things said, it is a workaround that works and was the culmination of a long discussion, so you're absolutely right that it belongs in, and maybe finishes, this discussion. On Jun 28, 2013, at 6:50 PM, Shane Green wrote: > Yes, but it only works for generator expressions and not comprehensions.
My opinion of that workaround is that it's also a step backward in terms of readability. I suspect > > if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO. > > > > > > > On Jun 28, 2013, at 6:38 PM, Andrew Carter wrote: > >> Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do >> >> def stop(): >> raise StopIteration >> list(i for i in range(100) if i < 50 or stop()) >> >> it seems to me that this would provide syntax that doesn't require lambdas. >> >> >> >> >> >> >> On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky wrote: >> >> >> >> On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote: >> .. >> [x until condition for x in l ...] or >> [x for x in l until condition] >> >> Just to throw in one more variation: >> >> [expr for item in iterable break if condition] >> >> (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b") >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jun 29 04:50:14 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 28 Jun 2013 19:50:14 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able?
for X in ListY while conditionZ:] In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> Message-ID: <6507AA45-09C5-4A61-B071-70DE827B4CFC@yahoo.com> On Jun 28, 2013, at 16:38, Shane Green wrote: > This closely ties to the discussion with subject that starts with "is this PEP-able? [...]" > > What about > [x until condition for x in l ...] or > [x for x in l until condition] This one is already on the list of possibilities. Which probably implies that Nick Coghlan was right: it's time for a draft PEP so people have something to refer to instead of trying to follow multiple ridiculously long threads (and, as he implied, so the core devs have a way to close the discussion with a final "rejected" decision if they want to). I'll work on it this weekend. (I'll also probably write up #1 and #2 as a separate draft and just reference it, and try to better organize the other choices as I described in an earlier message.) Meanwhile, the advantage of "until" is that it can still map directly to a nestable statement, meaning the semantics of comps and genexps doesn't have to change at all. The disadvantage is that it seems like a statement that would almost never be useful outside of comprehensions. So, either we have to add a new flow control statement nobody will use, or add an "as-if" statement that's only accessible as a comp clause.
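None of the proposed until/while spellings in this thread are valid Python syntax; the truncation they describe can, however, be written today with itertools.takewhile. A minimal sketch, with an invented sample list and cutoff, follows:

```python
from itertools import takewhile

# Hypothetical proposal: [x for x in l until x > 7]
# i.e. stop producing values at the first x for which x > 7 holds.
l = [2, 4, 6, 9, 3, 8]
result = list(takewhile(lambda x: x <= 7, l))
print(result)  # [2, 4, 6] -- the 3 and 8 after the 9 are never reached
```

Note the predicate is inverted: takewhile keeps items *while* the condition holds, so "until cond" becomes "takewhile(not cond)".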
> Condition checked with output values only > > > On Jun 28, 2013, at 3:28 PM, Jan Kaliszewski wrote: > >> 27.06.2013 13:49, Andrew Barnert napisał: >> [...] >>> If someone wants this: >>> >>> x = [value for value in iterable if value > 0 while value < 10] >>> >>> ... and rejects this: >>> >>> x = [value for value in takewhile(lambda x: x < 10, iterable) if value > 0] >>> >>> ... I don't think they'd be happier with this: >>> >>> @in x = [value for value in takewhile(under10, iterable) if value > 0] >>> def under10(value): >>> return value < 10 >> [...] >>> In fact, I think what people really want is the opposite: to write an >>> expression, not a function. That's a big part of the appeal of using >>> comprehensions over map and filter: if you don't already have a >>> ready-made function, no problem (and no lambdas or partials), just use >>> a comprehension and write the expression in-place. People want a >>> similar answer to make takewhile at least as unnecessary as map and >>> filter. >> >> +1! >> >> Maybe it could be achieved with a new separate generator expression >> syntax, being orthogonal to the existing generator-expr/comprehension >> syntaxes? >> >> Maybe such as: >> >> (<expr> from <iterable> while <condition>) >> >> E.g.: >> >> g = (line from myfile while line.strip()) >> >> ...which would be equivalent to: >> >> def _temp(_myfile): >> iterator = iter(_myfile) >> line = next(iterator) >> while line.strip(): >> yield line >> line = next(iterator) >> g = _temp(myfile) >> >> Unlike the existing generator-expr/comprehension syntaxes this >> syntax is focused on predicate-based iteration stopping (like >> itertools.takewhile) rather than on any processing of the >> iterated values. >> >> Obviously, it could be combined with the existing syntaxes, e.g.: >> >> processed = [2 * x for x in (x from seq while x < 100)] >> >> ...or probably better (a matter of taste): >> >> items = (x from seq while x < 100) >> processed = [2 * x for x in items] >> >> >> Cheers.
>> *j >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jun 29 04:59:35 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 28 Jun 2013 19:59:35 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> Message-ID: On Jun 28, 2013, at 16:50, Alexander Belopolsky wrote: > > > > On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote: >> .. >> [x until condition for x in l ...] or >> [x for x in l until condition] > > Just to throw in one more variation: > > [expr for item in iterable break if condition] > > (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b") This is pretty much the same as the single-word breakif or ifbreak ideas, but has the advantage of not adding a new keyword.
I'm not sure, without looking more carefully, whether the grammar could be ambiguous to the parser. But to a person I think it could be. We've already got two ways an "if" can appear in a comp--an if clause, or a ternary expression. Adding a third seems like it might make it harder to get the meaning just by scanning. As a side note, I think it would really help if we came up with a couple of paradigm examples instead of using content-free toy examples. Maybe: [line for line in f break if not line.strip() if not line.startswith("#")] In other words "all non-comment lines up to the first blank line". -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jun 29 05:16:57 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 28 Jun 2013 20:16:57 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> Message-ID: <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> On Jun 28, 2013, at 18:50, Shane Green wrote: > Yes, but it only works for generator expressions and not comprehensions.
This is the point of options #1 and #2: make StopIteration work in comps either (1) by redefining comprehensions in terms of genexps or (2) by fiat. After some research, it turns out that these are equivalent. Replacing any [comprehension] with list(comprehension) is guaranteed by the language (and the CPython implementation) to give you exactly the same value unless (a) something in the comp raises StopIteration, or (b) something in the comp relies on reflective properties (e.g., sys._getframe().f_code.co_flags) that aren't guaranteed anyway. So, other than being 4 characters more verbose and 40% slower, there's already an answer for comprehensions. And if either of those problems is unacceptable, a patch for #1 or #2 is actually pretty easy. I've got two different proofs of concept: one actually implements the comp as passing the genexp to list, the other just wraps everything after the BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... except StopIteration: pass. I need to add some error handling to the C code, and for #2 write sufficient tests that verify that it really does work exactly like #1, but I should have working patches to play with in a couple days. > My opinion of that workaround is that it's also a step backward in terms of readability. I suspect > > if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO. > > > > > > > On Jun 28, 2013, at 6:38 PM, Andrew Carter wrote: > >> Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do >> >> def stop(): >> raise StopIteration >> list(i for i in range(100) if i < 50 or stop()) >> >> it seems to me that this would provide syntax that doesn't require lambdas. >> >> >> >> >> >> >> On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky wrote: >>> >>> >>> >>> On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote: >>>> ..
>>>> [x until condition for x in l ...] or >>>> [x for x in l until condition] >>> >>> Just to throw in one more variation: >>> >>> [expr for item in iterable break if condition] >>> >>> (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b") >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jun 29 11:46:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Jun 2013 19:46:47 +1000 Subject: [Python-ideas] PEP 315: do-while In-Reply-To: <8D04222146CE2DB-1864-301AF@webmail-m103.sysops.aol.com> References: <9932233C-898C-479A-8855-1936766F4E84@langa.pl> <8D040C26ED329D8-1864-24B42@webmail-m103.sysops.aol.com> <8D040C5B1DD4275-1864-24ED6@webmail-m103.sysops.aol.com> <1372288657.52390.YahooMailNeo@web184705.mail.ne1.yahoo.com> <8D0415100D6245B-1864-288A7@webmail-m103.sysops.aol.com> <8D0415C835C7F63-1864-290C2@webmail-m103.sysops.aol.com> <87obaqyb49.fsf@uwakimon.sk.tsukuba.ac.jp> <8D04222146CE2DB-1864-301AF@webmail-m103.sysops.aol.com> Message-ID: On 29 June 2013 00:22, wrote: > The takewhile syntax is kind of god awful. The lambda thing always gives > me pause, though I now 'grok' the whole thing to be a conditional. > I don't think the issue is about added complexity to the comprehension > (as the added clause would be optional anyway) Note that this is a misunderstanding of what "added complexity" means in a language design context. More options almost always mean more complexity.
The only way adding more options can simplify things in a practical sense is when they shift complexity from user code to the interpreter implementation by extracting an existing common pattern and giving it dedicated syntax (such as comprehensions, generator expressions and "with" statements), or when they provide a corrected alternative to an existing tempting-but-wrong construct (such as "a if p else b" replacing the "p and a or b" hack and "yield from itr" replacing the coroutine incompatible "for x in itr: yield x"). In this case, the proposal is only tinkering at the edges - you can *always* handle it by creating a custom generator instead. All this proposal does is subtly adjust the point at which it becomes more attractive to write a custom generator function than it is to do the operation in line in the comprehension or generator expression. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jun 29 12:09:54 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Jun 2013 20:09:54 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: On 28 June 2013 18:20, Wolfgang Maier wrote: > Let me suggest one more solution although it requires a new keyword: > introduce > > *breakif* condition > > and define its translation as if condition: break . > You can now write > (x for x in iterable breakif x < 0) > and I don't see a way how that could possibly be misread by anyone. 
> Also it would translate unambiguously to the explicit: > > for x in iterable: > breakif x<0 # itself translating to if x<0: break > yield x > > It would work with genexps, comprehensions and explicit loops alike (with > very > little benefit for the latter, though maybe it increases readability even > there > by making it clear from the start of the line what the purpose of the > condition > test is). This (or, more accurately, a slight variant that doesn't need a new keyword) actually sounds quite attractive to me. My rationale for that ties into the just rejected PEP 315, which tried to find an improved "loop and a half" syntax for Python, as well as the ongoing confusion regarding the meaning of the "else" clause on while and for loops. Currently, Python's fully general loop syntax looks like this: while True: # Iteration setup if termination_condition: break # Remainder of iteration And the recommended idiom for a search loop looks like this: for x in data: if desired_value(x): break else: raise ValueError("Value not found in {!r:.100}".format(data)) Rather than adding a new keyword, we could simply expand the syntax for the existing break statement to be this: break [if <condition>] This would simplify the above two standard idioms to the following: while True: # Iteration setup break if termination_condition # Remainder of iteration for x in data: break if desired_value(x) else: raise ValueError("Value not found in {!r:.100}".format(data)) A "bare" break would then be equivalent to "break if True". The "else" clause on the loop could then be *explicitly* documented as associated with the "break if <condition>" form - the else only executes if the break clause is never true.
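For reference, here is a runnable rendering of the search-loop idiom described above, with the hypothetical "break if" spelling confined to a comment; the find_first helper and its sample data are invented for illustration:

```python
def find_first(data, pred):
    # Current search-loop idiom: for/else with a guarded break.
    for x in data:
        if pred(x):          # proposed spelling: break if pred(x)
            break
    else:
        # Runs only if the loop finished without ever hitting break.
        raise ValueError("Value not found in {!r:.100}".format(data))
    return x

print(find_first([3, 7, 12, 9], lambda x: x > 10))  # 12
```

The for/else coupling is the point being made here: the else block executes only when the (guarded) break never fired.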
(That also becomes the justification for only allowing this for break, and not for continue or return: those have no corresponding "else" clause) Once the break statement has been redefined this way, it *then* becomes reasonable to allow the following in comprehensions: data = [x for x in iterable break if x is None] As with other proposals, I would suggest limiting this truncating form to disallow combination with the filtering and nested loop forms (at least initially). The dual use of "if" would make the filtering combination quite hard to read, and the nested loop form would be quite ambiguous as to which loop was being broken. If we start with the syntax restricted, we can relax those restrictions later if we find them too limiting, while if we start off being permissive, backwards compatibility would prevent us from adding restrictions later. I'd be very keen to see this written up as a PEP - it's the first proposal that I feel actually *simplifies* the language in any way (mostly by doing something about those perplexing-to-many else clauses on for and while loops). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Sat Jun 29 13:06:24 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 29 Jun 2013 12:06:24 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: On 29 June 2013 11:09, Nick Coghlan wrote: [talking about "break if <condition>"] > I'd be very keen to see this written up as a PEP - it's the first > proposal that I feel actually *simplifies* the language in any way > (mostly by doing something about those perplexing-to-many else clauses > on for and while loops). > +1. I like this syntax, and the rationale behind it. Paul -------------- next part -------------- An HTML attachment was scrubbed...
URL: From shane at umbrellacode.com Sat Jun 29 14:10:38 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 29 Jun 2013 05:10:38 -0700 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> Message-ID: <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> Thanks Andrew. My knee jerk reaction was to strongly prefer option two, which sounds like--if I understood correctly, and I'm not sure I do--it keeps both comprehensions and expressions. Rereading your points again, I must admit I didn't see much to justify the knee jerk reaction. I do commonly use list comprehensions precisely *because* of the performance impact, and can think of a few places the 40% would be problematic. Was there a measurable performance difference with approach 2? On Jun 28, 2013, at 8:16 PM, Andrew Barnert wrote: > On Jun 28, 2013, at 18:50, Shane Green wrote: > >> Yes, but it only works for generator expressions and not comprehensions.
> > This is the point of options #1 and #2: make StopIteration work in comps either (1) by redefining comprehensions in terms of genexps or (2) by fiat. > > After some research, it turns out that these are equivalent. Replacing any [comprehension] with list(comprehension) is guaranteed by the language (and the CPython implementation) to give you exactly the same value unless (a) something in the comp raises StopIteration, or (b) something in the comp relies on reflective properties (e.g., sys._getframe().f_code.co_flags) that aren't guaranteed anyway. > > So, other than being 4 characters more verbose and 40% slower, there's already an answer for comprehensions. > > And if either of those problems is unacceptable, a patch for #1 or #2 is actually pretty easy. > > I've got two different proofs of concept: one actually implements the comp as passing the genexp to list, the other just wraps everything after the BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... except StopIteration: pass. I need to add some error handling to the C code, and for #2 write sufficient tests that verify that it really does work exactly like #1, but I should have working patches to play with in a couple days. > >> My opinion of that workaround is that it's also a step backward in terms of readability. I suspect. >> >> if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO. >> >> >> >> >> >> >> On Jun 28, 2013, at 6:38 PM, Andrew Carter wrote: >> >>> Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do >>> >>> def stop(): >>> raise StopIteration >>> list(i for i in range(100) if i < 50 or stop()) >>> >>> it seems to me that this would provide syntax that doesn't require lambdas.
>>> >>> >>> >>> >>> >>> On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky wrote: >>> >>> >>> >>> On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote: >>> .. >>> [x until condition for x in l ...] or >>> [x for x in l until condition] >>> >>> Just to throw in one more variation: >>> >>> [expr for item in iterable break if condition] >>> >>> (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b") >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Sat Jun 29 14:20:54 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 29 Jun 2013 05:20:54 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: On Jun 29, 2013, at 3:09 AM, Nick Coghlan wrote: > data = [x for x in iterable break if x is None] +1 I like the syntax and really like the reasoning and full story. On Jun 29, 2013, at 3:09 AM, Nick Coghlan wrote: > On 28 June 2013 18:20, Wolfgang Maier > wrote: >> Let me suggest one more solution although it requires a new keyword: >> introduce >> >> *breakif* condition >> >> and define its translation as if condition: break . >> You can now write >> (x for x in iterable breakif x < 0) >> and I don't see a way how that could possibly be misread by anyone.
>> Also it would translate unambiguously to the explicit: >> >> for x in iterable: >> breakif x<0 # itself translating to if x<0: break >> yield x >> >> It would work with genexps, comprehensions and explicit loops alike (with >> very >> little benefit for the latter, though maybe it increases readability even >> there >> by making it clear from the start of the line what the purpose of the >> condition >> test is). > > This (or, more accurately, a slight variant that doesn't need a new > keyword) actually sounds quite attractive to me. My rationale for that > ties into the just rejected PEP 315, which tried to find an improved > "loop and a half" syntax for Python, as well as the ongoing confusion > regarding the meaning of the "else" clause on while and for loops. > > Currently, Python's fully general loop syntax looks like this: > > while True: > # Iteration setup > if termination_condition: > break > # Remainder of iteration > > And the recommended idiom for a search loop looks like this: > > for x in data: > if desired_value(x): > break > else: > raise ValueError("Value not found in {!r:.100}".format(data)) > > Rather than adding a new keyword, we could simply expand the syntax > for the existing break statement to be this: > > break [if <condition>] > > This would simplify the above two standard idioms to the following: > > while True: > # Iteration setup > break if termination_condition > # Remainder of iteration > > for x in data: > break if desired_value(x) > else: > raise ValueError("Value not found in {!r:.100}".format(data)) > > A "bare" break would then be equivalent to "break if True". The "else" > clause on the loop could then be *explicitly* documented as associated > with the "break if <condition>" form - the else only executes if the break > clause is never true. 
(That also becomes the justification for only > allowing this for break, and not for continue or return: those have no > corresponding "else" clause) > > Once the break statement has been redefined this way, it *then* > becomes reasonable to allow the following in comprehensions: > > data = [x for x in iterable break if x is None] > > As with other proposals, I would suggest limiting this truncating form > to disallow combination with the filtering and nested loop forms (at > least initially). The dual use of "if" would make the filtering > combination quite hard to read, and the nested loop form would be > quite ambiguous as to which loop was being broken. If we start with > the syntax restricted, we can relax those restrictions later if we > find them too limiting, while if we start off being permissive, > backwards compatibility would prevent us from adding restrictions > later. > > I'd be very keen to see this written up as a PEP - it's the first > proposal that I feel actually *simplifies* the language in any way > (mostly by doing something about those perplexing-to-many else clauses > on for and while loops). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Sat Jun 29 17:46:11 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 29 Jun 2013 16:46:11 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: On 29 June 2013 11:09, Nick Coghlan wrote: > Rather than adding a new keyword, we could simply expand the syntax > for the existing break statement to be this: > > break [if <condition>] ... 
> Once the break statement has been redefined this way, it *then* > becomes reasonable to allow the following in comprehensions: > > data = [x for x in iterable break if x is None] Almost all of your proposal looks reasonable, but I personally find this quite hard to read; it should be written along the lines of (I'm not proposing this): x for x in iterable; break if x is None if one is to continue having syntax that is pseudo-correct English - a trait I am eager to keep. In summary, this is hard for me to read because there is no separation of the statements. Because I have no other substantial objections, I'm -0 on this. If you can find a way to "fix" that, I'll be, for all intents and purposes, neutral. From steve at pearwood.info Sat Jun 29 17:56:32 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 30 Jun 2013 01:56:32 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: <51CF03B0.8080508@pearwood.info> On 29/06/13 20:09, Nick Coghlan wrote: > On 28 June 2013 18:20, Wolfgang Maier > wrote: >> for x in iterable: >> breakif x<0 # itself translating to if x<0: break >> yield x [...] > This (or, more accurately, a slight variant that doesn't need a new > keyword) actually sounds quite attractive to me. My rationale for that > ties into the just rejected PEP 315, which tried to find an improved > "loop and a half" syntax for Python, as well as the ongoing confusion > regarding the meaning of the "else" clause on while and for loops. [...] > Rather than adding a new keyword, we could simply expand the syntax > for the existing break statement to be this: > > break [if <condition>] > > This would simplify the above two standard idioms to the following: > > while True: > # Iteration setup > break if termination_condition > # Remainder of iteration
It just saves a line and, trivially, one indent: while True: if condition: break ... And it doesn't even save that if you write it like this: while True: if condition: break ... In fact, I would argue the opposite, it's not *simpler* it is *more complex* because it is a special case for the if keyword: break if condition # allowed continue if condition # maybe allowed? return 'spam' if condition # probably disallowed pass if condition # what's the point? raise Exception if condition # probably disallowed x += 1 if condition # almost certainly disallowed to say nothing of: break if condition else continue Given this suggestion, now people have to learn a third case for "if": - "if condition" starts a new block and must be followed by a colon; - unless it follows an expression, in which case it is the ternary operator and must be followed by "else expression"; - unless it follows a break (or continue? return? raise? ...) in which case it must not be followed by "else". > for x in data: > break if desired_value(x) > else: > raise ValueError("Value not found in {:100!r}".format(data)) > > A "bare" break would then be equivalent to "break if True". The "else" > clause on the loop could then be *explicitly* documented as associated > with the "break if " form - the else only executes if the break > clause is never true. (That also becomes the justification for only > allowing this for break, and not for continue or return: those have no > corresponding "else" clause) It is certainly not correct that the else of while...else and for...else is necessarily associated with the (implied) "if" of "break if condition". It's currently legal to write a for...else loop with no break. Pointless, but legal, so it must remain legal, and you still have an else without an if. 
Likewise we can do things like this: for x in data: break else: print("data must be empty") for x in data: if condition: return x else: print("condition was never true") I think that it is simply the case that for...else and while...else were misnamed. Great concept, but the keyword doesn't make sense. It isn't an *else* clause in the ordinary English sense of the word, but a *then* clause, and should have been spelled "for...then". But we're stuck with it until Python 4000. Trying to retcon the "else" as being associated with an implied "if" is simply misguided. > Once the break statement has been redefined this way, it *then* > becomes reasonable to allow the following in comprehensions: > > data = [x for x in iterable break if x is None] Which reads wrongly for something which is intended to be a single expression. Current comprehensions read like an English expression: [this for x in iterable] "Do this for each x in iterable." [this for x in iterable if condition] "Do this for each x in iterable if condition is true." are both reasonable English-like expressions. Even the proposed (and rejected) form: [this for x in iterable while condition] "Do this for each x in iterable while condition remains true." is an English-like expression. But your proposal isn't: [this for x in iterable break if x is None] "Do this for each x in iterable break if condition is true." The "break if..." clause reads like a second statement. Now I realise that Python is not English and doesn't match English grammar, but still, this proposal doesn't flow off the tongue, it trips and stumbles because it feels like two statements merely shoe-horned into one. Python is, for the most part, an astonishingly beautiful language to read (at least for English speakers) and I'm afraid that "break if condition" in a comprehension is not beautiful. 
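Steven's reading of for...else — the else suite runs exactly when the loop finishes without hitting break, including the degenerate no-break cases he lists — can be checked with a short script (a sketch; the data values are arbitrary):

```python
log = []

for x in []:        # legal, if pointless: a for...else with no break
    break
else:
    log.append("empty loop: else ran")

for x in [1, 2, 3]:
    if x == 2:
        break       # break is taken, so the else suite is skipped
else:
    log.append("broken loop: else ran")

for x in [1, 2, 3]:
    if x == 99:
        break       # never taken, so the else suite runs
else:
    log.append("unbroken loop: else ran")

assert log == ["empty loop: else ran", "unbroken loop: else ran"]
```

This is the behaviour the "for...then" reading describes: the else suite is just the code that runs after the loop exhausts its iterable.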
-- Steven From jimjhb at aol.com Sat Jun 29 18:18:04 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Sat, 29 Jun 2013 12:18:04 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: <8D042FB5DD0DAA9-113C-35640@webmail-m160.sysops.aol.com> Cool. So it looks like it IS PEP-able. :) I was just thinking that, as keywords go, "break" is pretty unencumbered. It's really just a specific "goto outside of loop". No syntax tied with it, and for all practical purposes, relies on some conditional for its use. So it's not a bad candidate for augmentation. Not to be a pill, but following the same logic, would this mean: >>foo = [x for x in range(100) continue if ((x > 5) and (x<95))] >>print foo >>[0, 1, 2, 3, 4, 5, 95, 96, 97, 98, 99] ?? (feel free to just slap me regarding this....) - Jim Nick wrote: >On 28 June 2013 18:20, Wolfgang Maier wrote: > Let me suggest one more solution although it requires a new keyword: > introduce > > *breakif* condition > > and define its translation as if condition: break . > You can now write > (x for x in iterable breakif x < 0) > and I don't see a way how that could possibly be misread by anyone. > Also it would translate unambiguously to the explicit: > > for x in iterable: > breakif x<0 # itself translating to if x<0: break > yield x > > It would work with genexps, comprehensions and explicit loops alike (with > very > little benefit for the latter, though maybe it increases readability even > there > by making it clear from the start of the line what the purpose of the > condition > test is). This (or, more accurately, a slight variant that doesn't need a new keyword) actually sounds quite attractive to me. 
My rationale for that ties into the just rejected PEP 315, which tried to find an improved "loop and a half" syntax for Python, as well as the ongoing confusion regarding the meaning of the "else" clause on while and for loops. Currently, Python's fully general loop syntax looks like this: while True: # Iteration setup if termination_condition: break # Remainder of iteration And the recommended idiom for a search loop looks like this: for x in data: if desired_value(x): break else: raise ValueError("Value not found in {!r:.100}".format(data)) Rather than adding a new keyword, we could simply expand the syntax for the existing break statement to be this: break [if <condition>] This would simplify the above two standard idioms to the following: while True: # Iteration setup break if termination_condition # Remainder of iteration for x in data: break if desired_value(x) else: raise ValueError("Value not found in {!r:.100}".format(data)) A "bare" break would then be equivalent to "break if True". The "else" clause on the loop could then be *explicitly* documented as associated with the "break if <condition>" form - the else only executes if the break clause is never true. (That also becomes the justification for only allowing this for break, and not for continue or return: those have no corresponding "else" clause) Once the break statement has been redefined this way, it *then* becomes reasonable to allow the following in comprehensions: data = [x for x in iterable break if x is None] As with other proposals, I would suggest limiting this truncating form to disallow combination with the filtering and nested loop forms (at least initially). The dual use of "if" would make the filtering combination quite hard to read, and the nested loop form would be quite ambiguous as to which loop was being broken. 
If we start with the syntax restricted, we can relax those restrictions later if we find them too limiting, while if we start off being permissive, backwards compatibility would prevent us from adding restrictions later. >I'd be very keen to see this written up as a PEP - it's the first proposal that I feel actually *simplifies* the language in any way (mostly by doing something about those perplexing-to-many else clauses on for and while loops). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Sat Jun 29 18:20:30 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 29 Jun 2013 17:20:30 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <77399634-871A-4C88-AFFA-962C5464B09E@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: On 27 June 2013 13:55, Nick Coghlan wrote: > As far as this whole thread goes, though, we're still in the situation > where the simplest mechanisms Python currently has to extract a submap > from a mapping are a dict comprehension and operator.itemgetter. 
These > proposed changes to function calling syntax then target a very niche > case of that broader problem, where the mapping is specifically the > current local variables and the destination for the submap is a > function call with like named parameters. This is *not* the calibre of > problem that prompts us to make Python harder to learn by adding new > syntax. Sorry for the necromancy; but I just want to share some inspiration I had recently on this: We *do* have a way of extracting a submap from a dict. We *already* have a way of writing foo = foo across scopes, as we want to do here. This problem *has* been solved before, and it looks like: from module import these, are, in, the, submap I haven't been able to find a really good syntax from this, but something like: 1) foo(a, b, **(key1, key2 from locals())) 2) {**(key1, key2 from locals())} 3) import key1, key2 from {"key1": 123, "key2": 345} etc. And, à mon avis, it's a hell of a lot better than previous proposals for (1) and (2) from this thread, and (3) from other threads. Again, not a proposal per se but a revelation. From shane at umbrellacode.com Sat Jun 29 18:33:00 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 29 Jun 2013 09:33:00 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: I see your point. I'm wondering if there's a formatting convention or something that might suffice. Tried a couple but haven't come up with anything neat. >> data = [x for x in iterable >> break if x is None] >> data = [x for x in iterable break if x is None] I don't necessarily think this makes the options any more difficult to parse than they were previously; advanced comprehensions can get a bit unwieldy already. With syntax highlighting a color-coded 'break' keyword separates the generation from the termination. 
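As several posts in the thread note, the proposed [x for x in iterable break if x is None] already has an explicit spelling today as a generator function. A sketch of that translation (the helper name until_none and the sample data are made up for illustration):

```python
def until_none(iterable):
    # Explicit generator translation of the proposed
    #     [x for x in iterable break if x is None]
    for x in iterable:
        if x is None:   # the "break if x is None" clause
            break
        yield x

data = list(until_none(["a", "b", None, "c"]))
assert data == ["a", "b"]
```

The debate is essentially about whether this five-line helper deserves a one-line comprehension spelling.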
On Jun 29, 2013, at 8:46 AM, Joshua Landau wrote: > On 29 June 2013 11:09, Nick Coghlan wrote: >> Rather than adding a new keyword, we could simply expand the syntax >> for the existing break statement to be this: >> >> break [if ] > ... >> Once the break statement has been redefined this way, it *then* >> becomes reasonable to allow the following in comprehensions: >> >> data = [x for x in iterable break if x is None] > > Almost all of your proposal looks reasonable, but I personally find > this quite hard to read; it should be written along the lines of (I'm > not proposing this): > > x for x in iterable; break if x is None > > if one is to continue having syntax that is pseudo-correct English - a > trait I am eager to to keep. > > In summary, this is hard for me to read because there is no separation > of the statements. > > > Because I have not other substantial objections, I'm -0 on this. If > you can find a way to "fix" that, I'll be, for all intents and > purposes, neutral. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Sat Jun 29 18:46:03 2013 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 29 Jun 2013 11:46:03 -0500 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? 
for X in ListY while conditionZ:] In-Reply-To: <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> Message-ID: On 06/28/2013 10:16 PM, Andrew Barnert wrote: > On Jun 28, 2013, at 18:50, Shane Green > > wrote: > >> Yes, but it only works for generator expressions and not comprehensions. > > This is the point of options #1 and 2: make StopIteration work in comps > either (1) by redefining comprehensions in terms of genexps or (2) by fiat. > > After some research, it turns out that these are equivalent. Replacing any > [comprehension] with list(comprehension) is guaranteed by the language (and > the CPython implementation) to give you exactly the same value unless (a) > something in the comp raises StopIteration, or (b) something in the comp > relies on reflective properties (e.g., sys._getframe().f_code.co_flags) > that aren't guaranteed anyway. > > So, other than being 4 characters more verbose and 40% slower, there's > already an answer for comprehensions. Right. Any solution also must not slow down the existing simpler cases. For those who haven't looked at the C code yet, there is this comment: /* List and set comprehensions and generator expressions work by creating a nested function to perform the actual iteration. This means that the iteration variables don't leak into the current scope. The defined function is called immediately following its definition, with the result of that call being the result of the expression. 
The LC/SC version returns the populated container, while the GE version is flagged in symtable.c as a generator, so it returns the generator object when the function is called. This code *knows* that the loop cannot contain break, continue, or return, so it cheats and skips the SETUP_LOOP/POP_BLOCK steps used in normal loops. Possible cleanups: - iterate over the generator sequence instead of using recursion */ I don't know how much the SETUP_LOOP/POP_BLOCK costs in time. It probably only makes a big difference in nested cases. > And if either of those problems is unacceptable, a patch for #1 or #2 is > actually pretty easy. > > I've got two different proofs of concept: one actually implements the comp > as passing the genexp to list, the other just wraps everything after the > BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... > except StopIteration: pass. I need to add some error handling to the C > code, and for #2 write sufficient tests that verify that it really does > work exactly like #1, but I should have working patches to play with in a > couple days. > >> My opinion of that workaround is that it's also a step backward in terms >> of readability, I suspect. >> >> if i < 50 else stop() would probably also work, since it throws an >> exception. That's better, IMHO. Once a function is added that is called on every iteration, then a regular for loop with a break (without the function call) will run quicker. I think what matters is that it's fast and is easy to explain. The first two examples here are the existing variations. The third case would be the added break case. (The exact spelling of the expression may be different.) 
# [x for x in seq] for x in iter: append x # LIST_APPEND byte code, not a method call # [x for x in seq if expr] for x in iter: if expr: append x # [x for x in seq if expr break] for x in iter: if expr: break append x The generator comps have YIELD_VALUE in place of LIST_APPEND. This last case is the simplest variation for an early exit. It only differs from the second case by having a BREAK_LOOP after the POP_JUMP_IF_FALSE instruction, along with SETUP_LOOP and POP_BLOCK before and after the loops. I am curious about how many places in the library adding break to these would make a difference. If there isn't any, or only a few, then it's probably not needed. But then again, maybe it's worth a good look before dismissing it. Cheers, Ron (* dis.dis seems to be adding some extra unneeded lines, a second, dead JUMP_ABSOLUTE to the top of the loop for case 2 above, and a "JUMP_FORWARD 0" in the third case. Seems odd, but these don't affect what we are talking about here.) From steve at pearwood.info Sat Jun 29 18:56:02 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 30 Jun 2013 02:56:02 +1000 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> Message-ID: <51CF11A2.2070804@pearwood.info> On 30/06/13 02:20, Joshua Landau wrote: > We *do* have a way of extracting a submap from a dict. Er, not really. 
Here's a dict: {} What you talk about is extracting named attributes from a module or package namespace. That modules happen to use a dict as the storage mechanism is neither here nor there. I can do this: class NoDict(object): __slots__ = ['eggs'] fake_module = NoDict() fake_module.eggs = 42 assert not hasattr(fake_module, '__dict__') sys.modules['spam'] = fake_module from spam import eggs and it all just works. But I'm no closer to extracting a subdict from a dict. > We *already* have a way of writing > > foo = foo > > across scopes, as we want to do here. We sure do. And that's by just explicitly writing foo=foo. This is not a problem that needs solving. I'm kind of astonished that so many words have been spilled on a non-problem just to save a few characters for such a special case. > This problem *has* been solved before, and it looks like: > > from module import these, are, in, the, submap As I show above, this is about named attribute access, not dicts. And as you admit below, it doesn't lead to good syntax for extracting a subdict: > I haven't been able to find a really good syntax from this, but something like: > > 1) foo(a, b, **(key1, key2 from locals())) > > 2) {**(key1, key2 from locals())} > > 3) import key1, key2 from {"key1": 123, "key2": 345} > > etc. > > And, à mon avis, it's a hell of a lot better than previous proposals > for (1) and (2) from this thread, and (3) from other threads. > > Again, not a proposal per se but a revelation. 
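For concreteness, the two existing mechanisms Nick mentioned earlier in the thread — a dict comprehension and operator.itemgetter — extract a submap like this (a minimal sketch with made-up data):

```python
from operator import itemgetter

d = {"key1": 123, "key2": 345, "other": 0}
keys = ("key1", "key2")

submap = {k: d[k] for k in keys}   # dict comprehension: keeps keys and values
values = itemgetter(*keys)(d)      # itemgetter: returns just the values

assert submap == {"key1": 123, "key2": 345}
assert values == (123, 345)
```

Note the asymmetry: only the comprehension form yields an actual mapping suitable for ** unpacking into a call.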
-- Steven From python at mrabarnett.plus.com Sat Jun 29 19:19:10 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 29 Jun 2013 18:19:10 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CF11A2.2070804@pearwood.info> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> <51CF11A2.2070804@pearwood.info> Message-ID: <51CF170E.8020409@mrabarnett.plus.com> On 29/06/2013 17:56, Steven D'Aprano wrote: > On 30/06/13 02:20, Joshua Landau wrote: > >> We*do* have a way of extracting a submap from a dict. > > Er, not really. Here's a dict: {} > > What you talk about is extracting named attributes from a module or package namespace. That modules happen to use a dict as the storage mechanism is neither here nor there. I can do this: > > class NoDict(object): > __slots__ = ['eggs'] > > fake_module = NoDict() > fake_module.eggs = 42 > assert not hasattr(fake_module, '__dict__') > > sys.modules['spam'] = fake_module > > from spam import eggs > > > and it all just works. But I'm no closer to extracting a subdict from a dict. > > >> We*already* have a way of writing >> >> foo = foo >> >> across scopes, as we want to do here. > > We sure do. And that's by just explicitly writing foo=foo. This is not a problem that needs solving. I'm kind of astonished that so many words have been spilled on a non-problem just to save a few characters for such a special case. 
> > >> This problem *has* been solved before, and it looks like: >> >> from module import these, are, in, the, submap > > As I show above, this is about named attribute access, not dicts. And as you admit below, it doesn't lead to good syntax for extracting a subdict: > > >> I haven't been able to find a really good syntax from this, but something like: >> >> 1) foo(a, b, **(key1, key2 from locals())) >> >> 2) {**(key1, key2 from locals())} >> >> 3) import key1, key2 from {"key1": 123, "key2": 345} >> >> etc. >> You could just add a method to dict: foo(a, b, **locals().subdict(["key1", "key2"])) >> And, à mon avis, it's a hell of a lot better than previous proposals >> for (1) and (2) from this thread, and (3) from other threads. >> >> Again, not a proposal per se but a revelation. > > > From shane at umbrellacode.com Sat Jun 29 19:32:16 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 29 Jun 2013 10:32:16 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51CF03B0.8080508@pearwood.info> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <7B05EB29-A2BC-4F5F-9929-ECE3C23149D2@umbrellacode.com> Reminds me of a cartoon in The New Yorker: a guy is looking over the shoulder of another who is dressed futuristically, incredulously asking the futuristic guy, "you came back in time just so you could hit 'reply' instead of 'reply all'?" Made me realize my habit of accidentally doing the opposite is far better than being in the habit of hitting reply-all :-) On Jun 29, 2013, at 8:56 AM, Steven D'Aprano wrote: > It doesn't really *simplify* the standard idiom though. It just saves a line and, trivially, one indent: > > while True: > if condition: > break > ... > > And it doesn't even save that if you write it like this: > > while True: > if condition: break > ... This argument has come up before and I don't really understand or agree with it. 
It seems to: - gloss over the fact that this allows you to do these things in list comprehensions; - argue against the existence of list comprehensions at all, rather than this extension; - and leaves out everything else a list comprehension does. Without this feature list comprehensions aren't used to describe while loops, period; so you shouldn't subtract the standard list comprehension contributions from the benefits realized by this change. [line.strip() for line in lines break if not line] compared to: messages = [] while True: line = lines.pop(0) if not line: break messages.append(line.strip()) Of course it would be easier using a for loop, which you'd then be tempted to replace with a comprehension... -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Sat Jun 29 21:48:28 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 29 Jun 2013 20:48:28 +0100 Subject: [Python-ideas] Short form for keyword arguments and dicts In-Reply-To: <51CF11A2.2070804@pearwood.info> References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> <51CF11A2.2070804@pearwood.info> Message-ID: On 29 June 2013 17:56, Steven D'Aprano wrote: > On 30/06/13 02:20, Joshua Landau wrote: >> We *do* have a way of extracting a submap from a dict. > Er, not really. Here's a dict: {} > What you talk about is extracting named attributes from a module or package > namespace. 
That modules happen to use a dict as the storage mechanism is > neither here nor there. I can do this: > > class NoDict(object): > __slots__ = ['eggs'] > > fake_module = NoDict() > fake_module.eggs = 42 > assert not hasattr(fake_module, '__dict__') > > sys.modules['spam'] = fake_module > > from spam import eggs > > > and it all just works. But I'm no closer to extracting a subdict from a > dict. I realise you used "slots", but it's still a dictionary. Well, OK, it's merely a "mapping" but I thought we duck-typed. So that's what you just did. You took a part of a namespace and added it to another namespace. That's what we wanted to do with: function_call(pass "arg" from local namespace to function's namespace) The fact that Python's namespaces are dictionaries isn't irrelevant. They're the same thing, and I think it's foolish to claim that's purely by implementation. Dictionaries are mappings are namespaces; they provide the *same* functionality through the same interface with slightly different semi-arbitrary restrictions and APIs. Just imagine my post but using "namespace" instead of "dictionary" and you might get the point. >> We*already* have a way of writing >> >> >> foo = foo >> >> across scopes, as we want to do here. > > > We sure do. And that's by just explicitly writing foo=foo. This is not a > problem that needs solving. I'm kind of astonished that so many words have > been spilled on a non-problem just to save a few characters for such a > special case. As has been shown, this is actually a problem that quite a lot of people care about. There are a lot of threads (by far not just this one) that have understood that moving an argument from one namespace to another is something that feels like it should be able to look nicer. Surely you'd understand the problem if we had to write: module = import("module") arg = module.arg foo = module.foo blah = module.blah ... which, as I was saying, is *exactly the same problem* -- except that we've already solved it. 
That was what I'd realised and what I was pointing out. The rest of the post is along these lines, which I've covered, and about the syntax, which is beside the point.

From joshua.landau.ws at gmail.com  Sat Jun 29 21:51:37 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 29 Jun 2013 20:51:37 +0100
Subject: [Python-ideas] Short form for keyword arguments and dicts
In-Reply-To: <51CF170E.8020409@mrabarnett.plus.com>
References: <15EECE07-B3C9-4218-937F-E0BDAB3DBF34@yahoo.com> <51C6E878.1020003@pearwood.info> <51C78619.3010405@canterbury.ac.nz> <20130624162356.GJ22763@mcnabbs.org> <87hagn1jq4.fsf@uwakimon.sk.tsukuba.ac.jp> <20130624175817.GL22763@mcnabbs.org> <20130624224127.GA10419@ando> <51C8E702.6000509@canterbury.ac.nz> <20130625035710.GD10635@ando> <51C93D1D.1080103@canterbury.ac.nz> <874ncm1vdg.fsf@uwakimon.sk.tsukuba.ac.jp> <51C951A7.9050707@canterbury.ac.nz> <51CA9CE6.1080402@canterbury.ac.nz> <51CAA070.1040306@stoneleaf.us> <51CB0795.2000508@stoneleaf.us> <51CB928B.2030706@stoneleaf.us> <51CF11A2.2070804@pearwood.info> <51CF170E.8020409@mrabarnett.plus.com>
Message-ID: 
On 29 June 2013 18:19, MRAB wrote:
> On 30/06/13 02:20, Joshua Landau wrote:
>> I haven't been able to find a really good syntax from this, but something
>> like:
>>
>> 1) foo(a, b, **(key1, key2 from locals()))
>>
>> 2) {**(key1, key2 from locals())}
>>
>> 3) import key1, key2 from {"key1": 123, "key2": 345}
>>
>> etc.
>>
> You could just add a method to dict:
>
> foo(a, b, **locals().subdict(["key1", "key2"]))

Only for (1), and (2) once http://bugs.python.org/issue2292 is in. Again, though, I don't really care about the syntax as it's not the point I was making.

From ncoghlan at gmail.com  Sun Jun 30 01:51:43 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 30 Jun 2013 09:51:43 +1000
Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ:
In-Reply-To: <51CF03B0.8080508@pearwood.info>
References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info>
Message-ID: 
On 30 June 2013 01:56, Steven D'Aprano wrote:
> In fact, I would argue the opposite, it's not *simpler* it is *more complex*
> because it is a special case for the if keyword:
>
> break if condition  # allowed
> continue if condition  # maybe allowed?
> return 'spam' if condition  # probably disallowed
> pass if condition  # what's the point?
> raise Exception if condition  # probably disallowed
> x += 1 if condition  # almost certainly disallowed

People already have to learn "for/else" and "while/else". Adding "break if" can *only* be justified on the grounds of pairing it up with those two existing else clauses to make appropriate if/else pairs, as "for/break if/else" and "while/break if/else" should actually be easier to learn than the status quo.

However, note that I said "I would love to see this idea written up as a PEP", not "I would love to see this idea implemented as part of the language": I'm undecided on that point as yet. It at least has the *potential* to make an existing aspect of the language easier to learn (unlike every other suggestion in any of these threads). Including a plan to deprecate allowing loop else clauses without a corresponding "break" in such a PEP would be a good idea, though.

Expanding it to comprehensions is indeed trickier, because the "break" doesn't quite scan correctly. While Joshua said he wasn't proposing it, permitting a semi-colon in the comprehension to separate out a termination clause actually reads pretty well to me (we can avoid ambiguity in the grammar because we don't currently allow ';' anywhere between any kind of bracket):

    [x for x in iterable; break if x is None]
    [x for x in data if x; break if x is None]

One nice advantage of that notation is that:

1. The statement after the ";" is exactly the statement that would appear in the expanded loop
2. It can be combined unambiguously with a filtering clause
3. It clearly disallows its use with nested loops in the comprehension

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From abarnert at yahoo.com  Sun Jun 30 06:53:34 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 29 Jun 2013 21:53:34 -0700 (PDT)
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com>
Message-ID: <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
From: Shane Green
Sent: Saturday, June 29, 2013 5:10 AM

>Thanks Andrew. My knee jerk reaction was to strongly prefer option two, which sounds like (if I understood correctly, and I'm not sure I do) it keeps both comprehensions and expressions. Rereading your points again, I must admit I didn't see much to justify the knee jerk reaction.

Sorry, it's my fault for conflating two independent choices.
Let me refactor things: Nobody's talking about getting rid of comprehensions. Today, you have the choice to write [comp] instead of list(comp), and that won't change. However, today these differ in that the latter lets you exit early with StopIteration, while the former doesn't. That's what's proposed to change.

Choice 1 is about the language definition. With this change, there is no remaining difference between comprehensions and genexps except for returning a list vs. an iterator. That means we could simplify things by defining the two concepts together (or defining one in terms of the other). I don't see any downside to doing that.

Choice 2 is about the CPython implementation. We can reimplement comprehensions as wrappers around genexps, or we can just add a try/except into comprehensions. The former would simplify the compiler and the interpreter, but at a cost of up to 40% for comprehensions. The latter would leave things no simpler than they are today, but also no slower.

Once put this way, I think the choices are obvious: Simplify the language, don't simplify the implementation.

> I do commonly use list comprehensions precisely *because* of the performance impact, and can think of a few places the 40% would be problematic.

Usually that's a premature optimization. For anything simple enough that the iteration cost isn't swamped by your real work, the performance usually doesn't matter anyway. But "usually" isn't always, and there definitely are real-world cases where it would hurt.

> Was there a measurable performance difference with approach 2?

Once I realized that the right place to put the try is just outside the loop, it became obvious that there is no per-iteration cost, only a constant cost. If you don't raise an exception through a listcomp, that cost is basically running one more opcode and loading a few more bytes into memory. It adds less than 1% for even a trivial comp that loops 10 times, or for a realistic but still simple comp that loops 3 times.
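[A Python-level model of the "try just outside the loop" placement described above, as an illustration rather than the actual compiler output: the handler wraps the whole loop, so a comprehension that never raises pays only a one-time setup cost.]

```python
def stop():
    raise StopIteration

# Hand expansion of what a patched
#     [i for i in range(100) if i < 50 or stop()]
# would do under choice 2: the try/except sits outside the loop,
# so there is no per-iteration overhead, only a constant cost.
def expanded_listcomp():
    result = []
    try:
        for i in range(100):
            if i < 50 or stop():
                result.append(i)
    except StopIteration:
        pass
    return result
```

Calling expanded_listcomp() returns [0, 1, ..., 49]: the StopIteration raised at i == 50 escapes the loop and is swallowed by the handler. (Raising StopIteration in a plain function is fine; only generators treat it specially.)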
I'll post actual numbers for local tests and for benchmarks once I get things finished (hopefully Monday).

>On Jun 28, 2013, at 8:16 PM, Andrew Barnert wrote:
>
>On Jun 28, 2013, at 18:50, Shane Green wrote:
>>
>>Yes, but it only works for generator expressions and not comprehensions.
>>
>>This is the point of options #1 and 2: make StopIteration work in comps either (1) by redefining comprehensions in terms of genexps or (2) by fiat.
>>
>>After some research, it turns out that these are equivalent. Replacing any [comprehension] with list(comprehension) is guaranteed by the language (and the CPython implementation) to give you exactly the same value unless (a) something in the comp raises StopIteration, or (b) something in the comp relies on reflective properties (e.g., sys._getframe().f_code.co_flags) that aren't guaranteed anyway.
>>
>>So, other than being 4 characters more verbose and 40% slower, there's already an answer for comprehensions.
>>
>>And if either of those problems is unacceptable, a patch for #1 or #2 is actually pretty easy.
>>
>>I've got two different proofs of concept: one actually implements the comp as passing the genexp to list, the other just wraps everything after the BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... except StopIteration: pass. I need to add some error handling to the C code, and for #2 write sufficient tests that verify that it really does work exactly like #1, but I should have working patches to play with in a couple days.
>>
>>My opinion of that workaround is that it's also a step backward in terms of readability, I suspect.
>>
>>>if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO.
>>>On Jun 28, 2013, at 6:38 PM, Andrew Carter wrote:
>>>
>>>Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do
>>>>
>>>>def stop():
>>>>  raise StopIteration
>>>>list(i for i in range(100) if i < 50 or stop())
>>>>
>>>>it seems to me that this would provide syntax that doesn't require lambdas.
>>>>
>>>>On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky wrote:
>>>>>
>>>>>On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote:
>>>>>..
>>>>>>[x until condition for x in l ...] or
>>>>>>[x for x in l until condition]
>>>>>
>>>>>Just to throw in one more variation:
>>>>>
>>>>>[expr for item in iterable break if condition]
>>>>>
>>>>>(inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b")
>>>>>_______________________________________________
>>>>>Python-ideas mailing list
>>>>>Python-ideas at python.org
>>>>>http://mail.python.org/mailman/listinfo/python-ideas

From ncoghlan at gmail.com  Sun Jun 30 07:45:53 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 30 Jun 2013 15:45:53 +1000
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able?
for X in ListY while conditionZ:]
In-Reply-To: <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
Message-ID: 
On 30 June 2013 14:53, Andrew Barnert wrote:
> If you don't raise an exception through a listcomp, that cost is
> basically running one more opcode and loading a few more bytes into memory. It
> adds less than 1% for even a trivial comp that loops 10 times, or for a
> realistic but still simple comp that loops 3 times.

Comprehensions translate to inline loops. The fact that the "stop()" hack works for generator expressions is just a quirky result of the line between "return" and "raise StopIteration" in a generator function being extraordinarily blurry - it has nothing to do with breaking out of a loop. That's why the stop() hack terminates a nested genexp completely, rather than just breaking out of the innermost loop:

>>> def stop():
...     raise StopIteration
...
>>> list((x, y) for x in range(10) for y in range(10) if y < 3 or stop())
[(0, 0), (0, 1), (0, 2)]
>>> def greturn():
...     for x in range(10):
...         for y in range(10):
...             if y >= 3: return
...             yield x, y
...
>>> list(greturn())
[(0, 0), (0, 1), (0, 2)]
>>> def graise():
...     for x in range(10):
...         for y in range(10):
...             if y >= 3: raise StopIteration
...             yield x, y
...
>>> list(graise())
[(0, 0), (0, 1), (0, 2)]

Note how all three produce the same output, and how both loops terminate immediately when the return/raise is encountered. You can't get the same semantics for other comprehensions without introducing the yield/return distinction, which is astonishingly slow by comparison (as you need to suspend and resume the generator frame on each iteration). The overhead of that is only worth it when the cost of having the entire result in memory at the same time is prohibitive.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From abarnert at yahoo.com  Sun Jun 30 08:43:53 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 29 Jun 2013 23:43:53 -0700 (PDT)
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: 
References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info>
Message-ID: <1372574633.43291.YahooMailNeo@web184702.mail.ne1.yahoo.com>
From: Nick Coghlan
Sent: Saturday, June 29, 2013 4:51 PM

> On 30 June 2013 01:56, Steven D'Aprano wrote:
>> In fact, I would argue the opposite, it's not *simpler* it is *more complex*
>> because it is a special case for the if keyword:
>>
>> break if condition  # allowed
>> continue if condition  # maybe allowed?
>> return 'spam' if condition  # probably disallowed
>> pass if condition  # what's the point?
>> raise Exception if condition  # probably disallowed
>> x += 1 if condition  # almost certainly disallowed
>
> People already have to learn "for/else" and "while/else".
> Adding
> "break if" can *only* be justified on the grounds of pairing it up
> with those two existing else clauses to make appropriate if/else
> pairs, as "for/break if/else" and "while/break if/else" should
> actually be easier to learn than the status quo.

I honestly don't think it would read this way to most people; the else clause would still be confusing. Especially if it continues to mean the same thing even without break if, which means as soon as you "learn" the for/break if/else rule you'll learn that it's not really true.

Also, if it comes at the cost of making comprehensions harder to learn and understand, I think people use (and see) comprehensions much more often than for/else.

Again, this is equivalent to an "until" clause, but worse in three ways: it doesn't scan well, it looks ambiguous (even if it isn't to the parser), and it doesn't have a nested block form to translate to. Compare:

    [x for x in iterable until x is None]

    a = []
    for x in iterable:
        until x is None:
            a.append(x)

    [x for x in iterable break if x is None]

    a = []
    for x in iterable:
        break if x is None
        a.append(x)

> While Joshua said he wasn't proposing
> it, permitting a semi-colon in the comprehension to separate out a
> termination clause actually reads pretty well to me (we can avoid
> ambiguity in the grammar because we don't currently allow ';' anywhere
> between any kind of bracket):
>
>   [x for x in iterable; break if x is None]
>   [x for x in data if x; break if x is None]

This seems to be much better. But only because it's fixed to the end of the comprehension; I don't see how it could be extended to work with nested comprehensions.

And that brings up a larger point. While there are many languages with break-if/while/until-type clauses in comprehensions, I can't think of any with such clauses in Python/Haskell-style arbitrary nested comprehensions.
In most other languages, you can't nest arbitrary clauses, just loops. Then you can attach a single optional clause of each type to the end of either the whole thing (like Racket) or each loop (like Clojure). In pseudo-Python:

    [x for row in data, x in row if x until x is None]
    [x for (row in data), (x in row if x until x is None)]

The first of these is clearly weaker in that you can only attach condition clauses to the innermost loop, but usually that's not a problem - most cases where you'd need breaks in two places are probably too complicated to write as comprehensions anyway. And, when it is a problem, you can always explicitly nest and flatten:

    flatten((x for x in row if x) for row in data until row is None)

And it has the advantage over the second one that adding syntax to separate a comprehension-wide clause is much easier than adding syntax to separate per-loop clauses. (I can't think of a way to write the latter that doesn't require extra parentheses.)

So, your suggestion turns Python list comprehensions into a hybrid of two styles: Haskell-style ifs mixed with fors, Racket-style per-comprehension break-ifs (and anything else we add in the future). While that sounds bad at first, I'm not sure it really is. I think it would be very easy to explain pedagogically, and it's trivial in grammar terms:

    comprehension ::= expression comp_for [";" comp_cond]
    comp_cond ::= comp_breakif
    comp_breakif ::= "break" "if" expression_nocond

All that being said, I still think "until" is better than "break if" even with this syntax.

> One nice advantage of that notation is that:
>
> 1. The statement after the ";" is exactly the statement that would
> appear in the expanded loop

But it still breaks the rule that the expression is nested inside the last statement, which means it still makes explaining and documenting comprehensions more difficult.
Not _as_ difficult as with a non-nested-block clause that can appear anywhere, but still more difficult than without any such clauses.

> 2. It can be combined unambiguously with a filtering clause
> 3. It clearly disallows its use with nested loops in the comprehension
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From ron3200 at gmail.com  Sun Jun 30 09:01:47 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 30 Jun 2013 02:01:47 -0500
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: 
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
Message-ID: 
On 06/30/2013 12:45 AM, Nick Coghlan wrote:
> You can't get the same semantics for other comprehensions without
> introducing the yield/return distinction, which is astonishingly slow
> by comparison (as you need to suspend and resume the generator frame
> on each iteration). The overhead of that is only worth it when the
> cost of having the entire result in memory at the same time is
> prohibitive.

It seems to me that if we were to rewrite comprehensions as generators yielding into the container, the generator would be private. It can't be taken out and reused, never needs to be suspended, and always yields to the same container.
Because it is private, it can be altered to make it more efficient by removing the unneeded parts and/or substituting alternative byte code that does the same thing in a faster way. Wouldn't they then be equivalent to the current comprehensions with the generator exception handling added? Cheers, Ron From abarnert at yahoo.com Sun Jun 30 09:26:28 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 30 Jun 2013 00:26:28 -0700 (PDT) Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> From: Nick Coghlan Sent: Saturday, June 29, 2013 10:45 PM > On 30 June 2013 14:53, Andrew Barnert wrote: >> If you don't raise an exception through a listcomp, that cost is >> basically running one more opcode and loading a few more bytes into memory. It >> adds less than 1% for even a trivial comp that loops 10 times, or for a >> realistic but still simple comp that loops 3 times. > > Comprehensions translate to inline loops. 
The fact that the "stop()" > hack works for generator expressions is just a quirky result of the > line between "return" and "raise StopIteration" in a > generator > function being extraordinarily blurry - it has nothing to with > breaking out of a loop. I think it's a pretty obvious consequence of the fact that a genexp builds an iterator, and raising StopIteration stops any iterator. And that's why the stop() hack terminates a nested genexp completely rather than just one loop: because it's the entire genexp that's an iterator, not each individual nested clause. > You can't get the same semantics for other comprehensions without > introducing the yield/return distinction, which is astonishingly slow > by comparison (as you need to suspend and resume the generator frame > on each iteration). Sure you can. All you have to do is put a try/except around the outer loop.?Taking your example: ? ? [(x, y) for x in range(10) for y in range(10) if y < 3 or stop()] This would now map to: ? ? a = [] ? ? try: ? ? ? ? for x in range(10): ? ? ? ? ? ? for y in range(10): ? ? ? ? ? ? ? ? if y < 3 or stop(): ? ? ? ? ? ? ? ? ? ? a.append((x, y)) ? ? except StopIteration: ? ? ? ? pass And that change is enough to give it _exactly_ the same semantics as ? ? list((x, y) for x in range(10) for y in range(10) if y < 3 or stop()) ? but without the 40% performance hit from the yields. The only overhead added is a single SETUP_EXCEPT. But really, despite the origin of the idea, my reason for liking it has little to do with iteration stopping, and much more to do with making the language simpler. The key is that this change would mean that [comp] becomes semantically and behaviorally identical to list(genexp), except for being 40% faster. 
Put another way: I think the language would be better if, instead of documenting comprehensions and generator expressions largely independently and in parallel, we simply said that [comp] has the same effect as list(genexp) except that if you raise StopIteration with a comp it passes through. (The fact that this is true today is by no means obvious, but that's part of the problem I want to solve, not an argument against solving it.) If we removed that one difference, it would be even simpler. I don't think the difference buys us anything, and the cost of eliminating it is a relatively simple patch with minimal performance impact. (As a minor side benefit, that would also mean you could use the StopIteration hack in comps, but I still don't think we'd want to recommend doing that.) From ncoghlan at gmail.com Sun Jun 30 09:51:46 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 30 Jun 2013 17:51:46 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <1372574633.43291.YahooMailNeo@web184702.mail.ne1.yahoo.com> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <1372574633.43291.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: On 30 June 2013 16:43, Andrew Barnert wrote: > From: Nick Coghlan > > Sent: Saturday, June 29, 2013 4:51 PM > > >> On 30 June 2013 01:56, Steven D'Aprano wrote: >>> In fact, I would argue the opposite, it's not *simpler* it is *more >> complex* >>> because it is a special case for the if keyword: >>> >>> break if condition # allowed >>> continue if condition # maybe allowed? >>> return 'spam' if condition # probably disallowed >>> pass if condition # what's the point? >>> raise Exception if condition # probably disallowed >>> x += 1 if condition # almost certainly disallowed >> >> People already have to learn "for/else" and "while/else". 
>> Adding
>> "break if" can *only* be justified on the grounds of pairing it up
>> with those two existing else clauses to make appropriate if/else
>> pairs, as "for/break if/else" and "while/break if/else" should
>> actually be easier to learn than the status quo.
>
> I honestly don't think it would read this way to most people; the else clause would still be confusing. Especially if it continues to mean the same thing even without break if, which means as soon as you "learn" the for/break if/else rule you'll learn that it's not really true.

Hence why I said any such PEP should also propose that a dangling "else" on a loop without a break statement should be deprecated and eventually become a syntax error. Without a break, a loop else clause is just pointless indentation of code that could be written inline after the loop instead. Loop else clauses *only* make sense in combination with break, so we should just make that official and enforce it in the compiler. While we should probably do that regardless, there's no incentive to work on it without some other kind of payoff, like the ability to terminate comprehensions early :)

> Also, if it comes at the cost of making comprehensions harder to learn and understand, I think people use (and see) comprehensions much more often than for/else.

While it does have a strong statement/expression dichotomy, Python is still one language, not two. Any proposals that rely on adding new expression-only keywords are dead in the water. PEP 315 has now been explicitly rejected: the official syntax for terminating a loop early is the existing break statement, thus any proposal for terminating a comprehension early must also be based on break if it is to be considered a serious suggestion rather than people just idly passing the time on an internet mailing list (I actually don't mind that last happening a bit here - I think it's an important part of python-ideas serving the purpose it is designed to serve.
It's just important to learn the difference between those discussions and the proposals which actually have some hope of surviving the vigorous critique PEPs face on python-dev).

Any proposal to allow termination of comprehensions slots into the same design space as PEPs like 403 (the @in pseudo-decorator) and 422 (the __init_class__ hook) - they don't add fundamentally new capabilities to the language the way context managers or generator delegation did, they just propose tidying up a couple of rough edges for things that are already possible, but require structuring code in a slightly awkward way. PEP 409 is an example of an accepted PEP that fits into the same category - it makes it easier to generate clean exception tracebacks when you're deliberately suppressing an inner exception and replacing it with a different one. PEP 3129 (which added class decorators) is another good example of a "clean up" that took an existing concept and adjusted it slightly, rather than adding a fundamental new capability.

A proposal to allow early termination of comprehensions has *zero* chance of acceptance as a "major language change" PEP. It simply doesn't have enough to offer in terms of additional expressiveness. PEP 403 (the @in pseudo-decorator) holds the promise of addressing at least some of the many requests over the years for multi-line lambda support, and even *that* is on dubious ground in terms of the additional complexity vs additional expressiveness trade-off.

There's nothing wrong with cleanup PEPs, though - they're an important part of refactoring the language design to be a bit more self-consistent. That's why I latched on to the idea of doing something to clean up the known wart that is loop else clauses, and then *expanding* that to offer early termination of comprehensions.
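[For reference, the loop else behaviour being called a wart above: the else suite runs only when the loop finishes without hitting break. A minimal sketch, with hypothetical names:]

```python
def classify(numbers):
    # else pairs with break: the else suite runs only if the for loop
    # exhausts its iterable without breaking out.
    for n in numbers:
        if n % 2 == 0:
            msg = "found even: %d" % n
            break
    else:
        # Reached only when no break occurred (including the empty case).
        msg = "no even number"
    return msg
```

Without a break in the loop body, the else suite would run unconditionally, which is exactly the pointless case the deprecation idea targets.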
It may still get shot down (if Guido doesn't like it, it *will* get shot down), but the "[x for x in y; break if x is None]" variant definitely has a few points in its favour:

- the ";" helps it read like English and avoids ambiguity relative to filtering clauses
- the "cannot break nested comprehensions" restriction helps limit ambiguity
- the statement form adds a new "if" to go with the confusing "else" on loops
- it can be paired with deprecation of tolerating else-without-break on loops

I think the idea of early termination of comprehensions has a *much* better chance of getting Guido's interest if it helps make the behaviour of else clauses on loops more comprehensible without needing elaborate explanations like http://python-notes.curiousefficiency.org/en/latest/python_concepts/break_else.html

That still needs a volunteer to make the sales pitch in a PEP and work out how to implement it, though :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From abarnert at yahoo.com  Sun Jun 30 12:17:43 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 30 Jun 2013 03:17:43 -0700 (PDT)
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able?
for X in ListY while conditionZ:]
In-Reply-To: 
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
Message-ID: <1372587463.90079.YahooMailNeo@web184701.mail.ne1.yahoo.com>
From: Ron Adam
Sent: Sunday, June 30, 2013 12:01 AM

> On 06/30/2013 12:45 AM, Nick Coghlan wrote:
>> You can't get the same semantics for other comprehensions without
>> introducing the yield/return distinction, which is astonishingly slow
>> by comparison (as you need to suspend and resume the generator frame
>> on each iteration). The overhead of that is only worth it when the
>> cost of having the entire result in memory at the same time is
>> prohibitive.
>
> It seems to me that if we were to rewrite comprehensions as generators yielding
> into the container, the generator would be private. It can't be taken out
> and reused, never needs to be suspended, and always yields to the same
> container.
>
> Because it is private, it can be altered to make it more efficient by removing
> the unneeded parts and/or substituting alternative byte code that does the same
> thing in a faster way.

I tried to come up with an efficient way to do that, but failed - and learned a lot from my failure. See http://stupidpythonideas.blogspot.com/2013/06/can-you-optimize-listgenexp.html for the gory details, but I'll summarize here.

Ultimately, it's the generator feeding a separate FOR_ITER that makes the StopIteration work. (Raising StopIteration in a generator basically feeds a value to the calling FOR_ITER,
But that's exactly the part that makes it slow. Optimizing out the other stuff (mainly the list call) doesn't do much good; you need some way to inline the generator into the outer FOR_ITER so you can just jump back and forth instead of suspending and resuming a generator. Which means adding new FAST_FOR_ITER, FAST_YIELD, and FAST_RETURN opcodes.

Making that work in general is very hard (you need some way to let FAST_FOR_ITER know whether the inlined generator did a yield, a return, or an exception), but in the special case where you don't care about the return value and the generator doesn't do anything useful after yielding and doesn't return anything useful (which is the case with an inlined genexpr), it's doable.

So, you can directly use the genexpr compiler code for comps and scrap the existing comp-specific code, but you have to make the genexpr code more complicated and do if (genexpr) else in multiple places to make it inlinable, add in a comp-specific wrapper that's longer and more complicated than the code you scrapped, and add 3 new opcodes. Definitely not a win for simplicity.

And if you trace through what it's doing, it ends up as just a tangled-up version of the exception-handling listcomp function: exactly equivalent, but with a whole bunch of extra jumping around to slow things down. You can optimize it by straightening everything out, but then you're not sharing existing code anymore, and in fact, when you're done, you've just added a SETUP_EXCEPT to the listcomp code.

It's possible that I missed something stupid that would offer a better way to do this, but I think trying to directly optimize list(genexpr) by inlining a "private generator" ends up being just a much more complicated way to add StopIteration handling to the listcomp code. So, you might as well do the easier one.
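For reference, here is a sketch (added for this archive; the name and signature are illustrative, not CPython's internals) of the "exception-handling listcomp function" that the inlined version ends up being equivalent to:

```python
def listcomp_with_stop(iterable, predicate):
    # Semantically what "list(x for x in iterable if predicate(x))"
    # did when a StopIteration escaped the filter: truncate, don't raise.
    result = []
    try:
        for x in iterable:
            if predicate(x):
                result.append(x)
    except StopIteration:
        pass
    return result

def stop_at_3(x):
    if x == 3:
        raise StopIteration
    return True

# The predicate's StopIteration truncates the result instead of escaping.
assert listcomp_with_stop(range(10), stop_at_3) == [0, 1, 2]
```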
From ncoghlan at gmail.com Sun Jun 30 13:32:08 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 30 Jun 2013 21:32:08 +1000 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: On 30 June 2013 17:26, Andrew Barnert wrote: > If we removed that one difference, it would be even simpler. I don't think the difference buys us anything, and the cost of eliminating it is a relatively simple patch with minimal performance impact. (As a minor side benefit, that would also mean you could use the StopIteration hack in comps, but I still don't think we'd want to recommend doing that.) An interesting rationale, especially along with your reply to Ron about how much simpler that approach is than attempting to optimize list(genexp), while still yielding a semantically equivalent end result. 
It still raises my "behavioural change without adequate justification" hackles, but I'm only -0 now, whereas I was definitely -1 earlier. It's definitely a much smaller change than the scoping one we introduced in the Python 3 migration, which had the dual motivation of moving the iteration variable into a private scope, and better aligning the other semantics of "[x for x in y]" vs "list(x for x in y)".

Cheers, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From abarnert at yahoo.com Sun Jun 30 13:41:27 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 30 Jun 2013 04:41:27 -0700 (PDT)
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To:
References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <1372574633.43291.YahooMailNeo@web184702.mail.ne1.yahoo.com>
Message-ID: <1372592487.55743.YahooMailNeo@web184705.mail.ne1.yahoo.com>

From: Nick Coghlan
Sent: Sunday, June 30, 2013 12:51 AM

> On 30 June 2013 16:43, Andrew Barnert wrote:
>> From: Nick Coghlan
>>> People already have to learn "for/else" and "while/else". Adding
>>> "break if" can *only* be justified on the grounds of pairing it up
>>> with those two existing else clauses to make appropriate if/else
>>> pairs, as "for/break if/else" and "while/break if/else" should
>>> actually be easier to learn than the status quo.
>>
>> I honestly don't think it would read this way to most people; the else
>> clause would still be confusing. Especially if it continues to mean the same
>> thing even without break if, which means as soon as you "learn" the
>> for/break if/else rule you'll learn that it's not really true.
>
> Hence why I said any such PEP should also propose that a dangling
> "else" on a loop without a break statement should be deprecated and
> eventually become a syntax error.

Sure, but we already have a way to put a break into a for loop: the break statement.
Unless you're also planning to deprecate that, it will still be true that the "for/break if/else" rule is often not true. For example:

    for i in x:
        if i > 0:
            print(i)
        else:
            break
    else:
        print('Finished')

Also, even in your preferred case, this would be the only place in Python where a statement matches another statement at a different indentation level. Is that really going to make Python easier to understand?

And finally, as others have pointed out, besides being yet another syntax for if (if statements, ternary expressions, if clauses, and now break if statements and clauses), what it really reads like is Perl's postfix if, which is misleading, because you can't do postfix conditionals anywhere else.

Killing two birds with one stone is nice, but I don't think this kills either bird.

>> Also, if it comes at the cost of making comprehensions harder to learn and
>> understand, I think people use (and see) comprehensions much more often than
>> for/else.
>
> While it does have a strong statement/expression dichotomy, Python is
> still one language, not two. Any proposals that rely on adding new
> expression-only keywords are dead in the water.

I think I didn't make my point very clearly. I'm _assuming_ that Python is one language. As a consequence, if some new loop syntax doesn't work well in expressions, it's not worth doing, even if it _does_ work well in statements. Also, the syntax has to follow the regular mapping between clauses and statements (each clause is a nested block statement) or it's not worth doing.

I realize that you have a solution for the second point: the semicolon-separated "comp_break_if" clause at the end of a comprehension is clearly different, and therefore it's acceptable that it maps differently. But that means having two mapping rules instead of one, which still makes comprehensions twice as complicated.
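For comparison (an illustration added for this archive, not part of the original message), the no-new-syntax spelling of early termination that comes up elsewhere in this thread is itertools.takewhile:

```python
from itertools import takewhile

data = [1, 2, 3, -1, 4, 5]

# The library equivalent of the proposed "for x in data while x > 0":
# stop consuming as soon as the condition first fails.
assert list(takewhile(lambda x: x > 0, data)) == [1, 2, 3]

# It composes with comprehensions without any new syntax.
assert [x * x for x in takewhile(lambda x: x > 0, data)] == [1, 4, 9]
```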
> PEP 315 has now been
> explicitly rejected: the official syntax for terminating a loop early
> is the existing break statement, thus any proposal for terminating a
> comprehension early must also be based on break if it is to be
> considered a serious suggestion

And, as I said in my earlier summary, I think that means there is no feasible suggestion. The only option I like is #1 (redefining comps on top of genexprs), and I like it for completely independent reasons. (In fact, even with #1, I still probably wouldn't use stop() in listcomps.)

Just as in the language we borrowed listcomps from, takewhile is the right answer. You can stick it around the inner iterator, or one of the sub-iterators in a nested comprehension; there are no possible syntax ambiguities; there's nothing special to learn; etc.

The only real problem with takewhile is the same problem as map, filter, reduce, and everything in itertools: Python's lambda syntax sometimes makes it a bit clumsy to turn an expression into a function. And the solution is the same as it is everywhere else in Python: define the function out of line, move the takewhile call itself out of line, find somewhere else to break the expression up, use a partial instead of a lambda, or just give up on trying to write an expression and write a statement. (Or push for PEP 403.)

It was worth exploring whether there are any obvious options (after all, we did reduce the need for map and filter calls, maybe we could do the same for takewhile), but that doesn't mean we have to pick the least bad if all of them are bad, it just means the answer is no. And I think at this point, your earlier suggestion of drafting a PEP to get all of the bad ideas explicitly rejected is a better use of time than trying to draft a PEP to push for one of them.

From ncoghlan at gmail.com Sun Jun 30 14:40:27 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 30 Jun 2013 22:40:27 +1000
Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: <1372592487.55743.YahooMailNeo@web184705.mail.ne1.yahoo.com> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <1372574633.43291.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372592487.55743.YahooMailNeo@web184705.mail.ne1.yahoo.com> Message-ID: On 30 June 2013 21:41, Andrew Barnert wrote: > It was worth exploring whether there are any obvious options?after all, we did reduce the need for map and filter calls, maybe we could do the same for takewhile?but that doesn't mean we have to pick the least bad if all of them are bad, it just means the answer is no. > > And I think at this point, your earlier suggestion of drafting a PEP to get all of the bad ideas explicitly rejected is a better use of time than trying to draft a PEP to push for one of them. Well said, and well argued. I concede the point :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ron3200 at gmail.com Sun Jun 30 17:21:52 2013 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 30 Jun 2013 10:21:52 -0500 Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:] In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: On 06/30/2013 06:32 AM, Nick Coghlan wrote: > On 30 June 2013 17:26, Andrew Barnert wrote: >> >If we removed that one difference, it would be even simpler. 
>> I don't think the difference buys us anything, and the cost of
>> eliminating it is a relatively simple patch with minimal performance
>> impact. (As a minor side benefit, that would also mean you could use
>> the StopIteration hack in comps, but I still don't think we'd want to
>> recommend doing that.)
>
> An interesting rationale, especially along with your reply to Ron
> about how much simpler that approach is than attempting to optimize
> list(genexp), while still yielding a semantically equivalent end
> result.
>
> It still raises my "behavioural change without adequate justification"
> hackles, but I'm only -0 now, whereas I was definitely -1 earlier.
> It's definitely a much smaller change than the scoping one we
> introduced in the Python 3 migration, which had the dual motivation of
> moving the iteration variable into a private scope, and better
> aligning the other semantics of "[x for x in y]" vs "list(x for x in
> y)".

Yes, that was also my point. It's the same as inlining the generator parts into the iterator that is driving it. We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same.

I'm +1 for this. I still would like to see actual time comparisons, and have it pass Python's test suite. I don't think there would be any issues with either of those.

I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.

Cheers, Ron

From guido at python.org Sun Jun 30 22:52:01 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Jun 2013 13:52:01 -0700
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able?
for X in ListY while conditionZ:]
In-Reply-To:
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID:

I apologize, this thread was too long for me to follow. Is the issue the following?

>>> def stopif(x):
...     if x: raise StopIteration
...     return True
...
>>> [i for i in range(10) if stopif(i==3)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 2, in stopif
StopIteration
>>> list(i for i in range(10) if stopif(i==3))
[0, 1, 2]

I.e. the difference between list() and [] is that if the condition raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?

That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug.

--
--Guido van Rossum (python.org/~guido)
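A hedged follow-up for anyone re-running Guido's example today (this note and code are additions to the archive, not part of the thread): PEP 479, implemented in Python 3.7, resolved the discrepancy the other way around. A StopIteration escaping a generator is now re-raised as RuntimeError, so the genexpr half no longer silently truncates.

```python
def stopif(x):
    if x:
        raise StopIteration
    return True

# The list comprehension still lets the StopIteration escape.
try:
    [i for i in range(10) if stopif(i == 3)]
    comp_outcome = 'no exception'
except StopIteration:
    comp_outcome = 'StopIteration'

# The genexpr used to swallow it and yield [0, 1, 2]; since PEP 479
# (Python 3.7) the leaked StopIteration becomes a RuntimeError.
try:
    ge_outcome = list(i for i in range(10) if stopif(i == 3))
except RuntimeError:
    ge_outcome = 'RuntimeError'

assert comp_outcome == 'StopIteration'
assert ge_outcome in ('RuntimeError', [0, 1, 2])
```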