From ziade.tarek at gmail.com Sun Jul 4 12:47:03 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Sun, 4 Jul 2010 12:47:03 +0200
Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org
Message-ID:

Hello,

If you follow python-checkins, you have probably noticed, and been annoyed by, my 100+ checkin mails in distutils2 this morning. I was lagging a bit on getting the GSoC students' work pulled in, and, with the DVCS effect, you get two or three weeks of work landing in hg.python.org in a minute. :)

Once CPython itself is in Mercurial, we will probably have the same problem when people are pulling contributions. If you use a "hg pull" command it will get all commits from the third party, even if some of those commits are unnecessary noise, like "I have removed this file. Oops, I am putting the file back in...".

And it's not so easy to edit the incoming changelog once the changes are committed. It's not easy either to use "hg incoming", because most of the time the third-party clone has many unrelated changes. I think we should work with queues and patches everywhere to solve this.

The idea is to have contributors handling hg patches in bugs.python.org, one patch per feature. They can use mq for that, and the benefit will be a very clean history in all repositories. A good thing about hg patches is that, unlike simple diffs, the contributor's name and comment appear in the final changelog.

I would like to propose a policy for hg.python.org, based on Mercurial queues + bugs.python.org, and I would like to contribute a small guide about it in python.org/dev.

Regards
Tarek

--
Tarek Ziadé | http://ziade.org

From solipsis at pitrou.net Sun Jul 4 12:57:00 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 4 Jul 2010 12:57:00 +0200
Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org
References:
Message-ID: <20100704125700.3edc8fa3@pitrou.net>

On Sun, 4 Jul 2010 12:47:03 +0200 Tarek Ziadé
wrote:
>
> I would like to propose a policy for hg.python.org, based on mercurial
> queues + bugs.python.org, and I would like to contribute a small guide
> about it in python.org/dev.

Sounds good. We can probably make mq optional, since regular diffs would work as well (except that they wouldn't contain the original committer name, but that isn't different from what we have today).

Regards
Antoine.

From merwok at netwok.org Sun Jul 4 13:14:32 2010
From: merwok at netwok.org (Éric Araujo)
Date: Sun, 04 Jul 2010 13:14:32 +0200
Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org
In-Reply-To: <20100704125700.3edc8fa3@pitrou.net>
References: <20100704125700.3edc8fa3@pitrou.net>
Message-ID: <4C306D18.3050100@netwok.org>

[Antoine Pitrou]
> [Tarek Ziadé]
>> I would like to propose a policy for hg.python.org, based on mercurial
>> queues + bugs.python.org, and I would like to contribute a small guide
>> about it in python.org/dev.
>
> Sounds good. We can probably make mq optional, since regular diffs
> would work as well (except that they wouldn't contain the
> original committer name, but that isn't different from what we have
> today).

Agreed. The policy can just require patches and thus let people choose their local workflow (many commits in a named branch/bookmark/pbranch and then diff, MQ for moar power, or just edit things to get a diff without using a fancy command (like now)).

The policy should say something about authorship attribution. Mercurial-made diffs contain the user name in a special comment which is used by hg import; plain diffs can be applied with patch and then committed with 'hg commit --user "Bill "', and if a patch is edited before commit, then use the current style (core dev as committer, original patch author in the commit message). First-class authorship acknowledgment is a nice feature of DVCSes.

The policy should also allow pulling from another repo if it contains changesets that aren't crufty.
In that case, a pusher (new name for a svn committer) can just pull from Bill and push to the main repo, adding extra export-to-patch import-from-patch steps is unnecessary. Cheers From dirkjan at ochtman.nl Sun Jul 4 15:46:53 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 04 Jul 2010 15:46:53 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: Message-ID: <4C3090CD.7020909@ochtman.nl> On 2010-07-04 12:47, Tarek Ziad? wrote: > Once CPython itself is in mercurial, we will probably have the same > problem when people are pulling contributions. If you use a "hg pull" > command it will get all commits from the third party, even if some if > those commits are unnecessary noise, like > "I have removed this file. OOps I am putting the file back in..". > > And it's not so easy to edit the incoming changelog once they are > commited. It's not easy either to use "hg incoming" because most of > the time, the third party clone has many unrelated changes. I think we > should work with queues and patches everywhere to solve this. > > The idea is to have contributors handling hg patches in > bug.python.org, one patch per feature. They can use mq for that, and > the benefit will be to have a very clean history in all repositories. > A good thing about hg patches is that unlike simple diffs, the > contributor name and comment appears in the final changelog. Hmm, I don't think I agree on what you're saying. First, a changeset is a changeset is a changeset. If you exchange them as patches or in some other way (by pulling or pushing or whatever) shouldn't really matter. This is one of the things DVCS is good at, you can move csets around different clones in many ways, and all clones are created equal. Second, a noisy history is never good. So yes, pulling some kind of messy history and pushing it to a central repo as-is is not a good idea. 
People should polish their changesets so that each changeset can stand on its own. So yes, somewhere between it being a messy history of actual development and it going into a central repo, it should be cleaned up. Ideally, the original author should do that, but if he's not in a position to do so, the committer should do it. Third, if the result of cleaning up is a single cset, it should probably be rebased before getting pushed to a central repo. If it's two or three csets, rebase it. On the other hand, if it's 10 csets, actually doing an explicit merge makes sense. The idea is not to clutter up the history with a merge every other cset, but if the merge is hard/non-trivial it can make sense to leave it explicit. Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read. Easy reviews are important, because a lot of valuable time is spent reviewing. The simple example is a chain like refactor-refactor-fix (which is IME quite common). Ideally each stage keeps the test suite passing and is internally consistent, but moving towards a common goal (to fix the issue). So, I find your proposed policy somewhat vague and also not that attractive. Cleaning up the history is certainly a good thing, but I don't think we have to mandate a way for things to get into the repo. Mandating the use of issues as a reference for each fix or enhancement could be useful, but seems unnecessary. Cheers, Dirkjan From ziade.tarek at gmail.com Sun Jul 4 15:53:19 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 4 Jul 2010 15:53:19 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <4C308B38.2050800@ochtman.nl> References: <4C308B38.2050800@ochtman.nl> Message-ID: On Sun, Jul 4, 2010 at 3:23 PM, Dirkjan Ochtman wrote: ... 
>
> Hmm, I don't think I agree on what you're saying.
>
> First, a changeset is a changeset is a changeset. If you exchange them as
> patches or in some other way (by pulling or pushing or whatever) shouldn't
> really matter. This is one of the things DVCS is good at, you can move csets
> around different clones in many ways, and all clones are created equal.

As far as I have experienced, there's a back-and-forth game between the contributor and the committer, leading to clone hell, unless the contributor uses mq and applies the changes in his clone right before the committer pulls them in.

Using patches keeps changes separate from the history until they are merged. And they are easy to read, review and understand.

> Second, a noisy history is never good. So yes, pulling some kind of messy
> history and pushing it to a central repo as-is is not a good idea. People
> should polish their changesets so that each changeset can stand on its own.

When you work on a feature, how do you polish a changeset without polluting your history or doing many clones? (A genuine question; I've been looking for that.)

> So yes, somewhere between it being a messy history of actual development and
> it going into a central repo, it should be cleaned up. Ideally, the original
> author should do that, but if he's not in a position to do so, the committer
> should do it.

Do you have an easy way to perform this cleanup? Could you propose a process here?

I am a bit skeptical that contributors will do this, whereas a patch policy makes sure we don't have to deal with this, and avoids asking people to have a high mercurial-fu. I am also skeptical that contributors are willing to dig into a clone to get what they want and/or check that it's fine to pull.

It seems to me that patches are the universal format to propose a change, and they are easy to produce and consume. Contributors can use any process they want to create them, even without using mercurial.
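For instance, the whole contributor-side flow with mq could be as small as this (the patch and file names are just made up for the example):

```
hg qnew fix-sysconfig          # start a new mq patch
# ... edit, run the tests ...
hg qrefresh -m "Fix sysconfig on win32"    # store the changes and message in the patch
hg export qtip > fix-sysconfig.patch       # the exported patch carries my name and comment
```

Then fix-sysconfig.patch gets attached to the issue, and the committer applies it with "hg import", which keeps the authorship.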
>
> Third, if the result of cleaning up is a single cset, it should probably be
> rebased before getting pushed to a central repo. If it's two or three csets,
> rebase it. On the other hand, if it's 10 csets, actually doing an explicit
> merge makes sense. The idea is not to clutter up the history with a merge
> every other cset, but if the merge is hard/non-trivial it can make sense to
> leave it explicit.
>
> Fourth, one-patch-per-issue is too restrictive. Small commits are useful
> because they're way easier to review. Concatenate several small commits
> leading up to a single issue fix into a single patch and it gets much harder
> to read. Easy reviews are important, because a lot of valuable time is spent
> reviewing. The simple example is a chain like refactor-refactor-fix (which
> is IME quite common). Ideally each stage keeps the test suite passing and is
> internally consistent, but moving towards a common goal (to fix the issue).
>
> So, I find your proposed policy somewhat vague and also not that attractive.
> Cleaning up the history is certainly a good thing, but I don't think we have
> to mandate a way for things to get into the repo. Mandating the use of
> issues as a reference for each fix or enhancement could be useful, but seems
> unnecessary.

Yes, it's vague; I don't have a clear idea yet, and I am not that experienced in hg's latest features, so I am probably doing some steps wrong or ignoring some shortcuts.

But I have the strong feeling that without patches we are heading for extra work for all parties, unless we have a strong tutorial on how to contribute through hg.python.org, one that is proven to be very simple.

side note: I am replying to the gmane emails but I don't know if this works with the mailman thread as well..
Tarek From solipsis at pitrou.net Sun Jul 4 17:26:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 4 Jul 2010 17:26:56 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org References: <4C3090CD.7020909@ochtman.nl> Message-ID: <20100704172656.1dae21c1@pitrou.net> On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman wrote: > > Fourth, one-patch-per-issue is too restrictive. Small commits are useful > because they're way easier to review. Concatenate several small commits > leading up to a single issue fix into a single patch and it gets much > harder to read. I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix. I know that's how Mercurial devs do things, but I don't really like it. Regards Antoine. From thomas at jollans.com Sun Jul 4 18:06:45 2010 From: thomas at jollans.com (Thomas Jollans) Date: Sun, 04 Jul 2010 18:06:45 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C308B38.2050800@ochtman.nl> Message-ID: <4C30B195.1080804@jollans.com> On 07/04/2010 03:53 PM, Tarek Ziad? wrote: > > > On Sun, Jul 4, 2010 at 3:23 PM, Dirkjan Ochtman wrote: > ... >> >> Hmm, I don't think I agree on what you're saying. >> >> First, a changeset is a changeset is a changeset. If you exchange them as >> patches or in some other way (by pulling or pushing or whatever) shouldn't >> really matter. This is one of the things DVCS is good at, you can move csets >> around different clones in many ways, and all clones are created equal. 
> > As far as I have experienced, there's a back and forth game between > the contributor and the commiter, leading to clone hell, unless the > contributor uses mq, then apply it in his clone > right before the contributor pulls the changes in. > > Using patches makes changes separated from the history for the time > being, until they are merged. And are easy to read, review and > understand. > >> Second, a noisy history is never good. So yes, pulling some kind of messy >> history and pushing it to a central repo as-is is not a good idea. People >> should polish their changesets so that each changeset can stand on its own. > > When you work on a feature, how do you polish a changeset without > polluting your history or doing many clones ? (true question, I've > been looking for that) mq is a good method. If a changeset that only exists locally has to be changed, you can convert it to a patch, make some changes, and re-commit. If the changesets are relatively clean in the first place, you can rebase/transplant/strip your way around too big a mess. > >> So yes, somewhere between it being a messy history of actual development and >> it going into a central repo, it should be cleaned up. Ideally, the original >> author should do that, but if he's not in a position to do so, the committer >> should do it. > > Do you have an easy way to perform this cleanup ? Could you propose a > process here ? > > I am bit skeptical that contributors will do this, whereas a patch > policy makes sure we don't have to deal with this, and avoid asking > people to have a high mercurial-fu. I am also skeptical that > contributors are willing to digg into a clone to get what they want > and/or check that it's fine to pull. > > It seems to me that patches are the universal format to propose a > change and are easy to produce and consume. Contributors can use any > process they want to create it, even without using mercurial. 
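To spell out the mq route I mentioned above, the round-trip for fixing up a local-only changeset could look like this (assuming mq is enabled; the revision is whatever your topmost draft commit happens to be):

```
hg qimport -r tip      # turn the topmost local commit into an mq patch
# ... edit the files to fix the changeset ...
hg qrefresh            # fold the corrections back into the patch
hg qfinish qtip        # turn the polished patch into a regular changeset again
```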
There's no reason to force those of us capable of producing clean hg branches back into the world of patches. I can see why you'd want to be able to say "no, this repo is a mess. Submit something presentable, like a patch." Some "how to contribute" document might recommend using mq, but it shouldn't be a requirement - pulling comes naturally with DVCS. Python should use it. Accept patches - sure - not everyone uses mercurial. Require patches - please don't! > >> >> Third, if the result of cleaning up is a single cset, it should probably be >> rebased before getting pushed to a central repo. If it's two or three csets, >> rebase it. On the other hand, if it's 10 csets, actually doing an explicit >> merge makes sense. The idea is not to clutter up the history with a merge >> every other cset, but if the merge is hard/non-trivial it can make sense to >> leave it explicit. >> >> Fourth, one-patch-per-issue is too restrictive. Small commits are useful >> because they're way easier to review. Concatenate several small commits >> leading up to a single issue fix into a single patch and it gets much harder >> to read. Easy reviews are important, because a lot of valuable time is spent >> reviewing. The simple example is a chain like refactor-refactor-fix (which >> is IME quite common). Ideally each stage keeps the test suite passing and is >> internally consistent, but moving towards a common goal (to fix the issue). >> >> So, I find your proposed policy somewhat vague and also not that attractive. >> Cleaning up the history is certainly a good thing, but I don't think we have >> to mandate a way for things to get into the repo. Mandating the use of >> issues as a reference for each fix or enhancement could be useful, but seems >> unnecessary. > > Yes, it's vague, I don't have a clear idea yet and I am not that > experienced in hg latest features, so I am probably doing some steps > wrong or ignore some shortcuts. 
> > But I have the strong feeling that without patches, we are heading for > extra work for all parties > unless we have a strong tutorial on how to contribute with > hg.python.org, and that is proven to be very simple. > > side note: I am replying to the gname emails but I don't know if this > works with the mailman thread as well.. > > Tarek > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From g.brandl at gmx.net Sun Jul 4 18:56:11 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 04 Jul 2010 18:56:11 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <20100704172656.1dae21c1@pitrou.net> References: <4C3090CD.7020909@ochtman.nl> <20100704172656.1dae21c1@pitrou.net> Message-ID: Am 04.07.2010 17:26, schrieb Antoine Pitrou: > On Sun, 04 Jul 2010 15:46:53 +0200 > Dirkjan Ochtman wrote: >> >> Fourth, one-patch-per-issue is too restrictive. Small commits are useful >> because they're way easier to review. Concatenate several small commits >> leading up to a single issue fix into a single patch and it gets much >> harder to read. > > I don't agree with that. The commits obviously won't be independent > because they will be motivated by each other (or even dependent on each > other), therefore you have to remember what the other commits do when > reviewing one of them. What's more, when reading "hg log" months or > years later, it is hard to make sense of a single commit because you > don't really know what issue it was meant to contribute to fix. > > I know that's how Mercurial devs do things, but I don't really like > it. I think the best of both worlds is to encourage contributors to send more complicated patches in a series of easy-to-review steps, but when committing to Python, make one changeset out of them. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. 
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From ziade.tarek at gmail.com Sun Jul 4 19:09:36 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 4 Jul 2010 19:09:36 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C3090CD.7020909@ochtman.nl> <20100704172656.1dae21c1@pitrou.net> Message-ID: On Sun, Jul 4, 2010 at 6:56 PM, Georg Brandl wrote: > Am 04.07.2010 17:26, schrieb Antoine Pitrou: >> On Sun, 04 Jul 2010 15:46:53 +0200 >> Dirkjan Ochtman wrote: >>> >>> Fourth, one-patch-per-issue is too restrictive. Small commits are useful >>> because they're way easier to review. Concatenate several small commits >>> leading up to a single issue fix into a single patch and it gets much >>> harder to read. >> >> I don't agree with that. The commits obviously won't be independent >> because they will be motivated by each other (or even dependent on each >> other), therefore you have to remember what the other commits do when >> reviewing one of them. What's more, when reading "hg log" months or >> years later, it is hard to make sense of a single commit because you >> don't really know what issue it was meant to contribute to fix. >> >> I know that's how Mercurial devs do things, but I don't really like >> it. > > I think the best of both worlds is to encourage contributors to send > more complicated patches in a series of easy-to-review steps, but when > committing to Python, make one changeset out of them. Exactly, so one bugfix or one feature comes in a single changeset that contains ideally the code change + the doc change + the tests. Like Thomas has suggested, I'll start a "how to contribute" wiki page with the best practices, and give the url here so everyone can contribute/correct it. Tarek -- Tarek Ziad? 
| http://ziade.org From mal at egenix.com Mon Jul 5 09:30:54 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 05 Jul 2010 09:30:54 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <4C30B195.1080804@jollans.com> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> Message-ID: <4C318A2E.1070303@egenix.com> Thomas Jollans wrote: > On 07/04/2010 03:53 PM, Tarek Ziad? wrote: >> >> Do you have an easy way to perform this cleanup ? Could you propose a >> process here ? >> >> I am bit skeptical that contributors will do this, whereas a patch >> policy makes sure we don't have to deal with this, and avoid asking >> people to have a high mercurial-fu. I am also skeptical that >> contributors are willing to digg into a clone to get what they want >> and/or check that it's fine to pull. >> >> It seems to me that patches are the universal format to propose a >> change and are easy to produce and consume. Contributors can use any >> process they want to create it, even without using mercurial. > > There's no reason to force those of us capable of producing clean hg > branches back into the world of patches. I can see why you'd want to be > able to say "no, this repo is a mess. Submit something presentable, like > a patch." Some "how to contribute" document might recommend using mq, > but it shouldn't be a requirement - pulling comes naturally with DVCS. > Python should use it. > > Accept patches - sure - not everyone uses mercurial. Require patches - > please don't! I'm with Tarek here: the only way for core developers to be able to review checkins on the checkins list is by looking at the patches that go in. Having to look at 10+ checkin emails for a single "patch" will break this review process - no-one will be able to follow what a particular pulled set of changes will do in the end, compared to what we had in the repo before the pull. As a result, the review process will no longer be possible. 
As an example, see Tarek's pull/push of the distutils2 work. Those checkin emails will just rush by and not get a second or third review.

OTOH, I don't think that requiring a ticket to be opened on the tracker for everything is needed either.

Aside 1: Isn't it interesting that the more we actually think about moving to Mercurial, the more we find that the existing Subversion model of working is actually a very workable model for a large open source project?!

Aside 2: This thread should really be moved to python-dev, where it belongs.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 05 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK                13 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From thomas at jollans.com Mon Jul 5 11:42:25 2010
From: thomas at jollans.com (Thomas Jollans)
Date: Mon, 05 Jul 2010 11:42:25 +0200
Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org
In-Reply-To: <4C318A2E.1070303@egenix.com>
References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com>
Message-ID: <4C31A901.2000800@jollans.com>

On 07/05/2010 09:30 AM, M.-A. Lemburg wrote:
> Thomas Jollans wrote:
>> On 07/04/2010 03:53 PM, Tarek Ziadé wrote:
>>>
>>> Do you have an easy way to perform this cleanup ? Could you propose a
>>> process here ?
>>> >>> I am bit skeptical that contributors will do this, whereas a patch >>> policy makes sure we don't have to deal with this, and avoid asking >>> people to have a high mercurial-fu. I am also skeptical that >>> contributors are willing to digg into a clone to get what they want >>> and/or check that it's fine to pull. >>> >>> It seems to me that patches are the universal format to propose a >>> change and are easy to produce and consume. Contributors can use any >>> process they want to create it, even without using mercurial. >> >> There's no reason to force those of us capable of producing clean hg >> branches back into the world of patches. I can see why you'd want to be >> able to say "no, this repo is a mess. Submit something presentable, like >> a patch." Some "how to contribute" document might recommend using mq, >> but it shouldn't be a requirement - pulling comes naturally with DVCS. >> Python should use it. >> >> Accept patches - sure - not everyone uses mercurial. Require patches - >> please don't! > > I'm with Tarek here: the only way for core developers to be able to > review checkins on the checkins list is by looking at the patches > that go in. > > Having to look at 10+ checkin emails for a single "patch" will > break this review process - no-one will be able to follow what > a particular pulled set of changes will do in the end, compared > to what we had in the repo before the pull. As a result, the > review process will no longer be possible. If the problem is the amount of changesets per "patch", then it has to be the responsibility of the person committing - be that a core dev or an external contributor - to make sure it's only a single changeset. OTOH, I don't think being that strict about it is a good idea - in many cases, having a handful of changesets is, IMHO, better, with Mercurial. 
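For the cases where a single changeset really is wanted, the rebase extension can do the squashing in one step before the push (the revision number here is hypothetical):

```
hg rebase --collapse --source 1234 --dest default   # fold 1234 and its descendants into one cset on default
```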
Either way, if there is some sort of policy stating how changes should look, I for one would be happy to publish a branch on bitbucket or my own hgweb instance in that format. Permitting text patches is a must, but requiring text patches when we have actual distributed branching is quite the anachronism. > > As example, see Tarek's pull/push of the distutils2 work. Those > checkin email will just rush by and not get a second or third > review. If the problem is the amount of emails per "patch" then, for god's sake, change the script that writes the emails to send a mail per push, instead of a mail per commit ! DVCSs allow one to have small, atomic (but, of course, inter-dependent) commits, and push them later. I myself feel that this property should be valued, not feared. > OTOH, I don't think that requiring to open a ticket on the tracker > for everything is needed either. > > Aside 1: Isn't it interesting that the more we actually think about > moving to Mercurial, the more we find that the existing Subversion > model of working is actually a very workable model for a large > open source project ?! It's all a question of how changes are reviewed and synchronised. Of course, the Python subversion model works, no question. The specific approach "turn every commit into an email for proof reading" appears to work well with it. It may not work as well with hg. It may work better if you s/commit/push/ instead of s/commit/changeset/. Other projects work in a more distributed fashion, with developers' private repositories, changes being reviewed before pulling/merging. Linux is, of course, a prominent example. If this approach is for Python, I don't know. I doubt it, at least for the time being. But a suitable workflow will surely be found. Ah well, we'll see what happens. Thomas From mal at egenix.com Mon Jul 5 12:11:06 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Mon, 05 Jul 2010 12:11:06 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <4C31A901.2000800@jollans.com> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <4C31A901.2000800@jollans.com> Message-ID: <4C31AFBA.9080800@egenix.com> Thomas Jollans wrote: > On 07/05/2010 09:30 AM, M.-A. Lemburg wrote: >> Thomas Jollans wrote: >>> On 07/04/2010 03:53 PM, Tarek Ziad? wrote: >>>> >>>> Do you have an easy way to perform this cleanup ? Could you propose a >>>> process here ? >>>> >>>> I am bit skeptical that contributors will do this, whereas a patch >>>> policy makes sure we don't have to deal with this, and avoid asking >>>> people to have a high mercurial-fu. I am also skeptical that >>>> contributors are willing to digg into a clone to get what they want >>>> and/or check that it's fine to pull. >>>> >>>> It seems to me that patches are the universal format to propose a >>>> change and are easy to produce and consume. Contributors can use any >>>> process they want to create it, even without using mercurial. >>> >>> There's no reason to force those of us capable of producing clean hg >>> branches back into the world of patches. I can see why you'd want to be >>> able to say "no, this repo is a mess. Submit something presentable, like >>> a patch." Some "how to contribute" document might recommend using mq, >>> but it shouldn't be a requirement - pulling comes naturally with DVCS. >>> Python should use it. >>> >>> Accept patches - sure - not everyone uses mercurial. Require patches - >>> please don't! >> >> I'm with Tarek here: the only way for core developers to be able to >> review checkins on the checkins list is by looking at the patches >> that go in. 
>>
>> Having to look at 10+ checkin emails for a single "patch" will
>> break this review process - no-one will be able to follow what
>> a particular pulled set of changes will do in the end, compared
>> to what we had in the repo before the pull. As a result, the
>> review process will no longer be possible.
>
> If the problem is the amount of changesets per "patch", then it has to
> be the responsibility of the person committing - be that a core dev or
> an external contributor - to make sure it's only a single changeset.
> OTOH, I don't think being that strict about it is a good idea - in many
> cases, having a handful of changesets is, IMHO, better, with Mercurial.
>
> Either way, if there is some sort of policy stating how changes should
> look, I for one would be happy to publish a branch on bitbucket or my
> own hgweb instance in that format. Permitting text patches is a must,
> but requiring text patches when we have actual distributed branching is
> quite the anachronism.

You need those patches anyway, since that's how we review things on the issue tracker.

The point I wanted to make was that (at least some of) the core devs do monitor the checkins list for new code and/or changes to existing code going in. This would no longer reasonably work if you start pushing revisions of patches down the list as well.

The history of those patches is not all that interesting to Python developers. It's the final outcome that makes the difference.

>> As an example, see Tarek's pull/push of the distutils2 work. Those
>> checkin emails will just rush by and not get a second or third
>> review.
>
> If the problem is the amount of emails per "patch" then, for god's sake,
> change the script that writes the emails to send a mail per push,
> instead of a mail per commit !
>
> DVCSs allow one to have small, atomic (but, of course, inter-dependent)
> commits, and push them later. I myself feel that this property should be
> valued, not feared.
This is not a matter of receiving the patch in 10+ emails, or lumping everything into one email. I simply don't see any benefit in having to follow the path of development of a patch. Much to the contrary: it only adds noise that distracts from the important bits. The discussion of a patch is recorded on the issue tracker anyway and in a form that is more easily comprehensible than a set of checkin messages. >> OTOH, I don't think that requiring to open a ticket on the tracker >> for everything is needed either. >> >> Aside 1: Isn't it interesting that the more we actually think about >> moving to Mercurial, the more we find that the existing Subversion >> model of working is actually a very workable model for a large >> open source project ?! > > It's all a question of how changes are reviewed and synchronised. Of > course, the Python subversion model works, no question. The specific > approach "turn every commit into an email for proof reading" appears to > work well with it. It may not work as well with hg. It may work better > if you s/commit/push/ instead of s/commit/changeset/. Other projects > work in a more distributed fashion, with developers' private > repositories, changes being reviewed before pulling/merging. Linux is, > of course, a prominent example. If this approach is for Python, I don't > know. I doubt it, at least for the time being. But a suitable workflow > will surely be found. > > > Ah well, we'll see what happens. Certainly. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 05 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 13 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From p.f.moore at gmail.com Mon Jul 5 14:20:43 2010 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 5 Jul 2010 13:20:43 +0100 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <4C31AFBA.9080800@egenix.com> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <4C31A901.2000800@jollans.com> <4C31AFBA.9080800@egenix.com> Message-ID: On 5 July 2010 11:11, M.-A. Lemburg wrote: > The point I wanted to make was that (at least some of) the core > devs do monitor the checkins list for new code and/or changes > to existing code going in. This would no longer reasonably > work, if you start pushing revisions of patches down the list > as well. I agree entirely that commits should be made up of "completed" patches, not of "work in progress" (patch 2 fixing a badly named variable in patch 1, etc). But there may be merit in breaking a large patch into a series of self-contained, incremental changes - which *can* be reviewed independently, but which make sense as a group. For example, one patch that introduces set literals, a second which updates the standard library code to use them. As a more radical possibility, a patch could be broken up into 3, one with the code changes, one with the tests and one with the documentation. That may be less acceptable, although it does allow for the possibility of someone with little C experience to contribute by reviewing the docs and tests without having to worry about the code. Ultimately, it's for the core devs to decide, though. Paul.
From dirkjan at ochtman.nl Mon Jul 5 15:46:39 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 05 Jul 2010 15:46:39 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <4C31AFBA.9080800@egenix.com> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <4C31A901.2000800@jollans.com> <4C31AFBA.9080800@egenix.com> Message-ID: On 2010-07-05 12:11, M.-A. Lemburg wrote: > The point I wanted to make was that (at least some of) the core > devs do monitor the checkins list for new code and/or changes > to existing code going in. This would no longer reasonably > work, if you start pushing revisions of patches down the list > as well. That was not what I meant at all. You don't send different patch revisions, or incremental improvements to a single change into a single repository. You send in chunks of changes that can stand on their own (for example in the test suite), instead of a single large patch that's much harder to review, which contains everything needed to fix a single issue. Cheers, Dirkjan From tjreedy at udel.edu Mon Jul 5 21:20:41 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 05 Jul 2010 15:20:41 -0400 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <4C31A901.2000800@jollans.com> <4C31AFBA.9080800@egenix.com> Message-ID: On 7/5/2010 8:20 AM, Paul Moore wrote: > On 5 July 2010 11:11, M.-A. Lemburg wrote: >> The point I wanted to make was that (at least some of) the core >> devs do monitor the checkins list for new code and/or changes >> to existing code going in. This would no longer reasonably >> work, if you start pushing revisions of patches down the list >> as well.
> > I agree entirely that commits should be made up of "completed" > patches, not of "work in progress" (patch 2 fixing a badly named > variable in patch 1, etc). > > But there may be merit in breaking a large patch into a series of > self-contained, incremental changes - which *can* be reviewed > independently, but which make sense as a group. For example, one patch > that introduces set literals, a second which updates the standard > library code to use them. Devs have occasionally asked a submitter of a large patch to split it into reviewable pieces. But that should be a special-case decision of a committer/reviewer. > As a more radical possibility, a patch could > be broken up into 3, one with the code changes, one with the tests and > one with the documentation. That may be less acceptable, although it > does allow for the possibility of someone with little C experience to > contribute by reviewing the docs and tests without having to worry > about the code. I do not see that as being so useful. Patches have a section for each file and I have no trouble not reading a file section. Part of review is checking that doc and code changes match. Also, the test and code patches have to be applied together. -- Terry Jan Reedy From brett at python.org Mon Jul 5 22:11:21 2010 From: brett at python.org (Brett Cannon) Date: Mon, 5 Jul 2010 13:11:21 -0700 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <4C318A2E.1070303@egenix.com> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> Message-ID: On Mon, Jul 5, 2010 at 00:30, M.-A. Lemburg wrote: [SNIP] > Aside 1: Isn't it interesting that the more we actually think about > moving to Mercurial, the more we find that the existing Subversion > model of working is actually a very workable model for a large > open source project ?! Not really. The current system works and is understood without retraining.
The switch to hg has never been about tweaking the workflow of committers, but that of contributors. From ncoghlan at gmail.com Mon Jul 5 22:41:19 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Jul 2010 06:41:19 +1000 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> Message-ID: On Tue, Jul 6, 2010 at 6:11 AM, Brett Cannon wrote: > On Mon, Jul 5, 2010 at 00:30, M.-A. Lemburg wrote: > [SNIP] >> Aside 1: Isn't it interesting that the more we actually think about >> moving to Mercurial, the more we find that the existing Subversion >> model of working is actually a very workable model for a large >> open source project ?! > > Not really. The current system works and is understood without > retraining. The switch to hg has never been about tweaking the > workflow of committers, but that of contributors. Although, as with the CVS to SVN transition, the workflows of committers will likely change over time as we become more adept at exploiting the more powerful tool. I liked Joel Spolsky's observation that in moving from a centralised VCS to a distributed VCS, the key idea to wrap your head around is the shift from managing file (and repository) revisions to coherent changesets. I suspect that's something that can only happen properly by using a DVCS for a while. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Jul 6 02:15:17 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Jul 2010 02:15:17 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> Message-ID: <20100706021517.2a6c7aad@pitrou.net> On Tue, 6 Jul 2010 06:41:19 +1000 Nick Coghlan wrote: > > Although, as with the CVS to SVN transmissions, the workflows of > committers will likely change over time as we become more adept at > exploiting the more powerful tool. > > I liked Joel Spolsky's observation that in moving from a centralised > VCS to a distributed VCS, the key idea to wrap your head around is the > shift from managing file (and repository) revisions to coherent > changesets. I suspect Spolsky has skipped on SVN then, because SVN already allows for coherent changesets (that's how we use it most of the time anyway). Regards Antoine. From stephen at xemacs.org Tue Jul 6 07:07:15 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 06 Jul 2010 14:07:15 +0900 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <20100706021517.2a6c7aad@pitrou.net> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <20100706021517.2a6c7aad@pitrou.net> Message-ID: <87pqz19p18.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > SVN already allows for coherent changesets (that's how we use it > most of the time anyway). Indeed. That's one of the things I really like about this project. From stephen at xemacs.org Tue Jul 6 07:16:33 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 06 Jul 2010 14:16:33 +0900 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C3090CD.7020909@ochtman.nl> <20100704172656.1dae21c1@pitrou.net> Message-ID: <87ocel9olq.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > Am 04.07.2010 17:26, schrieb Antoine Pitrou: > > On Sun, 04 Jul 2010 15:46:53 +0200 > > Dirkjan Ochtman wrote: > >> > >> Fourth, one-patch-per-issue is too restrictive. Small commits are useful > >> because they're way easier to review. Concatenate several small commits > >> leading up to a single issue fix into a single patch and it gets much > >> harder to read. > > > > I don't agree with that. The commits obviously won't be independent > > because they will be motivated by each other (or even dependent on each > > other), therefore you have to remember what the other commits do when > > reviewing one of them. What's more, when reading "hg log" months or > > years later, it is hard to make sense of a single commit because you > > don't really know what issue it was meant to contribute to fix. > > > > I know that's how Mercurial devs do things, but I don't really like > > it. > > I think the best of both worlds is to encourage contributors to send > more complicated patches in a series of easy-to-review steps, but when > committing to Python, make one changeset out of them. I don't see how this addresses Antoine's problem of connecting commits to issues at all. Some ways to address it are (1) require issue numbers in log messages, if there is an applicable issue (for non-committers, there should be a patch issue on the tracker, right?) and (2) require that the commits addressing a single issue be done on a single separate branch, then merged (which doesn't connect issues to commits, but does connect a series of commits). I don't really see why commits should take place in a lump, either. That makes bisecting less accurate, for one thing. 
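Stephen's aside about bisection accuracy can be made concrete with a toy sketch (plain Python, not Mercurial's actual bisect machinery; the helper name `first_bad` and the synthetic histories are made up for illustration). Bisecting a linear history finds the first bad commit in O(log n) probes, but if k commits were squashed into one, the answer can only be localised to a k-commit lump:

```python
def first_bad(history, is_bad):
    # Binary search, assuming is_bad is monotone: once a commit is bad,
    # every later commit is bad too (the usual bisect precondition).
    lo, hi = 0, len(history)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Ten fine-grained commits; the bug arrived in commit 6.
fine = list(range(10))
assert first_bad(fine, lambda c: c >= 6) == 6

# The same work squashed into pairs: bisection can only narrow the bug
# down to squashed commit 3, i.e. somewhere in original commits 6-7.
squashed = [(i, i + 1) for i in range(0, 10, 2)]
assert first_bad(squashed, lambda pair: pair[1] >= 6) == 3
```

The coarser the commits, the coarser the answer a bisect run can give, which is the accuracy loss Stephen is pointing at.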
Nor does it help with review; the review is already done by the time the commit takes place, no? OTOH, people who have a specific interest and want to review ex post are often going to want the bite-size patches, just as the original reviewer did, no? From ncoghlan at gmail.com Tue Jul 6 12:56:00 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Jul 2010 20:56:00 +1000 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <20100706021517.2a6c7aad@pitrou.net> References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <20100706021517.2a6c7aad@pitrou.net> Message-ID: On Tue, Jul 6, 2010 at 10:15 AM, Antoine Pitrou wrote: > On Tue, 6 Jul 2010 06:41:19 +1000 > Nick Coghlan wrote: >> >> Although, as with the CVS to SVN transition, the workflows of >> committers will likely change over time as we become more adept at >> exploiting the more powerful tool. >> >> I liked Joel Spolsky's observation that in moving from a centralised >> VCS to a distributed VCS, the key idea to wrap your head around is the >> shift from managing file (and repository) revisions to coherent >> changesets. > > I suspect Spolsky has skipped on SVN then, because SVN already allows > for coherent changesets (that's how we use it most of the time anyway). No it doesn't. It has atomic commits (as do many other centralised version control systems), but it still only manages file revisions. The mental conversion Spolsky was talking about was specifically from SVN to Hg, the same one we're looking at. A DVCS isn't written in terms of file revisions the way SVN is, it's written in terms of a directed acyclic graph of changesets. If anyone wants to see what he actually wrote, rather than my hacked up paraphrase of it, it's the last programming article he did for Joel on Software: http://www.joelonsoftware.com/items/2010/03/17.html Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Tue Jul 6 13:49:28 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 06 Jul 2010 20:49:28 +0900 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> <20100706021517.2a6c7aad@pitrou.net> Message-ID: <87d3v0akzb.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > The mental conversion Spolsky was talking about was specifically from > SVN to Hg, the same one we're looking at. A DVCS isn't written in > terms of file revisions the way SVN is, it's written in terms of a > directed acyclic graph of changesets. Sure. But I think Antoine's right. So is the Python workflow. At any given time, you've got dozens of patches in active development in people's workspaces and on the tracker. As they get baked, you pull in a coherent set and commit it. Here's what Joel says: In Subversion, you might think, "bring my version up to date with the main version" or "go back to the previous version." In Mercurial, you think, "get me Jacob's change set" or "let's just forget that change set." While it's certainly true that to work with Python's Subversion repo you need to translate to terms of a fairly linear progression of versions, I don't see people thinking that way about the workflow. I think people do expect commits to the svn repo to be coherent, and by and large they are. I personally expect this migration to make a big difference to the core committers, because it gives them that much more flexibility. Casual committers and pull-only tester types may have some trouble adjusting, but I really don't think it will be that bad. 
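Nick's distinction between per-file revisions and a DAG of changesets can be sketched with a toy model (the `Changeset` class and names here are hypothetical, not hg's internals): each changeset records only its parent changesets, and a merge is simply a changeset with two parents.

```python
class Changeset:
    """Toy changeset: a node in a history DAG, knowing only its parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def ancestors(self):
        # Walk the parent links to collect every reachable ancestor name.
        seen = set()
        stack = list(self.parents)
        while stack:
            cs = stack.pop()
            if cs.name not in seen:
                seen.add(cs.name)
                stack.extend(cs.parents)
        return seen

root = Changeset("root")
feature = Changeset("feature-work", [root])
mainline = Changeset("mainline-fix", [root])
merge = Changeset("merge", [feature, mainline])  # a merge: two parents

assert merge.ancestors() == {"root", "feature-work", "mainline-fix"}
```

Nothing in the model mentions file revision numbers at all; history is just the graph, which is the mental shift Spolsky (and Nick) describe.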
From daniel at stutzbachenterprises.com Tue Jul 6 16:58:08 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 6 Jul 2010 09:58:08 -0500 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C308B38.2050800@ochtman.nl> <4C30B195.1080804@jollans.com> <4C318A2E.1070303@egenix.com> Message-ID: On Mon, Jul 5, 2010 at 3:11 PM, Brett Cannon wrote: > The switch to hg has never been about tweaking the > workflow of committers, but that of contributors. > I've always thought of it as tweaking the workflow of collaboration. As an individual contributor and non-committer, the server switch isn't going to impact my workflow much. I use a DVCS locally to manage my work and then I submit a patch on the bug tracker. After the server switch, I'll do the same. A DVCS server will help a lot when I'm collaborating on a patch with others. As a concrete example, a few months ago I wrote a patch to speed up math.factorial (issue8692). Alexander Belopolsky and Mark Dickinson found a few corner-case flaws, suggested code-cleanup improvements, and some algorithmic alternatives. We went back and forth with several variations of patches before settling on a final patch. When Python is natively hosted in Mercurial, then the tools can explicitly track the relationship between all of the experimental patches. When just pushing patch files around, it's pretty hard to see that factorial-precompute-partials.patch is based on factorial-no-recursion.patch if you haven't been following the issue closely. It's also hard to examine the incremental changes between the two, which makes it hard to review an updated patch after reviewing the original. All of that would be a lot easier if I had started my patch as a clone of py3k on bitbucket. At the end of the process, the final committer can consolidate the changes into a single patch to keep the core repository clean. -- Daniel Stutzbach, Ph.D. 
President, Stutzbach Enterprises, LLC From g.brandl at gmx.net Wed Jul 7 19:31:54 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 07 Jul 2010 19:31:54 +0200 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: <87ocel9olq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4C3090CD.7020909@ochtman.nl> <20100704172656.1dae21c1@pitrou.net> <87ocel9olq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Am 06.07.2010 07:16, schrieb Stephen J. Turnbull: > Georg Brandl writes: > > Am 04.07.2010 17:26, schrieb Antoine Pitrou: > > > On Sun, 04 Jul 2010 15:46:53 +0200 > > > Dirkjan Ochtman wrote: > > >> > > >> Fourth, one-patch-per-issue is too restrictive. Small commits are useful > > >> because they're way easier to review. Concatenate several small commits > > >> leading up to a single issue fix into a single patch and it gets much > > >> harder to read. > > > > > > I don't agree with that. The commits obviously won't be independent > > > because they will be motivated by each other (or even dependent on each > > > other), therefore you have to remember what the other commits do when > > > reviewing one of them. What's more, when reading "hg log" months or > > > years later, it is hard to make sense of a single commit because you > > > don't really know what issue it was meant to contribute to fix. > > > > > > I know that's how Mercurial devs do things, but I don't really like > > > it. > > > > I think the best of both worlds is to encourage contributors to send > > more complicated patches in a series of easy-to-review steps, but when > > committing to Python, make one changeset out of them. > > I don't see how this addresses Antoine's problem of connecting commits > to issues at all. I wasn't addressing Antoine's original problem, rather his reply to Dirkjan. Georg From stephen at xemacs.org Thu Jul 8 01:22:00 2010 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 08 Jul 2010 08:22:00 +0900 Subject: [Python-ideas] Using only patches for pulling changes in hg.python.org In-Reply-To: References: <4C3090CD.7020909@ochtman.nl> <20100704172656.1dae21c1@pitrou.net> <87ocel9olq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87lj9mkhd3.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > Am 06.07.2010 07:16, schrieb Stephen J. Turnbull: > > Georg Brandl writes: > > > Am 04.07.2010 17:26, schrieb Antoine Pitrou: > > > > On Sun, 04 Jul 2010 15:46:53 +0200 > > > > Dirkjan Ochtman wrote: > > > >> > > > >> Fourth, one-patch-per-issue is too restrictive. Small > > > >> commits are useful because they're way easier to > > > >> review. Concatenate several small commits leading up to a > > > >> single issue fix into a single patch and it gets much > > > >> harder to read. > > > > > > > > I don't agree with that. The commits obviously won't be > > > > independent because they will be motivated by each other (or > > > > even dependent on each other), therefore you have to > > > > remember what the other commits do when reviewing one of > > > > them. What's more, when reading "hg log" months or years > > > > later, it is hard to make sense of a single commit because > > > > you don't really know what issue it was meant to contribute > > > > to fix. > > > > > > > > I know that's how Mercurial devs do things, but I don't > > > > really like it. > > > > > > I think the best of both worlds is to encourage contributors > > > to send more complicated patches in a series of easy-to-review > > > steps, but when committing to Python, make one changeset out > > > of them. > > > > I don't see how this addresses Antoine's problem of connecting > > commits to issues at all. > > I wasn't addressing Antoine's original problem, rather his reply to > Dirkjan. Huh? Are you referring to something other than the part of his post that you quoted? 
Antoine writes "you have to remember what the other commits do when reviewing them" and "it is hard to make sense of a single commit [in a series] because you don't know what issue it was meant to fix". I admit I'm not really sure what his issue is. It seems to me that connecting commits is what a feature branch (in conjunction with rebase) is designed to achieve. If you don't like rebase, you can either work fast enough that your whole sequence is done before the mainline moves on significantly, or you can refrain from updating until done (and have a potentially messy merge), or you can use MQ (which is really just a way of rebasing without the shame ;-). I'll have to test it, but AFAIK in all of the above strategies, as long as you don't push to the public repo until done, the logs of the commits on the feature branch should all be adjacent in the natural order of hg log. That seems to me to be the optimal strategy, in combination with reading long parts of history in a graphical DAG browser. Of course, that assumes that random pieces of the fix aren't dispersed among commits. In that case the logs will still be hard to read and understand, as will the diffs. People who like to commit early and often should indeed be encouraged to edit their feature branches to make each individual commit make sense to reviewers. (MQ helps to address this, as does Bazaar's loom feature or StGit.) Feature branches don't automatically organize commits in an intelligible way, that requires an intelligence driving the process. But they do make it possible. Once you have feature branches, then there's a question of the external issue. Here reviewers should pay attention to the log message, and make sure it describes the problem well, and includes cross references to any documentation (tracker issue or ML thread). But that's no different from the current process. 
I think that in many cases the process of coming up with coherent changesets that are reviewable will indeed result in a single commit to address the whole issue. But there will also be multicommit patterns that make sense, such as "refactor API and update current clients -> use new feature in a few places". The thing to remember is that DVCSes not only record a frozen view of history accurately, but can also be used to flexibly reorganize the presentation of that history "as it should have happened". I think of these workflows as opportunities to *improve* the quality of information presented by the history. But they aren't mandated by adopting hg. Contributors and reviewers who are satisfied with the current process should continue to refine a set of changes to a single commit. hg is certainly flexible enough to allow that, with several different workflows. And Antoine's worries (AIUI) are not unfounded. Eg, we should not allow people to be lazy and submit a feature branch with changes randomly assigned to different commits and log messages like "Lunch time! commit progress to date." But that's a social problem; I think that conventions will quickly evolve *from* the one patch per issue workflow to a *well-organized* feature branch per issue (as appropriate) because python-dev reviewers will demand it. From jacobidiego at gmail.com Sun Jul 11 19:58:43 2010 From: jacobidiego at gmail.com (Diego Jacobi) Date: Sun, 11 Jul 2010 14:58:43 -0300 Subject: [Python-ideas] pop multiple elements of a list at once Message-ID: Hi. As recommended here: http://bugs.python.org/issue9218 I am posting this to this list. I am currently working with a buffer on a USB device and pyusb. So when I read from the buffer of an endpoint, I get an array.Array() list. I handle this chunk of data with a thread to send and receive the information that I need.
In this thread, I load a list with all the information that is read from the USB device, and another layer will pop this information from the thread's buffer. The thing I found is that, to pop a variable chunk of data from this buffer without copying it and deleting the elements, I have to pop one element at a time. def get_chunk(self, size): for x in range(size): yield self.recv_buffer.pop() I guess that it would be improved if I could just pop a defined number of elements, like this: pop self.recv_buffer[:-size] or self.recv_buffer.pop(,-size) That would be... "pop from (the last element minus size) to (the last element)" in that way there is only one memory transaction. The new list (or maybe a tuple) points to the old memory address and the recv_buffer is advanced to a new address. Data is not moved. Note that I like the idea of using "pop" as the "del" operator for lists, but I am aware that this would not be backward compatible. Thanks. Diego From brett at python.org Sun Jul 11 20:47:53 2010 From: brett at python.org (Brett Cannon) Date: Sun, 11 Jul 2010 11:47:53 -0700 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: Message-ID: On Sun, Jul 11, 2010 at 10:58, Diego Jacobi wrote: > Hi. > As recommended here: http://bugs.python.org/issue9218 > I am posting this to this list. > > > > I am currently working with a buffer on a USB device and pyusb. > So when I read from the buffer of an endpoint, I get an array.Array() list. > I handle this chunk of data with a thread to send and receive the > information that I need. > In this thread, I load a list with all the information that is read > from the USB device, and another layer will pop this information from > the thread's buffer. > > The thing I found is that, to pop a variable chunk of data from this > buffer without copying it and deleting the elements, I have to pop one > element at a time. > > def get_chunk(self, size): > for x in range(size): > yield self.recv_buffer.pop() > > I guess that it would be improved if I could just pop a defined number > of elements, like this: > > pop self.recv_buffer[:-size] > or > self.recv_buffer.pop(,-size) > > That would be... "pop from (the last element minus size) to (the last element)" > in that way there is only one memory transaction. > The new list (or maybe a tuple) points to the old memory address and > the recv_buffer is advanced to a new address. Data is not moved. Why can't you do ``del self.recv_buffer[-size:]``? > > Note that I like the idea of using "pop" as the "del" operator for > lists, but I am aware that this would not be backward compatible. Too specialized, so that will never fly. -Brett From ncoghlan at gmail.com Sun Jul 11 23:18:49 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 12 Jul 2010 07:18:49 +1000 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: Message-ID: I think you overestimate how standardised we could make this across all platforms and data structures. Under the hood, any such expansion to the .pop API would almost certainly be defined as equivalent to: def pop(self, index): result = self[index] del self[index] return result such that slice objects could be passed in as well as integers (or integer equivalents). (Currently pop on builtin objects rejects slice objects, as it only works with integers) In the meantime, if you want to manipulate memory while minimising copying, then the 2.7 memoryview object may be for you (assuming you can switch to the later version). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From cmjohnson.mailinglist at gmail.com Mon Jul 12 05:39:42 2010 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Sun, 11 Jul 2010 17:39:42 -1000 Subject: [Python-ideas] explicitation lines in python ?
In-Reply-To: References: <4C24FEAF.4030304@gmail.com> <4C26F710.4030902@gmail.com> Message-ID: On Sun, Jun 27, 2010 at 8:25 PM, Nick Coghlan wrote: > The availability of "nonlocal" binding semantics also makes the > semantics much easier to define than they were in those previous > discussions (the lack of clear semantics for name binding statements > with an attached local namespace was the major factor blocking > creation of a reference implementation for this proposal back then). > > For example: > > c = sqrt(a*a + b*b) where: > a = retrieve_a() > b = retrieve_b() > > could translate to something like: > > def _anon(): # *(see below) > nonlocal c > a = retrieve_a() > b = retrieve_b() > c = sqrt(a*a + b*b) > _anon() > > *(unlike Python code, the compiler can make truly anonymous functions > by storing them solely on the VM stack. It already does this when > executing class definitions): I like this idea, but I would tweak it slightly. Maybe we should say EXPRESSION where: BLOCK is equivalent to def _(): BLOCK return EXPRESSION _() That way, c = a where: a = 7 would be equivalent to def _(): a = 7 return a c = _() One advantage of this equivalence is it would make it easier to work around a longstanding scoping gotcha. A naïve coder might expect this code to print out numbers 0 to 4: >>> fs = [] >>> for n in range(5): ... def f(): ... print(n) ... fs.append(f) ... >>> [f() for f in fs] 4 4 4 4 4 [None, None, None, None, None] I think we all have enough experience to know this isn't a totally unrealistic scenario. I personally stumbled into it when I was trying to create a class by looping through a set of method names. To get around it, one could use a where clause like so: fs = [] for n in range(5): fs.append(f) where: shadow = n def f(): print(shadow) This would print out 0 to 4 as expected and be equivalent to >>> fs = [] >>> for n in range(5): ... def _(): ... shadow = n ... def f(): ... print(shadow) ... fs.append(f) ... _() ...
>>> [f() for f in fs] 0 1 2 3 4 [None, None, None, None, None] I think a where-clause with def-like namespace semantics would be a positive addition to Python, once the moratorium is up. -- Carl Johnson From tjreedy at udel.edu Mon Jul 12 07:29:01 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 12 Jul 2010 01:29:01 -0400 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: Message-ID: On 7/11/2010 1:58 PM, Diego Jacobi wrote: > The thing I found is that, to pop a variable chunk of data from this > buffer without copying it and deleting the elements, I have to pop one > element at a time. In CPython, popping copies a reference and then deletes it from the list. The item popped is not copied. It is a convenience, which I proposed, but not a necessity. You can easily write a function that returns a slice after deleting it. def pop_slice(lis, n): tem = lis[-n:] del lis[-n:] return tem I expect this to run faster than popping more than a few items one at a time. -- Terry Jan Reedy From guido at python.org Mon Jul 12 08:35:13 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 12 Jul 2010 08:35:13 +0200 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: Message-ID: On Sun, Jul 11, 2010 at 7:58 PM, Diego Jacobi wrote: > I guess that it would be improved if I could just pop a defined number > of elements, like this: > > pop self.recv_buffer[:-size] > or > self.recv_buffer.pop(,-size) > > That would be... "pop from (the last element minus size) to (the last element)" > in that way there is only one memory transaction. > The new list (or maybe a tuple) points to the old memory address and > the recv_buffer is advanced to a new address. Data is not moved. I think you misunderstand the implementation of lists (and the underlying malloc()). You can't break the memory used for the list elements into two pieces and give new ownership to a (leading) section of it.
However you also seem to be worrying about "copying" too much -- the only things that get copied are the *pointers* to the objects popped off the stack, which is very cheap compared to the rest of the operation. It is true that to pop off a whole slice there is a more efficient way than calling pop() repeatedly -- but there's no need for a new primitive operation, as it can already be done by copying and then deleting the slice (again, the copying only copies the pointers). Try reading up on Python's memory model for objects, it will be quite enlightening. -- --Guido van Rossum (python.org/~guido) From cmjohnson.mailinglist at gmail.com Mon Jul 12 08:50:28 2010 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Sun, 11 Jul 2010 20:50:28 -1000 Subject: [Python-ideas] explicitation lines in python ? In-Reply-To: References: <4C24FEAF.4030304@gmail.com> <4C26F710.4030902@gmail.com> Message-ID: One more quick thought about the advantage of a where-clause. Often times, there is thought of creating the equivalent of Haskell-style pattern matching using decorators. For example PJE has worked on creating "generic functions." One problem with using a decorator for this is that for use cases more complicated than just matching on type, the matcher itself needs to be a function that looks at the arguments then returns true or false based on whether they match a pattern. So, a proper decorator would need to take *two* functions, one to do the matching and one to actually be the body of the function. You can do this to some extent with lambdas or decorating and redecorating, but it quickly becomes a little tedious. With a where-clause one might instead write: fib = base(cond, action) where: def cond(n): return n in (0, 1) def action(n): return 1 fib.add_match(cond, action) where: def cond(n): return isinstance(n, int) and n > 1 def action(n): return n + fib(n - 1) And also for the property decorator. 
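The `base`/`add_match` names in Carl's example are hypothetical, but a minimal dispatcher of that shape is easy to sketch in plain Python today, without the where-clause sugar (using the standard Fibonacci recurrence for the recursive body, rather than the `n + fib(n - 1)` shorthand above):

```python
def base(cond, action):
    """Build a dispatcher from an initial (matcher, body) pair.

    Hypothetical API, sketched only to illustrate the shape the
    where-clause example above assumes.
    """
    matchers = [(cond, action)]

    def dispatch(*args):
        # Try each registered matcher in order; run the first body
        # whose condition accepts the arguments.
        for c, a in matchers:
            if c(*args):
                return a(*args)
        raise TypeError("no matching pattern for %r" % (args,))

    dispatch.add_match = lambda c, a: matchers.append((c, a))
    return dispatch

fib = base(lambda n: n in (0, 1), lambda n: 1)
fib.add_match(lambda n: isinstance(n, int) and n > 1,
              lambda n: fib(n - 1) + fib(n - 2))

print(fib(5))  # -> 8
```

The where-clause version in the email would simply let the `cond`/`action` pairs be written as named `def` statements instead of lambdas.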
GvR made up a nice way of redecorating with properties, so this is a moot point now, but if we had had a where-clause before that, we could have instead written:

myprop = property(getter, setter, deleter) where:
    def getter(self):
        etc. etc.

OK, that's as much advocacy as I feel like doing. See you all again in 6 months, when something like this is proposed again. ;-)

-- Carl

From ncoghlan at gmail.com Mon Jul 12 14:45:10 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 12 Jul 2010 22:45:10 +1000 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: Message-ID:

On Mon, Jul 12, 2010 at 4:35 PM, Guido van Rossum wrote:
> I think you misunderstand the implementation of lists (and the
> underlying malloc()). You can't break the memory used for the list
> elements into two pieces and give new ownership to a (leading) section
> of it. However you also seem to be worrying about "copying" too much
> -- the only things that get copied are the *pointers* to the objects
> popped off the stack, which is very cheap compared to the rest of the
> operation. It is true that to pop off a whole slice there is a more
> efficient way than calling pop() repeatedly -- but there's no need for
> a new primitive operation, as it can already be done by copying and
> then deleting the slice (again, the copying only copies the pointers).

Note that the original poster was apparently talking about array.array() rather than an actual list (at least, that's the way I interpreted the phrase "array,Array() list"). In that context, the desire to avoid copying when invoking pop() makes a lot more sense than it does when using a builtin list. I agree that the suggestion of reassigning ownership of a chunk of an array is based on a misunderstanding of the way memory allocation works at the pymalloc and OS levels though.
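While ownership of part of an allocation can't be reassigned, a zero-copy window over an array's storage is available via memoryview. A quick illustration (Python 3 shown; under 2.7, indexing such a view returns a byte string rather than an int):

```python
from array import array

buf = array('b', range(20))   # 20 signed bytes in contiguous storage
view = memoryview(buf)        # a window onto buf's memory -- no copy

packet = view[:5]             # slicing the view is also copy-free
rest = view[5:]

buf[0] = 99                   # mutate the underlying array...
print(packet[0], len(rest))   # -> 99 15: the view sees the change
```

Because `packet` shares storage with `buf`, mutating the array is visible through the view, which is exactly the "window onto memory owned by another object" behaviour described below.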
For the record, neither pymalloc nor the OS supports breaking a chunk of already allocated memory in two that way - you need some master object to maintain control of it, and then use other pointers to look at subsections. Since memoryview objects in 3.x and 2.7 are designed specifically to provide a window onto a chunk of memory owned by another object (such as the storage underlying an array object) without copying, it seems like that is the kind of thing the original poster is actually looking for.

(That said, supporting slice objects in pop() still doesn't strike me as an insane idea, although I'd probably want to see some use cases before we went to the hassle of adding it).

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net Mon Jul 12 14:59:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 12 Jul 2010 14:59:56 +0200 Subject: [Python-ideas] pop multiple elements of a list at once References: Message-ID: <20100712145956.28695535@pitrou.net>

On Mon, 12 Jul 2010 22:45:10 +1000 Nick Coghlan wrote:
>
> For the record, neither pymalloc nor the OS supports breaking a chunk
> of already allocated memory in two that way - you need some master
> object to maintain control of it, and then use other pointers to look
> at subsections. Since memoryview objects in 3.x and 2.7 are designed
> specifically to provide a window onto a chunk of memory owned by
> another object (such as the storage underlying an array object)
> without copying, it seems like that is the kind of thing the original
> poster is actually looking for.

memoryviews don't provide a high-level view over their chunk of memory, though, only bytes-level. (they were specified to provide such a view, but it was never implemented)

From ncoghlan at gmail.com Mon Jul 12 15:06:27 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 12 Jul 2010 23:06:27 +1000 Subject: [Python-ideas] explicitation lines in python ?
In-Reply-To: References: <4C24FEAF.4030304@gmail.com> <4C26F710.4030902@gmail.com> Message-ID:

On Mon, Jul 12, 2010 at 1:39 PM, Carl M. Johnson wrote:
> I like this idea, but I would tweak it slightly. Maybe we should say
>
> EXPRESSION where:
>     BLOCK
>
> is equivalent to
>
> def _():
>     BLOCK
>     return EXPRESSION
> _()

Implement it that way (or find someone who can), then get back to me* :)

That said, my suggested semantics still have the desired effect in your use case, since your expression does not contain a name binding operation, so it makes no difference whether name binding would have been handled via a return value (your suggestion, which I tried and failed to implement last time) or via nonlocal name bindings (my suggestion this time around).

Cheers, Nick.

*P.S. There's a reason I stopped pushing this idea back then: the absolute nightmare that was trying to implement it without ready access to nonlocal variable definitions (trying to figure out what the return value should be and how it should be unpacked in the surrounding scope was seriously ugly). Using nonlocal semantics instead should make it relatively straightforward (fairly similar to a class definition in fact, although the compilation options for the nested code object will be different and there'll be a bit of additional dancing during the symbol pass to figure out any implicit nonlocal declarations for the inner scope).

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Jul 12 15:32:51 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 12 Jul 2010 23:32:51 +1000 Subject: [Python-ideas] explicitation lines in python ?
In-Reply-To: References: <4C24FEAF.4030304@gmail.com> <4C26F710.4030902@gmail.com> Message-ID: > That said, my suggested semantics still have the desired effect in > your use case, since your expression does not contain a name binding > operation, so it makes no difference whether name binding would have > been handled via a return value (your suggestion, which I tried and > failed to implement last time) or via nonlocal name bindings (my > suggestion this time around). Bleh, I just remembered why nonlocal semantics won't work for this use case: nonlocal only looks at function namespaces, so class and module namespaces don't count. That behaviour would be unacceptable for a where clause implementation. So this suggestion going anywhere post-moratorium is firstly dependent on someone figuring out how to properly split name binding operations across the two namespaces (such that the values are generated in the inner scope, but assigned in the outer scope). As an example of the kind of thing that actually makes this a nightmare: x = b[index] = value where: index = calc_target_index() value = calc_value() It turns out that name binding is only part of the problem though. Variable *lookup* actually shares one of the problems of nonlocal name binding: it skips over class scopes, so the inner scope can't see class level names. Generator expressions and most comprehensions (all bar 2.x list comprehensions) already have this problem - at class scope, only the outermost iterator can see names defined in the class body, since everything else is in a nested scope where name lookup skips over the class due to the name lookup semantics that were originally designed for method implementations (i.e. before we had things like generator expressions that implicitly created new scopes). 
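The class-scope comprehension quirk described above can be demonstrated directly in current Python 3 (a sketch: only the outermost iterable of a comprehension is evaluated in the class body, so any other reference to a class-level name from inside the implicit scope fails):

```python
class Demo:
    names = ['a', 'b', 'c']

    # Works: the outermost iterable is evaluated in the class scope
    # and passed into the comprehension's hidden function.
    upper = [n.upper() for n in names]

    # Fails: 'names' inside the expression is looked up from the
    # comprehension's nested scope, which skips the class body.
    try:
        pairs = [(n, names.index(n)) for n in names]
    except NameError as exc:
        pairs = str(exc)

print(Demo.upper)   # -> ['A', 'B', 'C']
print(Demo.pairs)   # a NameError message mentioning 'names'
```

This is precisely the lookup rule a where-clause implementation would have to work around for class bodies.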
It took a while for all these evil variable referencing semantic problems to come back to me, but they're the kind of thing that needs to be addressed before a where clause proposal can be taken seriously. As I noted in my last message, I *did* try to implement this years ago and I now remember that the only way I can see it working is to define a completely new means of compiling a code object such that variable lookup and nonlocal namebinding can "see" an immediately surrounding class scope (as well as outer function scopes) and still fall back to global semantics if the name is not found explicitly in the symbol table. I believe such an addition would actually be beneficial in other ways, as I personally consider the current name lookup quirks in generator expressions to be something of a wart and these new semantics for implicit scopes could potentially be used to fix that (although perhaps not until Py4k). However, adding such lookup semantics is a seriously non-trivial task (I've been working with the current compiler since I helped get it ready for inclusion in 2.5 back when it was still on the ast-compiler branch and I'm not sure where I would even start on a project like that. It wouldn't just be the compiler either, the VM itself would almost certainly need some changes). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Mon Jul 12 15:35:57 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 12 Jul 2010 23:35:57 +1000 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: <20100712145956.28695535@pitrou.net> References: <20100712145956.28695535@pitrou.net> Message-ID: On Mon, Jul 12, 2010 at 10:59 PM, Antoine Pitrou wrote: > memoryviews don't provide a high-level view over their chunk of memory, > though, only bytes-level. 
> (they were specified to provide such a view, but it was never
> implemented)

True, but the use case the original poster mentioned was for a bytes level view.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From jacobidiego at gmail.com Mon Jul 12 17:13:34 2010 From: jacobidiego at gmail.com (Diego Jacobi) Date: Mon, 12 Jul 2010 12:13:34 -0300 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: <20100712145956.28695535@pitrou.net> Message-ID:

Hi. I apologize if I am having difficulty explaining myself in English. Also, I just realised that I wasn't answering to the list, but directly to Brett Cannon. Sorry for that.

I also take into account that you are right, and that my concept of how memory is handled for lists here is very different from how I am used to working in firmware.

Anyway, the concept behind my idea comes from my understanding of a low-level array. I do understand that popping returns only a pointer to that element. I didn't understand that every element inside a list is also a pointer to a Python type, so the data is not copied, but the pointer is. The thing is that the elements in my list are just bytes (unsigned char), and I think that an array of this type of data is called immutable in Python, which means that I may not be using the right datatype.

Anyway, slice support on pop(), and/or the ability to skip elements in a for loop without restarting the iteration, would clear up a lot of code. If my scenario is not yet clear, I give some more details below.

When I think of how this will work in memory:

def pop_slice(lis, n):
    tem = lis[:-n]
    del lis[:-n]
    return tem

I think of "copying the elements of an array":

BYTE* pop_slice(BYTE* list, unsigned int n){
    BYTE* temp = malloc(n*sizeof(BYTE));
    int i;
    for(i=0 ; i < n ; i++) {
        temp[i] = list[i];
    }
    free(list, n);
    return temp;
}

Most Python books and tutorials clearly say that the operation L[start:end] copies the elements requested.
And being copy i understand the above behavior, which is less efficient than advancing the pointer. But i wanted to do (with "pop multiple bytes at once") is: typedef unsigned char BYTE; BYTE array[BIG_SIZE]; BYTE* incomming_buffer_pointer = &array[0]; BYTE* incomming_packet_pointer = &array[0]; BYTE* pop_slice(BYTE* list, unsigned int n){ BYTE* temp; temp = list; list = &list[n]; return temp; } .. incomming_packet_pointer = pop_slice( incomming_buffer_pointer , PACKET_SIZE) if (parse_packet_is_not_corrupt( incomming_packet_pointer ) ) parse_new_packet( incomming_packet_pointer ); else .... .. Thanks for analizing the idea. Jacobi Diego From ncoghlan at gmail.com Mon Jul 12 17:41:48 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 13 Jul 2010 01:41:48 +1000 Subject: [Python-ideas] pop multiple elements of a list at once In-Reply-To: References: <20100712145956.28695535@pitrou.net> Message-ID: On Tue, Jul 13, 2010 at 1:13 AM, Diego Jacobi wrote: > Hi. > I apologize if i am having difficulties to explain myself in English. > Also i just realise that i wasnt answering to the list, but directly > to Brett Cannon. Sorry for that. > > Also i am taking into account that you are right and my concept of how > memory is handled here for lists is way too different as how i am more > used to work in firmwares. > > Anyway, the concept of my idea comes of my understanding of an > low-leveled array. For builtin lists and most other Python container types, think more along the lines of a list of pointers rather than a list of numbers. However, for the behaviour of array.array (rather than the builtin list), your understanding of the cost of slicing was actually pretty close to correct (since storing actual values rather than pointers is the way array.array gains its comparative memory efficiency). Python doesn't actually excel at memory efficiency when manipulating large data sets using conventional syntax (e.g. slicing). 
The old buffer objects were an initial attempt at providing memory efficient access to segments of data buffers, while the recently added memoryview objects are a more sophisticated (and safer) approach to the same idea. The NumPy extension, on the other hand, is able to very efficiently provide multiple views of the same region of memory without requiring copying (NumPy itself isn't particularly small though). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From shane at hathawaymix.org Sat Jul 17 10:36:28 2010 From: shane at hathawaymix.org (Shane Hathaway) Date: Sat, 17 Jul 2010 02:36:28 -0600 Subject: [Python-ideas] Looking for a "batch" function Message-ID: <4C416B8C.1070701@hathawaymix.org> Hi all, An operation I often want in my Python code is some kind of simple batch function. The batch function would take an iterator and return same-size batches from it, except the last batch, which could be smaller. Two parameters would be required: the iterator and the size of each batch. Here are some examples of what I would expect this batch function to do. Get batches from a list: >>> list(batch([1,2,3,4,5], 2)) [[1, 2], [3, 4], [5]] Get batches from a string: >>> list(batch('one two six ten', 4)) ['one ', 'two ', 'six ', 'ten'] Organize a stream of objects into a table: >>> list(batch(['Somewhere', 'CA', 90210, 'New York', 'NY', 10001], 3)) [['Somewhere', 'CA', 90210], ['New York', 'NY', 10001]] My intuition tells me that such a function should exist in Python, but I have not found it in the builtin functions, slice operators, or itertools. Did I miss it? Here is an implementation that satisfies all of the above examples, but requires a sliceable sequence as input, not just an iterator: def batch(input, batch_size): while input: yield input[:batch_size] input = input[batch_size:] Obviously, I can just include that function in my projects, but I wonder if there is some built-in version of it. If there isn't, maybe there should be. 
Shane From pyideas at rebertia.com Sat Jul 17 10:52:59 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Sat, 17 Jul 2010 01:52:59 -0700 Subject: [Python-ideas] Looking for a "batch" function In-Reply-To: <4C416B8C.1070701@hathawaymix.org> References: <4C416B8C.1070701@hathawaymix.org> Message-ID: On Sat, Jul 17, 2010 at 1:36 AM, Shane Hathaway wrote: > Hi all, > > An operation I often want in my Python code is some kind of simple batch > function. ?The batch function would take an iterator and return same-size > batches from it, except the last batch, which could be smaller. ?Two > parameters would be required: the iterator and the size of each batch. ?Here > are some examples of what I would expect this batch function to do. > > Get batches from a list: > >>>> list(batch([1,2,3,4,5], 2)) > [[1, 2], [3, 4], [5]] > > Get batches from a string: > >>>> list(batch('one two six ten', 4)) > ['one ', 'two ', 'six ', 'ten'] > > Organize a stream of objects into a table: > >>>> list(batch(['Somewhere', 'CA', 90210, 'New York', 'NY', 10001], 3)) > [['Somewhere', 'CA', 90210], ['New York', 'NY', 10001]] > > My intuition tells me that such a function should exist in Python, but I > have not found it in the builtin functions, slice operators, or itertools. > ?Did I miss it? IMO, yes. > Obviously, I can just include that function in my projects, but I wonder if > there is some built-in version of it. ?If there isn't, maybe there should > be. 
See the "grouper" recipe in itertools: http://docs.python.org/library/itertools.html#recipes It does almost exactly what you want: grouper(3, 'ABCDEFG', 'x') --> ['A','B','C'], ['D','E','F'], ['G','x','x'] Cheers, Chris -- http://blog.rebertia.com From shane at hathawaymix.org Sat Jul 17 20:50:54 2010 From: shane at hathawaymix.org (Shane Hathaway) Date: Sat, 17 Jul 2010 12:50:54 -0600 Subject: [Python-ideas] Looking for a "batch" function In-Reply-To: References: <4C416B8C.1070701@hathawaymix.org> Message-ID: <4C41FB8E.5040907@hathawaymix.org> On 07/17/2010 02:52 AM, Chris Rebert wrote: > See the "grouper" recipe in itertools: > http://docs.python.org/library/itertools.html#recipes > It does almost exactly what you want: > grouper(3, 'ABCDEFG', 'x') --> ['A','B','C'], ['D','E','F'], ['G','x','x'] Interesting, but I have a few concerns with that answer: - It ignores the type of the container. If I provide a string as input, I expect an iterable of strings as output. - If I give a batch size of 1000000, grouper() is going to be rather inefficient. Even worse would be to allow users to specify the batch size. - Since grouper() is not actually in the standard library and it doesn't do quite what I need, it's rather unlikely that I'll use it. Another possible name for this functionality I am describing is packetize(). Computers always packetize data for transmission, storage, and display to users. Packetizing seems like such a common need that I think it should be built in to Python. 
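A slice-based variant that addresses the type concern (and avoids re-slicing the remainder on every step, which makes the original batch() quadratic for long inputs) might look like:

```python
def packetize(seq, size):
    """Yield non-overlapping chunks of seq, preserving its type.

    Works for any sliceable sequence (list, str, tuple, bytes, ...);
    the final chunk may be shorter than size.
    """
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

print(list(packetize('one two six ten', 4)))
# -> ['one ', 'two ', 'six ', 'ten']
print(list(packetize([1, 2, 3, 4, 5], 2)))
# -> [[1, 2], [3, 4], [5]]
```

Because each chunk is produced by a single slice of the original sequence, a string input yields strings and a tuple input yields tuples, with no padding.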
Shane

From taleinat at gmail.com Sat Jul 17 22:30:30 2010 From: taleinat at gmail.com (Tal Einat) Date: Sat, 17 Jul 2010 23:30:30 +0300 Subject: [Python-ideas] Looking for a "batch" function In-Reply-To: <4C41FB8E.5040907@hathawaymix.org> References: <4C416B8C.1070701@hathawaymix.org> <4C41FB8E.5040907@hathawaymix.org> Message-ID:

On Sat, Jul 17, 2010 at 9:50 PM, Shane Hathaway wrote:
> On 07/17/2010 02:52 AM, Chris Rebert wrote:
>>
>> See the "grouper" recipe in itertools:
>> http://docs.python.org/library/itertools.html#recipes
>> It does almost exactly what you want:
>> grouper(3, 'ABCDEFG', 'x') --> ['A','B','C'], ['D','E','F'],
>> ['G','x','x']
>
> Interesting, but I have a few concerns with that answer:
>
> - It ignores the type of the container. If I provide a string as input, I
> expect an iterable of strings as output.
>
> - If I give a batch size of 1000000, grouper() is going to be rather
> inefficient. Even worse would be to allow users to specify the batch size.
>
> - Since grouper() is not actually in the standard library and it doesn't do
> quite what I need, it's rather unlikely that I'll use it.
>
> Another possible name for this functionality I am describing is packetize().
> Computers always packetize data for transmission, storage, and display to
> users. Packetizing seems like such a common need that I think it should be
> built in to Python.

This reminds me of discussions about a "flatten" function. This kind of operation often has slightly different requirements in different scenarios. It is very simple to implement a version of this to meet your exact needs. Sometimes in these kinds of situations it is better not to have a built-in generic function, to force programmers to decide explicitly how they want it to work.

You mentioned efficiency; to do this kind of operation efficiently one really needs to know what kind of sequence/iterator is being "packetized", and implement accordingly.
- Tal Einat From ncoghlan at gmail.com Sun Jul 18 03:02:15 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 18 Jul 2010 11:02:15 +1000 Subject: [Python-ideas] Looking for a "batch" function In-Reply-To: References: <4C416B8C.1070701@hathawaymix.org> <4C41FB8E.5040907@hathawaymix.org> Message-ID: On Sun, Jul 18, 2010 at 6:30 AM, Tal Einat wrote: > This kind of operation often has slightly different requirements in > different scenarios. It is very simple to implement a version of this > to meet your exact needs. Sometimes in these kinds of situations it is > better not to have a built-in generic function, to force programmers > to decide explicitly how they want it to work. > > You mentioned efficiency; to do this kind of operation efficiently > ones really needs to know what kind of sequence/iterator is being > "packetized", and implement accordingly. Indeed. There's actually a reasonably decent general windowing recipe on ASPN (http://code.activestate.com/recipes/577196-windowing-an-iterable-with-itertools/), but even that isn't appropriate for every use case. The OP, for example, has rather different requirements to what is implemented there: - non-overlapping windows, so tee() isn't needed - return type should match original container type A custom generator for that task is actually pretty trivial (note: untested, so may contain typos): def windowed(seq, window_len): for slice_start in range(0, len(seq), window_len): # use xrange() in 2.x slice_end = slice_start + window_len yield seq[slice_start:slice_end] Even adding support for overlapped windows is fairly easy: def windowed(seq, window_len, overlap=0): slice_step = window_len - overlap for slice_start in range(0, len(seq), slice_step): # use xrange() in 2.x slice_end = slice_start + window_len yield seq[slice_start:slice_end] However, those approaches don't support arbitrary iterators (i.e. those without __len__), they only support sequences. 
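One simple way to handle arbitrary iterators for the non-overlapping case is to repeatedly drain itertools.islice (a sketch; note it yields lists rather than preserving the input container type):

```python
from itertools import islice

def batched(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    it = iter(iterable)
    while True:
        # islice consumes exactly n items (or fewer at the end)
        # from the shared iterator on each pass.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

print(list(batched(iter(range(7)), 3)))  # -> [[0, 1, 2], [3, 4, 5], [6]]
```

This works for generators and other one-shot iterators, at the cost of materialising each batch as a list.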
To support arbitrary iterators, you'd need to do something fancier with collections.deque (either directly or via itertools.tee), but again, the most appropriate approach is going to be application specific (for byte data, you're probably going to want to use buffer or memoryview rather than the original container type).

It isn't that this is an uncommon problem - it's that any appropriately general solution is going to be suboptimal in many specific applications, while an optimal solution for specific applications is going to be insufficiently general to be appropriate for the standard library.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From greg.ewing at canterbury.ac.nz Sun Jul 18 03:31:53 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 18 Jul 2010 13:31:53 +1200 Subject: [Python-ideas] Looking for a "batch" function In-Reply-To: <4C41FB8E.5040907@hathawaymix.org> References: <4C416B8C.1070701@hathawaymix.org> <4C41FB8E.5040907@hathawaymix.org> Message-ID: <4C425989.7010107@canterbury.ac.nz>

Shane Hathaway wrote:

> - It ignores the type of the container. If I provide a string as input,
> I expect an iterable of strings as output.

Fine, but...

> - If I give a batch size of 1000000, grouper() is going to be rather
> inefficient.

I guess you would prefer each batch to be a lazy iterator over part of the original sequence -- but that would conflict with the previous requirement.
-- Greg From shane at hathawaymix.org Sun Jul 18 03:54:43 2010 From: shane at hathawaymix.org (Shane Hathaway) Date: Sat, 17 Jul 2010 19:54:43 -0600 Subject: [Python-ideas] Looking for a "batch" function In-Reply-To: References: <4C416B8C.1070701@hathawaymix.org> <4C41FB8E.5040907@hathawaymix.org> Message-ID: <4C425EE3.80905@hathawaymix.org> On 07/17/2010 07:02 PM, Nick Coghlan wrote: > It isn't that this is an uncommon problem - it's that any > appropriately general solution is going to be suboptimal in many > specific applications, while an optimal solution for specific > applications is going to be insufficiently general to be appropriate > for the standard library. Well, I feel like there is in fact a general solution that would be near optimal for many applications, but I would rather spend time refining the idea in real projects rather than get much into theory at the moment. Thanks for the feedback. Shane From sergdavis at gmail.com Tue Jul 20 03:45:27 2010 From: sergdavis at gmail.com (Sergio Davis) Date: Mon, 19 Jul 2010 21:45:27 -0400 Subject: [Python-ideas] 'where' statement in Python? Message-ID: Dear members of the python-ideas mailing list, I'm not quite sure if this is the right place to ask for feedback about the idea, I apologize if this is not the case. I'm considering the following extension to Python's grammar: adding the 'where' keyword, which would work as follows: where_expr : expr 'where' NAME '=' expr The idea is to be able to write something like: a = (z**2+5) where z=2 being equivalent to (current Python syntax): a = (lambda z: z**2+5)(z=2) I thinkg this would be especially powerful in cases where the variable to be substituted ('z' in the example) comes in turn from a complicated expression, which makes it confusing to "embed" it in the main expression (the body of the 'lambda'), or in cases where the substitution must be performed more than once, and it may be more efficient to evaluate 'z' once. 
A more complicated example: vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() where par_pos=decl.find('(') equivalent to (current Python syntax): vtype = (lambda par_pos: decl[par_pos+1:FindMatching(par_pos,decl)].strip())(par_pos=decl.find('(')) Extending this syntax to several assignments after the 'where' keyword could be implemented as: where_expr: expr 'where' NAME '=' expr (',' NAME '=' expr )* or (which I think may be more "pythonic"): where_expr: expr 'where' NAME (',' NAME)* '=' expr (',' expr)* as it mimics the same syntax for unpacking tuples. I would appreciate any feedback on the idea, especially if it has some obvious flaw or if it's redundant (meaning there is a clearer way of doing this 'trick' which I don't know about). best regards, Sergio Davis -- Sergio Davis Irarr?zabal Grupo de NanoMateriales, Universidad de Chile http://www.gnm.cl/~sdavis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jackdied at gmail.com Tue Jul 20 03:52:47 2010 From: jackdied at gmail.com (Jack Diederich) Date: Mon, 19 Jul 2010 21:52:47 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: Message-ID: On Mon, Jul 19, 2010 at 9:45 PM, Sergio Davis wrote: > Dear members of the python-ideas mailing list, > > I'm not quite sure if this is the right place to ask for feedback about the > idea, I apologize if this is not the case. 
> > I'm considering the following extension to Python's grammar: adding the > 'where' keyword, which would work as follows: > > where_expr : expr 'where' NAME '=' expr > > The idea is to be able to write something like: > > a = (z**2+5) where z=2 > > being equivalent to (current Python syntax): > > a = (lambda z: z**2+5)(z=2) > > I thinkg this would be especially powerful in cases where the variable to be > substituted ('z' in the example) comes in turn from a complicated > expression, which makes it confusing to "embed" it in the main expression > (the body of the 'lambda'), or in cases where the substitution must be > performed more than once, and it may be more efficient to evaluate 'z' once. > A more complicated example: > > vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() where > par_pos=decl.find('(') > > equivalent to (current Python syntax): > > vtype = (lambda par_pos: > decl[par_pos+1:FindMatching(par_pos,decl)].strip())(par_pos=decl.find('(')) > > Extending this syntax to several assignments after the 'where' keyword could > be implemented as: > > where_expr: expr 'where' NAME '=' expr (',' NAME '=' expr )* > > or (which I think may be more "pythonic"): > > where_expr: expr 'where' NAME (',' NAME)* '=' expr (',' expr)* > > as it mimics the same syntax for unpacking tuples. > > I would appreciate any feedback on the idea, especially if it has some > obvious flaw or if it's redundant (meaning there is a clearer way of doing > this 'trick' which I don't know about). > I think the "trick" to making it readable is putting the assignment first. par_pos = decl.find('(') vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() versus: vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() where par_pos=decl.find('(') -Jack From stephen at xemacs.org Tue Jul 20 07:29:01 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 20 Jul 2010 14:29:01 +0900 Subject: [Python-ideas] 'where' statement in Python? 
In-Reply-To: References: Message-ID: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Sergio Davis writes: > I'm considering the following extension to Python's grammar: adding the > 'where' keyword, which would work as follows: > > where_expr : expr 'where' NAME '=' expr We just had a long thread about this. http://mail.python.org/pipermail/python-ideas/2010-June/007476.html The sentiment was about -0 to -0.5 on the idea in general, although a couple of people with experience in implementing Python syntax expressed sympathy for it. There have also been threads on earlier variations of the idea, referenced in the thread above. HTH From ncoghlan at gmail.com Tue Jul 20 15:27:49 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 20 Jul 2010 23:27:49 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jul 20, 2010 at 3:29 PM, Stephen J. Turnbull wrote: > We just had a long thread about this. > > http://mail.python.org/pipermail/python-ideas/2010-June/007476.html > > The sentiment was about -0 to -0.5 on the idea in general, although a > couple of people with experience in implementing Python syntax > expressed sympathy for it. For the record, I am personally +1 on the idea (otherwise I wouldn't have put so much thought into it over the years). It's just a *lot* harder to define complete and consistent semantics for the concept than people often realise. However, having the question come up twice within the last month finally inspired me to write the current status of the topic down in a deferred PEP: http://www.python.org/dev/peps/pep-3150/ Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From 8mayday at gmail.com Tue Jul 20 17:01:39 2010 From: 8mayday at gmail.com (Andrey Popp) Date: Tue, 20 Jul 2010 19:01:39 +0400 Subject: [Python-ideas] 'where' statement in Python? 
In-Reply-To:
References:
Message-ID:

Also, what about some alternative for working around the following:

> Out of Order Execution: the where clause makes execution jump around a little strangely, as
> the body of the where clause is executed before the simple statement in the clause header. The
> closest any other part of Python comes to this before is the out of order evaluation in
> conditional expressions.

    result = let:
        a = retrieve_a()
        b = retrieve_b()
    in:
        a*a + b*b

On Tue, Jul 20, 2010 at 6:57 PM, Andrey Popp <8mayday at gmail.com> wrote:
> Hello,
>
> PEP-3150 is awesome, just a small addition -- why not to allow
> one-liners `where`s:
>
>     a = (b, b) where b = 43
>
> And that also make sense for generator/list/set/dict comprehensions:
>
>     mylist = [y for y in another_list if y < 5 where y = f(x)]
>
> On Tue, Jul 20, 2010 at 5:27 PM, Nick Coghlan wrote:
>> On Tue, Jul 20, 2010 at 3:29 PM, Stephen J. Turnbull wrote:
>>> We just had a long thread about this.
>>>
>>> http://mail.python.org/pipermail/python-ideas/2010-June/007476.html
>>>
>>> The sentiment was about -0 to -0.5 on the idea in general, although a
>>> couple of people with experience in implementing Python syntax
>>> expressed sympathy for it.
>>
>> For the record, I am personally +1 on the idea (otherwise I wouldn't
>> have put so much thought into it over the years). It's just a *lot*
>> harder to define complete and consistent semantics for the concept
>> than people often realise.
>>
>> However, having the question come up twice within the last month
>> finally inspired me to write the current status of the topic down in a
>> deferred PEP: http://www.python.org/dev/peps/pep-3150/
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > Andrey Popp > > phone: +7 911 740 24 91 > e-mail: 8mayday at gmail.com > -- Andrey Popp phone: +7 911 740 24 91 e-mail: 8mayday at gmail.com From 8mayday at gmail.com Tue Jul 20 16:57:15 2010 From: 8mayday at gmail.com (Andrey Popp) Date: Tue, 20 Jul 2010 18:57:15 +0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Hello, PEP-3150 is awesome, just a small addition ? why not to allow one-liners `where`s: a = (b, b) where b = 43 And that also make sense for generator/list/set/dict comprehensions: mylist = [y for y in another_list if y < 5 where y = f(x)] On Tue, Jul 20, 2010 at 5:27 PM, Nick Coghlan wrote: > On Tue, Jul 20, 2010 at 3:29 PM, Stephen J. Turnbull wrote: >> We just had a long thread about this. >> >> http://mail.python.org/pipermail/python-ideas/2010-June/007476.html >> >> The sentiment was about -0 to -0.5 on the idea in general, although a >> couple of people with experience in implementing Python syntax >> expressed sympathy for it. > > For the record, I am personally +1 on the idea (otherwise I wouldn't > have put so much thought into it over the years). It's just a *lot* > harder to define complete and consistent semantics for the concept > than people often realise. > > However, having the question come up twice within the last month > finally inspired me to write the current status of the topic down in a > deferred PEP: http://www.python.org/dev/peps/pep-3150/ > > Cheers, > Nick. > > -- > Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
Andrey Popp

phone: +7 911 740 24 91
e-mail: 8mayday at gmail.com

From stephen at xemacs.org  Tue Jul 20 17:22:27 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 21 Jul 2010 00:22:27 +0900
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87zkxmxjnw.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:
> On Tue, Jul 20, 2010 at 3:29 PM, Stephen J. Turnbull wrote:

> For the record, I am personally +1 on the idea

Gee, and here I had you down as a +0.95.  Can you forgive me? :-)

> However, having the question come up twice within the last month
> finally inspired me to write the current status of the topic down in a
> deferred PEP: http://www.python.org/dev/peps/pep-3150/

Way cool!  Thanks!

From daniel at stutzbachenterprises.com  Tue Jul 20 17:24:56 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 20 Jul 2010 10:24:56 -0500
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Tue, Jul 20, 2010 at 8:27 AM, Nick Coghlan wrote:
> For the record, I am personally +1 on the idea (otherwise I wouldn't
> have put so much thought into it over the years). It's just a *lot*
> harder to define complete and consistent semantics for the concept
> than people often realise.
>
> However, having the question come up twice within the last month
> finally inspired me to write the current status of the topic down in a
> deferred PEP: http://www.python.org/dev/peps/pep-3150/

There was a related discussion on python-ideas in July 2009, spanning
two threads, that you may want to additionally reference.  A lot of
corner cases were also brought up in that thread.
Here are the starts of the two threads:

http://mail.python.org/pipermail/python-ideas/2009-July/005082.html
http://mail.python.org/pipermail/python-ideas/2009-July/005132.html

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From john.arbash.meinel at gmail.com  Tue Jul 20 17:28:40 2010
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Tue, 20 Jul 2010 10:28:40 -0500
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4C45C0A8.8060407@gmail.com>

Andrey Popp wrote:
> Hello,
>
> PEP-3150 is awesome, just a small addition -- why not to allow
> one-liners `where`s:
>
>     a = (b, b) where b = 43
>
> And that also make sense for generator/list/set/dict comprehensions:
>
>     mylist = [y for y in another_list if y < 5 where y = f(x)]

Do you mean:

  mylist = [y for x in another_list if y < 5 where y = f(x)]

John
=:->

From eric.twilegar at gmail.com  Tue Jul 20 19:03:07 2010
From: eric.twilegar at gmail.com (et gmail)
Date: Tue, 20 Jul 2010 12:03:07 -0500
Subject: [Python-ideas] first() and last() tests in for x in y loops
In-Reply-To: <4c4365f1.5429e70a.4ba3.ffff8601@mx.google.com>
References: <4c4365f1.5429e70a.4ba3.ffff8601@mx.google.com>
Message-ID: <4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com>

While doing "for x in y" loops I often need to know if I'm working on
the first item or the last item in the list.

For example, imagine you are building a list of values separated by
","s. On the last iteration you need to suppress the ",". One
workaround is to just take the last character off at the end, but you
get the idea.

I could see the code looking something like this:

    for item in List:
        if __first__:
            print 'we are in the first loop'
        doSomething()
        if __last__ is False:
            print ','

Sorry if the formatting is a little off. Does something like this
already exist and I'm just being a newb?
I am fairly new to the language.

Also it would be nice if there was an auto counter like

    For item in List with count x

... where x would then be an auto counter that incremented every
iteration of the loop.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alexander.belopolsky at gmail.com  Tue Jul 20 19:13:13 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 20 Jul 2010 13:13:13 -0400
Subject: [Python-ideas] first() and last() tests in for x in y loops
In-Reply-To: <4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com>
References: <4c4365f1.5429e70a.4ba3.ffff8601@mx.google.com>
	<4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com>
Message-ID:

On Tue, Jul 20, 2010 at 1:03 PM, et gmail wrote:
..
> Also it would be nice if there was an auto counter like
>
> For item in List with count x ... where x would then be an auto counter that
> incremented every iteration of the loop.

This feature is already there:

    for num, item in enumerate(List):
        ...

I would like to suggest that when you don't know how to achieve a
certain result in Python, you first ask on python-list or the #python
IRC channel. It is best to propose new features after users agree that
the feature is not present.

From phd at phd.pp.ru  Tue Jul 20 19:19:01 2010
From: phd at phd.pp.ru (Oleg Broytman)
Date: Tue, 20 Jul 2010 21:19:01 +0400
Subject: [Python-ideas] first() and last() tests in for x in y loops
In-Reply-To: <4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com>
References: <4c4365f1.5429e70a.4ba3.ffff8601@mx.google.com>
	<4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com>
Message-ID: <20100720171901.GB32158@phd.pp.ru>

On Tue, Jul 20, 2010 at 12:03:07PM -0500, et gmail wrote:
> While doing "for x in y" loops I often need to know if I'm working on the
> first item or the last item in the list.
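Without new syntax, both the first/last flags and the auto counter can
be built from enumerate() and len() today. A sketch (the list and
variable names are made up for illustration):

```python
items = ["a", "b", "c"]
out = []
for i, item in enumerate(items):
    first = (i == 0)                 # True only on the first iteration
    last = (i == len(items) - 1)     # True only on the last iteration
    if first:
        out.append("first:")
    out.append(item)
    if not last:                     # suppress the trailing ","
        out.append(",")
print("".join(out))                  # -> first:a,b,c
```

For the separator use case specifically, ",".join(items) sidesteps the
flags entirely.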
See http://ppa.cvs.sourceforge.net/ppa/misc/Repeat.py?rev=HEAD&content-type=text/vnd.viewcvs-markup
(License: Python)

> Also it would be nice if there was an auto counter like

   Find built-in function enumerate() in the docs.

Oleg.
--
     Oleg Broytman            http://phd.pp.ru/           phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From sturla at molden.no  Tue Jul 20 19:18:57 2010
From: sturla at molden.no (Sturla Molden)
Date: Tue, 20 Jul 2010 19:18:57 +0200
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4C45DA81.5020009@molden.no>

On 20.07.2010 16:57, Andrey Popp wrote:
>     a = (b, b) where b = 43
>

I am +1 for a where module and -1 for a where keyword, and here is the
reason:

In MATLAB, we have the "find" function that serves the role of where.
In NumPy, we have a function numpy.where and also masked arrays.

The above statement with NumPy ndarrays would be:

    idx, = np.where(b == 47)
    a = (b[idx], b[idx])

or we could simply do this:

    a = (b[b==47], b[b==47])

And if we look at this proposed expression,

    mylist = [y for y in another_list if y < 5 where y == f(x)]

Using NumPy, we can proceed like this:

    idx, = np.where(another_array == f(x))
    mylist = [y for y in another_array[idx] if y < 5]

The intention is just as clear, and it avoids a new "where" keyword. It
is also similar to NumPy and Matlab.

Not to mention that the "where keyword" in the above expression could be
replaced with an "and", so it serves no real purpose:

    mylist = [y for y in another_list if (y < 5 and y == f(x))]

Why have a where keyword here? It is just redundant.

So I'd rather speak of something useful instead: NumPy's "Fancy indexing".
"Fancy indexing" (NumPy jargon) will in this context mean that we allow
indexes to be an iterable, not just integers:

    mylist[(1,2,3)] == mylist[1,2,3]
    mylist[iterable] == [mylist[i] for i in iterable]

That is what NumPy and Matlab do, as well as Fortran 90 (and certain C++
libraries such as Blitz++). It has all the power of the "where keyword",
while being more flexible to use, and the intention is more explicit. It
is also well tested syntax.

Thus with "fancy indexing":

    alist[iterable] == [alist[i] for i in iterable]

That is what we really need!

Note that this is not a language syntax change, it is just a change of
how __setitem__ and __getitem__ work for certain container types. NumPy
already does this, so the syntax itself is completely valid Python. And
as for "where", it is just a function.

Andrey's proposed where keyword is a crippled tool in comparison. That
is, the real power of a list of indexers is that it can be obtained and
manipulated with any conceivable method, e.g. slicing. It also allows
numpy to have an "argsort" function, since an index list can be reused
on multiple arrays:

    idx = np.argsort(array_a)
    sorteda = array_a[idx]
    sortedb = array_b[idx]

is the same as

    tmp = sorted((a, i) for i, a in enumerate(lista))
    sorteda = [a for a, i in tmp]
    sortedb = [listb[i] for a, i in tmp]

Which is the more readable?
We should rather implement a where function and overload the mentioned
container types. The where function should go in the same module.

So all in all, I am +1 for a "where module" and -1 for a "where keyword".

P.S. I'll admit that dict and set might add to some confusion, since
"fancy indexing" would be ambiguous for them.

Regards,
Sturla

From guido at python.org  Tue Jul 20 20:16:04 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Jul 2010 19:16:04 +0100
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: <4C45DA81.5020009@molden.no>
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4C45DA81.5020009@molden.no>
Message-ID:

On Tue, Jul 20, 2010 at 6:18 PM, Sturla Molden wrote:
> On 20.07.2010 16:57, Andrey Popp wrote:
>>
>>     a = (b, b) where b = 43
>>
>
> I am +1 for a where module and -1 for a where keyword, and here is the
> reason:
>
> In MATLAB, we have the "find" function that serves the role of where. In
> NumPy, we have a function numpy.where and also masked arrays.
>
> The above statement with NumPy ndarrays would be:
>
>   idx, = np.where(b == 47)
>   a = (b[idx], b[idx])
>
> or we could simply do this:
>
>   a = (b[b==47], b[b==47])

It looks like NumPy's "where" is more like SQL's, while Nick's is more
like Haskell's. These are totally different: in SQL it's a dynamic
query (and its argument is a condition), whereas in Haskell it's
purely a syntactic construct for defining some variables to be used as
shorthands in an expression.

Given the large number of Python users familiar with SQL compared to
those familiar with Haskell, I think we'd be wise to pick a different
keyword instead of 'where'. I can't think of one right now though.

Your proposal is completely orthogonal to Nick's; the best thing to do
is probably to start a different thread for yours.

Note that Microsoft's LINQ is also similar to your suggestion.
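The two readings can be contrasted in plain Python (a sketch; the data
and function names are illustrative only):

```python
data = [1, 5, 8, 12]

# SQL/NumPy-style "where": a dynamic filter whose argument is a condition
selected = [x for x in data if x > 4]

# Haskell-style "where": purely local shorthands for subexpressions
def hypot(a, b):
    s = a * a + b * b    # the kind of local binding a 'where' clause would supply
    return s ** 0.5

print(selected)      # -> [5, 8, 12]
print(hypot(3, 4))   # -> 5.0
```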
-- --Guido van Rossum (python.org/~guido) From daniel at stutzbachenterprises.com Tue Jul 20 20:29:04 2010 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 20 Jul 2010 13:29:04 -0500 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> Message-ID: On Tue, Jul 20, 2010 at 1:16 PM, Guido van Rossum wrote: > Given the large number of Python users familiar with SQL compared to > those familiar with Haskell, I think we'd do wise to pick a different > keyword instead of 'where'. I can't think of one right now though. > > Taking a cue from mathematics, how about "given"? c = sqrt(a*a + b*b) given: a = retrieve_a() b = retrieve_b() -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From george.sakkis at gmail.com Tue Jul 20 20:35:20 2010 From: george.sakkis at gmail.com (George Sakkis) Date: Tue, 20 Jul 2010 20:35:20 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> Message-ID: On Tue, Jul 20, 2010 at 8:29 PM, Daniel Stutzbach wrote: > On Tue, Jul 20, 2010 at 1:16 PM, Guido van Rossum wrote: >> >> Given the large number of Python users familiar with SQL compared to >> those familiar with Haskell, I think we'd do wise to pick a different >> keyword instead of 'where'. I can't think of one right now though. > > Taking a cue from mathematics, how about "given"? > > c = sqrt(a*a + b*b) given: > ?? ?a = retrieve_a() > ?? ?b = retrieve_b() Or if we'd rather overload an existing keyword than add a new one, "with" reads well too. George From scialexlight at gmail.com Tue Jul 20 20:56:38 2010 From: scialexlight at gmail.com (Alex Light) Date: Tue, 20 Jul 2010 14:56:38 -0400 Subject: [Python-ideas] 'where' statement in Python? 
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4C45DA81.5020009@molden.no>
Message-ID:

i would suggest overloading the 'with', 'as' combo

    c = sqrt(a*a + b*b) with:
        get_a(), get_b() as a, b

or

    c = sqrt(a*a + b*b) with:
        get_a() as a
        get_b() as b

it reads well, and it follows naturally, since this statement acts
similarly to a regular with/as statement, with the behavior of the
context manager being (in pseudocode):

    set up: store original value of variable, if any, and set variable
    to new value.
    tear down: set value back to original, or delete from local
    namespace if it never had one

additionally we do not need to introduce any new keywords

any way that this is implemented, though, it would be quite useful

On Tue, Jul 20, 2010 at 2:35 PM, George Sakkis wrote:
> On Tue, Jul 20, 2010 at 8:29 PM, Daniel Stutzbach wrote:
>
>> On Tue, Jul 20, 2010 at 1:16 PM, Guido van Rossum wrote:
>>>
>>> Given the large number of Python users familiar with SQL compared to
>>> those familiar with Haskell, I think we'd be wise to pick a different
>>> keyword instead of 'where'. I can't think of one right now though.
>>
>> Taking a cue from mathematics, how about "given"?
>>
>> c = sqrt(a*a + b*b) given:
>>     a = retrieve_a()
>>     b = retrieve_b()
>
> Or if we'd rather overload an existing keyword than add a new one,
> "with" reads well too.
>
> George
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com  Tue Jul 20 21:15:04 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 20 Jul 2010 20:15:04 +0100
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> Message-ID: <4C45F5B8.6080606@mrabarnett.plus.com> Alex Light wrote: > i would suggest overloading the 'with', 'as' combo > > c = sqrt(a*a + b*b) with: > get_a(), get_b() as a, b > > or > > > c = sqrt(a*a + b*b) with: > get_a() as a > get_b() as b > Why use 'as'? Why not: c = sqrt(a*a + b*b) with: a = get_a() b = get_b() which is like: def _(): a = get_a() b = get_b() return sqrt(a*a + b*b) c = _() del _ > > it reads well and plus it follows since this statement acts similarly to > a regular with and as statement with the behavior of the context manager > being (in psudocode): > set up: store original value of variable if any and set variable to new > value. > tear down: set value back to original or delete from local namespace if > it never had one > > additionally we do not need to introduce any new keywords > > any way that this is implemented though it would be quite useful > > On Tue, Jul 20, 2010 at 2:35 PM, George Sakkis > wrote: > > On Tue, Jul 20, 2010 at 8:29 PM, Daniel Stutzbach > > wrote: > > > On Tue, Jul 20, 2010 at 1:16 PM, Guido van Rossum > > wrote: > >> > >> Given the large number of Python users familiar with SQL compared to > >> those familiar with Haskell, I think we'd do wise to pick a > different > >> keyword instead of 'where'. I can't think of one right now though. > > > > Taking a cue from mathematics, how about "given"? > > > > c = sqrt(a*a + b*b) given: > > a = retrieve_a() > > b = retrieve_b() > > Or if we'd rather overload an existing keyword than add a new one, > "with" reads well too. > From bruce at leapyear.org Tue Jul 20 21:17:18 2010 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 20 Jul 2010 12:17:18 -0700 Subject: [Python-ideas] fancy indexing Message-ID: [changing the subject; was: 'where' statement in Python?] I think this is an interesting idea (whether worth adding is a different question). 
I think it would be confusing that a[x] = (y,z) does something entirely different when x is 1 or (1,2). If python *were* to add something like this, I think perhaps a different syntax should be considered: a[[x]] = y y = a[[x]] which call __setitems__ and __getitems__ respectively. This makes it clear that something different is going on and eliminates the ambiguity for dicts. --- Bruce http://www.vroospeak.com http://google-gruyere.appspot.com On Tue, Jul 20, 2010 at 10:18 AM, Sturla Molden wrote: > So I'd rather speak of something useful instead: NumPy's "Fancy indexing". > > "Fancy indexing" (NumPy jargon) will in this context mean that we allow > indexes to be an iterable, not just integers: > > mylist[(1,2,3)] == mylist[1,2,3] > mylist[iterable] == [a(i) for i in iterable] > > That is what NumPy and Matlab do, as well as Fortran 90 (and certain C++ > libraries such as Blitz++). It has all the power of the "where keyword", > while being more flexible to use, and intention is more explicit. It is also > well tested syntax. > > Thus with "fancy indexing": > > alist[iterable] == [alist[i] for i in iterable] > > That is what we really need! > > Note that this is not a language syntax change, it is just a change of how > __setitem__ and __getitem__ works for certain container types. NumPy already > does this, so the syntax itself is completely valid Python. And as for > "where", it is just a function. > > Andrey's proposed where keyword is a crippled tool in comparison. That is, > the real power of a list of indexers is that it can be obtained and > manipulated with any conceivable method, e.g. slicing. It also allows numpy > to have an "argsort" function, since an index list can be reused on multiple > arrays: > > idx = np.argsort(array_a) > sorteda = array_a[idx] > sortedb = array_b[idx] > > is the same as > > tmp = sorted([a,i for i,a in enumerate(lista)]) > sorteda = [a for a,i in tmp] > sortedb = [listb[i] for a,i in tmp] > > Which is the more readable? 
>
> Implementing a generic "where function" can be achieved with a lambda:
>
>     idx = where(lambda x: x == 47, alist)
>
> or a list comprehension (this would be very similar to NumPy):
>
>     idx = where([x==47 for x in alist])
>
> But to begin with, I think we should get NumPy style "fancy indexing" to
> standard container types like list, tuple, string, bytes, bytearray, array
> and deque. That would just be a handful of subclasses, and I think they
> should (initially) be put in a standard library module, and possibly replace
> the current containers in Python 4000.
>
> But as for a where keyword: My opinion is a big -1, if I have the right to
> vote. We should rather implement a where function and overload the mentioned
> container types. The where function should go in the same module.
>
> So all in all, I am +1 for a "where module" and -1 for a "where keyword".
>
> P.S. I'll admit that dict and set might add to some confusion, since "fancy
> indexing" would be ambiguous for them.
>
> Regards,
> Sturla
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla at molden.no  Tue Jul 20 21:20:32 2010
From: sturla at molden.no (Sturla Molden)
Date: Tue, 20 Jul 2010 21:20:32 +0200
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4C45DA81.5020009@molden.no>
Message-ID: <4C45F700.9040902@molden.no>

On 20.07.2010 20:16, Guido van Rossum wrote:
> It looks like NumPy's "where" is more like SQL's,

Yes, it is roughly like a WHERE statement in SQL or Fortran 90, or
Python's built-in "filter" function (albeit more flexible).

I am not sure I miss "fancy indexing" for Python lists because it is
useful, or because Fortran has crippled my mind.
This is of course easy to achieve with two utility functions, also demonstrating what NumPy's fancy indexing does. This is with a lambda: def where(cond, alist): return [i for i,a in enumerate(alist) if cond(a)] def fancyindex(alist, index): return [alist[i] for i in index] > Microsoft's LINQ is also similar to your suggestion. > LINQ is just to compensate for lack of duck-typing in C# ;-) Also when used to Pythons list comprehensions, LINQ syntax feels like thinking backwards (which can be quite annoying) :-( Sturla From sturla at molden.no Tue Jul 20 21:32:25 2010 From: sturla at molden.no (Sturla Molden) Date: Tue, 20 Jul 2010 21:32:25 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> Message-ID: <4C45F9C9.3040808@molden.no> Den 20.07.2010 20:56, skrev Alex Light: > i would suggest overloading the 'with', 'as' combo > > c = sqrt(a*a + b*b) with: > get_a(), get_b() as a, b > That will not work, the parser would think like this: c = sqrt(a*a + b*b) with: get_a(), (get_b() as a), b > > c = sqrt(a*a + b*b) with: > get_a() as a > get_b() as b > I think it tastes too much like functional programming. Specifically: It forces you to think backwards: it does not evaluate the lines in the order they read. The block below is evaluated before the expression above. It really means: a = get_a() b = get_b() c = sqrt(a*a + b*b) with: del a,b That I think is very annoying. Why not just use context managers instead? with get_a() as a, get_b() as b: c = sqrt(a*a + b*b) Now it reads the right way: top down, not bottom up. Regards, Sturla From bruce at leapyear.org Tue Jul 20 21:42:15 2010 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 20 Jul 2010 12:42:15 -0700 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: List comprehensions are the one place where I might find this useful. 
I can do this with map though: [y for x in another_list if y < 5 where y = f(x)] [y for x in map(f, another_list) if x < 5] [x for x in another_list if y < 5 where y = f(x)] [x for (x,y) in map(lambda x: (x,f(x)), another_list) if y < 5] I think the variant with 'where' or 'with' would be a bit more readable but is it valuable enough? --- Bruce http://www.vroospeak.com http://google-gruyere.appspot.com On Tue, Jul 20, 2010 at 7:57 AM, Andrey Popp <8mayday at gmail.com> wrote: > > > mylist = [y for y in another_list if y < 5 where y = f(x)] > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jul 20 21:44:18 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Jul 2010 20:44:18 +0100 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: <4C45F9C9.3040808@molden.no> References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> <4C45F9C9.3040808@molden.no> Message-ID: On Tue, Jul 20, 2010 at 8:32 PM, Sturla Molden wrote: > Den 20.07.2010 20:56, skrev Alex Light: >> >> i would suggest overloading the 'with', 'as' combo >> >> c = sqrt(a*a + b*b) with: >> ? ?get_a(), get_b() as a, b >> > > That will not work, the parser would think like this: > > c = sqrt(a*a + b*b) with: > ? ?get_a(), (get_b() as a), b Not true, we can define the grouping as we like. However if we do something like this I'd rather use 'var = expr' instead of 'expr as var'. >> c = sqrt(a*a + b*b) with: >> ? ?get_a() as a >> ? ?get_b() as b >> > > I think it tastes too much like functional programming. Specifically: > > It forces you to think backwards: it does not evaluate the lines in the > order they read. The block below is evaluated before the expression above. > It really means: > > a = get_a() > b = get_b() > c = sqrt(a*a + b*b) with: > del a,b > > That I think is very annoying. 
Like it or not, except for the keyword to use and the 'as' issue,
that's exactly the proposal (please read the PEP:
http://www.python.org/dev/peps/pep-3150/ ). I personally like it; plus
the "think backwards" idea is already used in other parts of Python's
syntax, e.g. list comprehensions. And of course forward references
work for method calls too.

> Why not just use context managers instead?
>
> with get_a() as a, get_b() as b:
>     c = sqrt(a*a + b*b)
>
> Now it reads the right way: top down, not bottom up.

Because this means something different and requires that get_a()
return something that obeys the context manager protocol.

--
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Tue Jul 20 21:49:49 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 20 Jul 2010 15:49:49 -0400
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 7/20/2010 1:29 AM, Stephen J. Turnbull wrote:
> Sergio Davis writes:
>
>> I'm considering the following extension to Python's grammar: adding the
>> 'where' keyword, which would work as follows:
>>
>> where_expr : expr 'where' NAME '=' expr
>
> We just had a long thread about this.
>
> http://mail.python.org/pipermail/python-ideas/2010-June/007476.html
>
> The sentiment was about -0 to -0.5 on the idea in general,

I did not comment then because I thought the idea of cluttering Python
with augmented local namespace blocks, with no functional gain, had
been rejected and was dead, and hence needed no comment. -10

For me, the idea would come close to destroying (what remains of) the
simplicity that makes Python relatively easy to learn. It seems to be
associated with the (to me, cracked) idea that names are pollution.

I agree with Jack Diederich:

> I think the "trick" to making it readable
> is putting the assignment first.
> par_pos = decl.find('(')
> vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip()
> versus:
> vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() where
> par_pos=decl.find('(')

The real horror would come with multiple assignments with multiple and
nested where or whatever clauses.

--
Terry Jan Reedy

From sturla at molden.no  Tue Jul 20 21:57:02 2010
From: sturla at molden.no (Sturla Molden)
Date: Tue, 20 Jul 2010 21:57:02 +0200
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

> [x for x in another_list if y < 5 where y = f(x)]
> [x for (x,y) in map(lambda x: (x,f(x)), another_list) if y < 5]

Or allow nested list comprehensions?

    [x for (x,y) in [(x, f(x)) for x in another_list] if y < 5]

This is getting closer to LINQ... We could use line-breaks for
readability:

    [x for (x,y) in
        [(x, f(x)) for x in another_list]
            if y < 5]

S.

From 8mayday at gmail.com  Tue Jul 20 22:02:07 2010
From: 8mayday at gmail.com (Andrey Popp)
Date: Wed, 21 Jul 2010 00:02:07 +0400
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: <4C45C0A8.8060407@gmail.com>
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4C45C0A8.8060407@gmail.com>
Message-ID:

Yes, exactly this, it was a typo.

On Tue, Jul 20, 2010 at 7:28 PM, John Arbash Meinel wrote:
> Andrey Popp wrote:
>> Hello,
>>
>> PEP-3150 is awesome, just a small addition -- why not to allow
>> one-liners `where`s:
>>
>>     a = (b, b) where b = 43
>>
>> And that also make sense for generator/list/set/dict comprehensions:
>>
>>
mylist = [y for y in another_list if y < 5 where y = f(x)]
>
> Do you mean:
>
>  mylist = [y for x in another_list if y < 5 where y = f(x)]
>
> John
> =:->

--
Andrey Popp

phone: +7 911 740 24 91
e-mail: 8mayday at gmail.com

From sturla at molden.no  Tue Jul 20 22:07:21 2010
From: sturla at molden.no (Sturla Molden)
Date: Tue, 20 Jul 2010 22:07:21 +0200
Subject: [Python-ideas] fancy indexing
In-Reply-To: <4C45F700.9040902@molden.no>
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4C45DA81.5020009@molden.no>
	<4C45F700.9040902@molden.no>
Message-ID: <925d09c162cc4a9eca363b01403edda9.squirrel@webmail.uio.no>

> On 20.07.2010 20:16, Guido van Rossum wrote:
> Yes, it is roughly like a WHERE statement in SQL or Fortran 90, or
> Python's built-in "filter" function (albeit more flexible).

Fancy indexing is actually more like a join. Since the result of
numpy.where on one array can be used to filter another array, fancy
indexing would be like a join between tables in a relational database:

    def join(blist, index):
        return [blist[i] for i in index]

The big difference between fancy indexing and a join method is of
course that indexing can appear on the left side of an expression.

Sturla

From ianb at colorstudy.com  Tue Jul 20 22:09:02 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 20 Jul 2010 15:09:02 -0500
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: <4C45F700.9040902@molden.no>
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4C45DA81.5020009@molden.no>
	<4C45F700.9040902@molden.no>
Message-ID:
This is something NumPy and database abstraction layers both need, and they both currently use method override hacks that have certain limitations (e.g., you capture ==, but you can't capture "and"). If you work really hard you can decompile the bytecodes (DejaVu did this for lambdas, but not generator expressions). I don't think I've even seen a language proposal that actually tries to tackle this though. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scialexlight at gmail.com Tue Jul 20 22:13:07 2010 From: scialexlight at gmail.com (Alex Light) Date: Tue, 20 Jul 2010 16:13:07 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> <4C45F5B8.6080606@mrabarnett.plus.com> Message-ID: MRAB wrote: > > Why use 'as'? Why not: > I would use 'as' because this whole where clause acts very similarly to a context manager, in that it sets a variable to a value for a small block:
> c = a + b with:
>     get_a(), get_b() as a, b
> is equivalent to
> with get_a(), get_b() as a, b:
>     c = a + b
> assuming get_a() and get_b() are context managers that return the values of a and b and then delete them at the end > the thing with this, though, is that get_a() and get_b() do not need to be context managers; the interpreter will create one, so this will be valid:
> c = a**2 with:
>     4 as a
> i.e.
it creates a context manager whose __enter__ method returns 4 and whose exit method deletes a from the current namespace > Sturla Molden wrote: >That will not work, the parser would think like this:
> >c = sqrt(a*a + b*b) with:
> > get_a(), (get_b() as a), b
Sorry, I haven't used 'as' in a while and I forgot that you could not use it like you can everything else in compound assignments. (BTW, why isn't a, b as c, d allowed? It makes sense, especially since a, b = c, d is allowed; it should be changed so it does.)
>>c = sqrt(a*a + b*b) with:
>> get_a() as a
>> get_b() as b
> > >I think it tastes too much like functional programming. Specifically: > >It forces you to think backwards: it does not evaluate the lines in the order they read. I think you miss the point, which is understandable considering such a short example, but consider if there are many many steps used to arrive at the variable. You can give it a simple, descriptive name and nobody needs to see how you got it unless they want to. Or a situation where you have many many variables in the with statement and it is possible to understand from their names what they represent. i.e.
sea = water() with (get_temperature(sea) as temp,
    get_depth(sea) as depth,
    get_purity(sea) as purity,
    get_salinity(sea) as salinity,
    get_size(sea) as size,
    #etc for a few more lines
    get_density(sea) as density):
    sea_num = temp**depth + purity - salinity * size - density # one line only

is much harder to understand than simply

sea_num = temp**depth + purity - salinity * size - density with: # one line only
    get_temperature(sea) as temp,
    get_depth(sea) as depth,
    get_purity(sea) as purity,
    get_salinity(sea) as salinity,
    get_size(sea) as size,
    #etc for a few more lines
    get_density(sea) as density

The meaning of all the words is obvious (assuming you know that sea_num has to do with the sea and water), and since it is you do not really need to have all that clutter up above and can just put it below. > On Tue, Jul 20, 2010 at 3:15 PM, MRAB wrote: > Alex Light wrote: > >> I would suggest overloading the 'with', 'as' combo >> >>> >> c = sqrt(a*a + b*b) with: >> >>> get_a(), get_b() as a, b >> >>> >> or >> >>> >> >> c = sqrt(a*a + b*b) with: >> >>> get_a() as a >> >>> get_b() as b >> >> Why use 'as'? Why not: > >> > > c = sqrt(a*a + b*b) with: > >> a = get_a() > >> b = get_b() > >> > which is like: > >> > def _(): > >> a = get_a() > >> b = get_b() > >> > return sqrt(a*a + b*b) > >> > c = _() > >> del _ > >> > >> it reads well and plus it follows since this statement acts similarly to a >> regular with and as statement with the behavior of the context manager being >> (in pseudocode): set up: store original value of variable if any and set >> variable to new value.
>> >>> tear down: set value back to original or delete from local namespace if >> it never had one >> >>> >> additionally we do not need to introduce any new keywords >> >>> >> any way that this is implemented though it would be quite useful >> >>> >> On Tue, Jul 20, 2010 at 2:35 PM, George Sakkis > george.sakkis at gmail.com>> wrote: >> >>> >> On Tue, Jul 20, 2010 at 8:29 PM, Daniel Stutzbach >> >>> > >>> > wrote: >> >>> >> > On Tue, Jul 20, 2010 at 1:16 PM, Guido van Rossum >> >>> > wrote: >> >>> >> >> >>> >> Given the large number of Python users familiar with SQL compared >> to >> >>> >> those familiar with Haskell, I think we'd do wise to pick a >> >>> different >> >>> >> keyword instead of 'where'. I can't think of one right now though. >> >>> > >> >>> > Taking a cue from mathematics, how about "given"? >> >>> > >> >>> > c = sqrt(a*a + b*b) given: >> >>> > a = retrieve_a() >> >>> > b = retrieve_b() >> >>> >> Or if we'd rather overload an existing keyword than add a new one, >> >>> "with" reads well too. >> >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Tue Jul 20 22:15:29 2010 From: sturla at molden.no (Sturla Molden) Date: Tue, 20 Jul 2010 22:15:29 +0200 Subject: [Python-ideas] fancy indexing In-Reply-To: References: Message-ID: <19309b1fb49ecc37281b810bbf7af682.squirrel@webmail.uio.no> > [changing the subject; was: 'where' statement in Python?] > a[[x]] = y > y = a[[x]] > > which call __setitems__ and __getitems__ respectively. This makes it clear > that something different is going on and eliminates the ambiguity for > dicts. Or use the * operator used to expand tuples for function calls:

a[*x] = y
y = a[*x]

analogous to foobar(*x). The intent would be the same. S.
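The list-oriented semantics being tossed around in this subthread can already be sketched in pure Python, with no new syntax, by overriding the subscript hooks; `FancyList` below is a hypothetical name used only for illustration, and the "list of indices" convention is just one possible spelling:

```python
# A minimal sketch of "fancy indexing" for plain lists.
# FancyList is a hypothetical illustration, not a real proposal.
class FancyList(list):
    def __getitem__(self, index):
        # A list of indices selects several items, in any order,
        # possibly repeated.
        if isinstance(index, list):
            return FancyList(list.__getitem__(self, i) for i in index)
        return list.__getitem__(self, index)

    def __setitem__(self, index, value):
        # A list of indices assigns element-wise; the length of
        # the sequence never changes.
        if isinstance(index, list):
            for i, v in zip(index, value):
                list.__setitem__(self, i, v)
        else:
            list.__setitem__(self, index, value)

a = FancyList([10, 20, 30, 40])
print(a[[1, 2, 1]])    # [20, 30, 20]
a[[0, 3]] = ['x', 'y']
print(a)               # ['x', 20, 30, 'y']
```

This sidesteps the ambiguity for dicts and sets mentioned above, since only list subscripts are given the new meaning, at the cost of changing the meaning of an (admittedly rare) existing construct.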
From merwok at netwok.org Tue Jul 20 23:07:06 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Tue, 20 Jul 2010 23:07:06 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4C460FFA.7000303@netwok.org> > Or allow nested list comprehensions? > > [x for (x,y) in [x,f(x) for x in another_list] if y < 5] This already works, with a syntax correction:

[x for (x, y) in [(x, f(x)) for x in another_list] if y < 5]

Regards From ncoghlan at gmail.com Tue Jul 20 23:07:57 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 07:07:57 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jul 21, 2010 at 12:57 AM, Andrey Popp <8mayday at gmail.com> wrote: > Hello, > > PEP-3150 is awesome, just a small addition - why not to allow > one-liners `where`s: > >     a = (b, b) where b = 43 > > And that also make sense for generator/list/set/dict comprehensions: > >     mylist = [y for y in another_list if y < 5 where y = f(x)] As with any other suite, a one-liner would be allowed on the same line as the colon:

a = (b, b) where: b = call_once_only()

mylist = [y for y in another_list if y < x] where: x = call_once_only()

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Tue Jul 20 23:59:07 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 20 Jul 2010 17:59:07 -0400 Subject: [Python-ideas] first() and last() tests in for x in y loops In-Reply-To: <4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com> References: <4c4365f1.5429e70a.4ba3.ffff8601@mx.google.com> <4c45d6e1.5429e70a.5067.ffffd4b5@mx.google.com> Message-ID: On 7/20/2010 1:03 PM, et gmail wrote: > I could see the code looking something like this
> for item in List:
> if __first__:
> print "we are in the first loop"
> doSomething()
> if __last__ is False:
> print ","
> > Sorry if the formatting is a little off. Do not use tabs when posting code. > Does something like this already exist? I am posting an extended answer to "How to treat the first or last item differently" on python-list (gmane.comp.python.general) so that others can see and find it. -- Terry Jan Reedy From ncoghlan at gmail.com Wed Jul 21 00:13:29 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 08:13:29 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> <4C45F5B8.6080606@mrabarnett.plus.com> Message-ID: On Wed, Jul 21, 2010 at 6:13 AM, Alex Light wrote: > > > > MRAB wrote: > >> Why use 'as'? Why not: > > I would use as because this whole where clause acts very similarly to a > context manager in that it sets a variable to a value for a small block No, the idea is for the indented suite to be a perfectly normal suite of Python code. We want to be able to define functions, classes, etc in there. Inventing a new mini-language specifically for these clauses would be a bad idea (and make them unnecessarily hard to understand). For the record, I've updated the PEP* based on the discussion in this thread (I switched to "given" as the draft keyword due to the Haskell/SQL semantic confusion for the "where" keyword - we've had that discussion before, I just forgot about it last night when putting the PEP together) *Diff here: http://svn.python.org/view/peps/trunk/pep-3150.txt?r1=82992&r2=83002 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Jul 21 00:28:03 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Jul 2010 00:28:03 +0200 Subject: [Python-ideas] 'where' statement in Python?
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100721002803.0d10def2@pitrou.net> On Tue, 20 Jul 2010 23:27:49 +1000 Nick Coghlan wrote: > > For the record, I am personally +1 on the idea (otherwise I wouldn't > have put so much thought into it over the years). It's just a *lot* > harder to define complete and consistent semantics for the concept > than people often realise. > > However, having the question come up twice within the last month > finally inspired me to write the current status of the topic down in a > deferred PEP: http://www.python.org/dev/peps/pep-3150/ I am worried that this complexifies Python syntax without any obvious benefit in terms of expressive power, new abstractions, or concision. There is a benefit (learning curve, readability of foreign code) to a simple syntax. I am somewhere between -0.5 and -1. Regards Antoine. From pyideas at rebertia.com Wed Jul 21 00:30:43 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Tue, 20 Jul 2010 15:30:43 -0700 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> <4C45F5B8.6080606@mrabarnett.plus.com> Message-ID: On Tue, Jul 20, 2010 at 3:13 PM, Nick Coghlan wrote: > For the record, I've updated the PEP* based on the discussion in this > thread (I switched to "given" as the draft keyword due to the > Haskell/SQL semantic confusion for the "where" keyword - we've had > that discussion before, I just forgot about it last night when putting > the PEP together) > > *Diff here: http://svn.python.org/view/peps/trunk/pep-3150.txt?r1=82992&r2=83002 Nitpicking: Could the example code snippets please be made PEP 8 compliant (in particular, use 4-space indents)? They currently wobble between 2, 3, and 4-space indents, and the readability of Alex Light's example in particular is diminished by the use of only 2-space indents.
Cheers, Chris From ncoghlan at gmail.com Wed Jul 21 00:38:07 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 08:38:07 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C45DA81.5020009@molden.no> <4C45F5B8.6080606@mrabarnett.plus.com> Message-ID: On Wed, Jul 21, 2010 at 8:30 AM, Chris Rebert wrote: > On Tue, Jul 20, 2010 at 3:13 PM, Nick Coghlan wrote: > >> For the record, I've updated the PEP* based on the discussion in this >> thread (I switched to "given" as the draft keyword due to the >> Haskell/SQL semantic confusion for the "where" keyword - we've had >> that discussion before, I just forgot about it last night when putting >> the PEP together) >> >> *Diff here: http://svn.python.org/view/peps/trunk/pep-3150.txt?r1=82992&r2=83002 > > Nitpicking: > Could the example code snippets please be made PEP 8 compliant (in > particular, use 4-space indents)? Done. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Jul 21 00:48:06 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Jul 2010 00:48:06 +0200 Subject: [Python-ideas] 'where' statement in Python? References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> Message-ID: <20100721004806.48858225@pitrou.net> On Wed, 21 Jul 2010 00:28:03 +0200 Antoine Pitrou wrote: > > I am worried that this complexifies Python syntax without any obvious > benefit in terms of expressive power, new abstractions, or concision. > There is a benefit (learning curve, readability of foreign code) to a > simple syntax.
I'll add another issue:

- currently, lexical blocks (indentation following a colon) are used for control flow statements; this proposal blurs the line and makes visual inspection less reliable

I also disagree with the rationale which states that the motivation is similar to that for decorators or list comprehensions. Decorators and list comprehensions add value by making certain constructs more concise and more readable (by allowing to express the construct at a higher level through the use of detail-hiding syntax); as for decorators, they also eliminate the need for repeating oneself. Both have the double benefit of allowing shorter and higher-level code. The "given" syntax (I don't know what to call it: statement? postfix? appendage?), however, brings none of these benefits: it is almost pure syntactic sugar, and one which doesn't bring any lexical compression since it actually increases code size, rather than decrease it. Regards Antoine. From grosser.meister.morti at gmx.net Wed Jul 21 00:51:23 2010 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Wed, 21 Jul 2010 00:51:23 +0200 Subject: [Python-ideas] fancy indexing In-Reply-To: References: Message-ID: <4C46286B.3030005@gmx.net> I'm not sure what this is about but do you mean something like this?

>>> l=[1,2,3,4]
>>> l[1:2] = ['a','b']
>>> l
[1, 'a', 'b', 3, 4]

On 07/20/2010 09:17 PM, Bruce Leban wrote: > [changing the subject; was: 'where' statement in Python?] > > I think this is an interesting idea (whether worth adding is a different question). I think it would > be confusing that > a[x] = (y,z) > does something entirely different when x is 1 or (1,2). If python *were* to add something like this, > I think perhaps a different syntax should be considered: > > a[[x]] = y > y = a[[x]] > > which call __setitems__ and __getitems__ respectively. This makes it clear that something different > is going on and eliminates the ambiguity for dicts.
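For contrast with the slice-assignment example above: slice assignment rewrites one contiguous range and may change the list's length, while the element-wise behaviour under discussion never changes the length. A small sketch in plain Python — `fancy_set` is a hypothetical helper, not a proposed API:

```python
# Slice assignment: replaces a *contiguous* range and may change length.
l = [1, 2, 3, 4]
l[1:2] = ['a', 'b']          # one slot replaced by two items
print(l)                     # [1, 'a', 'b', 3, 4]

# "Fancy" assignment: arbitrary index positions, element-wise,
# length unchanged. Emulated here with a helper loop.
def fancy_set(seq, indices, values):
    for i, v in zip(indices, values):
        seq[i] = v

a = [1, 2, 3, 4]
fancy_set(a, [0, 2], ['a', 'b'])
print(a)                     # ['a', 2, 'b', 4]
```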
From guido at python.org Wed Jul 21 01:08:18 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 21 Jul 2010 00:08:18 +0100 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: <20100721004806.48858225@pitrou.net> References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Tue, Jul 20, 2010 at 11:48 PM, Antoine Pitrou wrote: > On Wed, 21 Jul 2010 00:28:03 +0200 > Antoine Pitrou wrote: >> >> I am worried that this complexifies Python syntax without any obvious >> benefit in terms of expressive power, new abstractions, or concision. >> There is a benefit (learning curve, readibility of foreign code) to a >> simple syntax. > > I'll add another issue: > > - currently, lexical blocks (indentation following a colon) are used > ?for control flow statements; this proposal blurs the line and makes > ?visual inspection less reliable This is indeed a bit of a downside; if you see blah blah blah: x = blah y = blah you will have to look more carefully at the end of the first blah blah blah line to know whether the indented block is executed first or last. For all other intended blocks, the *beginning* of the indented block is your clue (class, def, if, try, etc.). > I also disagree with the rationale which states that the motivation > is similar to that for decorators or list comprehensions. Decorators > and list comprehensions add value by making certain constructs more > concise and more readable (by allowing to express the construct at a > higher level through the use of detail-hiding syntax); as for > decorators, they also eliminate the need for repeating oneself. Both > have the double benefit of allowing shorter and higher-level code. I see a similar possibility as for decorators, actually. A decorator is very simple syntactic sugar too, but it allows one to emphasize the decoration by putting it up front rather than hiding it after the (possibly very long) function. 
The 'given' block has a similar effect of changing the order in which the (human) reader encounters things: it lets you see the important part first, e.g. c = sqrt(a*a + b*b) and put the definitions for a and b off till later. This can both help the author "stay in the flow" and emphasize the most important parts for the reader, similar to top-down programming using forward references between functions or methods. I personally use top-down just about evenly with bottom-up (though in different circumstances), and I think it would be useful to have more support for top-down coding at the statement level. That's why ABC had refinements, too. I dropped them from Python because I had to cut down the design as much as possible to make it possible to implement in a reasonable time as a skunkworks project. But I always did like them in ABC. Whether this is enough to compensate for the larger grammar is an open question. > The "given" syntax (I don't know how to call it: statement? postfix? > appendage?), however, brings none of these benefits: it is almost pure > syntactic sugar, and one which doesn't bring any lexical compression > since it actually increases code size, rather than decrease it. But it decreases exposure by limiting the scope of the variables defined in the 'given' block, just like generator expressions and (in Python 3) list comprehensions do. In larger functions it is easy to accidentally reuse variables and this occasionally introduces bugs (the most common case probably being the loop control variable for a small (often inner) loop overwriting a variable that is set far above the loop in the same scope but is used far below it). -- --Guido van Rossum (python.org/~guido) From cmjohnson.mailinglist at gmail.com Wed Jul 21 01:38:41 2010 From: cmjohnson.mailinglist at gmail.com (Carl M. 
Johnson) Date: Tue, 20 Jul 2010 13:38:41 -1000 Subject: [Python-ideas] fancy indexing In-Reply-To: <4C46286B.3030005@gmx.net> References: <4C46286B.3030005@gmx.net> Message-ID: Does this need new syntax? Couldn't it just be a method? Perhaps .where()? ;-) From bruce at leapyear.org Wed Jul 21 01:43:13 2010 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 20 Jul 2010 16:43:13 -0700 Subject: [Python-ideas] fancy indexing In-Reply-To: <4C46286B.3030005@gmx.net> References: <4C46286B.3030005@gmx.net> Message-ID:

    x = a[[y]]
would be approximately equivalent to
    x = [a[i] for i in y]
and
    a[[x]] = y
would be approximately equivalent to
    for (i,j) in zip(x,y):
        a[i] = j

except that zip throws away excess values in the longer sequence and I think [[..]] would throw an exception. --- Bruce http://www.vroospeak.com http://google-gruyere.appspot.com On Tue, Jul 20, 2010 at 3:51 PM, Mathias Panzenböck < grosser.meister.morti at gmx.net> wrote: > I'm not sure what this is about but do you mean something like this? > >>> l=[1,2,3,4] > >>> l[1:2] = ['a','b'] > >>> l > [1, 'a', 'b', 3, 4] > > > On 07/20/2010 09:17 PM, Bruce Leban wrote: > >> [changing the subject; was: 'where' statement in Python?] >> >> I think this is an interesting idea (whether worth adding is a different >> question). I think it would >> be confusing that >> a[x] = (y,z) >> does something entirely different when x is 1 or (1,2). If python *were* >> to add something like this, >> I think perhaps a different syntax should be considered: >> >> a[[x]] = y >> y = a[[x]] >> >> which call __setitems__ and __getitems__ respectively. This makes it clear >> that something different >> is going on and eliminates the ambiguity for dicts. >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sturla at molden.no Wed Jul 21 02:03:01 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 02:03:01 +0200 Subject: [Python-ideas] fancy indexing In-Reply-To: <4C46286B.3030005@gmx.net> References: <4C46286B.3030005@gmx.net> Message-ID: <4C463935.80709@molden.no> Den 21.07.2010 00:51, skrev Mathias Panzenböck: > I'm not sure what this is about but do you mean something like this? > >>> l=[1,2,3,4] > >>> l[1:2] = ['a','b'] > >>> l > [1, 'a', 'b', 3, 4] No, that is slicing. A fancy index is a more flexible slice, as it has no regular structure. It's just a list, tuple or array of indexes, in arbitrary order, possibly repeated. It would e.g. work like this:

>>> alist = [1,2,3,4]
>>> alist[(1,2,1,1,3)]
[2, 3, 2, 2, 4]

If you know SQL, it means that you can do with indexing what SQL can do with WHERE and JOIN. You can e.g. search a list in O(N) for indexes where a certain condition evaluates to True (cf. SQL WHERE), and then apply these indexes to any list (cf. SQL JOIN). It is not just for queries, but also for things like sorting. It is what lets NumPy have an "argsort" function. It does not return a sorted array, but an array of indices, which, when applied to the array, will return a sorted instance. These indices can in turn be applied to other arrays as well. Think about what happens when you sort each row in an Excel spreadsheet by the values in a certain column. One column is sorted, the other columns are reordered synchronously. That is the kind of thing that fancy indexing allows us to do rather easily. Yes, there are other ways of doing this in Python now, but not as elegant, I think. And it is not a syntax change to Python (NumPy can do it), it is just a library issue. This is at least present in NumPy, MATLAB, C# and LINQ, SQL, Fortran 95 (in two ways), Scilab, Octave, and C++ (e.g. Blitz++). The word "fancy indexing" is the name used for it in NumPy.
Sturla From cmjohnson.mailinglist at gmail.com Wed Jul 21 02:09:16 2010 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Tue, 20 Jul 2010 14:09:16 -1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: Questions: 1.) It looks like a lot of the complexity of PEP 3150 is based on wanting things like this to work:

x[index] = 42 given:
    index = complicated_formula

To make that work, you need to figure out if index is a nonlocal or a global or what in order to emit the right bytecode. What happens if we just give up that use case and say that anything on the assignment side of the initial = gets looked up in the original namespace? In other words, make 3150 more similar to the sugar:

def _():
    index = complicated_formula

x[index] = _() # Probably a NameError

Would the complexity of PEP 3150 be significantly lessened by that? Or are there other major sources of complexity in the local/nonlocal/global issue? 2.) What happens in this case:

x = y given:
    return "???"

Do we just disallow return inside a given? If so, how would the parser know to allow you to do a def inside a given? -- Carl Johnson From sturla at molden.no Wed Jul 21 02:15:15 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 02:15:15 +0200 Subject: [Python-ideas] fancy indexing In-Reply-To: References: <4C46286B.3030005@gmx.net> Message-ID: <4C463C13.2000107@molden.no> Den 21.07.2010 01:38, skrev Carl M. Johnson: > Does this need new syntax? Couldn't it just be a method? Perhaps .where()? ;-) > It is just a library issue. And adding it would not break anything, because lists and tuples don't accept iterables as indexers now. The problem is the dict and the set, which can take tuples as index. A .where() method would work, if it e.g. took a predicate as argument. But we would still need to pass the return value (e.g.
a tuple) to the [] operator. That is all legal syntax today (which is why NumPy can do this), but lists are implemented to only accept integers to __setitem__ and __getitem__. Sturla From pyideas at rebertia.com Wed Jul 21 02:24:41 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Tue, 20 Jul 2010 17:24:41 -0700 Subject: [Python-ideas] fancy indexing In-Reply-To: References: <4C46286B.3030005@gmx.net> Message-ID: On Tue, Jul 20, 2010 at 4:43 PM, Bruce Leban wrote: >     x = a[[y]] > would be approximately equivalent to >     x = [a[i] for i in y] You realize that syntax /already/ has a valid meaning in Python, right? Namely, using a single-element list as a subscript:

>>> class Foo(object):
...     def __getitem__(self, index):
...         print "Subscript:", index
...
>>> a = Foo()
>>> y = 42
>>> x = a[[y]]
Subscript: [42]
>>> # hey, whaddya know!

Making this syntax do something else would lead to some surprising inconsistencies to say the least; albeit I don't know how common it is to use lists as subscripts. Cheers, Chris -- http://blog.rebertia.com From sturla at molden.no Wed Jul 21 02:51:53 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 02:51:53 +0200 Subject: [Python-ideas] Add faster locks to the threading module? Message-ID: <4C4644A9.3040502@molden.no> Thread synchronization with threading.Lock can be expensive. But consider this: Why should the active thread need to acquire a mutex, when it already holds one? That would be the GIL. Instead of acquiring a lock (and possibly inducing thread switching etc.), it could just deny the other threads access to the GIL for a while. The cost of that synchronization method would be completely amortized, as check intervals happen anyway.
Here is what a very naïve implementation would look like in ctypes (real code could use C instead, or perhaps should not attempt this at all...):

from contextlib import contextmanager
import ctypes

_Py_Ticker = ctypes.c_int.in_dll(ctypes.pythonapi,"_Py_Ticker")

@contextmanager
def threadsafe():
    tmp = _Py_Ticker.value
    _Py_Ticker.value = 0x7fffffff
    yield
    _Py_Ticker.value = tmp

Now we can do this:

with threadsafe():
    # the GIL is mine,
    # for as long as I want
    pass

The use case for this "gillock" is about the same as for a spinlock in C. We want synchronization for a brief period of time, but don't want the overhead of acquiring a mutex. In Python this gillock has one big advantage over a spinlock: We don't have to wait, so we don't risk a thread switch on __enter__/acquire. But there can be only one instance of this lock, as there is only one GIL. That is the drawback compared to a spinlock. Therefore I think both a spinlock and a gillock should be added to the threading module. These are synchronization methods that should be available. P.S. A gillock working like the ctypes code above is of course very dangerous. If a C extension releases the GIL while _Py_Ticker is astronomic, we have a very bad situation... But real code could try to safeguard against this. E.g. Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS could be defined to do nothing if _Py_Ticker is above some ridiculous threshold. P.P.S. Yes I know about the newgil. But I have not thought about how to achieve similar effect with that. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Wed Jul 21 02:57:54 2010 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 20 Jul 2010 20:57:54 -0400 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: <4C4644A9.3040502@molden.no> References: <4C4644A9.3040502@molden.no> Message-ID: On Tue, Jul 20, 2010 at 8:51 PM, Sturla Molden wrote: > [...snip...] > P.P.S. Yes I know about the newgil.
But I have not thought about how to achieve similar effect with that. > > Sturla Regardless of the rest of the proposal (which I'm not keen on; I don't want to rely on the GIL "being there") you need to factor the newgil code into this equation as a change like this would only land in the 3.x branch. With 2.7 out the door - 3.x is, for all effective purposes, trunk. jesse From sturla at molden.no Wed Jul 21 02:59:58 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 02:59:58 +0200 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: <4C4644A9.3040502@molden.no> References: <4C4644A9.3040502@molden.no> Message-ID: <4C46468E.603@molden.no> Den 21.07.2010 02:51, skrev Sturla Molden: > > Therefore I think both a spinlock and a gillock should be added to the > threading module. These are synchronization methods that should be > available. > Actually, a spinlock would probably not even be feasible in Python: The GIL is a mutex, and we would have to give it up before we could spin on the lock, and reacquire afterwards. So the cost of a kernel mutex object is still there. The gillock is possibly the only way of getting a fast lock object in Python. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Wed Jul 21 03:08:26 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 03:08:26 +0200 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: References: <4C4644A9.3040502@molden.no> Message-ID: <4C46488A.7010304@molden.no> Den 21.07.2010 02:57, skrev Jesse Noller: > > Regardless of the rest of the proposal (which I'm not keen on; I don't > want to rely on the GIL "being there") you need to factor the newgil > code into this equation as a change like this would only land in the > 3.x branch. With 2.7 out the door - 3.x is, for all effective
> > Yes, but the principle of denying other threads access to the GIL for fast synchronization still applies to 3.x. Java has "synchronized" blocks too. This is not very different. But monopolizing the GIL is much faster than a lock if the purpose is just to guard a tiny piece of code. Sturla From sturla at molden.no Wed Jul 21 03:40:28 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 03:40:28 +0200 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: References: <4C4644A9.3040502@molden.no> Message-ID: <4C46500C.5010101@molden.no> Den 21.07.2010 02:57, skrev Jesse Noller: > Regardless of the rest of the proposal (which I'm not keen on; I don't > want to rely on the GIL "being there") you need to factor the newgil > code into this equation as a change like this would only land in the > 3.x branch. With 2.7 out the door - 3.x is, for all effective > purposes, trunk. > > I have looked briefly at it now. It seems to be just as easy with newgil, and possibly much safer. The functions drop_gil and take_gil in ceval_gil.h could e.g. be modified to just return and do nothing if a global deny flag is set. But I have to look more carefully at this later on. Sturla From jnoller at gmail.com Wed Jul 21 03:58:42 2010 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 20 Jul 2010 21:58:42 -0400 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: <4C46500C.5010101@molden.no> References: <4C4644A9.3040502@molden.no> <4C46500C.5010101@molden.no> Message-ID: On Tue, Jul 20, 2010 at 9:40 PM, Sturla Molden wrote: > Den 21.07.2010 02:57, skrev Jesse Noller: >> >> Regardless of the rest of the proposal (which I'm not keen on; I don't >> want to rely on the GIL "being there") you need to factor the newgil >> code into this equation as a change like this would only land in the >> 3.x branch. With 2.7 out the door - 3.x is, for all effective >> purposes, trunk. >> >> > > I have looked briefly at it now. 
It seems to be just as easy with newgil, > and possibly much safer. The functions drop_gil and take_gil ?in ceval_gil.h > could e.g. be modified to just return and do nothing if a global deny flag > is set. But I have to look more carefully at this later on. > > Sturla > That's all I was trying to say :) Your original email said "Yes I know about the newgil. But I have not thought about how to achieve similar effect with that." I see what you're trying to do (impersonating the synchronized keyword java has) but I'm a little skeeved out by adding anything like this which is directly reliant on the GIL's existence. From scialexlight at gmail.com Wed Jul 21 04:50:29 2010 From: scialexlight at gmail.com (Alex Light) Date: Tue, 20 Jul 2010 22:50:29 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: Carl M. johnson wrote: >2.) What happens in this case: > >x = y given: > return "???" > >Do we just disallow return inside a given? If so, how would the parser >know to allow you to do a def inside a given? i think so because unless i am misunderstanding something the only allowed expressions in a 'given' block would be of the type: a_variable = an_expression however one way to think of it is that ans = do_something(a, b, c) given: a = get_a() b = get_b() c = get_c() is the same as saying a = get_a() b = get_b() c = get_c() ans = do_something(a, b, c) del a, b, c which would seem to indicate that a = do_something() given: return "???" goes to: return "???" a = do_something() would be valid (if extremely bad style) in the end i think its use in that way would be like the use of a GOTO statement in many languages, technically there is no reason it should not be allowed but still prohibited for stylistic reasons. anyway why would you ever want to do this? 
it would be like writing def f(): return return it is nonsensical and obviously an error On Tue, Jul 20, 2010 at 8:09 PM, Carl M. Johnson < cmjohnson.mailinglist at gmail.com> wrote: > Questions: > > 1.) It looks like a lot of the complexity of PEP 3150 is based on > wanting things like this to work: > > x[index] = 42 given: > index = complicated_formula > > To make that work, you need to figure out if index is a nonlocal or a > global or what in order to emit the right bytecode. What happens if we > just give up that use case and say that anything on the assignment > side of the initial = gets looked up in the original namespace? In > other words, make 3150 more similar to the sugar: > > def _(): > index = complicated_formula > > x[index] = _() #Probably a NameError > > Would the complexity of PEP 3150 be significantly lessened by that? Or > are there other major sources of complexity in the > local/nonlocal/global issue? > > 2.) What happens in this case: > > x = y given: > return "???" > > Do we just disallow return inside a given? If so, how would the parser > know to allow you to do a def inside a given? > > -- Carl Johnson > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas >
If so, how would the parser >>know to allow you to do a def inside a given? > i think so because unless i am misunderstanding something the only allowed > expressions > in a 'given' block would be of the type: > a_variable = an_expression Incorrect. Yes, you are misunderstanding: On Tue, Jul 20, 2010 at 3:13 PM, Nick Coghlan wrote: > On Wed, Jul 21, 2010 at 6:13 AM, Alex Light wrote: >> i would use as because this whole where clause acts very similarly to a >> context manager in that it sets a variable to a value for a small block > > No, the idea is for the indented suite to be a perfectly normal suite > of Python code. We want to be able to define functions, classes, etc > in there. Inventing a new mini-language specifically for these clauses > would be a bad idea (and make them unnecessarily hard to understand) > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia Did you not read Nick's reply yet when you wrote this, or...? Cheers, Chris -- http://blog.rebertia.com From sturla at molden.no Wed Jul 21 05:59:11 2010 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Jul 2010 05:59:11 +0200 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: References: <4C4644A9.3040502@molden.no> <4C46500C.5010101@molden.no> Message-ID: <4C46708F.30809@molden.no> Den 21.07.2010 03:58, skrev Jesse Noller: > I see what you're trying to do (impersonating the synchronized keyword > java has) but I'm a little skeeved out by adding anything like this > which is directly reliant on the GIL's existence. > > It is not reliant on the GIL. Sorry if you got that impression. In a GIL free world, a global spinlock would serve the same purpose (cf. Java). But as there is a GIL in Python, we cannot use a spinlock to avoid the overhead of a mutex. But we can temporarily hold on to the GIL instead, and achieve the same effect. This is very close to Java's synchronized keyword, yes.
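[On a current Python 3 (3.2+, with the newgil work), the effect Sturla describes can be roughly approximated from pure Python by raising the interpreter's thread switch interval. A sketch only -- this is not the C-level "gillock" being proposed, and it gives no protection across calls that block or release the GIL in C code:]

```python
import sys
from contextlib import contextmanager

@contextmanager
def gil_held(seconds=1000.0):
    # Approximate "keep the GIL for a little while": with a huge switch
    # interval, the eval loop will not voluntarily hand the GIL to
    # another thread while the block runs pure-Python code.
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        sys.setswitchinterval(old)

counter = 0
with gil_held():
    counter += 1  # tiny critical section, as recommended
```

[Note the same caveat Cameron raises below: any I/O or C extension call inside the block can still release the GIL, so only very small pure-Python sections are safe.]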
The main reason is that most synchronizations in multi-threaded apps are used to protect very small pieces of code. A mutex is overkill for that. Instead of: 1. release gil. 2. acquire lock. 3. re-acquire gil. 4. release lock. we could just: 1. keep the gil for a little while. Also note that in the single-threaded case, the overhead from this "synchronized" block would be close to zero. It would do nothing, except write to the address of a volatile int. Sturla From cs at zip.com.au Wed Jul 21 08:22:27 2010 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 21 Jul 2010 16:22:27 +1000 Subject: [Python-ideas] Add faster locks to the threading module? In-Reply-To: <4C4644A9.3040502@molden.no> References: <4C4644A9.3040502@molden.no> Message-ID: <20100721062226.GA12223@cskk.homeip.net> On 21Jul2010 02:51, Sturla Molden wrote: | Thread synchronization with threading.Lock can be expensive. But | consider this: Why should the active thread need to acquire a mutex, | when it already holds one? That would be the GIL. | | Instead of acquiring a lock (and possibly inducing thread switching | etc.), it could just deny the other threads access to the GIL for a | while. The cost of that synchronization method would be completely | amortized, as check intervals happen anyway. [...] | The use case for this "gillock" is about the same as for a spinlock in | C. We want synchronization for a brief period of time, but don't | want the overhead of acquiring a mutex. But a spinlock _does_ acquire a mutex, unless I misremember. It is just that instead of using a more expensive mutex mechanism that deschedules the caller until available it sits there banging its head against a very lightweight (not descheduling) mutex until it gets it; the efficiency is entirely reliant on _all_ users of the spinlock doing very little inside the locked period. And therein lies one of my two misgivings about this idea: users of this must be very careful at all times to do Very Little inside the lock.
Just like a spinlock. This seems very easy to misuse. The other misgiving is one you've already mentioned in passing: P.S. A gillock working like the ctypes code above is of course very dangerous. If a C extension releases the GIL while _Py_Ticker is astronomic, we have a very bad situation... But real code could try to safeguard against this. E.g. Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS could be defined to do nothing if _Py_Ticker is above some ridiculous threshold. Suppose someone goes: with threadsafe(): x = fp.read(1) and the read blocks? It looks like trivial code but I think the I/O stuff will release the GIL over a read. Now you're not safe any more. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ The ZZR-1100 is not the bike for me, but the day they invent "nerf" roads and ban radars I'll be the first in line......AMCN From 8mayday at gmail.com Wed Jul 21 08:23:40 2010 From: 8mayday at gmail.com (Andrey Popp) Date: Wed, 21 Jul 2010 10:23:40 +0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: <20100721004806.48858225@pitrou.net> References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: Hello, On Wed, Jul 21, 2010 at 2:48 AM, Antoine Pitrou wrote: > I'll add another issue: > > - currently, lexical blocks (indentation following a colon) are used > for control flow statements; this proposal blurs the line and makes > visual inspection less reliable Do class definitions or with-statements represent control flow structures? I think, no (with-statement maybe). > I also disagree with the rationale which states that the motivation > is similar to that for decorators or list comprehensions.
Decorators > and list comprehensions add value by making certain constructs more > concise and more readable (by allowing one to express the construct at a > higher level through the use of detail-hiding syntax); as for > decorators, they also eliminate the need for repeating oneself. Both > have the double benefit of allowing shorter and higher-level code. Consider the following: ... value = a*x*x + b*x + c given: a = compute_a() b = compute_b() c = compute_c() ... which is roughly equivalent to ... a = compute_a() b = compute_b() c = compute_c() value = a*x*x + b*x + c ... with two differences: - It emphasizes that `value` is a target of this computation and `a`, `b` and `c` are just auxiliary. - It states that `a`, `b` and `c` are only used in the statement before the `given` keyword, which would help future refactorings. Due to the second point, it can't be considered as syntactic sugar. Is it more readable? I think yes. -- Andrey Popp phone: +7 911 740 24 91 e-mail: 8mayday at gmail.com From stephen at xemacs.org Wed Jul 21 11:31:13 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 21 Jul 2010 18:31:13 +0900 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: <87sk3dxjtq.fsf@uwakimon.sk.tsukuba.ac.jp> Andrey Popp writes: > Do class definitions or with-statements represent control flow > structures? I think, no (with-statement maybe). In both cases, yes. From the point of view of the programmer writing the controlled suite, they're very simple control structures (serial execution). True, the important thing that a class definition does is set up a special configuration of namespaces in which the suite is executed (eg, instead of def registering a function in the global namespace, it registers it in the class).
Nevertheless, it does determine control flow, and the statements in a class "declaration" are executed at runtime, not at compile time. The with statement is a nontrivial control flow structure: it ensures that certain code is executed at certain times, although the code that it provides guarantees for is not in the explicit suite it controls. Antoine's point here is not that "given" isn't a control flow *structure*. It is that it is not a *statement*, but rather a fragment that can be added to a wide variety of statements. That is quite a major departure for Python, though I think it a natural one in this context. Antoine might find execute: value = a*x*x + b*x + c given: a = compute_a() b = compute_b() c = compute_c() less objectionable on those grounds. Ie, it is now a proper control statement. I hasten to add that I think this syntax is quite horrible for *other* reasons, not least needing to find two keywords. Worst, "given" advocates want the computation of a, b, and c subordinated lexically to computation of value, but here they are on the same level. > Consider the following: > > ... > value = a*x*x + b*x + c given: > a = compute_a() > b = compute_b() > c = compute_c() > ... > > which is roughly equivalent to > > ... > a = compute_a() > b = compute_b() > c = compute_c() > value = a*x*x + b*x + c > ... > > with two differences: > > - It emphasizes that `value` is a target of this computation and `a`, > `b` and `c` are just auxiliary. > - It states that `a`, `b` and `c` are only used in statement, before > the `given` keyword, that would help future refactorings. > > Due to the second point, it can't be considered as syntactic sugar. But the second point doesn't prove you can't get the same semantics, only that a naive implementation fails. 
If you want to ensure that `a`, `b` and `c` are only used in a limited scope, there's always def compute_quadratic(x): a = compute_a() b = compute_b() c = compute_c() return a*x*x + b*x + c value = compute_quadratic(x) del compute_quadratic Now *that* is quite ugly (and arguably unreadable). But it shows that there are other ways of obtaining the same semantics in Python, and thus "given" is syntactic sugar (at least for this use case). From stefan_ml at behnel.de Wed Jul 21 11:48:33 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 21 Jul 2010 11:48:33 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Nick Coghlan, 20.07.2010 15:27: > having the question come up twice within the last month > finally inspired me to write the current status of the topic down in a > deferred PEP: http://www.python.org/dev/peps/pep-3150/ Thanks for writing that up. I like the idea in general. As for input from the "major Python implementations", we currently do similar things internally in Cython for optimisation purposes, so a syntactic 'where' clause with expression-local scope would be trivial to implement on our side as most of the infrastructure is there anyway. Stefan From solipsis at pitrou.net Wed Jul 21 12:04:46 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Jul 2010 12:04:46 +0200 Subject: [Python-ideas] Add faster locks to the threading module? References: <4C4644A9.3040502@molden.no> Message-ID: <20100721120446.045eaae2@pitrou.net> On Wed, 21 Jul 2010 02:51:53 +0200 Sturla Molden wrote: > > Thread synchronization with threading.Lock can be expensive. But > consider this: Why should the active thread need to acquire a mutex, when > it already holds one? That would be the GIL. Do you have any data about the supposed cost of threading.Lock (or RLock)? There is no point in trying to optimize a perceived bottleneck if the bottleneck doesn't exist.
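[The kind of data being asked for here is easy to gather with the timeit module; a minimal sketch -- absolute numbers will of course vary by machine and Python version:]

```python
import timeit
import threading

lock = threading.Lock()

def acquire_release():
    # The operation under discussion: one uncontended lock round-trip.
    lock.acquire()
    lock.release()

def plain_call():
    # Baseline: the cost of a plain Python function call.
    pass

N = 100000
lock_cost = timeit.timeit(acquire_release, number=N) / N
call_cost = timeit.timeit(plain_call, number=N) / N

print("uncontended acquire+release: %.0f ns" % (lock_cost * 1e9))
print("plain function call:         %.0f ns" % (call_cost * 1e9))
```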
From solipsis at pitrou.net Wed Jul 21 12:18:55 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Jul 2010 12:18:55 +0200 Subject: [Python-ideas] lock performance References: <4C4644A9.3040502@molden.no> Message-ID: <20100721121855.58da8bda@pitrou.net> On Wed, 21 Jul 2010 02:51:53 +0200 Sturla Molden wrote: > > Thread synchronization with threading.Lock can be expensive. But > consider this: Why should the active thread need to aquire a mutex, when > it already holds one? That would be the GIL. Ok, here is the cost of acquiring and releasing an uncontended lock under Linux, with Python 3.2: $ ./python -m timeit \ -s "from threading import Lock; l=Lock(); a=l.acquire; r=l.release" \ "a(); r()" 10000000 loops, best of 3: 0.127 usec per loop And here is the cost of calling a dummy Python function: $ ./python -m timeit -s "def a(): pass" "a(); a()" 1000000 loops, best of 3: 0.221 usec per loop And here is the cost of calling a trivial C function (which returns the False singleton): $ ./python -m timeit -s "a=bool" "a(); a()" 10000000 loops, best of 3: 0.164 usec per loop Also, note that using the lock as a context manager is actually slower, not faster as you might imagine: $ ./python -m timeit -s "from threading import Lock; l=Lock()" \ "with l: pass" 1000000 loops, best of 3: 0.242 usec per loop At least under Linux, there doesn't seem to be a lot of room for improvement in lock performance, to say the least. 
PS: RLock is now as fast as Lock: $ ./python -m timeit \ -s "from threading import RLock; l=RLock(); a=l.acquire; r=l.release" \ "a(); r()" 10000000 loops, best of 3: 0.114 usec per loop From solipsis at pitrou.net Wed Jul 21 12:26:41 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Jul 2010 12:26:41 +0200 Subject: [Python-ideas] spinlocks "vs" mutexes References: <4C4644A9.3040502@molden.no> <20100721062226.GA12223@cskk.homeip.net> Message-ID: <20100721122641.202885d8@pitrou.net> On Wed, 21 Jul 2010 16:22:27 +1000 Cameron Simpson wrote: > On 21Jul2010 02:51, Sturla Molden wrote: > | Thread synchronization with threading.Lock can be expensive. But > | consider this: Why should the active thread need to acquire a mutex, > | when it already holds one? That would be the GIL. > | > | Instead of acquiring a lock (and possibly inducing thread switching > | etc.), it could just deny the other threads access to the GIL for a > | while. The cost of that synchronization method would be completely > | amortized, as check intervals happen anyway. > [...] > | The use case for this "gillock" is about the same as for a spinlock in > | C. We want synchronization for a brief period of time, but don't > | want the overhead of acquiring a mutex. > > But a spinlock _does_ acquire a mutex, unless I misremember. > > It is just that instead of using a more expensive mutex mechanism that > deschedules the caller until available it sits there banging its head > against a very lightweight (not descheduling) mutex until it gets it; > the efficiency is entirely reliant on _all_ users of the spinlock doing > very little inside the locked period. Not only that, but optimized OSes and libc's might do the same as an optimization for regular mutexes. For example, here's what the pthreads(7) man page says here: "thread synchronization primitives (mutexes, thread joining, etc.) are implemented using the Linux futex(2) system call."
I don't know exactly how futex(2) works, but it looks like a kind of low-level API allowing to build spinlocks and other fast userspace primitives. Regards Antoine. From scialexlight at gmail.com Wed Jul 21 12:35:42 2010 From: scialexlight at gmail.com (Alex Light) Date: Wed, 21 Jul 2010 06:35:42 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: nick coughlan wrote: > No, the idea is for the indented suite to be a perfectly normal suite > of Python code. We want to be able to define functions, classes, etc > in there. @Chris Robert sorry what i meant in saying that " a_variable = an_expression" is that, it seems to me, at least, the only allowed statements are ones where a variable is set to a value, which includes "class" and "def" (and some control flow, if, else etc.) also in the first post: Sergio Davis wrote: >I'm considering the following extension to Python's grammar: adding the 'where' keyword, which would work as follows: > >where_expr : expr 'where' NAME '=' expr On Tue, Jul 20, 2010 at 10:56 PM, Chris Rebert wrote: > On Tue, Jul 20, 2010 at 7:50 PM, Alex Light > wrote: > > Carl M. johnson wrote: > >>2.) What happens in this case: > >> > >>x = y given: > >> return "???" > >> > >>Do we just disallow return inside a given? If so, how would the parser > >>know to allow you to do a def inside a given? > > i think so because unless i am misunderstanding something the only > allowed > > expressions > > in a 'given' block would be of the type: > > a_variable = an_expression > > Incorrect. 
Yes, you are misunderstanding: > > On Tue, Jul 20, 2010 at 3:13 PM, Nick Coghlan wrote: > > On Wed, Jul 21, 2010 at 6:13 AM, Alex Light > wrote: > >> i would use as because this whole where clause acts very similarly to a > >> context manager in that it sets a variable to a value for a small block > > > > No, the idea is for the indented suite to be a perfectly normal suite > > of Python code. We want to be able to define functions, classes, etc > > in there. Inventing a new mini-language specifically for these clauses > > would be a bad idea (and make them unnecessarily hard to understand) > > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > Did you not read Nick's reply yet when you wrote this, or...? > > Cheers, > Chris > -- > http://blog.rebertia.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Wed Jul 21 12:39:22 2010 From: masklinn at masklinn.net (Masklinn) Date: Wed, 21 Jul 2010 12:39:22 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On 2010-07-21, at 02:09 , Carl M. Johnson wrote: > Questions: > > 1.) It looks like a lot of the complexity of PEP 3150 is based on > wanting thing like this to work: > > x[index] = 42 given: > index = complicated_formula > > To make that work, you need to figure out if index is a nonlocal or a > global or what in order to emit the right bytecode. What happens if we > just give up that use case and say that anything on the assignment > side of the initial = gets looked up in the original namespace? In > other words, make 3150 more similar to the sugar: I quite agree with that, the where/given block/scope should only apply to the expression directly to the left of it. So only the RHS should be concerned, and LHS is out of that scope. 
And that expression would be written as: operator.setitem(x, index, 42) given: index = complicated_formula I think the first Torture Test block is misguided, and I'd be -0.5 on such a complex feature. From guido at python.org Wed Jul 21 12:53:19 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 21 Jul 2010 11:53:19 +0100 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Wed, Jul 21, 2010 at 11:39 AM, Masklinn wrote: > On 2010-07-21, at 02:09 , Carl M. Johnson wrote: >> Questions: >> >> 1.) It looks like a lot of the complexity of PEP 3150 is based on >> wanting things like this to work: >> >> x[index] = 42 given: >> index = complicated_formula >> >> To make that work, you need to figure out if index is a nonlocal or a >> global or what in order to emit the right bytecode. What happens if we >> just give up that use case and say that anything on the assignment >> side of the initial = gets looked up in the original namespace? In >> other words, make 3150 more similar to the sugar: Why do you think it's more complicated to do it for the LHS than for the RHS? > I quite agree with that, the where/given block/scope should only apply to the expression directly to the left of it. So only the RHS should be concerned, and LHS is out of that scope. > > And that expression would be written as: > > operator.setitem(x, index, 42) given: > index = complicated_formula Bah. > I think the first Torture Test block is misguided, and I'd be -0.5 on such a complex feature. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Wed Jul 21 13:07:39 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 21:07:39 +1000 Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Wed, Jul 21, 2010 at 8:53 PM, Guido van Rossum wrote: > On Wed, Jul 21, 2010 at 11:39 AM, Masklinn wrote: >> On 2010-07-21, at 02:09 , Carl M. Johnson wrote: >>> Questions: >>> >>> 1.) It looks like a lot of the complexity of PEP 3150 is based on >>> wanting things like this to work: >>> >>> x[index] = 42 given: >>> index = complicated_formula >>> >>> To make that work, you need to figure out if index is a nonlocal or a >>> global or what in order to emit the right bytecode. What happens if we >>> just give up that use case and say that anything on the assignment >>> side of the initial = gets looked up in the original namespace? In >>> other words, make 3150 more similar to the sugar: > > Why do you think it's more complicated to do it for the LHS than for the RHS? It's allowing more than simple names on the LHS that makes a naive return-based approach to name binding not work. However, it's class scopes that really make this complicated, and making it "just work" is probably easier than trying to explain which kinds of assignment will and won't work. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Wed Jul 21 13:16:37 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 21 Jul 2010 13:16:37 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Terry Reedy, 20.07.2010 21:49: > I did not comment then because I thought the idea of cluttering python > with augmented local namespace blocks, with no functional gain, was > rejected and dead, and hence unnecessary of comment. > -10 > For me, the idea would come close to destroying (what remains of) the > simplicity that makes Python relatively easy to learn.
It seems to be > associated with the (to me, cracked) idea that names are pollution. Actually, it's about *giving* names to subexpressions, that's quite the opposite. > I agree with Jack Diederich: > >I think the "trick" to making it readable > > is putting the assignment first. > > > par_pos = decl.find('(') > > vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() > > > versus: > > > vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() where > > par_pos=decl.find('(') > > The real horror would come with multiple assignments with multiple and > nested where or whatever clauses. I agree that the placement *behind* the expression itself *can* be suboptimal, but then, we also have conditional expressions, where it's good to know when to use them and when they get too long to be readable. The same applies here. However, I take your point that this is nothing that really makes anything simpler or that potentially opens new use cases (like the 'with' statement did, for example). It's a plain convenience syntax and as such not really worth defending over potential draw-backs. Stefan From ncoghlan at gmail.com Wed Jul 21 13:20:22 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 21:20:22 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Wed, Jul 21, 2010 at 12:56 PM, Chris Rebert wrote: > On Tue, Jul 20, 2010 at 3:13 PM, Nick Coghlan wrote: >> On Wed, Jul 21, 2010 at 6:13 AM, Alex Light wrote: >>> i would use as because this whole where clause acts very similarly to a >>> context manager in that it sets a variable to a value for a small block >> >> No, the idea is for the indented suite to be a perfectly normal suite >> of Python code. We want to be able to define functions, classes, etc >> in there. 
Inventing a new mini-language specifically for these clauses >> would be a bad idea (and make them unnecessarily hard to understand) > > Did you not read Nick's reply yet when you wrote this, or...? Alex actually has a reasonable point here: break, continue, yield and return actually don't make sense in the top-level of the given clause (since it is conceptually all one statement). For break and continue, they will naturally give a SyntaxError with the proposed implementation (for "'break' outside loop" or "'continue' not properly in loop", just to be randomly inconsistent). yield and return (at the level of the given clause itself) will need to be disallowed explicitly by the compiler (similar to the "'return' outside function" and "'yield' outside function" errors you get if you attempt to use these keywords in a class or module scope). There are also some subtleties as to whether the given clause is compiled as a closure or not (my current thoughts are that it should be compiled as a closure when defined in a function scope, but like a class scope when defined in a class or module scope). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Wed Jul 21 13:24:55 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 21:24:55 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jul 21, 2010 at 9:16 PM, Stefan Behnel wrote: > Terry Reedy, 20.07.2010 21:49: >> >> I did not comment then because I thought the idea of cluttering python >> with augmented local namespace blocks, with no functional gain, was >> rejected and dead, and hence unnecessary of comment. >> -10 >> For me, the idea would come close to destroying (what remains of) the >> simplicity that makes Python relatively easy to learn. It seems to be >> associated with the (to me, cracked) idea that names are pollution. 
> Actually, it's about *giving* names to subexpressions, that's quite the opposite. I think Terry's point was that you can already give names to subexpressions by assigning them to variables in the current scope, but some people object to that approach due to "namespace pollution". I agree with him that avoiding namespace pollution isn't a particularly strong argument though (unless you have really long scripts and functions), which is why I've tried to emphasise the intended readability benefits. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Jul 21 13:37:57 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Jul 2010 21:37:57 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: Message-ID: On Tue, Jul 20, 2010 at 11:52 AM, Jack Diederich wrote: > I think the "trick" to making it readable is putting the assignment first. > > par_pos = decl.find('(') > vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() > > versus: > > vtype = decl[par_pos+1:FindMatching(par_pos, decl)].strip() where > par_pos=decl.find('(') Note that with a "given" clause, I would recommend writing something along these lines: vtype = decl[open_paren_pos+1:close_paren_pos] given: open_paren_pos = decl.find('(') close_paren_pos = FindMatching(open_paren_pos, decl) The positions of the open and closing parentheses are only relevant in the assignment statement and you can understand what the code does based just on the names of the subexpressions without necessarily worrying about how they are determined. The question here is whether this offers *enough* benefit over just writing open_paren_pos = decl.find('(') close_paren_pos = FindMatching(open_paren_pos, decl) vtype = decl[open_paren_pos+1:close_paren_pos] to be worth the significant additional complexity it introduces.
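[The FindMatching helper in the snippets above is assumed rather than shown; a plausible pure-Python version, keeping Jack's (pos, decl) argument order, might be:]

```python
def find_matching(pos, decl):
    """Return the index of the ')' matching the '(' at decl[pos]."""
    depth = 0
    for i in range(pos, len(decl)):
        if decl[i] == '(':
            depth += 1
        elif decl[i] == ')':
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses in %r" % decl)

# Worked example with a made-up C-ish declaration:
decl = "void frobnicate(unsigned int x)"
open_paren_pos = decl.find('(')
close_paren_pos = find_matching(open_paren_pos, decl)
vtype = decl[open_paren_pos + 1:close_paren_pos].strip()
print(vtype)  # -> unsigned int x
```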
Currently I'd say the scales are leaning heavily towards "not worth the hassle", but I'd be interested to see what people can make of the PEP 359 use cases and judicious use of the locals() function in the context of PEP 3150 (assuming the given clause semantics are exactly as described by the implementation sketch in the PEP) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From george.sakkis at gmail.com Wed Jul 21 17:06:53 2010 From: george.sakkis at gmail.com (George Sakkis) Date: Wed, 21 Jul 2010 17:06:53 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Wed, Jul 21, 2010 at 1:07 PM, Nick Coghlan wrote: > > However, it's class scopes that really make this complicated, and > making it "just work" is probably easier than trying to explain which > kinds of assignment will and won't work. Is support for class scopes an absolute must-have? While it might be nice to have for consistency's sake, practically speaking module and function scopes should cover all but the most obscure and perverse use cases. I'm on -0.1 at the moment because of the "Two Ways To Do It" objection but I don't think the complication of making it work for class scopes should be grounds for rejection. George From bruce at leapyear.org Wed Jul 21 17:51:26 2010 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 21 Jul 2010 08:51:26 -0700 Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: I'm unconvinced of the value at this point but notwithstanding that let me toss in an alternative syntax: given: suite do: suite This executes the two suites in order with any variable bindings created by the first suite being local to the scope of the two suites. I think this is more readable than the trailing clause and is more flexible (you can put multiple statements in the second suite) and avoids the issue with anyone wanting the where clause added to arbitrary expressions. FWIW, in math it's more common to list givens at the top. --- Bruce (via android) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jackdied at gmail.com Wed Jul 21 18:02:06 2010 From: jackdied at gmail.com (Jack Diederich) Date: Wed, 21 Jul 2010 12:02:06 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Wed, Jul 21, 2010 at 7:20 AM, Nick Coghlan wrote: > On Wed, Jul 21, 2010 at 12:56 PM, Chris Rebert wrote: >> On Tue, Jul 20, 2010 at 3:13 PM, Nick Coghlan wrote: >>> On Wed, Jul 21, 2010 at 6:13 AM, Alex Light wrote: >>>> i would use as because this whole where clause acts very similarly to a >>>> context manager in that it sets a variable to a value for a small block >>> >>> No, the idea is for the indented suite to be a perfectly normal suite >>> of Python code. We want to be able to define functions, classes, etc >>> in there. Inventing a new mini-language specifically for these clauses >>> would be a bad idea (and make them unnecessarily hard to understand) >> >> Did you not read Nick's reply yet when you wrote this, or...? 
> > Alex actually has a reasonable point here: break, continue, yield and
> > return actually don't make sense in the top-level of the given clause
> > (since it is conceptually all one statement).
> >
> > For break and continue, they will naturally give a SyntaxError with
> > the proposed implementation (for "'break' outside loop" or "'continue'
> > not properly in loop", just to be randomly inconsistent).
> >
> > yield and return (at the level of the given clause itself) will need
> > to be disallowed explicitly by the compiler (similar to the "'return'
> > outside function" and "'yield' outside function" errors you get if you
> > attempt to use these keywords in a class or module scope).

I'm -sys.maxint on the PEP for many reasons.

1) I don't want to have to explain this to people. "It's just like
regular python but you can't read it top-to-bottom and you can't
include control flow statements."
2a) No control flow statements in the block means if you need to
augment the code to do a return/break/continue/yield you then have to
refactor so everything in the "given:" block gets moved to the top and
a 1-line change becomes a 10 line diff.
2b) Allowing control flow statements in the block would be even more
confusing.
2c) Is this legal?
    x = b given:
      b = 0
      for item in range(100):
        b += item
        if b > 10:
          break
3) I really don't want to have to explain to people why that is, or
isn't valid.
4) decorators and "with" blocks read top-to-bottom even if they change
the emphasis.  This doesn't.
5) There are no compelling use cases.  The two examples in the PEP are toys.

-Jack

From scialexlight at gmail.com  Wed Jul 21 19:19:20 2010
From: scialexlight at gmail.com (Alex Light)
Date: Wed, 21 Jul 2010 13:19:20 -0400
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To:
References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20100721002803.0d10def2@pitrou.net>
 <20100721004806.48858225@pitrou.net>
Message-ID:

On Wed, Jul 21, 2010 at 12:02 PM, Jack Diederich wrote:
> 2a) No control flow statements in the block means if you need to
> augment the code to do a return/break/continue/yield you then have to
> refactor so everything in the "given:" block gets moved to the top and
> a 1-line change becomes a 10 line diff.
> 2b) Allowing control flow statements in the block would be even more
> confusing.
> 2c) Is this legal?
>     x = b given:
>       b = 0
>       for item in range(100):
>         b += item
>         if b > 10:
>           break

2a) you are missing the point of the given clause. you use it to
assign values to variables if, and only if, the only possible results
of the computation are
1) an exception is raised or
2) a value is returned which is set to the variable and used in the
expression no matter its value.
if there is the slightest chance that what you describe might be
necessary, you would not put it in a "given" but do something like this:
(assumes that a given is applied to the previous statement in its
current block)

if some_bool_func(a):
    ans = some_func1(a, b, c)
else:
    ans = some_func2(a, b, c)
given: # note: given applied to the block starting with the "if" statement
    a = get_a()
    b = get_b()
    c = get_c()

2b) agreed, but there is no reason for them to be disallowed except
readability; they should be discouraged
2c) see it might be okay, depending on what people think of your second
question. in my opinion it should be illegal stylistically and be
required to be changed to

def summation(start, end):
    i = start
    while i < end:
        start += i
        i += 1
    return start

def a_func():
    # do stuff
    x = b given:
        b = summation(0, 100)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From solipsis at pitrou.net Wed Jul 21 21:34:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Jul 2010 21:34:12 +0200 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy Message-ID: <1279740852.3222.38.camel@localhost.localdomain> Hello, I would like to propose the following PEP for feedback and review. Permanent link to up-to-date version with proper HTML formatting: http://www.python.org/dev/peps/pep-3151/ Thank you, Antoine. PEP: 3151 Title: Reworking the OS and IO exception hierarchy Version: $Revision: 83042 $ Last-Modified: $Date: 2010-07-21 21:16:49 +0200 (mer. 21 juil. 2010) $ Author: Antoine Pitrou Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2010-07-21 Python-Version: 3.2 or 3.3 Post-History: Resolution: TBD Abstract ======== The standard exception hierarchy is an important part of the Python language. It has two defining qualities: it is both generic and selective. Generic in that the same exception type can be raised - and handled - regardless of the context (for example, whether you are trying to add something to an integer, to call a string method, or to write an object on a socket, a TypeError will be raised for bad argument types). Selective in that it allows the user to easily handle (silence, examine, process, store or encapsulate...) specific kinds of error conditions while letting other errors bubble up to higher calling contexts. For example, you can choose to catch ZeroDivisionErrors without affecting the default handling of other ArithmeticErrors (such as OverflowErrors). This PEP proposes changes to a part of the exception hierarchy in order to better embody the qualities mentioned above: the errors related to operating system calls (OSError, IOError, select.error, and all their subclasses). 
Rationale ========= Confusing set of OS-related exceptions -------------------------------------- OS-related (or system call-related) exceptions are currently a diversity of classes, arranged in the following subhierarchies:: +-- EnvironmentError +-- IOError +-- io.BlockingIOError +-- io.UnsupportedOperation (also inherits from ValueError) +-- socket.error +-- OSError +-- WindowsError +-- mmap.error +-- select.error While some of these distinctions can be explained by implementation considerations, they are often not very logical at a higher level. The line separating OSError and IOError, for example, is often blurry. Consider the following:: >>> os.remove("fff") Traceback (most recent call last): File "", line 1, in OSError: [Errno 2] No such file or directory: 'fff' >>> open("fff") Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory: 'fff' The same error condition (a non-existing file) gets cast as two different exceptions depending on which library function was called. The reason for this is that the ``os`` module exclusively raises OSError (or its subclass WindowsError) while the ``io`` module mostly raises IOError. However, the user is interested in the nature of the error, not in which part of the interpreter it comes from (since the latter is obvious from reading the traceback message or application source code). In fact, it is hard to think of any situation where OSError should be caught but not IOError, or the reverse. A further proof of the ambiguity of this segmentation is that the standard library itself sometimes has problems deciding. For example, in the ``select`` module, similar failures will raise either ``select.error``, ``OSError`` or ``IOError`` depending on whether you are using select(), a poll object, a kqueue object, or an epoll object. 
This makes user code uselessly complicated since it has to be prepared to catch various exception types, depending on which exact implementation of a single primitive it chooses to use at runtime. As for WindowsError, it seems to be a pointless distinction. First, it only exists on Windows systems, which requires tedious compatibility code in cross-platform applications (such code can be found in ``Lib/shutil.py``). Second, it inherits from OSError and is raised for similar errors as OSError is raised for on other systems. Third, the user wanting access to low-level exception specifics has to examine the ``errno`` or ``winerror`` attribute anyway. Lack of fine-grained exceptions ------------------------------- The current variety of OS-related exceptions doesn't allow the user to filter easily for the desired kinds of failures. As an example, consider the task of deleting a file if it exists. The Look Before You Leap (LBYL) idiom suffers from an obvious race condition:: if os.path.exists(filename): os.remove(filename) If a file named as ``filename`` is created by another thread or process between the calls to ``os.path.exists`` and ``os.remove``, it won't be deleted. This can produce bugs in the application, or even security issues. Therefore, the solution is to try to remove the file, and ignore the error if the file doesn't exist (an idiom known as Easier to Ask Forgiveness than to get Permission, or EAFP). Careful code will read like the following (which works under both POSIX and Windows systems):: try: os.remove(filename) except OSError as e: if e.errno != errno.ENOENT: raise or even:: try: os.remove(filename) except EnvironmentError as e: if e.errno != errno.ENOENT: raise This is a lot more to type, and also forces the user to remember the various cryptic mnemonics from the ``errno`` module. It imposes an additional cognitive burden and gets tiresome rather quickly. 
Consequently, many programmers will instead write the following code,
which silences exceptions too broadly::

    try:
        os.remove(filename)
    except OSError:
        pass

``os.remove`` can raise an OSError not only when the file doesn't exist,
but in other possible situations (for example, the filename points to a
directory, or the current process doesn't have permission to remove the
file), which all indicate bugs in the application logic and therefore
shouldn't be silenced.  What the programmer would like to write instead
is something such as::

    try:
        os.remove(filename)
    except FileNotFound:
        pass


Compatibility strategy
======================

Reworking the exception hierarchy will obviously change the exact
semantics of at least some existing code.  While it is not possible
to improve on the current situation without changing exact semantics,
it is possible to define a narrower type of compatibility, which we will
call **useful compatibility**, and define as follows:

* *useful compatibility* doesn't make exception catching any narrower,
  but it can be broader for *naïve* exception-catching code.  Given the
  following kind of snippet, all exceptions caught before this PEP will
  also be caught after this PEP, but the reverse may be false::

    try:
        os.remove(filename)
    except OSError:
        pass

* *useful compatibility* doesn't alter the behaviour of *careful*
  exception-catching code.  Given the following kind of snippet, the
  same errors should be silenced or reraised, regardless of whether
  this PEP has been implemented or not::

    try:
        os.remove(filename)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise

The rationale for this compromise is that careless (or "naïve") code
can't really be helped, but at least code which "works" won't suddenly
raise errors and crash.  This is important since such code is likely to
be present in scripts used as cron tasks or automated system
administration programs.

Careful code should not be penalized.
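For completeness, the careful idiom above can already be factored into a reusable helper with today's tools. The ``ignore_errno`` name below is purely illustrative, not an existing API:

```python
import errno
import os
from contextlib import contextmanager

@contextmanager
def ignore_errno(*errnos):
    # Suppress EnvironmentError (OSError/IOError) only for the listed
    # errno values; anything else still propagates.
    try:
        yield
    except EnvironmentError as e:
        if e.errno not in errnos:
            raise

# The EAFP removal, without silencing unrelated failures:
with ignore_errno(errno.ENOENT):
    os.remove("hopefully-nonexistent-file")
```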
Step 1: coalesce exception types ================================ The first step of the resolution is to coalesce existing exception types. The extent of this step is not yet fully determined. A number of possible changes are listed hereafter: * alias both socket.error and select.error to IOError * alias mmap.error to OSError * alias IOError to OSError * alias WindowsError to OSError Each of these changes doesn't preserve exact compatibility, but it does preserve *useful compatibility* (see "compatibility" section above). Not only does this first step present the user a simpler landscape, but it also allows for a better and more complete resolution of step 2 (see "Prerequisite" below). Deprecation of names -------------------- It is not yet decided whether the old names will be deprecated (then removed) or all alternative names will continue living in the root namespace. Deprecation of names from the root namespace presents some implementation challenges, especially where performance is important. Step 2: define additional subclasses ==================================== The second step of the resolution is to extend the hierarchy by defining subclasses which will be raised, rather than their parent, for specific errno values. Which errno values is subject to discussion, but a survey of existing exception matching practices (see Appendix A) helps us propose a reasonable subset of all values. Trying to map all errno mnemonics, indeed, seems foolish, pointless, and would pollute the root namespace. Furthermore, in a couple of cases, different errno values could raise the same exception subclass. For example, EAGAIN, EALREADY, EWOULDBLOCK and EINPROGRESS are all used to signal that an operation on a non-blocking socket would block (and therefore needs trying again later). They could therefore all raise an identical subclass and let the user examine the ``errno`` attribute if (s)he so desires (see below "exception attributes"). 
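On Python 3.3 and later (where this coalescing step was eventually adopted), the aliasing is directly observable on a running interpreter:

```python
import mmap
import select
import socket

# All the old names are now plain aliases of OSError:
print(IOError is OSError)           # True
print(EnvironmentError is OSError)  # True
print(socket.error is OSError)      # True
print(select.error is OSError)      # True
print(mmap.error is OSError)        # True
```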
Prerequisite
------------

Step 1 is a loose prerequisite for this.

Prerequisite, because some errnos can currently be attached to different
exception classes: for example, EBADF can be attached to both OSError
and IOError, depending on the context.  If we don't want to break
*useful compatibility*, we can't make an ``except OSError`` (or IOError)
fail to match an exception where it would succeed today.

Loose, because we could decide for a partial resolution of step 2 if
existing exception classes are not coalesced: for example, EBADF could
raise a hypothetical BadFileDescriptor where an IOError was previously
raised, but continue to raise OSError otherwise.

The dependency on step 1 could be totally removed if the new subclasses
used multiple inheritance to match with all of the existing
superclasses (or, at least, OSError and IOError, which are arguably the
most prevalent ones).  It would, however, make the hierarchy more
complicated and therefore harder to grasp for the user.

New exception classes
---------------------

The following tentative list of subclasses, along with a description and
the list of errnos mapped to them, is submitted to discussion:

* ``FileAlreadyExists``: trying to create a file or directory which
  already exists (EEXIST)

* ``FileNotFound``: for all circumstances where a file or directory is
  requested but doesn't exist (ENOENT)

* ``IsADirectory``: file-level operation (open(), os.remove()...)
  requested on a directory (EISDIR)

* ``NotADirectory``: directory-level operation requested on something
  else (ENOTDIR)

* ``PermissionDenied``: trying to run an operation without the adequate
  access rights - for example filesystem permissions (EACCES,
  optionally EPERM)

* ``BlockingIOError``: an operation would block on an object (e.g.
  socket) set for non-blocking operation (EAGAIN, EALREADY,
  EWOULDBLOCK, EINPROGRESS); this is the existing ``io.BlockingIOError``
  with an extended role

* ``BadFileDescriptor``: operation on an invalid file descriptor
  (EBADF); the default error message could point out that most causes
  are that an existing file descriptor has been closed

* ``ConnectionAborted``: connection attempt aborted by peer
  (ECONNABORTED)

* ``ConnectionRefused``: connection refused by peer (ECONNREFUSED)

* ``ConnectionReset``: connection reset by peer (ECONNRESET)

* ``TimeoutError``: connection timed out (ETIMEDOUT); this could be
  re-cast as a generic timeout exception, useful for other types of
  timeout (for example in Lock.acquire())

This list assumes step 1 is accepted in full; the exception classes
described above would all derive from the now unified exception type
OSError.  It will need reworking if a partial version of step 1 is
accepted instead (again, see appendix A for the current distribution of
errnos and exception types).

Exception attributes
--------------------

In order to preserve *useful compatibility*, these subclasses should
still set adequate values for the various exception attributes defined
on the superclass (for example ``errno``, ``filename``, and optionally
``winerror``).

Implementation
--------------

Since it is proposed that the subclasses are raised based purely on the
value of ``errno``, little or no change should be required in extension
modules (either standard or third-party).  As long as they use the
``PyErr_SetFromErrno()`` family of functions (or the
``PyErr_SetFromWindowsErr()`` family of functions under Windows), they
should automatically benefit from the new, finer-grained exception
classes.
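A pure-Python sketch of that dispatch, using a few of the tentative class names above. The mapping table and the ``error_from_errno`` helper name are illustrative only:

```python
import errno

class FileNotFound(OSError): pass
class FileAlreadyExists(OSError): pass
class PermissionDenied(OSError): pass

# Partial errno -> subclass table; unmapped values fall back to OSError.
_ERRNO_MAP = {
    errno.ENOENT: FileNotFound,
    errno.EEXIST: FileAlreadyExists,
    errno.EACCES: PermissionDenied,
    errno.EPERM: PermissionDenied,
}

def error_from_errno(err, strerror, filename=None):
    # Choose the exception type purely from the errno value, keeping
    # the usual OSError attributes (errno, strerror, filename) intact.
    cls = _ERRNO_MAP.get(err, OSError)
    if filename is None:
        return cls(err, strerror)
    return cls(err, strerror, filename)

e = error_from_errno(errno.ENOENT, "No such file or directory", "fff")
# e is a FileNotFound, and an "except OSError" clause would still catch it
```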
Library modules written in Python, though, will have to be adapted where
they currently use the following idiom (seen in ``Lib/tempfile.py``)::

    raise IOError(_errno.EEXIST, "No usable temporary file name found")

Fortunately, such Python code is quite rare since raising OSError or
IOError with an errno value normally happens when interfacing with
system calls, which is usually done in C extensions.

If there is popular demand, the subroutine choosing an exception type
based on the errno value could be exposed for use in pure Python.


Possible objections
===================

Namespace pollution
-------------------

Making the exception hierarchy finer-grained makes the root (or builtins)
namespace larger.  This is to be moderated, however, as:

* only a handful of additional classes are proposed;

* while standard exception types live in the root namespace, they are
  visually distinguished by the fact that they use the CamelCase
  convention, while almost all other builtins use lowercase naming
  (except True, False, None, Ellipsis and NotImplemented)

An alternative would be to provide a separate module containing the
finer-grained exceptions, but that would defeat the purpose of
encouraging careful code over careless code, since the user would first
have to import the new module instead of using names already accessible.


Earlier discussion
==================

While this is the first time such a formal proposal is made, the idea
has received informal support in the past [1]_; both the introduction
of finer-grained exception classes and the coalescing of OSError and
IOError.

The removal of WindowsError alone has been discussed and rejected
as part of another PEP [2]_, but there seemed to be a consensus that
the distinction with OSError wasn't meaningful.  This supports at least
its aliasing with OSError.


Moratorium
==========

The moratorium in effect on language builtins means this PEP has little
chance to be accepted for Python 3.2.
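The *useful compatibility* property defined earlier can be demonstrated in miniature with a stand-in subclass (all names here are illustrative, not a real hierarchy):

```python
# A stand-in for one of the proposed subclasses:
class FileNotFound(OSError):
    pass

def fake_remove(path):
    # Simulates os.remove() on a missing file under the new hierarchy.
    raise FileNotFound(2, "No such file or directory")

# Naive pre-PEP code keeps working, since the subclass IS-A OSError...
try:
    fake_remove("fff")
except OSError:
    naive_still_works = True

# ...while new code can match just the specific condition:
try:
    fake_remove("fff")
except FileNotFound:
    precise_match = True
```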
Possible alternative ==================== Pattern matching ---------------- Another possibility would be to introduce an advanced pattern matching syntax when catching exceptions. For example:: try: os.remove(filename) except OSError as e if e.errno == errno.ENOENT: pass Several problems with this proposal: * it introduces new syntax, which is perceived by the author to be a heavier change compared to reworking the exception hierarchy * it doesn't decrease typing effort significantly * it doesn't relieve the programmer from the burden of having to remember errno mnemonics Exceptions ignored by this PEP ============================== This PEP ignores ``EOFError``, which signals a truncated input stream in various protocol and file format implementations (for example ``GzipFile``). ``EOFError`` is not OS- or IO-related, it is a logical error raised at a higher level. This PEP also ignores ``SSLError``, which is raised by the ``ssl`` module in order to propagate errors signalled by the ``OpenSSL`` library. Ideally, ``SSLError`` would benefit from a similar but separate treatment since it defines its own constants for error types (``ssl.SSL_ERROR_WANT_READ``, etc.). Appendix A: Survey of common errnos =================================== This is a quick recension of the various errno mnemonics checked for in the standard library and its tests, as part of ``except`` clauses. Common errnos with OSError -------------------------- * ``EBADF``: bad file descriptor (usually means the file descriptor was closed) * ``EEXIST``: file or directory exists * ``EINTR``: interrupted function call * ``EISDIR``: is a directory * ``ENOTDIR``: not a directory * ``ENOENT``: no such file or directory * ``EOPNOTSUPP``: operation not supported on socket (possible confusion with the existing io.UnsupportedOperation) * ``EPERM``: operation not permitted (when using e.g. 
os.setuid()) Common errnos with IOError -------------------------- * ``EACCES``: permission denied (for filesystem operations) * ``EBADF``: bad file descriptor (with select.epoll); read operation on a write-only GzipFile, or vice-versa * ``EBUSY``: device or resource busy * ``EISDIR``: is a directory (when trying to open()) * ``ENODEV``: no such device * ``ENOENT``: no such file or directory (when trying to open()) * ``ETIMEDOUT``: connection timed out Common errnos with socket.error ------------------------------- All these errors may also be associated with a plain IOError, for example when calling read() on a socket's file descriptor. * ``EAGAIN``: resource temporarily unavailable (during a non-blocking socket call except connect()) * ``EALREADY``: connection already in progress (during a non-blocking connect()) * ``EINPROGRESS``: operation in progress (during a non-blocking connect()) * ``EINTR``: interrupted function call * ``EISCONN``: the socket is connected * ``ECONNABORTED``: connection aborted by peer (during an accept() call) * ``ECONNREFUSED``: connection refused by peer * ``ECONNRESET``: connection reset by peer * ``ENOTCONN``: socket not connected * ``ESHUTDOWN``: cannot send after transport endpoint shutdown * ``EWOULDBLOCK``: same reasons as ``EAGAIN`` Common errnos with select.error ------------------------------- * ``EINTR``: interrupted function call Appendix B: Survey of raised OS and IO errors ============================================= Interpreter core ---------------- Handling of PYTHONSTARTUP raises IOError (but the error gets discarded):: $ PYTHONSTARTUP=foox ./python Python 3.2a0 (py3k:82920M, Jul 16 2010, 22:53:23) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
Could not open PYTHONSTARTUP IOError: [Errno 2] No such file or directory: 'foox' ``PyObject_Print()`` raises IOError when ferror() signals an error on the `FILE *` parameter (which, in the source tree, is always either stdout or stderr). Unicode encoding and decoding using the ``mbcs`` encoding can raise WindowsError for some error conditions. Standard library ---------------- bz2 ''' Raises IOError throughout (OSError is unused):: >>> bz2.BZ2File("foox", "rb") Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory >>> bz2.BZ2File("LICENSE", "rb").read() Traceback (most recent call last): File "", line 1, in IOError: invalid data stream >>> bz2.BZ2File("/tmp/zzz.bz2", "wb").read() Traceback (most recent call last): File "", line 1, in IOError: file is not ready for reading curses '''''' Not examined. dbm.gnu, dbm.ndbm ''''''''''''''''' _dbm.error and _gdbm.error inherit from IOError:: >>> dbm.gnu.open("foox") Traceback (most recent call last): File "", line 1, in _gdbm.error: [Errno 2] No such file or directory fcntl ''''' Raises IOError throughout (OSError is unused). 
imp module '''''''''' Raises IOError for bad file descriptors:: >>> imp.load_source("foo", "foo", 123) Traceback (most recent call last): File "", line 1, in IOError: [Errno 9] Bad file descriptor io module ''''''''' Raises IOError when trying to open a directory under Unix:: >>> open("Python/", "r") Traceback (most recent call last): File "", line 1, in IOError: [Errno 21] Is a directory: 'Python/' Raises IOError or io.UnsupportedOperation (which inherits from the former) for unsupported operations:: >>> open("LICENSE").write("bar") Traceback (most recent call last): File "", line 1, in IOError: not writable >>> io.StringIO().fileno() Traceback (most recent call last): File "", line 1, in io.UnsupportedOperation: fileno >>> open("LICENSE").seek(1, 1) Traceback (most recent call last): File "", line 1, in IOError: can't do nonzero cur-relative seeks Raises either IOError or TypeError when the inferior I/O layer misbehaves (i.e. violates the API it is expected to implement). Raises IOError when the underlying OS resource becomes invalid:: >>> f = open("LICENSE") >>> os.close(f.fileno()) >>> f.read() Traceback (most recent call last): File "", line 1, in IOError: [Errno 9] Bad file descriptor ...or for implementation-specific optimizations:: >>> f = open("LICENSE") >>> next(f) 'A. HISTORY OF THE SOFTWARE\n' >>> f.tell() Traceback (most recent call last): File "", line 1, in IOError: telling position disabled by next() call Raises BlockingIOError (inheriting from IOError) when a call on a non-blocking object would block. 
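On a modern interpreter (3.3 and later), the ``io`` exception relationships described above can be checked directly:

```python
import io

# BlockingIOError and UnsupportedOperation both sit under IOError,
# and UnsupportedOperation additionally inherits from ValueError:
print(issubclass(io.BlockingIOError, IOError))          # True
print(issubclass(io.UnsupportedOperation, IOError))     # True
print(issubclass(io.UnsupportedOperation, ValueError))  # True
```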
mmap
''''

Under Unix, raises its own ``mmap.error`` (inheriting from
EnvironmentError) throughout::

    >>> mmap.mmap(123, 10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    mmap.error: [Errno 9] Bad file descriptor
    >>> mmap.mmap(os.open("/tmp", os.O_RDONLY), 10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    mmap.error: [Errno 13] Permission denied

Under Windows, however, it mostly raises WindowsError (the source code
also shows a few occurrences of ``mmap.error``)::

    >>> fd = os.open("LICENSE", os.O_RDONLY)
    >>> m = mmap.mmap(fd, 16384)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 5] Accès refusé
    >>> sys.last_value.errno
    13
    >>> errno.errorcode[13]
    'EACCES'

    >>> m = mmap.mmap(-1, 4096)
    >>> m.resize(16384)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 87] Paramètre incorrect
    >>> sys.last_value.errno
    22
    >>> errno.errorcode[22]
    'EINVAL'

multiprocessing
'''''''''''''''

Not examined.

os / posix
''''''''''

The ``os`` (or ``posix``) module raises OSError throughout, except under
Windows where WindowsError can be raised instead.
ossaudiodev ''''''''''' Raises IOError throughout (OSError is unused):: >>> ossaudiodev.open("foo", "r") Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory: 'foo' readline '''''''' Raises IOError in various file-handling functions:: >>> readline.read_history_file("foo") Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory >>> readline.read_init_file("foo") Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory >>> readline.write_history_file("/dev/nonexistent") Traceback (most recent call last): File "", line 1, in IOError: [Errno 13] Permission denied select '''''' * select() and poll objects raise ``select.error``, which doesn't inherit from anything (but poll.modify() raises IOError); * epoll objects raise IOError; * kqueue objects raise both OSError and IOError. signal '''''' ``signal.ItimerError`` inherits from IOError. socket '''''' ``socket.error`` inherits from IOError. sys ''' ``sys.getwindowsversion()`` raises WindowsError with a bogus error number if the ``GetVersionEx()`` call fails. time '''' Raises IOError for internal errors in time.time() and time.sleep(). zipimport ''''''''' zipimporter.get_data() can raise IOError. References ========== .. [1] "IO module precisions and exception hierarchy" http://mail.python.org/pipermail/python-dev/2009-September/092130.html .. [2] Discussion of "Removing WindowsError" in PEP 348 http://www.python.org/dev/peps/pep-0348/#removing-windowserror Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From nathan at cmu.edu Wed Jul 21 21:55:45 2010 From: nathan at cmu.edu (Nathan Schneider) Date: Wed, 21 Jul 2010 15:55:45 -0400 Subject: [Python-ideas] 'where' statement in Python? 
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: > if some_bool_func(a): > ans = some_func1(a, b, c) > else: > ans = some_func2(a, b, c) > given: # note: given applied to the block starting with the "if" statement > a = get_a() > b = get_b() > c = get_c() It seems to me that the postfix construct would confuse the reader in a scenario such as the above, rather than reducing complexity. (Worse, suppose 'a', 'b', and 'c' had been assigned prior to the 'if' statement!) Moreover, our intuitions seem to be fuzzy when it comes to the desirability/behavior of control flow statements in the 'given' block. So I'm -1 on this: I think there is a marked increase in the opportunity for confusion among authors and readers, without a clear set of common patterns that would be improved (even if readability is slightly better on occasion). I think a better alternative to allow for safe localization of variables to a block would be to adapt the 'with' statement to behave as a 'let' (similar to suggestions earlier in the thread). For instance: with fname = sys.argv[1], open(fname) as f, contents = f.read(): do_stuff1(fname, contents) do_stuff2(contents) do_stuff3(fname) # error: out of scope This makes it clear to the reader that the assignments to 'fname' and 'contents', like 'f', only pertain to the contents of the 'with' block. It allows the reader to focus their eye on the 'important' part?the part inside the block?even though it doesn't come first. It helps avoid bugs that might arise if 'fname' were used later on. And it leaves no question as to where control flow statements are permitted/desirable. I'm +0.5 on this alternative: my hesitation is because we'd need to explain to newcomers why 'f = open(fname)' would be legal but bad, owing to the subtleties of context managers. 
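A rough approximation of this "let"-style scoping is possible today with a helper context manager, though it cannot actually delete caller locals; the ``let`` helper below is purely illustrative:

```python
from contextlib import contextmanager
from types import SimpleNamespace

@contextmanager
def let(**bindings):
    # Bundle the temporary names into one namespace object instead of
    # injecting real locals; nothing new leaks into the caller's scope.
    yield SimpleNamespace(**bindings)

with let(fname="setup.py", greeting="hello") as v:
    message = "%s from %s" % (v.greeting, v.fname)

print(message)  # -> hello from setup.py
```

This keeps the reader's eye on the block while signalling that ``v.fname`` and ``v.greeting`` are scoped to it by convention, even though Python itself enforces nothing.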
(I worry Alex's proposal for the interpreter to create context managers on the fly would only add confusion.)

Nathan

On Wed, Jul 21, 2010 at 1:19 PM, Alex Light wrote:
>
> On Wed, Jul 21, 2010 at 12:02 PM, Jack Diederich wrote:
>>
>> 2a) No control flow statements in the block means if you need to
>> augment the code to do a return/break/continue/yield you then have to
>> refactor so everything in the "given:" block gets moved to the top and
>> a 1-line change becomes a 10 line diff.
>> 2b) Allowing control flow statements in the block would be even more
>> confusing.
>> 2c) Is this legal?
>>     x = b given:
>>         b = 0
>>         for item in range(100):
>>             b += item
>>             if b > 10:
>>                 break
>
> 2a) you are missing the point of the given clause. you use it to assign
> values to variables if, and only if, the only possible results of the
> computation are
> 1) an exception is raised or
> 2) a value is returned which is set to the variable and used in the
> expression no matter its value.
> if there is the slightest chance that what you describe might
> be necessary you would not put it in a "given" but do something like this:
> (assumes that a given is applied to the previous statement in its current
> block)
> if some_bool_func(a):
>     ans = some_func1(a, b, c)
> else:
>     ans = some_func2(a, b, c)
> given: # note: given applied to the block starting with the "if" statement
>     a = get_a()
>     b = get_b()
>     c = get_c()
> 2b) agreed, but there is no reason for them to be disallowed except
> readability, though they should be discouraged
> 2c) see it might be okay, depending on what people think of your second
> question. in my opinion it should be illegal stylistically and
> be required to be changed to
> def summation(start, end):
>     i = start
>     while i < end:
>         start += i
>         i += 1
>     return start
> def a_func():
>     # do stuff
>     x = b given:
>         b = summation(0, 100)
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From ncoghlan at gmail.com Wed Jul 21 23:20:36 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 22 Jul 2010 07:20:36 +1000
Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy
In-Reply-To: <1279740852.3222.38.camel@localhost.localdomain>
References: <1279740852.3222.38.camel@localhost.localdomain>
Message-ID:

I read this while you were working on it in the sandbox - +1 in principle, but the devil is obviously going to be in the details.

On Thu, Jul 22, 2010 at 5:34 AM, Antoine Pitrou wrote:
> Step 1: coalesce exception types
> ================================
>
> The first step of the resolution is to coalesce existing exception types.
> The extent of this step is not yet fully determined. A number of possible
> changes are listed hereafter:
>
> * alias both socket.error and select.error to IOError
> * alias mmap.error to OSError
> * alias IOError to OSError
> * alias WindowsError to OSError
>
> Each of these changes doesn't preserve exact compatibility, but it does
> preserve *useful compatibility* (see "compatibility" section above).
>
> Not only does this first step present the user a simpler landscape, but
> it also allows for a better and more complete resolution of step 2
> (see "Prerequisite" below).

Another idea along these lines would be to coalesce the builtin exceptions at the EnvironmentError level. That is, the top of the revised hierarchy would look like:

+-- IOError
     +-- io.BlockingIOError
     +-- io.UnsupportedOperation (also inherits from ValueError)

IOError would be aliased as EnvironmentError, OSError, WindowsError, socket.error, mmap.error and select.error

Coalescing WindowsError like that would mean the "winerr" attribute would be present on all platforms, just set to "None" if the platform isn't Windows.
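Concretely, "aliasing" here just means binding the old names to one class object, so existing except clauses keep matching; a toy sketch of the idea (the names are made up, not the PEP's actual code):

```python
# One unified type; the old names become aliases of it, and the
# Windows-specific attribute is always present, defaulting to None.
class UnifiedIOError(Exception):
    def __init__(self, errno=None, strerror=None, filename=None, winerr=None):
        super().__init__(errno, strerror)
        self.errno = errno
        self.strerror = strerror
        self.filename = filename
        self.winerr = winerr  # None on non-Windows platforms

# Hypothetical aliases in the spirit of the proposal:
UnifiedOSError = UnifiedIOError
UnifiedEnvironmentError = UnifiedIOError

try:
    raise UnifiedOSError(2, "No such file or directory", "spam.txt")
except UnifiedIOError as e:  # a handler written against the other name
    caught = e

assert caught.errno == 2 and caught.winerr is None
assert UnifiedOSError is UnifiedIOError
```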
(errno, filename and strerror can all already be None, as will often be the case when IOError is raised directly by Python code). select.error (now just an alias for IOError) would also grow the common IOError attributes. I'm suggesting IOError as the name based on your survey of what standard libraries currently raise (i.e. the vast majority of them use IOError rather than one of the other names). EnvironmentError would probably be more accurate, but IOError is more common and easier to type (and from the interpreter's point of view, any manipulation of the underlying OS can be viewed as a form of I/O, even if it involves accessing the process table or the environment variables or the registry rather than the filesystem or network). Also, there should be a helper function (probably in the os module) that given an errno value will create the appropriate IOError subclass. Regards, Nick. P.S. I want to let the idea kick around in my brain for a while before offering suggestions for possible useful IOError subclasses. Note that we don't need to create subclasses for *everything* - errors without a specific subclass can fall back to the basic IOError. Still, I expect many bikesheds will be painted a wide variety of colours before this discussion is done ;) -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Wed Jul 21 23:58:15 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Jul 2010 07:58:15 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Thu, Jul 22, 2010 at 1:51 AM, Bruce Leban wrote: > I'm unconvinced of the value at this point but notwithstanding that let me > toss in an alternative syntax: > > ??? given: > ??????? suite > ??? do: > ??????? 
suite > > This executes the two suites in order with any variable bindings created by > the first suite being local to the scope of the two suites. I think this is > more readable than the trailing clause and is more flexible (you can put > multiple statements in the second suite) and avoids the issue with anyone > wanting the where clause added to arbitrary expressions. > > FWIW, in math it's more common to list givens at the top. However, writing it that way has even less to offer over ordinary local variables than the postfix given clause. I updated the draft PEP again, pointing out that if a decision had to be made today, the PEP would almost certainly be rejected due to a lack of compelling use cases. The bar for adding a new syntactic construct is pretty high and PEP 3150 currently isn't even close to reaching it (see PEP 343 for the kind of use cases that got the with statement over that bar). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Thu Jul 22 00:05:53 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Jul 2010 08:05:53 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Thu, Jul 22, 2010 at 5:55 AM, Nathan Schneider wrote: > I think a better alternative to allow for safe localization of > variables to a block would be to adapt the 'with' statement to behave > as a 'let' (similar to suggestions earlier in the thread). For > instance: > > with fname = sys.argv[1], open(fname) as f, contents = f.read(): > ? ?do_stuff1(fname, contents) > ? ?do_stuff2(contents) > do_stuff3(fname) ?# error: out of scope > > This makes it clear to the reader that the assignments to 'fname' and > 'contents', like 'f', only pertain to the contents of the 'with' > block. 
> It allows the reader to focus their eye on the 'important'
> part, the part inside the block, even though it doesn't come first. It
> helps avoid bugs that might arise if 'fname' were used later on. And
> it leaves no question as to where control flow statements are
> permitted/desirable.
>
> I'm +0.5 on this alternative: my hesitation is because we'd need to
> explain to newcomers why 'f = open(fname)' would be legal but bad,
> owing to the subtleties of context managers.

Hmm, an intriguing idea. I agree that the subtleties of "=" vs "as" could lead to problems though. There's also the fact that existing semantics mean that 'f' has to remain bound after the block, so it would be surprising if 'fname' and 'contents' were unavailable.

It also suffers the same issue as any of the other in-order proposals: without out-of-order execution, the only gain is a reduction in the chance for namespace collisions, and that's usually only a problem for accidental collisions with loop variable names in long functions or scripts.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From bruce at leapyear.org Thu Jul 22 00:36:14 2010
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 21 Jul 2010 15:36:14 -0700
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net>
Message-ID:

On Wed, Jul 21, 2010 at 2:58 PM, Nick Coghlan wrote:
> On Thu, Jul 22, 2010 at 1:51 AM, Bruce Leban wrote:
> > let me toss in an alternative syntax:
> > given:
> >     suite
> > do:
> >     suite
>
> However, writing it that way has even less to offer over ordinary
> local variables than the postfix given clause.

Perhaps. I do think it looks more like Python. The PEP says it's "some form of statement local namespace". If that's the advantage, this alternative offers it.
If the advantage is that it introduces a different execution order, then the PEP should make the case for that. --- Bruce http://www.vroospeak.com http://google-gruyere.appspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From dag.odenhall at gmail.com Thu Jul 22 00:37:05 2010 From: dag.odenhall at gmail.com (Dag Odenhall) Date: Thu, 22 Jul 2010 00:37:05 +0200 Subject: [Python-ideas] Infix application of binary functions Message-ID: <1279751825.4507.16.camel@gumri> It could help readability if binary (arity of 2) functions could be applied infix with some syntax. For example, borrowing from Haskell, the backtick could be reintroduced for this purpose. Good examples for this are isinstance and hasattr: if some_object `isinstance` Iterable: ... elif some_object `hasattr` '__iter__': ... It is already possible[1] to make infix functions, but the solution is a hack and requires functions to be marked as infix. (The use of backticks is just an example borrowing from Haskell and might not be optimal, although a benefit is that it isn't very noisy.) [1] http://code.activestate.com/recipes/384122-infix-operators/ From pyideas at rebertia.com Thu Jul 22 01:17:06 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Wed, 21 Jul 2010 16:17:06 -0700 Subject: [Python-ideas] Infix application of binary functions In-Reply-To: <1279751825.4507.16.camel@gumri> References: <1279751825.4507.16.camel@gumri> Message-ID: On Wed, Jul 21, 2010 at 3:37 PM, Dag Odenhall wrote: > It could help readability if binary (arity of 2) functions could be > applied infix with some syntax. For example, borrowing from Haskell, the > backtick could be reintroduced for this purpose. > > Good examples for this are isinstance and hasattr: > > ? ?if some_object `isinstance` Iterable: > ? ? ? ?... > ? 
>     elif some_object `hasattr` '__iter__':

Already proposed (by me) and rejected by the BDFL:

http://mail.python.org/pipermail/python-ideas/2007-January/000054.html

Cheers,
Chris
--
http://blog.rebertia.com

From grosser.meister.morti at gmx.net Thu Jul 22 01:33:38 2010
From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=)
Date: Thu, 22 Jul 2010 01:33:38 +0200
Subject: [Python-ideas] Infix application of binary functions
In-Reply-To: References: <1279751825.4507.16.camel@gumri>
Message-ID: <4C4783D2.5060304@gmx.net>

Then what about:
    obj $isinstance Iterable
or
    obj $isinstance$ Iterable
or
    obj *isinstance Iterable
or
    obj isinstance? Iterable

These don't use the backtick character (which on some setups even is a unicode char not from 7bit ascii).

-panzi

On 07/22/2010 01:17 AM, Chris Rebert wrote:
> On Wed, Jul 21, 2010 at 3:37 PM, Dag Odenhall wrote:
>> It could help readability if binary (arity of 2) functions could be
>> applied infix with some syntax. For example, borrowing from Haskell, the
>> backtick could be reintroduced for this purpose.
>>
>> Good examples for this are isinstance and hasattr:
>>
>>     if some_object `isinstance` Iterable:
>>         ...
>>     elif some_object `hasattr` '__iter__':
>
> Already proposed (by me) and rejected by the BDFL:
>
> http://mail.python.org/pipermail/python-ideas/2007-January/000054.html
>
> Cheers,
> Chris

From dag.odenhall at gmail.com Thu Jul 22 01:34:51 2010
From: dag.odenhall at gmail.com (Dag Odenhall)
Date: Thu, 22 Jul 2010 01:34:51 +0200
Subject: [Python-ideas] Infix application of binary functions
In-Reply-To: References: <1279751825.4507.16.camel@gumri>
Message-ID: <1279755291.4507.17.camel@gumri>

> Already proposed (by me) and rejected by the BDFL:
>
> http://mail.python.org/pipermail/python-ideas/2007-January/000054.html

Ah, thanks. He only rejects the backtick however, not infix application itself.
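For reference, the hack in question boils down to operator overloading; a minimal sketch in the spirit of the linked recipe, using | so that x |op| y parses as (x | op) | y:

```python
class Infix:
    def __init__(self, func):
        self.func = func
    def __ror__(self, left):
        # x | op: capture the left operand in a fresh partial wrapper
        return Infix(lambda right: self.func(left, right))
    def __or__(self, right):
        # partial | y: make the actual two-argument call
        return self.func(right)

# Marking the functions as infix, as the recipe requires:
is_a = Infix(isinstance)
has = Infix(hasattr)

assert 3 |is_a| int
assert [] |has| '__iter__'
```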
From pyideas at rebertia.com Thu Jul 22 01:41:11 2010
From: pyideas at rebertia.com (Chris Rebert)
Date: Wed, 21 Jul 2010 16:41:11 -0700
Subject: [Python-ideas] Infix application of binary functions
In-Reply-To: <4C4783D2.5060304@gmx.net>
References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net>
Message-ID:

> On 07/22/2010 01:17 AM, Chris Rebert wrote:
>> On Wed, Jul 21, 2010 at 3:37 PM, Dag Odenhall
>> wrote:
>>> It could help readability if binary (arity of 2) functions could be
>>> applied infix with some syntax. For example, borrowing from Haskell, the
>>> backtick could be reintroduced for this purpose.
>>>
>>> Good examples for this are isinstance and hasattr:
>>>
>>>     if some_object `isinstance` Iterable:
>>>         ...
>>>     elif some_object `hasattr` '__iter__':
>>
>> Already proposed (by me) and rejected by the BDFL:
>>
>> http://mail.python.org/pipermail/python-ideas/2007-January/000054.html

On Wed, Jul 21, 2010 at 4:33 PM, Mathias Panzenböck wrote:
> Then what about:
>         obj *isinstance Iterable

How would the parser distinguish that from multiplication?

> or
>         obj isinstance? Iterable

That would look odd for non-interrogative binary functions:

z = x cartesianProduct? y

> These don't use the backtick character (which on some setups even is a
> unicode char not from 7bit ascii).

Not using backtick definitely makes the proposal more viable. (Personally I <3 backtick though.)
Cheers, Chris From dag.odenhall at gmail.com Thu Jul 22 01:43:08 2010 From: dag.odenhall at gmail.com (Dag Odenhall) Date: Thu, 22 Jul 2010 01:43:08 +0200 Subject: [Python-ideas] Infix application of binary functions In-Reply-To: <4C4783D2.5060304@gmx.net> References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net> Message-ID: <1279755788.4507.25.camel@gumri> tor 2010-07-22 klockan 01:33 +0200 skrev Mathias Panzenb?ck: > Then what about: > obj $isinstance Iterable > or > obj $isinstance$ Iterable > or > obj *isinstance Iterable > or > obj isinstance? Iterable > > These don't use the backtick charackter (wich on some setups even is a unicode char not from 7bit > ascii). I like the question mark, although it is only useful for predicates. I haven't considered if infix is useful for anything other than predicates, though. Another possibility is a keyword, maybe "of": obj isinstance of Iterable or obj hasattr of '__iter__' But better then would be a keyword that already exists and makes sense for this use. A character such as the question mark is probably best, just noting the possibility of a keyword for completeness sake. An example of a non-predicate infix might be str.format: 'Hello {}' str.format? 'World' Here, the question mark makes less sense. From solipsis at pitrou.net Thu Jul 22 01:52:15 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 22 Jul 2010 01:52:15 +0200 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy References: <1279740852.3222.38.camel@localhost.localdomain> Message-ID: <20100722015215.75af6544@pitrou.net> On Thu, 22 Jul 2010 07:20:36 +1000 Nick Coghlan wrote: > > Another idea along these lines would be to coalesce the builtin > exceptions at the EnvironmentError level. Agreed, I will add it to the PEP and process the rest of your input. 
> Still, I expect > many bikesheds will be painted a wide variety of colours before this > discussion is done ;) Yes, this PEP offers a lot of opportunities for discussion :) Thanks, Antoine. From tjreedy at udel.edu Thu Jul 22 03:21:33 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 21 Jul 2010 21:21:33 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 7/21/2010 7:24 AM, Nick Coghlan wrote: > On Wed, Jul 21, 2010 at 9:16 PM, Stefan Behnel wrote: >> Terry Reedy, 20.07.2010 21:49: >>> >>> I did not comment then because I thought the idea of cluttering python >>> with augmented local namespace blocks, with no functional gain, was >>> rejected and dead, and hence unnecessary of comment. >>> -10 >>> For me, the idea would come close to destroying (what remains of) the >>> simplicity that makes Python relatively easy to learn. It seems to be >>> associated with the (to me, cracked) idea that names are pollution. >> >> Actually, it's about *giving* names to subexpressions, that's quite the >> opposite. > > I think Terry's point was that you can already give names to > subexpressions by assigning them to variables in the current scope, > but some people object to that approach due to "namespace pollution". Right. > I agree with him that avoiding namespace pollution isn't a particular > strong argument though (unless you have really long scripts and Okay, we can leave that issue aside. > functions), which is why I've tried to emphasize the intended > readability benefits. whereas I am trying to emphasize the reading horror for people whose brains are wired differently from yours. The backwards conditional expressions are nearly impossible for me to read, which is to say, painful. 
To some, something like

e = fe(a,b,c, p1) where:
  c = fc(a, d, p2 where:
    d = fd(a, p1) where:
      a = fa(p1, p2)
  b = fb(a,p2)

where p1,p2 are input parameters, looks about as bad (and it was a real effort to write). I would rather something like that were in a branch dialect, Ypthon with its own extension (.yp).

Algorithm book authors usually want their books read by lots of people. When they invent a pseudocode language, they usually invent something lots of people can read. (Knuth's MIX was something of an exception.) It is often so close to (a subset of) Python that it is ridiculous that they do not just use (a subset of) Python so it is not 'pseudo'. I cannot remember seeing anything like the above. I believe the reason is because it would be, on average, less readable and harder to understand.

--
Terry Jan Reedy

From tjreedy at udel.edu Thu Jul 22 04:04:02 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 21 Jul 2010 22:04:02 -0400
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net>
Message-ID:

On 7/20/2010 7:08 PM, Guido van Rossum wrote:
> I see a similar possibility as for decorators, actually. A decorator
> is very simple syntactic sugar too, but it allows one to emphasize the
> decoration by putting it up front rather than hiding it after the
> (possibly very long) function.

Because you chose to put multiple decorators in 'nested' order, I can consistently read decorated definitions as multi-statement 'expressions' like so:

@f1(arg) #(
@f2 # (
def f(): pass # ))

with the function name propagated out. Reading expressions inside out is nothing new. (The opposite decorator order would have made this 'trick' impossible and decorators harder for me to read.) I do not particularly see this syntax as 'emphasizing' the decoration any more than f(g(2*x+y**z)) 'emphasizes' f or g over the operator expression.
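The nested reading matches what the decorator sugar expands to, which is easy to check:

```python
def shout(func):
    # Decorator: uppercase the wrapped function's result.
    def wrapper():
        return func().upper()
    return wrapper

def exclaim(func):
    # Decorator: append "!" to the wrapped function's result.
    def wrapper():
        return func() + "!"
    return wrapper

@shout
@exclaim
def greet():
    return "hello"

# The sugar-free spelling applies the innermost decorator first:
def greet2():
    return "hello"
greet2 = shout(exclaim(greet2))

assert greet() == greet2() == "HELLO!"
```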
[changing the order] > This is indeed a bit of a downside; if you see > > blah blah blah: > x = blah > y = blah > > you will have to look more carefully at the end of the first blah blah > blah line to know whether the indented block is executed first or > last. For all other intended blocks, the *beginning* of the indented > block is your clue (class, def, if, try, etc.). Yes! This is a great feature of Python. For decorators, the *initial* '@' is the clue to shift reading mode. If givens were accepted, I would strongly prefer there to be a similar initial clue. On function definition order: I generally prefer to write and read definitions before they are used. (Mathematicians usually present and prove lemmas before a main theorem.) The major exception is the __init__ method of a class, which really is special as it defines instance attributes. I can imagine a module with one main and several helper functions starting with the main function, but there should first be a doc string listing everything with at least a short phrase of explanation. -- Terry Jan Reedy From tjreedy at udel.edu Thu Jul 22 04:12:49 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 21 Jul 2010 22:12:49 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On 7/21/2010 7:20 AM, Nick Coghlan wrote: > yield and return (at the level of the given clause itself) will need > to be disallowed explicitly by the compiler Why introduce an inconsistency? 
If

a = e1
b = f(a)

can be flipped to

b = f(a) given:
    a = e1

I would expect

a = e1
return f(a)

to be flippable to

return f(a) given:
    a = e1

--
Terry Jan Reedy

From dag.odenhall at gmail.com Thu Jul 22 04:26:48 2010
From: dag.odenhall at gmail.com (Dag Odenhall)
Date: Thu, 22 Jul 2010 04:26:48 +0200
Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy
In-Reply-To: <1279740852.3222.38.camel@localhost.localdomain>
References: <1279740852.3222.38.camel@localhost.localdomain>
Message-ID: <1279765609.4507.51.camel@gumri>

+1 on the general idea; it always seemed awkward to me that these operations all raise the same exception. I didn't even know about the errno comparison method, though I've never looked for it. Point is that it is cryptic and as such not very pythonic.

From pyideas at rebertia.com Thu Jul 22 05:30:17 2010
From: pyideas at rebertia.com (Chris Rebert)
Date: Wed, 21 Jul 2010 20:30:17 -0700
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net>
Message-ID:

On Wed, Jul 21, 2010 at 7:12 PM, Terry Reedy wrote:
> On 7/21/2010 7:20 AM, Nick Coghlan wrote:
>
>> yield and return (at the level of the given clause itself) will need
>> to be disallowed explicitly by the compiler
>
> Why introduce an inconsistency?
>
> If
>
> a = e1
> b = f(a)
>
> can be flipped to
>
> b = f(a) given:
>     a = e1
>
> I would expect
>
> a = e1
> return f(a)
>
> to be flippable to
>
> return f(a) given:
>     a = e1

I believe Nick meant returns/yields *within* the `given` suite (he just phrased it awkwardly), e.g.

a = b given:
    b = 42
    return c  # WTF

The PEP's Syntax Change section explicitly changes the grammar to allow the sort of `given`s you're talking about.

Cheers,
Chris

From cmjohnson.mailinglist at gmail.com Thu Jul 22 05:34:12 2010
From: cmjohnson.mailinglist at gmail.com (Carl M.
Johnson)
Date: Wed, 21 Jul 2010 17:34:12 -1000
Subject: [Python-ideas] Infix application of binary functions
In-Reply-To: <1279755788.4507.25.camel@gumri>
References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net> <1279755788.4507.25.camel@gumri>
Message-ID:

This can be done in Python today:

>>> class Infix(object):
...     def __init__(self, func):
...         self.func = func
...         self.arg1 = self.arg2 = self.not_set = object()
...
...     def __radd__(self, arg1):
...         self.arg1 = arg1
...         if self.arg2 is self.not_set:
...             return self
...         else:
...             return self.func(self.arg1, self.arg2)
...
...     def __add__(self, arg2):
...         self.arg2 = arg2
...         if self.arg1 is self.not_set:
...             return self
...         else:
...             return self.func(self.arg1, self.arg2)
...
>>> @Infix
... def add(one, two):
...     return one + two
...
>>> @Infix
... def mul(one, two):
...     return one * two
...
>>> @Infix
... def power(one, two):
...     return one ** two
...
>>> 1 + add + 1
2
>>> 2 + mul + 2
4
>>> 3 + power + 3
27

Enjoy.

-- Carl Johnson

From cmjohnson.mailinglist at gmail.com Thu Jul 22 05:49:16 2010
From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson)
Date: Wed, 21 Jul 2010 17:49:16 -1000
Subject: [Python-ideas] Infix application of binary functions
In-Reply-To: References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net> <1279755788.4507.25.camel@gumri>
Message-ID:

Thought about it some more. Here's a more general formula:

class InfixArity(object):
    def __init__(self, arity):
        self.arity = arity
        self.args = []

    def __call__(self, func):
        self.func = func
        return self

    def __add__(self, arg):
        self.args.append(arg)
        if len(self.args) < self.arity:
            return self
        else:
            return self.func(*self.args)

    __radd__ = __add__

Infix = lambda func: InfixArity(2)(func)

And of course, one can use __mul__ or __div__ or whatever to taste. "1 // add // 2" doesn't make me instantly vomit in my mouth.
;-)

-- Carl Johnson

From cmjohnson.mailinglist at gmail.com Thu Jul 22 06:07:19 2010
From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson)
Date: Wed, 21 Jul 2010 18:07:19 -1000
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net>
Message-ID:

There have been questions about whether there are any cases of the given/where/let/whatever solving problems that would otherwise be cumbersome to solve. I think it could help get around certain for-loop gotchas:

>>> funcs = []
>>> for i in range(5):
...     def f():
...         print("#", i)
...     funcs.append(f)
...
>>> [func() for func in funcs]
# 4
# 4
# 4
# 4
# 4
[None, None, None, None, None]

D'oh! (This can be a real world problem if you have a list of methods you want to decorate inside a class.)

One current workaround:

>>> funcs = []
>>> for i in range(5):
...     def _():
...         n = i
...         def f():
...             print("#", n)
...         funcs.append(f)
...     _()
...
>>> [func() for func in funcs]
# 0
# 1
# 2
# 3
# 4
[None, None, None, None, None]

Not pretty, but it works.

In let format (I'm leaning toward the format "let [VAR = | return | yield] EXPRESSION where: BLOCK"):

funcs = []
for i in range(5):
    let funcs.append(f) where:
        n = i
        def f():
            print("#", n)

[func() for func in funcs]

Still a little awkward, but not as bad, IMHO.

-- Carl Johnson

From cmjohnson.mailinglist at gmail.com Thu Jul 22 06:22:58 2010
From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson)
Date: Wed, 21 Jul 2010 18:22:58 -1000
Subject: [Python-ideas] Infix application of binary functions
In-Reply-To: References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net> <1279755788.4507.25.camel@gumri>
Message-ID:

Last time, I swear! I caught a bug in the last version. Since I mutated my instances (not very Haskell-like!!), you couldn't use the same function more than once.
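The failure is easy to reproduce with the earlier two-argument wrapper (trimmed): the operand slots live on the shared instance, so leftovers from the first use corrupt the second.

```python
class Infix(object):
    def __init__(self, func):
        self.func = func
        self.arg1 = self.arg2 = self.not_set = object()

    def __radd__(self, arg1):
        self.arg1 = arg1
        if self.arg2 is self.not_set:
            return self
        return self.func(self.arg1, self.arg2)

    def __add__(self, arg2):
        self.arg2 = arg2
        if self.arg1 is self.not_set:
            return self
        return self.func(self.arg1, self.arg2)

add = Infix(lambda a, b: a + b)

assert 1 + add + 1 == 2   # first use works
result = 2 + add + 2      # second use sees the stale arg2 from before
assert result == 5        # add(2, 1) == 3, then 3 + 2 == 5, not 4
```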
Here's a new version that lets you use the same function again:

class InfixArity(object):
    def __init__(self, arity):
        self.arity = arity

    def __call__(self, func):
        self.func = func
        return self

    def __add__(self, arg):
        return InfixHelper(self.func, self.arity, arg)

    __radd__ = __add__
    __floordiv__ = __rfloordiv__ = __add__
    __truediv__ = __rtruediv__ = __add__

class InfixHelper(object):
    def __init__(self, func, arity, firstarg):
        self.func = func
        self.arity = arity
        self.args = [firstarg]

    def __add__(self, arg):
        self.args.append(arg)
        if len(self.args) < self.arity:
            return self
        else:
            return self.func(*self.args)

    __radd__ = __add__
    __floordiv__ = __rfloordiv__ = __add__
    __truediv__ = __rtruediv__ = __add__

Infix = lambda func: InfixArity(2)(func)

I imagine it would be possible to make an n-arity class that could work like "average // 1 // 2 // 3 // 4 // done" or maybe one that has you use a different operator for the last argument, but I leave that as an exercise for the reader.

-- Carl Johnson

On Wed, Jul 21, 2010 at 5:49 PM, Carl M. Johnson wrote:
> Thought about it some more. Here's a more general formula:
>
> class InfixArity(object):
>     def __init__(self, arity):
>         self.arity = arity
>         self.args = []
>
>     def __call__(self, func):
>         self.func = func
>         return self
>
>     def __add__(self, arg):
>         self.args.append(arg)
>         if len(self.args) < self.arity:
>             return self
>         else:
>             return self.func(*self.args)
>
>     __radd__ = __add__
>
> Infix = lambda func: InfixArity(2)(func)
>
> And of course, one can use __mul__ or __div__ or whatever to taste. "1
> // add // 2" doesn't make me instantly vomit in my mouth.
;-) > > -- Carl Johnson > From dag.odenhall at gmail.com Thu Jul 22 06:33:01 2010 From: dag.odenhall at gmail.com (Dag Odenhall) Date: Thu, 22 Jul 2010 06:33:01 +0200 Subject: [Python-ideas] Infix application of binary functions In-Reply-To: References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net> <1279755788.4507.25.camel@gumri> Message-ID: <1279773181.4507.53.camel@gumri> > This can be done in Python today: Quoting myself from the original post: It is already possible[1] to make infix functions, but the solution is a hack and requires functions to be marked as infix. [1] http://code.activestate.com/recipes/384122-infix-operators/ From pyideas at rebertia.com Thu Jul 22 06:34:02 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Wed, 21 Jul 2010 21:34:02 -0700 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Wed, Jul 21, 2010 at 9:07 PM, Carl M. Johnson wrote: > There have been questions about whether there are any cases of the > given/where/let/whatever solving problems that would otherwise be > cumbersome to solve. I think it could help get around certain for-loop > gotchas: > >>>> funcs = [] >>>> for i in range(5): > ... ? ? def f(): > ... ? ? ? ? print("#", i) > ... ? ? funcs.append(f) > ... >>>> [func() for func in funcs] > # 4 > # 4 > # 4 > # 4 > # 4 > [None, None, None, None, None] > > D?oh! (This can be a real world problem if you have a list of methods > you want to decorate inside a class.) > > One current workaround: > >>>> funcs = [] >>>> for i in range(5): > ... ? ? def _(): > ... ? ? ? ? n = i > ... ? ? ? ? def f(): > ... ? ? ? ? ? ? print("#", n) > ... ? ? ? ? funcs.append(f) > ... ? ? _() > ... >>>> [func() for func in funcs] > # 0 > # 1 > # 2 > # 3 > # 4 > [None, None, None, None, None] > > Not pretty, but it works. 
>
> In let format (I'm leaning toward the format "let [VAR = | return |
> yield] EXPRESSION where: BLOCK"):
>
> funcs = []
> for i in range(5):
>     let funcs.append(f) where:
>         n = i
>         def f():
>             print("#", n)
>
> [func() for func in funcs]
>
> Still a little awkward, but not as bad, IMHO.

Neither of those seem (imo) better than the other current workaround:

funcs = []
for i in range(5):
    def f(i=i):
        print("#", i)
    funcs.append(f)

They're all non-obvious idioms, but at least this one is short and easily recognized.

Cheers,
Chris

From ben+python at benfinney.id.au Thu Jul 22 07:59:53 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 22 Jul 2010 15:59:53 +1000
Subject: [Python-ideas] Infix application of binary functions
References: <1279751825.4507.16.camel@gumri> <4C4783D2.5060304@gmx.net> <1279755788.4507.25.camel@gumri> <1279773181.4507.53.camel@gumri>
Message-ID: <878w546opy.fsf@benfinney.id.au>

Dag Odenhall writes:

> > This can be done in Python today:
>
> Quoting myself from the original post:
>
> It is already possible[1] to make infix functions, but the solution is
> a hack and requires functions to be marked as infix.

I don't see how "add a new punctuation character to the syntax in every place where this is to be used" is less of a hack.

For reference you might want to read over the debates that preceded the introduction of '@' to the language. There is *very strong* resistance to adding syntax that uses arbitrary punctuation characters.

IMO that resistance is for good reason: punctuation beyond what Python already supports today rarely improves readability, and usually worsens it.

--
 \      "All television is educational television. The question is: |
  `\    what is it teaching?" --Nicholas Johnson |
_o__)   |
Ben Finney

From cmjohnson.mailinglist at gmail.com Thu Jul 22 09:48:14 2010
From: cmjohnson.mailinglist at gmail.com (Carl M.
Johnson)
Date: Wed, 21 Jul 2010 21:48:14 -1000
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net>
Message-ID:

On Wed, Jul 21, 2010 at 6:34 PM, Chris Rebert wrote:
> funcs = []
> for i in range(5):
>     def f(i=i):
>         print("#", i)
>     funcs.append(f)
>
> They're all non-obvious idioms, but at least this one is short and
> easily recognized.

I actually had to read that twice before I recognized what you had done, and I knew to look for something out of the ordinary.

That said, it *is* the solution GvR recommended as the best solution the last time this came up. I just never understood why. To me, if you set something as a default value for an argument, it should be because it's a default value, i.e. a value that is usually one thing but can also be set to something else at the caller's discretion. I'm just not comfortable with using the default value to mean "here's a value you should never change" or "pretty please, don't pass in an argument, because that will screw everything up" or even "I guess you could pass in an argument if you wanted to, but that's not a case I'm really very busy thinking about". :-/

But maybe I'm in the minority on this one.

-- CJ

From debatem1 at gmail.com Thu Jul 22 10:38:57 2010
From: debatem1 at gmail.com (geremy condra)
Date: Thu, 22 Jul 2010 04:38:57 -0400
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jul 21, 2010 at 9:21 PM, Terry Reedy wrote: > On 7/21/2010 7:24 AM, Nick Coghlan wrote: >> >> On Wed, Jul 21, 2010 at 9:16 PM, Stefan Behnel >> wrote: >>> >>> Terry Reedy, 20.07.2010 21:49: >>>> >>>> I did not comment then because I thought the idea of cluttering python >>>> with augmented local namespace blocks, with no functional gain, was >>>> rejected and dead, and hence unnecessary of comment. >>>> -10 >>>> For me, the idea would come close to destroying (what remains of) the >>>> simplicity that makes Python relatively easy to learn. It seems to be >>>> associated with the (to me, cracked) idea that names are pollution. >>> >>> Actually, it's about *giving* names to subexpressions, that's quite the >>> opposite. >> >> I think Terry's point was that you can already give names to >> subexpressions by assigning them to variables in the current scope, >> but some people object to that approach due to "namespace pollution". > > Right. > >> I agree with him that avoiding namespace pollution isn't a particularly >> strong argument though (unless you have really long scripts and > > Okay, we can leave that issue aside. > >> functions), which is why I've tried to emphasize the intended >> readability benefits. > > whereas I am trying to emphasize the reading horror for people whose brains > are wired differently from yours. The backwards conditional expressions are > nearly impossible for me to read, which is to say, painful. To some, > something like Ok, so, your brain is wired differently. That's fine- but mine says that there are cases where the up-front syntax is more readable. I particularly like that it clearly marks what variables will not be used elsewhere without forcing me to jump around in the code to find out how they're being computed. > e = fe(a,b,c, p1) where: >  c = fc(a, d, p2) where: >    d = fd(a, p1) where: >      
a = fa(p1, p2) >  b = fb(a,p2) > > where p1,p2 are input parameters; > > looks about as bad (and it was a real effort to write). I would rather > something like that were in a branch dialect, Ypthon with its own extension > (.yp). Of course it looks bad. You can make anything look bad. 37 levels of nested 'for' statements look bad. That isn't an argument against this, it's an argument against bad code, period. > Algorithm book authors usually want their books read by lots of people. When > they invent a pseudocode language, they usually invent something lots of > people can read. (Knuth's MIX was something of an exception.) It is often so > close to (a subset of) Python that it is ridiculous that they do not just > use (a subset of) Python so it is not 'pseudo'. I cannot remember seeing > anything like the above. I believe the reason is that it would be, on > average, less readable and harder to understand. I would use this, and would welcome it in a textbook. Telling people what you're doing before you do it motivates the material and helps people to learn and understand with minimal effort IMO/E. Geremy Condra From ncoghlan at gmail.com Thu Jul 22 14:49:44 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Jul 2010 22:49:44 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Thu, Jul 22, 2010 at 5:48 PM, Carl M. Johnson wrote: > On Wed, Jul 21, 2010 at 6:34 PM, Chris Rebert wrote: > > >> funcs = [] >> for i in range(5): >>     def f(i=i): >>         print("#", i) >>     funcs.append(f) >> >> They're all non-obvious idioms, but at least this one is short and >> easily recognized. > > I actually had to read that twice before I recognized what you had > done, and I knew to look for something out of the ordinary. 
That said, > it *is* the solution GvR recommended as the best solution the last > time this came up. I just never understood why. To me, if you set > something as a default value for an argument, it should be because > it's a default value, i.e. a value that is usually one thing but can > also be set to something else at the caller's discretion. I'm just not > comfortable with using the default value to mean "here's a value you > should never change" or "pretty please, don't pass in an argument, > because that will screw everything up" or even "I guess you could pass > in an argument if you wanted to, but that's not a case I'm really very > busy thinking about". :-/ But maybe I'm in the minority on this one. There's a reason that particular trick is called the "default argument hack" :) The trick is much easier to follow when you *don't* reuse the variable name in the inner function: funcs = [] for i in range(5): def f(early_bound_i=i): print("#", early_bound_i) funcs.append(f) Python doesn't really have a good way to request early binding semantics at this time - the default argument hack is about it. The given clause (as currently specified in the PEP) forces early binding semantics on any functions it contains, so it allows you to write a function definition loop that "does the right thing" with the following fairly straightforward code: funcs = [] for i in range(5): funcs.append(f) given: def f(): print("#", i) That's: a) kinda cool b) veering dangerously close to DWIM* territory (which is not necessarily a good thing) Still, the early binding semantics angle is one I had thought about before - that *is* a genuinely new feature of this proposal. Perhaps not a hugely compelling one though - I think I've only needed to use the default argument hack once in the whole time I've been programming Python. Cheers, Nick. *Don't Worry It's Magic -- Nick Coghlan   |   ncoghlan at gmail.com   |   
Brisbane, Australia From ncoghlan at gmail.com Thu Jul 22 14:54:04 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Jul 2010 22:54:04 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Thu, Jul 22, 2010 at 1:30 PM, Chris Rebert wrote: > I believe Nick meant returns/yields *within* the `given` suite (he > just phrased it awkwardly), e.g. > > a = b given: >     b = 42 >     return c  # WTF Yep, that's what I meant by the top level code in the given clause (i.e. in the same sense as I would refer to the top level code in a function body). It didn't occur to me that phrasing could be legitimately confused with the header line of the clause until I read Terry's message. Cheers, Nick. -- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia From ncoghlan at gmail.com Thu Jul 22 14:56:01 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Jul 2010 22:56:01 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Thu, Jul 22, 2010 at 10:49 PM, Nick Coghlan wrote: > Still, the early binding semantics angle is one I had thought about > before - that *is* a genuinely new feature of this proposal. Perhaps > not a hugely compelling one though - I think I've only needed to use > the default argument hack once in the whole time I've been programming > Python. Uhh, "... had*n't* thought about ...". 3 little missing characters that completely reverse the intended meaning of a sentence :P Cheers, Nick. -- Nick Coghlan   |   ncoghlan at gmail.com   |   
Brisbane, Australia From alexander.belopolsky at gmail.com Thu Jul 22 21:25:23 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 22 Jul 2010 15:25:23 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Thu, Jul 22, 2010 at 8:49 AM, Nick Coghlan wrote: > I think I've only needed to use > the default argument hack once in the whole time I've been programming > Python. Here is your second: http://bugs.python.org/issue7989#msg109662 or scroll to the end of http://mail.python.org/pipermail/python-dev/2010-July/101900.html From debatem1 at gmail.com Thu Jul 22 22:21:54 2010 From: debatem1 at gmail.com (geremy condra) Date: Thu, 22 Jul 2010 16:21:54 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: <4C48A232.8080701@udel.edu> References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C48A232.8080701@udel.edu> Message-ID: On Thu, Jul 22, 2010 at 3:55 PM, Terry Reedy wrote: > > > On 7/22/2010 4:38 AM, geremy condra wrote: >> >> On Wed, Jul 21, 2010 at 9:21 PM, Terry Reedy wrote: > >>> e = fe(a,b,c, p1) where: >>>  c = fc(a, d, p2) where: >>>    d = fd(a, p1) where: >>>      a = fa(p1, p2) >>>  b = fb(a,p2) >>> >>> where p1,p2 are input parameters; >>> >>> looks about as bad (and it was a real effort to write). I would rather >>> something like that were in a branch dialect, Ypthon with its own >>> extension >>> (.yp). >> >> Of course it looks bad. You can make anything look bad. 37 levels of >> nested >> 'for' statements look bad. That isn't an argument against this, it's an >> argument >> against bad code, period. > > I think you make an unfair comparison: 3 nested where statements, which you > agree look bad, against 37 nested for statements, which I would agree would > be bad, as in unreadable. (I believe auto code generators have done such > things.) 
The fair comparison would be 3 nested where statements, which you > agree is bad, against 3 nested for statements, which is routine and not bad. > So I think you made my point ;-). Hmm. I'm pretty sure you know that I don't think 37 levels of for statements is the minimum required number for ugliness, which makes me wonder why you chose to structure an argument around that idea unless you just wanted to score a few easy points on an otherwise valid claim. I would also argue that the more valid comparison would be nested functions or classes- both perfectly pretty constructs on their own- which would cause me to gnaw on otherwise unoffending office furniture if I encountered them nested 3 deep. > My differently wired brain would tolerate the new construct much better, and > might even use it, if nested where/given constructs were not allowed. Seems easier to work with to me, but I don't see the point in increasing the implementation difficulty just to stop bad programmers from writing bad code. Geremy Condra From george.sakkis at gmail.com Thu Jul 22 22:49:29 2010 From: george.sakkis at gmail.com (George Sakkis) Date: Thu, 22 Jul 2010 22:49:29 +0200 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C48A232.8080701@udel.edu> Message-ID: On Thu, Jul 22, 2010 at 10:21 PM, geremy condra wrote: > I would also argue that the more valid comparison would be nested functions > or classes- both perfectly pretty constructs on their own- which would cause > me to gnaw on otherwise unoffending office furniture if I encountered them > nested 3 deep. I guess you're not much fond of decorators then; it's common to define them exactly as 3-level deep nested functions: def decofactory(deco_arg):     def decorator(func):         def wrapper(*args, **kwargs):             if deco_arg:                 return func(*args, **kwargs)         return wrapper     return decorator @decofactory(n) def func(x, y):     ... 
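As a side note for readers of the archive: the sketch above also drops the wrapped call's return value whenever deco_arg is falsy (wrapper falls off the end and returns None). A runnable version of the same three-level pattern, with functools.wraps added and that branch fixed, might look like this (the enabled flag and add function are illustrative, not from the thread):

```python
import functools

def decofactory(enabled):
    """Three-level nesting: factory -> decorator -> wrapper."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if enabled:
                print("calling", func.__name__)
            return func(*args, **kwargs)  # always forward the result
        return wrapper
    return decorator

@decofactory(False)
def add(x, y):
    return x + y

assert add(2, 3) == 5
assert add.__name__ == "add"  # wraps preserves the metadata
```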
George From ncoghlan at gmail.com Fri Jul 23 00:09:33 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 23 Jul 2010 08:09:33 +1000 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C48A232.8080701@udel.edu> Message-ID: On Fri, Jul 23, 2010 at 6:49 AM, George Sakkis wrote: > On Thu, Jul 22, 2010 at 10:21 PM, geremy condra wrote: > >> I would also argue that the more valid comparison would be nested functions >> or classes- both perfectly pretty constructs on their own- which would cause >> me to gnaw on otherwise unoffending office furniture if I encountered them >> nested 3 deep. > > I guess you're not much fond of decorators then; it's common to define > them exactly as 3-level deep nested functions: > > def decofactory(deco_arg): >     def decorator(func): >         def wrapper(*args, **kwargs): >             if deco_arg: >                 return func(*args, **kwargs) >         return wrapper >     return decorator > > @decofactory(n) > def func(x, y): >     ... Actually, I think that's the main reason why parameterised decorators can be such a pain to understand - keeping the 3 scopes straight in your head is genuinely difficult. There's a reason the recently added contextlib.ContextDecorator is implemented as a class with a __call__ method rather than using nested functions. A given clause would let you reorder this code, and I think doing so is genuinely clearer: def decofactory(deco_arg): return decorator given: def decorator(func): if not deco_arg: return func return wrapper given: @wraps(func) def wrapper(*args, **kwargs): return func(*args, **kwargs) Reversing the order of some aspects of the execution allows the text flow to match the way you would describe the operation: we have a decorator factory that returns a decorator that returns the function unmodified if deco_arg evaluates to False and returns a wrapper around the decorated function otherwise. Cheers, Nick. 
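For comparison, the class-based approach Nick alludes to (the route contextlib.ContextDecorator takes to avoid triply nested functions) can be sketched as follows; the names here are illustrative, not from the thread:

```python
import functools

class decofactory:
    """Parameterised decorator written as a class: the factory argument
    becomes instance state, so only one nested function remains."""
    def __init__(self, deco_arg):
        self.deco_arg = deco_arg

    def __call__(self, func):
        if not self.deco_arg:
            return func  # pass the function through unmodified
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper

@decofactory(True)
def add(x, y):
    return x + y

assert add(2, 3) == 5
assert add.__name__ == "add"
```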
-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia From tjreedy at udel.edu Fri Jul 23 00:35:24 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 22 Jul 2010 18:35:24 -0400 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <4C48A232.8080701@udel.edu> Message-ID: On 7/22/2010 4:21 PM, geremy condra wrote: > Hmm. I'm pretty sure you know that I don't think 37 levels of for statements is > the minimum required number for ugliness, Of course not, but that is the only number you gave ;-) I would agree to something smaller than 10, but larger than 4. > I would also argue that the more valid comparison would be nested functions > or classes- both perfectly pretty constructs on their own- which would cause > me to gnaw on otherwise unoffending office furniture if I encountered them > nested 3 deep. For classes I agree. For functions I would allow 3. But in all cases, the number that is ok is usefully larger than 1. -- Terry Jan Reedy From ncoghlan at gmail.com Fri Jul 23 12:39:27 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 23 Jul 2010 20:39:27 +1000 Subject: [Python-ideas] [Python-Ideas] Set the namespace free! In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl> Message-ID: (moving to python-ideas, where this discussion belongs) On Fri, Jul 23, 2010 at 7:01 AM, wrote: > > >> Date: Thu, 22 Jul 2010 14:49:17 -0400 >> Subject: Re: [Python-Dev] Set the namespace free! >> From: alexander.belopolsky at gmail.com >> To: gregory.smith3 at sympatico.ca >> CC: python-dev at python.org >> >> On Thu, Jul 22, 2010 at 12:53 PM, wrote: >> .. >> > So, ::name or &name or |name or whatever. >> > >> > I'm very amused by all the jokes about turning python into perl, but >> > there's >> > a good idea here that doesn't actually require that... >> >> No, there isn't. And both '&' and '|' are valid python operators that >> cannot be used this way. >> > > Um, of course. 
Serious brain freeze today, using too many languages at once. > Yeah, that's it. > > Despite my knuckleheadedness, I say there's still a hole here that can be > easily plugged. It's clumsy that you can't call, e.g. > > GenerateURL( reqtype='basic', class='local') > > other than by > > GenerateURL( **{'reqtype': 'basic', 'class': 'local'}) > > ... just because 'class' is a keyword. That's letting a parser issue > degrade the value of a really good feature. Likewise for attributes; python > allows you to have > named parameters or attributes called 'class' and 'import' if you like; it > just doesn't let you write them directly; this restriction doesn't seem to > be necessary except for the parse issue, which is fixable. I.e. nothing > would break by allowing GenerateURL(::class = 'local') or > Request.::class. Or, rather than making a major syntactic change that affects not just all Python implementations, but also every syntax highlighter that understands the set of reserved words, we instead encourage external interface developers to implement three simple rules: 1. If a name coming from an external resource clashes with a Python keyword, append a single underscore 2. If a name coming from an external resource ends with an underscore, append an additional underscore 3. If a name being written to or otherwise used to access an external resource ends with an underscore, remove it The above example would then be written as: GenerateURL( reqtype='basic', class_='local') Why should the entire language toolset be burdened with additional syntax in order to deal with an issue that could be handled perfectly well by the adoption of some simple API conventions? Cheers, Nick. -- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia From songofacandy at gmail.com Fri Jul 23 14:10:20 2010 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 23 Jul 2010 21:10:20 +0900 Subject: [Python-ideas] String interpolation again. Message-ID: Hi, all. 
Below code is a syntax error now: foo = 123 x = 'foo' foo 'bar' I wonder if that code is interpreted as: x = 'foo' + str(foo) + 'bar' or x = 'foo%sbar' % (foo,) I think this syntax sugar doesn't introduce any compatibility problem and is simple enough for Python. What do you think about it? -- INADA Naoki From 8mayday at gmail.com Fri Jul 23 14:26:09 2010 From: 8mayday at gmail.com (Andrey Popp) Date: Fri, 23 Jul 2010 16:26:09 +0400 Subject: [Python-ideas] String interpolation again. In-Reply-To: References: Message-ID: I think it's a form of weak typing from languages like PHP, that allows programmers to make huge amounts of mistakes that are sometimes difficult to spot. On Fri, Jul 23, 2010 at 4:10 PM, INADA Naoki wrote: > Hi, all. > > Below code is a syntax error now: > > foo = 123 > x = 'foo' foo 'bar' > > I wonder if that code is interpreted as: > > x = 'foo' + str(foo) + 'bar' > or > x = 'foo%sbar' % (foo,) > > I think this syntax sugar doesn't introduce any compatibility problem > and is simple enough for Python. > > What do you think about it? > > -- > INADA Naoki > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Andrey Popp phone: +7 911 740 24 91 e-mail: 8mayday at gmail.com From masklinn at masklinn.net Fri Jul 23 14:33:03 2010 From: masklinn at masklinn.net (Masklinn) Date: Fri, 23 Jul 2010 14:33:03 +0200 Subject: [Python-ideas] String interpolation again. In-Reply-To: References: Message-ID: <121A5374-B9EB-461E-ABCE-4A9485928F1D@masklinn.net> On 2010-07-23, at 14:26 , Andrey Popp wrote: > I think it's a form of weak typing from languages like PHP, that > allows programmers to make huge amounts of mistakes that are sometimes > difficult to spot. For what it's worth, even PHP doesn't allow this. One either has to concatenate "foo" . $foo . 
"bar" or interpolate "foo${foo}bar" The only language I am aware of which *might* let users do something along those lines (please note that I'm not even sure it's possible) would be Tcl, in which pretty much everything seems to be a string. From scialexlight at gmail.com Fri Jul 23 14:35:40 2010 From: scialexlight at gmail.com (Alex Light) Date: Fri, 23 Jul 2010 08:35:40 -0400 Subject: [Python-ideas] String interpolation again. In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 8:26 AM, Andrey Popp <8mayday at gmail.com> wrote: > I think it's a form of weak typing from languages like PHP, that > allows programmer to make huge amounts of mistakes, that are sometimes > difficult to spot. I agree. but i wouldn't mind if python would start automatically calling str on objects in string sequences if they are not strings. i.e. turn this not_a_string = 123 a_string = 'this is a string' + not_a_string #currently throws syntax error to this: a_string = 'this is a string ' + str(not_a_string) they do this in java and it works quite well Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Fri Jul 23 14:41:21 2010 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 23 Jul 2010 21:41:21 +0900 Subject: [Python-ideas] String interpolation again. In-Reply-To: References: Message-ID: > I agree. but i wouldn't mind if python would start automatically calling str > on objects in string sequences if they are not strings. > i.e. turn this > not_a_string = 123 > a_string = 'this is a string' + not_a_string > #currently throws syntax error Currently, this raises TypeError. I think allowing implicit conversion break backward compatibility and allowing new syntax that is now syntax error doesn't cause a compatibility problem. -- INADA Naoki? 
From mwm-keyword-python.b4bdba at mired.org Fri Jul 23 15:05:04 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Fri, 23 Jul 2010 09:05:04 -0400 Subject: [Python-ideas] String interpolation again. In-Reply-To: References: Message-ID: <20100723090504.33c58120@bhuda.mired.org> On Fri, 23 Jul 2010 08:35:40 -0400 Alex Light wrote: > On Fri, Jul 23, 2010 at 8:26 AM, Andrey Popp <8mayday at gmail.com> wrote: > > > I think it's a form of weak typing from languages like PHP, that > > allows programmers to make huge amounts of mistakes that are sometimes > > difficult to spot. It also violates TOOWTDI. > I agree, but I wouldn't mind if Python would start automatically calling str > on objects in string sequences if they are not strings, > i.e. turn this: > > not_a_string = 123 > > a_string = 'this is a string' + not_a_string > #currently throws syntax error > > to this: > a_string = 'this is a string ' + str(not_a_string) > > They do this in Java and it works quite well. I believe the initial quote applies to this form as well. It might work well in Java, but Java isn't Python; there are lots of other differences that make this a lot more tolerable in Java. The first problem with this kind of thing is that there's no obvious reason why 12 + '34' should be '1234' instead of 46. Java variables have declared types. This means the above situation can be detected at compile time, and the implicit conversion added then. In Python, you have to do the tests at run time, which will slow everything down. Further, Java's typed variables mean that if you've made a mistake in the type of one of the values, the assignment will be flagged as an error at compile time. Python won't do that, so implicitly fixing the mistake here means you get even further from the error before something happens that reveals it. 
Finally, the % operator does that implicit conversion for you if you use %s: a_string = 'this is a string %s' % not_a_string Works just fine as things are now (though the syntax needs tweaking if not_a_string is a tuple). I think that brings it to -4. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From masklinn at masklinn.net Fri Jul 23 15:25:42 2010 From: masklinn at masklinn.net (Masklinn) Date: Fri, 23 Jul 2010 15:25:42 +0200 Subject: [Python-ideas] String interpolation again. In-Reply-To: <20100723090504.33c58120@bhuda.mired.org> References: <20100723090504.33c58120@bhuda.mired.org> Message-ID: <2033D5EF-B71B-4927-8A3B-19E45C49197F@masklinn.net> On 2010-07-23, at 15:05 , Mike Meyer wrote: > The first problem with this kind of thing is that there's no obvious > reason why 12 + '34' should be '1234' instead of 46. > > Java variables have declared types. This means the above situation can > be detected at compile time, and the implicit conversion added > then. In Python, you have to do the tests at run time, which will slow > everything down. Actually, it's much simpler than that for Java: the `+` operator is specially overloaded any time a string is involved to become a "convert and concatenate" operator similar to PHP's "." rather than the usual "add" operator. In Python, the equivalent behavior would be to special case the addition operator so that it checks if either operand is a string and if it is convert and concatenate the other one, otherwise apply normal resolution. From mwm-keyword-python.b4bdba at mired.org Fri Jul 23 15:37:05 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Fri, 23 Jul 2010 09:37:05 -0400 Subject: [Python-ideas] String interpolation again. 
In-Reply-To: <2033D5EF-B71B-4927-8A3B-19E45C49197F@masklinn.net> References: <20100723090504.33c58120@bhuda.mired.org> <2033D5EF-B71B-4927-8A3B-19E45C49197F@masklinn.net> Message-ID: <20100723093705.34581afe@bhuda.mired.org> On Fri, 23 Jul 2010 15:25:42 +0200 Masklinn wrote: > On 2010-07-23, at 15:05 , Mike Meyer wrote: > > The first problem with this kind of thing is that there's no obvious > > reason why 12 + '34' should be '1234' instead of 46. > > > > Java variables have declared types. This means the above situation can > > be detected at compile time, and the implicit conversion added > > then. In Python, you have to do the tests at run time, which will slow > > everything down. > Actually, it's much simpler than that for Java: the `+` operator is specially overloaded any time a string is involved to become a "convert and concatenate" operator similar to PHP's "." rather than the usual "add" operator. > > In Python, the equivalent behavior would be to special case the addition operator so that it checks if either operand is a string and if it is convert and concatenate the other one, otherwise apply normal resolution. I would hope the Java version isn't as convoluted as you say (but given Java, it may be): all this really requires is that the string version of + include the conversion. In python, that would be making str.__add__ (and friends) do the conversion. Given that in Python, str(a_string) is a_string, and doing the type check on a string (via either type or isinstance) is about the same speed as calling str on it (just under .2 usecs/loop), you might as well just always do the conversion. Or maybe this breaks unicode... Assuming, of course, you actually think this is a good idea. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From masklinn at masklinn.net Fri Jul 23 15:50:26 2010 From: masklinn at masklinn.net (Masklinn) Date: Fri, 23 Jul 2010 15:50:26 +0200 Subject: [Python-ideas] String interpolation again. In-Reply-To: <20100723093705.34581afe@bhuda.mired.org> References: <20100723090504.33c58120@bhuda.mired.org> <2033D5EF-B71B-4927-8A3B-19E45C49197F@masklinn.net> <20100723093705.34581afe@bhuda.mired.org> Message-ID: <3ED23A89-8A6E-41A6-915C-E43A50FE54F1@masklinn.net> On 2010-07-23, at 15:37 , Mike Meyer wrote: > On Fri, 23 Jul 2010 15:25:42 +0200 > Masklinn wrote: > >> On 2010-07-23, at 15:05 , Mike Meyer wrote: >>> The first problem with this kind of thing is that there's no obvious >>> reason why 12 + '34' should be '1234' instead of 46. >>> >>> Java variables have declared types. This means the above situation can >>> be detected at compile time, and the implicit conversion added >>> then. In Python, you have to do the tests at run time, which will slow >>> everything down. >> Actually, it's much simpler than that for Java: the `+` operator is specially overloaded any time a string is involved to become a "convert and concatenate" operator similar to PHP's "." rather than the usual "add" operator. >> >> In Python, the equivalent behavior would be to special case the addition operator so that it checks if either operand is a string and if it is convert and concatenate the other one, otherwise apply normal resolution. > > I would hope the Java version isn't as convoluted as you say (but > given Java, it may be): all this really requires is that the string > version of + include the conversion. There isn't really such a thing as "the string version of +" as Java forbids userland operator overloading. String's + is a special case in the language as it is, indeed, one of the very few (if not the only) overloaded operator. 
To understand how far this goes, Java's BigInteger (their equivalent to Python's long) doesn't have *any* operator overloading. If a is a BigInteger and b is a BigInteger, you add them by writing `a.add(b)`. Likewise for substraction, multiplication, division, negation or bit-twiddling[0] And if either is *not* a BigInteger, then you convert it to a BigInteger first. If it's a primitive integer type, you cannot even use a constructor, you have to use `BigInteger.valueOf(long)` > In python, that would be making > str.__add__ (and friends) do the conversion. You'd run into the issues of writing `a + "foo"` with `a` defining a custom `__add__`, which would not perform string concatenation, as per Python's operator resolution order. [0] http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/math/BigInteger.html From mwm-keyword-python.b4bdba at mired.org Fri Jul 23 16:09:21 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Fri, 23 Jul 2010 10:09:21 -0400 Subject: [Python-ideas] String interpolation again. In-Reply-To: <3ED23A89-8A6E-41A6-915C-E43A50FE54F1@masklinn.net> References: <20100723090504.33c58120@bhuda.mired.org> <2033D5EF-B71B-4927-8A3B-19E45C49197F@masklinn.net> <20100723093705.34581afe@bhuda.mired.org> <3ED23A89-8A6E-41A6-915C-E43A50FE54F1@masklinn.net> Message-ID: <20100723100921.613b75f0@bhuda.mired.org> On Fri, 23 Jul 2010 15:50:26 +0200 Masklinn wrote: > On 2010-07-23, at 15:37 , Mike Meyer wrote: > > On Fri, 23 Jul 2010 15:25:42 +0200 > > Masklinn wrote: > > > >> On 2010-07-23, at 15:05 , Mike Meyer wrote: > >>> The first problem with this kind of thing is that there's no obvious > >>> reason why 12 + '34' should be '1234' instead of 46. > >>> > >>> Java variables have declared types. This means the above situation can > >>> be detected at compile time, and the implicit conversion added > >>> then. In Python, you have to do the tests at run time, which will slow > >>> everything down. 
> >> Actually, it's much simpler than that for Java: the `+` operator is specially overloaded any time a string is involved to become a "convert and concatenate" operator similar to PHP's "." rather than the usual "add" operator. > >> > >> In Python, the equivalent behavior would be to special case the addition operator so that it checks if either operand is a string and if it is convert and concatenate the other one, otherwise apply normal resolution. > > > > I would hope the Java version isn't as convoluted as you say (but > > given Java, it may be): all this really requires is that the string > > version of + include the conversion. > There isn't really such a thing as "the string version of +" as Java forbids userland operator overloading. String's + is a special case in the language as it is, indeed, one of the very few (if not the only) overloaded operator. > > To understand how far this goes, Java's BigInteger (their equivalent to Python's long) doesn't have *any* operator overloading. If a is a BigInteger and b is a BigInteger, you add them by writing `a.add(b)`. Likewise for substraction, multiplication, division, negation or bit-twiddling[0] > > And if either is *not* a BigInteger, then you convert it to a BigInteger first. If it's a primitive integer type, you cannot even use a constructor, you have to use `BigInteger.valueOf(long)` > > > In python, that would be making > > str.__add__ (and friends) do the conversion. > You'd run into the issues of writing `a + "foo"` with `a` defining a custom `__add__`, which would not perform string concatenation, as per Python's operator resolution order. That's what the "and friends" is for. str.__radd__ is one of the friends. If the type of a refused to do the add, then str.__radd__ would get it, and could do the conversion and concatenation. Of course, if the type of a did the add in some way *other* than via the conversion and concatenation, then that's what would happen. 
Which is one of the reasons this type of implicit conversion isn't right for Python. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From dickinsm at gmail.com Fri Jul 23 16:11:37 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 23 Jul 2010 15:11:37 +0100 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jul 20, 2010 at 2:27 PM, Nick Coghlan wrote: > For the record, I am personally +1 on the idea (otherwise I wouldn't > have put so much thought into it over the years). It's just a *lot* > harder to define complete and consistent semantics for the concept > than people often realise. > > However, having the question come up twice within the last month > finally inspired me to write the current status of the topic down in a > deferred PEP: http://www.python.org/dev/peps/pep-3150/ Is Python's grammar still LL(1) under this proposal? Mark From masklinn at masklinn.net Fri Jul 23 16:14:59 2010 From: masklinn at masklinn.net (Masklinn) Date: Fri, 23 Jul 2010 16:14:59 +0200 Subject: [Python-ideas] String interpolation again. In-Reply-To: <20100723100921.613b75f0@bhuda.mired.org> References: <20100723090504.33c58120@bhuda.mired.org> <2033D5EF-B71B-4927-8A3B-19E45C49197F@masklinn.net> <20100723093705.34581afe@bhuda.mired.org> <3ED23A89-8A6E-41A6-915C-E43A50FE54F1@masklinn.net> <20100723100921.613b75f0@bhuda.mired.org> Message-ID: <47B1F237-EFBE-443E-A4E1-60D1AAD0AEFE@masklinn.net> On 2010-07-23, at 16:09 , Mike Meyer wrote: > >>> In python, that would be making >>> str.__add__ (and friends) do the conversion. >> You'd run into the issues of writing `a + "foo"` with `a` defining a custom `__add__`, which would not perform string concatenation, as per Python's operator resolution order. > > That's what the "and friends" is for. 
> str.__radd__ is one of the friends. If the type of a refused to do the
> add, then str.__radd__ would get it, and could do the conversion and
> concatenation.
>
> Of course, if the type of a did the add in some way *other* than via
> the conversion and concatenation, then that's what would happen. Which
> is one of the reasons this type of implicit conversion isn't right for
> Python.

Right. Because otherwise it should not be very hard (which doesn't mean it would be smart) to remove the type error from str.__add__ and str.__radd__ and just str() the argument if it isn't already a string.

From songofacandy at gmail.com Fri Jul 23 16:16:00 2010
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 23 Jul 2010 23:16:00 +0900
Subject: [Python-ideas] String interpolation again.
In-Reply-To: References: Message-ID:

Basic problem is Python doesn't provide a way to print values of expression into str like print prints to file. 'foo{bar}baz'.format(bar=bar) is a bit messy. 'foo{bar}baz'.format(**vars()) or other technique is a bit tricky and messy.

If

s = 'foo' bar 'baz'

is too dirty, other Pythonic ways I can think of are:

* s = str('foo', bar, 'baz')
  This is not compatible with current Python because str()'s second argument is encoding and third argument is error handler. I think Python4 should not accept str(a_bytes, 'utf-8') and use a_bytes.decode('utf-8')

* s = print('foo', bar, 'baz', sep='', file=None)
* s = print('foo', bar, 'baz', sep='', file=str)
  Extend print() function to return str instead of print to file.

* s = str.print('foo', bar, 'baz', sep='')
  Add staticmethod to str.

-- INADA Naoki

From andre.roberge at gmail.com Fri Jul 23 18:03:05 2010
From: andre.roberge at gmail.com (Andre Roberge)
Date: Fri, 23 Jul 2010 13:03:05 -0300
Subject: [Python-ideas] String interpolation again.
In-Reply-To: References: Message-ID:

On Fri, Jul 23, 2010 at 11:16 AM, INADA Naoki wrote:
> Basic problem is Python doesn't provide a way to print values of expression
> into str like print prints to file.
> 'foo{bar}baz'.format(bar=bar) is a bit messy.
> 'foo{bar}baz'.format(**vars()) or other technique is a bit tricky and
> messy.
>
> If
> s = 'foo' bar 'baz'
> is too dirty, other Pythonic ways I can think of are:
>
> [snip]

What's wrong with

s = 'foo' + str(bar) + 'baz'

If you want something Pythonic:

import this
...
Explicit is better than implicit

---
André

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From masklinn at masklinn.net Fri Jul 23 19:06:52 2010
From: masklinn at masklinn.net (Masklinn)
Date: Fri, 23 Jul 2010 19:06:52 +0200
Subject: [Python-ideas] String interpolation again.
In-Reply-To: References: Message-ID: <92DC6EE0-84E9-4286-A4FD-E41798EC7483@masklinn.net>

On 2010-07-23, at 16:16 , INADA Naoki wrote:
> Basic problem is Python doesn't provide a way to print values of expression
> into str like print prints to file.
> 'foo{bar}baz'.format(bar=bar) is a bit messy.

I don't understand what you find messy about either this or `'foo{}baz'.format(bar)`.

> * s = print('foo', bar, 'baz', sep='', file=None)
> * s = print('foo', bar, 'baz', sep='', file=str)
> Extend print() function to return str instead of print to file.
>
> * s = str.print('foo', bar, 'baz', sep='')
> Add staticmethod to str.

These forms can trivially be implemented as utility functions. And I don't see them as improvements enough to propose them for inclusion in the stdlib, but YMMV.

From songofacandy at gmail.com Fri Jul 23 19:35:29 2010
From: songofacandy at gmail.com (INADA Naoki)
Date: Sat, 24 Jul 2010 02:35:29 +0900
Subject: [Python-ideas] String interpolation again.
In-Reply-To: References: Message-ID:

> What's wrong with
> s = 'foo' + str(bar) + 'baz'

OK, I agree that your code is very pythonic.
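[Editor's note: to make the implicit-conversion semantics debated earlier in this thread concrete, here is a minimal sketch. `AutoStr` is an invented name, not anything proposed on the list; it prototypes a string type whose `+` converts-and-concatenates (Java/PHP style) via the `__add__`/`__radd__` hooks discussed above.]

```python
class AutoStr(str):
    """Sketch of the behavior under debate: `+` never raises TypeError
    for this string type; it str()s the other operand and concatenates."""

    def __add__(self, other):
        # str() the argument if it isn't already a string, then concatenate.
        return AutoStr(str(self) + str(other))

    def __radd__(self, other):
        # Reached when the left operand's own __add__ returned NotImplemented.
        return AutoStr(str(other) + str(self))


print(AutoStr("foo") + 12)   # foo12
print(12 + AutoStr("34"))    # 1234, not 46
```

As Mike's reply points out, a left operand whose own `__add__` handles the mix would still win under Python's operator resolution, which is exactly why this kind of implicit conversion sits poorly with Python.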
I've seen and written some PHP and Ruby code today, so I've forgotten what's pythonic. Many people coming from PHP/Ruby/Perl look at string interpolation, find the .format(**vars()) trick, and feel it's messy. But if there is a clean and pythonic way, adding new syntax is not pythonic.

> If you want something Pythonic:
>
> import this
> ...
> Explicit is better than implicit
>
> ---
> André

Thank you.

-- INADA Naoki

From songofacandy at gmail.com Fri Jul 23 19:52:18 2010
From: songofacandy at gmail.com (INADA Naoki)
Date: Sat, 24 Jul 2010 02:52:18 +0900
Subject: [Python-ideas] String interpolation again.
In-Reply-To: References: <92DC6EE0-84E9-4286-A4FD-E41798EC7483@masklinn.net> Message-ID:

On Sat, Jul 24, 2010 at 2:06 AM, Masklinn wrote:
> On 2010-07-23, at 16:16 , INADA Naoki wrote:
>> Basic problem is Python doesn't provide a way to print values of expression
>> into str like print prints to file.
>> 'foo{bar}baz'.format(bar=bar) is a bit messy.
>
> I don't understand what you find messy about either this or
> `'foo{}baz'.format(bar)`.

In my code, '{bar}' and 'bar=bar' seem too verbose. Your code is simple when the literal is short, but it is difficult to check which variable is inserted where when the string is long.

>> * s = print('foo', bar, 'baz', sep='', file=None)
>> * s = print('foo', bar, 'baz', sep='', file=str)
>> Extend print() function to return str instead of print to file.
>>
>> * s = str.print('foo', bar, 'baz', sep='')
>> Add staticmethod to str.
>
> These forms can trivially be implemented as utility functions. And I don't
> see them as improvements enough to propose them for inclusion in the
> stdlib, but YMMV.

I agree with you.

-- INADA Naoki

From dag.odenhall at gmail.com Fri Jul 23 21:26:45 2010
From: dag.odenhall at gmail.com (Dag Odenhall)
Date: Fri, 23 Jul 2010 21:26:45 +0200
Subject: [Python-ideas] String interpolation again.
In-Reply-To: References: <92DC6EE0-84E9-4286-A4FD-E41798EC7483@masklinn.net>
Message-ID: <1279913205.3423.5.camel@gumri>

> In my code, '{bar}' and 'bar=bar' seem too verbose. Your code is simple
> when the literal is short, but it is difficult to check which variable
> is inserted where when the string is long.

format = '...{short_descriptive_name}...'
text = format.format(
    short_descriptive_name=long_cumbersome_variable_name,
    other_name=other_name,
    ...)

Don't be afraid of more lines of code if it helps readability.

You could also build a context dict:

context = {}
context['name'] = complicated_code_to_get_name()
...
text = '...{name}...'.format(**context)

Of course personally I want a where/given clause, but that's a different thread. ;)

From eric at trueblade.com Fri Jul 23 21:38:56 2010
From: eric at trueblade.com (Eric Smith)
Date: Fri, 23 Jul 2010 15:38:56 -0400
Subject: [Python-ideas] String interpolation again.
In-Reply-To: <1279913205.3423.5.camel@gumri>
References: <92DC6EE0-84E9-4286-A4FD-E41798EC7483@masklinn.net> <1279913205.3423.5.camel@gumri>
Message-ID: <4C49EFD0.5030300@trueblade.com>

On 7/23/10 3:26 PM, Dag Odenhall wrote:
>> In my code, '{bar}' and 'bar=bar' seem too verbose. Your code is simple
>> when the literal is short, but it is difficult to check which variable
>> is inserted where when the string is long.
>
> format = '...{short_descriptive_name}...'
> text = format.format(
>     short_descriptive_name=long_cumbersome_variable_name,
>     other_name=other_name,
>     ...)
>
> Don't be afraid of more lines of code if it helps readability.
>
> You could also build a context dict:
>
> context = {}
> context['name'] = complicated_code_to_get_name()
> ...
> text = '...{name}...'.format(**context)

Or use:

text = '...{context.name}...'.format(context=context)

This is a great trick when "context" is "self".

-- Eric.
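[Editor's note: Eric's `{context.name}` trick, spelled out as a runnable snippet. The `Greeter` class and names are invented for illustration only.]

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        # Attribute access inside the replacement field: only `self`
        # needs to be passed to format().
        return 'Hello, {self.name}!'.format(self=self)


# The same pattern with an arbitrary "context" object:
context = Greeter('world')
text = 'Goodbye, {context.name}!'.format(context=context)

print(Greeter('Eric').greet())  # Hello, Eric!
print(text)                     # Goodbye, world!
```

Replacement fields can dot into any object handed to format(), so a single keyword argument can feed many fields.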
From raymond.hettinger at gmail.com Sat Jul 24 01:38:33 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 23 Jul 2010 16:38:33 -0700 Subject: [Python-ideas] 'where' statement in Python? In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp> <20100721002803.0d10def2@pitrou.net> <20100721004806.48858225@pitrou.net> Message-ID: On Jul 21, 2010, at 2:58 PM, Nick Coghlan wrote: > On Thu, Jul 22, 2010 at 1:51 AM, Bruce Leban wrote: >> I'm unconvinced of the value at this point but notwithstanding that let me >> toss in an alternative syntax: >> >> given: >> suite >> do: >> suite >> >> This executes the two suites in order with any variable bindings created by >> the first suite being local to the scope of the two suites. I think this is >> more readable than the trailing clause and is more flexible (you can put >> multiple statements in the second suite) and avoids the issue with anyone >> wanting the where clause added to arbitrary expressions. >> >> FWIW, in math it's more common to list givens at the top. Sounds like a LISP letrec. > > However, writing it that way has even less to offer over ordinary > local variables than the postfix given clause. > > I updated the draft PEP again, pointing out that if a decision had to > be made today, the PEP would almost certainly be rejected due to a > lack of compelling use cases. The bar for adding a new syntactic > construct is pretty high and PEP 3150 currently isn't even close to > reaching it (see PEP 343 for the kind of use cases that got the with > statement over that bar). > I concur. Raymond From greg at krypto.org Sat Jul 24 03:31:55 2010 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 23 Jul 2010 18:31:55 -0700 Subject: [Python-ideas] [Python-Dev] Set the namespace free! In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl> Message-ID: On Thu, Jul 22, 2010 at 9:24 AM, wrote: > I agree with the idea, but a far less radical change is needed to get the > desired result. 
> The basic idea is this: it should be possible to use any name as an
> identifier in the syntax, including names like 'while' and 'import'.
> But there is no need to mess up the entire language to allow this
> (either by quoting all the identifiers, perl-style, or by marking the
> keywords).

Yuck. Anyone who feels they need a variable named the same as a reserved word simply feels wrong and needs reeducation. New words are introduced very rarely and we do care about the ramifications when we do it.

What next? An optional way to support case insensitive names using a unicode character prefix?

-gps

> All that is needed is something like this:
>
> foo = 7
> :foo = 7              # exactly like foo=7
> :while = 3            # assigns 3 to variable 'while'
> globals()['while']=3  # current way to do this
>
> print element.:for    # from example below
> #
> # keyword parameters to a function call:
> #
> BuildUrlQuery( lang='en', item='monsoon', :class='normal') # ->
> "?lang=en&query=monsoon&class=normal"
>
> The generic keyword function call is a really nice language feature, but
> it's rather impaired by the need to avoid those names which happen to be
> keywords.
>
> The lack of this is most painful when you are auto-generating python code
> which forms a bridge to another language with its own namespace (as in the
> XML example). It's a pain when some of the names you might generate could
> conflict with python keywords. So, you end up using dicts and getattrs for
> everything and the code gets much less readable. With a simple escape like
> :name available, it's worthwhile to do everything with identifiers and
> generate the escape only as needed for these.
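[Editor's note: the "dicts" half of the workaround mentioned above is already runnable in today's Python — a keyword-colliding keyword argument can be smuggled in with dict unpacking. `build_url_query` is a hypothetical stand-in for the `BuildUrlQuery` in the quoted example, not a real API.]

```python
from urllib.parse import urlencode

def build_url_query(**params):
    # Collects arbitrary keyword arguments into a dict and renders them
    # as a URL query string.
    return '?' + urlencode(params)

# `class` cannot appear as a bare keyword argument (SyntaxError),
# but it passes through **-unpacking just fine:
q = build_url_query(lang='en', item='monsoon', **{'class': 'normal'})
print(q)  # ?lang=en&item=monsoon&class=normal
```

This is exactly the verbosity the `:class='normal'` escape is meant to remove.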
> One of the great strengths of python is the ability to form clean and
> comprehensive bridges to other languages and environments (thus, in many
> cases, avoiding the need to write programs in those other environments
> :-) ). This feature would fill a gap there.
>
> The python tcl/tk interface is a bit messed up since tcl/tk uses some
> names for options which conflict with python keywords, and so you need to
> add '_' to those names.
>
> There is a feature like this in VHDL: \name\ and \while\ are identifiers;
> the backslashes are not part of the name, but just quote it. In VHDL you
> can write identifiers like \22\, and \!This?is=Strange\ as well; since
> VHDL creates modules that have named ports, and those modules can
> interface to things generated by other environments, they needed a way to
> assign any name to a port.
>
> For python, I'm not sure it makes sense to allow identifiers that don't
> follow the basic rule "[A-Za-z_][A-Za-z_0-9]*" -- that could break some
> debugging tools which expect variable names to be well-formed -- but it
> would be useful to be able to use any well-formed name as an identifier,
> including those which happen to be keywords.
>
> I've suggested :name, which doesn't break old code, and doesn't require
> using any new punctuation. Syntax would not change, just the lexical
> definition of 'identifier'. If the intent is to allow arbitrary names
> (not just well-formed ones), then n'name' would work better (and is
> consistent with existing stuff).
>
> > Date: Thu, 22 Jul 2010 10:41:39 -0400
> > From: jnoller at gmail.com
> > To: bartosz-tarnowski at zlotniki.pl
> > CC: python-dev at python.org
> > Subject: Re: [Python-Dev] Set the namespace free!
> >
> > On Thu, Jul 22, 2010 at 10:04 AM, Bartosz Tarnowski wrote:
> > > Hello, guys.
> > >
> > > Python has more and more reserved words over time. It becomes quite
> > > annoying, since you cannot use variables and attributes of such names.
> > > Suppose I want to make an XML parser that reads a document and returns an
> > > object with attributes corresponding to XML element attributes:
> > >
> > >> elem = parse_xml("")
> > >> print elem.param
> > > boo
> > >
> > > What should I do then, when the attribute is a reserved word? I could use
> > > trailing underscore, but this is quite ugly and introduces ambiguity.
> > >
> > >> elem = parse_xml("")
> > >> print elem.for_ #?????
> > >> elem = parse_xml("")
> > >> print elem.for__ #?????
> > >
> > > My proposal: let's make a syntax change.
> > >
> > > Let all reserved words be preceded with some symbol, i.e. "!" (exclamation
> > > mark). This goes also for standard library global identifiers.
> > >
> > > !for boo in foo:
> > >     !if boo is !None:
> > >         !print(hoo)
> > >     !else:
> > >         !return !sorted(woo)
> > >
> > > This would allow the user to declare any identifier with any name:
> > >
> > > for = with(return) + try
> > >
> > > What do you think of it? It is a major change, but I think Python needs it.
> > >
> > > --
> > > haael
> >
> > I'm not a fan of this - I'd much prefer[1] that we use the exclamation
> > point to determine scope:
> >
> > foobar - local
> > !foobar - one up
> > !!foobar - higher than the last one
> > !!!foobar - even higher in scope
> >
> > We could do the inverse as well; if you append ! you can push variables
> > down in scope.
> >
> > Jesse
> >
> > [1] I am not serious.
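[Editor's note: the "dicts and getattrs" approach criticized earlier in this subthread does already cover the elem.for use case, at the cost of readability. `Elem` here is a hypothetical stand-in for whatever `parse_xml` would return.]

```python
class Elem:
    """Hypothetical parsed-XML node standing in for parse_xml()'s result."""
    pass

elem = Elem()
# Reserved words are fine as attribute *strings* -- only the dotted
# syntax rejects them:
setattr(elem, 'for', 'boo')
print(getattr(elem, 'for'))  # boo
```

So the proposals above are about syntax convenience, not about capability the object model lacks.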
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > http://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe: http://mail.python.org/mailman/options/python-dev/gsmith%40alumni.uwaterloo.ca
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/greg%40krypto.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cmjohnson.mailinglist at gmail.com Sat Jul 24 05:22:54 2010
From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson)
Date: Fri, 23 Jul 2010 17:22:54 -1000
Subject: [Python-ideas] [Python-Dev] Set the namespace free!
In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl>
Message-ID:

On Fri, Jul 23, 2010 at 3:31 PM, Gregory P. Smith wrote:

> What next? An optional way to support case insensitive names using a
> unicode character prefix?

This suggests an innovative solution. Using the advanced phishing techniques of today's bleeding edge Mafioso hackers, we can use unicode lookalikes to stand in for keywords. So, for example, the motivating use case is the desire to write elem.for. Well, all we need to do is use the Greek omicron and write elem.fοr. See the difference? No? Exactly!

This is a good first step as a workaround in older versions of Python, but I think we can do better than this in future versions. Since it is so clearly useful and convenient with a wide variety of use cases to be able to use current keywords as variable names, I propose that we phase out the current set of keywords and replace them with vowel-shifted lookalikes: Cyrillic а for a, Cyrillic у for y, omicron for o, upside-down exclamation point for i, and the EU's estimated sign ℮ for e.
So, the current keywords:

and      elif     import   return
as       else     in       try
assert   except   is       while
break    finally  lambda   with
class    for      not      yield
continue from     or
def      global   pass
del      if       raise

would be replaced with these new keywords:

аnd      ℮l¡f     ¡mpοrt   r℮turn
аs       ℮ls℮     ¡n       trу
аss℮rt   ℮xc℮pt   ¡s       wh¡l℮
br℮аk    f¡nаllу  lаmbdа   w¡th
clаss    fοr      nοt      у¡℮ld
cοnt¡nu℮ frοm     οr
d℮f      glοbаl   pаss
d℮l      ¡f       rа¡s℮

Since this change is visually undetectable, I don't see why we can't just make it mandatory in 3.2 instead of going through the usual multi-release deprecation cycle. (I will admit that the transition might be quicker if someone could modify 2to3 and make 3_1to3_2 to help people convert their legacy scripts.)

Can we get fast-track BDFL approval on this one? Seems like a slam dunk to me.

Internationally yrs,

-- ?J

From bruce at leapyear.org Sat Jul 24 05:32:39 2010
From: bruce at leapyear.org (Bruce Leban)
Date: Fri, 23 Jul 2010 20:32:39 -0700
Subject: [Python-ideas] [Python-Dev] Set the namespace free!
In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl>
Message-ID:

> Yuck. Anyone who feels they need a variable named the same as a reserved
> word simply feels wrong and needs reeducation.

I hope you meant to apply that yuck to :for or !try as variable names and if so I agree. On the other hand a convention that class_="x" puts "&class=x" in the URL seems quite reasonable.

--- Bruce (via android)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vano at mail.mipt.ru Sat Jul 24 09:16:45 2010
From: vano at mail.mipt.ru (Ivan Pozdeev)
Date: Sat, 24 Jul 2010 11:16:45 +0400
Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy
In-Reply-To: <1279740852.3222.38.camel@localhost.localdomain>
References: <1279740852.3222.38.camel@localhost.localdomain>
Message-ID: <636668799.20100724111645@mail.mipt.ru>

As a user you are talking about in the PEP, I'm quite satisfied with the current implementation.
The only change I see worth considering is merging IOError and OSError, possibly together with EnvironmentError. It's worth giving EnvironmentError a better name too.

I strongly oppose introducing 'fine-grained' exceptions. As an exception user, I care about:

1) a brief, easy to remember exception hierarchy with clearly distinguished domains
2) intuitive, natural-looking handling
3) a simple way to know which exceptions are thrown by a piece of code

IMO, the current implementation covers 2) excellently; 1) and 3) only have IOError/OSError ambiguities.

1)) OSError and IOError do intersect from the OS's point of view. The only reason why they're separate is because they 'feel' different. Historically, I/O and OSes came about in different ways: OSes are almost purely software, and I/O technologies are primarily hardware-based. So 'OSError' looks like something that is meaningful only to software logic, and 'IOError' like something that has a physical incarnation. OSError can be thought of as part of IOError and vice versa, so neither is likely to meet consensus to be made a subclass of the other. So we either:

1) declare the above 'feelings' retrograde and fuse the types. In this case, EnvironmentError will become redundant, so we'll have to fuse it in too;
2) just use EnvironmentError for all ambiguous cases and give it a better name. The current one is just wa-a-a-y t-o-o-o lo-o-o-ong for ubiquitous usage.

2)) The 'neat' handling

except OSError, e:
    if e.errno == EEXIST:
        act()
    else:
        raise

looks like the most natural solution to me, as all OSErrors are perceived as errors of the same type (errors caused by external factors and received as error codes from the OS standard API). Adding errno-specific subexceptions

1) makes some errnos privileged at the expense of others
2) introduces a list that isn't bound to any objective characteristic and is just an arbitrary "favorites" list
3) adds types that characterize single errors rather than error classes, which is an unnecessary level of complexity.
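[Editor's note: the 'neat' handling pattern just shown uses Python 2 except syntax; here it is as a self-contained, runnable Python 3 sketch. `ensure_dir` is an invented helper name, not from the thread.]

```python
import errno
import os
import tempfile

def ensure_dir(path):
    """Create `path`, treating 'already exists' as success --
    the errno-based idiom the message defends."""
    try:
        os.mkdir(path)
    except OSError as e:
        if e.errno == errno.EEXIST:
            pass          # the act(): condition we expected, swallow it
        else:
            raise         # anything else is a real error

base = tempfile.mkdtemp()
ensure_dir(base)                        # already exists: EEXIST swallowed
ensure_dir(os.path.join(base, 'sub'))   # actually created
print(os.path.isdir(os.path.join(base, 'sub')))  # True
```

Under the fine-grained proposal this same logic would instead catch a dedicated subclass, without consulting e.errno at all.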
3)) Builtin I/O operations throw IOError, OS function wrappers throw OSError, package functions throw package-specific ones - quite obvious where to expect which. There's EnvironmentError for any ambiguities; the only argument against it is the long and unusual name.

From dickinsm at gmail.com Sat Jul 24 10:33:36 2010
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sat, 24 Jul 2010 09:33:36 +0100
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: Message-ID:

Richard Jones (who isn't subscribed to this list, else he'd be sending this message himself) would like it pointed out that the "withhacks" module[1] does some things very similar to this (by using bytecode hackery, IIUC). See especially the xkwargs and xargs functions.

>>> from withhacks import xkwargs
>>> print(xkwargs.__doc__)
WithHack calling a function with extra keyword arguments.

    This WithHack captures any local variables created during execution
    of the block, then calls the given function using them as extra
    keyword arguments.

>>> def calculate(a,b):
...     return a * b
...
>>> with xkwargs(calculate,b=2) as result:
...     a = 5
...
>>> print result
10

-- Mark

From ncoghlan at gmail.com Sat Jul 24 11:58:23 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 24 Jul 2010 19:58:23 +1000
Subject: [Python-ideas] 'where' statement in Python?
In-Reply-To: References: <878w56zppe.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Sat, Jul 24, 2010 at 12:11 AM, Mark Dickinson wrote:
> On Tue, Jul 20, 2010 at 2:27 PM, Nick Coghlan wrote:
>> For the record, I am personally +1 on the idea (otherwise I wouldn't
>> have put so much thought into it over the years). It's just a *lot*
>> harder to define complete and consistent semantics for the concept
>> than people often realise.
>>
>> However, having the question come up twice within the last month
>> finally inspired me to write the current status of the topic down in a
>> deferred PEP: http://www.python.org/dev/peps/pep-3150/
>
> Is Python's grammar still LL(1) under this proposal?

It would need to stay that way, as the LL(1) restriction is a rule Guido has stated he is very reluctant to break. I *think* my second grammar change suggestion conforms with that requirement, but I haven't actually tried it out.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Jul 24 12:12:31 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 24 Jul 2010 20:12:31 +1000
Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy
In-Reply-To: <636668799.20100724111645@mail.mipt.ru>
References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru>
Message-ID:

On Sat, Jul 24, 2010 at 5:16 PM, Ivan Pozdeev wrote:
> Adding errno-specific subexceptions
> 1) makes some errnos privileged at the expense of others

Why is that a problem? Some errnos *are* more important than others - they're the ones that regularly appear on the right hand side of "errno == " checks.

> 2) introduces a list that isn't bound to any objective characteristic and is just an
> arbitrary "favorites" list

Why would you consider new classes that would be based on a survey of the errnos that developers actually check for in published code to be "arbitrary"?

> 3) adds types that characterize single errors rather than error
> classes which is an unnecessary level of complexity.

Any new IOError subclasses would likely still characterise classes of errors rather than single errno values.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia

From python at mrabarnett.plus.com Sat Jul 24 17:12:39 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 24 Jul 2010 16:12:39 +0100
Subject: [Python-ideas] [Python-Dev] Set the namespace free!
In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl>
Message-ID: <4C4B02E7.2050806@mrabarnett.plus.com>

Carl M. Johnson wrote:
> On Fri, Jul 23, 2010 at 3:31 PM, Gregory P. Smith wrote:
>
>> What next? An optional way to support case insensitive names using a
>> unicode character prefix?
>
> This suggests an innovative solution. Using the advanced phishing
> techniques of today's bleeding edge Mafioso hackers, we can use
> unicode lookalikes to stand in for keywords. So, for example, the
> motivating use case is the desire to write elem.for. Well, all we need
> to do is use the Greek omicron and write elem.fοr. See the difference?
> No? Exactly!
>
> This is a good first step as a workaround in older versions of Python,
> but I think we can do better than this in future versions. Since it is
> so clearly useful and convenient with a wide variety of use cases to
> be able to use current keywords as variable names, I propose that we
> phase out the current set of keywords and replace them with
> vowel-shifted lookalikes: Cyrillic а for a, Cyrillic у for y, omicron
> for o, upside-down exclamation point for i, and the EU's estimated
> sign ℮ for e. So, the current keywords:
>
> and elif import return
> as else in try
> assert except is while
> break finally lambda with
> class for not yield
> continue from or
> def global pass
> del if raise
>
> would be replaced with these new keywords:
>
> аnd ℮l¡f ¡mpοrt r℮turn
> аs ℮ls℮ ¡n trу
> аss℮rt ℮xc℮pt ¡s wh¡l℮
> br℮аk f¡nаllу lаmbdа w¡th
> clаss fοr nοt у¡℮ld
> cοnt¡nu℮ frοm οr
> d℮f glοbаl pаss
> d℮l ¡f rа¡s℮
>
Instead of "¡" use "і" ("\u0456") and instead of "℮" use "е" ("\u0435").
> Since this change is visually undetectable, I don?t see why we can?t > just make it mandatory in 3.2 instead of going through the usual > multi-release deprecation cycle. (I will admit that the transition > might be quicker if someone could modify 2to3 and make 3_1to3_2 to > help people convert their legacy scripts.) > > Can we get fast-track BDFL approval on this one? Seems like a slam dunk to me. > > Internationally yrs, > From mwm-keyword-python.b4bdba at mired.org Sat Jul 24 17:22:52 2010 From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer) Date: Sat, 24 Jul 2010 11:22:52 -0400 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: <636668799.20100724111645@mail.mipt.ru> References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru> Message-ID: <20100724112252.6d15e3f4@bhuda.mired.org> On Sat, 24 Jul 2010 11:16:45 +0400 Ivan Pozdeev wrote: > As a user you are talking about in the PEP, i'm quite satisfied with the current implementation. As a user, I'm also quite satisfied with the current implementation, but think the proposal would be a major improvement in all areas. > I strongly oppose introducing 'fine-grained' exceptions. I think we see this in two different ways. The current version of the PEP seems to be somewhere between the two views. But in particular: > Adding errno-specific subexceptions I didn't see the PEP as calling for that. I saw it as dividing up the new combined IOError/OSError/EnvironmentError into finer-grained groups that make logical sense together. Yes, that division would depend on errno, and some of the groups on some platforms may well have only one errno - indeed, the first pass had a lot of those - but that's an implementation detail. > 1) makes some errnos privileged at the expense of others Well, some are privileged, in that they now belong to a finer grouping. 
I don't see how it's at the "expense" of the others: you can still catch the upper-level one and sort on errno, just like you do now. > 2) introduces a list that isn't bound to any objective characteristic and is just an > arbitrary "favorites" list That I would object to. I expect the PEP to include objective rules for each subgroup that can be used to determine if an errno belongs in that subgroup. > 3) adds types that characterize single errors rather than error > classes which is an unnecessary level of complexity. That actually sounds like the way most packages do things. > 3)) > Builtin I/O operations throw IOError, OS function wrappers throw > OSError, package functions throw package-specific ones - > quite obvious where to expect which. If only practice matched theory. Package functions throw package-specific exceptions, but they also call functions that can throw OS or IO errors. So in practice, I find I often have to deal with those as well as the package-specific ones. I can't even say the package authors are wrong not to catch and map those errors into package-specific errors. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From greg at krypto.org Sat Jul 24 18:00:57 2010 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 24 Jul 2010 09:00:57 -0700 Subject: [Python-ideas] [Python-Dev] Set the namespace free! In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl> Message-ID: On Fri, Jul 23, 2010 at 8:22 PM, Carl M. Johnson < cmjohnson.mailinglist at gmail.com> wrote: > On Fri, Jul 23, 2010 at 3:31 PM, Gregory P. Smith wrote: > > > What next? An optional way to support case insensitive names using a > > unicode character prefix? > > This suggests an innovative solution. Using the advanced phishing > techniques of today?s bleeding edge Mafioso hackers, we can use > unicode lookalikes to stand in for keywords. 
So, for example, the > motivating use case is the desire to write elem.for. Well, all we need > to do is use the Greek omicron and write elem.f?r. See the difference? > No? Exactly! > > This is a good first step as a workaround in older versions of Python, > but I think we can do better than this in future version. Since it is > so clearly useful and convenient with a wide variety of use cases to > be able to use current keywords as variable names, I propose that we > phase out the current set of keywords and replace them with > vowel-shifted lookalikes: Cyrillic ? for a, Cyrillic ? for y, omicron > for o, upside-down exclamation point for i, and the EU?s estimated > sign ? for e. So, the current keywords: > > and elif import return > as else in try > assert except is while > break finally lambda with > class for not yield > continue from or > def global pass > del if raise > > would be replaced with these new keywords: > > ?nd ?l?f ?mp?rt r?turn > ?s ?ls? ?n tr? > ?ss?rt ?xc?pt ?s wh?l? > br??k f?n?ll? l?mbd? w?th > cl?ss f?r n?t ???ld > c?nt?nu? fr?m ?r > d?f gl?b?l p?ss > d?l ?f r??s? > > Since this change is visually undetectable, I don?t see why we can?t > just make it mandatory in 3.2 instead of going through the usual > multi-release deprecation cycle. (I will admit that the transition > might be quicker if someone could modify 2to3 and make 3_1to3_2 to > help people convert their legacy scripts.) > > Can we get fast-track BDFL approval on this one? Seems like a slam dunk to > me. > > Internationally yrs, > > -- ?J > > Where's the craigslist "best of" button for python-ideas posts when we need it? ;) -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From lvh at laurensvh.be Sat Jul 24 18:35:25 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sat, 24 Jul 2010 18:35:25 +0200 Subject: [Python-ideas] Mont-E technical details, PEP 380, Monocle Message-ID: Hi! 
The people who attended EuroPython might remember my lightning talk about Mont-E. I wanted to go into more technical detail, but as I said in the talk it was very early days, and I was unable to get a reliable enough internet connection near the talk to really know what was done and what wasn't in detail (and I didn't want to lie about features that did and did not work). For those who haven't: it's a set of patches on top of Py3k that tries to introduce some of the ideas of E (a programming language) into Python. I just read Guido's python-dev post and I was listening in on the discussion after Raymond's talk, and I agree with pretty much everything that was said in both cases. The most important part is PEP 380 expressing a great idea; which is pretty much done and ready for real use, I think the best proof of that is that people have already tried to solve it and the solutions people come up with are quite similar (except that one is specifically tuned towards subgenerators and the other isn't): Twisted's returnValue and Mont-E's return-from-a-generator. The latter looks like this:

    def port_scan(hostname):
        ip = defer gethostbyname(hostname)
        ports = set()
        for port in range(65536):
            if (defer is_port_connectable(ip, port)):
                ports.add(port)
        return ports

(Except that uses E-style promises + resolvers, instead of Twisted and Monocle style deferreds.) I'm not suggesting Mont-E's syntax makes it in favor of anything else, like I mentioned in the talk we really just wanted an excuse to mess with the grammar (the official excuse is "we wanted to see what we could do given the power to do anything including mangling half the stdlib and grammar" ;-)). port_scan(spam) would return a promise, not a generator. Like @inlineCallbacks, the goal of yield/defer is to basically say "oh, okay, you can do other stuff now, just call me back as soon as this thing is done". The value of the promise is the returned value.
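[Editor's note: the yield/defer shape described above can be approximated in plain Python with a small trampoline. This is purely an illustrative sketch, not Mont-E, Twisted, or stdlib code — `Promise`, `coroutine`, and the canned network functions below are all made up for the example, and the "promises" resolve synchronously so the sketch stays runnable.]

```python
class Promise:
    """Toy stand-in for a Mont-E promise / Twisted Deferred.

    Here the value is available immediately; in real async code it
    would be resolved later by an event loop.
    """
    def __init__(self, value):
        self.value = value


def coroutine(func):
    """Drive a generator, resolving each yielded Promise before resuming it.

    The generator's return value (carried by StopIteration, as in PEP 380)
    becomes the overall result.
    """
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        result = None
        try:
            while True:
                promise = gen.send(result)   # generator "defers" on a Promise
                result = promise.value       # "wait" for it, then resume
        except StopIteration as stop:
            return stop.value                # the generator's return value
    return wrapper


# Canned answers so the sketch runs without any real networking.
def gethostbyname(hostname):
    return Promise("192.0.2.1")

def is_port_connectable(ip, port):
    return Promise(port in (22, 80))


@coroutine
def port_scan(hostname):
    ip = yield gethostbyname(hostname)
    ports = set()
    for port in (22, 80, 8080):              # a small range, for the demo
        if (yield is_port_connectable(ip, port)):
            ports.add(port)
    return ports
```

Calling `port_scan("example.com")` returns the final set directly; with a real event loop the wrapper would instead return a promise that fires with that value.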
I think in its core this is analogous to PEP 380, except that this takes it a step further and applies it to callbacks/deferreds (well, callbacks/promises+resolvers, technically, but I say deferreds because it's close enough and people are familiar with them). I think stuff like Monocle is a great idea because it introduces portability for async code. I like Twisted and I will continue writing Twisted stuff regardless, but I would much rather write asyncOAuth which everyone can use than txOAuth which only Twisted users could use. I think Monocle shows how these two things are related. How do people feel about a portable-async-code story for the stdlib? Mont-E tries to do this in ways that will be... ahem... let's say "hard" to get into the stdlib: by introducing promises, resolvers, event loops, interfaces... I also agree that CSP is a good model; I prefer the actor model for most of the stuff I end up writing; but there are definitely cases (for my stuff this tends to be about numerical computation) where CSP rocks. But the stdlib already has multiprocessing, maybe it's time we started looking at other stuff :-) thanks in advance for your input Laurens From vano at mail.mipt.ru Sat Jul 24 18:37:42 2010 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Sat, 24 Jul 2010 20:37:42 +0400 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru> Message-ID: <91977653.20100724203742@mail.mipt.ru> Hello, Nick. On 24 July 2010, at 14:12:31, you wrote: > Why is that a problem? Some errnos *are* more important than others - > they're the ones that regularly appear on the right hand side of "errno > == " checks. > Why would you consider new classes that would be based on a survey of > the errnos that developers actually check for in published code to be > "arbitrary"?
Since the list would be the sole opinion of some people who take part in the survey, you'll be constantly faced with demands from other people who want to have "shortcuts" for something else too. And you won't be able to explain why your choice is preferable to theirs. > Any new IOError subclasses would likely still characterise classes of > errors rather than single errno values. The ones I see in the PEP correspond to either one or a few errnos. If the problem is that you don't like the 'cryptic' errno mnemonics, that's a reason to change them instead. The current ones are just the standard POSIX names the errors are long and widely known under. -- Regards, Ivan mailto:vano at mail.mipt.ru From eric at trueblade.com Sat Jul 24 18:49:26 2010 From: eric at trueblade.com (Eric Smith) Date: Sat, 24 Jul 2010 12:49:26 -0400 Subject: [Python-ideas] [Python-Dev] Set the namespace free! In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl> Message-ID: <4C4B1996.5010306@trueblade.com> On 7/23/10 11:22 PM, Carl M. Johnson wrote: > On Fri, Jul 23, 2010 at 3:31 PM, Gregory P. Smith wrote: > >> What next? An optional way to support case insensitive names using a >> unicode character prefix? > > This suggests an innovative solution. Using the advanced phishing > techniques of today's bleeding edge Mafioso hackers, we can use > unicode lookalikes to stand in for keywords. So, for example, the > motivating use case is the desire to write elem.for. Well, all we need > to do is use the Greek omicron and write elem.fοr. See the difference? > No? Exactly! > > This is a good first step as a workaround in older versions of Python, > but I think we can do better than this in future versions. Since it is > so clearly useful and convenient with a wide variety of use cases to > be able to use current keywords as variable names, I propose that we > phase out the current set of keywords and replace them with > vowel-shifted lookalikes: Cyrillic а for a, Cyrillic у
for y, omicron > for o, upside-down exclamation point for i, and the EU's estimated > sign ℮ for e. So, the current keywords:
>
> and       elif      import    return
> as        else      in        try
> assert    except    is        while
> break     finally   lambda    with
> class     for       not       yield
> continue  from      or
> def       global    pass
> del       if        raise
>
> would be replaced with these new keywords:
>
> аnd       ℮l¡f      ¡mpοrt    r℮turn
> аs        ℮ls℮      ¡n        trу
> аss℮rt    ℮xc℮pt    ¡s        wh¡l℮
> br℮аk     f¡nаllу   lаmbdа    w¡th
> clаss     fοr       nοt       у¡℮ld
> cοnt¡nu℮  frοm      οr
> d℮f       glοbаl    pаss
> d℮l       ¡f        rа¡s℮
>
> Since this change is visually undetectable, I don't see why we can't > just make it mandatory in 3.2 instead of going through the usual > multi-release deprecation cycle. (I will admit that the transition > might be quicker if someone could modify 2to3 and make 3_1to3_2 to > help people convert their legacy scripts.) > > Can we get fast-track BDFL approval on this one? Seems like a slam dunk to me. > > Internationally yrs, > > -- ?J CJ for the win! Bravo. -- Eric. From greg at krypto.org Sat Jul 24 23:31:42 2010 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 24 Jul 2010 14:31:42 -0700 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: <1279740852.3222.38.camel@localhost.localdomain> References: <1279740852.3222.38.camel@localhost.localdomain> Message-ID: On Wed, Jul 21, 2010 at 12:34 PM, Antoine Pitrou wrote: > > Hello, > > I would like to propose the following PEP for feedback and review. > Permanent link to up-to-date version with proper HTML formatting: > http://www.python.org/dev/peps/pep-3151/ > > Thank you, > > Antoine. > > ... ...
>
> New exception classes
> ---------------------
>
> The following tentative list of subclasses, along with a description and the list of errnos mapped to them, is submitted to discussion:
>
> * ``FileAlreadyExists``: trying to create a file or directory which already exists (EEXIST)
>
> * ``FileNotFound``: for all circumstances where a file or directory is requested but doesn't exist (ENOENT)
>
> * ``IsADirectory``: file-level operation (open(), os.remove()...) requested on a directory (EISDIR)
>
> * ``NotADirectory``: directory-level operation requested on something else (ENOTDIR)
>
> * ``PermissionDenied``: trying to run an operation without the adequate access rights - for example filesystem permissions (EACCES, optionally EPERM)
>
> * ``BlockingIOError``: an operation would block on an object (e.g. socket) set for non-blocking operation (EAGAIN, EALREADY, EWOULDBLOCK, EINPROGRESS); this is the existing ``io.BlockingIOError`` with an extended role
>
> * ``BadFileDescriptor``: operation on an invalid file descriptor (EBADF); the default error message could point out that most causes are that an existing file descriptor has been closed
>
> * ``ConnectionAborted``: connection attempt aborted by peer (ECONNABORTED)
>
> * ``ConnectionRefused``: connection attempt refused by peer (ECONNREFUSED)
>
> * ``ConnectionReset``: connection reset by peer (ECONNRESET)
>
> * ``TimeoutError``: connection timed out (ETIMEDOUT); this could be re-cast as a generic timeout exception, useful for other types of timeout (for example in Lock.acquire())
>
> This list assumes step 1 is accepted in full; the exception classes described above would all derive from the now unified exception type OSError. It will need reworking if a partial version of step 1 is accepted instead (again, see appendix A for the current distribution of errnos and exception types).
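[Editor's note: the errno-to-subclass dispatch sketched in the quoted list above is easy to prototype. The class names below mirror the PEP's tentative list but are purely illustrative — they are not existing builtins — and the mapping covers only a handful of the errnos under discussion.]

```python
import errno

# Hypothetical finer-grained subclasses, per the PEP's tentative list.
class FileAlreadyExists(OSError): pass
class FileNotFound(OSError): pass
class IsADirectory(OSError): pass
class NotADirectory(OSError): pass
class PermissionDenied(OSError): pass

# errno value -> proposed subclass; anything unmapped stays a plain OSError.
_ERRNO_MAP = {
    errno.EEXIST: FileAlreadyExists,
    errno.ENOENT: FileNotFound,
    errno.EISDIR: IsADirectory,
    errno.ENOTDIR: NotADirectory,
    errno.EACCES: PermissionDenied,
    errno.EPERM: PermissionDenied,
}

def error_from_errno(err, message):
    """Pick the finer-grained class for an errno, falling back to OSError."""
    cls = _ERRNO_MAP.get(err, OSError)
    return cls(err, message)  # two-arg form sets .errno and .strerror
```

This is the "subroutine choosing an exception type based on the errno value" that the PEP suggests could be exposed for pure-Python callers; C extensions would get the same behavior through `PyErr_SetFromErrno()`.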
> > > Exception attributes > -------------------- > > In order to preserve *useful compatibility*, these subclasses should still > set adequate values for the various exception attributes defined on the > superclass (for example ``errno``, ``filename``, and optionally > ``winerror``). > > Implementation > -------------- > > Since it is proposed that the subclasses are raised based purely on the > value of ``errno``, little or no changes should be required in extension > modules (either standard or third-party). As long as they use the > ``PyErr_SetFromErrno()`` family of functions (or the > ``PyErr_SetFromWindowsErr()`` family of functions under Windows), they > should automatically benefit from the new, finer-grained exception classes. > > Library modules written in Python, though, will have to be adapted where > they currently use the following idiom (seen in ``Lib/tempfile.py``):: > > raise IOError(_errno.EEXIST, "No usable temporary file name found") > > Fortunately, such Python code is quite rare since raising OSError or > IOError > with an errno value normally happens when interfacing with system calls, > which is usually done in C extensions. > > If there is popular demand, the subroutine choosing an exception type based > on the errno value could be exposed for use in pure Python. > > > Possible objections > =================== > > Namespace pollution > ------------------- > > Making the exception hierarchy finer-grained makes the root (or builtins) > namespace larger. 
This is to be moderated, however, as: > > * only a handful of additional classes are proposed; > > * while standard exception types live in the root namespace, they are > visually distinguished by the fact that they use the CamelCase convention, > while almost all other builtins use lowercase naming (except True, False, > None, Ellipsis and NotImplemented) > > An alternative would be to provide a separate module containing the > finer-grained exceptions, but that would defeat the purpose of > encouraging careful code over careless code, since the user would first > have to import the new module instead of using names already accessible. > +1 on this whole PEP! The EnvironmentError hierarchy and common errno test code have bothered me for a while. While I think the namespace pollution concern is valid, I would suggest adding "Error" to the end of all of the names (your initial proposal only says "Error" on the end of one of them) as that is consistent with the bulk of the existing standard exceptions and warnings. They are unlikely to conflict with anything other than exceptions people have already defined themselves in any existing code (which could likely be refactored out after we officially define these). > > Earlier discussion > ================== > > While this is the first time such a formal proposal has been made, the idea > has received informal support in the past [1]_; both the introduction > of finer-grained exception classes and the coalescing of OSError and > IOError. > > The removal of WindowsError alone has been discussed and rejected > as part of another PEP [2]_, but there seemed to be a consensus that the > distinction with OSError wasn't meaningful. This supports at least its > aliasing with OSError. > > > Moratorium > ========== > > The moratorium in effect on language builtins means this PEP has little > chance to be accepted for Python 3.2.
>
> Possible alternative
> ====================
>
> Pattern matching
> ----------------
>
> Another possibility would be to introduce an advanced pattern matching syntax when catching exceptions. For example::
>
>     try:
>         os.remove(filename)
>     except OSError as e if e.errno == errno.ENOENT:
>         pass
>
> Several problems with this proposal:
>
> * it introduces new syntax, which is perceived by the author to be a heavier change compared to reworking the exception hierarchy
> * it doesn't decrease typing effort significantly
> * it doesn't relieve the programmer from the burden of having to remember errno mnemonics

ugh. no. :) That only works well for single exceptions and encourages less explicit exception types. Exceptions are a class hierarchy; we should encourage its use rather than encouraging magic type-specific attributes with conditionals. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Jul 24 23:47:21 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 24 Jul 2010 17:47:21 -0400 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: <91977653.20100724203742@mail.mipt.ru> References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru> <91977653.20100724203742@mail.mipt.ru> Message-ID: 2010/7/24 Ivan Pozdeev : .. >> Why would you consider new classes that would be based on a survey of >> the errnos that developers actually check for in published code to be >> "arbitrary"? > > Since the list would be a sole opinion of some people who take part in > the survey, you'll be constantly faced with demands of other people > who want to have "shortcuts" for something else too. I think you misunderstood the survey methodology. It was not a survey of developers; instead, large bodies of code were examined. There is nothing arbitrary or subjective in this approach. FWIW, I am +1 on the PEP.
From greg.ewing at canterbury.ac.nz Sun Jul 25 03:08:50 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 25 Jul 2010 13:08:50 +1200 Subject: [Python-ideas] [Python-Dev] Set the namespace free! In-Reply-To: <4C4B02E7.2050806@mrabarnett.plus.com> References: <4C484FD0.2080803@zlotniki.pl> <4C4B02E7.2050806@mrabarnett.plus.com> Message-ID: <4C4B8EA2.1090407@canterbury.ac.nz> > Carl M. Johnson wrote:
>> аnd       ℮l¡f      ¡mpοrt    r℮turn
>> аs        ℮ls℮      ¡n        trу
>> аss℮rt    ℮xc℮pt    ¡s        wh¡l℮
>> br℮аk     f¡nаllу   lаmbdа    w¡th
>> clаss     fοr       nοt       у¡℮ld
>> cοnt¡nu℮  frοm      οr
>> d℮f       glοbаl    pаss
>> d℮l       ¡f        rа¡s℮
>>
>> Since this change is visually undetectable,
If you saw the funky way those all appear to me in Thunderbird, you wouldn't have written that sentence. :-) -- Greg From greg.ewing at canterbury.ac.nz Sun Jul 25 03:20:34 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 25 Jul 2010 13:20:34 +1200 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: <20100724112252.6d15e3f4@bhuda.mired.org> References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru> <20100724112252.6d15e3f4@bhuda.mired.org> Message-ID: <4C4B9162.4030305@canterbury.ac.nz> Mike Meyer wrote: > I can't even say the > package authors are wrong not to catch and map those errors into > package-specific errors. I'd say they're not wrong at all. The exception hierarchy should be based on the semantics of the exceptions, not which package they happen to originate from or pass through. -- Greg From ncoghlan at gmail.com Sun Jul 25 04:58:45 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 25 Jul 2010 12:58:45 +1000 Subject: [Python-ideas] [Python-Dev] Set the namespace free! In-Reply-To: <4C4B8EA2.1090407@canterbury.ac.nz> References: <4C484FD0.2080803@zlotniki.pl> <4C4B02E7.2050806@mrabarnett.plus.com> <4C4B8EA2.1090407@canterbury.ac.nz> Message-ID: On Sun, Jul 25, 2010 at 11:08 AM, Greg Ewing wrote: >> Carl M.
Johnson wrote:
>
>>> аnd       ℮l¡f      ¡mpοrt    r℮turn
>>> аs        ℮ls℮      ¡n        trу
>>> аss℮rt    ℮xc℮pt    ¡s        wh¡l℮
>>> br℮аk     f¡nаllу   lаmbdа    w¡th
>>> clаss     fοr       nοt       у¡℮ld
>>> cοnt¡nu℮  frοm      οr
>>> d℮f       glοbаl    pаss
>>> d℮l       ¡f        rа¡s℮
>>>
>>> Since this change is visually undetectable,
>
> If you saw the funky way those all appear to me in Thunderbird,
> you wouldn't have written that sentence. :-)
That's the case for me with Gmail-in-Firefox, but with MRAB's substitutions it becomes genuinely indistinguishable:

?nd       ?l?f      ?mp?rt    r?turn
?s        ?ls?      ?n        tr?
?ss?rt    ?xc?pt    ?s        wh?l?
br??k     f?n?ll?   l?mbd?    w?th
cl?ss     f?r       n?t       ???ld
c?nt?nu?  fr?m      ?r
d?f       gl?b?l    p?ss
d?l       ?f        r??s?

Fonts with incomplete glyph sets will still look wrong, of course. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jul 25 05:12:24 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 25 Jul 2010 13:12:24 +1000 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: <91977653.20100724203742@mail.mipt.ru> References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru> <91977653.20100724203742@mail.mipt.ru> Message-ID: 2010/7/25 Ivan Pozdeev : > Hello, Nick. > > On 24 July 2010, at 14:12:31, you wrote: > >> Why is that a problem? Some errnos *are* more important than others - >> they're the ones that regularly appear on the right hand side of "errno >> == " checks. > >> Why would you consider new classes that would be based on a survey of >> the errnos that developers actually check for in published code to be >> "arbitrary"?
> > Since the list would be a sole opinion of some people who take part in > the survey, you'll be constantly faced with demands of other people > who want to have "shortcuts" for something else too. > And you won't be able to explain why your choice is more preferable than theirs. As Alexander pointed out, the word survey has multiple meanings. One of those is the subjective approach you're objecting to (ask a bunch of people what they think), another is the more objective approach actually documented in the PEP (go and look at what is out there, as in the sense of "land survey"). Think "code survey" rather than "developer survey". (A scripted tool to gather statistics on exception handling in this space from Google code search results and direct scans of local Python code bases would actually be helpful, even if it wasn't 100% accurate) There is still a subjective step in whittling the code survey results down into a revised class heirarchy, but that's: - why it's a separate step in the PEP, independent of the consolidation step - why the PEP doesn't include a concrete proposal as yet - one of the main goals of discussion of the PEP here and across the wider Python community Language design is inherently a matter of judgment. Based on the way it has played out in practice (frequently requiring explicit errno checks and catching of multiple exception types in order to write correct code), we now think the previous judgment in relation to the EnvironmentError exception hierarchy is demonstrably flawed. That doesn't mean we throw our hands up in the air and give up - it means we knuckle down and try to come up with something better, based on what we can learn from what has gone before. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From alex.gaynor at gmail.com Sun Jul 25 20:15:21 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Sun, 25 Jul 2010 13:15:21 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ Message-ID: Recently I've been wondering why __contains__ casts all of its return values to be boolean. Specifically I'd like to propose that __contains__'s return values be passed directly back as the result of the `in` operation. As a result I'd further propose the introduction of __not_contains__, which would implement the `not in` operator. The primary use case for this is something like an expression recorder. For example in SQLAlchemy one can do: User.id == 3, but not User.id in SQLList([1, 2, 3]), because `in` always returns a bool. __not_contains__ is needed to be the analog of this, as it cannot merely be a negation of __contains__ when it's returning a non-bool result. There should be no backwards compatibility issues with making __contains__ return non-bools, unless there is code like:

    x = y in foo
    assert type(x) is bool

However, for __not_contains__ I'd propose the default implementation be:

    def __not_contains__(self, val):
        x = val in self
        if type(x) is not bool:
            raise TypeError("%s returned a non-boolean value from "
                            "__contains__ and didn't provide an "
                            "implementation of __not_contains__"
                            % type(self).__name__)
        return not x

This is not perfect (and it's at odds with the fact that __ne__ doesn't return not self == other), but it seems to allow both the desired flexibility and backwards compatibility. I'm not sure if this is something that'd be covered by the language moratorium, but if not I can try putting together a patch for this. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law."
-- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From raymond.hettinger at gmail.com Sun Jul 25 20:48:23 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 25 Jul 2010 11:48:23 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: Message-ID: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> On Jul 25, 2010, at 11:15 AM, Alex Gaynor wrote: > Recently I've been wondering why __contains__ casts all of its > return values to be boolean. Specifically I'd like to propose that > __contains__'s return values be passed directly back as the result of > the `in` operation. x = y in z # where x is a non boolean. Yuck. One of the beautiful aspects of __contains__ is that its simple signature allows it to be used polymorphically throughout the whole language. It would be a shame to throw away this virtue so that you can have an operator version of something that should really be a method (like find() for example). -1 on the proposal because it makes the language harder to grok while conferring only a dubious benefit (replacing well-named methods with a non-descriptive use of an operator). There is no "natural" interpretation of an in-operator returning a non-boolean. If the above snippet assigns "foo" to x, what does that mean? If it assigns -10, what does that mean? Language design is about associating meanings (semantics) with syntax. ISTM, this would be poor design. Raymond From bruce at leapyear.org Sun Jul 25 23:20:19 2010 From: bruce at leapyear.org (Bruce Leban) Date: Sun, 25 Jul 2010 14:20:19 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: Message-ID: Let me see if I understand this:

    False in [False]

Returns True now but would return False with your change. --- Bruce (via android) On Jul 25, 2010 11:15 AM, "Alex Gaynor" wrote: Recently I've been wondering why __contains__ casts all of its return values to be boolean.
Specifically I'd like to propose that __contains__'s return values be passed directly back as the result of the `in` operation. As a result I'd further propose the introduction of __not_contains__, which is the `not in` operator. The primary usecase for this is something like an expression recorder. For example in SQLAlchemy one can do: User.id == 3, but not User.id in SQLList([1, 2, 3]), because it returns a bool always. __not_contains__ is needed to be the analog of this, as it cannot be merely be a negation of __contains__ when it's returning a non-bool result. There should be no backwards compatibility issues to making __contains__ return non-bools, unless there is code like: x = y in foo assert type(x) is bool However, for __not_contains__ I'd propose the default implementation be: def __not_contains__(self, val): x = val in self if type(x) is not bool: raise TypeError("%s returned a non-boolean value from __contains__ and didn't provide an implementation of __not_contains__") return not x This is not perfect (and it's at odds with the fact that __ne__ doesn't return not self == other), but it seems to allow both the desired flexibility and backwards compatibility. I'm not sure if this is something that'd be covered by the language moratorium, but if not I can try putting together a patch for this. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sun Jul 25 23:28:19 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 26 Jul 2010 07:28:19 +1000 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: Message-ID: On Mon, Jul 26, 2010 at 7:20 AM, Bruce Leban wrote: > Let me see if I understand this: > > False in [False] > > Returns True now but would return False with your change. No, that would be unaffected, since builtin containers would retain their current semantics. Raymond's objections apply though - using syntax rather than a method makes the language noticeably more complicated for minimal gain. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From jackdied at gmail.com Sun Jul 25 23:32:21 2010 From: jackdied at gmail.com (Jack Diederich) Date: Sun, 25 Jul 2010 17:32:21 -0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 5:20 PM, Bruce Leban wrote: > Let me see if I understand this: > > False in [False] > > Returns True now but would return False with your change. Bigtime. Official side-effects are neat for hacks but bad for maintainable code. You don't know pain until another developer complains that you refactored user.is_admin() to no longer return the user's object (for the record that happened in perl, but it could in python too). Boolean test operations should return bools for the same reason that in-place operations should return None. 
-Jack From fuzzyman at voidspace.org.uk Sun Jul 25 23:32:59 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 25 Jul 2010 22:32:59 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> Message-ID: On 25 July 2010 19:48, Raymond Hettinger wrote: > > On Jul 25, 2010, at 11:15 AM, Alex Gaynor wrote: > > > Recently I've been wondering why __contains__ casts all of it's > > returns to be boolean values. Specifically I'd like to propose that > > __contains__'s return values be passed directly back as the result of > > the `in` operation. > > x = y in z # where x is a non boolean. > > Yuck. > > How is it any worse than: x = y > z # where x is a non boolean And all the other operators that already do this? Michael > One of the beautiful aspects of __contains__ is that its simply signature > allows it to be used polymorphically throughout the whole language. > It would be ashamed to throw-away this virtue so that you can > have a operator version of something that should really be a method > (like find() for example). > > -1 on the proposal because it makes the language harder to grok > while conferring only a dubious benefit (replacing well named > methods with a non-descriptive use of an operator). > > There is no "natural" interpretation of an in-operator returning > a non-boolean. If the above snippet assigns "foo" to x, what > does that mean? If it assigns -10, what does that mean? > Language design is about associating meanings (semantics) > with syntax. ISTM, this would be poor design. > > > Raymond > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fuzzyman at voidspace.org.uk Sun Jul 25 23:36:34 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 25 Jul 2010 22:36:34 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: Message-ID: On 25 July 2010 22:32, Jack Diederich wrote: > On Sun, Jul 25, 2010 at 5:20 PM, Bruce Leban wrote: > > Let me see if I understand this: > > > > False in [False] > > > > Returns True now but would return False with your change. > > Bigtime. Official side-effects are neat for hacks but bad for > maintainable code. You don't know pain until another developer > complains that you refactored user.is_admin() to no longer return the > user's object (for the record that happened in perl, but it could in > python too). Boolean test operations should return bools Most of them don't and this can be useful. Why should __contains__ enforce it when other boolean operations don't? Inconsistency is also bad. > for the same > reason that in-place operations should return None. > What do you mean by "in-place operations should return None"? For mutable objects __iadd__ and friends should return self. For immutable ones they return the new value. Probably I misunderstand what you mean. Michael > > -Jack > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phd.pp.ru Sun Jul 25 23:59:12 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Mon, 26 Jul 2010 01:59:12 +0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: Message-ID: <20100725215912.GA23112@phd.pp.ru> On Sun, Jul 25, 2010 at 10:36:34PM +0100, Michael Foord wrote: > > in-place operations should return None. > > What do you mean by "in-place operations should return None"? 
list.sort() sorts in place and returns None. Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From fuzzyman at voidspace.org.uk Mon Jul 26 00:34:23 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 25 Jul 2010 23:34:23 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <20100725215912.GA23112@phd.pp.ru> References: <20100725215912.GA23112@phd.pp.ru> Message-ID: On 25 July 2010 22:59, Oleg Broytman wrote: > On Sun, Jul 25, 2010 at 10:36:34PM +0100, Michael Foord wrote: > > > in-place operations should return None. > > > > What do you mean by "in-place operations should return None"? > > list.sort() sorts in place and returns None. > > Ah thanks, although I don't think this is analogous at all. Michael > Oleg. > -- > Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Jul 26 02:01:11 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 25 Jul 2010 17:01:11 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> Message-ID: <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> On Jul 25, 2010, at 2:32 PM, Michael Foord wrote: > > > x = y in z # where x is a non boolean. > > Yuck. > > > > How is it any worse than: > > > x = y > z # where x is a non boolean > > And all the other operators that already do this? Terrible sales technique: "how is this any worse than ..." 
;-) Other operations such as rich comparisons have complicated our lives but had sufficient offsetting benefits than made them more bearable. Rich comparisons cause no end of trouble but at least they allow the numeric folks to implement some well studied behaviors than have proven benefits in their domain. In contrast, this proposal offers almost zero benefit to offset the pain it will cause. The OP didn't even offer a compelling use case or a single piece of code that wouldn't be better written with a normal method. No existing code expects "in" to return a non-boolean. A lot of code for containers or that uses containers implicitly expects simple invariants to hold: for x in container: assert x in container Raymond P.S. With rich comparisons, we've lost basics assumptions like equality operations being reflexsive, symmetric, and transitive. We should be cautioned by that experience and only go down that path again if there is a darned good reason. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.gaynor at gmail.com Mon Jul 26 04:20:59 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Sun, 25 Jul 2010 21:20:59 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> Message-ID: On Sun, Jul 25, 2010 at 7:01 PM, Raymond Hettinger wrote: > > On Jul 25, 2010, at 2:32 PM, Michael Foord wrote: >> >> >> x = y in z ? ? ? ? ?# where x is a non boolean. >> >> Yuck. >> > > > How is it any worse than: > > > ?x = y > z # where x is a non boolean > > And all the other operators that already do this? > > > Terrible sales technique: ?"how is this any worse than ..." ;-) > Other operations such as rich comparisons have > complicated our lives but had sufficient offsetting benefits > than made them more bearable. 
?Rich comparisons cause > no end of trouble but at least they allow the numeric folks > to implement some well studied behaviors than have proven > benefits in their domain. > In contrast, this proposal offers almost zero benefit to offset > the pain it will cause. ?The OP didn't even offer a compelling > use case or a single piece of code that wouldn't be better > written with a normal method. > No existing code expects "in" to return a non-boolean. > A lot of code for containers or that uses containers implicitly > expects simple invariants to hold: > ?? for x in container: > ?? ? ? assert x in container > > > Raymond > > P.S. ?With rich comparisons, we've lost basics assumptions > like equality operations being reflexsive, symmetric, and > transitive. ? We should be cautioned by that experience > and only go down that path again if there is a darned good reason. Fundamentally the argument in favor of it is the same as for the other comparison operators: you want to do symbolic manipulation using the "normal" syntax, as a DSL. My example is that of a SQL expression builder: SQLAlchemy uses User.id == 3 to create a clause where the ID is 3, but for "id in [1, 2, 3]" it has: User.id.in_([1, 2, 3]), which is rather unseamly IMO (at least as much as having User.id.eq(3) would be). Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." 
-- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From ianb at colorstudy.com Mon Jul 26 05:01:32 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 25 Jul 2010 22:01:32 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> Message-ID: On Sun, Jul 25, 2010 at 7:01 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > P.S. With rich comparisons, we've lost basics assumptions > like equality operations being reflexsive, symmetric, and > transitive. We should be cautioned by that experience > and only go down that path again if there is a darned good reason. > Well, it's done, and this would simply complete that. I don't see how holding the line at __contains__ help anything; we've gone most of the way down the path, and this just gets us a little further along (and, or, and not still being an issue). Also I don't think this affects reflexive/symmetric/transitive aspects of equality, since any overloading allows that to happen, right? Rich comparisons affect other things. I assume this would fall under the moratorium though? -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Jul 26 05:07:22 2010 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Jul 2010 20:07:22 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> Message-ID: On Sun, Jul 25, 2010 at 5:01 PM, Raymond Hettinger wrote: > > On Jul 25, 2010, at 2:32 PM, Michael Foord wrote: >> >> >> x = y in z ? ? ? ? ?# where x is a non boolean. >> >> Yuck. 
>> > > > How is it any worse than: > > > ?x = y > z # where x is a non boolean > > And all the other operators that already do this? > > > Terrible sales technique: ?"how is this any worse than ..." ;-) Whoa. Reformulate as a consistency argument and I totally buy it. > Other operations such as rich comparisons have > complicated our lives but had sufficient offsetting benefits > than made them more bearable. ?Rich comparisons cause > no end of trouble but at least they allow the numeric folks > to implement some well studied behaviors than have proven > benefits in their domain. True. The argument for rich comparisons is that "A = B <= C" where B and C are matrices of the same shape could return a matrix of bools of the same shape, like a generalization for "A = [b <= c for b, c in zip(B, C)]". > In contrast, this proposal offers almost zero benefit to offset > the pain it will cause. ?The OP didn't even offer a compelling > use case or a single piece of code that wouldn't be better > written with a normal method. OTOH, there is a similar use case: "A = B in C" could be defined by the author of a matrix type as (the similar generalization of) "A = [b in c for b, c in zip(B, C)]". This is still somewhat less compelling than for rich comparisons because the elements of C corresponding to those in would have to be sequences. But it is not totally uncompelling. BTW Alex *did* mention a use case: expression recoding like SQLAlchemy. > No existing code expects "in" to return a non-boolean. Most existing code also doesn't care, and all predefined implementations of __contains__ will still return bools. It's only folks like NumPy who would care. We should ask them though -- they had a chance to ask for this when rich comparisons were introduced, but apparently didn't. > A lot of code for containers or that uses containers implicitly > expects simple invariants to hold: > ?? for x in container: > ?? ? ? 
assert x in container Yeah, a lot of code using comparisons also breaks when comparisons don't return bools. It's a specialized use, but I don't see it as anathema. OTOH the real solution would be something like LINQ in C# (http://msdn.microsoft.com/en-us/netframework/aa904594.aspx, http://en.wikipedia.org/wiki/Language_Integrated_Query). > Raymond > > P.S. ?With rich comparisons, we've lost basics assumptions > like equality operations being reflexive, symmetric, and > transitive. ? We should be cautioned by that experience > and only go down that path again if there is a darned good reason. So where's the pain? I don't recall ever seeing a report from someone who was bitten by this. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jul 26 05:08:20 2010 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Jul 2010 20:08:20 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> Message-ID: On Sun, Jul 25, 2010 at 8:01 PM, Ian Bicking wrote: > I assume this would fall under the moratorium though? Since it's not adding new syntax, that's debatable. -- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Mon Jul 26 07:50:07 2010 From: masklinn at masklinn.net (Masklinn) Date: Mon, 26 Jul 2010 07:50:07 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> Message-ID: <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> On 2010-07-26, at 05:07 , Guido van Rossum wrote: > >> A lot of code for containers or that uses containers implicitly >> expects simple invariants to hold: >> for x in container: >> assert x in container > > Yeah, a lot of code using comparisons also breaks when comparisons > don't return bools. 
It's a specialized use, but I don't see it as > anathema. > > OTOH the real solution would be something like LINQ in C# > (http://msdn.microsoft.com/en-us/netframework/aa904594.aspx, > http://en.wikipedia.org/wiki/Language_Integrated_Query). Most of LINQ itself (the LINQ library, as opposed to the query syntaxes which are solely syntactic sugar and statically compiled into LINQ method calls) can already be implemented in Python. The things that might be missing are *some* LINQ-supporting features. Likely expression trees[0], maybe (but probably not) less limited and terser lambdas. [0] http://msdn.microsoft.com/en-us/library/bb397951.aspx From guido at python.org Mon Jul 26 16:28:30 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Jul 2010 07:28:30 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> Message-ID: On Sun, Jul 25, 2010 at 10:50 PM, Masklinn wrote: > On 2010-07-26, at 05:07 , Guido van Rossum wrote: >> >>> A lot of code for containers or that uses containers implicitly >>> expects simple invariants to hold: >>> ? ?for x in container: >>> ? ? ? ?assert x in container >> >> Yeah, a lot of code using comparisons also breaks when comparisons >> don't return bools. It's a specialized use, but I don't see it as >> anathema. >> >> OTOH the real solution would be something like LINQ in C# >> (http://msdn.microsoft.com/en-us/netframework/aa904594.aspx, >> http://en.wikipedia.org/wiki/Language_Integrated_Query). > > Most of LINQ itself (the LINQ library, as opposed to the query syntaxes which are solely syntactic sugar and statically compiled into LINQ method calls) can already be implemented in Python. 
Well, the point of allowing more general __contains__ overloading is exactly to improve upon the query syntax -- you may call it syntactic sugar (often a derogatory term), but you currently cannot translate an 'in' operator into a parse tree like you can for '<' or '+'. (The other odd ducks are 'and' and 'or', though in a pinch one can use '&' and '|' for those. I forget in which camp 'not' falls. > The things that might be missing are *some* LINQ-supporting features. Likely expression trees[0], maybe (but probably not) less limited and terser lambdas. > > [0] http://msdn.microsoft.com/en-us/library/bb397951.aspx That's exactly the point I am driving at here. :-) -- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Mon Jul 26 16:48:45 2010 From: masklinn at masklinn.net (Masklinn) Date: Mon, 26 Jul 2010 16:48:45 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> Message-ID: <85C1FAEB-99C2-4991-9CE7-06B034492CB2@masklinn.net> On 2010-07-26, at 16:28 , Guido van Rossum wrote: > On Sun, Jul 25, 2010 at 10:50 PM, Masklinn wrote: >> On 2010-07-26, at 05:07 , Guido van Rossum wrote: >>> >>>> A lot of code for containers or that uses containers implicitly >>>> expects simple invariants to hold: >>>> for x in container: >>>> assert x in container >>> >>> Yeah, a lot of code using comparisons also breaks when comparisons >>> don't return bools. It's a specialized use, but I don't see it as >>> anathema. >>> >>> OTOH the real solution would be something like LINQ in C# >>> (http://msdn.microsoft.com/en-us/netframework/aa904594.aspx, >>> http://en.wikipedia.org/wiki/Language_Integrated_Query). 
>> >> Most of LINQ itself (the LINQ library, as opposed to the query syntaxes which are solely syntactic sugar and statically compiled into LINQ method calls) can already be implemented in Python. > > Well, the point of allowing more general __contains__ overloading is > exactly to improve upon the query syntax -- you may call it syntactic > sugar (often a derogatory term) I didn't intend it as such, I just meant that there is nothing the LINQ query syntax allows which isn't available (usually more clearly as far as I'm concerned) via the library part of the same. > but you currently cannot translate an 'in' operator into a parse tree like you can for '<' or '+'. Why not? How would it be different from + or bool no? >> The things that might be missing are *some* LINQ-supporting features. Likely expression trees[0], maybe (but probably not) less limited and terser lambdas. >> >> [0] http://msdn.microsoft.com/en-us/library/bb397951.aspx > > That's exactly the point I am driving at here. :-) Oh well, I probably missed the hints then :( From phd at phd.pp.ru Mon Jul 26 16:53:31 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Mon, 26 Jul 2010 18:53:31 +0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> Message-ID: <20100726145331.GA20020@phd.pp.ru> On Mon, Jul 26, 2010 at 07:28:30AM -0700, Guido van Rossum wrote: > other odd ducks are 'and' and 'or', though in a pinch one can use '&' > and '|' for those. I forget in which camp 'not' falls. 'not' is like 'and' and 'or'. '~' is like '&' and '|'; its magic method is __invert__. I maintain a similar all-magic-methods-overridden class, just for a different ORM (SQLObject), so I am greatly interested in the discussion. Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
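To make the expression-builder discussion above concrete, here is a minimal sketch of the kind of all-magic-methods-overridden class Oleg describes — the names `Field` and `Clause` are invented for illustration and are not taken from SQLObject or SQLAlchemy. The overloaded `==`, `&`, `|`, and `~` (`__invert__`) return clause objects rather than booleans:

```python
class Field:
    """A column reference whose comparisons build SQL text instead of bools."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return Clause("%s = %r" % (self.name, other))


class Clause:
    """A fragment of SQL; combined with the bitwise operators."""
    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Clause("(%s AND %s)" % (self.sql, other.sql))

    def __or__(self, other):
        return Clause("(%s OR %s)" % (self.sql, other.sql))

    def __invert__(self):  # '~' maps to __invert__, as Oleg notes
        return Clause("(NOT %s)" % self.sql)


name, age = Field("name"), Field("age")
print(((name == "Bob") & ~(age == 42)).sql)
# (name = 'Bob' AND (NOT age = 42))
```

Note the explicit parentheses around each comparison: `&` and `|` bind more tightly than `==`, so without them `name == "Bob" & ~(age == 42)` would try to combine the string `"Bob"` with a clause first and raise a TypeError.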
From alexander.belopolsky at gmail.com Mon Jul 26 16:54:24 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 26 Jul 2010 10:54:24 -0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> Message-ID: On Mon, Jul 26, 2010 at 10:28 AM, Guido van Rossum wrote: .. > Well, the point of allowing more general __contains__ overloading is > exactly to improve upon the query syntax -- you may call it syntactic > sugar (often a derogatory term), but you currently cannot translate an > 'in' operator into a parse tree like you can for '<' or '+'. FWIW, I am +0 on a more general __contains__ and query or expression building are the areas where I could use it. Since we are at it, can we have a decision on whether mp_length and sq_length will change their signatures to return PyObject* rather than Py_ssize_t. In other words, are we going to allow virtual sequences like py3k range to have length greater than sys.maxsize? There is an old open issue that turns on this decision: http://bugs.python.org/issue2690 Here I am -1 on making the change but +1 on the range patch in the issue. From ianb at colorstudy.com Mon Jul 26 18:09:18 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 26 Jul 2010 11:09:18 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <20100726145331.GA20020@phd.pp.ru> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <20100726145331.GA20020@phd.pp.ru> Message-ID: On Mon, Jul 26, 2010 at 9:53 AM, Oleg Broytman wrote: > On Mon, Jul 26, 2010 at 07:28:30AM -0700, Guido van Rossum wrote: > > other odd ducks are 'and' and 'or', though in a pinch one can use '&' > > and '|' for those. 
I forget in which camp 'not' falls. > > 'not' is like 'and' and 'or'. '~' is like '&' and '|'; its magic method > is __invert__. > ~ actually works okay (except for it reading poorly -- it's a rather obscure operator), but & and | have precedence that makes them very error-prone in my experience, e.g., (query.column == 'x' | query.column == 'y') becomes the SQL "(column == ('x' OR column)) == 'y'". I know overriding "and" and "or" has been discussed some time ago, though I'm not sure what the exact reason is that it never went anywhere. I remember it as one of those epic-and-boring comp.lang.python threads ;) One obvious complication is that they are short-circuit operations. That is, "a and b" must evaluate a, figure out if it is false, and if so then it returns a. But "a or b" must evaluate a, figure out if it is TRUE, and if so then returns a. So if there was anything like __and__ and __or__ it would have to simply disable any short-circuiting. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Jul 26 18:23:21 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 26 Jul 2010 12:23:21 -0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <20100726145331.GA20020@phd.pp.ru> Message-ID: On Mon, Jul 26, 2010 at 12:09 PM, Ian Bicking wrote: .. > I know overriding "and" and "or" has been discussed some time ago, though > I'm not sure what the exact reason is that it never went anywhere. Indeed. See PEP 335. http://www.python.org/dev/peps/pep-0335/. > I remember it as one of those epic-and-boring comp.lang.python threads ;)? One > obvious complication is that they are short-circuit operations.? 
That is, "a > and b" must evaluate a, figure out if it is false, and if so then it returns > a.? But "a or b" must evaluate a, figure out if it is TRUE, and if so then > returns a.? So if there was anything like __and__ and __or__ it would have > to simply disable any short-circuiting. The PEP deals with short-circuiting AFAIK. From ronaldoussoren at mac.com Mon Jul 26 17:34:12 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 26 Jul 2010 17:34:12 +0200 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: References: <1279740852.3222.38.camel@localhost.localdomain> <636668799.20100724111645@mail.mipt.ru> <91977653.20100724203742@mail.mipt.ru> Message-ID: <43479E63-E3C5-4EA2-AE91-EBC95E28ED4A@mac.com> On 24 Jul, 2010, at 23:47, Alexander Belopolsky wrote: > 2010/7/24 Ivan Pozdeev : > .. >>> Why would you consider new classes that would be based on a survey of >>> the errnos that developers actually check for in published code to be >>> "arbitrary"? >> >> Since the list would be a sole opinion of some people who take part in >> the survey, you'll be constantly faced with demands of other people >> who want to have "shortcuts" for something else too. > > I think you misunderstood the survey methodology. It was not a survey > of developers, instead large bodies of code were examined. There is > nothing arbitrary or subjective in this approach. > > FWIW, am +1 on the PEP. Same here, I'm +1 as well. The PEP is clear and solves a definite problem with a well though-out methodology. Ronald -------------- next part -------------- A non-text attachment was scrubbed... 
From alexandre at peadrop.com Mon Jul 26 20:42:07 2010 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 26 Jul 2010 11:42:07 -0700 Subject: [Python-ideas] [Python-Dev] Readability of hex strings (Was: Use of coding cookie in 3.x stdlib) In-Reply-To: References: Message-ID: [+Python-ideas -Python-Dev]

import binascii

def h(s):
    return binascii.unhexlify("".join(s.split()))

h("DE AD BE EF CA FE BA BE")

-- Alexandre On Mon, Jul 26, 2010 at 11:29 AM, anatoly techtonik wrote: > I find "\xXX\xXX\xXX\xXX..." notation for binary data totally > unreadable. Everybody who uses and analyses binary data is more > familiar with plain hex dumps in the form of "XX XX XX XX...". > > I wonder if it is possible to introduce an effective binary string > type that will be represented as h"XX XX XX" in language syntax? It > will be much easier to analyze printed binary data and copy/paste such > data as-is from hex editors/views. > > On Mon, Jul 19, 2010 at 9:45 AM, Guido van Rossum wrote: >> Sounds like a good idea to try to remove redundant cookies *and* to >> remove most occasional use of non-ASCII characters outside comments >> (except for unittests specifically trying to test Unicode features). >> Personally I would use \xXX escapes instead of spelling out the >> characters in shlex.py, for example. >> >> Both with or without the coding cookies, many ways of displaying text >> files garble characters outside the ASCII range, so it's better to >> stick to ASCII as much as possible. >> >> --Guido >> >> On Mon, Jul 19, 2010 at 1:21 AM, Alexander Belopolsky >> wrote: >>> I was looking at the inspect module and noticed that its source >>> starts with "# -*- coding: iso-8859-1 -*-". I have checked and there >>> are no non-ascii characters in the file. There are several other
There are several other >>> modules that still use the cookie: >>> >>> Lib/ast.py:# -*- coding: utf-8 -*- >>> Lib/getopt.py:# -*- coding: utf-8 -*- >>> Lib/inspect.py:# -*- coding: iso-8859-1 -*- >>> Lib/pydoc.py:# -*- coding: latin-1 -*- >>> Lib/shlex.py:# -*- coding: iso-8859-1 -*- >>> Lib/encodings/punycode.py:# -*- coding: utf-8 -*- >>> Lib/msilib/__init__.py:# -*- coding: utf-8 -*- >>> Lib/sqlite3/__init__.py:#-*- coding: ISO-8859-1 -*- >>> Lib/sqlite3/dbapi2.py:#-*- coding: ISO-8859-1 -*- >>> Lib/test/bad_coding.py:# -*- coding: uft-8 -*- >>> Lib/test/badsyntax_3131.py:# -*- coding: utf-8 -*- >>> >>> I understand that coding: utf-8 is strictly redundant in 3.x. ?There >>> are cases such as Lib/shlex.py where using encoding other than utf-8 >>> is justified. ?(See >>> http://svn.python.org/view?view=rev&revision=82560). ?What are the >>> guidelines for other cases? ?Should redundant cookies be removed? >>> Since not all editors respect the ?-*- cookie, I think the answer >>> should be "yes" particularly when the cookie is setting encoding other >>> than utf-8. >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexandre%40peadrop.com > From stephen at xemacs.org Tue Jul 27 03:43:22 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 27 Jul 2010 10:43:22 +0900 Subject: [Python-ideas] [Python-Dev] Set the namespace free! In-Reply-To: References: <4C484FD0.2080803@zlotniki.pl> Message-ID: <87zkxdvgw5.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > Am 26.07.2010 10:59, schrieb Anders Sandvig: > > On Sat, Jul 24, 2010 at 3:31 AM, Gregory P. Smith wrote: > >> Yuck. Anyone who feels they need a variable named the same a reserved word > >> simply feels wrong and needs reeducation. [...] > > > > While I agree with you in principle, I have been finding it > > frustrating trying to calculate yield in my financial applications > > lately... ;) > > In the spirit of optimistic programming, why not assume a large one > and call it Yield? ;) That's certainly more workable than the obvious near-synonym, "return". From greg.ewing at canterbury.ac.nz Tue Jul 27 08:33:28 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Jul 2010 18:33:28 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> Message-ID: <4C4E7DB8.2040407@canterbury.ac.nz> Guido van Rossum wrote: > The other odd ducks are 'and' and 'or', Well, I tried to do something about that with the Overloaded Boolean Operators proposal, but it seems to have met with a not-very-enthusiastic response. Have you had any more thoughts about it? Do you think it's a problem worth solving, or are '&' and '|' good enough? Or would you rather see some completely different mechanism introduced for getting parse trees from expressions, a la LINQ? > I forget in which camp 'not' falls. If I remember correctly, it's not currently overridable independently from __bool__, but there would be no difficulty in making it so, because there is no control flow involved. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Jul 27 09:28:35 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Jul 2010 19:28:35 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <20100726145331.GA20020@phd.pp.ru> Message-ID: <4C4E8AA3.8090201@canterbury.ac.nz> Ian Bicking wrote: > So if > there was anything like __and__ and __or__ it would have to simply > disable any short-circuiting. Not true -- in PEP 335 I showed how it can be done without giving up any short-circuiting possibilities. -- Greg From guido at python.org Tue Jul 27 16:02:27 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Jul 2010 07:02:27 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <4C4E7DB8.2040407@canterbury.ac.nz> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On Mon, Jul 26, 2010 at 11:33 PM, Greg Ewing wrote: > Guido van Rossum wrote: >> >> The other odd ducks are 'and' and 'or', > > Well, I tried to do something about that with the Overloaded > Boolean Operators proposal, but it seems to have met with > a not-very-enthusiastic response. > > Have you had any more thoughts about it? Do you think it's > a problem worth solving, or are '&' and '|' good enough? > Or would you rather see some completely different mechanism > introduced for getting parse trees from expressions, a la > LINQ? I think that the approach of overloading tons of operators will always cause a somewhat cramped style, even if we fix some individual operators. There just are too many things that can go wrong, and they will be hard to debug for the *users* of the libraries that provide this stuff. 
(One reason is that the easy generalization from 1-2 simple examples often fails for non-obvious reasons.) Therefore I think the LINQ approach, which (IIUC) converts an expression into a parse tree when certain syntax is encountered, and calls a built-in method with that parse tree, would be a fresh breath of air. No need deriding it just because Microsoft came up with it first. -- --Guido van Rossum (python.org/~guido) From grosser.meister.morti at gmx.net Tue Jul 27 16:29:56 2010 From: grosser.meister.morti at gmx.net (Mathias Panzenböck) Date: Tue, 27 Jul 2010 16:29:56 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> Message-ID: <4C4EED64.90806@gmx.net> On 07/26/2010 04:20 AM, Alex Gaynor wrote: > > Fundamentally the argument in favor of it is the same as for the other > comparison operators: you want to do symbolic manipulation using the > "normal" syntax, as a DSL. My example is that of a SQL expression > builder: SQLAlchemy uses User.id == 3 to create a clause where the ID > is 3, but for "id in [1, 2, 3]" it has: User.id.in_([1, 2, 3]), which > is rather unseemly IMO (at least as much as having User.id.eq(3) would > be). > This is a bad example for your wish because this code: >>> id in [1, 2, 3] translates into: >>> [1, 2, 3].__contains__(id) So it doesn't help that 'in' may return something else than a bool because the method is called on the wrong object for your purposes. 
-panzi From guido at python.org Tue Jul 27 17:59:33 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Jul 2010 16:59:33 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <4C4EED64.90806@gmx.net> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: On Tue, Jul 27, 2010 at 3:29 PM, Mathias Panzenböck wrote: > On 07/26/2010 04:20 AM, Alex Gaynor wrote: >> >> Fundamentally the argument in favor of it is the same as for the other >> comparison operators: you want to do symbolic manipulation using the >> "normal" syntax, as a DSL. My example is that of a SQL expression >> builder: SQLAlchemy uses User.id == 3 to create a clause where the ID >> is 3, but for "id in [1, 2, 3]" it has: User.id.in_([1, 2, 3]), which >> is rather unseemly IMO (at least as much as having User.id.eq(3) would >> be). >> > > This is a bad example for your wish because this code: >>>> id in [1, 2, 3] > > translates into: >>>> [1, 2, 3].__contains__(id) > > So it doesn't help that 'in' may return something else than a bool > because the method is called on the wrong object for your purposes. Well that pretty much kills the proposal. I can't believe nobody (including myself) figured this out earlier in the thread. 
:-( -- --Guido van Rossum (python.org/~guido) From alex.gaynor at gmail.com Tue Jul 27 18:02:20 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Tue, 27 Jul 2010 11:02:20 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: On Tue, Jul 27, 2010 at 10:59 AM, Guido van Rossum wrote: > On Tue, Jul 27, 2010 at 3:29 PM, Mathias Panzenb?ck > wrote: >> On 07/26/2010 04:20 AM, Alex Gaynor wrote: >>> >>> Fundamentally the argument in favor of it is the same as for the other >>> comparison operators: you want to do symbolic manipulation using the >>> "normal" syntax, as a DSL. ?My example is that of a SQL expression >>> builder: SQLAlchemy uses User.id == 3 to create a clause where the ID >>> is 3, but for "id in [1, 2, 3]" it has: User.id.in_([1, 2, 3]), which >>> is rather unseamly IMO (at least as much as having User.id.eq(3) would >>> be). >>> >> >> This is a bad example for your wish because this code: >>>>> id in [1, 2, 3] >> >> translates into: >>>>> [1, 2, 3].__contains__(id) >> >> So it doesn't help that 'in' may return something else than a bool >> because the method is called on the wrong object for your purposes. > > Well that pretty much kills the proposal. I can't believe nobody > (including myself) figured this out earlier in the thread. :-( > > -- > --Guido van Rossum (python.org/~guido) > Well, in my original example I wrapped the list with a SQLList() container class. I thought of the issue before, but it hardly seems like a blocker, the numpy stuff is unaffected for example: they're not using a builtin container, and for myself I'm willing to wrap my lists to get the pretty syntax. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." 
-- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From guido at python.org Tue Jul 27 18:07:20 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Jul 2010 09:07:20 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: On Tue, Jul 27, 2010 at 9:02 AM, Alex Gaynor wrote: > On Tue, Jul 27, 2010 at 10:59 AM, Guido van Rossum wrote: >> On Tue, Jul 27, 2010 at 3:29 PM, Mathias Panzenb?ck >> wrote: >>> On 07/26/2010 04:20 AM, Alex Gaynor wrote: >>>> >>>> Fundamentally the argument in favor of it is the same as for the other >>>> comparison operators: you want to do symbolic manipulation using the >>>> "normal" syntax, as a DSL. ?My example is that of a SQL expression >>>> builder: SQLAlchemy uses User.id == 3 to create a clause where the ID >>>> is 3, but for "id in [1, 2, 3]" it has: User.id.in_([1, 2, 3]), which >>>> is rather unseamly IMO (at least as much as having User.id.eq(3) would >>>> be). >>>> >>> >>> This is a bad example for your wish because this code: >>>>>> id in [1, 2, 3] >>> >>> translates into: >>>>>> [1, 2, 3].__contains__(id) >>> >>> So it doesn't help that 'in' may return something else than a bool >>> because the method is called on the wrong object for your purposes. >> >> Well that pretty much kills the proposal. I can't believe nobody >> (including myself) figured this out earlier in the thread. :-( >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > Well, in my original example I wrapped the list with a SQLList() > container class. ?I thought of the issue before, but it hardly seems > like a blocker, the numpy stuff is unaffected for example: they're not > using a builtin container, and for myself I'm willing to wrap my lists > to get the pretty syntax. 
Well, writing "x in wrapper(y)" is hardly prettier than "contains(y, x)", if you compare it to "x in y". And it is certainly another thing that can go wrong in a non-obvious way. -- --Guido van Rossum (python.org/~guido) From robert.kern at gmail.com Tue Jul 27 18:25:11 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 27 Jul 2010 11:25:11 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On 7/27/10 9:02 AM, Guido van Rossum wrote: > Therefore I think the LINQ approach, which (IIUC) converts an > expression into a parse tree when certain syntax is encountered, and > calls a built-in method with that parse tree, would be a fresh breath > of air. No need deriding it just because Microsoft came up with it > first. I've occasionally wished that we could repurpose backticks for expression literals: expr = `x + y*z` assert isinstance(expr, ast.Expression) And triple backticks for blocks of statements: block = ``` try: frobnicate() except FrobError: print("Not on my watch!") ``` assert isinstance(block, ast.Module) Too bad backticks look like grit on Tim's monitor! -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From alexander.belopolsky at gmail.com Tue Jul 27 18:42:02 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Jul 2010 12:42:02 -0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: On Tue, Jul 27, 2010 at 11:59 AM, Guido van Rossum wrote: .. 
>> So it doesn't help that 'in' may return something else than a bool >> because the method is called on the wrong object for your purposes. > > Well that pretty much kills the proposal. I can't believe nobody > (including myself) figured this out earlier in the thread. :-( It may kill a use case or two, but not the proposal. In the libraries like numpy where all python containers get replaced, this is not an issue. Also this problem invites __rcontains__ solution, but the proposal is not very attractive to begin with. IMO, operators that are not symbols such as +, - or &, but words such as 'in', 'not' or 'and' don't offer much advantage over function calls. From masklinn at masklinn.net Tue Jul 27 18:42:32 2010 From: masklinn at masklinn.net (Masklinn) Date: Tue, 27 Jul 2010 18:42:32 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On 2010-07-27, at 18:25 , Robert Kern wrote: > On 7/27/10 9:02 AM, Guido van Rossum wrote: > >> Therefore I think the LINQ approach, which (IIUC) converts an >> expression into a parse tree when certain syntax is encountered, and >> calls a built-in method with that parse tree, would be a fresh breath >> of air. No need deriding it just because Microsoft came up with it >> first. > > I've occasionally wished that we could repurpose backticks for expression literals: > > expr = `x + y*z` > assert isinstance(expr, ast.Expression) > > And triple backticks for blocks of statements: > > block = ``` > try: > frobnicate() > except FrobError: > print("Not on my watch!") > ``` > assert isinstance(block, ast.Module) > > Too bad backticks look like grit on Tim's monitor! What about french quotes

expr = «x + y * z»

block = «««
    try:
        frobnicate()
    except FrobError:
        print("Oh no you di'n't")
»»»
?
Or maybe some question marks? expr = ?x + y * z? From fuzzyman at gmail.com Tue Jul 27 18:49:12 2010 From: fuzzyman at gmail.com (Michael Foord) Date: Tue, 27 Jul 2010 17:49:12 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: On 27 July 2010 17:42, Alexander Belopolsky wrote: > On Tue, Jul 27, 2010 at 11:59 AM, Guido van Rossum > wrote: > .. > >> So it doesn't help that 'in' may return something else than a bool > >> because the method is called on the wrong object for your purposes. > > > > Well that pretty much kills the proposal. I can't believe nobody > > (including myself) figured this out earlier in the thread. :-( > > It may kill a use case or two, but not the proposal. In the > libraries like numpy where all python containers get replaced, this is > not an issue. Also this problem invites __rcontains__ solution, Wasn't the lack of an __rcontains__ a problem for the web-sig guys trying to work out the bytes / strings issue? For what it's worth I think that guido is correct that a better solution for the expression -> query problem is to introduce an expression tree, as is done for LINQ (which has been enormously popular amongst .NET developers). All the best, Michael Foord > but > the proposal is not very attractive to begin with. IMO, operators > that are not symbols such as +, - or &, but words such as 'in', 'not' > or 'and' don't offer much advantage over function calls. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Tue Jul 27 19:05:54 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 27 Jul 2010 19:05:54 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: <20100727190554.55129671@pitrou.net> On Tue, 27 Jul 2010 18:42:32 +0200 Masklinn wrote: > > > > Too bad backticks look like grit on Tim's monitor! > > What about french quotes > > expr = «x + y * z» You should require non-breaking spaces (U+00A0 or U+202F) between them and the enclosed expression: expr = « x + y * z » (with U+00A0) expr = « x + y * z » (with U+202F) (I hope my editor doesn't fool me here) > block = «««
> try:
>     frobnicate()
> except FrobError:
>     print("Oh no you di'n't")
> »»»
This would be quite a derogatory use of French quotes. From 2010 at jmunch.dk Tue Jul 27 19:18:30 2010 From: 2010 at jmunch.dk (Anders J. Munch) Date: Tue, 27 Jul 2010 19:18:30 +0200 Subject: [Python-ideas] PEP 380 alternative: A yielding function Message-ID: <4C4F14E6.1060102@jmunch.dk> Looking at PEP 380 (http://www.python.org/dev/peps/pep-0380/), the need for yield forwarding syntax comes from the inability to delegate yielding functionality in the usual way. For example, if you have a recurring pattern including yields, like this (this is a toy example, please don't take it for more than that):

    if a:
        yield a
    if b:
        yield b

you cannot do the extract function refactoring in the usual way:

    def yield_if_true(x):
        if x:
            yield x
    yield_if_true(a)
    yield_if_true(b)

because yield_if_true would become a generator. PEP 380 addresses this by making the workaround - "for x in yield_if_true(a): yield x" - easier to write. But suppose you could address the source instead? Suppose you could write yield_if_true in such a way that it did not become a generator despite yielding?
Syntactically this could be done with a yielding *function* in addition to the yield statement/expression, either as a builtin or a function in the sys module. Let's call it 'yield_', for lack of a better name. The function would yield the nearest generator on the call stack. Now the example would work with a slight modification:

    def yield_if_true(x):
        if x:
            yield_(x)
    yield_if_true(a)
    yield_if_true(b)

The real benefits from a yield_ function come with recursive functions. A recursive tree traversal that yields from the leaf nodes currently suffers from a performance penalty: Every yield is repeated as many times as the depth of the tree, turning a O(n) traversal algorithm into an O(n lg(n)) algorithm. PEP 380 does not change that. But a yield_ function could be O(1), regardless of the forwarding depth. To be fair, a clever implementation might be able to short-circuit a 'yield from' chain and achieve the same thing. Two main drawbacks of yield_:
- Difficulty of implementation. Generators would need to keep an entire stack branch alive, instead of merely a single frame, and if that somehow affects the efficiency of simple generators, that would be bad.
- 'the nearest generator on the call stack' is sort of a dynamic scoping thing, which has its problems. For example, if you forget to make the relevant function a generator (the "if 0: yield None" trick might have been needed but was forgotten), then the yield would trickle up to some random generator higher up, with confusing results. And if you use yield_ in a callback, well, let's just say that's an interesting case too.
All the same, if a yield_ function is practical, I think it is a better way to address the problems that motivate PEP 380. I'm guessing you could implement 'yield from' as a pure-Python function using yield_, making yield_ strictly more powerful, although I couldn't say for sure as I haven't studied the enhanced generator protocol.
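[A runnable toy, reusing Anders' names, of the re-yield chain he is describing under current Python: the extracted helper must be looped over at every call site, and in the recursive case a leaf at depth d is forwarded d times on its way out.]

```python
def yield_if_true(x):
    if x:
        yield x

def produce(a, b):
    # The extracted helper is itself a generator, so each call site must
    # loop and re-yield -- the workaround PEP 380 streamlines.
    for v in yield_if_true(a):
        yield v
    for v in yield_if_true(b):
        yield v

assert list(produce(0, "b")) == ["b"]

def walk(tree):
    # Recursive traversal: each leaf is re-yielded once per enclosing
    # level, which is the depth-proportional cost mentioned above.
    if isinstance(tree, list):
        for child in tree:
            for leaf in walk(child):
                yield leaf
    else:
        yield tree

assert list(walk([1, [2, [3, 4]], 5])) == [1, 2, 3, 4, 5]
```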
regards, Anders From bruce at leapyear.org Tue Jul 27 19:25:05 2010 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 27 Jul 2010 10:25:05 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: The idea of LINQ is that you write the expression directly in the language and it translates into a query expression. It's going to be operating on an expression parse tree, right? Rather than trying to change the allowable expressions maybe the question is to figure out how to translate what we have and find what we can't express with what we have (and that's an orthogonal question and has nothing to do with __xxx__ functions). On Tue, Jul 27, 2010 at 9:42 AM, Masklinn wrote: > > What about french quotes > > expr = «x + y * z» > > Isn't there already a syntax for this? expr = lambda: x + y * z Maybe you want some conversion of that lambda into a different form: expr = @ast lambda: x + y + z --- Bruce http://www.vroospeak.com http://google-gruyere.appspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jul 27 20:01:12 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Jul 2010 19:01:12 +0100 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C4F14E6.1060102@jmunch.dk> References: <4C4F14E6.1060102@jmunch.dk> Message-ID: Have you really thought through the implementation in CPython? When a non-generator function is called, there's usually a C stack frame in the way which prevents yielding. --Guido On Tue, Jul 27, 2010 at 6:18 PM, Anders J.
Munch <2010 at jmunch.dk> wrote: > Looking at PEP 380 (http://www.python.org/dev/peps/pep-0380/), the > need for yield forwarding syntax comes from the inability to delegate > yielding functionality in the usual way. For example, if you have a > recurring pattern including yields, like this (this is a toy example, > please don't take it for more than that): > > ?if a: > ? ? ?yield a > ?if b: > ? ? ?yield b > > you cannot do the extract function refactoring in the usual way: > > ? ? def yield_if_true(x): > ? ? ?if x: > ? ? ? ? ?yield x > ?yield_if_true(a) > ?yield_if_true(b) > > because yield_if_true would become a generator. > > PEP 380 addresses this by making the workaround - "for x in > yield_if_true(a): yield x" - easier to write. > > But suppose you could address the source instead? ?Suppose you could > write yield_if_true in such a way that it did not become a generator > despite yielding? > > Syntactically this could be done with a yielding *function* in > addition to the yield statement/expression, either as a builtin or a > function in the sys module. ?Let's call it 'yield_' , for lack of a > better name. ?The function would yield the nearest generator on the > call stack. > > Now the example would work with a slight modifiction: > > ?def yield_if_true(x): > ? ? ?if x: > ? ? ? ? ?yield_(x) > ?yield_if_true(a) > ?yield_if_true(b) > > The real benefits from a yield_ function come with recursive > functions. ?A recursive tree traversal that yields from the leaf nodes > currently suffers from a performance penalty: Every yield is repeated > as many times as the depth of the tree, turning a O(n) traversal > algorithm into an O(n lg(n)) algorithm. ?PEP 380 does not change that. > > But a yield_ function could be O(1), regardless of the forwarding > depth. > > To be fair, a clever implementation might be able to short-circuit a > 'yield from' chain and achieve the same thing. > > Two main drawbacks of yield_: > - Difficulty of implementation. 
Generators would need to keep an > ?entire stack branch alive, instead of merely a single frame, and if > ?that somehow affects the efficiency of simple generators, that would > ?be bad. > - 'the nearest generator on the call stack' is sort of a dynamic > ?scoping thing, which has its problems. ?For example, if you forget > ?to make the relevant function a generator (the "if 0: yield None" > ?trick might have been needed but was forgotten), then the yield > ?would trickle up to some random generator higher up, with confusing > ?results. ?And if you use yield_ in a callback, well, let's just say > ?that an interesting case too. > > All the same, if a yield_ function is practical, I think it is a > better way to address the problems that motivate PEP 380. ?I'm guessing you > could implement 'yield from' as a pure-Python > function using yield_, making yield_ strictly more powerful, although > I couldn't say for sure as I haven't studied the enhanced generator > protocol. > > regards, Anders > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Jul 27 20:04:20 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Jul 2010 19:04:20 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On Tue, Jul 27, 2010 at 5:25 PM, Robert Kern wrote: > I've occasionally wished that we could repurpose backticks for expression > literals: > > ?expr = `x + y*z` > ?assert isinstance(expr, ast.Expression) Maybe you could just as well make it a plain string literal and call a function that parses it into a parse tree: expr = parse("x + y*z") assert isinstance(expr, 
ast.Expression) The advantage of this approach is that you can define a different language too... -- --Guido van Rossum (python.org/~guido) From alex.gaynor at gmail.com Tue Jul 27 20:14:20 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Tue, 27 Jul 2010 13:14:20 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On Tue, Jul 27, 2010 at 1:04 PM, Guido van Rossum wrote: > On Tue, Jul 27, 2010 at 5:25 PM, Robert Kern wrote: >> I've occasionally wished that we could repurpose backticks for expression >> literals: >> >> ?expr = `x + y*z` >> ?assert isinstance(expr, ast.Expression) > > Maybe you could just as well make it a plain string literal and call a > function that parses it into a parse tree: > > expr = parse("x + y*z") > assert isinstance(expr, ast.Expression) > > The advantage of this approach is that you can define a different > language too... > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > As an interesting (but perhaps not relevant) data point, the documentation is rather nebulous as to whether the bool cast exists (it says things like "should return true if", but never explicitly that it takes the boolean value of the return from __contains__), further it doesn't seem to be tested at all (to the point where I only noticed today that PyPy's behavior is different, since this apparently breaks no tests). Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." 
-- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From raymond.hettinger at gmail.com Tue Jul 27 20:38:15 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 27 Jul 2010 11:38:15 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: <4AC1C4FF-7B82-4C35-B221-BE27A0C43587@gmail.com> On Jul 27, 2010, at 11:04 AM, Guido van Rossum wrote: > On Tue, Jul 27, 2010 at 5:25 PM, Robert Kern wrote: >> I've occasionally wished that we could repurpose backticks for expression >> literals: >> >> expr = `x + y*z` >> assert isinstance(expr, ast.Expression) > > Maybe you could just as well make it a plain string literal and call a > function that parses it into a parse tree: > > expr = parse("x + y*z") > assert isinstance(expr, ast.Expression) > > The advantage of this approach is that you can define a different > language too... Starting with string literals and a parse function seems like a great design decision. It already works and doesn't require new syntax. And, as Guido pointed out, it future proofs the design by freeing the domain specific language from the constraints of Python itself. 
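[For the Python-syntax case specifically, the stdlib ast module already provides a parse function of roughly the shape Guido sketches; a different DSL would of course still need its own parser.]

```python
import ast

# Parse a Python expression into a tree, as in the parse() sketch above.
expr = ast.parse("x + y * z", mode="eval")
assert isinstance(expr, ast.Expression)
assert isinstance(expr.body, ast.BinOp)  # the `+` node

# The tree can also be compiled and evaluated against names bound later.
code = compile(expr, "<expr>", "eval")
assert eval(code, {"x": 1, "y": 2, "z": 3}) == 7
```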
Raymond From masklinn at masklinn.net Tue Jul 27 21:00:19 2010 From: masklinn at masklinn.net (Masklinn) Date: Tue, 27 Jul 2010 21:00:19 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: <3EEB340D-0BF3-48EE-AC02-5E6A7DF1495C@masklinn.net> On 2010-07-27, at 20:04 , Guido van Rossum wrote: > On Tue, Jul 27, 2010 at 5:25 PM, Robert Kern wrote: >> I've occasionally wished that we could repurpose backticks for expression >> literals: >> >> expr = `x + y*z` >> assert isinstance(expr, ast.Expression) > > Maybe you could just as well make it a plain string literal and call a > function that parses it into a parse tree: > > expr = parse("x + y*z") > assert isinstance(expr, ast.Expression) > > The advantage of this approach is that you can define a different > language too? The nice thing about having it be special-sauce syntax is that it can be parsed along with the rest of the script, failing early, and it can be stored in bytecode. Whereas the string itself will only be parsed when the function is actually executed. From alexander.belopolsky at gmail.com Tue Jul 27 21:05:10 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Jul 2010 15:05:10 -0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <3EEB340D-0BF3-48EE-AC02-5E6A7DF1495C@masklinn.net> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <3EEB340D-0BF3-48EE-AC02-5E6A7DF1495C@masklinn.net> Message-ID: On Tue, Jul 27, 2010 at 3:00 PM, Masklinn wrote: .. 
> The nice thing about having it be special-sauce syntax is that it can be parsed along with the rest of the > script, failing early, and it can be stored in bytecode. Whereas the string itself will only be parsed when > the function is actually executed. Not enough to justify new syntax IMO. Just create your parse trees at the module or class level. chances are that's where they belong anyways. I would be very interested to see the parse() function. It does not exist (yet), right? From solipsis at pitrou.net Tue Jul 27 21:06:42 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 27 Jul 2010 21:06:42 +0200 Subject: [Python-ideas] Non-boolean return from __contains__ References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <4AC1C4FF-7B82-4C35-B221-BE27A0C43587@gmail.com> Message-ID: <20100727210642.548d5d34@pitrou.net> On Tue, 27 Jul 2010 11:38:15 -0700 Raymond Hettinger wrote: > > On Jul 27, 2010, at 11:04 AM, Guido van Rossum wrote: > > > On Tue, Jul 27, 2010 at 5:25 PM, Robert Kern wrote: > >> I've occasionally wished that we could repurpose backticks for expression > >> literals: > >> > >> expr = `x + y*z` > >> assert isinstance(expr, ast.Expression) > > > > Maybe you could just as well make it a plain string literal and call a > > function that parses it into a parse tree: > > > > expr = parse("x + y*z") > > assert isinstance(expr, ast.Expression) > > > > The advantage of this approach is that you can define a different > > language too... > > Starting with string literals and a parse function seems like > a great design decision. It already works and doesn't require > new syntax. Yes, you only have to write a dedicated full-fledged parser, your code isn't highlighted properly in text editors, and you can't easily access Python objects declared in the enclosing scope. 
I guess it explains that none of the common ORMs seem to have adopted such a «great design decision» :-) Regards Antoine. From phd at phd.pp.ru Tue Jul 27 21:22:14 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Tue, 27 Jul 2010 23:22:14 +0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <20100727210642.548d5d34@pitrou.net> References: <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <4AC1C4FF-7B82-4C35-B221-BE27A0C43587@gmail.com> <20100727210642.548d5d34@pitrou.net> Message-ID: <20100727192214.GA2597@phd.pp.ru> On Tue, Jul 27, 2010 at 09:06:42PM +0200, Antoine Pitrou wrote: > Yes, you only have to write a dedicated full-fledged parser, your code > isn't highlighted properly in text editors, and you can't easily access > Python objects declared in the enclosing scope. > > I guess it explains that none of the common ORMs seem to have adopted > such a «great design decision» :-) Python ORMs are about mapping between *Python* and SQL - we don't need no stinking DSLs! ;) Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From ianb at colorstudy.com Tue Jul 27 21:24:54 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 27 Jul 2010 14:24:54 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On Tue, Jul 27, 2010 at 12:25 PM, Bruce Leban wrote: > The idea of LINQ is that you write the expression directly in the language > and it translates into a query expression. It's going to be operating on an > expression parse tree, right?
Rather than trying to change > the allowable expressions maybe the question is to figure out how to > translate what we have and find what we can't express with what we have (and > that's an orthogonal question and has nothing to do with __xxx__ functions). > > On Tue, Jul 27, 2010 at 9:42 AM, Masklinn wrote: > >> What about french quotes >> >> expr = ?x + y * z? >> >> > Isn't there are already a syntax for this? > > expr = lambda: x + y * z > > Maybe you want some conversion of that lambda into a different form: > > expr = @ast lambda: x + y + z > There's also an unexecuted expression in generator expressions, which is prettier than lambda. There's two places where I've seen people doing this in Python (not counting the operator overloading, of which there are many examples). The first is DejaVu ( http://www.aminus.net/dejavu/chrome/common/doc/trunk/managing.html#Querying) which decompiles lambdas. Then it does partial translation to SQL, and I think will actually execute things in Python when they can't be translated (e.g., if you are using a Python function on a database-derived result). But it can easily break between Python versions, and only works with CPython. It also seems to have some fairly complex rules about partial evaluation. The other place is peak.rules (http://pypi.python.org/pypi/PEAK-Rules) which uses a strings for conditions. My understanding is that the string is compiled to an AST and then analyzed, so partial expressions shared by many conditions can be efficiently evaluated together. Also it changes scopes (the expression is defined outside the function, but evaluated in the context of specific function arguments). Maybe it'd be helpful to consider actual examples in the context of SQL... def users_over_age(minimum_age=timedelta(years=18)): return User.select("datetime.now() - user.birth_date > minimum_age") # or... 
return User.select(datetime.now() - User.birth_date > minimum_age) def users_with_addresses(): return User.select("sql_exists(Address.select('address.user_id == user.id'))") # or ... return User.select(sql_exists(Address.select(Address.user_id == User.user_id)) def users_in_list(list_of_users_or_ids): list_of_ids = [item.id if isinstance(item, User) else item for item in list_of_users_or_ids] return User.select("user.id in list_of_ids") # or ... return User.select(sql_in(User.id, list_of_ids)) Well, I'm not seeing any advantage. You could do things like: def valid_email(email): # Obviously write it better... return re.match(r'[a-z0-9]+@[a-z0-9.-]+', email) def users_with_valid_email(): return User.select("valid_email(user.email)") and then have it detect (ala DejaVu) that valid_email() cannot be translated to SQL, so select everything then filter it with that function. This looks clever, but usually this kind of transparency will only bite; as in this example, what looks like it might be a normal kind of query is actually an extremely expensive query that might take a very long time to complete. (My impression is that LINQ is clever like this too, allowing expressions that are evaluated in part in different locations?) I was worried about binding arguments, but potentially it can work nicely. E.g., all these only take a single variable from the outer scope, but imagine something like: def expired_users(time=timedelta(days=30): return User.select("user.last_login < (datetime.now() - time)") if you were clever you could detect that "datetime.now() - time" can be statically computed. If you weren't clever you might send the expression to the database (which actually isn't terrible). But maybe consider a case: def users_with_ip(ip): return User.select("user.last_login_ip == encode_ip(ip)") where encode_ip does something like turn dotted numbers into an integer. 
If the mapper is clever it might tell that there are no SQL expressions in the arguments to encode_ip, and it can evaluate it early. Except... what if the function does something like return a random number? Then you've changed things by evaluating it once instead of for every user. So maybe you can't do that optimization, and so the only way to make this work is to create a local variable to make explicit that you only want to evaluate the argument once. As such, the status quo is better (User.select(User.last_login_ip == encode_ip(ip))) because the way it is evaluated is more obvious, and the constraints are clearer. This is managed because "magic" stuff is very specific (those column objects, which have all the operator overloading), and everything else is plain Python. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianb at colorstudy.com Tue Jul 27 21:29:19 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 27 Jul 2010 14:29:19 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: On Tue, Jul 27, 2010 at 11:49 AM, Michael Foord wrote: > On 27 July 2010 17:42, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > >> On Tue, Jul 27, 2010 at 11:59 AM, Guido van Rossum >> wrote: >> .. >> >> So it doesn't help that 'in' may return something else than a bool >> >> because the method is called on the wrong object for your purposes. >> > >> > Well that pretty much kills the proposal. I can't believe nobody >> > (including myself) figured this out earlier in the thread. :-( >> >> It may kill a use case or two, but not the proposal. In the >> libraries like numpy where all python containers get replaced, this is >> not an issue. 
Also this problem invites __rcontains__ solution, > > > > Wasn't the lack of an __rcontains__ a problem for the web-sig guys trying > to work out the bytes / strings issue? > I think PJE wanted to implement a string type that was bytes+encoding (as opposed to using Python's native strings). You can overload __add__ etc so everything works, but you couldn't make this work: encodedbytes(b'1234', 'utf8') in '12345' because '12345'.__contains__ would reject the encodedbytes type outright. __rcontains__ would work because here '12345' would know that it didn't understand encodedbytes. It wouldn't work for lists though, as [].__contains__ can handle *any* type, as it just tests for equality across all of its members. So it's not like __radd__ because the original object can't know that it should defer to the other argument. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Tue Jul 27 21:24:49 2010 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 27 Jul 2010 20:24:49 +0100 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: <4C4F3281.9040402@mrabarnett.plus.com> Bruce Leban wrote: > The idea of LINQ is that you write the expression directly in the > language and it translates into a query expression. It's going to be > operating on an expression parse tree, right? Rather than trying to > change the allowable expressions maybe the question is to figure out how > to translate what we have and find what we can't express with what we > have (and that's an orthogonal question and has nothing to do with > __xxx__ functions). > > On Tue, Jul 27, 2010 at 9:42 AM, Masklinn > wrote: > > What about french quotes > > expr = ?x + y * z? 
> > > Isn't there are already a syntax for this? > > expr = lambda: x + y * z > > Maybe you want some conversion of that lambda into a different form: > > expr = @ast lambda: x + y + z > Or: results = db => sql_expression where the parse tree for "sql_expression" is passed to db.__linq__. The parse tree is compiled to SQL and cached for possible future use. From 2010 at jmunch.dk Tue Jul 27 21:40:13 2010 From: 2010 at jmunch.dk (Anders J. Munch) Date: Tue, 27 Jul 2010 21:40:13 +0200 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: References: <4C4F14E6.1060102@jmunch.dk> Message-ID: <4C4F361D.9010903@jmunch.dk> Guido van Rossum wrote: > Have you really thought through the implementation in CPython? Not at all. > When a > non-generator function is called, there's usually a C stack frame in > the way which prevents yielding. Of course. I knew there'd be good reason why you didn't do a more full coroutine implementation originally, but on a beautiful summer day reckless optimism took over and it didn't come readily to mind. On platforms with coroutines available at the C level, I think something could be devised, something like keep a pool of coroutines and switch to a different one at strategic points. But I realise that requiring all platforms to support coroutines is a deal-breaker, so there. - Anders From cs at zip.com.au Tue Jul 27 23:55:20 2010 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 28 Jul 2010 07:55:20 +1000 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> Message-ID: <20100727215519.GA1674@cskk.homeip.net> On 25Jul2010 11:48, Raymond Hettinger wrote: | On Jul 25, 2010, at 11:15 AM, Alex Gaynor wrote: | > Recently I've been wondering why __contains__ casts all of it's | > returns to be boolean values. 
Specifically I'd like to propose that | > __contains__'s return values be passed directly back as the result of | > the `in` operation. | | x = y in z # where x is a non boolean. | | Yuck. | | One of the beautiful aspects of __contains__ is that its simply signature | allows it to be used polymorphically throughout the whole language. Didn't we have the dual of this argument a week or so ago, where rantingrick was complaining that ints could be used as booleans, and that it was simply appalling? That Python should immediately make 0 also behave as True because he didn't feel it was "empty". His argument was widely opposed, and IMHO rightly so. Personally, I'm +0.5 on the proposal: - because Python already allows pretty much anything to be used in a Boolean context, this means that anything can be "used polymorphically throughout the whole language", to use your term above; I do not think it breaks anything - do any of the other comparison methods enforce Booleanness? ==/__eq__ doesn't and I didn't think the others did. All that is required for functionality is sane choice of return value by the implementors. - have you used SQLAlchemy? Its SQL constrction by writing: .select([...columns...], table.c.COL1 == 3) is extremely programmer friendly, and works directly off overloading the column object's .__eq__() method to return something that gets made into a robust SQL query later. I'm going to snip two of your paragraphs here and proceed to: | There is no "natural" interpretation of an in-operator returning | a non-boolean. There is in the SQLAlchemy example above; "in" with an SQLA column object would return a hook to make a "value in (a,b,c,...)" SQL expression. It is all about context, and in Python the .__* methods let objects provide the context for evaluation of expressions - that's what polymorphism does for us. The proposal changes nothing for pre-existing uses. 
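The asymmetry with __eq__ mentioned above is easy to demonstrate — today only the `in` operator coerces its special method's result (a small illustrative snippet):

```python
# Only `in` coerces the special-method result to bool; == hands the
# __eq__ result straight back to the caller.
class Weird:
    def __contains__(self, item):
        return "a truthy string"
    def __eq__(self, other):
        return "a truthy string"

w = Weird()
print(1 in w)    # True -- the interpreter coerces __contains__'s result
print(w == 1)    # a truthy string -- __eq__'s result passes through
```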
It in no way causes:

    False in [False]

to return False, because it doesn't change bool.__contains__. The
proposal is to not coerce the result of __contains__ to bool(),
allowing _new_ objects to return a more nuanced result for
__contains__ for their own purposes. As long as that makes sense in
the use context, I believe this is a plus and not a minus.

We can all write nonsensical code by implementing __eq__ with
gibberish. So what?

| If the above snippet assigns "foo" to x, what
| does that mean? If it assigns -10, what does that mean?

In current Python, it means "true".

| Language design is about associating meanings (semantics)
| with syntax. ISTM, this would be poor design.

We already allow programmers to do that all over the place with the
special methods. This proposal removes an apparently arbitrary
restriction on __contains__ that doesn't seem to be applied to the
other comparators.

+0.5, verging on +1.

Cheers,
--
Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/

It is in those arenas that he previously extinguished himself.
- Chuck Rogers

From cs at zip.com.au Wed Jul 28 00:10:48 2010
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 28 Jul 2010 08:10:48 +1000
Subject: [Python-ideas] Non-boolean return from __contains__
In-Reply-To:
References:
Message-ID: <20100727221047.GA5223@cskk.homeip.net>

On 27Jul2010 16:59, Guido van Rossum wrote:
| On Tue, Jul 27, 2010 at 3:29 PM, Mathias Panzenböck
| wrote:
| > On 07/26/2010 04:20 AM, Alex Gaynor wrote:
| >>
| >> Fundamentally the argument in favor of it is the same as for the other
| >> comparison operators: you want to do symbolic manipulation using the
| >> "normal" syntax, as a DSL. My example is that of a SQL expression
| >> builder: SQLAlchemy uses User.id == 3 to create a clause where the ID
| >> is 3, but for "id in [1, 2, 3]" it has: User.id.in_([1, 2, 3]), which
| >> is rather unseemly IMO (at least as much as having User.id.eq(3) would
| >> be).
| >> | > | > This is a bad example for your wish because this code: | >>>> id in [1, 2, 3] | > | > translates into: | >>>> [1, 2, 3].__contains__(id) | > | > So it doesn't help that 'in' may return something else than a bool | > because the method is called on the wrong object for your purposes. | | Well that pretty much kills the proposal. I can't believe nobody | (including myself) figured this out earlier in the thread. :-( That's a real shame. ".__rcontains__", anyone? For the record (since I just said +0.5 to +1), I'm down to +0 on the proposal; I think the idea's good and removes an (to my mind) arbitrary constraint on __contains__, but now I haven't got a use case:-( Alex's "id in wrapper([1,2,3])" doesn't seem better than the existing "column.in_([1,2,3])" that already exists, alas. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ "Are we alpinists, or are we tourists" followed by "tourists! tourists!" - Kobus Barnard in rec.climbing, on things he's heard firsthand From dangyogi at gmail.com Wed Jul 28 02:17:44 2010 From: dangyogi at gmail.com (Bruce Frederiksen) Date: Tue, 27 Jul 2010 20:17:44 -0400 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: For the LINQ approach, I'd rather see an open ended hook for allowing any required syntax, rather than only SQL-like syntax. OTOH, re.compile and db_cursor.execute are two examples where no new mechanism is needed. And you'd have to quote the new syntax somehow anyway since the Python parser wouldn't understand it... Which makes me wonder if it really makes sense to try to overload these operators in order to generate some kind of mini-language, vs just using your own syntax like re.compile or db_cursor.execute does. 
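Both precedents just mentioned — re.compile and db_cursor.execute — look like this in practice (sqlite3 standing in here for any DB-API module):

```python
import re
import sqlite3

# The regex mini-language, compiled from a plain string:
pattern = re.compile(r"[a-z0-9.]+@[a-z0-9.-]+")
print(bool(pattern.match("guido@python.org")))   # True

# The SQL mini-language, with ? placeholders for Python values:
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, email TEXT)")
cur.execute("INSERT INTO users VALUES (?, ?)", (1, "guido@python.org"))
cur.execute("SELECT id FROM users WHERE email = ?", ("guido@python.org",))
print(cur.fetchall())   # [(1,)]
```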
The one catch is that it is often nice to be able to refer to Python variables (at least; or perhaps full expressions) within the mini-language. The db_cursor.execute is an example, and putting the placeholders in the SQL syntax with arguments later works, but gets tedious. To be able to include Python expressions directly within the mini-language, the library implementing the new syntax would have to be able to translate the new syntax into Python and have it spliced into the code where it was used. Something like an intelligent macro expansion. This means that the library's translation code has to be called from the Python compiler. I'm not familiar enough with the compiler to know how crazy this is. Mython tries to do something similar. -Bruce On Tue, Jul 27, 2010 at 10:02 AM, Guido van Rossum wrote: > > Therefore I think the LINQ approach, which (IIUC) converts an > expression into a parse tree when certain syntax is encountered, and > calls a built-in method with that parse tree, would be a fresh breath > of air. No need deriding it just because Microsoft came up with it > first. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Jul 28 03:03:48 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Jul 2010 13:03:48 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> Message-ID: <4C4F81F4.1060708@canterbury.ac.nz> Guido van Rossum wrote: > On Tue, Jul 27, 2010 at 3:29 PM, Mathias Panzenb?ck > >>So it doesn't help that 'in' may return something else than a bool >>because the method is called on the wrong object for your purposes. > > > Well that pretty much kills the proposal. I can't believe nobody > (including myself) figured this out earlier in the thread. 
:-( Alternatively, it could be taken as a sign that there is a special method missing -- there should be an __in__ method that is tried on the first operand before trying __contains__ on the second. (And if we'd thought of this at the beginning, __contains__ would have been called __rin__. :-) -- Greg From raymond.hettinger at gmail.com Wed Jul 28 03:09:17 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 27 Jul 2010 18:09:17 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <4C4F81F4.1060708@canterbury.ac.nz> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <4C4EED64.90806@gmx.net> <4C4F81F4.1060708@canterbury.ac.nz> Message-ID: <3102EC1F-BBFF-4F61-811C-F651BBF91736@gmail.com> On Jul 27, 2010, at 6:03 PM, Greg Ewing wrote: > Guido van Rossum wrote: >> On Tue, Jul 27, 2010 at 3:29 PM, Mathias Panzenb?ck > > >>> So it doesn't help that 'in' may return something else than a bool >>> because the method is called on the wrong object for your purposes. >> >> Well that pretty much kills the proposal. I can't believe nobody >> (including myself) figured this out earlier in the thread. :-( > > Alternatively, it could be taken as a sign that there is > a special method missing -- there should be an __in__ > method that is tried on the first operand before trying > __contains__ on the second. (And if we'd thought of this > at the beginning, __contains__ would have been called > __rin__. :-) Don't forget __not_in__ and __not_rin__ ;-) Raymond From greg.ewing at canterbury.ac.nz Wed Jul 28 03:19:12 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Jul 2010 13:19:12 +1200 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C4F14E6.1060102@jmunch.dk> References: <4C4F14E6.1060102@jmunch.dk> Message-ID: <4C4F8590.60004@canterbury.ac.nz> Anders J. Munch wrote: > But suppose you could address the source instead? 
Suppose you could > write yield_if_true in such a way that it did not become a generator > despite yielding? I don't see how this would work. The problem isn't that yield_if_true becomes a generator -- it's that the function calling yield_if_true *doesn't* become a generator, even though it needs to. > Let's call it 'yield_' , for lack of a > better name. The function would yield the nearest generator on the > call stack. But if none of the calling functions have yields anywhere else, then they're just ordinary functions, and there is *no* generator on the call stack! -- Greg From greg.ewing at canterbury.ac.nz Wed Jul 28 03:27:39 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Jul 2010 13:27:39 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: <4C4F878B.9050501@canterbury.ac.nz> Bruce Leban wrote: > Isn't there are already a syntax for this? > > expr = lambda: x + y * z > > Maybe you want some conversion of that lambda into a different form: > > expr = @ast lambda: x + y + z If you need new syntax for this, then it's a sign that there *isn't* already a syntax for what we want. Given that we need new syntax anyway, there doesn't seem to be any point in bothering with the lambda: expr = @ast: x + y + z or any other suitable syntax. 
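For comparison, the closest thing available without any new syntax is the string-parsing approach Guido raised — parse the expression text by hand with the ast module:

```python
import ast

# Parse the expression text into an AST -- no new syntax needed, but
# the code inside the string is invisible to editors and tools.
expr = ast.parse("x + y * z", mode="eval")
assert isinstance(expr, ast.Expression)
print(type(expr.body).__name__)   # BinOp
# Evaluating it later, with bindings supplied explicitly:
print(eval(compile(expr, "<expr>", "eval"), {"x": 1, "y": 2, "z": 3}))   # 7
```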
-- Greg From greg.ewing at canterbury.ac.nz Wed Jul 28 03:43:37 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Jul 2010 13:43:37 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: <4C4F8B49.7030002@canterbury.ac.nz> Guido van Rossum wrote: > Maybe you could just as well make it a plain string literal and call a > function that parses it into a parse tree: > > expr = parse("x + y*z") > assert isinstance(expr, ast.Expression) > > The advantage of this approach is that you can define a different > language too... This is more or less what we have now when we pass SQL queries as strings to database engines. There are numerous problems with this. One of them is the fact that editors have no idea that the code inside the string is code, so they can't help you out with any syntax highlighting, formatting, etc. Another is that it makes passing parameters to the embedded code very awkward. One of the things I would like to get from a code-as-ast feature is a natural way of embedding sub-expressions that *do* get evaluated according to the normal Python rules. For example, one should be able to write something like cust = "SMITH" date = today() sales = select(transactions, @ast: customer_code == cust and transaction_date == date) and have it possible for the implementation of select() to easily and safely evaluate 'cust' and 'date' in the calling environment. 
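A rough sketch of the evaluation half of that select() idea, using an expression string in place of the proposed @ast syntax (sys._getframe is used here purely for illustration; an AST-based version would resolve Name nodes against the caller's scope the same way):

```python
import sys

def select(rows, condition):
    # Evaluate `condition` once per row, with the *caller's* variables
    # visible alongside the row's columns.
    caller = sys._getframe(1)
    out = []
    for row in rows:
        env = {**caller.f_globals, **caller.f_locals, **row}
        if eval(condition, env):
            out.append(row)
    return out

transactions = [
    {"customer_code": "SMITH", "amount": 10},
    {"customer_code": "JONES", "amount": 20},
]
cust = "SMITH"
sales = select(transactions, "customer_code == cust")
print(sales)   # [{'customer_code': 'SMITH', 'amount': 10}]
```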
-- Greg From greg.ewing at canterbury.ac.nz Wed Jul 28 03:50:15 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Jul 2010 13:50:15 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <3EEB340D-0BF3-48EE-AC02-5E6A7DF1495C@masklinn.net> Message-ID: <4C4F8CD7.3030409@canterbury.ac.nz> Alexander Belopolsky wrote: > Not enough to justify new syntax IMO. Just create your parse trees at > the module or class level. chances are that's where they belong > anyways. Except when they don't, and would be clearer written in-line at the point of use. -- Greg From pyideas at rebertia.com Wed Jul 28 03:56:14 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Tue, 27 Jul 2010 18:56:14 -0700 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: <4C4F8B49.7030002@canterbury.ac.nz> References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <4C4F8B49.7030002@canterbury.ac.nz> Message-ID: On Tue, Jul 27, 2010 at 6:43 PM, Greg Ewing wrote: > One of the things I would > like to get from a code-as-ast feature is a natural way > of embedding sub-expressions that *do* get evaluated > according to the normal Python rules. For example, > one should be able to write something like > > ?cust = "SMITH" > ?date = today() > ?sales = select(transactions, > ? ?@ast: customer_code == cust and transaction_date == date) > > and have it possible for the implementation of select() > to easily and safely evaluate 'cust' and 'date' in the > calling environment. In other words, you want (possibly an implicit form of) the comma operator from Scheme's quasiquote.[1] Maybe Paul Graham /was/ onto something. 
Cheers, Chris -- [1] http://www.cs.hut.fi/Studies/T-93.210/schemetutorial/node7.html From greg.ewing at canterbury.ac.nz Wed Jul 28 04:10:46 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Jul 2010 14:10:46 +1200 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <4C4F8B49.7030002@canterbury.ac.nz> Message-ID: <4C4F91A6.1040205@canterbury.ac.nz> Chris Rebert wrote: > In other words, you want (possibly an implicit form of) the comma > operator from Scheme's quasiquote. I thought about that, but I'd rather avoid having to expicitly mark the sub-expressions if possible. The way I envisage it, each of the AST nodes would have an eval() method which would evaluate it in the calling environment. It would be up to the consumer of the AST to decide when to call it. -- Greg From ianb at colorstudy.com Wed Jul 28 04:42:45 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 27 Jul 2010 21:42:45 -0500 Subject: [Python-ideas] Non-boolean return from __contains__ In-Reply-To: References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> Message-ID: On Tue, Jul 27, 2010 at 7:17 PM, Bruce Frederiksen wrote: > The one catch is that it is often nice to be able to refer to Python > variables (at least; or perhaps full expressions) within the mini-language. > The db_cursor.execute is an example, and putting the placeholders in the SQL > syntax with arguments later works, but gets tedious. 
To be able to include > Python expressions directly within the mini-language, the library > implementing the new syntax would have to be able to translate the new > syntax into Python and have it spliced into the code where it was used. > Something like an intelligent macro expansion. This means that the > library's translation code has to be called from the Python compiler. > > I'm not familiar enough with the compiler to know how crazy this is. > Mython tries to do something similar. > Basically templating languages do this, and a templating language could be used for exactly this sort of purpose (but *not* simply a generic templating language, then you get SQL or whatever-else injection problems). I put together a small example that might work for SQL: http://svn.colorstudy.com/home/ianb/recipes/sqltemplate.py Unfortunately templating languages in Python aren't nearly as easy to implement as they should be, and the results are not as elegant as they could be. I've been using templating snippets a lot more in my code lately, and find it more handy than I would have originally expected. It doesn't seem unreasonable in this case either. Well... unless you want to introspect the expression, which would mean parsing the resulting SQL (and is then hard). -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From funbuggie at gmail.com Wed Jul 28 05:13:58 2010 From: funbuggie at gmail.com (Barend erasmus) Date: Wed, 28 Jul 2010 05:13:58 +0200 Subject: [Python-ideas] Counter Message-ID: Hi made a counter that count till 1 000 000 000.How long will it take to get there when there is no delay. Thx Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmjohnson.mailinglist at gmail.com Wed Jul 28 06:38:02 2010 From: cmjohnson.mailinglist at gmail.com (Carl M. 
Johnson)
Date: Tue, 27 Jul 2010 18:38:02 -1000
Subject: [Python-ideas] PEP 380 alternative: A yielding function
In-Reply-To: <4C4F8590.60004@canterbury.ac.nz>
References: <4C4F14E6.1060102@jmunch.dk> <4C4F8590.60004@canterbury.ac.nz>
Message-ID:

What would this code do:

def yield_if_true(x):
    if x:
        yield_(x)

def maybe_yield(x):
    if calculate_property(x):
        yield_if_true(x)
    else:
        return None

maybe_yield(5)

When maybe_yield(5) is called, different things need to happen
depending on whether it's a function or a generator. If it's a
generator, it shouldn't execute calculate_property(x) yet, because
generators don't execute their contents until someone says
next(maybe_yield(5)) (or maybe_yield(5).next() in Py2.x). On the other
hand, if it's not a generator but a function, then it should run
calculate_property right away. You could try starting out as a
function and then switching to being a generator if anything that you
call has a call to yield_ inside of it, but that strikes me as
extremely clumsy and complicated, and it could lead to unpleasant
surprises if calculate_property has side-effects.

ISTM there's no way to do something like PEP 380 without requiring
that some special keyword is used to indicate that the def is creating
a generator and not creating a function. There are other directions to
take this besides yield from (for example, we could replace def with
gen or some such and give return a special meaning inside a gen
statement), but to try to do it without any kind of keyword at the
site of the caller means violating "In the face of ambiguity, refuse
the temptation to guess."

-- Carl Johnson

On Tue, Jul 27, 2010 at 3:19 PM, Greg Ewing wrote:
> Anders J. Munch wrote:
>
>> But suppose you could address the source instead? Suppose you could
>> write yield_if_true in such a way that it did not become a generator
>> despite yielding?
>
> I don't see how this would work.
The problem isn't that > yield_if_true becomes a generator -- it's that the function > calling yield_if_true *doesn't* become a generator, even > though it needs to. > >> Let's call it 'yield_' , for lack of a >> better name. ?The function would yield the nearest generator on the >> call stack. > > But if none of the calling functions have yields anywhere > else, then they're just ordinary functions, and there is > *no* generator on the call stack! > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From ncoghlan at gmail.com Wed Jul 28 06:39:41 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 Jul 2010 14:39:41 +1000 Subject: [Python-ideas] Readability of hex strings (Was: Use of coding cookie in 3.x stdlib) In-Reply-To: <201007271012.44460.steve@pearwood.info> References: <201007271012.44460.steve@pearwood.info> Message-ID: (another attempt at getting this discussion over on python-ideas where it belongs) On Tue, Jul 27, 2010 at 10:12 AM, Steven D'Aprano wrote: > Since it only takes a pair of small helper functions to convert hex > dumps in the form "XXXX XXXX ..." to and from byte strings, I don't see > the need for new syntax and would vote -1 on the idea. However, I'd > vote +0 on a matching bytes.tohex() method to partner with the existing > bytes.fromhex(). Having written my own bytes formatting function to do exactly as Anatoly asks (i.e. display a string of binary data as hex characters with spaces between each byte), I can understand the desire to have something like that readily available. 
The following is not particularly intuitive:

>>> " ".join(format(c, "x") for c in b"abcdefABCDEF")
'61 62 63 64 65 66 41 42 43 44 45 46'

The 2.x equivalent is just as bad:

>>> " ".join(format(ord(c), "x") for c in "abcdefABCDEF")
'61 62 63 64 65 66 41 42 43 44 45 46'

However, I'll caveat that support by pointing out that the basic
formatting quickly becomes inadequate for many purposes. Personally, I
quickly replaced it with a fixed-width dump format that provides the
ASCII character dumps over on the right-hand side, the way most hex
editors do.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From andreengels at gmail.com Wed Jul 28 07:38:18 2010
From: andreengels at gmail.com (Andre Engels)
Date: Wed, 28 Jul 2010 07:38:18 +0200
Subject: [Python-ideas] Counter
In-Reply-To:
References:
Message-ID:

On Wed, Jul 28, 2010 at 5:13 AM, Barend erasmus wrote:
> Hi
>
> made a counter that count till 1 000 000 000. How
> long will it take to get there when there is no delay.

That depends - how have you programmed your counter, what do you do
with the result, what computer are you using, what other processes are
running? Rather than asking people who do not know all these things,
you'd better try it out yourself: Let it count to 10 000 000, and
multiply that result by 100. It will not be exact, but it will be a
much better estimate than I or anyone else here can provide.

--
André Engels, andreengels at gmail.com

From pyideas at rebertia.com Wed Jul 28 07:43:03 2010
From: pyideas at rebertia.com (Chris Rebert)
Date: Tue, 27 Jul 2010 22:43:03 -0700
Subject: [Python-ideas] Counter
In-Reply-To:
References:
Message-ID:

On Tue, Jul 27, 2010 at 8:13 PM, Barend erasmus wrote:
> Hi
>
> made a counter that count till 1 000 000 000. How
> long will it take to get there when there is no delay.
>
> Thx
>
> Ben

Your post is completely off-topic. This mailing list (python-ideas) is
for proposing/discussing ideas for improving/modifying the Python
language.
For general discussion and questions about Python, please post to python-list/comp.lang.python instead. It is accessible from either: http://mail.python.org/mailman/listinfo/python-list http://groups.google.com/group/comp.lang.python/topics Regards, Chris From solipsis at pitrou.net Wed Jul 28 14:49:55 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 28 Jul 2010 14:49:55 +0200 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy References: <1279740852.3222.38.camel@localhost.localdomain> Message-ID: <20100728144955.06236bdb@pitrou.net> Hello, On Sat, 24 Jul 2010 14:31:42 -0700 "Gregory P. Smith" wrote: > > The EnvrionmentError hierarchy and common errno test code has bothered me > for a while. While I think the namespace pollution concern is valid I would > suggest adding "Error" to the end of all of the names (your initial proposal > only says "Error" on the end of one of them) as that is consistent with the > bulk of the existing standard exceptions and warnings. They are unlikely to > conflict with anything other than exceptions people have already defined > themselves in any existing code (which could likely be refactored out after > we officially define these). The reason I haven't added "Error" to them is that the names are already quite long, and it's quite obvious that they refer to errors. I'm obviously not religious about it, though :) Regards Antoine. From mal at egenix.com Wed Jul 28 15:06:44 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 28 Jul 2010 15:06:44 +0200 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: <20100728144955.06236bdb@pitrou.net> References: <1279740852.3222.38.camel@localhost.localdomain> <20100728144955.06236bdb@pitrou.net> Message-ID: <4C502B64.6020309@egenix.com> Antoine Pitrou wrote: > > Hello, > > On Sat, 24 Jul 2010 14:31:42 -0700 > "Gregory P. 
Smith" wrote: >> >> The EnvrionmentError hierarchy and common errno test code has bothered me >> for a while. While I think the namespace pollution concern is valid I would >> suggest adding "Error" to the end of all of the names (your initial proposal >> only says "Error" on the end of one of them) as that is consistent with the >> bulk of the existing standard exceptions and warnings. They are unlikely to >> conflict with anything other than exceptions people have already defined >> themselves in any existing code (which could likely be refactored out after >> we officially define these). > > The reason I haven't added "Error" to them is that the names are > already quite long, and it's quite obvious that they refer to errors. > I'm obviously not religious about it, though :) Please keep the "Error" suffix on those exception class names. This is common practice and we wouldn't want to break with it just because the names get a little longer (we have editor type completion to deal with that ;-). Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From guido at python.org Wed Jul 28 16:26:25 2010
From: guido at python.org (Guido van Rossum)
Date: Wed, 28 Jul 2010 07:26:25 -0700
Subject: [Python-ideas] PEP 380 alternative: A yielding function
In-Reply-To:
References: <4C4F14E6.1060102@jmunch.dk> <4C4F8590.60004@canterbury.ac.nz>
Message-ID:

On Tue, Jul 27, 2010 at 9:38 PM, Carl M. Johnson wrote:
> What would this code do:
>
> def yield_if_true(x):
>     if x:
>         yield_(x)
>
> def maybe_yield(x):
>     if calculate_property(x):
>         yield_if_true(x)
>     else:
>         return None
>
> maybe_yield(5)
>
> When maybe_yield(5) is called, different things need to happen
> depending on whether it's a function or a generator. If it's a
> generator, it shouldn't execute calculate_property(x) yet, because
> generators don't execute their contents until someone says
> next(maybe_yield(5)) (or maybe_yield(5).next() in Py2.x). On the other
> hand, if it's not a generator but a function, then it should run
> calculate_property right away. You could try starting out as a
> function and then switching to being a generator if anything that you
> call has a call to yield_ inside of it, but that strikes me as
> extremely clumsy and complicated, and it could lead to unpleasant
> surprises if calculate_property has side-effects.
>
> ISTM there's no way to do something like PEP 380 without requiring
> that some special keyword is used to indicate that the def is creating
> a generator and not creating a function. There are other directions to
> take this besides yield from (for example, we could replace def with
> gen or some such and give return a special meaning inside a gen
> statement), but to try to do it without any kind of keyword at the
> site of the caller means violating "In the face of ambiguity, refuse
> the temptation to guess."
Well, in a statically typed language the compiler could figure it out based on the type of the things being called -- "generator-ness" could be propagated by the type system just like "throws a certain expression" is in Java. -- --Guido van Rossum (python.org/~guido) From 2010 at jmunch.dk Wed Jul 28 18:31:51 2010 From: 2010 at jmunch.dk (Anders J. Munch) Date: Wed, 28 Jul 2010 18:31:51 +0200 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: References: <4C4F14E6.1060102@jmunch.dk> <4C4F8590.60004@canterbury.ac.nz> Message-ID: <4C505B77.9080106@jmunch.dk> Carl M. Johnson wrote: > What would this code do: > > def yield_if_true(x): > if x: > yield_(x) > > def maybe_yield(x): > if calculate_property(x): > yield_if_true(x) > else: > return None > > maybe_yield(5) maybe_yield is not a generator, because it does not use the yield keyword. Nothing changed about that. As there's no generator here, yield_(x) would raise some exception: "RuntimeError: No generator found for yield_" Regular yield syntax remains the preferred option - with yield_ as a supplement for when delegation is needed. Perhaps a better name for it would be 'yield_outer' or 'nonlocal_yield'. regards, Anders From scott+python-ideas at scottdial.com Wed Jul 28 18:54:42 2010 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Wed, 28 Jul 2010 12:54:42 -0400 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C4F14E6.1060102@jmunch.dk> References: <4C4F14E6.1060102@jmunch.dk> Message-ID: <4C5060D2.2070402@scottdial.com> On 7/27/2010 1:18 PM, Anders J. Munch wrote: > But suppose you could address the source instead? Suppose you could > write yield_if_true in such a way that it did not become a generator > despite yielding? ... > Now the example would work with a slight modification: > > def yield_if_true(x): > if x: > yield_(x) > yield_if_true(a) > yield_if_true(b) > > The real benefits from a yield_ function come with recursive > functions.
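[For readers following along: the recursive-delegation case being debated here can be written with the `yield from` syntax that PEP 380 proposes, which eventually landed in Python 3.3. A minimal sketch — `flatten` is an illustrative example, not code from this thread:]

```python
# A sketch of recursive generator delegation using the `yield from`
# syntax proposed by PEP 380 (available since Python 3.3). Each
# recursive call is itself a generator; `yield from` forwards its
# values (and send()/throw()/close()) to the caller transparently.
def flatten(items):
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to the recursive generator
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # -> [1, 2, 3, 4, 5]
```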
Except now yield_if_true() is not a generator. So, you have two classes of "generators" now (top-level/delegated), which breaks all sorts of things. How would you write izip() with this? You'd have to have a top-level (yield) version and a delegated (yield_()) version to satisfy all of the use-cases. I think your yield_() is actually a detriment to recursive functions. The "yield from" solution is superior in that the delegated generators are still normal generators to other call sites that are indifferent to that use case. I feel that there is no escaping that "for v in g: yield v" and small variations are an amazingly common pattern that is amazingly naive. Although for many use cases it works just fine; the unfortunate side-effect is that anyone building more clever generators on top finds themselves victimized by other authors. Creating a "yield from" syntax gives the other authors a pleasant and explicit shorthand, and provides for the other use cases automatically. > I'm guessing > you could implement 'yield from' as a pure-Python > function using yield_, making yield_ strictly more powerful PEP 380 already provides a pure Python description of "yield from" using existing syntax, therefore the assertion that you could implement it using yield_() provides no measure for how "powerful" your suggestion is. -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From 2010 at jmunch.dk Wed Jul 28 19:58:53 2010 From: 2010 at jmunch.dk (Anders J. Munch) Date: Wed, 28 Jul 2010 19:58:53 +0200 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C5060D2.2070402@scottdial.com> References: <4C4F14E6.1060102@jmunch.dk> <4C5060D2.2070402@scottdial.com> Message-ID: <4C506FDD.3070208@jmunch.dk> Scott Dial wrote: > Except now yield_if_true() is not a generator. So, you have two classes > of "generators" now (top-level/delegated), which breaks all sorts of > things. Right, yield_if_true is a regular function, that's the whole point.
There'd still only be one kind of generator, defined by the presence of the yield keyword. Nothing would break. The function that calls yield_if_true (or some other function up the call chain) would need to be a generator, if necessary made such using the traditional workaround if 0: yield regards, Anders From guido at python.org Wed Jul 28 20:28:51 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 28 Jul 2010 11:28:51 -0700 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C506FDD.3070208@jmunch.dk> References: <4C4F14E6.1060102@jmunch.dk> <4C5060D2.2070402@scottdial.com> <4C506FDD.3070208@jmunch.dk> Message-ID: All, trust me, this idea is going nowhere. Don't waste your time. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Thu Jul 29 02:48:39 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 29 Jul 2010 12:48:39 +1200 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C506FDD.3070208@jmunch.dk> References: <4C4F14E6.1060102@jmunch.dk> <4C5060D2.2070402@scottdial.com> <4C506FDD.3070208@jmunch.dk> Message-ID: <4C50CFE7.9080404@canterbury.ac.nz> Anders J. Munch wrote: > Right, yield_if_true is a regular function, that's the whole point. What if it needs to call yield_() more than once? If it's just a regular function, then it has no ability to be suspended at the point of yield and resumed. -- Greg From mark at qtrac.eu Thu Jul 29 09:20:50 2010 From: mark at qtrac.eu (Mark Summerfield) Date: Thu, 29 Jul 2010 08:20:50 +0100 Subject: [Python-ideas] Distutils setup.py & per user site packages Message-ID: <20100729082050.07074800@dino> Hi, I have downloaded a package from PyPI that uses distutils setup.py. When I run it with -h it shows options for building and installing, but does not appear to have an option for installation in my per user site packages directory (see PEP 370). 
I think it would be useful to add a "--local" option to setup.py that would install into the per site package directory. This would allow people to keep their Linux distros pristine while still being able to install packages that their distros don't have. (Or is there such an option that I've missed?) (In my particular case it wasn't a problem; I just built it and moved it.) -- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "Programming in Python 3" - ISBN 0321680561 http://www.qtrac.eu/py3book.html From flub at devork.be Thu Jul 29 10:15:57 2010 From: flub at devork.be (Floris Bruynooghe) Date: Thu, 29 Jul 2010 09:15:57 +0100 Subject: [Python-ideas] Distutils setup.py & per user site packages In-Reply-To: <20100729082050.07074800@dino> References: <20100729082050.07074800@dino> Message-ID: <20100729081557.GA20903@laurie.devork.be> Hi Mark On Thu, Jul 29, 2010 at 08:20:50AM +0100, Mark Summerfield wrote: > I have downloaded a package from PyPI that uses distutils setup.py. > When I run it with -h it shows options for building and installing, but > does not appear to have an option for installation in my per user site > packages directory (see PEP 370). > > (Or is there such an option that I've missed?) $ python2.6 setup.py --help install Shows you that you can use $ python2.6 setup.py install --user for this. You have to be using python2.6 or higher though. Note that this question doesn't really belong on python-ideas, it should have been posted to distutils-sig or python-list (comp.lang.python). But proposing a --local does make the line blurred ;-). 
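[To make the PEP 370 target concrete: the per-user directories that `setup.py install --user` writes to can be queried from the standard `site` module. A small sketch — the paths shown in the comments are typical Linux defaults, not guaranteed values:]

```python
# Querying the PEP 370 per-user installation directories that
# `setup.py install --user` targets. Exact locations vary by
# platform and Python version.
import site

print(site.getuserbase())          # e.g. /home/you/.local
print(site.getusersitepackages())  # e.g. /home/you/.local/lib/pythonX.Y/site-packages
```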
Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org From alexandre.conrad at gmail.com Thu Jul 29 12:41:35 2010 From: alexandre.conrad at gmail.com (Alexandre Conrad) Date: Thu, 29 Jul 2010 12:41:35 +0200 Subject: [Python-ideas] str.split with empty separator Message-ID: Hello all, What if str.split could take an empty separator? >>> 'banana'.split('') ['b', 'a', 'n', 'a', 'n', 'a'] I know this can be done with: >>> list('banana') ['b', 'a', 'n', 'a', 'n', 'a'] I think that, semantically speaking, it would make sense to split where there are no characters (in between them). Right now you can join from an empty string: ''.join(['b', 'a', 'n', 'a', 'n', 'a']) So why can't we split from an empty string? This wouldn't introduce any backwards incompatible changes as str.split currently can't have an empty separator: >>> 'banana'.split('') Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: empty separator I would love to see my banana actually split. :) Regards, -- Alex twitter.com/alexconrad From ziade.tarek at gmail.com Thu Jul 29 13:35:41 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 29 Jul 2010 13:35:41 +0200 Subject: [Python-ideas] Json object-level serializer Message-ID: Hello, What about adding in the json package the ability for an object to provide a different object to serialize ? This would be useful to translate a class into a structure that can be passed to json.dumps So, if __json__ is provided, it's used for serialization instead of the object itself: >>> import json >>> class MyComplexClass(object): ... def __json__(self): ... return 'json' ... >>> o = MyComplexClass() >>> json.dumps(o) '"json"' Cheers Tarek -- Tarek Ziadé
| http://ziade.org From phd at phd.pp.ru Thu Jul 29 13:40:59 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Thu, 29 Jul 2010 15:40:59 +0400 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: <20100729114059.GA2798@phd.pp.ru> On Thu, Jul 29, 2010 at 01:35:41PM +0200, Tarek Ziad? wrote: > > What about adding in the json package the ability for an object to > provide a different object to serialize ? > This would be useful to translate a class into a structure that can be > passed to json.dumps > > So, it __json__ is provided, its used for serialization instead of the > object itself: Also there must be a deserialization hook. Pickle uses __setstate__, and pickle stores the name of the class to call __setstate__ upon. Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Thu Jul 29 13:41:18 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Jul 2010 13:41:18 +0200 Subject: [Python-ideas] Json object-level serializer References: Message-ID: <20100729134118.206545ec@pitrou.net> On Thu, 29 Jul 2010 13:35:41 +0200 Tarek Ziad? wrote: > Hello, > > What about adding in the json package the ability for an object to > provide a different object to serialize ? > This would be useful to translate a class into a structure that can be > passed to json.dumps How about letting json use the __reduce__ protocol instead? From ziade.tarek at gmail.com Thu Jul 29 13:54:12 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 29 Jul 2010 13:54:12 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <20100729114059.GA2798@phd.pp.ru> References: <20100729114059.GA2798@phd.pp.ru> Message-ID: On Thu, Jul 29, 2010 at 1:40 PM, Oleg Broytman wrote: > On Thu, Jul 29, 2010 at 01:35:41PM +0200, Tarek Ziad? 
wrote: >> >> What about adding in the json package the ability for an object to >> provide a different object to serialize ? >> This would be useful to translate a class into a structure that can be >> passed to json.dumps >> >> So, it __json__ is provided, its used for serialization instead of the >> object itself: > > ? Also there must be a deserialization hook. Pickle uses __setstate__, and > pickle stores the name of the class to call __setstate__ upon. You cannot do a round trip because once the object is serialized, json don't know which class to instantiate to de-serialize it Which is fine really, since json just serialize simple elements. Cheers Tarek -- Tarek Ziad? | http://ziade.org From mal at egenix.com Thu Jul 29 13:54:28 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 29 Jul 2010 13:54:28 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <20100729134118.206545ec@pitrou.net> References: <20100729134118.206545ec@pitrou.net> Message-ID: <4C516BF4.8030702@egenix.com> Antoine Pitrou wrote: > On Thu, 29 Jul 2010 13:35:41 +0200 > Tarek Ziad? wrote: >> Hello, >> >> What about adding in the json package the ability for an object to >> provide a different object to serialize ? >> This would be useful to translate a class into a structure that can be >> passed to json.dumps > > How about letting json use the __reduce__ protocol instead? How would you then write a class that works with both pickle and json ? IMO, we'd need a separate method to return a JSON version of the object, e.g. .__json__(). I'm not sure how deserialization could be handled, since JSON doesn't support arbitrary object types. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ziade.tarek at gmail.com Thu Jul 29 13:56:48 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 29 Jul 2010 13:56:48 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <20100729134118.206545ec@pitrou.net> References: <20100729134118.206545ec@pitrou.net> Message-ID: On Thu, Jul 29, 2010 at 1:41 PM, Antoine Pitrou wrote: > On Thu, 29 Jul 2010 13:35:41 +0200 > Tarek Ziad? wrote: >> Hello, >> >> What about adding in the json package the ability for an object to >> provide a different object to serialize ? >> This would be useful to translate a class into a structure that can be >> passed to json.dumps > > How about letting json use the __reduce__ protocol instead? Maybe that's because I've never used it, but I find this protocol is very complex for this simple use case > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Tarek Ziad? | http://ziade.org From ziade.tarek at gmail.com Thu Jul 29 14:02:21 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 29 Jul 2010 14:02:21 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <4C516BF4.8030702@egenix.com> References: <20100729134118.206545ec@pitrou.net> <4C516BF4.8030702@egenix.com> Message-ID: On Thu, Jul 29, 2010 at 1:54 PM, M.-A. Lemburg wrote: > Antoine Pitrou wrote: >> On Thu, 29 Jul 2010 13:35:41 +0200 >> Tarek Ziad? 
wrote: >>> Hello, >>> >>> What about adding in the json package the ability for an object to >>> provide a different object to serialize ? >>> This would be useful to translate a class into a structure that can be >>> passed to json.dumps >> >> How about letting json use the __reduce__ protocol instead? > > How would you then write a class that works with both pickle > and json ? > > IMO, we'd need a separate method to return a JSON version of > the object, e.g. .__json__(). I'm not sure how deserialization > could be handled, since JSON doesn't support arbitrary object > types. As I told Oleg, I think its OK not to have a round trip like Pickle. The use case I have is to express a structure in Json, but loading it back can be done in a custom, explicit process. It cannot be triggered from the json package itself since it cannot know that a given Json structure was built through a specific class. Cheers Tarek > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source ?(#1, Jul 29 2010) >>>> Python/Zope Consulting and Support ... ? ? ? ?http://www.egenix.com/ >>>> mxODBC.Zope.Database.Adapter ... ? ? ? ? ? ? http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... ? ? ? ?http://python.egenix.com/ > ________________________________________________________________________ > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > ? eGenix.com Software, Skills and Services GmbH ?Pastor-Loeh-Str.48 > ? ?D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > ? ? ? ? ? Registered at Amtsgericht Duesseldorf: HRB 46611 > ? ? ? ? ? ? ? http://www.egenix.com/company/contact/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Tarek Ziad? | http://ziade.org From 2010 at jmunch.dk Thu Jul 29 14:06:37 2010 From: 2010 at jmunch.dk (Anders J. 
Munch) Date: Thu, 29 Jul 2010 14:06:37 +0200 Subject: [Python-ideas] PEP 380 alternative: A yielding function In-Reply-To: <4C50CFE7.9080404@canterbury.ac.nz> References: <4C4F14E6.1060102@jmunch.dk> <4C5060D2.2070402@scottdial.com> <4C506FDD.3070208@jmunch.dk> <4C50CFE7.9080404@canterbury.ac.nz> Message-ID: <4C516ECD.2070204@jmunch.dk> Greg Ewing wrote: > Anders J. Munch wrote: > >> Right, yield_if_true is a regular function, that's the whole point. > > What if it needs to call yield_() more than once? If it's > just a regular function, then it has no ability to be > suspended at the point of yield and resumed. I meant a regular function from the point of view of the compiler. The implementation would be special, of course. And therein lies the rub: It's unimplementable in CPython, alas. It could work in an implementation with a non-recursive eval loop, but if I'm not much mistaken, CPython recurses the eval loop even for a pure-Python function call. regards, Anders From g.brandl at gmx.net Thu Jul 29 14:22:01 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 29 Jul 2010 14:22:01 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: Am 29.07.2010 13:35, schrieb Tarek Ziad?: > Hello, > > What about adding in the json package the ability for an object to > provide a different object to serialize ? > This would be useful to translate a class into a structure that can be > passed to json.dumps > > So, it __json__ is provided, its used for serialization instead of the > object itself: > >>>> import json >>>> class MyComplexClass(object): > .... def __json__(self): > .... return 'json' > .... >>>> o = MyComplexClass() >>>> json.dumps(o) > '"json"' You can do this with a very short subclass of the JSONEncoder: class MyJSONEncoder(JSONEncoder): def default(self, obj): return obj.__json__() # with a useful failure message I don't think it needs to be built into the default encoder. 
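[A self-contained sketch of the subclassing route described above; `MyJSONEncoder` and `Point` are illustrative names, not code from the thread. `JSONEncoder.default` is only invoked for objects the encoder cannot serialize natively:]

```python
import json

# Sketch of the suggested subclass: `default` is called only for objects
# json cannot serialize on its own, so it can dispatch to a (proposed,
# not built-in) __json__ method and fall back to the base class, which
# raises a TypeError with a useful message.
class MyJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return obj.__json__()
        except AttributeError:
            return super().default(obj)

class Point:  # illustrative class
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __json__(self):
        return {"x": self.x, "y": self.y}

print(json.dumps(Point(1, 2), cls=MyJSONEncoder))  # -> {"x": 1, "y": 2}
```

The same hook is also reachable without subclassing, via `json.dumps(obj, default=some_function)`.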
Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From mal at egenix.com Thu Jul 29 14:27:26 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 29 Jul 2010 14:27:26 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729134118.206545ec@pitrou.net> <4C516BF4.8030702@egenix.com> Message-ID: <4C5173AE.4060309@egenix.com> Tarek Ziad? wrote: > On Thu, Jul 29, 2010 at 1:54 PM, M.-A. Lemburg wrote: >> Antoine Pitrou wrote: >>> On Thu, 29 Jul 2010 13:35:41 +0200 >>> Tarek Ziad? wrote: >>>> Hello, >>>> >>>> What about adding in the json package the ability for an object to >>>> provide a different object to serialize ? >>>> This would be useful to translate a class into a structure that can be >>>> passed to json.dumps >>> >>> How about letting json use the __reduce__ protocol instead? >> >> How would you then write a class that works with both pickle >> and json ? >> >> IMO, we'd need a separate method to return a JSON version of >> the object, e.g. .__json__(). I'm not sure how deserialization >> could be handled, since JSON doesn't support arbitrary object >> types. > > As I told Oleg, I think its OK not to have a round trip like Pickle. > > The use case I have is to express a structure in Json, but loading it back > can be done in a custom, explicit process. > > It cannot be triggered from the json package itself since it cannot know > that a given Json structure was built through a specific class. I just wanted to emphasize that a separate new method is needed, rather than trying to reuse a pickle-protocol method. I don't think deserialization support is needed either. 
The application getting the decoded JSON data can do that in an application specific way based on the lists and dictionaries it gets from the JSON decoder. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Thu Jul 29 14:31:11 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 29 Jul 2010 14:31:11 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: <4C51748F.9090006@egenix.com> Georg Brandl wrote: > Am 29.07.2010 13:35, schrieb Tarek Ziad?: >> Hello, >> >> What about adding in the json package the ability for an object to >> provide a different object to serialize ? >> This would be useful to translate a class into a structure that can be >> passed to json.dumps >> >> So, it __json__ is provided, its used for serialization instead of the >> object itself: >> >>>>> import json >>>>> class MyComplexClass(object): >> .... def __json__(self): >> .... return 'json' >> .... >>>>> o = MyComplexClass() >>>>> json.dumps(o) >> '"json"' > > You can do this with a very short subclass of the JSONEncoder: > > class MyJSONEncoder(JSONEncoder): > def default(self, obj): > return obj.__json__() # with a useful failure message Does that also work with the JSON C extension ? > I don't think it needs to be built into the default encoder. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ziade.tarek at gmail.com Thu Jul 29 14:39:09 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 29 Jul 2010 14:39:09 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: On Thu, Jul 29, 2010 at 2:22 PM, Georg Brandl wrote: .. > You can do this with a very short subclass of the JSONEncoder: > > class MyJSONEncoder(JSONEncoder): > ? ?def default(self, obj): > ? ? ? ?return obj.__json__() ?# with a useful failure message > > I don't think it needs to be built into the default encoder. Yes, but you need to customize in that case the encoding process and own it. Having a builtin recognition of __json__ would allow you to pass your objects to be serialized to any third party code that uses a plain json.dumps. For instance, some web kits out there will automatically serialize your objects into json strings when you want to do json responses. e.g. it becomes a builtin adapter Cheers Tarek From fetchinson at googlemail.com Thu Jul 29 14:47:34 2010 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Thu, 29 Jul 2010 14:47:34 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: > What about adding in the json package the ability for an object to > provide a different object to serialize ? 
> This would be useful to translate a class into a structure that can be > passed to json.dumps > > So, it __json__ is provided, its used for serialization instead of the > object itself: > >>>> import json >>>> class MyComplexClass(object): > ... def __json__(self): > ... return 'json' > ... >>>> o = MyComplexClass() >>>> json.dumps(o) > '"json"' Have a look at turbojson [1], the jsonification package that uses peak.rules [2] and which comes with turbogears [3]. It does exactly what you propose. Cheers, Daniel [1a] http://pypi.python.org/pypi/TurboJson [1b] http://svn.turbogears.org/projects/TurboJson [2a] pypi.python.org/pypi/PEAK-Rules [2b] http://peak.telecommunity.com/DevCenter/RulesReadme [3] http:///www.turbogears.org -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From ncoghlan at gmail.com Thu Jul 29 14:51:11 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 Jul 2010 22:51:11 +1000 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: On Thu, Jul 29, 2010 at 10:39 PM, Tarek Ziad? wrote: > On Thu, Jul 29, 2010 at 2:22 PM, Georg Brandl wrote: > .. >> You can do this with a very short subclass of the JSONEncoder: >> >> class MyJSONEncoder(JSONEncoder): >> ? ?def default(self, obj): >> ? ? ? ?return obj.__json__() ?# with a useful failure message >> >> I don't think it needs to be built into the default encoder. > > Yes, but you need to customize in that case the encoding process and own it. > > Having a builtin recognition of __json__ would allow you to pass your objects > to be serialized to any third party code that uses a plain json.dumps. > > For instance, some web kits out there will automatically serialize > your objects into json strings > when you want to do json responses. e.g. 
it becomes a builtin adapter I'll channel PJE here and point out that this kind of magic-method based protocol proliferation is exactly what a general purpose generic-function implementation is designed to avoid (i.e. instead of having json.dumps check for a __json__ magic method, you'd just flag json.dumps as a generic function and let people register their own overloads). Each individual time this question comes up people tend to react with "oh, that's too complicated and overkill, but magic methods are simple, so let's just define another magic method". The sum total of all those magic methods starts to accumulate into a lot of complexity of its own though :P Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From solipsis at pitrou.net Thu Jul 29 14:57:13 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Jul 2010 14:57:13 +0200 Subject: [Python-ideas] Json object-level serializer References: Message-ID: <20100729145713.3bdc4cdd@pitrou.net> On Thu, 29 Jul 2010 14:47:34 +0200 Daniel Fetchinson wrote: > > What about adding in the json package the ability for an object to > > provide a different object to serialize ? > > This would be useful to translate a class into a structure that can be > > passed to json.dumps > > > > So, it __json__ is provided, its used for serialization instead of the > > object itself: > > > >>>> import json > >>>> class MyComplexClass(object): > > ... def __json__(self): > > ... return 'json' > > ... > >>>> o = MyComplexClass() > >>>> json.dumps(o) > > '"json"' > > Have a look at turbojson [1], the jsonification package that uses > peak.rules [2] and which comes with turbogears [3]. It does exactly > what you propose. That it uses PEAK-Rules is probably a good reason to avoid it. Also, AFAIK, TurboGears have stopped using turbojson and relies on [simple]json instead. Regards Antoine. 
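[For readers curious what the generic-function alternative looks like in practice: the standard library later grew `functools.singledispatch` (Python 3.4, after this thread). Callers register per-type overloads on a single conversion function and feed it to `json.dumps` via its `default` hook; a sketch with illustrative names:]

```python
import json
from functools import singledispatch

# Sketch of the generic-function approach: one overloadable conversion
# function instead of a __json__ magic method.
@singledispatch
def to_jsonable(obj):
    raise TypeError("cannot serialize %s" % type(obj).__name__)

class Point:  # illustrative class
    def __init__(self, x, y):
        self.x, self.y = x, y

@to_jsonable.register(Point)
def _(obj):
    return {"x": obj.x, "y": obj.y}

print(json.dumps(Point(1, 2), default=to_jsonable))  # -> {"x": 1, "y": 2}
```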
From ncoghlan at gmail.com Thu Jul 29 14:59:52 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 Jul 2010 22:59:52 +1000 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: On Thu, Jul 29, 2010 at 10:47 PM, Daniel Fetchinson wrote: > Have a look at turbojson [1], the jsonification package that uses > peak.rules [2] and which comes with turbogears [3]. It does exactly > what you propose. Speaking of PJE and generic functions* ;) Cheers, Nick. *For those following along at home that may not be familiar with the names of various Python developers, PJE is Phillip J. Eby, the author of peak.rules (amongst many other things). -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From mikegraham at gmail.com Thu Jul 29 15:01:27 2010 From: mikegraham at gmail.com (Mike Graham) Date: Thu, 29 Jul 2010 09:01:27 -0400 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: On Thu, Jul 29, 2010 at 7:35 AM, Tarek Ziad? wrote: > Hello, > > What about adding in the json package the ability for an object to > provide a different object to serialize ? > This would be useful to translate a class into a structure that can be > passed to json.dumps > > So, it __json__ is provided, its used for serialization instead of the > object itself: > >>>> import json >>>> class MyComplexClass(object): > ... ? ? def __json__(self): > ... ? ? ? ? return 'json' > ... >>>> o = MyComplexClass() >>>> json.dumps(o) > '"json"' > > > > Cheers > Tarek > > -- > Tarek Ziad? | http://ziade.org Since there isn't really any magic going on, why use a __foo__ name? The majority of __foo__ names are for things you shouldn't reference yourself, but it doesn't seem like this is too personal a method to do that with. This allows inheritance of JSONization. The current custom serialization stuff does not. I'm not certain which is the bug and which is the feature. 
Since you aren't using anything useful from the json module, why involve it at all? Consistent API? One nice thing about the json module is that when using it you always produce valid JSON. Even the hooks for custom serialization keep this property. This is fairly nice to have. Regards, Mike From solipsis at pitrou.net Thu Jul 29 15:11:16 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Jul 2010 15:11:16 +0200 Subject: [Python-ideas] Json object-level serializer References: Message-ID: <20100729151116.219d6765@pitrou.net> On Thu, 29 Jul 2010 22:51:11 +1000 Nick Coghlan wrote: > > Each individual time this question comes up people tend to react with > "oh, that's too complicated and overkill, but magic methods are > simple, so let's just define another magic method". The sum total of > all those magic methods starts to accumulate into a lot of complexity > of its own though :P I don't agree. __json__ only matters to people who do JSON encoding/decoding. Other people can safely ignore it. And I don't see how generic functions bring less cognitive overhead. (they actually bring more of it, since most implementations are more complicated to begin with) From solipsis at pitrou.net Thu Jul 29 15:13:34 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Jul 2010 15:13:34 +0200 Subject: [Python-ideas] Json object-level serializer References: Message-ID: <20100729151334.639738e6@pitrou.net> On Thu, 29 Jul 2010 22:59:52 +1000 Nick Coghlan wrote: > *For those following along at home that may not be familiar with the > names of various Python developers, PJE is Phillip J. Eby, the author > of peak.rules (amongst many other things). Many of which unmaintained or even incompatible with recent Python versions (due to, for example, ugly bytecode hacks). Regards Antoine. 
From g.brandl at gmx.net  Thu Jul 29 15:12:54 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 29 Jul 2010 15:12:54 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <4C51748F.9090006@egenix.com>
References: <4C51748F.9090006@egenix.com>
Message-ID: 

Am 29.07.2010 14:31, schrieb M.-A. Lemburg:
> Georg Brandl wrote:
>> Am 29.07.2010 13:35, schrieb Tarek Ziadé:
>>> Hello,
>>>
>>> What about adding in the json package the ability for an object to
>>> provide a different object to serialize ?
>>> This would be useful to translate a class into a structure that can be
>>> passed to json.dumps
>>>
>>> So, if __json__ is provided, it's used for serialization instead of the
>>> object itself:
>>>
>>>>>> import json
>>>>>> class MyComplexClass(object):
>>> ....     def __json__(self):
>>> ....         return 'json'
>>> ....
>>>>>> o = MyComplexClass()
>>>>>> json.dumps(o)
>>> '"json"'
>>
>> You can do this with a very short subclass of the JSONEncoder:
>>
>> class MyJSONEncoder(JSONEncoder):
>>     def default(self, obj):
>>         return obj.__json__()  # with a useful failure message
>
> Does that also work with the JSON C extension ?

I think so.  The C encoder gets the default function as an argument.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From fetchinson at googlemail.com  Thu Jul 29 15:19:56 2010
From: fetchinson at googlemail.com (Daniel Fetchinson)
Date: Thu, 29 Jul 2010 15:19:56 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <20100729145713.3bdc4cdd@pitrou.net>
References: <20100729145713.3bdc4cdd@pitrou.net>
Message-ID: 

>> > What about adding in the json package the ability for an object to
>> > provide a different object to serialize ?
>> > This would be useful to translate a class into a structure that can be
>> > passed to json.dumps
>> >
>> > So, if __json__ is provided, it's used for serialization instead of the
>> > object itself:
>> >
>> >>>> import json
>> >>>> class MyComplexClass(object):
>> > ...     def __json__(self):
>> > ...         return 'json'
>> > ...
>> >>>> o = MyComplexClass()
>> >>>> json.dumps(o)
>> > '"json"'
>>
>> Have a look at turbojson [1], the jsonification package that uses
>> peak.rules [2] and which comes with turbogears [3]. It does exactly
>> what you propose.
>
> That it uses PEAK-Rules is probably a good reason to avoid it.

Why?

> Also, AFAIK, TurboGears have stopped using turbojson and relies on
> [simple]json instead.

That might be true for turbogears2 but turbogears1 (which is still in
active development) still uses turbojson. Turbogears 1 and 2 diverged
so much that it would be more appropriate to call them by different
names and consider them different projects (I personally use and
prefer tg1).

Cheers,
Daniel

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From ziade.tarek at gmail.com  Thu Jul 29 15:25:20 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 15:25:20 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jul 29, 2010 at 2:51 PM, Nick Coghlan wrote:
> On Thu, Jul 29, 2010 at 10:39 PM, Tarek Ziadé wrote:
>> On Thu, Jul 29, 2010 at 2:22 PM, Georg Brandl wrote:
>> ..
>>> You can do this with a very short subclass of the JSONEncoder:
>>>
>>> class MyJSONEncoder(JSONEncoder):
>>>     def default(self, obj):
>>>         return obj.__json__()  # with a useful failure message
>>>
>>> I don't think it needs to be built into the default encoder.
>>
>> Yes, but you need to customize in that case the encoding process and own it.
>>
>> Having a builtin recognition of __json__ would allow you to pass your objects
>> to be serialized to any third party code that uses a plain json.dumps.
>>
>> For instance, some web kits out there will automatically serialize
>> your objects into json strings
>> when you want to do json responses. e.g. it becomes a builtin adapter
>
> I'll channel PJE here and point out that this kind of magic-method
> based protocol proliferation is exactly what a general purpose
> generic-function implementation is designed to avoid (i.e. instead of
> having json.dumps check for a __json__ magic method, you'd just flag
> json.dumps as a generic function and let people register their own
> overloads).
>
> Each individual time this question comes up people tend to react with
> "oh, that's too complicated and overkill, but magic methods are
> simple, so let's just define another magic method". The sum total of
> all those magic methods starts to accumulate into a lot of complexity
> of its own though :P

That makes sense. OTOH, if we drop the idea of having a __magical__ method,
we could have a collections ABC instead, called JSONSerializable,
with one method to override.

This is more about declaring the interface rather than adding yet
another __magic__ method.

That's a nice OOP pattern to have imho

Cheers
Tarek

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
>

-- 
Tarek Ziadé | http://ziade.org

From solipsis at pitrou.net  Thu Jul 29 15:28:44 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Jul 2010 15:28:44 +0200
Subject: [Python-ideas] Json object-level serializer
References: <20100729145713.3bdc4cdd@pitrou.net>
Message-ID: <20100729152844.115f7087@pitrou.net>

On Thu, 29 Jul 2010 15:19:56 +0200
Daniel Fetchinson wrote:
> >
> > That it uses PEAK-Rules is probably a good reason to avoid it.
>
> Why?
I might be mistaken, but it seems to me that it isn't maintained
anymore (or perhaps that's RuleDispatch, which is from the same
author). It doesn't seem to have had a stable release in years.

Regards

Antoine.

From solipsis at pitrou.net  Thu Jul 29 15:34:29 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Jul 2010 15:34:29 +0200
Subject: [Python-ideas] Json object-level serializer
References: 
Message-ID: <20100729153429.382c0318@pitrou.net>

On Thu, 29 Jul 2010 15:25:20 +0200
Tarek Ziadé wrote:
>
> That makes sense. OTOH, if we drop the idea of having a __magical__ method,
> we could have a collections ABC instead, called JSONSerializable,
> with one method to override.
>
> This is more about declaring the interface rather than adding yet
> another __magic__ method.
>
> That's a nice OOP pattern to have imho

Python is supposed to be duck-typed. It would be strange to add a
couple of random exceptions to that general rule. Moreover, having to
*both* derive an existing class and implement the single method defined
on that class is one complication too many.

And I don't see how `__json__` is more annoying than e.g. `to_json`.

Regards

Antoine.

From ziade.tarek at gmail.com  Thu Jul 29 15:42:17 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 15:42:17 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <20100729153429.382c0318@pitrou.net>
References: <20100729153429.382c0318@pitrou.net>
Message-ID: 

On Thu, Jul 29, 2010 at 3:34 PM, Antoine Pitrou wrote:
> On Thu, 29 Jul 2010 15:25:20 +0200
> Tarek Ziadé wrote:
>>
>> That makes sense. OTOH, if we drop the idea of having a __magical__ method,
>> we could have a collections ABC instead, called JSONSerializable,
>> with one method to override.
>>
>> This is more about declaring the interface rather than adding yet
>> another __magic__ method.
>>
>> That's a nice OOP pattern to have imho
>
> Python is supposed to be duck-typed.
It would be strange to add a
> couple of random exceptions to that general rule. Moreover, having to
> *both* derive an existing class and implement the single method defined
> on that class is one complication too many.

Not sure to follow here, since ABCs are about having an object
supporting a series of methods no matter what the parent classes are.
e.g. this is closer to the concept of "interfaces".

IOW you don't need to derive from a parent class, you just need to provide
a given set of methods, and ABC provides a way to check that an
object has that signature.

see: http://docs.python.org/library/collections.html#abcs-abstract-base-classes

ABCs are the modern duck typing I'd say :)

>
> And I don't see how `__json__` is more annoying than e.g. `to_json`.
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

-- 
Tarek Ziadé | http://ziade.org

From ziade.tarek at gmail.com  Thu Jul 29 15:47:01 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 15:47:01 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net>
Message-ID: 

On Thu, Jul 29, 2010 at 3:42 PM, Tarek Ziadé wrote:
>> And I don't see how `__json__` is more annoying than e.g. `to_json`.

It's easier to override

From solipsis at pitrou.net  Thu Jul 29 15:49:42 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Jul 2010 15:49:42 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net>
Message-ID: <1280411382.3175.38.camel@localhost.localdomain>

Le jeudi 29 juillet 2010 à 15:42 +0200, Tarek Ziadé a écrit :
> On Thu, Jul 29, 2010 at 3:34 PM, Antoine Pitrou wrote:
> > On Thu, 29 Jul 2010 15:25:20 +0200
> > Tarek Ziadé wrote:
> >>
> >> That makes sense.
OTOH, if we drop the idea of having a __magical__ method,
> >> we could have a collections ABC instead, called JSONSerializable,
> >> with one method to override.
> >>
> >> This is more about declaring the interface rather than adding yet
> >> another __magic__ method.
> >>
> >> That's a nice OOP pattern to have imho
> >
> > Python is supposed to be duck-typed. It would be strange to add a
> > couple of random exceptions to that general rule. Moreover, having to
> > *both* derive an existing class and implement the single method defined
> > on that class is one complication too many.
>
> Not sure to follow here, since ABCs are about having an object
> supporting a series of methods no matter what the parent classes are.
> e.g. this is closer to the concept of "interfaces".
>
> IOW you don't need to derive from a parent class, you just need to provide
> a given set of methods, and ABC provides a way to check that an
> object has that signature.

Ok, but then how does it avoid having a __magic__ method? You can't use
a normal name such as "to_json" because then an existing class with that
method could be wrongly inferred as implementing your new ABC, and break
existing code.

Besides, defining an ABC for a single, module-specific method sounds
rather overkill. This reminds me of projects plagued by an overuse of
interfaces for every possible concept.

Regards

Antoine.

From solipsis at pitrou.net  Thu Jul 29 15:51:14 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Jul 2010 15:51:14 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net>
Message-ID: <1280411474.3175.40.camel@localhost.localdomain>

Le jeudi 29 juillet 2010 à 15:47 +0200, Tarek Ziadé a écrit :
> On Thu, Jul 29, 2010 at 3:42 PM, Tarek Ziadé wrote:
> >> And I don't see how `__json__` is more annoying than e.g. `to_json`.
>
> It's easier to override

Could you expand a little bit?
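[The ABC idea discussed above — and Antoine's worry that a plain `to_json` name could be wrongly inferred — can be sketched concretely. The sketch is purely hypothetical: there is no JSONSerializable ABC in the stdlib, and the class names are made up. Explicit registration is what avoids the accidental-inference problem.]

```python
from abc import ABCMeta, abstractmethod

# Hypothetical sketch: nothing like JSONSerializable exists in the
# stdlib.  Explicit registration (rather than sniffing for a to_json
# method) avoids wrongly inferring that an unrelated class supports it.
class JSONSerializable(metaclass=ABCMeta):
    @abstractmethod
    def to_json(self):
        """Return a structure that json.dumps can already handle."""

class Point:  # no inheritance from the ABC needed...
    def __init__(self, x, y):
        self.x, self.y = x, y
    def to_json(self):
        return {"x": self.x, "y": self.y}

JSONSerializable.register(Point)  # ...only explicit registration

print(isinstance(Point(1, 2), JSONSerializable))  # True

class Other:  # has a to_json method too, but was never registered...
    def to_json(self):
        return {}

print(isinstance(Other(), JSONSerializable))  # False: no accidental inference
```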
From ziade.tarek at gmail.com  Thu Jul 29 16:01:23 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 16:01:23 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <1280411382.3175.38.camel@localhost.localdomain>
References: <20100729153429.382c0318@pitrou.net> <1280411382.3175.38.camel@localhost.localdomain>
Message-ID: 

On Thu, Jul 29, 2010 at 3:49 PM, Antoine Pitrou wrote:
> Le jeudi 29 juillet 2010 à 15:42 +0200, Tarek Ziadé a écrit :
>> On Thu, Jul 29, 2010 at 3:34 PM, Antoine Pitrou wrote:
>> > On Thu, 29 Jul 2010 15:25:20 +0200
>> > Tarek Ziadé wrote:
>> >>
>> >> That makes sense. OTOH, if we drop the idea of having a __magical__ method,
>> >> we could have a collections ABC instead, called JSONSerializable,
>> >> with one method to override.
>> >>
>> >> This is more about declaring the interface rather than adding yet
>> >> another __magic__ method.
>> >>
>> >> That's a nice OOP pattern to have imho
>> >
>> > Python is supposed to be duck-typed. It would be strange to add a
>> > couple of random exceptions to that general rule. Moreover, having to
>> > *both* derive an existing class and implement the single method defined
>> > on that class is one complication too many.
>>
>> Not sure to follow here, since ABCs are about having an object
>> supporting a series of methods no matter what the parent classes are.
>> e.g. this is closer to the concept of "interfaces".
>>
>> IOW you don't need to derive from a parent class, you just need to provide
>> a given set of methods, and ABC provides a way to check that an
>> object has that signature.
>
> Ok, but then how does it avoid having a __magic__ method? You can't use
> a normal name such as "to_json" because then an existing class with that
> method could be wrongly inferred as implementing your new ABC, and break
> existing code.

yes that's a possible side-effect, unless we explicitly register those
classes using ABC's register technique.
>
> Besides, defining an ABC for a single, module-specific method sounds
> rather overkill. This reminds me of projects plagued by an overuse of
> interfaces for every possible concept.

That's what Iterator did in ABC though.

From ziade.tarek at gmail.com  Thu Jul 29 16:03:22 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 16:03:22 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <1280411474.3175.40.camel@localhost.localdomain>
References: <20100729153429.382c0318@pitrou.net> <1280411474.3175.40.camel@localhost.localdomain>
Message-ID: 

On Thu, Jul 29, 2010 at 3:51 PM, Antoine Pitrou wrote:
> Le jeudi 29 juillet 2010 à 15:47 +0200, Tarek Ziadé a écrit :
>> On Thu, Jul 29, 2010 at 3:42 PM, Tarek Ziadé wrote:
>> >> And I don't see how `__json__` is more annoying than e.g. `to_json`.
>>
>> It's easier to override
>
> Could you expand a little bit?

If you want to override to_json in a subclass, to slightly adapt it, it's easier
because the __json__ name is mangled by Python

From solipsis at pitrou.net  Thu Jul 29 16:14:27 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Jul 2010 16:14:27 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net> <1280411474.3175.40.camel@localhost.localdomain>
Message-ID: <1280412867.3175.42.camel@localhost.localdomain>

Le jeudi 29 juillet 2010 à 16:03 +0200, Tarek Ziadé a écrit :
> On Thu, Jul 29, 2010 at 3:51 PM, Antoine Pitrou wrote:
> > Le jeudi 29 juillet 2010 à 15:47 +0200, Tarek Ziadé a écrit :
> >> On Thu, Jul 29, 2010 at 3:42 PM, Tarek Ziadé wrote:
> >> >> And I don't see how `__json__` is more annoying than e.g. `to_json`.
> >>
> >> It's easier to override
> >
> > Could you expand a little bit?
>
> If you want to override to_json in a subclass, to slightly adapt it, it's easier
> because the __json__ name is mangled by Python

No, it isn't.

>>> class C:
...     def __json__(self):
...         pass
...
>>> C.__json__
<unbound method C.__json__>
>>> C().__json__
<bound method C.__json__ of <__main__.C instance at 0x...>>

From alexandre.conrad at gmail.com  Thu Jul 29 16:14:39 2010
From: alexandre.conrad at gmail.com (Alexandre Conrad)
Date: Thu, 29 Jul 2010 16:14:39 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net> <1280411474.3175.40.camel@localhost.localdomain>
Message-ID: 

2010/7/29 Tarek Ziadé :
> On Thu, Jul 29, 2010 at 3:51 PM, Antoine Pitrou wrote:
>> Le jeudi 29 juillet 2010 à 15:47 +0200, Tarek Ziadé a écrit :
>>> On Thu, Jul 29, 2010 at 3:42 PM, Tarek Ziadé wrote:
>>> >> And I don't see how `__json__` is more annoying than e.g. `to_json`.
>>>
>>> It's easier to override
>>
>> Could you expand a little bit?
>
> If you want to override to_json in a subclass, to slightly adapt it, it's easier
> because the __json__ name is mangled by Python

I believe that mangling will not be performed if the identifier ends
with more than one underscore. So __json__ won't be mangled.

Regards,
-- 
Alex
twitter.com/alexconrad

From solipsis at pitrou.net  Thu Jul 29 16:22:13 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Jul 2010 16:22:13 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net> <1280411382.3175.38.camel@localhost.localdomain>
Message-ID: <1280413333.3175.50.camel@localhost.localdomain>

> > Besides, defining an ABC for a single, module-specific method sounds
> > rather overkill. This reminds me of projects plagued by an overuse of
> > interfaces for every possible concept.
>
> That's what Iterator did in ABC though.

Well, it could be argued that testing for an Iterator is useful for a
significant variety of code, while testing for a JSONSerializable
doesn't have a use case outside of the json module itself.

Besides, it remains to be seen if anyone will use the Iterator ABC
instead of directly looking up the __next__ method.
I'm not convinced that all of the ABCs bundled with the stdlib are
really useful, apart from showcasing the potentialities of ABCs.

Regards

Antoine.

From ziade.tarek at gmail.com  Thu Jul 29 16:27:14 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 16:27:14 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729153429.382c0318@pitrou.net> <1280411474.3175.40.camel@localhost.localdomain>
Message-ID: 

On Thu, Jul 29, 2010 at 4:14 PM, Alexandre Conrad wrote:
> 2010/7/29 Tarek Ziadé :
>> On Thu, Jul 29, 2010 at 3:51 PM, Antoine Pitrou wrote:
>>> Le jeudi 29 juillet 2010 à 15:47 +0200, Tarek Ziadé a écrit :
>>>> On Thu, Jul 29, 2010 at 3:42 PM, Tarek Ziadé wrote:
>>>> >> And I don't see how `__json__` is more annoying than e.g. `to_json`.
>>>>
>>>> It's easier to override
>>>
>>> Could you expand a little bit?
>>
>> If you want to override to_json in a subclass, to slightly adapt it, it's easier
>> because the __json__ name is mangled by Python
>
> I believe that mangling will not be performed if the identifier ends
> with more than one underscore. So __json__ won't be mangled.

Ooops I forgot about that :)

From scott+python-ideas at scottdial.com  Thu Jul 29 16:29:34 2010
From: scott+python-ideas at scottdial.com (Scott Dial)
Date: Thu, 29 Jul 2010 10:29:34 -0400
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <20100729151334.639738e6@pitrou.net>
References: <20100729151334.639738e6@pitrou.net>
Message-ID: <4C51904E.3030007@scottdial.com>

On 7/29/2010 9:13 AM, Antoine Pitrou wrote:
> On Thu, 29 Jul 2010 22:59:52 +1000 Nick Coghlan wrote:
>> *For those following along at home that may not be familiar with the
>> names of various Python developers, PJE is Phillip J. Eby, the author
>> of peak.rules (amongst many other things).
>
> Many of which unmaintained or even incompatible with recent Python
> versions (due to, for example, ugly bytecode hacks).
On 7/29/2010 9:28 AM, Antoine Pitrou wrote: > I might be mistaken, but it seems to me that it isn't maintained > anymore (or perhaps that's RuleDispatch, which is from the same > author). It doesn't seem to have had a stable release in years. Sounds like you are damning the man and just chucking the concept and his projects in along with him. URL: svn://svn.eby-sarna.com/svnroot/PEAK-Rules Last Changed Date: 2009-07-15 00:30:57 -0400 (Wed, 15 Jul 2009) r2600 | pje | 2009-07-15 00:30:57 -0400 (Wed, 15 Jul 2009) | 2 lines Fix for Python 2.6 DeprecationWarning PEAK-Rules-0.5a1.dev-r2600.tar.gz 29-Jul-2010 04:22 93K It's unclear to me that not having been changed in a year constitutes "unmaintained" especially since PJE seems quite responsive on the PEAK mailing list. So, please restrain yourself unless you have something more to say than FUD about PJE and PEAK-Rules. -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From solipsis at pitrou.net Thu Jul 29 16:46:37 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Jul 2010 16:46:37 +0200 Subject: [Python-ideas] Json object-level serializer References: <20100729151334.639738e6@pitrou.net> <4C51904E.3030007@scottdial.com> Message-ID: <20100729164637.6da2f760@pitrou.net> On Thu, 29 Jul 2010 10:29:34 -0400 Scott Dial wrote: > > It's unclear to me that not having been changed in a year constitutes > "unmaintained" especially since PJE seems quite responsive on the PEAK > mailing list. So, please restrain yourself unless you have something > more to say than FUD about PJE and PEAK-Rules. Well, sorry if I mixed up PEAK-Rules and RuleDispatch (which, again, are similar libraries from the same author). The fact that RuleDispatch has been unmaintained, though, has been a source of problems for some people and projects. Regards Antoine. 
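[As a side note to the Iterator ABC aside a few messages up: for duck-typed iterators the ABC test and the direct attribute lookup agree, because the ABC defines a `__subclasshook__` that checks for `__iter__` and `__next__` (Python 3 names). `Countdown` is a made-up example class.]

```python
import collections.abc

# A plain duck-typed iterator: no registration, no inheritance.
class Countdown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

c = Countdown(3)
# The ABC check succeeds via __subclasshook__, without any registration...
print(isinstance(c, collections.abc.Iterator))            # True
# ...and agrees with looking the methods up directly.
print(hasattr(c, '__next__') and hasattr(c, '__iter__'))  # True
print(list(Countdown(3)))                                 # [2, 1, 0]
```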
From ziade.tarek at gmail.com  Thu Jul 29 17:04:04 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 29 Jul 2010 17:04:04 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <4C51904E.3030007@scottdial.com>
References: <20100729151334.639738e6@pitrou.net> <4C51904E.3030007@scottdial.com>
Message-ID: 

On Thu, Jul 29, 2010 at 4:29 PM, Scott Dial wrote:
..
>
> PEAK-Rules-0.5a1.dev-r2600.tar.gz            29-Jul-2010 04:22   93K

I wouldn't trust a release from the Hacker Quarterly

From alexander.belopolsky at gmail.com  Thu Jul 29 17:28:24 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 29 Jul 2010 11:28:24 -0400
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: <4C516BF4.8030702@egenix.com>
References: <20100729134118.206545ec@pitrou.net> <4C516BF4.8030702@egenix.com>
Message-ID: 

On Thu, Jul 29, 2010 at 7:54 AM, M.-A. Lemburg wrote:
> Antoine Pitrou wrote:
>> On Thu, 29 Jul 2010 13:35:41 +0200
>> Tarek Ziadé wrote:
>>> Hello,
>>>
>>> What about adding in the json package the ability for an object to
>>> provide a different object to serialize ?
>>> This would be useful to translate a class into a structure that can be
>>> passed to json.dumps
>>
>> How about letting json use the __reduce__ protocol instead?
>

+1. I think this is a very sensible idea. Note that Tarek's request
was not for a magic method like __repr__ that would return an easy to
parse string. Instead, the request was for a method that would return
an object that can be serialized instead of the original object and
will carry enough data to restore the original object.

> How would you then write a class that works with both pickle
> and json ?
>

Hopefully, for most types json would be able to use an unmodified
__reduce__ method. If this is not enough, the reduce protocol already
has an extension mechanism.
For example, an object may implement obj.__reduce_ex__('json') that
would return a json-friendly tuple instead of the pickle-oriented
obj.__reduce__().

> IMO, we'd need a separate method to return a JSON version of
> the object, e.g. .__json__(). I'm not sure how deserialization
> could be handled, since JSON doesn't support arbitrary object
> types.

I am afraid this was the turning point in this thread after which the
discussion went (IMO) in the wrong direction. Again, the OP's request
was for a method that would return an object that json or another
simple serializer (say yaml) could handle, not for a method that will
return a json string.

From python at mrabarnett.plus.com  Thu Jul 29 18:28:46 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 29 Jul 2010 17:28:46 +0100
Subject: [Python-ideas] str.split with empty separator
In-Reply-To: 
References: 
Message-ID: <4C51AC3E.8080201@mrabarnett.plus.com>

Alexandre Conrad wrote:
> Hello all,
>
> What if str.split could take an empty separator?
>
>>>> 'banana'.split('')
> ['b', 'a', 'n', 'a', 'n', 'a']
>
> I know this can be done with:
>
>>>> list('banana')
> ['b', 'a', 'n', 'a', 'n', 'a']
>
> I think that, semantically speaking, it would make sense to split where
> there are no characters (in between them). Right now you can join from
> an empty string:
>
> ''.join(['b', 'a', 'n', 'a', 'n', 'a'])
>
> So why can't we split from an empty string?
>
> This wouldn't introduce any backwards incompatible changes as
> str.split currently can't have an empty separator:
>
>>>> 'banana'.split('')
> Traceback (most recent call last):
>   File "", line 1, in 
> ValueError: empty separator
>
> I would love to see my banana actually split.
:)
>
Shouldn't it be this:

>>> 'banana'.split('')
['', 'b', 'a', 'n', 'a', 'n', 'a', '']

After all, the separator does exist at the start and end of the string:

>>> 'banana'.startswith('')
True
>>> 'banana'.endswith('')
True

From tjreedy at udel.edu  Thu Jul 29 19:18:42 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 29 Jul 2010 13:18:42 -0400
Subject: [Python-ideas] Non-boolean return from __contains__
In-Reply-To: 
References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz>
Message-ID: 

On 7/27/2010 2:04 PM, Guido van Rossum wrote:
> On Tue, Jul 27, 2010 at 5:25 PM, Robert Kern wrote:
>> I've occasionally wished that we could repurpose backticks for expression
>> literals:
>>
>> expr = `x + y*z`
>> assert isinstance(expr, ast.Expression)
>
> Maybe you could just as well make it a plain string literal and call a
> function that parses it into a parse tree:
>
> expr = parse("x + y*z")
> assert isinstance(expr, ast.Expression)
>
> The advantage of this approach is that you can define a different
> language too...

and that it already exists, and is more visible than backticks:

>>> def expr(s): return ast.parse(s, mode='eval')  # default is 'exec'

>>> e = expr('a+b')
>>> e
<_ast.Expression object at 0x00F8DCF0>

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Thu Jul 29 19:18:54 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 29 Jul 2010 13:18:54 -0400
Subject: [Python-ideas] Non-boolean return from __contains__
In-Reply-To: 
References: <6B58DE7F-F50D-4A55-8C95-F084BFBFA11E@gmail.com> <83CC4121-6D78-4053-B134-3D7BADBA9F82@gmail.com> <6A11FE63-3E1B-4E8C-8A5D-E5A953C98807@masklinn.net> <4C4E7DB8.2040407@canterbury.ac.nz> <3EEB340D-0BF3-48EE-AC02-5E6A7DF1495C@masklinn.net>
Message-ID: 

On 7/27/2010 3:05 PM, Alexander Belopolsky wrote:
> anyways. I would be very interested to see the parse() function.
It
> does not exist (yet), right?

>>> def expr(s): return ast.parse(s, mode='eval')

>>> e = expr('a+b')
>>> e
<_ast.Expression object at 0x00F8DCF0>

-- 
Terry Jan Reedy

From ronaldoussoren at mac.com  Thu Jul 29 21:13:25 2010
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Thu, 29 Jul 2010 21:13:25 +0200
Subject: [Python-ideas] Json object-level serializer
In-Reply-To: 
References: <20100729134118.206545ec@pitrou.net> <4C516BF4.8030702@egenix.com>
Message-ID: 

On 29 Jul, 2010, at 17:28, Alexander Belopolsky wrote:
> On Thu, Jul 29, 2010 at 7:54 AM, M.-A. Lemburg wrote:
>> Antoine Pitrou wrote:
>>> On Thu, 29 Jul 2010 13:35:41 +0200
>>> Tarek Ziadé wrote:
>>>> Hello,
>>>>
>>>> What about adding in the json package the ability for an object to
>>>> provide a different object to serialize ?
>>>> This would be useful to translate a class into a structure that can be
>>>> passed to json.dumps
>>>
>>> How about letting json use the __reduce__ protocol instead?
>>
>
> +1. I think this is a very sensible idea. Note that Tarek's request
> was not for a magic method like __repr__ that would return an easy to
> parse string. Instead, the request was for a method that would return
> an object that can be serialized instead of the original object and
> will carry enough data to restore the original object.
Name: smime.p7s Type: application/pkcs7-signature Size: 3567 bytes Desc: not available URL: From mal at egenix.com Thu Jul 29 23:25:09 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 29 Jul 2010 23:25:09 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <4C51748F.9090006@egenix.com> Message-ID: <4C51F1B5.9000608@egenix.com> Georg Brandl wrote: > Am 29.07.2010 14:31, schrieb M.-A. Lemburg: >> Georg Brandl wrote: >>> Am 29.07.2010 13:35, schrieb Tarek Ziad?: >>>> Hello, >>>> >>>> What about adding in the json package the ability for an object to >>>> provide a different object to serialize ? >>>> This would be useful to translate a class into a structure that can be >>>> passed to json.dumps >>>> >>>> So, it __json__ is provided, its used for serialization instead of the >>>> object itself: >>>> >>>>>>> import json >>>>>>> class MyComplexClass(object): >>>> .... def __json__(self): >>>> .... return 'json' >>>> .... >>>>>>> o = MyComplexClass() >>>>>>> json.dumps(o) >>>> '"json"' >>> >>> You can do this with a very short subclass of the JSONEncoder: >>> >>> class MyJSONEncoder(JSONEncoder): >>> def default(self, obj): >>> return obj.__json__() # with a useful failure message >> >> Does that also work with the JSON C extension ? > > I think so. The C encoder gets the default function as an argument. Then that sounds like the right way forward. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 29 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Fri Jul 30 00:12:24 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 30 Jul 2010 08:12:24 +1000 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <20100729151116.219d6765@pitrou.net> References: <20100729151116.219d6765@pitrou.net> Message-ID: On Thu, Jul 29, 2010 at 11:11 PM, Antoine Pitrou wrote: > On Thu, 29 Jul 2010 22:51:11 +1000 > Nick Coghlan wrote: >> >> Each individual time this question comes up people tend to react with >> "oh, that's too complicated and overkill, but magic methods are >> simple, so let's just define another magic method". The sum total of >> all those magic methods starts to accumulate into a lot of complexity >> of its own though :P > > I don't agree. __json__ only matters to people who do JSON > encoding/decoding. Other people can safely ignore it. Which is exactly the attitude I was talking about: for each individual case, people go "oh, I understand magic methods, those are easy". It's the overall process of identifying the need for and gathering consensus on magic methods that is unwieldy (and ultimately fails to scale, leading to non-extensible interfaces by default, with pretty printing being the classic example, and JSON serialisation the latest). > And I don't see how generic functions bring less cognitive overhead. > (they actually bring more of it, since most implementations are more > complicated to begin with) Mostly because the fully fledged generic implementations like PEAK-rules tend to get brought into discussions when they aren't needed. Single-type generic dispatch is actually so common they gave it a name: object-oriented programming. All single-type generic dispatch is about is having a registry for a particular operation that says "to perform this operation, with objects of this type, use this function". 
Instead of having a protocol that says "look up this magic method in the object's own namespace" (which requires a) agreement on the magic name to use and b) that the original author of the type in question both knew and cared about the operation the application developer is interested in) you instead have a protocol that says "here is a standard mechanism for declaring a type registry for a function, so you only have to learn how to register a function once". Is it really harder for people to learn how to write things like: json.dumps.overload(mytype, mytype.to_json) json.dumps.overload(third_party_type, my_third_party_type_serialiser) than it is for them to figure out that implementing a __json__ method will allow them to change how their object is serialised? (Not to mention that a __json__ method can only be used via monkey-patching if the type you want to serialise differently came from a library module rather than your own code). The generic function registration approach is incidentally discoverable via dir(json.dumps) to see that a function provides the relevant generic function registration methods. Magic method protocols can *only* be discovered by reading documentation. Function registration is a solved problem, with much better solutions than the ad hoc YAMM (yet-another-magic-method) approach we currently use. We just keep getting scared away from the right answer by the crazily complex overloading schemes that libraries like PEAK-rules allow. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Jul 30 00:39:07 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Jul 2010 00:39:07 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> Message-ID: <1280443147.3175.70.camel@localhost.localdomain> > > I don't agree. __json__ only matters to people who do JSON > > encoding/decoding. Other people can safely ignore it.
> > Which is exactly the attitude I was talking about: for each individual > case, people go "oh, I understand magic methods, those are easy". It's > the overall process of identifying the need for and gathering > consensus on magic methods that is unwieldy (and ultimately fails to > scale, leading to non-extensible interfaces by default, with pretty > printing being the classic example, and JSON serialisation the > latest). Why do you want to gather consensus? There is a single json serialization module in the stdlib and it's obvious that __json__ can/should be claimed by that module. Actually, your argument could be returned: if you use generic functions (such as @json.dumps.overload), alternative json serializers won't easily be able to make use of the information, while they could access the __json__ method like the standard json module does. > Is it really harder for people to learn how to write things like: > > json.dumps.overload(mytype, mytype.to_json) > json.dumps.overload(third_party_type, my_third_party_type_serialiser) It is certainly more annoying and less natural than: def __json__(self): .... Sure, generic functions as a paradigm appear more powerful, more decoupled, etc. But in practice __magic__ methods are sufficient for most uses. Practicality beats purity. That may be why in all the years that the various generic functions libraries have existed, they don't seem to have been really popular compared to the simpler convention of defining fixed method names. (besides, it wouldn't necessarily be json.dumps that you overload, but some internal function of the json module; making it even less intuitive and easily discoverable) > The generic function registration approach is incidentally > discoverable via dir(json.dumps) to see that a function provides the > relevant generic function registration methods. Magic method protocols > can *only* be discovered by reading documentation. 
If help(json.dumps) includes a small blurb about __json__, it makes the information at least as easily discoverable as invoking dir(json.dumps). Besides, I don't find it shocking if documentation problems have to be solved through documentation. Regards Antoine. From merwok at netwok.org Fri Jul 30 00:46:22 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 30 Jul 2010 00:46:22 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> Message-ID: <4C5204BE.8020805@netwok.org> Thank you for explaining generic functions so clearly. Is there a good module out there implementing them without “crazily complex overloading schemes”? Regards From alex.gaynor at gmail.com Fri Jul 30 00:48:17 2010 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Thu, 29 Jul 2010 17:48:17 -0500 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <1280443147.3175.70.camel@localhost.localdomain> References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> Message-ID: On Thu, Jul 29, 2010 at 5:39 PM, Antoine Pitrou wrote: > >> > I don't agree. __json__ only matters to people who do JSON >> > encoding/decoding. Other people can safely ignore it. >> >> Which is exactly the attitude I was talking about: for each individual >> case, people go "oh, I understand magic methods, those are easy". It's >> the overall process of identifying the need for and gathering >> consensus on magic methods that is unwieldy (and ultimately fails to >> scale, leading to non-extensible interfaces by default, with pretty >> printing being the classic example, and JSON serialisation the >> latest). > > Why do you want to gather consensus? There is a single json > serialization module in the stdlib and it's obvious that __json__ > can/should be claimed by that module.
> > Actually, your argument could be returned: if you use generic functions > (such as @json.dumps.overload), alternative json serializers won't > easily be able to make use of the information, while they could access > the __json__ method like the standard json module does. > >> Is it really harder for people to learn how to write things like: >> >>     json.dumps.overload(mytype, mytype.to_json) >>     json.dumps.overload(third_party_type, my_third_party_type_serialiser) > > It is certainly more annoying and less natural than: >    def __json__(self): .... > > Sure, generic functions as a paradigm appear more powerful, more > decoupled, etc. But in practice __magic__ methods are sufficient for > most uses. Practicality beats purity. > > That may be why in all the years that the various generic functions > libraries have existed, they don't seem to have been really popular > compared to the simpler convention of defining fixed method names. > > (besides, it wouldn't necessarily be json.dumps that you overload, but > some internal function of the json module; making it even less intuitive > and easily discoverable) > >> The generic function registration approach is incidentally >> discoverable via dir(json.dumps) to see that a function provides the >> relevant generic function registration methods. Magic method protocols >> can *only* be discovered by reading documentation. > > If help(json.dumps) includes a small blurb about __json__, it makes the > information at least as easily discoverable as invoking dir(json.dumps). > > Besides, I don't find it shocking if documentation problems have to be > solved through documentation. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > I'd be -1 on an __json__ method.
From the perspective of someone who works on Django, the big issue we have is how do you specify how a model (something from the database) should be serialized to JSON. Often people suggest something like __json__ that the Django serializer (which uses the json module) could pick up on, however this is usually rejected: objects tend to have multiple serializations based on context. Unlike pickle, which is usually used for internal consumption, json is usually intended for the wide world, and generally you want to expose different data to different clients. For example an event's json might include a list of attendees for an authenticated client, but an unauthenticated client should only see a list of titles. For this reason Django has always rejected such an approach, in favor of having a per-serialization specification. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me From dstanek at dstanek.com Fri Jul 30 01:12:57 2010 From: dstanek at dstanek.com (David Stanek) Date: Thu, 29 Jul 2010 19:12:57 -0400 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: On Thu, Jul 29, 2010 at 7:35 AM, Tarek Ziadé wrote: > Hello, > > What about adding in the json package the ability for an object to > provide a different object to serialize ? > This would be useful to translate a class into a structure that can be > passed to json.dumps > > So, if __json__ is provided, it's used for serialization instead of the > object itself: > >>>> import json >>>> class MyComplexClass(object): > ...     def __json__(self): > ...         return 'json' > ... >>>> o = MyComplexClass() >>>> json.dumps(o) > '"json"' > > > > Cheers > Tarek > In my experience serializing an object is usually not a concern of the object itself.
I do not want to have to touch every object in my system when I need an alternate format. The pattern I currently use is to hint, as a class-level tuple, the fields that should be serialized. django-piston has a good working example of this pattern. It becomes a bit unruly when you have a big object graph, but I typically keep my object models shallow. -- David blog: http://www.traceback.org twitter: http://twitter.com/dstanek From greg.ewing at canterbury.ac.nz Fri Jul 30 02:06:18 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 30 Jul 2010 12:06:18 +1200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: Message-ID: <4C52177A.7080906@canterbury.ac.nz> Mike Graham wrote: > Since there isn't really any magic going on, why use a __foo__ name? > The majority of __foo__ names are for things you shouldn't reference > yourself To my mind, the main reason is to avoid name clashes. Protocol methods often may need to be added to just about any class, and using a __foo__ name greatly reduces the chance of it coinciding with some pre-existing class-specific method. Anyway, you don't call it yourself in this case either -- it's called by the proposed json-serialising framework. -- Greg From greg.ewing at canterbury.ac.nz Fri Jul 30 02:33:40 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 30 Jul 2010 12:33:40 +1200 Subject: [Python-ideas] str.split with empty separator In-Reply-To: References: Message-ID: <4C521DE4.3060704@canterbury.ac.nz> Alexandre Conrad wrote: > What if str.split could take an empty separator? Do you have a use case for this? > Right now you can join from an empty string... > So why can't we split from an empty string? Because splitting on an empty string is ambiguous, and nobody has so far put forward a compelling use case that would show how the ambiguity should best be resolved. 
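Greg's "ambiguous" point can be made concrete: at least two reasonable semantics for an empty separator disagree, and each can be defended by analogy with existing behaviour. The helper names below are hypothetical, purely to illustrate the two readings.

```python
def split_empty_chars(s):
    # Reading 1: the empty string occurs *between* characters,
    # so splitting simply yields the characters.
    return list(s)

def split_empty_edges(s):
    # Reading 2: the empty string also matches at both ends, by analogy
    # with 'xx'.split('x') == ['', '', ''], so empty edge fields appear.
    return [''] + list(s) + ['']

print(split_empty_chars('banana'))  # ['b', 'a', 'n', 'a', 'n', 'a']
print(split_empty_edges('banana'))  # ['', 'b', 'a', 'n', 'a', 'n', 'a', '']
```

Both satisfy the `''.join(...)` round-trip, which is why the round-trip property alone cannot decide between them.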
-- Greg From raymond.hettinger at gmail.com Fri Jul 30 04:16:50 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 29 Jul 2010 19:16:50 -0700 Subject: [Python-ideas] str.split with empty separator In-Reply-To: <4C521DE4.3060704@canterbury.ac.nz> References: <4C521DE4.3060704@canterbury.ac.nz> Message-ID: <134FCBE5-2659-4C68-B958-36F9B660C77F@gmail.com> On Jul 29, 2010, at 5:33 PM, Greg Ewing wrote: > Alexandre Conrad wrote: > >> What if str.split could take an empty separator? I propose that the semantics of str.split() never be changed. It has been around for a long time and has a complex set of behaviors that people have come to rely on. For years, we've answered arcane questions about it and have made multiple revisions to the docs in a never-ending quest to precisely describe exactly what it does without just showing the underlying C code. Accordingly, existing uses depend mainly on what-it-does-as-implemented and less on the various ways it has been documented over the years. Almost any change to str.split() would either complexify the explanation of what it does or would change the behavior in a way that would break somebody's code (perhaps in subtle ways that are hard to detect). In my opinion, str.split() should never be touched again. Instead, it may be worthwhile to develop new splitters with precise semantics aimed at specific use cases. Raymond From python at mrabarnett.plus.com Fri Jul 30 04:41:35 2010 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 30 Jul 2010 03:41:35 +0100 Subject: [Python-ideas] str.split with empty separator In-Reply-To: <134FCBE5-2659-4C68-B958-36F9B660C77F@gmail.com> References: <4C521DE4.3060704@canterbury.ac.nz> <134FCBE5-2659-4C68-B958-36F9B660C77F@gmail.com> Message-ID: <4C523BDF.1070403@mrabarnett.plus.com> Raymond Hettinger wrote: > On Jul 29, 2010, at 5:33 PM, Greg Ewing wrote: > >> Alexandre Conrad wrote: >> >>> What if str.split could take an empty separator?
> > I propose that the semantics of str.split() never be changed. > > It has been around for a long time and has a complex set of behaviors > that people have come to rely on. For years, we've answered arcane > questions about it and have made multiple revisions to the docs in a > never ending quest to precisely describe exactly what it does without > just showing the C underlying code. Accordingly, existing uses depend > mainly on what-it-does-as-implemented and less on the various ways > it has been documented over the years. > > Almost any change to str.split() would either complexify the explanation > of what it does or would change the behavior in a way the would > break somebody's code (perhaps in a subtle ways that are hard to detect). > > In my opinion, str.split() should never be touched again. > Instead, it may be worthwhile to develop new splitters > with precise semantics aimed at specific use cases. > Does it really have a complex set of behaviours? The only (possibly) surprising behaviour for me is when it splits on whitespace (ie, passing it None as the separator). I find it very easy to understand. Or perhaps I'm just smarter than I thought! :-) From funbuggie at gmail.com Fri Jul 30 06:05:34 2010 From: funbuggie at gmail.com (Barend erasmus) Date: Fri, 30 Jul 2010 06:05:34 +0200 Subject: [Python-ideas] Help!!please Message-ID: Can someone help me and tell why this does not work. prys=0 tp=0 pr=0 f = open('C:\Documents and Settings\ZU1TN\Desktop\Nommers\K55.txt', 'r') pr=f.readline() prys =int(prys) tp =int(tp) pr =int(pr) tp=pr-prys f.close tp=str(tp) print tp raw_input() THX
From greg.ewing at canterbury.ac.nz Fri Jul 30 06:27:44 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 30 Jul 2010 16:27:44 +1200 Subject: [Python-ideas] str.split with empty separator In-Reply-To: <4C523BDF.1070403@mrabarnett.plus.com> References: <4C521DE4.3060704@canterbury.ac.nz> <134FCBE5-2659-4C68-B958-36F9B660C77F@gmail.com> <4C523BDF.1070403@mrabarnett.plus.com> Message-ID: <4C5254C0.70702@canterbury.ac.nz> On 30/07/10 14:41, MRAB wrote: > Does it really have a complex set of behaviours? I think Raymond may be referring to the fact that the behaviour of split() with and without a splitting string differs in subtle ways with certain edge cases. It's almost better thought of as two different functions that happen to share a name. -- Greg From raymond.hettinger at gmail.com Fri Jul 30 06:51:24 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 29 Jul 2010 21:51:24 -0700 Subject: [Python-ideas] str.split with empty separator In-Reply-To: <4C523BDF.1070403@mrabarnett.plus.com> References: <4C521DE4.3060704@canterbury.ac.nz> <134FCBE5-2659-4C68-B958-36F9B660C77F@gmail.com> <4C523BDF.1070403@mrabarnett.plus.com> Message-ID: <203D2D80-6B4E-4417-8069-3F4EF00C2D8D@gmail.com> On Jul 29, 2010, at 7:41 PM, MRAB wrote: > Raymond Hettinger wrote: >> On Jul 29, 2010, at 5:33 PM, Greg Ewing wrote: >>> Alexandre Conrad wrote: >>> >>>> What if str.split could take an empty separator? >> I propose that the semantics of str.split() never be changed. >> It has been around for a long time and has a complex set of behaviors that people have come to rely on. For years, we've answered arcane questions about it and have made multiple revisions to the docs in a >> never ending quest to precisely describe exactly what it does without just showing the C underlying code. Accordingly, existing uses depend >> mainly on what-it-does-as-implemented and less on the various ways >> it has been documented over the years.
Almost any change to str.split() would either complexify the explanation >> of what it does or would change the behavior in a way the would >> break somebody's code (perhaps in a subtle ways that are hard to detect). >> In my opinion, str.split() should never be touched again. Instead, it may be worthwhile to develop new splitters with precise semantics aimed at specific use cases. > Does it really have a complex set of behaviours? The only (possibly) > surprising behaviour for me is when it splits on whitespace (ie, passing > it None as the separator). I find it very easy to understand. Or perhaps > I'm just smarter than I thought! :-) Past bug reports and newsgroup discussions covered a variety of misunderstandings: * completely different algorithm when separator is None * behavior when separator is multiple characters (i.e. set of possible splitters vs an aggregate splitter either with or without overlaps). * behavior when maxsplit is zero * behavior when string begins or ends with whitespace * which characters count as whitespace * behavior when a string begins or ends with a split character * when runs of splitters are treated as a single splitter * behavior of a zero-length splitter * conditions under which x.join(s.split(x)) roundtrips * algorithmic difference from re.split() * are there invariants between s.count(x) and len(s.split(x)) so that you can correctly predict the number of fields returned It was common that people thought str.split() was easy to understand until a corner case arose that defied their expectations. When the experts chimed-in, it became clear that almost no one in those discussions had a clear understanding of exactly what the implemented behaviors were and it was common to resort to experiment to disprove various incorrect hypotheses. We revised the docs several times and added a number of examples and now have a pretty good description that took years to get right. 
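Several of the corner cases in Raymond's list are easy to demonstrate against the real implementation:

```python
s = '  one  two  '

# None separator: runs of whitespace collapse, and leading/trailing
# whitespace is dropped entirely.
print(s.split())        # ['one', 'two']

# Explicit separator: every single occurrence splits, and empty fields
# at the edges and between runs are kept.
print(s.split(' '))     # ['', '', 'one', '', 'two', '', '']

# A multi-character separator is one aggregate token, not a set of
# possible split characters.
print('a<>b<c'.split('<>'))    # ['a', 'b<c']

# maxsplit interacts with whitespace mode: leading whitespace is still
# stripped, but the unsplit remainder is returned untouched.
print(' a b '.split(None, 1))  # ['a', 'b ']

# The x.join(s.split(x)) round-trip holds for explicit separators only.
print(' '.join(s.split(' ')) == s)  # True
print(' '.join(s.split()) == s)     # False
```

The two separator modes really do behave like two different functions sharing a name, which is the point Greg makes below about subtle edge-case differences.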
Even now, it might be a good idea to validate the docs by seeing if someone can use the documentation text to write a pure python version of str.split() that behaves exactly like the real thing (including all corner cases). Even if you find all of the above to be easy and intuitive, I still think it wise that we not add to complexity of str.split() with new or altered behaviors. Raymond From pyideas at rebertia.com Fri Jul 30 07:28:48 2010 From: pyideas at rebertia.com (Chris Rebert) Date: Thu, 29 Jul 2010 22:28:48 -0700 Subject: [Python-ideas] Help!!please In-Reply-To: References: Message-ID: On Thu, Jul 29, 2010 at 9:05 PM, Barend erasmus wrote: > Can someone help me and tell why this does not work. > > prys=0 > tp=0 > pr=0 > f = open('C:\Documents and Settings\ZU1TN\Desktop\Nommers\K55.txt', 'r') > pr=f.readline() > prys =int(prys) > tp =int(tp) > pr =int(pr) > tp=pr-prys > f.close > tp=str(tp) > print tp > raw_input() > > THX Your post is off-topic for this mailinglist. This mailinglist (python-ideas) is for proposing/discussing ideas for improving/modifying the Python language. For general discussion and questions about using Python, please post to python-list/comp.lang.python instead. It is accessible from either: http://mail.python.org/mailman/listinfo/python-list http://groups.google.com/group/comp.lang.python/topics Further, your question is quite vague in not stating *how* the code isn't working. 
You may wish to read the following guidance before posting to python-list: http://catb.org/esr/faqs/smart-questions.html Finally, here's a less redundant version of your code with two obvious errors fixed: prys = 0 # Windows file paths either use / or \\ or raw string literals f = open('C:/Documents and Settings/ZU1TN/Desktop/Nommers/K55.txt', 'r') pr = f.readline() pr = int(pr) tp = pr - prys # which simplifies to: tp = pr f.close() # you were missing the parens print tp raw_input() Not offering cheers, Chris -- http://blog.rebertia.com From alexandre.conrad at gmail.com Fri Jul 30 10:28:10 2010 From: alexandre.conrad at gmail.com (Alexandre Conrad) Date: Fri, 30 Jul 2010 10:28:10 +0200 Subject: [Python-ideas] str.split with empty separator In-Reply-To: <4C51AC3E.8080201@mrabarnett.plus.com> References: <4C51AC3E.8080201@mrabarnett.plus.com> Message-ID: 2010/7/29 MRAB : > Shouldn't it be this: > >>>> 'banana'.split('') > ['', 'b', 'a', 'n', 'a', 'n', 'a', ''] Humm... I believe that it may be correct. It's not what I was expecting, but it does look accurate. -- Alex twitter.com/alexconrad From ncoghlan at gmail.com Fri Jul 30 12:39:13 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 30 Jul 2010 20:39:13 +1000 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <4C5204BE.8020805@netwok.org> References: <20100729151116.219d6765@pitrou.net> <4C5204BE.8020805@netwok.org> Message-ID: On Fri, Jul 30, 2010 at 8:46 AM, ?ric Araujo wrote: > Thank you for explaining generic functions so clearly. > > Is there a good module out there implementing them without ?crazily > complex overloading schemes?? I'm not sure. Most of my exposure to generic functions is through PJE and he's a big fan of pushing them to their limits (hence RuleDispatch and PEAK-rules). 
There is an extremely bare bones implementation used internally by pkgutil's emulation of the standard import process, but Guido has said we shouldn't document or promote that in any official way without a PEP (cf. the simple proposal in http://bugs.python.org/issue5135 and PJE's previous more comprehensive proposal in PEP 3124). As others have explained more clearly than I did, generic functions work better than magic methods when the same basic operation (e.g. pretty printing, JSON serialisation) is common to many object types, but the details may vary between applications, or even within a single application. By giving the application more control over how different types are handled (through the generic functions' separate type registries) it is much easier to have context dependent behaviour, while still fairly easily sharing code in cases where it makes sense. E.g. to use Alex Gaynor's example of attendee list serialisation and the issue 5135 syntax: @functools.simplegeneric def json_unauthenticated(obj): return json.dumps(obj) # Default to a basic dumps() call @functools.simplegeneric def json_authenticated(obj): return json_unauthenticated(obj) # Default to being the same as unauthenticated info @json_unauthenticated.register(EventAttendees) def attendee_titles(attendees): return json.dumps([attendee.title for attendee in attendees]) @json_authenticated.register(EventAttendees) def attendee_details(attendees): return json.dumps([attendee.full_details() for attendee in attendees]) (Keep in mind that I don't use JSON, so there are likely plenty of details wrong with the above, but it should give a basic idea of what generic functions are designed to support). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From 8mayday at gmail.com Fri Jul 30 12:42:18 2010 From: 8mayday at gmail.com (Andrey Popp) Date: Fri, 30 Jul 2010 14:42:18 +0400 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <4C5204BE.8020805@netwok.org> References: <20100729151116.219d6765@pitrou.net> <4C5204BE.8020805@netwok.org> Message-ID: You can check out my implementation of generic functions and methods in Python [1]. There are no byte code hacks, no frame introspection, support for function and method dispatching by one or more positional arguments. [1]: pypi.python.org/pypi/generic On Fri, Jul 30, 2010 at 2:46 AM, Éric Araujo wrote: > Thank you for explaining generic functions so clearly. > > Is there a good module out there implementing them without “crazily > complex overloading schemes”? > > Regards > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Andrey Popp phone: +7 911 740 24 91 e-mail: 8mayday at gmail.com From merwok at netwok.org Fri Jul 30 14:06:54 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 30 Jul 2010 14:06:54 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> <4C5204BE.8020805@netwok.org> Message-ID: <4C52C05E.5080209@netwok.org> > There is an extremely bare bones implementation used internally by > pkgutil's emulation of the standard import process Ah, I stumbled upon that this week actually, but did not understand how it worked nor why it was useful since there’s only one decorated function and only one registered type. Thanks for pointing it, I may play with it to get a better understanding and see the possibilities.
Regards From merwok at netwok.org Fri Jul 30 14:08:52 2010 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 30 Jul 2010 14:08:52 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> <4C5204BE.8020805@netwok.org> Message-ID: <4C52C0D4.5090801@netwok.org> Thanks Andrey, I’ll play with it when I’ll take time to dive into generic functions. Regards From greg.ewing at canterbury.ac.nz Sat Jul 31 03:17:41 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 31 Jul 2010 13:17:41 +1200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <1280443147.3175.70.camel@localhost.localdomain> References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> Message-ID: <4C5379B5.6030806@canterbury.ac.nz> Antoine Pitrou wrote: > Sure, generic functions as a paradigm appear more powerful, more > decoupled, etc. In this case there's a sense in which using a generic function could be seen as *increasing* coupling. Suppose I write a class Foo, and as a convenience to my users, I want to give it the ability to be json-serialised. If that is done using a generic function, then I need to put a call in my module to register it. But that makes my module dependent on the json-serialising module, even for applications which don't use json at all. The alternative is just to provide the function but don't register it. But using that approach, every application that *does* use json would be responsible for registering all the functions for all the classes that need to be serialised, including those in library modules that it may not be directly aware of. This doesn't seem like a good situation either.
-- Greg From ncoghlan at gmail.com Sat Jul 31 03:31:49 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 31 Jul 2010 11:31:49 +1000 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <4C5379B5.6030806@canterbury.ac.nz> References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> <4C5379B5.6030806@canterbury.ac.nz> Message-ID: On Sat, Jul 31, 2010 at 11:17 AM, Greg Ewing wrote: > Suppose I write a class Foo, and as a convenience to my users, > I want to give it the ability to be json-serialised. If that > is done using a generic function, then I need to put a call > in my module to register it. But that makes my module dependent > on the json-serialising module, even for applications which > don't use json at all. > > The alternative is just to provide the function but don't > register it. But using that approach, every application that > *does* use json would be responsible for registering all the > functions for all the classes that need to be serialised, > including those in library modules that it may not be directly > aware of. This doesn't seem like a good situation either. Hence why most generic function proposals are accompanied by proposals for lazy module import hooks (i.e. delaying the registration until the relevant module is imported). However, the simpler approach is just to recommend that single-dispatch generic functions default to a particular method. "magic method" vs "generic function" isn't actually an either-or decision: it is quite possible to have the latter rely on the former in its default "unrecognised type" implementation, while still providing the type registration infrastructure that allows an application to say "no, I don't want that behaviour in this case, I want to do something different". 
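Nick's point that "magic method vs generic function" is not either-or can be sketched concretely: a registry whose *default* behaviour is to consult `__json__`, so library classes get convenient behaviour while applications can still override it. All names here (`register`, `dumps`, `Point`) are hypothetical illustrations, not a proposed API; the registration half is essentially what `functools.singledispatch` later standardised in Python 3.4.

```python
import json

_registry = {}  # type -> function returning something json.dumps understands

def register(cls, func):
    _registry[cls] = func

def dumps(obj):
    # Application registrations win; the __json__ magic method is
    # merely the default behaviour for unregistered types.
    for cls in type(obj).__mro__:
        if cls in _registry:
            return json.dumps(_registry[cls](obj))
    if hasattr(obj, '__json__'):
        return json.dumps(obj.__json__())
    return json.dumps(obj)

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __json__(self):
        return {'x': self.x, 'y': self.y}

print(dumps(Point(1, 2)))              # falls back to __json__
register(Point, lambda p: [p.x, p.y])  # the application decides otherwise
print(dumps(Point(1, 2)))              # now uses the registration
```

This also answers Greg's coupling concern in part: a library class only needs to define `__json__`, and never has to import or register with the serialising module itself.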
To be honest, there are actually some more features I would want to push for in ABCs (specifically, a public API to view an ABC's type registry, as well as a callback API to be notified of registration changes) before seriously proposing an official generic function implementation in the standard library. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From cs at zip.com.au Sat Jul 31 07:59:12 2010 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 31 Jul 2010 15:59:12 +1000 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <4C52177A.7080906@canterbury.ac.nz> References: <4C52177A.7080906@canterbury.ac.nz> Message-ID: <20100731055912.GA10882@cskk.homeip.net> I'm uncomfortable with the __foo__ style proposed. Details and "what I would do" below. On 30Jul2010 12:06, Greg Ewing wrote: | Mike Graham wrote: | >Since there isn't really any magic going on, why use a __foo__ name? | >The majority of __foo__ names are for things you shouldn't reference | >yourself | | To my mind, the main reason is to avoid name clashes. Protocol | methods often may need to be added to just about any class, | and using a __foo__ name greatly reduces the chance of it | coinciding with some pre-existing class-specific method. Might not the adder of a class specific method make the same argument? If they really want a class _specific_ method, ought they not to be using the __foo style, thus avoiding clashes anyway? The __json__ name makes me uncomfortable; to my mind __foo__ names belong to the language in order to implement/override stuff like [], not to a library hook. | Anyway, you don't call it yourself in this case either -- it's | called by the proposed json-serialising framework. I'm curious; what's the special benefit to JSON here? I don't mean JSON is unpopular or horrible, but I can see people going to a __xml__ hook for a proposed XML serialisation framework, and __sql__ for db storage, and ...
I'm doing a little serialisation myself for another purpose. My code gets classes that want serialisation to register themselves with the serialisation module thus: # DB is a NodeDB instance, which can store various objects DB.register_type(class, tobytes, frombytes) where class is the class desiring special serialisation and tobytes and frombytes are callables; tobytes takes an instance of the class and returns the byte serialisation and frombytes does the reverse. No special names needed and no __foo__ special name reservation. Why wouldn't one just extend the json module with a "serialise this" and "unserialise this" type registry? Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ You can listen to what everybody says, but the fact remains that you've got to get out there and do the thing yourself. - Joan Sutherland From ziade.tarek at gmail.com Sat Jul 31 13:50:16 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 31 Jul 2010 13:50:16 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> <4C5379B5.6030806@canterbury.ac.nz> Message-ID: On Sat, Jul 31, 2010 at 3:31 AM, Nick Coghlan wrote: ... > > To be honest, there are actually some more features I would want to > push for in ABCs (specifically, a public API to view an ABC's type > registry, as well as a callback API to be notified of registration > changes) before seriously proposing an official generic function > implementation in the standard library. funny hazard, I was proposing to PEP 3319 authors about having the _abc_registry attribute somehow exposed. do you have an idea on how this could be done without forcing ABC subclasses to have a new public method ? Maybe a separate function ? like >>> from abc import get_registry >>> get_registry(MyAbc) -- Tarek Ziad? 
| http://ziade.org From fuzzyman at voidspace.org.uk Sat Jul 31 14:08:14 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 31 Jul 2010 13:08:14 +0100 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> <4C5204BE.8020805@netwok.org> Message-ID: On 30 July 2010 11:39, Nick Coghlan wrote: > On Fri, Jul 30, 2010 at 8:46 AM, Éric Araujo wrote: > > Thank you for explaining generic functions so clearly. > > > > Is there a good module out there implementing them without "crazily > > complex overloading schemes"? > > I'm not sure. Most of my exposure to generic functions is through PJE > and he's a big fan of pushing them to their limits (hence RuleDispatch > and PEAK-rules). > > There is an extremely bare-bones implementation used internally by > pkgutil's emulation of the standard import process, but Guido has said > we shouldn't document or promote that in any official way without a > PEP (cf. the simple proposal in http://bugs.python.org/issue5135 and > PJE's previous, more comprehensive proposal in PEP 3124). > > As others have explained more clearly than I did, generic functions > work better than magic methods when the same basic operation (e.g. > pretty printing, JSON serialisation) is common to many object types, > but the details may vary between applications, or even within a single > application. By giving the application more control over how different > types are handled (through the generic functions' separate type > registries) it is much easier to have context-dependent behaviour, > while still fairly easily sharing code in cases where it makes sense. > > E.g.
to use Alex Gaynor's example of attendee list serialisation and > the issue 5135 syntax: > > @functools.simplegeneric > def json_unauthenticated(obj): > return json.dumps(obj) # Default to a basic dumps() call > > @functools.simplegeneric > def json_authenticated(obj): > return json_unauthenticated(obj) # Default to being the same as > unauthenticated info > > @json_unauthenticated.register(EventAttendees) > def attendee_titles(attendees): > return json.dumps([attendee.title for attendee in attendees]) > > @json_authenticated.register(EventAttendees) > def attendee_details(attendees): > return json.dumps([attendee.full_details() for attendee in attendees]) > > I really like Alex Gaynor's simple MultiMethod implementation. From: http://alexgaynor.net/2010/jun/26/multimethods-python/ It doesn't have a concept of a default call, but that would be very easy to add. Basic usage is: json_unauthenticated = MultiMethod() @json_unauthenticated.register(EventAttendees) def json_unauthenticated(attendees): return json.dumps([attendee.title for attendee in attendees]) @json_unauthenticated.register(OtherType) def json_unauthenticated(othertypes): return json.dumps(othertypes) And so on. Michael > (Keep in mind that I don't use JSON, so there are likely plenty of > details wrong with the above, but it should give a basic idea of what > generic functions are designed to support). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed...
URL: From solipsis at pitrou.net Sat Jul 31 14:10:58 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 Jul 2010 14:10:58 +0200 Subject: [Python-ideas] Json object-level serializer References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> <4C5379B5.6030806@canterbury.ac.nz> Message-ID: <20100731141058.311766a0@pitrou.net> On Sat, 31 Jul 2010 11:31:49 +1000 Nick Coghlan wrote: > > However, the simpler approach is just to recommend that > single-dispatch generic functions default to a particular method. > "magic method" vs "generic function" isn't actually an either-or > decision: it is quite possible to have the latter rely on the former > in its default "unrecognised type" implementation, while still > providing the type registration infrastructure that allows an > application to say "no, I don't want that behaviour in this case, I > want to do something different". Ah, right. That sounds much more appealing. > To be honest, there are actually some more features I would want to > push for in ABCs (specifically, a public API to view an ABC's type > registry, as well as a callback API to be notified of registration > changes) before seriously proposing an official generic function > implementation in the standard library. Would be nice, indeed. Regards Antoine. From solipsis at pitrou.net Sat Jul 31 14:12:07 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 Jul 2010 14:12:07 +0200 Subject: [Python-ideas] Json object-level serializer References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> <4C5379B5.6030806@canterbury.ac.nz> Message-ID: <20100731141207.242601e7@pitrou.net> On Sat, 31 Jul 2010 13:50:16 +0200 Tarek Ziadé wrote: > On Sat, Jul 31, 2010 at 3:31 AM, Nick Coghlan wrote: > ...
> > > > To be honest, there are actually some more features I would want to > > push for in ABCs (specifically, a public API to view an ABC's type > > registry, as well as a callback API to be notified of registration > > changes) before seriously proposing an official generic function > > implementation in the standard library. > > Funny coincidence: I was proposing to the PEP 3119 authors that the > _abc_registry attribute be > somehow exposed. Rather than exposing the registry object itself (which is an implementation detail), how about exposing lookup operations on this registry? Regards Antoine. From fuzzyman at voidspace.org.uk Sat Jul 31 14:15:13 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 31 Jul 2010 13:15:13 +0100 Subject: [Python-ideas] PEP 3151: Reworking the OS and IO exception hierarchy In-Reply-To: References: <1279740852.3222.38.camel@localhost.localdomain> Message-ID: On 24 July 2010 22:31, Gregory P. Smith wrote: > > On Wed, Jul 21, 2010 at 12:34 PM, Antoine Pitrou wrote: >> >> Hello, >> >> I would like to propose the following PEP for feedback and review. >> Permanent link to up-to-date version with proper HTML formatting: >> http://www.python.org/dev/peps/pep-3151/ >> >> Thank you, >> >> Antoine. > > [...] > +1 on this whole PEP! > > +1 from me too. Michael > The EnvironmentError hierarchy and common errno test code has bothered me > for a while. While I think the namespace pollution concern is valid, I would > suggest adding "Error" to the end of all of the names (your initial proposal > only says "Error" on the end of one of them) as that is consistent with the > bulk of the existing standard exceptions and warnings. They are unlikely to > conflict with anything other than exceptions people have already defined > themselves in any existing code (which could likely be refactored out after > we officially define these).
> > > >> Earlier discussion >> ================== >> >> While this is the first time such a formal proposal is made, the idea >> has received informal support in the past [1]_; both the introduction >> of finer-grained exception classes and the coalescing of OSError and >> IOError. >> >> The removal of WindowsError alone has been discussed and rejected >> as part of another PEP [2]_, but there seemed to be a consensus that the >> distinction with OSError wasn't meaningful. This supports at least its >> aliasing with OSError. >> >> >> Moratorium >> ========== >> >> The moratorium in effect on language builtins means this PEP has little >> chance to be accepted for Python 3.2. >> >> >> Possible alternative >> ==================== >> >> Pattern matching >> ---------------- >> >> Another possibility would be to introduce an advanced pattern matching >> syntax when catching exceptions. For example:: >> >> try: >> os.remove(filename) >> except OSError as e if e.errno == errno.ENOENT: >> pass >> >> Several problems with this proposal: >> >> * it introduces new syntax, which is perceived by the author to be a >> heavier >> change compared to reworking the exception hierarchy >> * it doesn't decrease typing effort significantly >> * it doesn't relieve the programmer from the burden of having to remember >> errno mnemonics >> > > ugh. no. :) That only works well for single exceptions and encourages > less explicit exception types. Exceptions are a class hierarchy; we should > encourage its use rather than encouraging magic type-specific attributes > with conditionals. > > -gps > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- http://www.voidspace.org.uk -------------- next part -------------- An HTML attachment was scrubbed...
URL: From fetchinson at googlemail.com Sat Jul 31 16:24:05 2010 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Sat, 31 Jul 2010 16:24:05 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: References: <20100729151116.219d6765@pitrou.net> Message-ID: >>> Each individual time this question comes up people tend to react with >>> "oh, that's too complicated and overkill, but magic methods are >>> simple, so let's just define another magic method". The sum total of >>> all those magic methods starts to accumulate into a lot of complexity >>> of its own though :P >> >> I don't agree. __json__ only matters to people who do JSON >> encoding/decoding. Other people can safely ignore it. > > Which is exactly the attitude I was talking about: for each individual > case, people go "oh, I understand magic methods, those are easy". It's > the overall process of identifying the need for and gathering > consensus on magic methods that is unwieldy (and ultimately fails to > scale, leading to non-extensible interfaces by default, with pretty > printing being the classic example, and JSON serialisation the > latest). > >> And I don't see how generic functions bring less cognitive overhead. >> (they actually bring more of it, since most implementations are more >> complicated to begin with) > > Mostly because the fully fledged generic implementations like > PEAK-rules tend to get brought into discussions when they aren't > needed. Single-type generic dispatch is actually so common they gave > it a name: object-oriented programming. All single-type generic > dispatch is about is having a registry for a particular operation that > says "to perform this operation, with objects of this type, use this > function". 
Instead of having a protocol that says "look up this magic > method in the object's own namespace" (which requires a) agreement on > the magic name to use and b) that the original author of the type in > question both knew and cared about the operation the application > developer is interested in) you instead have a protocol that says > "here is a standard mechanism for declaring a type registry for a > function, so you only have to learn how to register a function once". > > Is it really harder for people to learn how to write things like: > > json.dumps.overload(mytype, mytype.to_json) > json.dumps.overload(third_party_type, my_third_party_type_serialiser) > > than it is for them to figure out that implementing a __json__ method > will allow them to change how their object is serialised? (Not to > mention that a __json__ method can only be used via monkey-patching if > the type you want to serialise differently came from a library module > rather than your own code). > > The generic function registration approach is incidentally > discoverable via dir(json.dumps) to see that a function provides the > relevant generic function registration methods. Magic method protocols > can *only* be discovered by reading documentation. > > Function registration is a solved problem, with much better solutions > than the ad hoc YAMM (yet-another-magic-method) approach we currently > use. We just keep getting scared away from the right answer by the > crazily complex overloading schemes that libraries like PEAK-rules > allow. +1 on your entire diagnosis. If peak-rules is too complicated and perhaps unmaintained, then the focus should be on cooking up a better generic function library. The complaints against peak-rules come up frequently enough, and this shows that there is a need for a generic function library because people do use peak-rules.
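(The `overload`-style registration quoted above is easy to prototype. A minimal single-dispatch sketch -- all names here are hypothetical; no such API exists on the real json module:)

```python
def generic(default):
    # Minimal single-dispatch generic function: a closure holding a
    # type -> implementation registry, with the decorated function as
    # the fallback for unregistered types.
    registry = {}

    def wrapper(obj, *args, **kwargs):
        # Walk the MRO so subclasses inherit registrations.
        for cls in type(obj).__mro__:
            if cls in registry:
                return registry[cls](obj, *args, **kwargs)
        return default(obj, *args, **kwargs)

    def overload(cls, func=None):
        # Usable both as wrapper.overload(SomeType, func)
        # and as a decorator: @wrapper.overload(SomeType)
        if func is None:
            def decorate(f):
                registry[cls] = f
                return f
            return decorate
        registry[cls] = func
        return func

    wrapper.overload = overload
    return wrapper

@generic
def dumps(obj):
    return repr(obj)  # fallback for unregistered types

class Attendee:
    def __init__(self, title):
        self.title = title

dumps.overload(Attendee, lambda a: a.title)

print(dumps(Attendee("Dr")))  # -> Dr
print(dumps(42))              # -> 42 (falls back to repr)
```

Note this sketch dispatches on the concrete MRO only, so -- exactly as discussed in this thread -- it would not see ABC registrations.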
The problem is not with the concept but only (perhaps) with this particular implementation (disclaimer: I'm perfectly happy with peak-rules). Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From ncoghlan at gmail.com Sat Jul 31 17:26:20 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 1 Aug 2010 01:26:20 +1000 Subject: [Python-ideas] Exposing the ABC registration graph (was Re: Json object-level serializer) Message-ID: On Sat, Jul 31, 2010 at 10:12 PM, Antoine Pitrou wrote: > On Sat, 31 Jul 2010 13:50:16 +0200 > Tarek Ziadé wrote: >> On Sat, Jul 31, 2010 at 3:31 AM, Nick Coghlan wrote: >> > To be honest, there are actually some more features I would want to >> > push for in ABCs (specifically, a public API to view an ABC's type >> > registry, as well as a callback API to be notified of registration >> > changes) before seriously proposing an official generic function >> > implementation in the standard library. >> >> Funny coincidence: I was proposing to the PEP 3119 authors that the >> _abc_registry attribute be >> somehow exposed. > > Rather than exposing the registry object itself (which is an > implementation detail), how about exposing lookup operations on this > registry? There's a related problem here that ties into one of the complaints I have with pkgutil.simplegeneric: because that decorator relies on MRO traversal in order to obtain a reasonably efficient implementation, it completely ignores any ABC registrations. That's fairly suboptimal, since a comparable chain of "isinstance()" checks *will* respect ABC registrations (it's just horrendously slow and doesn't scale, since the worst-case number of checks increases linearly with the number of branches in the if-elif chain). So I think the idea of query methods in the abc module is a good way to go.
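(The mismatch described here is easy to demonstrate: isinstance() consults the ABC registry, while anything that walks type(obj).__mro__ never sees a virtual base. A short, self-contained illustration, using Python 3 metaclass syntax; the class names are made up:)

```python
import abc

class Serializable(metaclass=abc.ABCMeta):
    pass

class Plain:
    pass

# Virtual subclass registration: no inheritance involved.
Serializable.register(Plain)

p = Plain()
print(isinstance(p, Serializable))      # -> True: honours the registry
print(Serializable in type(p).__mro__)  # -> False: MRO dispatch misses it
```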
It allows the Python implementation freedom in choosing whether to have separate type registries stored on the ABCs themselves, or instead have global registries stored in the abc module. In particular, it allows the interpreter to cache the transitive closure of the ABC graph, such that an application can ask for the set of all objects that implement a given ABC, as well as the set of all ABCs that a given object implements. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ziade.tarek at gmail.com Sat Jul 31 18:17:42 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 31 Jul 2010 18:17:42 +0200 Subject: [Python-ideas] Json object-level serializer In-Reply-To: <20100731141207.242601e7@pitrou.net> References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> <4C5379B5.6030806@canterbury.ac.nz> <20100731141207.242601e7@pitrou.net> Message-ID: On Sat, Jul 31, 2010 at 2:12 PM, Antoine Pitrou wrote: > On Sat, 31 Jul 2010 13:50:16 +0200 > Tarek Ziadé wrote: > >> On Sat, Jul 31, 2010 at 3:31 AM, Nick Coghlan wrote: >> ... >> > >> > To be honest, there are actually some more features I would want to >> > push for in ABCs (specifically, a public API to view an ABC's type >> > registry, as well as a callback API to be notified of registration >> > changes) before seriously proposing an official generic function >> > implementation in the standard library. >> >> Funny coincidence: I was proposing to the PEP 3119 authors that the >> _abc_registry attribute be >> somehow exposed. > > Rather than exposing the registry object itself (which is an > implementation detail), how about exposing lookup operations on this > registry? Sure, but how? Global functions?
From solipsis at pitrou.net Sat Jul 31 19:53:38 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 Jul 2010 19:53:38 +0200 Subject: [Python-ideas] Json object-level serializer References: <20100729151116.219d6765@pitrou.net> <1280443147.3175.70.camel@localhost.localdomain> <4C5379B5.6030806@canterbury.ac.nz> <20100731141207.242601e7@pitrou.net> Message-ID: <20100731195338.4dc95c24@pitrou.net> On Sat, 31 Jul 2010 18:17:42 +0200 Tarek Ziadé wrote: > > > > Rather than exposing the registry object itself (which is an > > implementation detail), how about exposing lookup operations on this > > registry? > > Sure, but how? Global functions? Functions of the abc module, yes.
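(One possible shape for such lookup functions, probing a known set of ABCs through the public isinstance() machinery rather than touching any private registry. A sketch only: the helper name is hypothetical, and a real abc-module API could answer this directly from the registry instead of probing. Written with the modern collections.abc location; at the time of this thread these ABCs lived in the collections module itself:)

```python
from collections import abc as cabc

def implemented_abcs(obj, candidates=(cabc.Hashable, cabc.Iterable,
                                      cabc.Sized, cabc.Container)):
    # Hypothetical helper: report which of a known set of ABCs the
    # object satisfies. isinstance() consults each ABC's registry and
    # __subclasshook__, so virtual subclasses are included.
    return {a for a in candidates if isinstance(obj, a)}

print(sorted(a.__name__ for a in implemented_abcs([1, 2, 3])))
# -> ['Container', 'Iterable', 'Sized']  (lists are unhashable)
print(sorted(a.__name__ for a in implemented_abcs(42)))
# -> ['Hashable']
```

The limitation is visible right away: without access to the registry itself, the caller must already know which ABCs to ask about, which is exactly the gap the proposed abc-module functions would close.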