From oliphant at enthought.com Tue Jun 1 00:54:59 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 31 May 2010 23:54:59 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: > Since Travis seems to want to take back control of scipy.stats, I am > considering my role as inofficial maintainer as ended. Obviously I've offended you. That has never been my intent. I apologize if my enthusiasm for getting some changes that I wanted to see into SciPy stepped on an area you felt ownership of. I do not mind if people add changes to code that I've written and I assume that others feel the same. That has always been the development mode of SciPy. We clearly have different development styles. I think we can find a way to work together. I think the move to github will help. I did not understand that you felt such ownership of scipy.stats. I have certainly appreciated your input. I do like a more "free-wheeling" style to code development than one that is bogged down with "rules" and "procedures". This clearly is not your style. For me, it comes down to time to spend. I love working on SciPy and NumPy. I don't have a lot of time to do it. When I see quick changes I can make that add value I like to be able to do it. I think we both want the same thing while we may disagree about the best way to get there. In my mind, discussion doesn't end when a check-in is made --- it just begins. You should never interpret my checking something in as the final word. We clearly have a different view of "trunk" I certainly don't want my approach to open source development to offend others or chase them away. If I check in something you don't like, then tell me and let's talk about it. If you need to vent and call me names, a private email to me or others can go a long way. What do we need to do to keep you around? Is there specifically something you didn't like about my recent check-ins? In this case, the features added were not terribly extensive. The current unit tests helped ferret out major problems. Yes, I could write more tests and documentation, and you have been a model of writing tests and documentation. I have been particularly impressed by the amount of quality documentation you have written. While you seem to dismiss the episode as problematic, I actually think curve_fit was a good example of how something very positive can emerge quickly when people are open and willing to work together. While formal, strict test-driven development is easy to point to for salvation -- it does have its costs. I've always used informal test-driven development. Just because I don't *always* add formal unit tests for every piece of code written does not mean the code that is currently in SciPy is un-tested and useless. Such an approach leaves me open to criticism, which I acknowledge. But, I think there have been far too many dismissive comments about the state of the code. I would argue that the problem with scipy.stats does not lie mainly in distributions.py or the lack of test-driven-development --- but in the lack of certain easy to use features. Quality code comes out of people who care --- not out of procedure. I think you are someone who cares and your code reflects that. We would all benefit from your staying part of the main development. 
Sincere regards, -Travis From oliphant at enthought.com Tue Jun 1 01:02:42 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 00:02:42 -0500 Subject: [SciPy-Dev] Development process (was: scipy.stats) In-Reply-To: References: Message-ID: On May 31, 2010, at 9:09 PM, Matthew Brett wrote: > Hi David, > >> I'm sure you mean (for it's the nature of this list, is it not, that we all >> have free rein to be as diplomatic, or not, as we wish) something along the >> lines of: if you tone it down a bit and make it less personal, i.e., be a >> bit more 'diplomatic,' then people are more likely to take you seriously. > > Sorry - I should have replied to the earlier thread after starting this one. > > I think that is indeed Charles' point, that the best thing to do, is > to identify the general problem, where the problem does not start with > 'if only X would not ...' but is more on the lines of 'there must be > a problem in our process because the following things happen fairly > often ... ' > > That's what I am trying to do with this thread. I think we have > structural problem in organization, where it is not clear what the > process for code maintenance is. I think many people believe that we > need such a process, but, given we do not have one, it is inevitable > that things like this (significant portions of untested code suddenly > appearing in trunk) are going to happen. > > What we need is a ) agreement that there is problem and b) an idea of > how to go forward. > > I think it's also obvious that that conversation has to happen in > public and on record so we can all have our say and agree. I'm sure > it's possible to do that. > > And - Travis (sorry - I am sure you are doing more enjoyable things > for Memorial day) - of course it's essential that you join in with and > / or lead that conversation. > How many people interested in this discussion will be at SciPy this year? It may be a good idea to have a discussion about this at the conference. We could phone conference others in as well so that every voice can be heard. I do think we need to address this issue. I did not realize I was offending people with my enthusiasm for having a chance to work on SciPy. I have always resisted too much "procedure" and "policy" so that it becomes difficult for people to contribute. I really think technology changes and DVCS can help with this process. -Travis From oliphant at enthought.com Tue Jun 1 01:20:41 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 00:20:41 -0500 Subject: [SciPy-Dev] [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> Message-ID: <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> On May 31, 2010, at 6:39 AM, Ralf Gommers wrote: > > > On Sun, May 30, 2010 at 5:38 AM, wrote: > On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant wrote: > > > > Hey Josef, > > > > I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution). > > I would like to get the private _logpdf in a useful (vectorized or > broadcastable) version because for estimation and optimization, I want > to avoid the logpdf overhead. So, my testing will be on the underline > versions. > > > > > I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet. 
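For concreteness, here is a small sketch of what the new log-methods being discussed buy numerically. It assumes the public names logpdf/logsf mentioned above (the names scipy.stats ships today) and is only an illustration, not the committed code:

    import numpy as np
    from scipy import stats

    x = 40.0
    # Naive route: this far out in the tail the pdf underflows to 0.0,
    # so taking the log afterwards gives -inf.
    naive = np.log(stats.norm.pdf(x))   # -inf

    # The dedicated method works in log space and stays finite, which is
    # what matters for maximum-likelihood estimation on extreme data.
    direct = stats.norm.logpdf(x)       # approximately -800.92
    print(naive, direct)

This is also why a vectorized private _logpdf matters for estimation: an optimizer that evaluates the log-likelihood many times should not pay the overhead of the public wrapper.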
> > I have written a semi-frozen fit function and posted to the mailing > list a long time ago, but since I'm not sure about the API and I'm > expanding to several new estimators, I kept this under > work-in-progress. > > Similar _fitstart might need extra options, for estimation when some > parameters are fixed, e.g. there are good moment estimators that work > when some of the parameters (e.g. loc or scale) are fixed. Also > _fitstart is currently used only by my fit_frozen. > > I was hoping to get this done this year, maybe together with the > enhancements that Per Brodtkorb proposed two years ago, e.g. Method of > Maximum Spacings. > > I also have a Generalized Method of Moments estimator based on > matching quantiles and moments in the works. > > So, I don't want yet to be pinned down with any API for the estimation > enhancements. > > These recent changes are a bit problematic for several reasons: > - there are many new methods for distributions without tests. These methods are simple to see and verify. Which methods specifically are you concerned about? > - there are no docs for many new private and public methods They are all fairly self-explanatory. But, docs can be added if needed. > - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 This has been fixed (it was easier to fix the syntax than file the ticket...) Also to be clear this is only invalid for Python < 2.6 (the comment makes it sound like somehow the changes weren't tested at all). > - the old rv_continuous doc template was put back in I'm not sure what you mean. Which change did this? > > This, plus Josef saying that he doesn't want to fix the API for some methods yet, makes me want to take it out of the 0.8.x branch. Any objections to that, Travis or Josef? I would really like to see these changes go into 0.8.x. If Josef feels strongly about the API in the future, we can change it for the next release. I don't understand what the specific concerns are. -Travis > > Cheers, > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jun 1 01:26:54 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 31 May 2010 22:26:54 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: Hi, > I do like a more "free-wheeling" style to code development than one that is bogged down with "rules" and "procedures". Well - but that is because you don't do maintenance. Imagine a maintainer puts in a lot of effort to make the code well-documented and tested. Then, you have put in new code that has neither documentation nor tests. As a good maintainer, it's really painful for them that there's new code without documentation or tests. They can only feel abused in that situation, because it seems as if you are expecting them to clean up after you - without asking. I'm offering this only as an explanation for why this situation can get people pretty pissed.
See you, Matthew From oliphant at enthought.com Tue Jun 1 02:15:30 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 01:15:30 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> On Jun 1, 2010, at 12:26 AM, Matthew Brett wrote: > Hi, > >> I do like a more "free-wheeling" style to code development than one that is bogged down with "rules" and "procedures". > > Well - but that is because you don't do maintenance. Imagine a > maintainer puts in a lot of effort to make the code well-documented > and tested. Then, you have put in new code that has neither > documentation nor tests. As a good maintainer, it's really painful > for them that there's new code without documentation or tests. They > can only feel abused in that situation, because it seems as if you are > expecting them to clean up after you - without asking. I don't think that is fair. I have been "maintaining" SciPy and NumPy code for over 10 years. I have done an immense amount of work in porting SciPy to NumPy and continuing to fix bugs that I am made aware of. I don't have as much time to commit to SciPy as I would like. -Travis From oliphant at enthought.com Tue Jun 1 02:38:35 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 01:38:35 -0500 Subject: [SciPy-Dev] Recent changes to scipy stats Message-ID: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> My recent changes to trunk certainly started a controversy. I'm not exactly sure why. I do not mean to give the impression that people should "clean" up after me which has been implied by some. Please let me know if there is something specific that you would like me to do. I appreciate the specific concerns that Ralf raised as opposed to "generalizations" and metaphors that are open to interpretation. All of his concerns have been addressed, I think, except the addition of all tests that some would like to see. Some of the added methods are so simple that I do not think they require tests to verify their accuracy --- you can look at the code and understand it. In cases like this I get somewhat frustrated with a naive fixed rule like "no check-ins without tests". There can always be more tests, but tests cost and should be part of a general improvement strategy and not just trotted out as a weapon when there is disagreement about something else. Is there a disagreement about other changes that have been made? The only one I can think of that could be controversial is perhaps pulling in Josef's expect methods from his file when he did not want the "API" methods finalized. I'm fine with removing them if he wants to do that. Perhaps the interface I chose to fix certain parameters for the fit methods is also in question. I really don't know as I have received no specific communication about the concerns. I welcome any review or comment on what has been done. As I am not able to follow all threads on SciPy-User and SciPy-Dev, I did not know that Ralf was going to create the 0.8.x branch when he did. Perhaps I should have known, but I did not know.
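For readers trying to picture what "fixing certain parameters for the fit methods" means, a minimal sketch follows. It uses the keyword spelling that scipy.stats eventually settled on (floc/fscale); the exact interface under debate in this thread may have looked different:

    import numpy as np
    from scipy import stats

    np.random.seed(0)
    data = stats.gamma.rvs(2.5, loc=0.0, scale=1.3, size=1000)

    # Free fit: shape, loc and scale are all estimated from the data.
    shape, loc, scale = stats.gamma.fit(data)

    # Fit with the location pinned at 0: only shape and scale are optimized.
    shape0, loc0, scale0 = stats.gamma.fit(data, floc=0.0)
    print(shape0, loc0, scale0)   # loc0 comes back as exactly 0.0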
-Travis From d.l.goldsmith at gmail.com Tue Jun 1 04:07:12 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 01:07:12 -0700 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: On Mon, May 31, 2010 at 11:38 PM, Travis Oliphant wrote: > > My recent changes to trunk certainly started a controversy. I'm not > exactly sure why. I do not mean to give the impression that people should > "clean" up after me which has been implied by some. Please let me know if > there is something specific that you would like me to do. > > I appreciate the speific concerns that Ralf raised as opposed to > "generalizations" and metaphors that are open to interpretation. All of > his concerns have been addressed, I think, except the addition of all tests > that some would like to see. > > Some of the added methods are so simple, that I do not think they require > tests to verify their accuracy --- you can look at the code and understand > it. In cases like this I get somewhat frustrated with a naive fixed rule > like "no check-ins" without "tests". > > There can always be more tests, but tests cost and should be part of a > general improvement strategy and not just trotted out as a weapon when there > is disagreement about something else. > > Is there a disagreement about other changes that have been made? The > only one I can think of that could be controversial is perhaps pulling in > Josef's expect methods from his file when he did not want the "API" methods > finalized. I'm fine with removing them if he wants to do that. > > Perhaps, the interface I chose to fix certain parameters for the fit > methods is also in question. I really don't know as I have received no > specific communication about the concerns. I welcome any review or comment > on what has been done. > > As I am not able to follow all threads on SciPy-User and SciPy-Dev, I did > not know that Ralf was going to create the 0.8.x branch when he did. > Perhaps I should have known, but I did not know. > > -Travis > IMO, the problem - in general, not just w/ any one person - is not the particulars of what's been done, but the attitude, when it's exhibited by an individual, any individual, that the rules may be disregarded when that individual, any individual, unilaterally and spontaneously decides those rules are inconvenient. The rules are there for very good reasons; paraphrasing a recent set of statements by Robert K.: We should follow the rules that we have agreed to because we should make good on our promises. Otherwise, we might as well not make those promises...don't look for excuses to break them...break them [only] when it would be Really Bad if [one] were to follow them. Generally...try to make good on [one's] promises and not renege on them just because [one] *think[s]* no one [else] will notice...[only] break rules/promises when they are in tension with other promises. This is not such a case. Words to commit by. (Thanks, Robert; my apologies if you would rather not have been quoted in this way/situation.) DG -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oliphant at enthought.com Tue Jun 1 04:09:50 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 03:09:50 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: <9A04B7F1-D738-46F6-8E38-ABE06C3CC0FF@enthought.com> On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: > > This is more about the process then the content, distributions was > Travis's baby (although unfinished), and most of his changes are very > good, but I don't want to look for the 5-10% (?) typos anymore. I really am not sure what the difference between looking at timeline of changes and a formal "review" process really is? In either case you are "looking for someone's mistakes or problems". I do think your estimate of typos is a bit aggressive. Really? 5-10% typos. What is the denominator? -Travis From josef.pktd at gmail.com Tue Jun 1 04:12:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 04:12:06 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 12:54 AM, Travis Oliphant wrote: > > On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: > >> Since Travis seems to want to take back control of scipy.stats, I am >> considering my role as inofficial maintainer as ended. > > Obviously I've offended you. ? That has never been my intent. ? I apologize if my enthusiasm for getting some changes that I wanted to see into SciPy stepped on an area you felt ownership of. ? ? I do not mind if people add changes to code that I've written and I assume that others feel the same. ? That has always been the development mode of SciPy. ? We clearly have different development styles. ? ?I think we can find a way to work together. ? I think the move to github will help. > > I did not understand that you felt such ownership of scipy.stats. ?I have certainly appreciated your input. > > I do like a more "free-wheeling" style to code development than one that is bogged down with "rules" and "procedures". ? ? This clearly is not your style. ? For me, it comes down to time to spend. ? I love working on SciPy and NumPy. ? ?I don't have a lot of time to do it. ? When I see quick changes I can make that add value I like to be able to do it. ? I think we both want the same thing while we may disagree about the best way to get there. > In my mind, discussion doesn't end when a check-in is made --- it just begins. ? You should never interpret my checking something in as the final word. ? We clearly have a different view of "trunk" > > I certainly don't want my approach to open source development to offend others or chase them away. ?If I check in something you don't like, then tell me and let's talk about it. ? ?If you need to vent and call me names, a private email to me or others can go a long way. > > What do we need to do to keep you around? ? Is there specifically something you didn't like about my recent check-ins? > > In this case, the features added were not terribly extensive. ? The current unit tests helped ferret out major problems. ?Yes, I could write more tests and documentation, and you have been a model of writing tests and documentation. ? I have been particularly impressed by the amount of quality documentation you have written. 
> > While you seem to dismiss the episode as problematic, I actually think curve_fit was a good example of how something very positive can emerge quickly when people are open and willing to work together. > > While formal, strict test-driven development is easy to point to for salvation -- it does have its costs. ? I've always used informal test-driven development. ? Just because I don't *always* add formal unit tests for every piece of code written does not mean the code that is currently in SciPy is un-tested and useless. ? Such an approach leaves me open to criticism, which I acknowledge. ?But, I think there have been far too many dismissive comments about the state of the code. > > I would argue that the problem with scipy.stats does not lie mainly in distributions.py or the lack of test-driven-development --- but in the lack of certain easy to use features. ? ?Quality code comes out of people who care --- not out of procedure. > > I think you are someone who cares and your code reflects that. ? ?We would all benefit from your staying part of the main development. (not answering inline to keep thoughts together) I think the main disagreements are about the quality control of the trunk and whether scipy development is a community effort or not. I think most of us write code in spurts as we find time and some idea bites us, and I have a written a lot of code. However, this is *not* trunk code, this is sandbox code. As Skipper described, in statsmodels almost all development occurs in the sandbox and in branches, and it is only included in the "official" core of statsmodels after it has been verified and tests have been added. sandbox code is everything from first draft version to almost finished code. And one of Skippers task in his gsoc is to clean out the sandbox. Once it is in trunk (core) any further refactoring follows very strict rules. *Every* new function or method needs test before going into trunk or right after. And I hope the test coverage of scipy goes towards that goal. This also applies to trivial functions, because they might be victims of some later refactoring. I have seen a lot of stranded non-functional code in scipy.stats, stats.models and in other parts of scipy. Review before or after commit I think (non-minor) changes, especially new functions, methods and classes need to be offered to the mailing list for comments, review before being committed. (Plus to make it feasible, we have an implied: "If nobody voices disagreement, then I will commit".) The git mirror has been working for a long time, and most development in scipy seems to follow this policy. curve_fit is a good example, Travis committed the changes, without mentioning it on the mailing list. I saw the commit, commented that the statistics of the new function is incorrect and we changed after several rounds until it was verified. I don't think it has any tests yet. Specific to stats: I want a reference for any function where the explanation cannot be found with a Wikipedia search with one of the terms in the docstring. One or a few weeks ago, scipy.stats gained a new function, my asking on the mailing list what it is supposed to be, didn't receive any reply. (besides the problem that the function had the same name as an existing function). Dumping new code into scipy trunk, without any review and tests, hoping that someone else looks for the problems is not an approach that I find acceptable. And personally, I refuse now being "dumped at". 
And I will *not* spend my time in the next three days writing missing tests and verifying code that has been committed to trunk this weekend. Asking me if I have commit rights, shows at least some disconnect from the development of scipy in the last three years, since I have been pretty (too) noisy about it on the mailing lists. Josef > > Sincere regards, > > -Travis > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Tue Jun 1 04:22:08 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 04:22:08 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <9A04B7F1-D738-46F6-8E38-ABE06C3CC0FF@enthought.com> References: <9A04B7F1-D738-46F6-8E38-ABE06C3CC0FF@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 4:09 AM, Travis Oliphant wrote: > > On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: > >> >> This is more about the process then the content, distributions was >> Travis's baby (although unfinished), and most of his changes are very >> good, but I don't want to look for the 5-10% (?) typos anymore. > > I really am not sure what the difference between looking at timeline of changes and a formal "review" process really is? ?In either case you are "looking for someone's mistakes or problems". ? I do think your estimate of typos is a bit aggressive. ?Really? ?5-10% typos. ? ?What is the denominator? I just replied for most of this. My test run in the middle of the weekend (before I gave up), had about 4 or 5 test failures in the new _logpdf _logcdf methods. Third and forth moments (skew, kurtosis) might still return about 5% incorrect numbers, which I accept since it was written at a different time. Same with many generic methods in stats.distributions that I fixed two and a half years ago and which seems to never have worked from what I inferred from the history. denominator: functions/methods that return numbers 5-10% is just a guess, I never tried to measure it, maybe it's only 3%, but each one requires an afternoon to hunt down the reference and the correct formula. Josef > > -Travis > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From oliphant at enthought.com Tue Jun 1 04:25:31 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 03:25:31 -0500 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: > IMO, the problem - in general, not just w/ any one person - is not the particulars of what's been done, but the attitude, when it's exhibited by an individual, any individual, that the rules may be disregarded when that individual, any individual, unilaterally and spontaneously decides those rules are inconvenient. The rules are there for very good reasons; paraphrasing a recent set of statements by Robert K.: What is the rule that has been broken exactly? I'd really like to know what people are actually annoyed by and who exactly is annoyed? Perhaps my confidence with committing to trunk is what is fundamentally the issue. It's clear that some people prefer a different process and perhaps the move to a distributed version control will help things. I do feel a certain confidence with code that I have written and I like to get changes into trunk quickly. That has always been my style. 
I don't think I have changed in this regard. Perhaps it is seen as brazen or inconsiderate, but I don't see it that way. I actually think it very inconsiderate that I should be treated with such rudeness for contributing needed functionality. Sometimes rules become rules inappropriately. Why should one development process hold sway over another? Who is right? Well, clearly, it's just a matter of the people around and what they want to see. If the majority here want to see a different process, then that's where we will go. But, to really do it, we will need to move to a distributed version control process, I think --- or at least I will need to. I will try to work on that when I can find the motivation. -Travis From oliphant at enthought.com Tue Jun 1 04:32:56 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 03:32:56 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <9A04B7F1-D738-46F6-8E38-ABE06C3CC0FF@enthought.com> Message-ID: <5C7630FC-D91E-4CD9-91D7-7A4CCEBEFD04@enthought.com> On Jun 1, 2010, at 3:22 AM, josef.pktd at gmail.com wrote: > On Tue, Jun 1, 2010 at 4:09 AM, Travis Oliphant wrote: >> >> On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: >> >>> >>> This is more about the process then the content, distributions was >>> Travis's baby (although unfinished), and most of his changes are very >>> good, but I don't want to look for the 5-10% (?) typos anymore. >> >> I really am not sure what the difference between looking at timeline of changes and a formal "review" process really is? In either case you are "looking for someone's mistakes or problems". I do think your estimate of typos is a bit aggressive. Really? 5-10% typos. What is the denominator? > > I just replied for most of this. > > My test run in the middle of the weekend (before I gave up), had about > 4 or 5 test failures in the new _logpdf _logcdf methods. In this particular case, you can just look at the pdf method and compare it with the logpdf method. I only added ones that were obvious. Are you running a test different from >>> from scipy.stats import test >>> test() to get these errors? Are you saying the skew and kurtosis test functions return different numbers than expected? -Travis From cournape at gmail.com Tue Jun 1 04:35:38 2010 From: cournape at gmail.com (David Cournapeau) Date: Tue, 1 Jun 2010 17:35:38 +0900 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 1:54 PM, Travis Oliphant wrote: > > I do like a more "free-wheeling" style to code development than one that is bogged down with "rules" and "procedures". ? ? This clearly is not your style. ? For me, it comes down to time to spend. ? I love working on SciPy and NumPy. ? ?I don't have a lot of time to do it. ? When I see quick changes I can make that add value I like to be able to do it. ? I think we both want the same thing while we may disagree about the best way to get there. > In my mind, discussion doesn't end when a check-in is made --- it just begins. ? You should never interpret my checking something in as the final word. ? We clearly have a different view of "trunk" I think the main issue is that you only see tests as a nuisance because it gives you less time to do the actual work. Testing, documenting indeed has a cost - but by not doing it, you are transferring this cost to someone else. 
IOW, the cost of your changes are the same with or without tests - it just ends up being someone else doing the work you don't do, work that you recognize yourself as not being the most interesting one. I think we all understand how valuable your contribution has been (and still is !) to numpy/scipy. But whether you like it or not, now that scipy/numpy are matured packages used by a lot of people, some "overhead" and process is unavoidable. cheers, David From oliphant at enthought.com Tue Jun 1 04:43:54 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 03:43:54 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On Jun 1, 2010, at 3:12 AM, josef.pktd at gmail.com wrote: > On Tue, Jun 1, 2010 at 12:54 AM, Travis Oliphant wrote: >> >> On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: >> >>> Since Travis seems to want to take back control of scipy.stats, I am >>> considering my role as inofficial maintainer as ended. >> >> Obviously I've offended you. That has never been my intent. I apologize if my enthusiasm for getting some changes that I wanted to see into SciPy stepped on an area you felt ownership of. I do not mind if people add changes to code that I've written and I assume that others feel the same. That has always been the development mode of SciPy. We clearly have different development styles. I think we can find a way to work together. I think the move to github will help. >> >> I did not understand that you felt such ownership of scipy.stats. I have certainly appreciated your input. >> >> I do like a more "free-wheeling" style to code development than one that is bogged down with "rules" and "procedures". This clearly is not your style. For me, it comes down to time to spend. I love working on SciPy and NumPy. I don't have a lot of time to do it. When I see quick changes I can make that add value I like to be able to do it. I think we both want the same thing while we may disagree about the best way to get there. >> In my mind, discussion doesn't end when a check-in is made --- it just begins. You should never interpret my checking something in as the final word. We clearly have a different view of "trunk" >> >> I certainly don't want my approach to open source development to offend others or chase them away. If I check in something you don't like, then tell me and let's talk about it. If you need to vent and call me names, a private email to me or others can go a long way. >> >> What do we need to do to keep you around? Is there specifically something you didn't like about my recent check-ins? >> >> In this case, the features added were not terribly extensive. The current unit tests helped ferret out major problems. Yes, I could write more tests and documentation, and you have been a model of writing tests and documentation. I have been particularly impressed by the amount of quality documentation you have written. >> >> While you seem to dismiss the episode as problematic, I actually think curve_fit was a good example of how something very positive can emerge quickly when people are open and willing to work together. >> >> While formal, strict test-driven development is easy to point to for salvation -- it does have its costs. I've always used informal test-driven development. Just because I don't *always* add formal unit tests for every piece of code written does not mean the code that is currently in SciPy is un-tested and useless. 
Such an approach leaves me open to criticism, which I acknowledge. But, I think there have been far too many dismissive comments about the state of the code. >> >> I would argue that the problem with scipy.stats does not lie mainly in distributions.py or the lack of test-driven-development --- but in the lack of certain easy to use features. Quality code comes out of people who care --- not out of procedure. >> >> I think you are someone who cares and your code reflects that. We would all benefit from your staying part of the main development. > > (not answering inline to keep thoughts together) > > I think the main disagreements are about the quality control of the > trunk and whether scipy development is a community effort or not. I certainly think scipy development is a community effort. I'm very sorry for making you feel "dumped" on. That has never been my intent. I was simply hoping to contribute a little where I could. > As Skipper described, in statsmodels almost all development occurs in > the sandbox and in branches, and it is only included in the "official" > core of statsmodels after it has been verified and tests have been > added. sandbox code is everything from first draft version to almost > finished code. > And one of Skippers task in his gsoc is to clean out the sandbox. > Once it is in trunk (core) any further refactoring follows very strict rules. This has not been SciPy's process. I can understand people may want it to become SciPy's process, but it has not been. There are dangers of this process --- there is a reason that the mantra of "release early and release often". It can also prevent progress when you are dealing with people's spare time because all of that process takes time and man-power and effort. There is some value in it, I'm just not sure the extent of that value in contrast to other uses of that time. For example. I would love to see statsmodels get more use. I think there is much code there that is usable. Yet, it remains outside of SciPy. If we agree to change the SciPy process will you agree to put statsmodels into SciPy? > > Specific to stats: I want a reference for any function where the > explanation cannot be found with a Wikipedia search with one of the > terms in the docstring. One or a few weeks ago, scipy.stats gained a > new function, my asking on the mailing list what it is supposed to be, > didn't receive any reply. (besides the problem that the function had > the same name as an existing function). I did not see your message. I changed the name of the function and didn't know you were concerned about the addition. It is a convenience function for bayes_mvs that returns the distribution objects from which the other numbers can be obtained instead of just the numbers. The paper is already referenced in bayes_mvs. > Dumping new code into scipy trunk, without any review and tests, > hoping that someone else looks for the problems is not an approach > that I find acceptable. That was never my "hope". I planned to and have fixed all problems that I saw later and that others have pointed out. You can never test for all possible failures. > > Asking me if I have commit rights, shows at least some disconnect from > the development of scipy in the last three years, since I have been > pretty (too) noisy about it on the mailing lists. I know you have been noisy on the lists --- that's why I spoke to you about _logpdf and friends. It also appears that you don't commit that often. This is your process. 
But, it made me wonder if permissions were an issue. I was pretty sure you had been given commit rights, but I could not remember. I'm sorry if that offended you. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Tue Jun 1 04:54:20 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 01:54:20 -0700 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 1:25 AM, Travis Oliphant wrote: > > > IMO, the problem - in general, not just w/ any one person - is not the > particulars of what's been done, but the attitude, when it's exhibited by an > individual, any individual, that the rules may be disregarded when that > individual, any individual, unilaterally and spontaneously decides those > rules are inconvenient. The rules are there for very good reasons; > paraphrasing a recent set of statements by Robert K.: > > What is the rule that has been broken exactly? Good point: I've been led to believe that there's a "rule" (aka "policy") against checking-in code that doesn't include passing unit tests and a Standard-compliant docstring, but I must concede that I can't say *where* (other than on the listserv) *any* rules are recorded, so if I am mistaken - if there is in fact no such rule/policy - I apologize, I guess we are all (or at least all those w/ commit privilege) free to commit as it suits our "style." > I'd really like to know what people are actually annoyed by and who exactly > is annoyed? > Great, I'd like to know who *isn't* annoyed. DG > > Perhaps my confidence with committing to trunk is what is fundamentally the > issue. It's clear that some people prefer a different process and perhaps > the move to a distributed version control will help things. > > I do feel a certain confidence with code that I have written and I like to > get changes into trunk quickly. That has always been my style. I don't > think I have changed in this regard. Perhaps it is seen as brazen or > inconsiderate, but I don't see it that way. I actually think it very > inconsiderate that I should be treated with such rudeness for contributing > needed functionality. > > Sometimes rules become rules inappropriately. Why should one development > process hold sway over another? Who is right? Well, clearly, it's just a > matter of the people around and what they want to see. If the majority > here want to see a different process, then that's where we will go. But, to > really do it, we will need to move to a distributed version control process, > I think --- or at least I will need to. I will try to work on that when I > can find the motivation. > > -Travis > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Tue Jun 1 05:50:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 05:50:46 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <5C7630FC-D91E-4CD9-91D7-7A4CCEBEFD04@enthought.com> References: <9A04B7F1-D738-46F6-8E38-ABE06C3CC0FF@enthought.com> <5C7630FC-D91E-4CD9-91D7-7A4CCEBEFD04@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 4:32 AM, Travis Oliphant wrote: > > On Jun 1, 2010, at 3:22 AM, josef.pktd at gmail.com wrote: > >> On Tue, Jun 1, 2010 at 4:09 AM, Travis Oliphant wrote: >>> >>> On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: >>> >>>> >>>> This is more about the process then the content, distributions was >>>> Travis's baby (although unfinished), and most of his changes are very >>>> good, but I don't want to look for the 5-10% (?) typos anymore. >>> >>> I really am not sure what the difference between looking at timeline of changes and a formal "review" process really is? ?In either case you are "looking for someone's mistakes or problems". ? I do think your estimate of typos is a bit aggressive. ?Really? ?5-10% typos. ? ?What is the denominator? >> >> I just replied for most of this. >> >> My test run in the middle of the weekend (before I gave up), had about >> 4 or 5 test failures in the new _logpdf _logcdf methods. > > In this particular case, you can just look at the pdf method and compare it with the logpdf method. ?I only added ones that were obvious. ?Are you running a test different from > >>>> from scipy.stats import test >>>> test() no, I was running a variation on the new tests for logpdf logcdf, that I have attached to the ticket > > to get these errors? > > Are you saying the skew and kurtosis test functions return different numbers than expected? no, the methods in the distributions for distfn.stats(moments="sk") or distfn.moment(3) or 4 I think, the f distribution is the only one where I went through the formulas to find the typo. I think skew and kurtosistests are ok, although I would have to look it up to be sure. Josef > > -Travis > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Tue Jun 1 06:48:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 06:48:12 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 4:43 AM, Travis Oliphant wrote: > > On Jun 1, 2010, at 3:12 AM, josef.pktd at gmail.com wrote: > > On Tue, Jun 1, 2010 at 12:54 AM, Travis Oliphant > wrote: > > On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: > > Since Travis seems to want to take back control of scipy.stats, I am > > considering my role as inofficial maintainer as ended. > > Obviously I've offended you. ? That has never been my intent. ? I apologize > if my enthusiasm for getting some changes that I wanted to see into SciPy > stepped on an area you felt ownership of. ? ? I do not mind if people add > changes to code that I've written and I assume that others feel the same. > That has always been the development mode of SciPy. ? We clearly have > different development styles. ? ?I think we can find a way to work together. > ? I think the move to github will help. > > I did not understand that you felt such ownership of scipy.stats. ?I have > certainly appreciated your input. 
> > I do like a more "free-wheeling" style to code development than one that is > bogged down with "rules" and "procedures". ? ? This clearly is not your > style. ? For me, it comes down to time to spend. ? I love working on SciPy > and NumPy. ? ?I don't have a lot of time to do it. ? When I see quick > changes I can make that add value I like to be able to do it. ? I think we > both want the same thing while we may disagree about the best way to get > there. > > In my mind, discussion doesn't end when a check-in is made --- it just > begins. ? You should never interpret my checking something in as the final > word. ? We clearly have a different view of "trunk" > > I certainly don't want my approach to open source development to offend > others or chase them away. ?If I check in something you don't like, then > tell me and let's talk about it. ? ?If you need to vent and call me names, a > private email to me or others can go a long way. > > What do we need to do to keep you around? ? Is there specifically something > you didn't like about my recent check-ins? > > In this case, the features added were not terribly extensive. ? The current > unit tests helped ferret out major problems. ?Yes, I could write more tests > and documentation, and you have been a model of writing tests and > documentation. ? I have been particularly impressed by the amount of quality > documentation you have written. > > While you seem to dismiss the episode as problematic, I actually think > curve_fit was a good example of how something very positive can emerge > quickly when people are open and willing to work together. > > While formal, strict test-driven development is easy to point to for > salvation -- it does have its costs. ? I've always used informal test-driven > development. ? Just because I don't *always* add formal unit tests for every > piece of code written does not mean the code that is currently in SciPy is > un-tested and useless. ? Such an approach leaves me open to criticism, which > I acknowledge. ?But, I think there have been far too many dismissive > comments about the state of the code. > > I would argue that the problem with scipy.stats does not lie mainly in > distributions.py or the lack of test-driven-development --- but in the lack > of certain easy to use features. ? ?Quality code comes out of people who > care --- not out of procedure. > > I think you are someone who cares and your code reflects that. ? ?We would > all benefit from your staying part of the main development. > > (not answering inline to keep thoughts together) > > I think the main disagreements are about the quality control of the > trunk and whether scipy development is a community effort or not. > > I certainly think scipy development is a community effort. ? I'm very sorry > for making you feel "dumped" on. ? That has never been my intent. ?I was > simply hoping to contribute a little where I could. I only feel "dumped" on, because I want tested and verified stage. I could leave it to somebody else in five years to clean it up. And I don't want to add lot's of notes in docstrings, "use at your own risk, this function hasn't been verified" as we sometimes do in our (statsmodels) sandbox. > > As Skipper described, in statsmodels almost all development occurs in > the sandbox and in branches, and it is only included in the "official" > core of statsmodels after it has been verified and tests have been > added. sandbox code is everything from first draft version to almost > finished code. 
> And one of Skippers task in his gsoc is to clean out the sandbox. > Once it is in trunk (core) any further refactoring follows very strict > rules. > > This has not been SciPy's process. ? I can understand people may want it to > become SciPy's process, but it has not been. ?There are dangers of this > process --- there is a reason that the mantra of "release early and release > often". ?It can also prevent progress when you are dealing with people's > spare time because all of that process takes time and man-power and effort. > ? There is some value in it, I'm just not sure the extent of that value in > contrast to other uses of that time. I think that's another discussion I have seen already several times. I think it's time that scipy moves to a "verified" only stage, instead of "this is a young project, still work in progress and use at your own risk" > For example. ?I would love to see statsmodels get more use. ? I think there > is much code there that is usable. ?Yet, it remains outside of SciPy. > If we agree to change the SciPy process will you agree to put statsmodels > into SciPy? I hope that statsmodels becomes too big for scipy, but I still would like to see core models to go into scipy. To quote myself from the pystatsmodels mailing list. "The way it looks like, I don't think statsmodels (as a whole) will go back into scipy, the count of python lines of code of statsmodels is already almost 20% of the one in scipy according to ohloh. Large parts of the code are still in the sandbox but with another gsoc and continued development we will have too much statistics coverage for statsmodels to be absorbed by scipy." There are now at least 3 very active scikits, image, learn and statsmodels, and I think the model of developing and maturing code in a scikit starts to work pretty well. For me it's easier to develop and mature inside a pure python package, which is also more accessible for new contributors. One of my wishful target audience are contributors on Windows, which would become rather difficult as part of scipy and git. > > Specific to stats: I want a reference for any function where the > explanation cannot be found with a Wikipedia search with one of the > terms in the docstring. One or a few weeks ago, scipy.stats gained a > new function, my asking on the mailing list what it is supposed to be, > didn't receive any reply. (besides the problem that the function had > the same name as an existing function). > > I did not see your message. ? I changed the name of the function and didn't > know you were concerned about the addition. ? It is a convenience function > for bayes_mvs that returns the distribution objects from which the other > numbers can be obtained instead of just the numbers. ? ? The paper is > already referenced in bayes_mvs. This explanation would have made a good comment in the notes section of the docstring, and I wouldn't have to try to remember and look up whether this might be some posterior distribution for a diffuse prior with normal likelihood. > > Dumping new code into scipy trunk, without any review and tests, > hoping that someone else looks for the problems is not an approach > that I find acceptable. > > That was never my "hope". ?I planned to and have fixed all problems that I > saw later and that others have pointed out. ? You can never test for all > possible failures. For many cases, I haven't seen you committed to do any maintenance on it. At least, there are many functions that never got a test added later on. 
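For context, the convenience function discussed above (the one that returns distribution objects for bayes_mvs) appears in later scipy releases as stats.mvsdist, assuming that is the function meant here. A short sketch of how the two relate:

    import numpy as np
    from scipy import stats

    np.random.seed(0)
    data = np.random.normal(loc=5.0, scale=2.0, size=100)

    # bayes_mvs returns point estimates with confidence intervals directly ...
    mean_est, var_est, std_est = stats.bayes_mvs(data, alpha=0.9)

    # ... while mvsdist returns the posterior distribution objects themselves,
    # so any other quantity can be computed from them.
    mean_dist, var_dist, std_dist = stats.mvsdist(data)
    print(mean_dist.mean(), mean_dist.interval(0.9))   # matches mean_est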
You respond to bug reports, but that is after the fact, when someone already ran into it. What I think has to be required are basic tests. I'm not religious about testing for all possible failures. Edge cases, numerical precision problems, and problems with use cases that were not initially targeted can and still need to be handled after the code is in trunk. And as Skipper said, and I felt from the beginning about scipy, a package where you cannot rely (up to a high degree) on the correctness of the results is pretty unattractive for serious work. Nobody wants to retract a paper because there was a programming mistake somewhere. So, verification of the code for the main use case(s) is the minimum requirement that Skipper and I agreed upon last summer for any statistics/econometrics in Python development. > > > Asking me if I have commit rights, shows at least some disconnect from > the development of scipy in the last three years, since I have been > pretty (too) noisy about it on the mailing lists. > > I know you have been noisy on the lists --- that's why I spoke to you about > _logpdf and friends. It also appears that you don't commit that often. After I had several crashes late last year, and because I'm working now mostly on statsmodels, I haven't kept my scipy development setup up to date very often. I'm usually pretty fast in responding to open issues, and Stefan and Ralf made commits to scipy.stats that I reviewed and that were discussed on the mailing list or in a ticket. On the other hand, I'm not "pushing" my own code into scipy very fast, although I push it to the mailing list. Mainly because I'm reluctant to commit my own code when I don't think it's perfect yet, and when the response on the mailing list doesn't look like there is an urgent demand for it. I only see feedback when the code gets questions later on the mailing list or on stackoverflow. So this is maybe not the best approach. > This is your process. But, it made me wonder if permissions were an issue. > I was pretty sure you had been given commit rights, but I could not > remember. I'm sorry if that offended you. I might have overreacted initially, but I would have expected you to participate in the discussion or at least mention that you work on it, instead of announcing it at almost (?) the same time as making the commits. Josef > -Travis > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From vincent at vincentdavis.net Tue Jun 1 08:59:50 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 1 Jun 2010 06:59:50 -0600 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: The diversity of perspective as to who has done what and, more importantly, whether it was right or ok to do seems to imply that there is a lack of clear roles/policies. If there were, it seems that we could hope there would not be this diversity in perspectives. Someone with more experience than me should make the rules and policies. Moving to github would be great IMO. 
Have a great day Vincent On Tue, Jun 1, 2010 at 2:54 AM, David Goldsmith wrote: > On Tue, Jun 1, 2010 at 1:25 AM, Travis Oliphant > wrote: >> >> > IMO, the problem - in general, not just w/ any one person - is not the >> > particulars of what's been done, but the attitude, when it's exhibited by an >> > individual, any individual, that the rules may be disregarded when that >> > individual, any individual, unilaterally and spontaneously decides those >> > rules are inconvenient. ?The rules are there for very good reasons; >> > paraphrasing a recent set of statements by Robert K.: >> >> What is the rule that has been broken exactly? > > Good point: I've been led to believe that there's a "rule" (aka "policy") > against checking-in code that doesn't include passing unit tests and a > Standard-compliant docstring, but I must concede that I can't say *where* > (other than on the listserv) *any* rules are recorded, so if I am mistaken - > if there is in fact no such rule/policy - I apologize, I guess we are all > (or at least all those w/ commit privilege) free to commit as it suits our > "style." > >> >> I'd really like to know what people are actually annoyed by and who >> exactly is annoyed? > > Great, I'd like to know who *isn't* annoyed. > > DG >> >> Perhaps my confidence with committing to trunk is what is fundamentally >> the issue. ?It's clear that some people prefer a different process and >> perhaps the move to a distributed version control will help things. >> >> I do feel a certain confidence with code that I have written and I like to >> get changes into trunk quickly. ? That has always been my style. ? ?I don't >> think I have changed in this regard. ? Perhaps it is seen as brazen or >> inconsiderate, but I don't see it that way. ? ?I actually think it very >> inconsiderate that I should be treated with such rudeness for contributing >> needed functionality. >> >> Sometimes rules become rules inappropriately. ?Why should one development >> process hold sway over another? ?Who is right? ?Well, clearly, it's just a >> matter of the people around and what they want to see. ? If the majority >> here want to see a different process, then that's where we will go. ?But, to >> really do it, we will need to move to a distributed version control process, >> I think --- or at least I will need to. ?I will try to work on that when I >> can find the motivation. >> >> -Travis >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. 
?(As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From ralf.gommers at googlemail.com Tue Jun 1 09:19:58 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 1 Jun 2010 21:19:58 +0800 Subject: [SciPy-Dev] [SciPy-User] log pdf, cdf, etc In-Reply-To: <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 1:20 PM, Travis Oliphant wrote: > > On May 31, 2010, at 6:39 AM, Ralf Gommers wrote: > > These recent changes are a bit problematic for several reasons: > - there are many new methods for distributions without tests. > > These methods are simple to see and verify. Which methods specifically > are you concerned about? > They're not all simple, for example rv_continuous._reduce_func. Since it contains inner function definitions inside an "else" block there's also a good chance it's actually broken. And in principle I'm worried about all of them. The python 2.4/2.5 syntax error was caught early, but what if some code you regard as simple is broken in a less obvious way on 2.4/2.5? Maybe a user finds it in a release candidate, forcing us to build an extra one? Or just after the final release? > > - there are no docs for many new private and public methods > > > They are all fairly self explanatory. But, docs can be added if needed. > For you, and maybe for me too. But for undergraduate students, or Joe in accounting who inherited this random app that's essential for his job? It's simple, no public docs without docstrings. And preferably no private ones either. Thanks for fixing all public docs quickly though. You missed just one, gamma.fit. > > - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 > > > This has been fixed (it was easier to fix the syntax then file the > ticket...) Also to be clear this is only invalid for Python < 2.6 (the > comment makes it sound like somehow the changes weren't tested at all). > > I didn't mean to imply that you were committing code that didn't even work for you. > - the old rv_continuous doc template was put back in > > > I'm not sure what you mean. Which change did this? > The first one of your recent commits, r6392. The docstrings for subclasses of rv_continuous and rv_discrete are not generated from this template anymore, which is why it was removed. Look at line 862 (# generate docstring for subclass instances) and below that to see how it works now. If you're wondering why that changed, the main reasons are (1) to make the docstrings conform to the standard, (2) to be able to put useful info in the base classes, like "this is how you subclass it: ..." instead of a template, and (3) to be able to customize individual distribution docstrings easily. > > This, plus Josef saying that he doesn't want to fix the API for some > methods yet, makes me want to take it out of the 0.8.x branch. Any > objections to that Travis or Josef? > > > I would really like to see these changes go in to 0.8.x. If Josef feels > strongly about the API in the future, we can change it for the next release. > I don't understand what the specific concerns are. > > No you can't. For API changes we do have a policy, they need deprecation first. Which means if we release it like this now, we're stuck with it till 0.10 / 1.0. 
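(To make "deprecation first" concrete: in practice it usually amounts to keeping the old entry point alive for one release cycle and emitting a warning, roughly as in the sketch below. The function names here are invented purely for illustration, they are not the actual methods under discussion.)

import warnings

def scaled_mean(data, scale=1.0):
    # new, preferred name (hypothetical example)
    return scale * sum(data) / len(data)

def mean_scaled(data, scale=1.0):
    # old name kept working for one release, with a warning, as the policy asks
    warnings.warn("mean_scaled is deprecated, use scaled_mean instead",
                  DeprecationWarning, stacklevel=2)
    return scaled_mean(data, scale=scale)

Only after that warning has been out in a release can the old name go away, which is exactly why an API we are not yet sure about should not ship now.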
In summary, I see quite a few reasons why this shouldn't go in and don't see a compelling reason to release it right now. The 0.9 release is (tentatively) planned for September, so you don't have to worry that your changes sit in trunk unreleased for 1.5 years. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Jun 1 09:23:25 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 1 Jun 2010 21:23:25 +0800 Subject: [SciPy-Dev] Scipy archive on PyPI In-Reply-To: <4C045DCD.1080503@silveregg.co.jp> References: <201005311655.53741.ce@vejnar.eu> <4C045DCD.1080503@silveregg.co.jp> Message-ID: On Tue, Jun 1, 2010 at 9:09 AM, David wrote: > On 06/01/2010 12:15 AM, Ralf Gommers wrote: > > > > > > On Mon, May 31, 2010 at 10:55 PM, Charles Vejnar > > wrote: > > > > Hi, > > > > I was trying to install Scipy with easy_install and it seems that > > downloading > > from Sourceforge is no longer possible (Sourceforge no longer gives > > a direct > > link to the .tar.gz file) which makes the install fail. > > > > Would it be possible to always upload the latest Scipy tarball to > PyPI ? > > > > It's possible, but because that encourages the use of easy_install/pip > > it would probably give more problems than that it helps. Just today > > there was a thread on numpy-discussion about pip failing and standard > > "python setup.py install" fixing the problem. easy_install is just as > > problematic as pip, if not more so. > > Unfortunately, people will always use those half broken tools. I think > we should at least put the tarballs - I also used to put a simple > executable (result of bdist_wininst) so that easy_install numpy works on > windows. > > OK, I'll do the same then. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Tue Jun 1 09:32:32 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 1 Jun 2010 07:32:32 -0600 Subject: [SciPy-Dev] Development process (was: scipy.stats) In-Reply-To: References: Message-ID: On Mon, May 31, 2010 at 11:02 PM, Travis Oliphant wrote: > How many people interested in this discussion will be at SciPy this year? ?It may be a good idea to have a discussion about this at the conference. ? ?We could phone conference others in as well so that every voice can be heard. I think it should be done here on the list. This makes it easier for all to review and refer back to. Also makes it more open, NOT that you are trying to do it behind closed doors. > I do think we need to address this issue. ? I did not realize I was offending people with my enthusiasm for having a chance to work on SciPy. ? I have always resisted too much "procedure" and "policy" so that it becomes difficult for people to contribute. ? ?I really think technology changes and DVCS can help with this process. I am all for DVCS. (I posted this on another thread but it is more appropriate here or maybe I just want to repeat myself :) ) The diversity of perspective as to who has done what and more importantly if it was right or ok to do seems to imply that there is lack of clear roles/policies. If there was it seems that we could hope there would not be this diversity in perspectives. 
> > -Travis > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From charlesr.harris at gmail.com Tue Jun 1 09:50:30 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 Jun 2010 07:50:30 -0600 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 2:43 AM, Travis Oliphant wrote: > > On Jun 1, 2010, at 3:12 AM, josef.pktd at gmail.com wrote: > > On Tue, Jun 1, 2010 at 12:54 AM, Travis Oliphant > wrote: > > > On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote: > > > Since Travis seems to want to take back control of scipy.stats, I am > > considering my role as inofficial maintainer as ended. > > > Obviously I've offended you. That has never been my intent. I apologize > if my enthusiasm for getting some changes that I wanted to see into SciPy > stepped on an area you felt ownership of. I do not mind if people add > changes to code that I've written and I assume that others feel the same. > That has always been the development mode of SciPy. We clearly have > different development styles. I think we can find a way to work together. > I think the move to github will help. > > > I did not understand that you felt such ownership of scipy.stats. I have > certainly appreciated your input. > > > I do like a more "free-wheeling" style to code development than one that is > bogged down with "rules" and "procedures". This clearly is not your > style. For me, it comes down to time to spend. I love working on SciPy > and NumPy. I don't have a lot of time to do it. When I see quick > changes I can make that add value I like to be able to do it. I think we > both want the same thing while we may disagree about the best way to get > there. > > In my mind, discussion doesn't end when a check-in is made --- it just > begins. You should never interpret my checking something in as the final > word. We clearly have a different view of "trunk" > > > I certainly don't want my approach to open source development to offend > others or chase them away. If I check in something you don't like, then > tell me and let's talk about it. If you need to vent and call me names, a > private email to me or others can go a long way. > > > What do we need to do to keep you around? Is there specifically something > you didn't like about my recent check-ins? > > > In this case, the features added were not terribly extensive. The current > unit tests helped ferret out major problems. Yes, I could write more tests > and documentation, and you have been a model of writing tests and > documentation. I have been particularly impressed by the amount of quality > documentation you have written. > > > While you seem to dismiss the episode as problematic, I actually think > curve_fit was a good example of how something very positive can emerge > quickly when people are open and willing to work together. > > > While formal, strict test-driven development is easy to point to for > salvation -- it does have its costs. I've always used informal test-driven > development. Just because I don't *always* add formal unit tests for every > piece of code written does not mean the code that is currently in SciPy is > un-tested and useless. Such an approach leaves me open to criticism, which > I acknowledge. But, I think there have been far too many dismissive > comments about the state of the code. 
> > > I would argue that the problem with scipy.stats does not lie mainly in > distributions.py or the lack of test-driven-development --- but in the lack > of certain easy to use features. Quality code comes out of people who > care --- not out of procedure. > > > I think you are someone who cares and your code reflects that. We would > all benefit from your staying part of the main development. > > > (not answering inline to keep thoughts together) > > I think the main disagreements are about the quality control of the > trunk and whether scipy development is a community effort or not. > > > I certainly think scipy development is a community effort. I'm very sorry > for making you feel "dumped" on. That has never been my intent. I was > simply hoping to contribute a little where I could. > > As Skipper described, in statsmodels almost all development occurs in > the sandbox and in branches, and it is only included in the "official" > core of statsmodels after it has been verified and tests have been > added. sandbox code is everything from first draft version to almost > finished code. > And one of Skippers task in his gsoc is to clean out the sandbox. > Once it is in trunk (core) any further refactoring follows very strict > rules. > > > This has not been SciPy's process. I can understand people may want it to > become SciPy's process, but it has not been. There are dangers of this > process --- there is a reason that the mantra of "release early and release > often". It can also prevent progress when you are dealing with people's > spare time because all of that process takes time and man-power and effort. > There is some value in it, I'm just not sure the extent of that value in > contrast to other uses of that time. > > Numpy/Scipy has changed from the days when there were just a few folks involved and the urgent need was to get some code, any code, out there. I'm sure many projects start that way because in beginning the idea is the important thing, the perfection of the implementation not so much. But as things progress and more people use the code, correctness becomes important. The numpy/numeric C code itself shows this process, with the early code quality being what I would classify as "undergraduate" C. That doesn't mean Numeric wasn't useful, obviously many people found it so or we wouldn't be here, but it does mean that the code wasn't easy to maintain or understand. Now the basic ideas have been worked out and the originators have moved on while at the same time the code has become more widely used, so the need becomes maintenance, correctness, distribution, and attracting the people to do those things. That requires a different sort of process. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.czesla at hs.uni-hamburg.de Tue Jun 1 13:05:19 2010 From: stefan.czesla at hs.uni-hamburg.de (Stefan) Date: Tue, 1 Jun 2010 17:05:19 +0000 (UTC) Subject: [SciPy-Dev] =?utf-8?q?np=2Esavetxt=3A_apply_patch_in_enhancement_?= =?utf-8?q?ticket_1079_to=09add_headers=3F?= References: Message-ID: Skipper Seabold gmail.com> writes: > > Hi all, > > I am assuming that this is ok to request via the list... Could we > discuss or could someone apply the patch in enhancement ticket 1079? > > http://projects.scipy.org/numpy/ticket/1079 > > I needed this functionality recently, and this is a quick and easy fix > that may have been overlooked. 
> > There is also another enhancement request about this here: > http://projects.scipy.org/numpy/ticket/1236 > > The only thing that I can think of that might need to be added is a > test to see that the header length is the same as the number of > columns, but really that might just be up to the user to supply the > right headers. It might also be nice to have a header = True, that > uses the field names for a structured array, but I can live without > that. > > Cheers, > > Skipper > Hi, +1; we have the same problem quite frequently. Our current solution looks similar to what has been proposed in ticket 1079, and we wonder why a solution has not yet found its way into the official release of numpy. We can, however, image a slightly different implementation and would like to hear the community's opinion on it. If the header is given as a plane string (such as envisaged in ticket 1079), the user has to care for the correct formatting, in particular, the user has to supply the comment character(s) and the new line formatting. This might be against intuition, because many users will at first try to supply their header(s) without specifying those formatting characters. The result will be a file not readable with numpy.loadtxt, and the error might not be detected right away. As numpy.loadtxt has a default comment character ('#'), the same may be implemented for numpy.savetxt. In this case, numpy.savetxt would get two additional keywords (e.g. header, comment(character)), which bloats the interface, but potentially provides more safety. Cheers, Stefan & Christian From jsseabold at gmail.com Tue Jun 1 13:48:28 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 1 Jun 2010 13:48:28 -0400 Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 1:05 PM, Stefan wrote: > Skipper Seabold gmail.com> writes: > >> >> Hi all, >> >> I am assuming that this is ok to request via the list... ?Could we >> discuss or could someone apply the patch in enhancement ticket 1079? >> >> http://projects.scipy.org/numpy/ticket/1079 >> >> I needed this functionality recently, and this is a quick and easy fix >> that may have been overlooked. >> >> There is also another enhancement request about this here: >> http://projects.scipy.org/numpy/ticket/1236 >> >> The only thing that I can think of that might need to be added is a >> test to see that the header length is the same as the number of >> columns, but really that might just be up to the user to supply the >> right headers. ?It might also be nice to have a header = True, that >> uses the field names for a structured array, but I can live without >> that. >> >> Cheers, >> >> Skipper >> > > Hi, And here I was thinking no one was listening so long ago. > > +1; we have the same problem quite frequently. Our current solution looks > similar to what has been proposed in ticket 1079, and we wonder why a solution > has not yet found its way into the official release of numpy. > > We can, however, image a slightly different implementation and would like to > hear the community's opinion on it. > > If the header is given as a plane string (such as envisaged in ticket 1079), the > user has to care for the correct formatting, in particular, the user has to > supply the comment character(s) and the new line formatting. This might be > against intuition, because many users will at first try to supply their > header(s) without specifying those formatting characters. 
The result will be a > file not readable with numpy.loadtxt, and the error might not be detected right > away. I'm not sure I understand why I would want to specify a comment character for writing a csv file (unless of course I had some comments to add). Also note that since that patch was written, savetxt takes a user supplied newline keyword, so you can just append that to the header string. > > As numpy.loadtxt has a default comment character ('#'), the same may be > implemented for numpy.savetxt. In this case, numpy.savetxt would get two > additional keywords (e.g. header, comment(character)), which bloats the > interface, but potentially provides more safety. > FWIW, I ended up rolling my own using the most recent pre-Python 3 changes for savetxt that accepts a list of names instead of one string or if the provided array has the attribute dtype.names (non-nested rec or structured arrays) it uses those. Whatever is done I think the support for structured arrays is nice, and I think having this functionality is a no-brainer. I need it quite often. Skipper From matthew.brett at gmail.com Tue Jun 1 13:55:08 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 10:55:08 -0700 Subject: [SciPy-Dev] Development process (was: scipy.stats) In-Reply-To: References: Message-ID: Hi, On Tue, Jun 1, 2010 at 6:32 AM, Vincent Davis wrote: > On Mon, May 31, 2010 at 11:02 PM, Travis Oliphant > wrote: > >> How many people interested in this discussion will be at SciPy this year? ?It may be a good idea to have a discussion about this at the conference. ? ?We could phone conference others in as well so that every voice can be heard. > > I think it should be done here on the list. This makes it easier for > all to review and refer back to. Also makes it more open, NOT that you > are trying to do it behind closed doors. I agree very much that discussion on the list is better. I think it helps solidify the idea of numpy and scipy being a community project, where all discussion is public and open. I know that can be a little tough sometimes, but that too has its benefits in clearing the air and making people feel that the discussion is open. See you, Matthew From matthew.brett at gmail.com Tue Jun 1 14:57:10 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 11:57:10 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> Message-ID: Hi, >> Well - but that is because you don't maintenance. ? Imagine a >> maintainer puts in a lot of effort to make the code well-documented >> and tested. ?Then, you have put in new code that has neither >> documentation nor tests. ? ?As a good maintainer, it's really painful >> for them that there's new code without documentation or tests. ? They >> can only feel abused in that situation, because it seems as if you are >> expecting them to clean up after you - without asking. > > I don't think that is fair. ?I have been "maintaining" SciPy and NumPy code for over 10 years. ? I have done an immense amount of work in porting SciPy to NumPy and continuing to fix bugs that I am made aware of. ?I don't have as much time to commit to SciPy as I would like. I wasn't really saying whether it was fair or not, I was only trying to explain why it might cause offense. 
When I say that you don't do maintenance, I mean that you are not currently the person who has to make sure that the code is readable and maintainable. That is hard and often thankless work. I presume that you agree that numpy and scipy code should have documentation and tests. I presume also that when you commit code without documentation or tests, that you do not usually intend to come back and do these later - say - before the next release. That means that someone else has to do it. It will take them a lot longer than it would take you because they don't know the code as well. I realize this is not your intent, but, it's tempting in this situation to feel that you think that your time is more valuable than the person who has to write the documentation and tests - and that's a painful feeling to have - hence - I believe - the level of bad feeling that arises... See you, Matthew From d.l.goldsmith at gmail.com Tue Jun 1 14:59:48 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 11:59:48 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 6:50 AM, Charles R Harris wrote: > Numpy/Scipy has changed from the days when there were just a few folks > involved and the urgent need was to get some code, any code, out there. I'm > sure many projects start that way because in beginning the idea is the > important thing, the perfection of the implementation not so much. But as > things progress and more people use the code, correctness becomes important. > The numpy/numeric C code itself shows this process, with the early code > quality being what I would classify as "undergraduate" C. That doesn't mean > Numeric wasn't useful, obviously many people found it so or we wouldn't be > here, but it does mean that the code wasn't easy to maintain or understand. > Now the basic ideas have been worked out and the originators have moved on > while at the same time the code has become more widely used, so the need > becomes maintenance, correctness, distribution, and attracting the people to > do those things. That requires a different sort of process. > And reliability, i.e., it is not enough to claim that the code is correct, people (scientists whose reputations are at stake) need to be able to *rely* on it being correct. Perhaps it hasn't been long enough, but I note two things at this point: 0) No one has disputed that we have (and have had for some amount of time, i.e., it didn't go into affect yesterday) a standing policy that new code submissions are supposed to have passing tests and a Standard-compliant docstring *before* being checked-in, and 1) No one has indicated a specific place where this (or any other standing policy) may be found for reference. So, I propose that we establish such a place - even if we don't presently populate it with *anything* - so that, if we wish to discuss, e.g., whether or not rules may be subject to individuals' "style," we can at least all know exactly what rules we're discussing. DG > > > > Chuck > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. 
(As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jun 1 15:24:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 15:24:22 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 2:57 PM, Matthew Brett wrote: > Hi, > >>> Well - but that is because you don't maintenance. ? Imagine a >>> maintainer puts in a lot of effort to make the code well-documented >>> and tested. ?Then, you have put in new code that has neither >>> documentation nor tests. ? ?As a good maintainer, it's really painful >>> for them that there's new code without documentation or tests. ? They >>> can only feel abused in that situation, because it seems as if you are >>> expecting them to clean up after you - without asking. >> >> I don't think that is fair. ?I have been "maintaining" SciPy and NumPy code for over 10 years. ? I have done an immense amount of work in porting SciPy to NumPy and continuing to fix bugs that I am made aware of. ?I don't have as much time to commit to SciPy as I would like. > > I wasn't really saying whether it was fair or not, I was only trying > to explain why it might cause offense. > > When I say that you don't do maintenance, I mean that you are not > currently the person who has to make sure that the code is readable > and maintainable. ? That is hard and often thankless work. > > I presume that you agree that numpy and scipy code should have > documentation and tests. ? ?I presume also that when you commit code > without documentation or tests, that you do not usually intend to come > back and do these later - say - before the next release. ? That means > that someone else has to do it. ?It will take them a lot longer than > it would take you because they don't know the code as well. > > I realize this is not your intent, but, it's tempting in this > situation to feel that you think that your time is more valuable than > the person who has to write the documentation and tests - and that's a > painful feeling to have - hence - I believe - the level of bad feeling > that arises... Just to emphasis my point I'm mainly concerned about quality control of trunk. Open source development is still a collaborative process, and the person to write the code and the final tests doesn't necessarily have to be the same person. For example, Skipper is doing a lot more than his "fair" share of writing formal tests in statsmodels, and I'm writing a good amount of test code for scipy. (Skipper and I usually provide sufficient documentation, developer comments, and references that we are in most cases able to understand the code.) But if the original coder doesn't have the time to bring the code up to testing and documentation standard, then the code should stay out of trunk until someone finds the time to get it through a review and quality control process. The problem is that, that someone might not be able to figure out how to fix possible problems (my example is Fisher's exact test). 
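To make "verification for the main use case" concrete, the minimum I have in mind is roughly the sketch below: one check against an independently known value and one consistency check, nothing exhaustive. (The normal distribution is used here only for illustration.)

import numpy as np
from numpy.testing import assert_almost_equal
from scipy import stats

def test_norm_basic():
    # value check against the hand-computed reference pdf(0) = 1/sqrt(2*pi)
    assert_almost_equal(stats.norm.pdf(0.0), 1.0 / np.sqrt(2 * np.pi), decimal=12)
    # consistency check: ppf should invert cdf over the range users actually hit
    x = np.linspace(-3, 3, 7)
    assert_almost_equal(stats.norm.ppf(stats.norm.cdf(x)), x, decimal=8)

Anything beyond that can still be added once the code is in trunk, but without at least this much nobody can rely on the results.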
Josef > > See you, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From matthew.brett at gmail.com Tue Jun 1 15:24:41 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 12:24:41 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) Message-ID: Hi, I thought I'd split this one off onto its own thread too because it's an important issue independent of scipy.stats > For me it's easier to develop and mature inside a pure python package, > which is also more accessible for new contributors. One of my wishful > target audience are contributors on Windows, which would become rather > difficult as part of scipy and git. I would imagine that everyone agrees that it's very important that we have developers like you (Josef) who are using windows as their main platform. It's by far the best way to make sure we are shaking out bugs on windows. So, given that there seems a strong mood to switch to git, we should make sure that this does not cause problems for windows developers. So: Josef - and others a) are there any problems that you know of using git from the windows shell? b) Do you think you would prefer to use mercurial as a client for the git repo : http://github.com/blog/439-hg-git-mercurial-plugin ? In that case we should set up documentation for that. c) Do you want to stick with bzr? That might be possible (https://launchpad.net/bzr-git, http://github.com/matthew-brett/git-bzr) but that will likely be considerably harder than a mercurial client. See you, Matthew From stefan at sun.ac.za Tue Jun 1 15:25:36 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 1 Jun 2010 12:25:36 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: On 1 June 2010 11:59, David Goldsmith wrote: > Perhaps it hasn't been long enough, but I note two things at this point: > > 0) No one has disputed that we have (and have had for some amount of time, > i.e., it didn't go into affect yesterday) a standing policy that new code > submissions are supposed to have passing tests and a Standard-compliant > docstring *before* being checked-in, and We have had many discussions around unit testing and code review, but the fact is that there is no such policy. Whether that should change or not is another question. > 1) No one has indicated a specific place where this (or any other standing > policy) may be found for reference. Developer guidelines may be found here: http://projects.scipy.org/scipy http://projects.scipy.org/numpy Ideally, all the guidelines should be checked in to the repo under numpy/docs (the documentation guidelines, for example, already are). Regards St?fan From d.l.goldsmith at gmail.com Tue Jun 1 15:31:54 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 12:31:54 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 12:24 PM, Matthew Brett wrote: > Hi, > > I thought I'd split this one off onto its own thread too because it's > an important issue independent of scipy.stats > > > For me it's easier to develop and mature inside a pure python package, > > which is also more accessible for new contributors. One of my wishful > > target audience are contributors on Windows, which would become rather > > difficult as part of scipy and git. 
> > I would imagine that everyone agrees that it's very important that we > have developers like you (Josef) who are using windows as their main > platform. It's by far the best way to make sure we are shaking out > bugs on windows. > > So, given that there seems a strong mood to switch to git, we should > make sure that this does not cause problems for windows developers. > > So: Josef - and others > > a) are there any problems that you know of using git from the windows > shell? > None in principle here (and from what I've garnered through the discussion, I am supportive of the move, as long as we don't deprecate the SVN trunk too quickly), but do we have anyone, even just one person, who is already reasonably facile in this regard who'd be willing to support others through the transition? DG > b) Do you think you would prefer to use mercurial as a client for the > git repo : http://github.com/blog/439-hg-git-mercurial-plugin ? In > that case we should set up documentation for that. > c) Do you want to stick with bzr? That might be possible > (https://launchpad.net/bzr-git, > http://github.com/matthew-brett/git-bzr) but that will likely be > considerably harder than a mercurial client. > > See you, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Jun 1 15:31:55 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 1 Jun 2010 12:31:55 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: 2010/6/1 St?fan van der Walt : >> 0) No one has disputed that we have (and have had for some amount of time, >> i.e., it didn't go into affect yesterday) a standing policy that new code >> submissions are supposed to have passing tests and a Standard-compliant >> docstring *before* being checked-in, and > > We have had many discussions around unit testing and code review, but > the fact is that there is no such policy. ?Whether that should change > or not is another question. Looks like I read your message too hastily. I meant to comment on a policy surrounding addition of tests and code review. By the way, you'll notice that we have *guidelines*, not policy. I think that this is an important indicator of the way that SciPy development takes place (we agree by consensus and help each other out, rather than enforcing restrictions). If our guidelines may be modified to benefit one another so that we may all enjoy working on SciPy, that would be a good thing. Regards St?fan From matthew.brett at gmail.com Tue Jun 1 15:44:00 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 12:44:00 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: Hi, >> a) are there any problems that you know of using git from the windows >> shell? 
> > None in principle here (and from what I've garnered through the discussion, > I am supportive of the move, as long as we don't deprecate the SVN trunk too > quickly), but do we have anyone, even just one person, who is already > reasonably facile in this regard who'd be willing to support others through > the transition? I would not claim to be very experienced, but I have not had any problems using msysgit with either the windows shell or the (rather good) windows power shell. The bash shell does have problems but the windows shells have proved more useful. I'd certainly be willing to help as far as I can - but I think the next step is to find what problems people are having (or expect to have) and go from there. See you, Matthew From d.l.goldsmith at gmail.com Tue Jun 1 15:47:59 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 12:47:59 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: 2010/6/1 St?fan van der Walt > 2010/6/1 St?fan van der Walt : > >> 0) No one has disputed that we have (and have had for some amount of > time, > >> i.e., it didn't go into affect yesterday) a standing policy that new > code > >> submissions are supposed to have passing tests and a Standard-compliant > >> docstring *before* being checked-in, and > > > > We have had many discussions around unit testing and code review, but > > the fact is that there is no such policy. Whether that should change > > or not is another question. > > Looks like I read your message too hastily. I meant to comment on a > policy surrounding addition of tests and code review. > > By the way, you'll notice that we have *guidelines*, not policy. I > think that this is an important indicator of the way that SciPy > development takes place (we agree by consensus and help each other > out, rather than enforcing restrictions). > > If our guidelines may be modified to benefit one another so that we > may all enjoy working on SciPy, that would be a good thing. > > Regards > St?fan > Thanks, St?fan. So, we don't have a policy (or even a guideline that I could see) addressing minimum requirements code must meet before check-in - my apologies to all, and especially Travis. DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Tue Jun 1 15:53:27 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 12:53:27 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 12:44 PM, Matthew Brett wrote: > Hi, > >> a) are there any problems that you know of using git from the windows > >> shell? > > > > None in principle here (and from what I've garnered through the > discussion, > > I am supportive of the move, as long as we don't deprecate the SVN trunk > too > > quickly), but do we have anyone, even just one person, who is already > > reasonably facile in this regard who'd be willing to support others > through > > the transition? > > I would not claim to be very experienced, but I have not had any > problems using msysgit with either the windows shell or the (rather > good) windows power shell. The bash shell does have problems but > the windows shells have proved more useful. > > I'd certainly be willing to help as far as I can - but I think the > next step is to find what problems people are having (or expect to > have) and go from there. 
> Of course, but it's comforting to know there's someone in the community whom we might hope will know the answers to questions that arise. :-) FFR, what are your platform specifics (e.g., I'm running Win7 Home Prem. 64bit), in case it turns out to matter. DG > > See you, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jun 1 16:31:28 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 16:31:28 -0400 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 3:44 PM, Matthew Brett wrote: > Hi, > >>> a) are there any problems that you know of using git from the windows >>> shell? >> >> None in principle here (and from what I've garnered through the discussion, >> I am supportive of the move, as long as we don't deprecate the SVN trunk too >> quickly), but do we have anyone, even just one person, who is already >> reasonably facile in this regard who'd be willing to support others through >> the transition? > > I would not claim to be very experienced, but I have not had any > problems using msysgit with either the windows shell or the (rather > good) windows power shell. ? ?The bash shell does have problems but > the windows shells have proved more useful. It depends a lot on the part that I am working on. I wouldn't want to switch statsmodels where I do my main development to git. For scipy.stats (or bugfixes in other parts of scipy) I will give git a try, or look at the mercurial interface, if git doesn't work out for me. My main problem with git was the treatment of the file system, and I find it much easier to work with separate branches as in bzr or mercurial. For scipy, I never had to maintain a longer lived branch where I needed to worry about synchronizing with a changing trunk. I prepare most changes in scipy on standalone files, because they have a much faster development and test cycle, and merging them back into the scipy source is usually easy. (caveat: large/invasive changes like Ralf's docstring improvements are a lot more difficult to handle this way, but he was finally able to commit them himself.) And since I never (except for two c code bugfixes in numpy random) worked on compiled code, I didn't need a full develop-compile-test cycle. So, any version control system is fine with me, and maybe I can get used to the advantages of git. As long as it is possible to stick with the basic workflow of git without anything fancy, similar what I have seen while skimming the nipy docs, I think it is not a problem on windows. The basic commands and for example eclipse, GUI plugins look similar enough. However, if/when parts of statsmodels go into scipy and I have to do maintenance of less isolated code, then I think the Mercurial interface might be my preferred choice. I haven't used Mercurial much yet, but I don't see any problems with it. 
So, the bottom line is, that documentation for the hg-git interface would be very useful for Windows users (or those that think git is a strange/unfamiliar concept.) Josef > > I'd certainly be willing to help as far as I can - but I think the > next step is to find what problems people are having (or expect to > have) and go from there. > > See you, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From d.l.goldsmith at gmail.com Tue Jun 1 16:32:20 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 13:32:20 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? Message-ID: The docstring Standard seems to be careful to note which sections are considered optional, and the "Extended Summary" is *not* on that list. However, I'm encountering many SciPy docstrings in the Wiki lacking this section and yet marked as "Needs review": should I ignore this deficiency and add a ticket to clarify the Standard, or should such docstrings be moved back to "Being written"? DG -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Jun 1 16:40:12 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 1 Jun 2010 13:40:12 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 1 June 2010 13:32, David Goldsmith wrote: > The docstring Standard seems to be careful to note which sections are > considered optional, and the "Extended Summary" is *not* on that list. > However, I'm encountering many SciPy docstrings in the Wiki lacking this > section and yet marked as "Needs review": should I ignore this deficiency > and add a ticket to clarify the Standard, or should such docstrings be moved > back to "Being written"? Typically, there is no reason not to have an extended section. Can you give an example where it would seem unnecessary? Unless those functions mentioned above are exceptional, we should probably add blurbs for them. Regards St?fan From josef.pktd at gmail.com Tue Jun 1 16:45:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 16:45:57 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> Message-ID: 2010/6/1 St?fan van der Walt : > 2010/6/1 St?fan van der Walt : >>> 0) No one has disputed that we have (and have had for some amount of time, >>> i.e., it didn't go into affect yesterday) a standing policy that new code >>> submissions are supposed to have passing tests and a Standard-compliant >>> docstring *before* being checked-in, and >> >> We have had many discussions around unit testing and code review, but >> the fact is that there is no such policy. ?Whether that should change >> or not is another question. > > Looks like I read your message too hastily. ?I meant to comment on a > policy surrounding addition of tests and code review. > > By the way, you'll notice that we have *guidelines*, not policy. 
?I > think that this is an important indicator of the way that SciPy > development takes place (we agree by consensus and help each other > out, rather than enforcing restrictions). > > If our guidelines may be modified to benefit one another so that we > may all enjoy working on SciPy, that would be a good thing. I don't know or remember whether the guidelines have ever been decided upon, but my impression was that offering larger changes for review has become the established, de facto rule. Maybe it's time to spell out the conclusions explicitly, so we don't have to repeat the same discussion every one to one and a half years. http://mail.scipy.org/pipermail/scipy-dev/2009-February/011241.html That thread is too long to see whether there was any conclusion. Josef > > Regards > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From stefan at sun.ac.za Tue Jun 1 16:45:47 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 1 Jun 2010 13:45:47 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On 1 June 2010 13:31, wrote: > So, the bottom line is, that documentation for the hg-git interface > would be very useful for Windows users (or those that think git is a > strange/unfamiliar concept.) These interfaces are somewhat dangerous, in the sense that you may encounter rather untypical scenarios and strange bugs in those tools (for example, we even have to be careful with git-svn, and that tool is widely used). Do you think a clear, simple, numpy/scipy-oriented tutorial could sufficiently lower the barrier to adoption? I think the bzr work-flow you are used to is probably very similar to the one you'd follow with git. Regards St?fan From d.l.goldsmith at gmail.com Tue Jun 1 16:48:08 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 13:48:08 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: 2010/6/1 St?fan van der Walt > On 1 June 2010 13:32, David Goldsmith wrote: > > The docstring Standard seems to be careful to note which sections are > > considered optional, and the "Extended Summary" is *not* on that list. > > However, I'm encountering many SciPy docstrings in the Wiki lacking this > > section and yet marked as "Needs review": should I ignore this deficiency > > and add a ticket to clarify the Standard, or should such docstrings be > moved > > back to "Being written"? > > Typically, there is no reason not to have an extended section. Can > you give an example where it would seem unnecessary? No: my position would appear to be the same as yours, and my inclination would be to "revert" them to "Being written." I'm basically inviting people to tell me that that would be too strict. :-) So far, it's +1 that it wouldn't. DG > Unless those > functions mentioned above are exceptional, we should probably add > blurbs for them. > > Regards > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. 
(As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jun 1 16:51:51 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 Jun 2010 14:51:51 -0600 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: 2010/6/1 St?fan van der Walt > On 1 June 2010 13:31, wrote: > > So, the bottom line is, that documentation for the hg-git interface > > would be very useful for Windows users (or those that think git is a > > strange/unfamiliar concept.) > > These interfaces are somewhat dangerous, in the sense that you may > encounter rather untypical scenarios and strange bugs in those tools > (for example, we even have to be careful with git-svn, and that tool > is widely used). > > Do you think a clear, simple, numpy/scipy-oriented tutorial could > sufficiently lower the barrier to adoption? I think the bzr work-flow > you are used to is probably very similar to the one you'd follow with > git. > > I looked at the trial version of smartgit a while back and it seemed decent to me as a git interface. I didn't actually use it, though. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Tue Jun 1 16:55:37 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 1 Jun 2010 14:55:37 -0600 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 1:24 PM, wrote: > Open source development is still a collaborative process, and the > person to write the code and the final tests doesn't necessarily have > to be the same person. For example, Skipper is doing a lot more than > his "fair" share of writing formal tests in statsmodels, and I'm > writing a good amount of test code for scipy. > (Skipper and I usually provide sufficient documentation, developer > comments, and references that we are in most cases able to understand > the code.) > > But if the original coder doesn't have the time to bring the code up > to testing and documentation standard, then the code should stay out > of trunk until someone finds the time to get it through a review and > quality control process. The problem is that, that someone might not > be able to figure out how to fix possible problems (my example is > Fisher's exact test). I think this is important and as someone that is fairly new to python and open source development I greatly value they input of others in reviewing and making suggestions to my contributions. In addition, I might not know how to finish all parts of a contribution and knowing that I can contribute pieces that are within my abilities and others can review and possibly finish encourage me to contribute. 
Vincent > > Josef > > > > > > > >> >> See you, >> >> Matthew >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From fperez.net at gmail.com Tue Jun 1 18:11:27 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 1 Jun 2010 15:11:27 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: Hi Josef, On Tue, Jun 1, 2010 at 1:31 PM, wrote: > My main problem with git was the treatment of the file system, and I > find it much easier to work with separate branches as in bzr or > mercurial. One thing I've found very useful after transitioning to git for IPython is the git new-workdir command: http://kerneltrap.org/mailarchive/git/2008/5/21/1900044 http://nuclearsquid.com/writings/git-new-workdir.html It lets me keep a few branches around that I want 'permanent' on my filesystem, in a bzr shared-repo style, while using git for the lightweight feature-only branches. This is how it looks like right now on my system: - Main ipython git repo: uqbar[ipython]> cd ipython/ (Master)uqbar[ipython]> git branch -a 0.10 0.10.1 0.8 0.9 * Master master remotes/mainline/0.10 remotes/mainline/0.10.1 remotes/mainline/0.8 remotes/mainline/0.9 remotes/mainline/master remotes/min/0.10 remotes/min/0.10.1 remotes/min/0.8 remotes/min/0.9 remotes/min/master remotes/origin/master And a separate 'branches' repo, populated with new-workdir: (Master)uqbar[ipython]> cd ../branches uqbar[branches]> d /home/fperez/ipython/branches total 16 drwxr-xr-x 10 fperez 4096 2010-05-13 01:35 0.10/ drwxr-xr-x 10 fperez 4096 2010-05-13 15:48 0.10.1/ where I keep branches I may need to see persistently on disk. HTH. Cheers, f From matthew.brett at gmail.com Tue Jun 1 18:14:45 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 15:14:45 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: Hi, > My main problem with git was the treatment of the file system, and I > find it much easier to work with separate branches as in bzr or > mercurial. Yes, it is true that the git lightweight branch model takes some time to get used to. My experience is that it's quick to get used to the git way, and once I did, it was a large relief to get rid of all those branch directories when we switched, but I understand that it is a leap. I am sure you know this, but you can replicate the heavyweight branches of hg and bzr with: # initial git clone of 'trunk' git clone git://github.com/nipy/nipy.git # make a heavyweight branch git clone nipy my-nipy-branch # push somewhere # First add repo for the branch via github interface, then cd my-nipy-branch git remote add origin git at github.com:matthew-brett/my-nipy-branch.git git push origin master I think you'd agree that it's not a windows / unix difference though. I'd agree it is a larger conceptual leap from svn to git than it is from svn to bzr or svn to mercurial. The git argument is that making that initial leap gives you a great deal of freedom and flexibility, but it can be intimidating at first. > As long as it is possible to stick with the basic workflow of git > without anything fancy, similar what I have seen while skimming the > nipy docs, I think it is not a problem on windows. 
I think that is true that most of us won't need to go further than the nipy basic workflow - but we haven't been using git long enough to know that very well. I would defer to the git masters out there - David, Pauli and others - ? > However, if/when parts of statsmodels go into scipy and I have to do > maintenance of less isolated code, then I think the Mercurial > interface might be my preferred choice. > > I haven't used Mercurial much yet, but I don't see any problems with it. > > So, the bottom line is, that documentation for the hg-git interface > would be very useful for Windows users (or those that think git is a > strange/unfamiliar concept.) So - two issues: 1) The conceptual issues involved in switching mind-set from svn or bzr to git. That may require some thought and documentation 2) There might be some technical issues using git on windows - but I think so far we don't have any reason to think so? 3) Some people may prefer mercurial for other reasons; it would be good to respect that if possible. So, it may well be worth making a hg-git doc for numpy when we do the transition - with the caveats that David raised. In the meantime, it would be very good to hear of any problems that do come up specifically using git on windows... See you, Matthew From matthew.brett at gmail.com Tue Jun 1 18:15:52 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 15:15:52 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: > So, it may well be worth making a hg-git doc for numpy when we do the > transition - with the caveats that David raised. That Stefan raised ! Sorry man. I just fused the two of you because you're both so awesome.. ;) Matthew From d.l.goldsmith at gmail.com Tue Jun 1 18:37:09 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 15:37:09 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: 2010/6/1 St?fan van der Walt > On 1 June 2010 13:31, wrote: > > So, the bottom line is, that documentation for the hg-git interface > > would be very useful for Windows users (or those that think git is a > > strange/unfamiliar concept.) > > These interfaces are somewhat dangerous, in the sense that you may > encounter rather untypical scenarios and strange bugs in those tools > (for example, we even have to be careful with git-svn, and that tool > is widely used). > > Do you think a clear, simple, numpy/scipy-oriented tutorial could > sufficiently lower the barrier to adoption? Is that a rhetorical question? I don't think there's any doubt that such would very likely have the stated result. The question is, is that your way of offering to write it? :-) DG > I think the bzr work-flow > you are used to is probably very similar to the one you'd follow with > git. > > Regards > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefan at sun.ac.za Tue Jun 1 19:03:40 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 1 Jun 2010 16:03:40 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> Message-ID: On 1 June 2010 13:55, Vincent Davis wrote: > I think this is important and as someone that is fairly new to python > and open source development I greatly value they input of others in > reviewing and making suggestions to my contributions. In addition, I > might not know how to finish all parts of a contribution and knowing > that I can contribute pieces that are within my abilities and others > can review and possibly finish encourage me to contribute. This is one scenario in which a DVCS really shines: you can mature your code, incorporating feedback as you go along, and have it included once its ready. Like you suggest, that actually lowers the barrier to entry. Asking contributors to repeatedly rework patches quickly turns into a mess. Regards St?fan From oliphant at enthought.com Tue Jun 1 19:25:35 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 18:25:35 -0500 Subject: [SciPy-Dev] [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> Message-ID: On Jun 1, 2010, at 8:19 AM, Ralf Gommers wrote: > > > On Tue, Jun 1, 2010 at 1:20 PM, Travis Oliphant wrote: > > On May 31, 2010, at 6:39 AM, Ralf Gommers wrote: > > These recent changes are a bit problematic for several reasons: > - there are many new methods for distributions without tests. > > These methods are simple to see and verify. Which methods specifically are you concerned about? > > They're not all simple, for example rv_continuous._reduce_func. Since it contains inner function definitions inside an "else" block there's also a good chance it's actually broken. > > And in principle I'm worried about all of them. The python 2.4/2.5 syntax error was caught early, but what if some code you regard as simple is broken in a less obvious way on 2.4/2.5? Maybe a user finds it in a release candidate, forcing us to build an extra one? Or just after the final release? > >> - there are no docs for many new private and public methods > > They are all fairly self explanatory. But, docs can be added if needed. > > For you, and maybe for me too. But for undergraduate students, or Joe in accounting who inherited this random app that's essential for his job? It's simple, no public docs without docstrings. And preferably no private ones either. > > Thanks for fixing all public docs quickly though. You missed just one, gamma.fit. > >> - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 > > This has been fixed (it was easier to fix the syntax then file the ticket...) Also to be clear this is only invalid for Python < 2.6 (the comment makes it sound like somehow the changes weren't tested at all). > > I didn't mean to imply that you were committing code that didn't even work for you. >> - the old rv_continuous doc template was put back in > > I'm not sure what you mean. Which change did this? > > The first one of your recent commits, r6392. The docstrings for subclasses of rv_continuous and rv_discrete are not generated from this template anymore, which is why it was removed. Look at line 862 (# generate docstring for subclass instances) and below that to see how it works now. 
> > If you're wondering why that changed, the main reasons are (1) to make the docstrings conform to the standard, (2) to be able to put useful info in the base classes, like "this is how you subclass it: ..." instead of a template, and (3) to be able to customize individual distribution docstrings easily. > > >> >> This, plus Josef saying that he doesn't want to fix the API for some methods yet, makes me want to take it out of the 0.8.x branch. Any objections to that Travis or Josef? > > I would really like to see these changes go in to 0.8.x. If Josef feels strongly about the API in the future, we can change it for the next release. I don't understand what the specific concerns are. > > No you can't. For API changes we do have a policy, they need deprecation first. Which means if we release it like this now, we're stuck with it till 0.10 / 1.0. > > > In summary, I see quite a few reasons why this shouldn't go in and don't see a compelling reason to release it right now. The 0.9 release is (tentatively) planned for September, so you don't have to worry that your changes sit in trunk unreleased for 1.5 years. As the one doing the work of release manager, you have a lot of latitude in making this decision, of course. The compelling reason to release it right now is to get the improved features which nobody has actually voiced specific concerns about. Specifically improvements to the fit method of distribution objects (the ability to fix specific parameters of the distribution and vary others in the fit) is a very nice-to-have feature. The API change problem you mention is actually an argument for putting it in now (because we *can* deprecate it in 0.9 and then have whatever unspecified correct API come out in 1.0). I have not heard that there is real disagreement about the API either. It feels like I've addressed the major reasons you feel it can't go in. The functionality is tested. There are docstrings. I just removed the rv_continuous doc template. I really don't know why that was added. I did not make a specific change to include it. It must have been a merge error. Suggestions about how to give gamma.fit and beta.fit the docstring of it's parent would be appreciated. I don't think a general rule of "no private methods without docstrings" is necessarily appropriate, and a bit of an example of going overboard with "rules" and "procedures." Private methods are not meant to be called outside of code and should not necessarily have to be documented with docstrings. Every docstring creates more code to maintain and keep consistent with the actual code. One of the great things about Python is that you can read the code itself so that it is much closer to self-documenting code (close to it but not there --- I like comments and docstrings too). Thanks for your efforts. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Tue Jun 1 19:33:56 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 1 Jun 2010 18:33:56 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> Message-ID: <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> On Jun 1, 2010, at 1:57 PM, Matthew Brett wrote: > Hi, > >>> Well - but that is because you don't maintenance. Imagine a >>> maintainer puts in a lot of effort to make the code well-documented >>> and tested. 
Then, you have put in new code that has neither >>> documentation nor tests. As a good maintainer, it's really painful >>> for them that there's new code without documentation or tests. They >>> can only feel abused in that situation, because it seems as if you are >>> expecting them to clean up after you - without asking. >> >> I don't think that is fair. I have been "maintaining" SciPy and NumPy code for over 10 years. I have done an immense amount of work in porting SciPy to NumPy and continuing to fix bugs that I am made aware of. I don't have as much time to commit to SciPy as I would like. > > I wasn't really saying whether it was fair or not, I was only trying > to explain why it might cause offense. > > When I say that you don't do maintenance, I mean that you are not > currently the person who has to make sure that the code is readable > and maintainable. That is hard and often thankless work. > > I presume that you agree that numpy and scipy code should have > documentation and tests. I presume also that when you commit code > without documentation or tests, that you do not usually intend to come > back and do these later - say - before the next release. That means > that someone else has to do it. It will take them a lot longer than > it would take you because they don't know the code as well. > No, that is actually not what I imply but checking something in to the trunk. I plan to submit tests and docs before the next release when I commit code. I don't expect anyone else to do that for me. I always welcome help, but I don't expect it. I really think this is more about how people view commits to the trunk than anything else. I like to use SVN as a version control system. My commits to trunk are always more incremental. I like to get things committed in self-contained chunks. Adding the requirement to put in documentation and tests before committing stretches out that "incremental" work element to longer than I ever have time for in one sitting. Clearly, if I were using DVCS to a published branch that could be then merged to the trunk this problem would not have arisen. I see that I need to move to that style. People are reading far more into my committing to trunk than I ever meant to imply. -Travis From matthew.brett at gmail.com Tue Jun 1 19:48:25 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 16:48:25 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> Message-ID: Hi, >> I presume that you agree that numpy and scipy code should have >> documentation and tests. ? ?I presume also that when you commit code >> without documentation or tests, that you do not usually intend to come >> back and do these later - say - before the next release. ? That means >> that someone else has to do it. ?It will take them a lot longer than >> it would take you because they don't know the code as well. >> > > No, that is actually not what I imply but checking something in to the trunk. ? ?I plan to submit tests and docs before the next > release when I commit code. ? ?I don't expect anyone else to do that for me. ? I always welcome help, but I don't expect it. 
I am sure if people know that that is what you intend, and when DVCS allows that to happen, no-one will be upset, and we will all return to our usual mode of being very grateful for all the work that you've done and are doing. And - thanks for the clarification - sometimes things that seem obvious - aren't obvious - and it's good to say them out loud... See you, Matthew From david at silveregg.co.jp Tue Jun 1 20:57:18 2010 From: david at silveregg.co.jp (David) Date: Wed, 02 Jun 2010 09:57:18 +0900 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: <4C05AC6E.1030600@silveregg.co.jp> On 06/02/2010 05:31 AM, josef.pktd at gmail.com wrote: > On Tue, Jun 1, 2010 at 3:44 PM, Matthew Brett wrote: >> Hi, >> >>>> a) are there any problems that you know of using git from the windows >>>> shell? >>> >>> None in principle here (and from what I've garnered through the discussion, >>> I am supportive of the move, as long as we don't deprecate the SVN trunk too >>> quickly), but do we have anyone, even just one person, who is already >>> reasonably facile in this regard who'd be willing to support others through >>> the transition? >> >> I would not claim to be very experienced, but I have not had any >> problems using msysgit with either the windows shell or the (rather >> good) windows power shell. The bash shell does have problems but >> the windows shells have proved more useful. > > It depends a lot on the part that I am working on. I wouldn't want to > switch statsmodels where I do my main development to git. I don't think it is anyone's intention to force you to use git for your own packages :) > As long as it is possible to stick with the basic workflow of git > without anything fancy, similar what I have seen while skimming the > nipy docs, I think it is not a problem on windows. The basic commands > and for example eclipse, GUI plugins look similar enough. We started some time ago a document in that respect: instead of describing git's features, we have a workflow-oriented document: http://projects.scipy.org/numpy/wiki/GitWorkflow David From charlesr.harris at gmail.com Tue Jun 1 21:07:27 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 Jun 2010 19:07:27 -0600 Subject: [SciPy-Dev] [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 5:25 PM, Travis Oliphant wrote: > > On Jun 1, 2010, at 8:19 AM, Ralf Gommers wrote: > > > > On Tue, Jun 1, 2010 at 1:20 PM, Travis Oliphant wrote: > >> >> On May 31, 2010, at 6:39 AM, Ralf Gommers wrote: >> > > >> These recent changes are a bit problematic for several reasons: >> - there are many new methods for distributions without tests. >> >> These methods are simple to see and verify. Which methods specifically >> are you concerned about? >> > > They're not all simple, for example rv_continuous._reduce_func. Since it > contains inner function definitions inside an "else" block there's also a > good chance it's actually broken. > > And in principle I'm worried about all of them. The python 2.4/2.5 syntax > error was caught early, but what if some code you regard as simple is broken > in a less obvious way on 2.4/2.5? Maybe a user finds it in a release > candidate, forcing us to build an extra one? Or just after the final > release? > >> >> - there are no docs for many new private and public methods >> >> >> They are all fairly self explanatory. 
But, docs can be added if needed. >> > > For you, and maybe for me too. But for undergraduate students, or Joe in > accounting who inherited this random app that's essential for his job? It's > simple, no public docs without docstrings. And preferably no private ones > either. > > Thanks for fixing all public docs quickly though. You missed just one, > gamma.fit. > >> >> - invalid syntax: http://projects.scipy.org/scipy/ticket/1186 >> >> >> This has been fixed (it was easier to fix the syntax then file the >> ticket...) Also to be clear this is only invalid for Python < 2.6 (the >> comment makes it sound like somehow the changes weren't tested at all). >> >> I didn't mean to imply that you were committing code that didn't even work > for you. > >> - the old rv_continuous doc template was put back in >> >> >> I'm not sure what you mean. Which change did this? >> > > The first one of your recent commits, r6392. The docstrings for subclasses > of rv_continuous and rv_discrete are not generated from this template > anymore, which is why it was removed. Look at line 862 (# generate docstring > for subclass instances) and below that to see how it works now. > > If you're wondering why that changed, the main reasons are (1) to make the > docstrings conform to the standard, (2) to be able to put useful info in the > base classes, like "this is how you subclass it: ..." instead of a template, > and (3) to be able to customize individual distribution docstrings easily. > > >> >> This, plus Josef saying that he doesn't want to fix the API for some >> methods yet, makes me want to take it out of the 0.8.x branch. Any >> objections to that Travis or Josef? >> >> >> I would really like to see these changes go in to 0.8.x. If Josef feels >> strongly about the API in the future, we can change it for the next release. >> I don't understand what the specific concerns are. >> >> No you can't. For API changes we do have a policy, they need deprecation > first. Which means if we release it like this now, we're stuck with it till > 0.10 / 1.0. > > > > In summary, I see quite a few reasons why this shouldn't go in and don't > see a compelling reason to release it right now. The 0.9 release is > (tentatively) planned for September, so you don't have to worry that your > changes sit in trunk unreleased for 1.5 years. > > > As the one doing the work of release manager, you have a lot of latitude in > making this decision, of course. The compelling reason to release it > right now is to get the improved features which nobody has actually voiced > specific concerns about. > > There have been expressed concerns as to both the design and validation. I think it should be removed and these changes put into a branch or up on github until they have been tested and documented. There is no rush, and really, there is no reason for folks to use code that hasn't been validated except for testing, and testing can be done using the branch. > Specifically improvements to the fit method of distribution objects (the > ability to fix specific parameters of the distribution and vary others in > the fit) is a very nice-to-have feature. The API change problem you > mention is actually an argument for putting it in now (because we *can* > deprecate it in 0.9 and then have whatever unspecified correct API come out > in 1.0). I have not heard that there is real disagreement about the API > either. > No, it is a argument for *not* putting it in now. 
There is no rush, and until the code has been looked over and thoroughly tested, there is no guarantee that either the API is suitable or that the implementation is correct. > > It feels like I've addressed the major reasons you feel it can't go in. > The functionality is tested. There are docstrings. I just removed the > rv_continuous doc template. I really don't know why that was added. I did > not make a specific change to include it. It must have been a merge error. > > We don't know what else might be wrong. Look at what happened with datetime and all the work that made for David. > Suggestions about how to give gamma.fit and beta.fit the docstring of it's > parent would be appreciated. > > I don't think a general rule of "no private methods without docstrings" is > necessarily appropriate, and a bit of an example of going overboard with > "rules" and "procedures." Private methods are not meant to be called > outside of code and should not necessarily have to be documented with > docstrings. Every docstring creates more code to maintain and keep > consistent with the actual code. > > One of the great things about Python is that you can read the code itself > so that it is much closer to self-documenting code > (close to it but not there --- I like comments and docstrings too). > > Python beyond the trivial is *not* self documenting, no code is self documenting. There is always a struggle to grasp the larger design and intent, as well of niggling questions of correctness. All python serves to do is remove a lot of verbiage by abstracting common objects like lists and hash tables. That helps, but it is far from all that is needed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Tue Jun 1 21:22:56 2010 From: david at silveregg.co.jp (David) Date: Wed, 02 Jun 2010 10:22:56 +0900 Subject: [SciPy-Dev] [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> Message-ID: <4C05B270.8090608@silveregg.co.jp> On 06/01/2010 10:19 PM, Ralf Gommers wrote: > > No you can't. For API changes we do have a policy, they need deprecation > first. Which means if we release it like this now, we're stuck with it > till 0.10 / 1.0. I am not the release manager for 0.8.0, but I don't understand why we even discuss it *again*. *Every - single - time* this has happened in the past, it has caused numerous issues. It can be put in 0.8.1 later, the choice is not between now and one year and a half. David From josef.pktd at gmail.com Tue Jun 1 23:07:43 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 23:07:43 -0400 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 6:11 PM, Fernando Perez wrote: > Hi Josef, > > On Tue, Jun 1, 2010 at 1:31 PM, ? wrote: >> My main problem with git was the treatment of the file system, and I >> find it much easier to work with separate branches as in bzr or >> mercurial. > > One thing I've found very useful after transitioning to git for > IPython is the git new-workdir command: > > http://kerneltrap.org/mailarchive/git/2008/5/21/1900044 > http://nuclearsquid.com/writings/git-new-workdir.html > > It lets me keep a few branches around that I want 'permanent' on my > filesystem, in a bzr shared-repo style, while using git for the > lightweight feature-only branches. 
?This is how it looks like right > now on my system: thanks very useful information , git-new-workdir seems to be what I would like. Does it work on Windows? last year I didn't find any way to do this. I haven't updated git since then and I don't see any git-new-workdir in the git folders. (But maybe I deleted it when I switched from the full 1.x GB git install to the light version - without mingw and the kitchen sink) Josef > > - Main ipython git repo: > uqbar[ipython]> cd ipython/ > (Master)uqbar[ipython]> git branch -a > ?0.10 > ?0.10.1 > ?0.8 > ?0.9 > * Master > ?master > ?remotes/mainline/0.10 > ?remotes/mainline/0.10.1 > ?remotes/mainline/0.8 > ?remotes/mainline/0.9 > ?remotes/mainline/master > ?remotes/min/0.10 > ?remotes/min/0.10.1 > ?remotes/min/0.8 > ?remotes/min/0.9 > ?remotes/min/master > ?remotes/origin/master > > > And a separate 'branches' repo, populated with new-workdir: > > (Master)uqbar[ipython]> cd ../branches > uqbar[branches]> d > /home/fperez/ipython/branches > total 16 > drwxr-xr-x 10 fperez 4096 2010-05-13 01:35 0.10/ > drwxr-xr-x 10 fperez 4096 2010-05-13 15:48 0.10.1/ > > > where I keep branches I may need to see persistently on disk. > > HTH. > > Cheers, > > f > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From fperez.net at gmail.com Tue Jun 1 23:15:38 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 1 Jun 2010 20:15:38 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: Hi Josef, On Tue, Jun 1, 2010 at 8:07 PM, wrote: > > thanks very useful information , git-new-workdir seems to be what I would like. > Does it work on Windows? > > last year I didn't find any way to do this. I haven't updated git > since then and I don't see any git-new-workdir in the git folders. > (But maybe I deleted it when I switched from the full 1.x GB git > install to the light version - without mingw and the kitchen sink) Unfortunately I don't know if it works on Windows; on my linux box it ships here: /usr/share/doc/git-core/contrib/workdir/git-new-workdir and I had to enable it by copying this script to somewhere in my PATH and making it executable. I have no idea if these contortions would work on Windows as well, though. Cheers, f From josef.pktd at gmail.com Tue Jun 1 23:38:21 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 23:38:21 -0400 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 6:14 PM, Matthew Brett wrote: > Hi, > >> My main problem with git was the treatment of the file system, and I >> find it much easier to work with separate branches as in bzr or >> mercurial. > > Yes, it is true that the git lightweight branch model takes some time > to get used to. ?My experience is that it's quick to get used to the > git way, and once I did, it was a large relief to get rid of all those > branch directories when we switched, but I understand that it is a > leap. 
> > I am sure you know this, but you can replicate the heavyweight > branches of hg and bzr with: > > # initial git clone of 'trunk' > git clone git://github.com/nipy/nipy.git > # make a heavyweight branch > git clone nipy my-nipy-branch > # push somewhere > # First add repo for the branch via github interface, then > cd my-nipy-branch > git remote add origin git at github.com:matthew-brett/my-nipy-branch.git > git push origin master However, I think this works only with a remote remote, github or similar When I looked at bzr vs hg vs git, I also thought about my private use, where I didn't find a way to compare across branches in separate directories. My work style in statsmodels is similar to the mailing list reference that Fernando gave. Mainly I have many uncommitted files in each branch, test scripts, examples scripts, quick checks whether a rewrite would work, or R and matlab files. None of it I want to commit to the repository, but have available when I work on it again. > > I think you'd agree that it's not a windows / unix difference though. > I'd agree it is a larger conceptual leap from svn to git than it is > from svn to bzr or svn to mercurial. ?The git argument is that making > that initial leap gives you a great deal of freedom and flexibility, > but it can be intimidating at first. A great deal of freedom gives any new user also a lot of opportunities to shoot in his own foot. And my impression from the mailing lists is that the rescue team is called more often than with bzr or hg. My recommendation to myself is not to use with git more than the 10 or so basic commands similar to svn or bzr. Then I don't think it will create any real problems. So the basic workflow description by the nipy and numpy/scipy git developers will be the most useful help for the transition. (just confirming what is obvious to you) > >> As long as it is possible to stick with the basic workflow of git >> without anything fancy, similar what I have seen while skimming the >> nipy docs, I think it is not a problem on windows. > > I think that is true that most of us won't need to go further than the > nipy basic workflow - but we haven't been using git long enough to > know that very well. ?I would defer to the git masters out there - > David, Pauli and others - ? > >> However, if/when parts of statsmodels go into scipy and I have to do >> maintenance of less isolated code, then I think the Mercurial >> interface might be my preferred choice. >> >> I haven't used Mercurial much yet, but I don't see any problems with it. >> >> So, the bottom line is, that documentation for the hg-git interface >> would be very useful for Windows users (or those that think git is a >> strange/unfamiliar concept.) > > So - two issues: > > 1) The conceptual issues involved in switching mind-set from svn or > bzr to git. ?That may require some thought and documentation > 2) There might be some technical issues using git on windows - but I > think so far we don't have any reason to think so? > 3) Some people may prefer mercurial for other reasons; it would be > good to respect that if possible. > > So, it may well be worth making a hg-git doc for numpy when we do the > transition - with the caveats that David raised. > > In the meantime, it would be very good to hear of any problems that do > come up specifically using git on windows... Right now I only use 3 or so git commands and I don't see any problems. 
Cheers, Josef > > See you, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From matthew.brett at gmail.com Tue Jun 1 23:39:53 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 20:39:53 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: Hi, On Tue, Jun 1, 2010 at 8:07 PM, wrote: > On Tue, Jun 1, 2010 at 6:11 PM, Fernando Perez wrote: >> Hi Josef, >> >> On Tue, Jun 1, 2010 at 1:31 PM, ? wrote: >>> My main problem with git was the treatment of the file system, and I >>> find it much easier to work with separate branches as in bzr or >>> mercurial. >> >> One thing I've found very useful after transitioning to git for >> IPython is the git new-workdir command: >> >> http://kerneltrap.org/mailarchive/git/2008/5/21/1900044 >> http://nuclearsquid.com/writings/git-new-workdir.html >> >> It lets me keep a few branches around that I want 'permanent' on my >> filesystem, in a bzr shared-repo style, while using git for the >> lightweight feature-only branches. ?This is how it looks like right >> now on my system: > > thanks very useful information , git-new-workdir seems to be what I would like. > Does it work on Windows? Sadly - probably not without a little hacking... http://code.google.com/p/msysgit/issues/detail?id=99 But, if you think you need it, the script is so short that it would only take a short time to port to python (it's in sh): http://git.kernel.org/?p=git/git.git;a=blob_plain;f=contrib/workdir/git-new-workdir;hb=HEAD I see there are symbolic links there, that will require a little fancy footwork on windows, as you know. If I have time I'll give it a go. See you, Matthew From matthew.brett at gmail.com Tue Jun 1 23:53:06 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 Jun 2010 20:53:06 -0700 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: Hi, >> # initial git clone of 'trunk' >> git clone git://github.com/nipy/nipy.git >> # make a heavyweight branch >> git clone nipy my-nipy-branch >> # push somewhere >> # First add repo for the branch via github interface, then >> cd my-nipy-branch >> git remote add origin git at github.com:matthew-brett/my-nipy-branch.git >> git push origin master > > However, I think this works only with a remote remote, github or similar > When I looked at bzr vs hg vs git, I also thought about my private > use, where I didn't find a way to compare across branches in separate > directories. Ah - with the paragraph below, I begin to see what you mean. You often have uncommitted changes, hence the need for several working trees. You can compare repositories, but it's a bit harder that with - say - bzr: http://stackoverflow.com/questions/687450/how-do-i-compare-two-git-repositories > My work style in statsmodels is similar to the mailing list reference > that Fernando gave. Mainly I have many uncommitted files in each > branch, test scripts, examples scripts, quick checks whether a rewrite > would work, or R and matlab files. None of it I want to commit to the > repository, but have available when I work on it again. Right - I see your point. Maybe the git solution to that workflow will be more obvious to others than it is to me. > A great deal of freedom gives any new user also a lot of opportunities > to shoot in his own foot. 
> And my impression from the mailing lists is that the rescue team is > called more often than with bzr or hg. > My recommendation to myself is not to use with git more than the 10 or > so basic commands similar to svn or bzr. Then I don't think it will > create any real problems. That's fair. It is easier to mess up with git - it has a steeper learning curve when you go past the basics. It is well worthwhile spending some time understanding the model underneath it - good links from Fernando's page : http://www.fperez.org/py4science/git.html ; I particularly liked http://tom.preston-werner.com/2009/05/19/the-git-parable.html . > So the basic workflow description by the nipy and numpy/scipy git > developers will be the most useful help for the transition. (just > confirming what is obvious to you) Worth saying - thanks for the thoughtful feedback, Matthew From josef.pktd at gmail.com Tue Jun 1 23:56:18 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 Jun 2010 23:56:18 -0400 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 11:39 PM, Matthew Brett wrote: > Hi, > > On Tue, Jun 1, 2010 at 8:07 PM, ? wrote: >> On Tue, Jun 1, 2010 at 6:11 PM, Fernando Perez wrote: >>> Hi Josef, >>> >>> On Tue, Jun 1, 2010 at 1:31 PM, ? wrote: >>>> My main problem with git was the treatment of the file system, and I >>>> find it much easier to work with separate branches as in bzr or >>>> mercurial. >>> >>> One thing I've found very useful after transitioning to git for >>> IPython is the git new-workdir command: >>> >>> http://kerneltrap.org/mailarchive/git/2008/5/21/1900044 >>> http://nuclearsquid.com/writings/git-new-workdir.html >>> >>> It lets me keep a few branches around that I want 'permanent' on my >>> filesystem, in a bzr shared-repo style, while using git for the >>> lightweight feature-only branches. ?This is how it looks like right >>> now on my system: >> >> thanks very useful information , git-new-workdir seems to be what I would like. >> Does it work on Windows? > > Sadly - probably not without a little hacking... > > http://code.google.com/p/msysgit/issues/detail?id=99 > > But, if you think you need it, the script is so short that it would > only take a short time to port to python (it's in sh): > > http://git.kernel.org/?p=git/git.git;a=blob_plain;f=contrib/workdir/git-new-workdir;hb=HEAD > > I see there are symbolic links there, that will require a little fancy > footwork on windows, as you know. > > If I have time I'll give it a go. I don't think that's necessary (symlinks sound tricky) and scipy will be on a public repository, so your multiple (if I understand correctly) clone solution will work. Josef > > See you, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vincent at vincentdavis.net Wed Jun 2 00:02:20 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 1 Jun 2010 22:02:20 -0600 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 9:56 PM, wrote: > On Tue, Jun 1, 2010 at 11:39 PM, Matthew Brett wrote: >> Hi, >> >> On Tue, Jun 1, 2010 at 8:07 PM, ? wrote: >>> On Tue, Jun 1, 2010 at 6:11 PM, Fernando Perez wrote: >>>> Hi Josef, >>>> >>>> On Tue, Jun 1, 2010 at 1:31 PM, ? 
wrote: >>>>> My main problem with git was the treatment of the file system, and I >>>>> find it much easier to work with separate branches as in bzr or >>>>> mercurial. >>>> >>>> One thing I've found very useful after transitioning to git for >>>> IPython is the git new-workdir command: >>>> >>>> http://kerneltrap.org/mailarchive/git/2008/5/21/1900044 >>>> http://nuclearsquid.com/writings/git-new-workdir.html >>>> >>>> It lets me keep a few branches around that I want 'permanent' on my >>>> filesystem, in a bzr shared-repo style, while using git for the >>>> lightweight feature-only branches. ?This is how it looks like right >>>> now on my system: >>> >>> thanks very useful information , git-new-workdir seems to be what I would like. >>> Does it work on Windows? >> >> Sadly - probably not without a little hacking... >> >> http://code.google.com/p/msysgit/issues/detail?id=99 >> >> But, if you think you need it, the script is so short that it would >> only take a short time to port to python (it's in sh): >> >> http://git.kernel.org/?p=git/git.git;a=blob_plain;f=contrib/workdir/git-new-workdir;hb=HEAD >> >> I see there are symbolic links there, that will require a little fancy >> footwork on windows, as you know. >> >> If I have time I'll give it a go. > > I don't think that's necessary (symlinks sound tricky) and scipy will > be on a public repository, so your multiple (if I understand > correctly) clone solution will work. Could you not use hg to do what you want (work with local directories) and the use hg-git when you need to? I am kinda partial to hg and bzr. Vincent > > Josef > >> >> See you, >> >> Matthew >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Wed Jun 2 00:14:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 2 Jun 2010 00:14:14 -0400 Subject: [SciPy-Dev] git on windows (was: scipy.stats) In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 12:02 AM, Vincent Davis wrote: > On Tue, Jun 1, 2010 at 9:56 PM, ? wrote: >> On Tue, Jun 1, 2010 at 11:39 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Tue, Jun 1, 2010 at 8:07 PM, ? wrote: >>>> On Tue, Jun 1, 2010 at 6:11 PM, Fernando Perez wrote: >>>>> Hi Josef, >>>>> >>>>> On Tue, Jun 1, 2010 at 1:31 PM, ? wrote: >>>>>> My main problem with git was the treatment of the file system, and I >>>>>> find it much easier to work with separate branches as in bzr or >>>>>> mercurial. >>>>> >>>>> One thing I've found very useful after transitioning to git for >>>>> IPython is the git new-workdir command: >>>>> >>>>> http://kerneltrap.org/mailarchive/git/2008/5/21/1900044 >>>>> http://nuclearsquid.com/writings/git-new-workdir.html >>>>> >>>>> It lets me keep a few branches around that I want 'permanent' on my >>>>> filesystem, in a bzr shared-repo style, while using git for the >>>>> lightweight feature-only branches. ?This is how it looks like right >>>>> now on my system: >>>> >>>> thanks very useful information , git-new-workdir seems to be what I would like. >>>> Does it work on Windows? >>> >>> Sadly - probably not without a little hacking... 
>>> >>> http://code.google.com/p/msysgit/issues/detail?id=99 >>> >>> But, if you think you need it, the script is so short that it would >>> only take a short time to port to python (it's in sh): >>> >>> http://git.kernel.org/?p=git/git.git;a=blob_plain;f=contrib/workdir/git-new-workdir;hb=HEAD >>> >>> I see there are symbolic links there, that will require a little fancy >>> footwork on windows, as you know. >>> >>> If I have time I'll give it a go. >> >> I don't think that's necessary (symlinks sound tricky) and scipy will >> be on a public repository, so your multiple (if I understand >> correctly) clone solution will work. > > Could you not use hg to do what you want (work with local directories) > and the use hg-git when you need to? > > I am kinda partial to hg and bzr. I will give git a try, reviewing patches sounds easier with git. For other things, hg-git will be the likely outcome. Josef > > Vincent > >> >> Josef >> >>> >>> See you, >>> >>> Matthew >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From warren.weckesser at enthought.com Wed Jun 2 00:28:35 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 01 Jun 2010 23:28:35 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table Message-ID: <4C05DDF3.9010206@enthought.com> I've been digging into some basic statistics recently, and developed the following function for applying the chi-square test to a contingency table. Does something like this already exist in scipy.stats? If not, any objects to adding it? (Tests are already written :) Warren ----- def chisquare_contingency(table): """Chi-square calculation for a contingency (R x C) table. This function computes the chi-square statistic and p-value of the data in the table. The expected frequencies are computed based on the relative frequencies in the table. Parameters ---------- table : array_like, 2D The contingency table, also known as the R x C table. Returns ------- chisquare statistic : float The chisquare test statistic p : float The p-value of the test. """ table = np.asarray(table) if table.ndim != 2: raise ValueError("table must be a 2D array.") # Create the table of expected frequencies. total = table.sum() row_sum = table.sum(axis=1).reshape(-1,1) col_sum = table.sum(axis=0) expected = row_sum * col_sum / float(total) # Since we are passing in 1D arrays of length table.size, the default # number of degrees of freedom is table.size-1. # For a contingency table, the actual number degrees of freedom is # (nr - 1)*(nc-1). We use the ddof argument # of the chisquare function to adjust the default. nr, nc = table.shape dof = (nr - 1) * (nc - 1) dof_adjust = (table.size - 1) - dof chi2, p = chisquare(np.ravel(table), np.ravel(expected), ddof=dof_adjust) return chi2, p ----- From d.l.goldsmith at gmail.com Wed Jun 2 01:09:17 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 22:09:17 -0700 Subject: [SciPy-Dev] Difference between scipy.stats.gengamma and scipy.stats.distributions.gengamma Message-ID: Is there a difference between these two? 
Same question for stats.lognorm and stats.distributions.lognorm? Thanks. DG -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jun 2 01:23:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 2 Jun 2010 01:23:16 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C05DDF3.9010206@enthought.com> References: <4C05DDF3.9010206@enthought.com> Message-ID: On Wed, Jun 2, 2010 at 12:28 AM, Warren Weckesser wrote: > I've been digging into some basic statistics recently, and developed the > following function for applying the chi-square test to a contingency > table. ?Does something like this already exist in scipy.stats? If not, > any objects to adding it? ?(Tests are already written :) There is no test like this yet in scipy.stats, and I think it is a good addition. My main question, which maybe Bruce can answer, is whether the function should allow more than 2 dimensions. The function would be easy to generalize but I don't know how common the test for example for independence in (RxCxD) is. (Options could still be added later without changing the API, in case there are any.) I would also look briefly at the R manual, to see what features their test has. (I'm not a real user of contingency tables) The docstring I think should mention that this is a test for independence, and that it is only appropriate if the expected count in each cell is at least 5. (off the top of my head) "Chi-square test for independence in a contingency (R x C) table" is (R x C) standard notation (letters)? dof_adjust, I would have to check. Can you open a ticket, mainly for the record, but to see if there are any useful generalization? But I think it can go in. A comment: The function matches the pattern of the current scipy.stats functions, but in statsmodels I would most likely also make the expected values available, so that users can directly compare data and expected values. Thanks, Josef > > Warren > > ----- > > def chisquare_contingency(table): > ? ?"""Chi-square calculation for a contingency (R x C) table. > > ? ?This function computes the chi-square statistic and p-value of the > ? ?data in the table. ?The expected frequencies are computed based on > ? ?the relative frequencies in the table. > > ? ?Parameters > ? ?---------- > ? ?table : array_like, 2D > ? ? ? ?The contingency table, also known as the R x C table. > > ? ?Returns > ? ?------- > ? ?chisquare statistic : float > ? ? ? ?The chisquare test statistic > ? ?p : float > ? ? ? ?The p-value of the test. > ? ?""" > ? ?table = np.asarray(table) > ? ?if table.ndim != 2: > ? ? ? ?raise ValueError("table must be a 2D array.") > > ? ?# Create the table of expected frequencies. > ? ?total = table.sum() > ? ?row_sum = table.sum(axis=1).reshape(-1,1) > ? ?col_sum = table.sum(axis=0) > ? ?expected = row_sum * col_sum / float(total) > > ? ?# Since we are passing in 1D arrays of length table.size, the default > ? ?# number of degrees of freedom is table.size-1. > ? ?# For a contingency table, the actual number degrees of freedom is > ? ?# (nr - 1)*(nc-1). ?We use the ddof argument > ? ?# of the chisquare function to adjust the default. > ? 
?nr, nc = table.shape > ? ?dof = (nr - 1) * (nc - 1) > ? ?dof_adjust = (table.size - 1) - dof > > ? ?chi2, p = chisquare(np.ravel(table), np.ravel(expected), > ddof=dof_adjust) > ? ?return chi2, p > > ----- > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Wed Jun 2 01:26:03 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 2 Jun 2010 01:26:03 -0400 Subject: [SciPy-Dev] Difference between scipy.stats.gengamma and scipy.stats.distributions.gengamma In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 1:09 AM, David Goldsmith wrote: > Is there a difference between these two?? Same question for stats.lognorm > and stats.distributions.lognorm?? Thanks. No, they are the same instance of the distribution scipy.stats.__init__ has a from distributions import * or something like this Josef > > DG > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. ?(As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From scott.sinclair.za at gmail.com Wed Jun 2 01:41:10 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 2 Jun 2010 07:41:10 +0200 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 1 June 2010 22:48, David Goldsmith wrote: > 2010/6/1 St?fan van der Walt >> >> On 1 June 2010 13:32, David Goldsmith wrote: >> > The docstring Standard seems to be careful to note which sections are >> > considered optional, and the "Extended Summary" is *not* on that list. >> > However, I'm encountering many SciPy docstrings in the Wiki lacking this >> > section and yet marked as "Needs review": should I ignore this >> > deficiency >> > and add a ticket to clarify the Standard, or should such docstrings be >> > moved >> > back to "Being written"? >> >> Typically, there is no reason not to have an extended section. ?Can >> you give an example where it would seem unnecessary? > > No: my position would appear to be the same as yours, and my inclination > would be to "revert" them to "Being written." Wouldn't it better to revert them to "Needs editing" instead? The "Being written" status implies that someone is actively working on the docstring... Cheers, Scott From d.l.goldsmith at gmail.com Wed Jun 2 02:34:04 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 23:34:04 -0700 Subject: [SciPy-Dev] Difference between scipy.stats.gengamma and scipy.stats.distributions.gengamma In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 10:26 PM, wrote: > On Wed, Jun 2, 2010 at 1:09 AM, David Goldsmith > wrote: > > Is there a difference between these two? Same question for stats.lognorm > > and stats.distributions.lognorm? Thanks. > > No, they are the same instance of the distribution > > scipy.stats.__init__ has a from distributions import * or something like > this > OK, thanks! > DG -------------- next part -------------- An HTML attachment was scrubbed... 
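A quick way to see Josef's point above for oneself: because scipy.stats simply star-imports scipy.stats.distributions, the two names are bound to the very same distribution instances. This is a minimal interactive sketch, assuming only that star import; gengamma and lognorm are just the two distributions asked about, and any other distribution name behaves the same way.

>>> from scipy import stats
>>> stats.gengamma is stats.distributions.gengamma
True
>>> stats.lognorm is stats.distributions.lognorm
True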
URL: From d.l.goldsmith at gmail.com Wed Jun 2 02:44:41 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 1 Jun 2010 23:44:41 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 10:41 PM, Scott Sinclair wrote: > On 1 June 2010 22:48, David Goldsmith wrote: > > 2010/6/1 St?fan van der Walt > >> > >> On 1 June 2010 13:32, David Goldsmith wrote: > >> > The docstring Standard seems to be careful to note which sections are > >> > considered optional, and the "Extended Summary" is *not* on that list. > >> > However, I'm encountering many SciPy docstrings in the Wiki lacking > this > >> > section and yet marked as "Needs review": should I ignore this > >> > deficiency > >> > and add a ticket to clarify the Standard, or should such docstrings be > >> > moved > >> > back to "Being written"? > >> > >> Typically, there is no reason not to have an extended section. Can > >> you give an example where it would seem unnecessary? > > > > No: my position would appear to be the same as yours, and my inclination > > would be to "revert" them to "Being written." > > Wouldn't it better to revert them to "Needs editing" instead? The > "Being written" status implies that someone is actively working on the > docstring... > > Cheers, > Scott > Correct; actually, what I'm doing for these, and other prematurely promoted docstrings, is checking the log: only if the most recent edit was substantial and within the last 6 mo. (indicating some amount of recent "ownership") am I pushing back to "Being written," otherwise, which, so far, is the dominant case by far, I am indeed pushing it back to "Needs editing." :-) DG > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ce at vejnar.eu Wed Jun 2 05:52:49 2010 From: ce at vejnar.eu (Charles Vejnar) Date: Wed, 02 Jun 2010 11:52:49 +0200 Subject: [SciPy-Dev] Scipy archive on PyPI In-Reply-To: References: <201005311655.53741.ce@vejnar.eu> <4C045DCD.1080503@silveregg.co.jp> Message-ID: <201006021152.49749.ce@vejnar.eu> On Tuesday 01 June 2010, Ralf Gommers wrote: > On Tue, Jun 1, 2010 at 9:09 AM, David wrote: > > On 06/01/2010 12:15 AM, Ralf Gommers wrote: > > > On Mon, May 31, 2010 at 10:55 PM, Charles Vejnar > > > > > > wrote: > > > Hi, > > > > > > I was trying to install Scipy with easy_install and it seems that > > > downloading > > > from Sourceforge is no longer possible (Sourceforge no longer gives > > > a direct > > > link to the .tar.gz file) which makes the install fail. > > > > > > Would it be possible to always upload the latest Scipy tarball to > > > > PyPI ? > > > > > It's possible, but because that encourages the use of easy_install/pip > > > it would probably give more problems than that it helps. Just today > > > there was a thread on numpy-discussion about pip failing and standard > > > "python setup.py install" fixing the problem. easy_install is just as > > > problematic as pip, if not more so. > > > > Unfortunately, people will always use those half broken tools. 
I think > > we should at least put the tarballs - I also used to put a simple > > executable (result of bdist_wininst) so that easy_install numpy works on > > windows. > > > > OK, I'll do the same then. > > Ralf Thank you Charles From stefan.czesla at hs.uni-hamburg.de Wed Jun 2 07:21:06 2010 From: stefan.czesla at hs.uni-hamburg.de (Stefan) Date: Wed, 2 Jun 2010 11:21:06 +0000 (UTC) Subject: [SciPy-Dev] =?utf-8?q?np=2Esavetxt=3A_apply_patch_in_enhancement_?= =?utf-8?q?ticket_1079=09to_add_headers=3F?= References: Message-ID: > > If the header is given as a plane string > > (such as envisaged in ticket 1079), the > > user has to care for the correct formatting, in particular, > > the user has to > > supply the comment character(s) and the new line formatting. > > This might be > > against intuition, because many users will at first try to supply their > > header(s) without specifying those formatting characters. > > The result will be a > > file not readable with numpy.loadtxt, and the error might > > not be detected right > > away. > > I'm not sure I understand why I would want to specify a comment > character for writing a csv file (unless of course I had some comments > to add). We are possibly talking about different things. In our approach of using numpy.savetxt comments (preceeding the actual data) and a header are essentially the same, such as in the following example. Basically, we want to add some lines of additional information at the top of the file written with numpy.savetxt, and be able to recover the data with numpy.loadtxt (for which the 'header' would then be irrelevant, what may not be your intention, or is it?). #Now comes the data #column1 [kg] column2 [apple] 1 2 3 5 > > Also note that since that patch was written, savetxt takes a user > supplied newline keyword, so you can just append that to the header > string. > True, we were not aware of this, but this does not help much for the comment/header. > > > > As numpy.loadtxt has a default comment character ('#'), the same may be > > implemented for numpy.savetxt. In this case, numpy.savetxt would get two > > additional keywords (e.g. header, comment(character)), which bloats the > > interface, but potentially provides more safety. > > > > FWIW, I ended up rolling my own using the most recent pre-Python 3 > changes for savetxt that accepts a list of names instead of one string > or if the provided array has the attribute dtype.names (non-nested rec > or structured arrays) it uses those. Whatever is done I think the > support for structured arrays is nice, and I think having this > functionality is a no-brainer. I need it quite often. > Although, we have not been using record arrays too often, we see their advantages and agree that it should be possible to use them as you described it. We also thought about a solution, using the __str__ method for the 'header object'. In this vain, an arbitrary header class (including a plane string) providing an __str__ member may be handed to numpy.savetxt, which can use it to write the header. > Skipper > From ralf.gommers at googlemail.com Wed Jun 2 07:22:00 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 2 Jun 2010 19:22:00 +0800 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? 
In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 2:44 PM, David Goldsmith wrote: > On Tue, Jun 1, 2010 at 10:41 PM, Scott Sinclair gmail.com> wrote: > >> On 1 June 2010 22:48, David Goldsmith wrote: >> > 2010/6/1 St?fan van der Walt >> >> >> >> On 1 June 2010 13:32, David Goldsmith wrote: >> >> > The docstring Standard seems to be careful to note which sections are >> >> > considered optional, and the "Extended Summary" is *not* on that >> list. >> >> > However, I'm encountering many SciPy docstrings in the Wiki lacking >> this >> >> > section and yet marked as "Needs review": should I ignore this >> >> > deficiency >> >> > and add a ticket to clarify the Standard, or should such docstrings >> be >> >> > moved >> >> > back to "Being written"? >> >> >> >> Typically, there is no reason not to have an extended section. Can >> >> you give an example where it would seem unnecessary? >> > I think we shouldn't go overboard here. In the great majority of cases it's needed but sometimes there's just not much info to add besides what's in the summary and parameter description. Examples: http://docs.scipy.org/numpy/docs/numpy.core.umath.add/ http://docs.scipy.org/numpy/docs/numpy.lib.ufunclike.isneginf/ http://docs.scipy.org/numpy/docs/numpy.core.umath.logical_or/ These are all good docstrings and should not be reset to "needs editing" imho. And if you really have info to add, I suggest to just add it the moment you see it - will be a lot more productive in the end. Finally, there's a huge amount of low hanging fruit in the scipy docs. Why not just take a module and dig in? These details can wait for a while. Best regards, Ralf > >> > No: my position would appear to be the same as yours, and my inclination >> > would be to "revert" them to "Being written." >> >> Wouldn't it better to revert them to "Needs editing" instead? The >> "Being written" status implies that someone is actively working on the >> docstring... >> >> Cheers, >> Scott >> > > Correct; actually, what I'm doing for these, and other prematurely promoted > docstrings, is checking the log: only if the most recent edit was > substantial and within the last 6 mo. (indicating some amount of recent > "ownership") am I pushing back to "Being written," otherwise, which, so far, > is the dominant case by far, I am indeed pushing it back to "Needs editing." > :-) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nmb at wartburg.edu Wed Jun 2 08:24:25 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 02 Jun 2010 07:24:25 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C05DDF3.9010206@enthought.com> References: <4C05DDF3.9010206@enthought.com> Message-ID: <4C064D79.4030106@wartburg.edu> On 2010-06-01 23:28 , Warren Weckesser wrote: > I've been digging into some basic statistics recently, and developed the > following function for applying the chi-square test to a contingency > table. Does something like this already exist in scipy.stats? If not, > any objects to adding it? (Tests are already written :) Something like this would be great in scipy.stats since I end up doing the exact same thing by hand whenever I grade introductory statistics exams. Thanks for writing this! I've got some code review comments that I'll include below. > def chisquare_contingency(table): I think that chiquare_twoway fits the common name for this test better, but as Joseph mentions, this neglects the possibility of expanding this to n-dimensions. 
> """Chi-square calculation for a contingency (R x C) table. The docstring should emphasize that this is a hypothesis test. See for example http://docs.scipy.org/scipy/docs/scipy.stats.stats.ttest_rel/. I'm not familiar with the R x C notation, but it does work to make clear which chi square test this is. > > This function computes the chi-square statistic and p-value of the > data in the table. The expected frequencies are computed based on > the relative frequencies in the table. I try to explain what the null and alternative hypotheses are for the tests in scipy.stats. > > Parameters > ---------- > table : array_like, 2D > The contingency table, also known as the R x C table. This could also say something like "The table contains the observed frequencies of each category." > > Returns > ------- > chisquare statistic : float > The chisquare test statistic > p : float > The p-value of the test. A function like this could really use an example, perhaps straight from one of the tests. > """ > table = np.asarray(table) > if table.ndim != 2: > raise ValueError("table must be a 2D array.") > > # Create the table of expected frequencies. > total = table.sum() > row_sum = table.sum(axis=1).reshape(-1,1) > col_sum = table.sum(axis=0) > expected = row_sum * col_sum / float(total) I think that np.outer(row_sum, col_sum) is clearer than reshaping one to be a column vector. > > # Since we are passing in 1D arrays of length table.size, the default > # number of degrees of freedom is table.size-1. > # For a contingency table, the actual number degrees of freedom is > # (nr - 1)*(nc-1). We use the ddof argument > # of the chisquare function to adjust the default. > nr, nc = table.shape > dof = (nr - 1) * (nc - 1) > dof_adjust = (table.size - 1) - dof > > chi2, p = chisquare(np.ravel(table), np.ravel(expected), > ddof=dof_adjust) > return chi2, p From josef.pktd at gmail.com Wed Jun 2 10:03:09 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 2 Jun 2010 10:03:09 -0400 Subject: [SciPy-Dev] old new (?) StatisticReview Message-ID: 4 years ago Robert Kern defined guide lines or check list for the review of the functions in scipy.stats http://projects.scipy.org/scipy/wiki/StatisticsReview This is a useful checklist for evaluating legacy functions,... in scipy.stats. And I think I (implicitly) followed this most of the time in my stats cleanup. But the criteria should not apply to only existing functions (i.e. that have entered trunk), but also to new code. The only point I want to strengthen is number "1. The function works. Sometimes, you just have to state the obvious." to "1. The function works and produces correct result." "works" sounds too much like "it doesn't raise an exception" "correct" is also a vague term, but it captures more the spirit. The checklist could be reviewed or rephrased, but I would like to have guide lines spelled out more explicitly, so we know what the rules of the game are. 
(even if they are guidelines) Josef From josef.pktd at gmail.com Wed Jun 2 10:37:56 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 2 Jun 2010 10:37:56 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C064D79.4030106@wartburg.edu> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> Message-ID: On Wed, Jun 2, 2010 at 8:24 AM, Neil Martinsen-Burrell wrote: > On 2010-06-01 23:28 , Warren Weckesser wrote: >> I've been digging into some basic statistics recently, and developed the >> following function for applying the chi-square test to a contingency >> table. ?Does something like this already exist in scipy.stats? If not, >> any objects to adding it? ?(Tests are already written :) > > Something like this would be great in scipy.stats since I end up doing > the exact same thing by hand whenever I grade introductory statistics > exams. ?Thanks for writing this! > > I've got some code review comments that I'll include below. > >> def chisquare_contingency(table): > > I think that chiquare_twoway fits the common name for this test better, > but as Joseph mentions, this neglects the possibility of expanding this > to n-dimensions. > >> ? ? ?"""Chi-square calculation for a contingency (R x C) table. > > The docstring should emphasize that this is a hypothesis test. ?See for > example http://docs.scipy.org/scipy/docs/scipy.stats.stats.ttest_rel/. > I'm not familiar with the R x C notation, but it does work to make clear > which chi square test this is. > >> >> ? ? ?This function computes the chi-square statistic and p-value of the >> ? ? ?data in the table. ?The expected frequencies are computed based on >> ? ? ?the relative frequencies in the table. > > I try to explain what the null and alternative hypotheses are for the > tests in scipy.stats. > >> >> ? ? ?Parameters >> ? ? ?---------- >> ? ? ?table : array_like, 2D >> ? ? ? ? ?The contingency table, also known as the R x C table. > > This could also say something like "The table contains the observed > frequencies of each category." > >> >> ? ? ?Returns >> ? ? ?------- >> ? ? ?chisquare statistic : float >> ? ? ? ? ?The chisquare test statistic >> ? ? ?p : float >> ? ? ? ? ?The p-value of the test. > > A function like this could really use an example, perhaps straight from > one of the tests. > >> ? ? ?""" >> ? ? ?table = np.asarray(table) >> ? ? ?if table.ndim != 2: >> ? ? ? ? ?raise ValueError("table must be a 2D array.") >> >> ? ? ?# Create the table of expected frequencies. >> ? ? ?total = table.sum() >> ? ? ?row_sum = table.sum(axis=1).reshape(-1,1) >> ? ? ?col_sum = table.sum(axis=0) >> ? ? ?expected = row_sum * col_sum / float(total) > > I think that np.outer(row_sum, col_sum) is clearer than reshaping one to > be a column vector. > >> >> ? ? ?# Since we are passing in 1D arrays of length table.size, the default >> ? ? ?# number of degrees of freedom is table.size-1. >> ? ? ?# For a contingency table, the actual number degrees of freedom is >> ? ? ?# (nr - 1)*(nc-1). ?We use the ddof argument >> ? ? ?# of the chisquare function to adjust the default. >> ? ? ?nr, nc = table.shape >> ? ? ?dof = (nr - 1) * (nc - 1) >> ? ? ?dof_adjust = (table.size - 1) - dof >> >> ? ? ?chi2, p = chisquare(np.ravel(table), np.ravel(expected), >> ddof=dof_adjust) >> ? ? ?return chi2, p Just a thought: I think it would be useful to have this kind of proposals on the scipy-user list (even though it is a dev issue), just to be able to get more feedback from potential users. 
And again, Thanks Neil, it's very nice to have the statistics in the docstrings instead of having to run to Wikipedia Josef > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From bsouthey at gmail.com Wed Jun 2 10:41:39 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 02 Jun 2010 09:41:39 -0500 Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? In-Reply-To: References: Message-ID: <4C066DA3.8010609@gmail.com> On 06/02/2010 06:21 AM, Stefan wrote: > >>> If the header is given as a plane string >>> (such as envisaged in ticket 1079), the >>> user has to care for the correct formatting, in particular, >>> the user has to >>> supply the comment character(s) and the new line formatting. >>> This might be >>> against intuition, because many users will at first try to supply their >>> header(s) without specifying those formatting characters. >>> The result will be a >>> file not readable with numpy.loadtxt, and the error might >>> not be detected right >>> away. >>> >> I'm not sure I understand why I would want to specify a comment >> character for writing a csv file (unless of course I had some comments >> to add). >> > We are possibly talking about different things. In our approach of using > numpy.savetxt comments (preceeding the actual data) and a header > are essentially the same, such as in the following example. > Basically, we want to add some lines > of additional information at the top of the file written with > numpy.savetxt, and be able to recover the data with numpy.loadtxt > (for which the 'header' would > then be irrelevant, what may not be your intention, or is it?). > > #Now comes the data > #column1 [kg] column2 [apple] > 1 2 > 3 5 > > Not that I am complaining rather trying to understand what is expected to happen. Under the patch, it is very much user beware. The header argument can be anything or nothing. There is no check for the contents or if the delimiter used is the same as the rest of the output. Further with the newline option there is no guarantee that the lines in the header will have the same line endings throughout the file. So what should a user be allowed to use as a header? You could write a whole program there or an explanation of the following output - which is very appealing. You could force a list of strings so that you print out newline.join(header) - okay not quite because it should include the comment argument. Should savetxt be restricted to something that loadtxt can read? This is potentially problematic if you want a header line. Although it could return the number of header lines. [savetxt should also be updated to allow bz2 as loadtxt handles those now - not that I have used it] > >> Also note that since that patch was written, savetxt takes a user >> supplied newline keyword, so you can just append that to the header >> string. >> >> > True, we were not aware of this, but this does not help much for the > comment/header. > Entered as ~3 months ago: http://projects.scipy.org/numpy/changeset/8180 Should this be forced to check for valid options for new lines? Otherwise you from this 'np.savetxt('junk.text', [1,2,3,4,5], newline='what')' you get: 1.000000000000000000e+00what2.000000000000000000e+00what3.000000000000000000e+00what4.000000000000000000e+00what5.000000000000000000e+00what Which is not going to be read back by loadtxt. 
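[A minimal sketch of the kind of newline validation Bruce asks about above. This is hypothetical -- numpy.savetxt is not described here as doing any such check -- and the helper name is made up purely for illustration.]

    # Hypothetical guard against newline values that loadtxt could never round-trip.
    _VALID_NEWLINES = ('\n', '\r\n', '\r')

    def _check_newline(newline):
        if newline not in _VALID_NEWLINES:
            raise ValueError("newline must be one of %r, got %r"
                             % (_VALID_NEWLINES, newline))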
>>> As numpy.loadtxt has a default comment character ('#'), the same may be >>> implemented for numpy.savetxt. In this case, numpy.savetxt would get two >>> additional keywords (e.g. header, comment(character)), which bloats the >>> interface, but potentially provides more safety. >>> >>> >> FWIW, I ended up rolling my own using the most recent pre-Python 3 >> changes for savetxt that accepts a list of names instead of one string >> or if the provided array has the attribute dtype.names (non-nested rec >> or structured arrays) it uses those. Whatever is done I think the >> support for structured arrays is nice, and I think having this >> functionality is a no-brainer. I need it quite often. >> >> > Although, we have not been using record arrays too often, we see their > advantages and agree that it should be possible to use them as you described > it. > We also thought about a solution, using the __str__ method for the 'header > object'. In this vain, an arbitrary header class (including a plane string) > providing an __str__ member may be handed to numpy.savetxt, > which can use it to write the header. > > >> Skipper >> >> > It would nice if savetxt used the dtype of the input to get a header and format by default unless overwritten by the user. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Wed Jun 2 12:02:02 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 02 Jun 2010 11:02:02 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> Message-ID: <4C06807A.40301@gmail.com> On 06/02/2010 09:37 AM, josef.pktd at gmail.com wrote: > On Wed, Jun 2, 2010 at 8:24 AM, Neil Martinsen-Burrell wrote: > >> On 2010-06-01 23:28 , Warren Weckesser wrote: >> >>> I've been digging into some basic statistics recently, and developed the >>> following function for applying the chi-square test to a contingency >>> table. Does something like this already exist in scipy.stats? If not, >>> any objects to adding it? (Tests are already written :) >>> >> Something like this would be great in scipy.stats since I end up doing >> the exact same thing by hand whenever I grade introductory statistics >> exams. Thanks for writing this! >> You might find SAS helpful: http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/freq_toc.htm However, this code is the chi-squared test part as SAS will compute the actual cell numbers. Also an extension to scipy.stats.chisquare() so we can not have both functions. Really this should be combined with fisher.py in ticket 956: http://projects.scipy.org/scipy/ticket/956 >> I've got some code review comments that I'll include below. >> >> >>> def chisquare_contingency(table): >>> >> I think that chiquare_twoway fits the common name for this test better, >> but as Joseph mentions, this neglects the possibility of expanding this >> to n-dimensions. >> >> >>> """Chi-square calculation for a contingency (R x C) table. >>> >> The docstring should emphasize that this is a hypothesis test. See for >> example http://docs.scipy.org/scipy/docs/scipy.stats.stats.ttest_rel/. >> I'm not familiar with the R x C notation, but it does work to make clear >> which chi square test this is. >> >> >>> This function computes the chi-square statistic and p-value of the >>> data in the table. 
The expected frequencies are computed based on >>> the relative frequencies in the table. >>> >> I try to explain what the null and alternative hypotheses are for the >> tests in scipy.stats. >> It is also an asymptotic test so cell size should be mentioned. >> >>> Parameters >>> ---------- >>> table : array_like, 2D >>> The contingency table, also known as the R x C table. >>> >> This could also say something like "The table contains the observed >> frequencies of each category." >> >> >>> Returns >>> ------- >>> chisquare statistic : float >>> The chisquare test statistic >>> p : float >>> The p-value of the test. >>> >> A function like this could really use an example, perhaps straight from >> one of the tests. >> It needs at least to support both the 1-d and 2-d cases (preferably where R and C > 2) >>> """ >>> table = np.asarray(table) >>> if table.ndim != 2: >>> raise ValueError("table must be a 2D array.") >>> This should not be restricted to 2-d array's. At the very least it should handle 1-d and 2-d array_like inputs. There also should have correct handling of masked arrays because np.asarray ignores the mask - I do not recall what happens with Matrix class. Obviously one needs to address how masked values are handled such as replacing the values with zero. >>> # Create the table of expected frequencies. >>> total = table.sum() >>> total=table.sum(dtype=float) # dtype will not be needed if integer division is not used (ie Python3) >>> row_sum = table.sum(axis=1).reshape(-1,1) >>> col_sum = table.sum(axis=0) >>> expected = row_sum * col_sum / float(total) >>> expected = row_sum * col_sum /total >> I think that np.outer(row_sum, col_sum) is clearer than reshaping one to >> be a column vector. >> Make it one liner: expected = np.outer( table.sum(axis=1), table.sum(axis=0))/total >>> # Since we are passing in 1D arrays of length table.size, the default >>> # number of degrees of freedom is table.size-1. >>> # For a contingency table, the actual number degrees of freedom is >>> # (nr - 1)*(nc-1). We use the ddof argument >>> # of the chisquare function to adjust the default. >>> nr, nc = table.shape >>> dof = (nr - 1) * (nc - 1) >>> dof_adjust = (table.size - 1) - dof >>> >>> chi2, p = chisquare(np.ravel(table), np.ravel(expected), >>> ddof=dof_adjust) >>> Where is your chisquare function - this is meant to be a standard alone function? Why not do say: import special chi2_value=(((table-expected)**2)/expected).sum() chi2_prob=special.chdtrc(dof,chi2_value) >>> return chi2, p >>> > > Just a thought: > I think it would be useful to have this kind of proposals on the > scipy-user list (even though it is a dev issue), just to be able to > get more feedback from potential users. > > And again, > Thanks Neil, it's very nice to have the statistics in the docstrings > instead of having to run to Wikipedia > > Josef > > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Bruce -------------- next part -------------- An HTML attachment was scrubbed... 
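[For reference, a minimal sketch that folds the review suggestions from this thread -- np.outer for the expected counts, special.chdtrc for the p-value, and the (nr - 1) * (nc - 1) degrees of freedom -- into one function. This is only an illustration of the idea under discussion, not a committed implementation, and chisquare_twoway is just one of the candidate names mentioned.]

    import numpy as np
    from scipy import special

    def chisquare_twoway(table):
        # Test of independence for a two-way (R x C) table of observed counts.
        table = np.asarray(table)
        if table.ndim != 2:
            raise ValueError("table must be a 2D array of observed frequencies")
        total = table.sum(dtype=float)
        # Expected counts under independence: outer product of the margins / N.
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / total
        nr, nc = table.shape
        dof = (nr - 1) * (nc - 1)
        chi2 = ((table - expected) ** 2 / expected).sum()
        p = special.chdtrc(dof, chi2)  # upper tail of the chi-square distribution
        return chi2, p

    # e.g. chisquare_twoway([[10, 20], [30, 25]]) returns the statistic and its p-value.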
URL: From nmb at wartburg.edu Wed Jun 2 12:26:04 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 02 Jun 2010 11:26:04 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C06807A.40301@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> Message-ID: <4C06861C.1060401@wartburg.edu> On 2010-06-02 11:02 , Bruce Southey wrote: > On 06/02/2010 09:37 AM, josef.pktd at gmail.com wrote: >> On Wed, Jun 2, 2010 at 8:24 AM, Neil Martinsen-Burrell wrote: >> >>> On 2010-06-01 23:28 , Warren Weckesser wrote: >>> >>>> I've been digging into some basic statistics recently, and developed the >>>> following function for applying the chi-square test to a contingency >>>> table. Does something like this already exist in scipy.stats? If not, >>>> any objects to adding it? (Tests are already written :) >>>> >>> Something like this would be great in scipy.stats since I end up doing >>> the exact same thing by hand whenever I grade introductory statistics >>> exams. Thanks for writing this! >>> > You might find SAS helpful: > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/freq_toc.htm I'm not sure what you mean by this. I have no problem performing the test, it's just inconvenient that it isn't already a part of scipy.stats > However, this code is the chi-squared test part as SAS will compute the > actual cell numbers. Also an extension to scipy.stats.chisquare() so we > can not have both functions. Again, I don't understand what you mean that we can't have both functions? I believe (from a statistics teacher's point of view) that the Chi-Squared goodness of fit test (which is stats.chisquare) is a different beast from the Chi-Square test for independence (which is stats.chisquare_contingency). The fact that the distribution of the test statistic is the same should not tempt us to put them into the same function. > Really this should be combined with fisher.py in ticket 956: > http://projects.scipy.org/scipy/ticket/956 Wow, apparently I have lots of disagreements today, but I don't think that this should be combined with Fisher's Exact test. (I would like to see that ticket mature to the point where it can be added to scipy.stats.) I like the functions in scipy.stats to correspond in a one-to-one manner with the statistical tests. I think that the docs should "See Also" the appropriate exact (and non-parametric) tests, but I think that one function/one test is a good rule. This is particularly true for people (like me) who would like to someday be able to use scipy.stats in a pedagogical context. -Neil From stefan.czesla at hs.uni-hamburg.de Wed Jun 2 13:14:04 2010 From: stefan.czesla at hs.uni-hamburg.de (Stefan) Date: Wed, 2 Jun 2010 17:14:04 +0000 (UTC) Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? References: <4C066DA3.8010609@gmail.com> Message-ID: > Not that I am complaining rather trying to understand what is expected > to happen. > Under the patch, it is very much user beware.? The header argument can > be anything or nothing. There is no check for the contents or if the > delimiter used is the same as the rest of the output. Further with the > newline option there is no guarantee that the lines in the header will > have the same line endings throughout the file. > So what should a user be allowed to use as a header? 
> You could write a whole program there or an explanation of the > following output - which is very appealing. You could force a list of > strings so that you print out newline.join(header) - okay not quite > because it should include the comment argument. > Should savetxt be restricted to something that loadtxt can read? > This is potentially problematic if you want a header line. Although it > could return the number of header lines. > [savetxt should also be updated to allow bz2 as loadtxt handles those > now - not that I have used it] > > > > > Also note that since that patch was written, savetxt takes a user > supplied newline keyword, so you can just append that to the header > string. > > > > True, we were not aware of this, but this does not help much for the > comment/header. > > > > Entered as ~3 months ago:http://projects.scipy.org/numpy/changeset/8180 > Should this be forced to check for valid options for new lines? > Otherwise you from this? 'np.savetxt('junk.text', [1,2,3,4,5], > newline='what')' you get: > 1.000000000000000000e+00what2.000000000000000000e+00what 3.000000000000000000e+00what4.000000000000000000e+00 what5.000000000000000000e+00what > Which is not going to be read back by loadtxt. > > > > As numpy.loadtxt has a default comment character ('#'), the same may be > implemented for numpy.savetxt. In this case, numpy.savetxt would get two > additional keywords (e.g. header, comment(character)), which bloats the > interface, but potentially provides more safety. > > > > > FWIW, I ended up rolling my own using the most recent pre-Python 3 > changes for savetxt that accepts a list of names instead of one string > or if the provided array has the attribute dtype.names (non-nested rec > or structured arrays) it uses those. Whatever is done I think the > support for structured arrays is nice, and I think having this > functionality is a no-brainer. I need it quite often. > > > > Although, we have not been using record arrays too often, we see their > advantages and agree that it should be possible to use them as you described > it. > We also thought about a solution, using the __str__ method for the 'header > object'. In this vain, an arbitrary header class (including a plane string) > providing an __str__ member may be handed to numpy.savetxt, > which can use it to write the header. > So let us briefly summarize whats on the table. It appears to us that there are basically three open issues: (1) a csv like header for savetxt written files (first line contains column names) (2) comments (introduced by comment character e.g. '#') at the beginning of the file (preceding the data) (3) the role of the 'newline' option As was noted, the patch (ticket 1079) enables both to write a csv like header (1) and comment line(s) introduced by a comment character (e.g. '#'). Nonetheless, this solution is quite unsatisfactory in our opinion, because it may be error prone, as the user is in charge of the entire formatting. Despite this, we think that it should be up to the user what amount of information is to be put at the top of the file, but the format should be checked as far as possible. Using either a string or a list/tuple of strings, as proposed by Bruce, seems to be a reasonable possibility to implement the desired functionality. Maybe two individual keywords ('header' and 'comment') should exist to distinguish whether the the user requests case (1) or (2). As for loadtxt the default comment character should be '#', but it may be changed by the user. 
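[To make the proposal concrete, a rough sketch of the header/comment interface being described: a hypothetical wrapper, not a patch, with the function name invented for illustration. It writes each header line behind the comment character so that numpy.loadtxt skips it by default.]

    import numpy as np

    def savetxt_with_header(fname, X, header=None, comment='#', **kwargs):
        # Hypothetical wrapper: write comment-prefixed header lines, then the data.
        newline = kwargs.get('newline', '\n')
        with open(fname, 'w') as fh:
            if header is not None:
                if isinstance(header, str):
                    header = header.splitlines()
                for line in header:
                    fh.write('%s %s%s' % (comment, line, newline))
            np.savetxt(fh, X, **kwargs)

    # savetxt_with_header('data.txt', np.array([[1, 2], [3, 5]]),
    #                     header=['Now comes the data', 'column1 [kg] column2 [apple]'])
    # np.loadtxt('data.txt') then recovers the numbers and ignores the '#' lines.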
We think that savetxt should not be restricted to output, which can be read by loadtxt. Although it should be possible to add commments to the output file, so that it remains readable by loadtxt (without tweaking it e.g. with the skiprows keyword). We agree that the newline keyword may cause inconsistencies in the file (if ticket 1079 were applied), and possibly strange behavior such as when newline='what' is specified. Yet, this question does not only concern the header/comments. Stefan & Christian From bsouthey at gmail.com Wed Jun 2 14:10:12 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 02 Jun 2010 13:10:12 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C06861C.1060401@wartburg.edu> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> Message-ID: <4C069E84.4020308@gmail.com> On 06/02/2010 11:26 AM, Neil Martinsen-Burrell wrote: > On 2010-06-02 11:02 , Bruce Southey wrote: >> On 06/02/2010 09:37 AM, josef.pktd at gmail.com wrote: >>> On Wed, Jun 2, 2010 at 8:24 AM, Neil >>> Martinsen-Burrell wrote: >>> >>>> On 2010-06-01 23:28 , Warren Weckesser wrote: >>>> >>>>> I've been digging into some basic statistics recently, and >>>>> developed the >>>>> following function for applying the chi-square test to a contingency >>>>> table. Does something like this already exist in scipy.stats? If >>>>> not, >>>>> any objects to adding it? (Tests are already written :) >>>>> >>>> Something like this would be great in scipy.stats since I end up doing >>>> the exact same thing by hand whenever I grade introductory statistics >>>> exams. Thanks for writing this! >>>> >> You might find SAS helpful: >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/freq_toc.htm >> > > I'm not sure what you mean by this. I have no problem performing the > test, it's just inconvenient that it isn't already a part of scipy.stats Because this is the main SAS procedure that does contingency tables and tests. There is useful information as well. > >> However, this code is the chi-squared test part as SAS will compute the >> actual cell numbers. Also an extension to scipy.stats.chisquare() so we >> can not have both functions. > > Again, I don't understand what you mean that we can't have both > functions? I believe (from a statistics teacher's point of view) that > the Chi-Squared goodness of fit test (which is stats.chisquare) is a > different beast from the Chi-Square test for independence (which is > stats.chisquare_contingency). The fact that the distribution of the > test statistic is the same should not tempt us to put them into the > same function. Please read scipy.stats.chisquare() because scipy.stats.chisquare() is the 1-d case of yours. Quote from the docstring: " The chi square test tests the null hypothesis that the categorical data has the given frequencies." Also go the web site provided in the docstring. By default you get the expected frequencies but you can also put in your own using the f_exp variable. You could do the same in your code. > >> Really this should be combined with fisher.py in ticket 956: >> http://projects.scipy.org/scipy/ticket/956 > > Wow, apparently I have lots of disagreements today, but I don't think > that this should be combined with Fisher's Exact test. (I would like > to see that ticket mature to the point where it can be added to > scipy.stats.) 
I like the functions in scipy.stats to correspond in a > one-to-one manner with the statistical tests. I think that the docs > should "See Also" the appropriate exact (and non-parametric) tests, > but I think that one function/one test is a good rule. This is > particularly true for people (like me) who would like to someday be > able to use scipy.stats in a pedagogical context. > > -Neil I don't see any 'disagreements' rather just different ways to do things and identifying areas that need to be addressed for more general use. I accept your opinion as here only because these functions only accept the digested (ie summarized) data. Bruce From nmb at wartburg.edu Wed Jun 2 14:18:01 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 02 Jun 2010 13:18:01 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C069E84.4020308@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> Message-ID: <4C06A059.6020901@wartburg.edu> On 2010-06-02 13:10 , Bruce Southey wrote: [...] >>> However, this code is the chi-squared test part as SAS will compute the >>> actual cell numbers. Also an extension to scipy.stats.chisquare() so we >>> can not have both functions. >> >> Again, I don't understand what you mean that we can't have both >> functions? I believe (from a statistics teacher's point of view) that >> the Chi-Squared goodness of fit test (which is stats.chisquare) is a >> different beast from the Chi-Square test for independence (which is >> stats.chisquare_contingency). The fact that the distribution of the >> test statistic is the same should not tempt us to put them into the >> same function. > Please read scipy.stats.chisquare() because scipy.stats.chisquare() is > the 1-d case of yours. > Quote from the docstring: > " The chi square test tests the null hypothesis that the categorical data > has the given frequencies." > Also go the web site provided in the docstring. > > By default you get the expected frequencies but you can also put in your > own using the f_exp variable. You could do the same in your code. In fact, Warren correctly used stats.chisquare with the expected frequencies calculated from the null hypothesis and the corrected degrees of freedom. chisquare_contingency is in some sense a convenience method for taking care of these pre-calculations before calling stats.chisquare. Can you explain more clearly to me why we should not include such a convenience function? >>> Really this should be combined with fisher.py in ticket 956: >>> http://projects.scipy.org/scipy/ticket/956 >> >> Wow, apparently I have lots of disagreements today, but I don't think >> that this should be combined with Fisher's Exact test. (I would like >> to see that ticket mature to the point where it can be added to >> scipy.stats.) I like the functions in scipy.stats to correspond in a >> one-to-one manner with the statistical tests. I think that the docs >> should "See Also" the appropriate exact (and non-parametric) tests, >> but I think that one function/one test is a good rule. This is >> particularly true for people (like me) who would like to someday be >> able to use scipy.stats in a pedagogical context. >> >> -Neil > I don't see any 'disagreements' rather just different ways to do things > and identifying areas that need to be addressed for more general use. Agreed. :) [...] 
-Neil From stefan at sun.ac.za Wed Jun 2 14:23:55 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 11:23:55 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 2 June 2010 04:22, Ralf Gommers wrote: >>> >> Typically, there is no reason not to have an extended section. ?Can >>> >> you give an example where it would seem unnecessary? > > I think we shouldn't go overboard here. In the great majority of cases it's > needed but sometimes there's just not much info to add besides what's in the > summary and parameter description. Examples: > http://docs.scipy.org/numpy/docs/numpy.core.umath.add/ > http://docs.scipy.org/numpy/docs/numpy.lib.ufunclike.isneginf/ > http://docs.scipy.org/numpy/docs/numpy.core.umath.logical_or/ Thanks, Ralf. Those are the examples I was looking for, and I agree. Regards St?fan From d.l.goldsmith at gmail.com Wed Jun 2 14:35:14 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 11:35:14 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On Tue, Jun 1, 2010 at 1:32 PM, David Goldsmith wrote: > The docstring Standard seems to be careful to note which sections are > considered optional, and the "Extended Summary" is *not* on that list. > However, > I'm encountering many SciPy docstrings I'm not talking about NumPy docstrings; I'm not looking at/touching NumPy docstrings; I'm only going after low-hangingl SciPy fruit. DG > ; cain the Wiki lacking this section and yet marked as "Needs review": > should I ignore this deficiency and add a ticket to clarify the Standard, or > should such docstrings be moved back to "Being written"? > > DG > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Wed Jun 2 14:39:17 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 02 Jun 2010 13:39:17 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C06A059.6020901@wartburg.edu> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> Message-ID: <4C06A555.20803@gmail.com> On 06/02/2010 01:18 PM, Neil Martinsen-Burrell wrote: > On 2010-06-02 13:10 , Bruce Southey wrote: > [...] > >>>> However, this code is the chi-squared test part as SAS will compute >>>> the >>>> actual cell numbers. Also an extension to scipy.stats.chisquare() >>>> so we >>>> can not have both functions. >>> >>> Again, I don't understand what you mean that we can't have both >>> functions? 
I believe (from a statistics teacher's point of view) that >>> the Chi-Squared goodness of fit test (which is stats.chisquare) is a >>> different beast from the Chi-Square test for independence (which is >>> stats.chisquare_contingency). The fact that the distribution of the >>> test statistic is the same should not tempt us to put them into the >>> same function. >> Please read scipy.stats.chisquare() because scipy.stats.chisquare() is >> the 1-d case of yours. >> Quote from the docstring: >> " The chi square test tests the null hypothesis that the categorical >> data >> has the given frequencies." >> Also go the web site provided in the docstring. >> >> By default you get the expected frequencies but you can also put in your >> own using the f_exp variable. You could do the same in your code. > > In fact, Warren correctly used stats.chisquare with the expected > frequencies calculated from the null hypothesis and the corrected > degrees of freedom. chisquare_contingency is in some sense a > convenience method for taking care of these pre-calculations before > calling stats.chisquare. Can you explain more clearly to me why we > should not include such a convenience function? I do not understand you here. Clearly you have not read scipy.stats.chisquare() to know what it is doing. You should also read the cited url including the second part: http://faculty.vassar.edu/lowry/ch8pt2.html I don't see any 'pre-calculations' in the code. You have to compute the 'expected value' for each cell because of the overall null hypothesis. Then you have to sum across all cells the value of (observed-expected)*(observed-expected)/expected to get the test statistic. That is trivial to do within the code and a waste of cpu time and memory to send it to another function to do that. Bruce > >>>> Really this should be combined with fisher.py in ticket 956: >>>> http://projects.scipy.org/scipy/ticket/956 >>> >>> Wow, apparently I have lots of disagreements today, but I don't think >>> that this should be combined with Fisher's Exact test. (I would like >>> to see that ticket mature to the point where it can be added to >>> scipy.stats.) I like the functions in scipy.stats to correspond in a >>> one-to-one manner with the statistical tests. I think that the docs >>> should "See Also" the appropriate exact (and non-parametric) tests, >>> but I think that one function/one test is a good rule. This is >>> particularly true for people (like me) who would like to someday be >>> able to use scipy.stats in a pedagogical context. >>> >>> -Neil >> I don't see any 'disagreements' rather just different ways to do things >> and identifying areas that need to be addressed for more general use. > > Agreed. :) > > [...] > > -Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Jun 2 14:39:54 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 11:39:54 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 2 June 2010 11:35, David Goldsmith wrote: >> I'm encountering many SciPy docstrings > > I'm not talking about NumPy docstrings; I'm not looking at/touching NumPy > docstrings; I'm only going after low-hangingl SciPy fruit. I think Ralf's point was that we have more important things to do than nitpick around whether some functions should have extended sections or not. Let's get cracking on the many docstrings that are not even close to done. 
Regards Stéfan From josef.pktd at gmail.com Wed Jun 2 14:41:47 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 2 Jun 2010 14:41:47 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C06A059.6020901@wartburg.edu> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> Message-ID: On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell wrote: > On 2010-06-02 13:10 , Bruce Southey wrote: > [...] > >>>> However, this code is the chi-squared test part as SAS will compute the >>>> actual cell numbers. Also an extension to scipy.stats.chisquare() so we >>>> can not have both functions. >>> >>> Again, I don't understand what you mean that we can't have both >>> functions? I believe (from a statistics teacher's point of view) that >>> the Chi-Squared goodness of fit test (which is stats.chisquare) is a >>> different beast from the Chi-Square test for independence (which is >>> stats.chisquare_contingency). The fact that the distribution of the >>> test statistic is the same should not tempt us to put them into the >>> same function. >> Please read scipy.stats.chisquare() because scipy.stats.chisquare() is >> the 1-d case of yours. >> Quote from the docstring: >> " The chi square test tests the null hypothesis that the categorical data >> has the given frequencies." >> Also go the web site provided in the docstring. >> >> By default you get the expected frequencies but you can also put in your >> own using the f_exp variable. You could do the same in your code. > > In fact, Warren correctly used stats.chisquare with the expected > frequencies calculated from the null hypothesis and the corrected > degrees of freedom. chisquare_contingency is in some sense a > convenience method for taking care of these pre-calculations before > calling stats.chisquare. Can you explain more clearly to me why we > should not include such a convenience function? Just a clarification, before I find time to work my way through the other comments: stats.chisquare is a generic goodness-of-fit test for discrete or binned distributions. From its docstring: "If no expected frequencies are given, the total N is assumed to be equally distributed across all groups." The default is the uniform distribution. chisquare_twoway is a special case that additionally calculates the correct expected frequencies for the test of independence based on the margin totals. The resulting distribution is not uniform. I agree with Neil that this is a very useful convenience function. I have never heard of a one-way contingency table; my question was whether the function should also handle 3-way or 4-way tables, in addition to two-way. I thought about how the input should be specified for my initial response; the alternative would be to use the original data or a "long" format instead of a table. But I thought that, as a convenience function, using the table format will be the most common use. I have written functions in the past that calculate the contingency table, and it would be very useful to have more complete coverage of tools for working with contingency tables in scipy.stats (or temporarily in statsmodels, where we are also working on the anova type of analysis). So, I think the function is nice the way it is, and we don't have to put all contingency table analysis into this function.
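[A small sketch of the "long format" route Josef mentions: building the R x C table of counts from two label sequences before handing it to the test. The crosstab name is made up here; this is only an illustration, not existing scipy.stats functionality.]

    import numpy as np

    def crosstab(row_labels, col_labels):
        # Count how often each (row label, column label) pair occurs.
        rows, row_idx = np.unique(row_labels, return_inverse=True)
        cols, col_idx = np.unique(col_labels, return_inverse=True)
        counts = np.bincount(row_idx * cols.size + col_idx,
                             minlength=rows.size * cols.size)
        return rows, cols, counts.reshape(rows.size, cols.size)

    # rows, cols, table = crosstab(['a', 'a', 'b'], ['x', 'y', 'x'])
    # table is then the 2 x 2 contingency table to pass to the chi-square test.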
Josef > >>>> Really this should be combined with fisher.py in ticket 956: >>>> http://projects.scipy.org/scipy/ticket/956 >>> >>> Wow, apparently I have lots of disagreements today, but I don't think >>> that this should be combined with Fisher's Exact test. (I would like >>> to see that ticket mature to the point where it can be added to >>> scipy.stats.) I like the functions in scipy.stats to correspond in a >>> one-to-one manner with the statistical tests. I think that the docs >>> should "See Also" the appropriate exact (and non-parametric) tests, >>> but I think that one function/one test is a good rule. This is >>> particularly true for people (like me) who would like to someday be >>> able to use scipy.stats in a pedagogical context. >>> >>> -Neil >> I don't see any 'disagreements' rather just different ways to do things >> and identifying areas that need to be addressed for more general use. > > Agreed. :) > > [...] > > -Neil > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From d.l.goldsmith at gmail.com Wed Jun 2 14:47:05 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 11:47:05 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: I'm working on the docstrings - is any one else? DG 2010/6/2 St?fan van der Walt > On 2 June 2010 11:35, David Goldsmith wrote: > >> I'm encountering many SciPy docstrings > > > > I'm not talking about NumPy docstrings; I'm not looking at/touching NumPy > > docstrings; I'm only going after low-hangingl SciPy fruit. > > I think Ralf's point was that we have more important things to do than > nitpick around whether some functions should have extended sections or > not. Let's get cracking on the many docstrings that are not even > close to done. > > Regards > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Jun 2 14:59:04 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 11:59:04 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 2 June 2010 11:47, David Goldsmith wrote: > I'm working on the docstrings - is any one else? In the past, there used to be very targeted mini-sprints; are we following a similar process this time? If so, where should we focus our attention? If you post a list of 5 functions that need urgent attention, I'll put in some time to document at least one of them. 
Regards St?fan From stefan at sun.ac.za Wed Jun 2 14:55:24 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 11:55:24 -0700 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: Dear Travis (and others) On 1 June 2010 01:25, Travis Oliphant wrote: > I actually think it very inconsiderate that I should be treated with such rudeness for contributing needed functionality. I was saddened to witness the tone of these conversations, and I wish certain rash personal comments by Charles and David G were rather not made; they certainly don't reflect the attitude of the community as a whole. While you and I have very different approaches to software engineering, I respect the fact that we both aim to achieve the same goal: create a better SciPy. In the past, this spirit of innovation helped to form a remarkably friendly, driven and effective community in which decisions were reached by civil argument and consensus, rather than hard-line rules and policies. Hopefully, we can all return our focus to steering this ship in the same direction. If some technological changes would help with that process, that's well worth investigating (Jarrod and I are nearly ready with a NEP for switching to Github). Kind regards St?fan From njs at pobox.com Wed Jun 2 15:06:29 2010 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jun 2010 12:06:29 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> Message-ID: On Tue, Jun 1, 2010 at 4:33 PM, Travis Oliphant wrote: > > I really think this is more about how people view commits to the trunk than anything else. ? I like to use SVN as a version control system. ? My commits to trunk are always more incremental. ? I like to get things committed in self-contained chunks. ? Adding the requirement to put in documentation and tests before committing stretches out that "incremental" work element to longer than I ever have time for in one sitting. > > Clearly, if I were using DVCS to a published branch that could be then merged to the trunk this problem would not have arisen. ? I see that I need to move to that style. ? ?People are reading far more into my committing to trunk than I ever meant to imply. I remember when I first started hacking free software, this was the model that *every* project used, and when people started talking about "always releasable trunks" it seemed like the weirdest, most unlikely concept ever. (I guess that makes this a generational thing?) Having finally wrapped my head around it on a few other projects, though, I can't imagine ever going back. Those "rules" and "procedures" are about as jackbooted as a dayplanner or a todo list... they let us avoid all the stress of having to remember which pieces *have* to get added before a release can happen, accidentally crashing into other people's work, having big debates, etc.; we can just get on with hacking and the resulting code is even better. (Because *everyone*'s code is better for being reviewed and tested. Even mine!) The other thing that helped reconcile me to this style of development was figuring out how to make testing less of a chore. 
Personally, I can't deal with TDD -- I don't understand how people know what the API should look like (to write the test) until they've written the implementation! But a much simpler method works for me: I never would commit code without at least *running* it, so now I've trained myself to just type those "hey, does this thing I just wrote work at *all*?" lines into a test function instead of a REPL. And while I'm sure there are all sorts of wonderful virtues and maintenance benefits to having a test suite, the real reason I do this is discovering that while I'm actually hacking, it's way easier to hit the 're-run tests' button than it is to re-copy/paste that line of code into the REPL. Kind of embarrassing in retrospect... No idea how any of this applies to others, but maybe someone will find it useful. -- Nathaniel From matthew.brett at gmail.com Wed Jun 2 15:51:34 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 2 Jun 2010 12:51:34 -0700 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: Hi, > On 1 June 2010 01:25, Travis Oliphant wrote: >> I actually think it very inconsiderate that I should be treated with such rudeness for contributing needed functionality. > > I was saddened to witness the tone of these conversations, and I wish > certain rash personal comments by Charles and David G were rather not > made; they certainly don't reflect the attitude of the community as a > whole. Well - hold on though. Of course we should call people out on being personally offensive - but if we're going to do that, we should do it at the time of the email - directly to that person - it's only fair. And - I think we have to be careful also to defend our ability to be direct and honest when then are problems that need to be addressed. Any community needs that in order to grow, I believe. See you, Matthew From bsouthey at gmail.com Wed Jun 2 16:03:07 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 02 Jun 2010 15:03:07 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> Message-ID: <4C06B8FB.8080806@gmail.com> On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: > On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell wrote: > >> On 2010-06-02 13:10 , Bruce Southey wrote: >> [...] >> >> >>>>> However, this code is the chi-squared test part as SAS will compute the >>>>> actual cell numbers. Also an extension to scipy.stats.chisquare() so we >>>>> can not have both functions. >>>>> >>>> Again, I don't understand what you mean that we can't have both >>>> functions? I believe (from a statistics teacher's point of view) that >>>> the Chi-Squared goodness of fit test (which is stats.chisquare) is a >>>> different beast from the Chi-Square test for independence (which is >>>> stats.chisquare_contingency). The fact that the distribution of the >>>> test statistic is the same should not tempt us to put them into the >>>> same function. >>>> >>> Please read scipy.stats.chisquare() because scipy.stats.chisquare() is >>> the 1-d case of yours. >>> Quote from the docstring: >>> " The chi square test tests the null hypothesis that the categorical data >>> has the given frequencies." >>> Also go the web site provided in the docstring. 
>>> >>> By default you get the expected frequencies but you can also put in your >>> own using the f_exp variable. You could do the same in your code. >>> >> In fact, Warren correctly used stats.chisquare with the expected >> frequencies calculated from the null hypothesis and the corrected >> degrees of freedom. chisquare_contingency is in some sense a >> convenience method for taking care of these pre-calculations before >> calling stats.chisquare. Can you explain more clearly to me why we >> should not include such a convenience function? >> > Just a clarification, before I find time to work my way through the > other comments > > stats.chisquare is a generic test for goodness-of-fit for discreted or > binned distributions. > and from the docstring of it > "If no expected frequencies are given, the total > N is assumed to be equally distributed across all groups." > > default is uniform distribution > > Try: http://en.wikipedia.org/wiki/Pearson's_chi-square_test The use of the uniform distribution is rather misleading and technically wrong as it does not help address the expected number of outcomes in a cell: http://en.wikipedia.org/wiki/Discrete_uniform_distribution > chisquare_twoway is a special case that additional calculates the > correct expected frequencies for the test of independencs based on the > margin totals. The resulting distribution is not uniform. > Actually the null hypothesis is rather different between 1-way and 2-way tables so you can not say that chisquare_twoway is a special case of chisquare. I am not sure what you mean by the 'resulting distribution is not uniform'. The distribution of the cells values has nothing to do with the uniform distribution in either case because it is not used in the data nor in the formulation of the test. (And, yes, I have had to do the proof that the test statistic is Chi-squared - which is why there is the warning about small cells...). > I agree with Neil that this is a very useful convenience function. > My problem with the chisquare_twoway is that it should not call another function to finish two lines of code. It is just an excessive waste of resources. > I never heard of a one-way contingency table, my question was whether > the function should also handle 3-way or 4-way tables, additional to > two-way. > Correct to both of these as I just consider these as n-way tables. I think that contingency tables by definition only applies to the 2-d case. Pivot tables are essentially the same thing. I would have to lookup on how to get the expected number of outcomes but probably of the form Ni.. * N.j. *N..k/N... for the 3-way (the 2-way table is of the form Ni.*N.j/N..) for i=rows, j=columns, k=3rd axis and '.' means sum for that axis. > I thought about the question how the input should be specified for my > initial response, the alternative would be to use the original data or > a "long" format instead of a table. But I thought that as a > convenience function using the table format will be the most common > use. > I have written in the past functions that calculate the contingency > table, and would be very useful to have a more complete coverage of > tools to work with contingency tables in scipy.stats (or temporarily > in statsmodels, where we are working also on the anova type of > analysis) > It depends on what tasks are needed. Really there are two steps: 1) Cross-tabulation that summarized the data from whatever input (groupby would help here). 2) Statistical tests - series of functions that accept summarized data only. 
If you have separate functions then the burden is on the user to find and call all the desired functions. You can also provide a single helper function to do all that because you don't want to repeat unnecessary calls. > So, I think the way it is it is a nice function and we don't have to > put all contingency table analysis into this function. > > Josef > Bruce > >> >>>>> Really this should be combined with fisher.py in ticket 956: >>>>> http://projects.scipy.org/scipy/ticket/956 >>>>> >>>> Wow, apparently I have lots of disagreements today, but I don't think >>>> that this should be combined with Fisher's Exact test. (I would like >>>> to see that ticket mature to the point where it can be added to >>>> scipy.stats.) I like the functions in scipy.stats to correspond in a >>>> one-to-one manner with the statistical tests. I think that the docs >>>> should "See Also" the appropriate exact (and non-parametric) tests, >>>> but I think that one function/one test is a good rule. This is >>>> particularly true for people (like me) who would like to someday be >>>> able to use scipy.stats in a pedagogical context. >>>> >>>> -Neil >>>> >>> I don't see any 'disagreements' rather just different ways to do things >>> and identifying areas that need to be addressed for more general use. >>> >> Agreed. :) >> >> [...] >> >> -Neil >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Wed Jun 2 16:21:32 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 2 Jun 2010 15:21:32 -0500 Subject: [SciPy-Dev] Difference between scipy.stats.gengamma and scipy.stats.distributions.gengamma In-Reply-To: References: Message-ID: <5FDEAF11-E3B4-4285-A8C7-8E1676201466@enthought.com> On Jun 2, 2010, at 12:26 AM, josef.pktd at gmail.com wrote: > On Wed, Jun 2, 2010 at 1:09 AM, David Goldsmith wrote: >> Is there a difference between these two? Same question for stats.lognorm >> and stats.distributions.lognorm? Thanks. > > No, they are the same instance of the distribution > > scipy.stats.__init__ has a from distributions import * or something like this In general, the original design concept in scipy name-spaces is that names should not be imported from their "leaf-node", but from somewhere higher up. The fact that the distribution objects are in scipy.stats.distributions should not be relied upon. This is the same philosophy in NumPy (i.e. you shouldn't import things from numpy.core or numpy.lib directly). -Travis > > Josef > >> >> DG >> >> -- >> Mathematician: noun, someone who disavows certainty when their uncertainty >> set is non-empty, even if that set has measure zero. >> >> Hope: noun, that delusive spirit which escaped Pandora's jar and, with her >> lies, prevents mankind from committing a general suicide. (As interpreted >> by Robert Graves) >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev --- Travis Oliphant Enthought, Inc. 
oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From oliphant at enthought.com Wed Jun 2 16:23:44 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 2 Jun 2010 15:23:44 -0500 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> Message-ID: On Jun 2, 2010, at 2:06 PM, Nathaniel Smith wrote: > On Tue, Jun 1, 2010 at 4:33 PM, Travis Oliphant wrote: >> >> I really think this is more about how people view commits to the trunk than anything else. I like to use SVN as a version control system. My commits to trunk are always more incremental. I like to get things committed in self-contained chunks. Adding the requirement to put in documentation and tests before committing stretches out that "incremental" work element to longer than I ever have time for in one sitting. >> >> Clearly, if I were using DVCS to a published branch that could be then merged to the trunk this problem would not have arisen. I see that I need to move to that style. People are reading far more into my committing to trunk than I ever meant to imply. > > I remember when I first started hacking free software, this was the > model that *every* project used, and when people started talking about > "always releasable trunks" it seemed like the weirdest, most unlikely > concept ever. (I guess that makes this a generational thing?) Having > finally wrapped my head around it on a few other projects, though, I > can't imagine ever going back. Those "rules" and "procedures" are > about as jackbooted as a dayplanner or a todo list... they let us > avoid all the stress of having to remember which pieces *have* to get > added before a release can happen, accidentally crashing into other > people's work, having big debates, etc.; we can just get on with > hacking and the resulting code is even better. (Because *everyone*'s > code is better for being reviewed and tested. Even mine!) > > The other thing that helped reconcile me to this style of development > was figuring out how to make testing less of a chore. Personally, I > can't deal with TDD -- I don't understand how people know what the API > should look like (to write the test) until they've written the > implementation! But a much simpler method works for me: I never would > commit code without at least *running* it, so now I've trained myself > to just type those "hey, does this thing I just wrote work at *all*?" > lines into a test function instead of a REPL. And while I'm sure there > are all sorts of wonderful virtues and maintenance benefits to having > a test suite, the real reason I do this is discovering that while I'm > actually hacking, it's way easier to hit the 're-run tests' button > than it is to re-copy/paste that line of code into the REPL. Kind of > embarrassing in retrospect... > > No idea how any of this applies to others, but maybe someone will find > it useful. I found it very useful. Thanks for sharing your experience. -Travis > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev --- Travis Oliphant Enthought, Inc. 
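As an illustration of moving a quick interactive check into a test function that nose can re-run -- everything here (mymodule, smooth) is hypothetical -- the lines one would otherwise paste into a REPL become:

    import numpy as np
    from numpy.testing import assert_almost_equal
    from mymodule import smooth   # hypothetical function under development

    def test_smooth_runs_at_all():
        # The "does this thing I just wrote work at *all*?" check.
        x = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
        y = smooth(x, width=3)
        assert y.shape == x.shape
        assert_almost_equal(y[2], (2.0 + 4.0 + 2.0) / 3.0)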
oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From ilanschnell at gmail.com Wed Jun 2 18:17:11 2010 From: ilanschnell at gmail.com (Ilan Schnell) Date: Wed, 2 Jun 2010 17:17:11 -0500 Subject: [SciPy-Dev] import error in scipy.stats on RH3 32-bit Message-ID: Hello group, I'm not exactly sure what has changed in scipy.stats, but building and importing all extensions with the first 0.8.x brach (revision 6446) worked fine on CentOS release 3.9 (32-bit). Now (revision 6476), I can still build everything, but when I try to import scipy.stats.vonmises_cython, I get the following unresolved symbol: ImportError: /home/tester/master/lib/python2.6/site-packages/scipy/linalg/clapack.so: undefined symbol: clapack_sgesv Strangely, I don't get this import error on any platform (64-bit/32-bit, Windows, MaxOSX, Redhat 5, Solaris). Does anyone know what could be going on here? - Ilan From ben.root at ou.edu Wed Jun 2 19:03:36 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 2 Jun 2010 18:03:36 -0500 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: You may have my keyboard! Seriously, though, I just registered a username on the wiki (weathergod) and I would like to help out with documentation. Ben Root On Wed, Jun 2, 2010 at 1:47 PM, David Goldsmith wrote: > I'm working on the docstrings - is any one else? > > DG > > 2010/6/2 St?fan van der Walt > > On 2 June 2010 11:35, David Goldsmith wrote: >> >> I'm encountering many SciPy docstrings >> > >> > I'm not talking about NumPy docstrings; I'm not looking at/touching >> NumPy >> > docstrings; I'm only going after low-hangingl SciPy fruit. >> >> I think Ralf's point was that we have more important things to do than >> nitpick around whether some functions should have extended sections or >> not. Let's get cracking on the many docstrings that are not even >> close to done. >> >> Regards >> St?fan >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Jun 2 19:27:14 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 16:27:14 -0700 Subject: [SciPy-Dev] Recent changes to scipy stats In-Reply-To: References: <4D0A9D22-882F-4FCC-82D5-740D332CF7F9@enthought.com> Message-ID: On 2 June 2010 12:51, Matthew Brett wrote: >> I was saddened to witness the tone of these conversations, and I wish >> certain rash personal comments by Charles and David G were rather not >> made; they certainly don't reflect the attitude of the community as a >> whole. > > Well - hold on though. ?Of course we should call people out on being > personally offensive - but if we're going to do that, we should do it > at the time of the email - directly to that person - it's only fair. 
Unfortunately, those comments were made in public; if we express our disagreement in private only, the offended party would never even be aware of any disagreement in the community. As for calling people out at time of writing, such a time limit suggests that any distress caused is similarly limited, which it is not. > And - I think we have to be careful also to defend our ability to be > direct and honest when then are problems that need to be addressed. > Any community needs that in order to grow, I believe. Yes, absolutely: direct and honest discourse is great. Offensive statements are not. Regards St?fan From matthew.brett at gmail.com Wed Jun 2 19:34:34 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 2 Jun 2010 16:34:34 -0700 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: <18D26A1A-0164-4D80-8619-BAC28FC33D11@enthought.com> <3430B5AD-E3C2-4CE7-B07F-D8210C2E53D5@enthought.com> <1331DFB9-4FCA-44F3-A1D0-C00714A60511@enthought.com> Message-ID: Hi, > The other thing that helped reconcile me to this style of development > was figuring out how to make testing less of a chore. Personally, I > can't deal with TDD -- I don't understand how people know what the API > should look like (to write the test) until they've written the > implementation! I just thought I'd pitch in with this one, because there can be confusion between writing code with tests, and test-driven-development. My understanding is that there is good objective evidence that test-driven-development improves code quality, but it takes a lot of discipline until you are used to it. In my experience it's most important precisely for defining the API, because, in writing the tests, you start defining what the API will look like, and then I find that my API is pretty bad and I change it before I've written the code. But - if TDD is an ideal - it is of course a matter of personal practice. But - having tests for your code - developed before, after, or during your code - that's really important for having maintainable code - as I'm sure we all agree. And - yes - absolutely - if you are doing _any_ kind of testing when developing, please do check that in, even if that's all you've got - at least it's something, Thanks a lot, Matthew From stefan at sun.ac.za Wed Jun 2 20:02:40 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 17:02:40 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 2 June 2010 16:03, Benjamin Root wrote: > You may have my keyboard! > > Seriously, though, I just registered a username on the wiki (weathergod) and > I would like to help out with documentation. Added, and welcome! Cheers St?fan From warren.weckesser at enthought.com Wed Jun 2 20:04:11 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 02 Jun 2010 19:04:11 -0500 Subject: [SciPy-Dev] Mea culpa: deprecation and API changes In-Reply-To: <4C044B0F.4000103@enthought.com> References: <4C012BFE.4090103@enthought.com> <4C0149EB.8030608@enthought.com> <4C03F00F.3020806@enthought.com> <4C044B0F.4000103@enthought.com> Message-ID: <4C06F17B.7040209@enthought.com> Warren Weckesser wrote: > Opinion wanted: codata.find(sub) used to print a list of strings. A > while ago, in response to http://projects.scipy.org/scipy/ticket/996, I > changed it to return the list of strings. But this is an API change, > and should follow the deprecation policy. 
One way to do this is to > restore find() to its previous behavior, and deprecate the function. At > the same time, add a new function, find_string(sub), which returns the > list of strings. What do you think? > > Instead of creating a new function, I added a keyword argument whose default value (True) preserves the old behavior. When it is False, it returns the keys instead of printing them. In 0.9, the default behavior will be reversed. Warren > Warren > > > Warren Weckesser wrote: > >> David Cournapeau wrote: >> >> >>> On Sun, May 30, 2010 at 2:07 AM, Warren Weckesser >>> wrote: >>> >>> >>> >>>> David Cournapeau wrote: >>>> >>>> >>>> >>>>> On Sun, May 30, 2010 at 12:00 AM, Warren Weckesser >>>>> wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> What I would like to do is leave trunk as it is, and after 0.8 is >>>>>> branched, make the appropriate changes in the branch to follow the >>>>>> deprecation policy. Is that a reasonable approach? >>>>>> >>>>>> >>>>>> >>>>>> >>>>> May I ask why do you want to do that way ? >>>>> >>>>> >>>>> >>>> Because it doesn't look like I will have time to make the changes before >>>> Ralf branches 0.8 tomorrow. >>>> >>>> >>>> >>>> >>>>> Putting the deprecation in >>>>> the release branch means people tracking trunk will never see them. >>>>> >>>>> >>>>> >>>>> >>>> Good point. But in case I am misinterpreting what you mean by >>>> "tracking trunk" and "see": I assume this means it is important to have >>>> a record of the deprecation changes in the svn logs, and not that some >>>> who is *using* scipy from trunk also needs to be exposed to the >>>> deprecation warning for some minimum amount of time. >>>> >>>> >>>> >>> actually, I meant both. For example, I often use scipy from trunk, and >>> rarely from releases. I will never see the deprecation, which is not >>> good. >>> >>> Also, I think we should generally try to never put things in release >>> branches, but always backport from trunk (except for branch specific >>> changes). Having the 0.8 branch created tomorrow does not mean you >>> cannot put the changes into trunk, and backport them in 0.8 later - >>> deprecation which were already agreed on are the kind of things which >>> can happen after the branching without putting much burden on the >>> release process. >>> >>> >>> >>> >>>> If the changes are >>>> made to trunk, then they will be undone immediately after 0.8 is >>>> branched. >>>> >>>> >>>> >>> deprecated features do not be to be removed just after the trunk is >>> opened for the next release cycle (0.9 here). >>> >>> >>> >>> >>>> ever have a copy that includes the deprecation warnings. In other >>>> words, deprecations are linked to releases, not to "time in trunk". >>>> >>>> >>>> >>> Indeed - but I think that we should let the deprecation be in place >>> for as long as possible in the source code repository. >>> >>> >>> >>> >> OK. It might be a couple more days before I can make the reversions and >> deprecations, but I'll get them in before the beta release on June 6. 
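A rough sketch of the transition pattern being described: a keyword argument whose default keeps the old printing behavior, plus a DeprecationWarning until the default flips in 0.9. The table and the keyword name below are stand-ins for illustration, not the actual codata module:

    import warnings

    # Stand-in for the codata table of physical constants.
    _table = {'speed of light in vacuum': 299792458.0,
              'Planck constant': 6.62606896e-34}

    def find(sub, disp=True):
        keys = [k for k in _table if sub.lower() in k.lower()]
        if disp:
            # Old behavior: print the matching keys and return nothing.
            warnings.warn("the default behavior of find() will change in "
                          "0.9; pass disp=False to get the keys returned",
                          DeprecationWarning)
            for k in keys:
                print(k)
            return None
        # New behavior: return the keys.
        return keys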
>> >> Warren >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vincent at vincentdavis.net Wed Jun 2 20:11:09 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 2 Jun 2010 18:11:09 -0600 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: I just setup an account. vincentdavis I assume I will find instruction but how does the review/commit of updates work (in brief) Thanks Vincent 2010/6/2 St?fan van der Walt : > On 2 June 2010 16:03, Benjamin Root wrote: >> You may have my keyboard! >> >> Seriously, though, I just registered a username on the wiki (weathergod) and >> I would like to help out with documentation. > > Added, and welcome! > > Cheers > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From charlesr.harris at gmail.com Wed Jun 2 21:14:59 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 2 Jun 2010 19:14:59 -0600 Subject: [SciPy-Dev] import error in scipy.stats on RH3 32-bit In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 4:17 PM, Ilan Schnell wrote: > Hello group, > > I'm not exactly sure what has changed in scipy.stats, but > building and importing all extensions with the first 0.8.x brach > (revision 6446) worked fine on CentOS release 3.9 (32-bit). > > Does revision 6446 still work? The vonmises distribution hasn't been touched in a long time. Now (revision 6476), I can still build everything, but when > I try to import scipy.stats.vonmises_cython, I get the following > unresolved symbol: > ImportError: > /home/tester/master/lib/python2.6/site-packages/scipy/linalg/clapack.so: > undefined symbol: clapack_sgesv > > Strangely, I don't get this import error on any platform (64-bit/32-bit, > Windows, MaxOSX, Redhat 5, Solaris). > > Does anyone know what could be going on here? > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Wed Jun 2 21:32:44 2010 From: rmay31 at gmail.com (Ryan May) Date: Wed, 2 Jun 2010 20:32:44 -0500 Subject: [SciPy-Dev] Mea culpa: deprecation and API changes In-Reply-To: <4C06F17B.7040209@enthought.com> References: <4C012BFE.4090103@enthought.com> <4C0149EB.8030608@enthought.com> <4C03F00F.3020806@enthought.com> <4C044B0F.4000103@enthought.com> <4C06F17B.7040209@enthought.com> Message-ID: On Wed, Jun 2, 2010 at 7:04 PM, Warren Weckesser wrote: > Warren Weckesser wrote: >> Opinion wanted: ?codata.find(sub) used to print a list of strings. ?A >> while ago, in response to http://projects.scipy.org/scipy/ticket/996, ?I >> changed it to return the list of strings. ?But this is an API change, >> and should follow the deprecation policy. ?One way to do this is to >> restore find() to its previous behavior, and deprecate the function. ?At >> the same time, add a new function, find_string(sub), which returns the >> list of strings. ?What do you think? >> >> > > Instead of creating a new function, I added a keyword argument whose > default value (True) preserves the old behavior. ?When it is False, it > returns the keys instead of printing them. ?In 0.9, the default behavior > will be reversed. 
Why not always return the list and just make only the print controlled by the kwarg? That way the return type of the function doesn't depend on a kwarg, which IIRC is considered bad style. You won't break existing code, which will just ignore the new return value. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From warren.weckesser at enthought.com Wed Jun 2 21:45:53 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 02 Jun 2010 20:45:53 -0500 Subject: [SciPy-Dev] Mea culpa: deprecation and API changes In-Reply-To: References: <4C012BFE.4090103@enthought.com> <4C0149EB.8030608@enthought.com> <4C03F00F.3020806@enthought.com> <4C044B0F.4000103@enthought.com> <4C06F17B.7040209@enthought.com> Message-ID: <4C070951.8070209@enthought.com> Ryan May wrote: > On Wed, Jun 2, 2010 at 7:04 PM, Warren Weckesser > wrote: > >> Warren Weckesser wrote: >> >>> Opinion wanted: codata.find(sub) used to print a list of strings. A >>> while ago, in response to http://projects.scipy.org/scipy/ticket/996, I >>> changed it to return the list of strings. But this is an API change, >>> and should follow the deprecation policy. One way to do this is to >>> restore find() to its previous behavior, and deprecate the function. At >>> the same time, add a new function, find_string(sub), which returns the >>> list of strings. What do you think? >>> >>> >>> >> Instead of creating a new function, I added a keyword argument whose >> default value (True) preserves the old behavior. When it is False, it >> returns the keys instead of printing them. In 0.9, the default behavior >> will be reversed. >> > > Why not always return the list and just make only the print controlled > by the kwarg? That way the return type of the function doesn't depend > on a kwarg, which IIRC is considered bad style. You won't break > existing code, which will just ignore the new return value. > That seemed the most conservative approach, despite being bad style. It can all be cleaned up in 0.9 anyway. I'm currently working on "fixing" signal.waveforms.chirp to maintain compatibility for one release cycle. More judgment calls will be required, and I'm sure that not everyone would do it the same way. Anyone want to write the official "SciPy Developers Deprecation Guidelines (with recommended patterns of deprecation and a bunch of use-cases)"? Warren > Ryan > > From d.l.goldsmith at gmail.com Wed Jun 2 21:54:30 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 18:54:30 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 5:11 PM, Vincent Davis wrote: > I just setup an account. vincentdavis > I assume I will find instruction but how does the review/commit of > updates work (in brief) > Ah, good question, with a somewhat complicated answer, I'm afraid. At a minimum, when a writer/editor feels that a docstring is "done," s/he "promotes" it to "Needs review" status. In addition, since we feel that, very generally speaking, a "Needs review" docstring is in a more advanced state than whatever is in the current distribution, s/he also marks the docstring as "OK to apply Yes." 
Then, eventually, two things happen: a release manager/worker comes along and merges "OK to apply Yes" docstrings into the source code, and a reviewer - different than the writer/editor(s) who worked on the docstring - comes along, reviews the docstring, and either promotes it to "Reviewed, needs proof" or demotes it to "Reviewed, needs work." Here is where it gets a little "complicated." The review effort, which _NumPy_ is largely ready for, is stalled pending implementation of enhancements to the Wiki to support a dual review system: in the past, parties have found reviewed and proofed, i.e., "finalized" docstrings which are either pretty unclear, or were pretty clear but had technical deficiencies. Consequently, we've been wanting to implement a system whereby each docstring must pass both a technical and a "presentation" review, but, as I said, the Wiki presently doesn't support this. Joe Harrington and myself have been trying to line up the labor to get this done, so far unsuccessfully. I think there may soon be an announcement concerning this... (The delay in the review process is at least in part why we've opted to go ahead and start incorporating "unfinalized" docstrings into the source.) Welcome aboard, and thanks! DG > > Thanks > Vincent > > 2010/6/2 St?fan van der Walt : > > On 2 June 2010 16:03, Benjamin Root wrote: > >> You may have my keyboard! > >> > >> Seriously, though, I just registered a username on the wiki (weathergod) > and > >> I would like to help out with documentation. > > > > Added, and welcome! > > > > Cheers > > St?fan > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Wed Jun 2 22:06:30 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 02 Jun 2010 21:06:30 -0500 Subject: [SciPy-Dev] Mea culpa: deprecation and API changes In-Reply-To: <4C070951.8070209@enthought.com> References: <4C012BFE.4090103@enthought.com> <4C0149EB.8030608@enthought.com> <4C03F00F.3020806@enthought.com> <4C044B0F.4000103@enthought.com> <4C06F17B.7040209@enthought.com> <4C070951.8070209@enthought.com> Message-ID: <4C070E26.3060306@enthought.com> Warren Weckesser wrote: > Ryan May wrote: > >> On Wed, Jun 2, 2010 at 7:04 PM, Warren Weckesser >> wrote: >> >> >>> Warren Weckesser wrote: >>> >>> >>>> Opinion wanted: codata.find(sub) used to print a list of strings. A >>>> while ago, in response to http://projects.scipy.org/scipy/ticket/996, I >>>> changed it to return the list of strings. But this is an API change, >>>> and should follow the deprecation policy. One way to do this is to >>>> restore find() to its previous behavior, and deprecate the function. At >>>> the same time, add a new function, find_string(sub), which returns the >>>> list of strings. What do you think? 
>>>> >>>> >>>> >>>> >>> Instead of creating a new function, I added a keyword argument whose >>> default value (True) preserves the old behavior. When it is False, it >>> returns the keys instead of printing them. In 0.9, the default behavior >>> will be reversed. >>> >>> >> Why not always return the list and just make only the print controlled >> by the kwarg? That way the return type of the function doesn't depend >> on a kwarg, which IIRC is considered bad style. You won't break >> existing code, which will just ignore the new return value. >> >> > > That seemed the most conservative approach, despite being bad style. It > can all be cleaned up in 0.9 anyway. > > I'm currently working on "fixing" signal.waveforms.chirp to maintain > compatibility for one release cycle. More judgment calls will be > required, and I'm sure that not everyone would do it the same way. > > Anyone want to write the official "SciPy Developers Deprecation > Guidelines (with recommended patterns of deprecation and a bunch of > use-cases)"? > Hmmm... perhaps that should be "Developers' Deprecation Guidelines". Without the apostrophe, it could mean something else. :) > Warren > > >> Ryan >> >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vincent at vincentdavis.net Wed Jun 2 22:09:39 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 2 Jun 2010 20:09:39 -0600 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 7:54 PM, David Goldsmith wrote: > On Wed, Jun 2, 2010 at 5:11 PM, Vincent Davis > wrote: >> >> I just setup an account. vincentdavis >> I assume I will find instruction but how does the review/commit of >> updates work (in brief) > > Ah, good question, with a somewhat complicated answer, I'm afraid.? At a > minimum, when a writer/editor feels that a docstring is "done," s/he > "promotes" it to "Needs review" status.? In addition, since we feel that, > very generally speaking, a "Needs review" docstring is in a more advanced > state than whatever is in the current distribution, s/he also marks the > docstring as "OK to apply Yes."? Then, eventually, two things happen: a > release manager/worker comes along and merges "OK to apply Yes" docstrings > into the source code, and a reviewer - different than the writer/editor(s) > who worked on the docstring - comes along, reviews the docstring, and either > promotes it to "Reviewed, needs proof" or demotes it to "Reviewed, needs > work." > > Here is where it gets a little "complicated."? The review effort, which > _NumPy_ is largely ready for, is stalled pending implementation of > enhancements to the Wiki to support a dual review system: in the past, > parties have found reviewed and proofed, i.e., "finalized" docstrings which > are either pretty unclear, or were pretty clear but had technical > deficiencies.? Consequently, we've been wanting to implement a system > whereby each docstring must pass both a technical and a "presentation" > review, but, as I said, the Wiki presently doesn't support this.? Joe > Harrington and myself have been trying to line up the labor to get this > done, so far unsuccessfully.? I think there may soon be an announcement > concerning this...? (The delay in the review process is at least in part why > we've opted to go ahead and start incorporating "unfinalized" docstrings > into the source.) 
As I am always interested in learning new things is there any help I can offer in getting the wiki review feature implemented? Thanks for the summary, this clears up a few of the question I had after looking over things. Vincent > > Welcome aboard, and thanks! > > DG >> >> Thanks >> Vincent >> >> 2010/6/2 St?fan van der Walt : >> > On 2 June 2010 16:03, Benjamin Root wrote: >> >> You may have my keyboard! >> >> >> >> Seriously, though, I just registered a username on the wiki >> >> (weathergod) and >> >> I would like to help out with documentation. >> > >> > Added, and welcome! >> > >> > Cheers >> > St?fan >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. ?(As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From vincent at vincentdavis.net Wed Jun 2 22:22:49 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 2 Jun 2010 20:22:49 -0600 Subject: [SciPy-Dev] Warning of deprecation in doc's ? Message-ID: For example scipy.stats.stats.cov when you view source has "scipy.stats.cov is deprecated; please update your code to use numpy.cov." Should this be in the docs ? and is there an example of how this should be pointed out. This is something I actually implemented in a program then discovered that is was deprecated. I would have like that to be in the online docs. Thanks Vincent From vincent at vincentdavis.net Wed Jun 2 22:30:49 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 2 Jun 2010 20:30:49 -0600 Subject: [SciPy-Dev] Summer Marathon Skypecon tomorrow In-Reply-To: References: Message-ID: On Thu, May 27, 2010 at 11:47 AM, David Goldsmith wrote: > So far, no one has RSVP-ed (positive or negative).? Is this due to: > > A) Lack of time; > B) Bad scheduling (i.e., you have time, just not at the time we've chosen); > C) Lack of interest; > D) Lack of issues to discuss (i.e., you have interest, but not in a meeting > without a specific aganda); > E) Bad choice of conference media (i.e., can't/won't do Skype); > F) Just forgot to RSVP; > G) Some of the above; > H) Other/None of the above? > > If I haven't heard from anyone, in the positive, by midnight tonight, EDT, > this week's Skypecon is canceled. > I thought it "would" happen and hoped to be able to listen, (Unlikely I have anything to contribute) I was unsure I would be available so didn't rsvp. It might be nice to record the skype call and make it available as a sudo podcast. It might be nice for those that miss it and might be a nice weekly update on scipy, this I could possibly help with. 
Thanks Vincent > DG > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From luis94855510 at gmail.com Wed Jun 2 22:44:33 2010 From: luis94855510 at gmail.com (Luis Saavedra) Date: Wed, 2 Jun 2010 22:44:33 -0400 Subject: [SciPy-Dev] how to get "help docs" in other languages Message-ID: Hi all, that is my problem... how to get "Help" sections or "guide for documentation authors" in other languages for my project, in a automagical way :P, ?that is a request for feature? regards, Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis94855510 at gmail.com Wed Jun 2 22:57:54 2010 From: luis94855510 at gmail.com (Luis Saavedra) Date: Wed, 2 Jun 2010 22:57:54 -0400 Subject: [SciPy-Dev] how to get "help docs" in other languages In-Reply-To: References: Message-ID: ups,sorry for the noise, that list is for scipy not for sphinx O_o 2010/6/2 Luis Saavedra > Hi all, > > that is my problem... how to get "Help" sections or "guide for > documentation authors" in other languages for my project, in a automagical > way :P, ?that is a request for feature? > > regards, > Luis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilanschnell at gmail.com Wed Jun 2 23:03:53 2010 From: ilanschnell at gmail.com (Ilan Schnell) Date: Wed, 2 Jun 2010 22:03:53 -0500 Subject: [SciPy-Dev] import error in scipy.stats on RH3 32-bit In-Reply-To: References: Message-ID: Hello Chuck, yes 6446 works. Actually, as the error indicates, the unresolved symbol in is linalg/clapack.so, it just happened that during my testing the stats package was imported first, so I initially thought the error was there. However, something has changed between 6446 and 6476, as I wasn't seeing this error before. Looking at the revision log of the 0.8.x branch, but I cannot see any obvious. And I'm also puzzled why this only happens on one particular platform. To make sure the build environment hasn't changed, I rebuild 6446 on the same system, and it still works. - Ilan On Wed, Jun 2, 2010 at 8:14 PM, Charles R Harris wrote: > > > On Wed, Jun 2, 2010 at 4:17 PM, Ilan Schnell wrote: >> >> Hello group, >> >> I'm not exactly sure what has changed in scipy.stats, but >> building and importing all extensions with the first 0.8.x brach >> (revision 6446) worked fine on CentOS release 3.9 (32-bit). >> > > Does revision 6446 still work? The vonmises distribution hasn't been touched > in a long time. > > >> Now (revision 6476), I can still build everything, but when >> I try to import scipy.stats.vonmises_cython, I get the following >> unresolved symbol: >> ImportError: >> /home/tester/master/lib/python2.6/site-packages/scipy/linalg/clapack.so: >> undefined symbol: clapack_sgesv >> >> Strangely, I don't get this import error on any platform (64-bit/32-bit, >> Windows, MaxOSX, Redhat 5, Solaris). >> >> Does anyone know what could be going on here? >> > > Chuck > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From d.l.goldsmith at gmail.com Wed Jun 2 23:07:07 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 20:07:07 -0700 Subject: [SciPy-Dev] Warning of deprecation in doc's ? 
In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis wrote: > For example scipy.stats.stats.cov when you view source has > "scipy.stats.cov is deprecated; please update your code to use > numpy.cov." Should this be in the docs ? and is there an example of > how this should be pointed out. > This is something I actually implemented in a program then discovered > that is was deprecated. I would have like that to be in the online > docs. > > Thanks > Vincent > I vaguely recollect this being discussed before, but I can't find anything about it in our docstring Standard, in our Q+A section, nor (easily) at the Python site (generally, when in doubt, we default to Python docstring standards); so, how 'bout it guys and gals: should deprecation be noted in docstrings and if so, where and how? DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed Jun 2 23:11:04 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 20:11:04 -0700 Subject: [SciPy-Dev] Summer Marathon Skypecon tomorrow In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 7:30 PM, Vincent Davis wrote: > On Thu, May 27, 2010 at 11:47 AM, David Goldsmith > wrote: > > So far, no one has RSVP-ed (positive or negative). Is this due to: > > > > A) Lack of time; > > B) Bad scheduling (i.e., you have time, just not at the time we've > chosen); > > C) Lack of interest; > > D) Lack of issues to discuss (i.e., you have interest, but not in a > meeting > > without a specific aganda); > > E) Bad choice of conference media (i.e., can't/won't do Skype); > > F) Just forgot to RSVP; > > G) Some of the above; > > H) Other/None of the above? > > > > If I haven't heard from anyone, in the positive, by midnight tonight, > EDT, > > this week's Skypecon is canceled. > > > > I thought it "would" happen and hoped to be able to listen, (Unlikely > I have anything to contribute) I was unsure I would be available so > didn't rsvp. It might be nice to record the skype call and make it > available as a sudo podcast. It might be nice for those that miss it > and might be a nice weekly update on scipy, this I could possibly help > with. > > Thanks > Vincent > I'd certainly prefer that to taking notes and publishing minutes! ;-) (Though I'll probably do the latter anyway for people who don't want to listen to the whole thing.) Who besides Vincent would participate this week? DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jun 2 23:19:10 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 2 Jun 2010 21:19:10 -0600 Subject: [SciPy-Dev] import error in scipy.stats on RH3 32-bit In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 9:03 PM, Ilan Schnell wrote: > Hello Chuck, > yes 6446 works. Actually, as the error indicates, the unresolved > symbol in is linalg/clapack.so, it just happened that during my > testing the stats package was imported first, so I initially thought > the error was there. > However, something has changed between 6446 and 6476, as > I wasn't seeing this error before. Looking at the revision log of > the 0.8.x branch, but I cannot see any obvious. And I'm also > puzzled why this only happens on one particular platform. > To make sure the build environment hasn't changed, I rebuild 6446 > on the same system, and it still works. > > I hate to ask this of anyone, but... could you determine which revision caused the problem? 
Sadistical Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Jun 2 23:28:40 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 2 Jun 2010 22:28:40 -0500 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: Message-ID: As a power user of these tools, I often will encounter these warnings while bulding my code piece-wise, however, I can easily imagine a case where a regular user simply seeing a useful feature and spending time coding around it, only to discover that it will soon be deprecated. I would certainly be annoyed in such a case. A quick and easy way to list deprecations would be towards the end of the docstring, but the user might not scroll all the way down past the feature that they found. So, to raise visibility, such deprecation warnings should be towards the beginning of the docstring. Just a thought... is it feasible for the doc building system to scan through the function code and spot a deprecation warning and thereby be able to add a list of deprecation warnings to the docstring? Obviously, such warnings would have to follow some standard format, but it would be neat if such things could be automated. Just my 2 cents, Ben Root On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith wrote: > On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis wrote: > >> For example scipy.stats.stats.cov when you view source has >> "scipy.stats.cov is deprecated; please update your code to use >> numpy.cov." Should this be in the docs ? and is there an example of >> how this should be pointed out. >> This is something I actually implemented in a program then discovered >> that is was deprecated. I would have like that to be in the online >> docs. >> >> Thanks >> Vincent >> > > I vaguely recollect this being discussed before, but I can't find anything > about it in our docstring Standard, in our Q+A section, nor (easily) at the > Python site (generally, when in doubt, we default to Python docstring > standards); so, how 'bout it guys and gals: should deprecation be noted in > docstrings and if so, where and how? > > DG > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilanschnell at gmail.com Wed Jun 2 23:29:51 2010 From: ilanschnell at gmail.com (Ilan Schnell) Date: Wed, 2 Jun 2010 22:29:51 -0500 Subject: [SciPy-Dev] import error in scipy.stats on RH3 32-bit In-Reply-To: References: Message-ID: Not yet. I'll look more into it tomorrow. :-) - Ilan On Wed, Jun 2, 2010 at 10:19 PM, Charles R Harris wrote: > > > On Wed, Jun 2, 2010 at 9:03 PM, Ilan Schnell wrote: >> >> Hello Chuck, >> yes 6446 works. ?Actually, as the error indicates, the unresolved >> symbol in is linalg/clapack.so, it just happened that during my >> testing the stats package was imported first, so I initially thought >> the error was there. >> However, something has changed between 6446 and 6476, as >> I wasn't seeing this error before. ?Looking at the revision log of >> the 0.8.x branch, but I cannot see any obvious. ?And I'm also >> puzzled why this only happens on one particular platform. >> To make sure the build environment hasn't changed, I rebuild 6446 >> on the same system, and it still works. >> > > I hate to ask this of anyone, but... could you determine which revision > caused the problem? 
> > Sadistical Chuck > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From d.l.goldsmith at gmail.com Thu Jun 3 00:05:26 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 21:05:26 -0700 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 8:28 PM, Benjamin Root wrote: > As a power user of these tools, I often will encounter these warnings while > bulding my code piece-wise, however, I can easily imagine a case where a > regular user simply seeing a useful feature and spending time coding around > it, only to discover that it will soon be deprecated. I would certainly be > annoyed in such a case. > > A quick and easy way to list deprecations would be towards the end of the > docstring, but the user might not scroll all the way down past the feature > that they found. So, to raise visibility, such deprecation warnings should > be towards the beginning of the docstring. > > Just a thought... is it feasible for the doc building system to scan > through the function code and spot a deprecation warning and thereby be able > to add a list of deprecation warnings to the docstring? Obviously, such > warnings would have to follow some standard format, but it would be neat if > such things could be automated. > > Just my 2 cents, > Ben Root > pydocweb (our doc editing Wiki) does do something like that in that it automatically prepends the function signature to the docstring (at least I think it's pydocweb that's doing it), so I think it's possible in principle. code.google.com/p/pydocweb hosts a ticketing system (the "Issues" tab) - may I ask you to go there and file an "enhancement" ticket for this - the worst that can happen is that someone (probably Pauli V.) will mark it as "will not do" with some sort of explanation as to why. That said, pydocweb has a long backlog of open issues, and this is not the highest priority among them. Accordingly, we probably shouldn't wait for it to solve our problem, i.e., we should still decide on where and how to note this, and do it manually when we encounter the situation. So, so far we have one "vote" for "yes, near the beginning." :-) DG > > On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith wrote: > >> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis wrote: >> >>> For example scipy.stats.stats.cov when you view source has >>> "scipy.stats.cov is deprecated; please update your code to use >>> numpy.cov." Should this be in the docs ? and is there an example of >>> how this should be pointed out. >>> This is something I actually implemented in a program then discovered >>> that is was deprecated. I would have like that to be in the online >>> docs. >>> >>> Thanks >>> Vincent >>> >> >> I vaguely recollect this being discussed before, but I can't find anything >> about it in our docstring Standard, in our Q+A section, nor (easily) at the >> Python site (generally, when in doubt, we default to Python docstring >> standards); so, how 'bout it guys and gals: should deprecation be noted in >> docstrings and if so, where and how? 
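One possible way such a note could look near the top of a docstring is Sphinx's deprecated directive paired with the runtime warning; the body below is an illustrative stand-in for a deprecated wrapper, not the actual scipy source:

    import warnings
    import numpy as np

    def cov(m, y=None, rowvar=1, bias=0):
        """Estimate a covariance matrix.

        .. deprecated:: 0.8
           scipy.stats.cov is deprecated; please update your code to use
           numpy.cov.
        """
        warnings.warn("scipy.stats.cov is deprecated; please update your "
                      "code to use numpy.cov.", DeprecationWarning)
        return np.cov(m, y, rowvar, bias)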
>> >> DG >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jun 3 00:17:51 2010 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 2 Jun 2010 23:17:51 -0500 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 11:05 PM, David Goldsmith wrote: > On Wed, Jun 2, 2010 at 8:28 PM, Benjamin Root wrote: > >> As a power user of these tools, I often will encounter these warnings >> while bulding my code piece-wise, however, I can easily imagine a case where >> a regular user simply seeing a useful feature and spending time coding >> around it, only to discover that it will soon be deprecated. I would >> certainly be annoyed in such a case. >> >> A quick and easy way to list deprecations would be towards the end of the >> docstring, but the user might not scroll all the way down past the feature >> that they found. So, to raise visibility, such deprecation warnings should >> be towards the beginning of the docstring. >> >> Just a thought... is it feasible for the doc building system to scan >> through the function code and spot a deprecation warning and thereby be able >> to add a list of deprecation warnings to the docstring? Obviously, such >> warnings would have to follow some standard format, but it would be neat if >> such things could be automated. >> >> Just my 2 cents, >> Ben Root >> > > pydocweb (our doc editing Wiki) does do something like that in that it > automatically prepends the function signature to the docstring (at least I > think it's pydocweb that's doing it), so I think it's possible in > principle. code.google.com/p/pydocweb hosts a ticketing system (the > "Issues" tab) - may I ask you to go there and file an "enhancement" ticket > for this - the worst that can happen is that someone (probably Pauli V.) > will mark it as "will not do" with some sort of explanation as to why. > > That said, pydocweb has a long backlog of open issues, and this is not the > highest priority among them. Accordingly, we probably shouldn't wait for it > to solve our problem, i.e., we should still decide on where and how to note > this, and do it manually when we encounter the situation. So, so far we > have one "vote" for "yes, near the beginning." :-) > > I will look into that tomorrow. And I certainly agree that we should not wait until pydocweb presents us a solution. We should certainly follow some sort of standard way to mark/tag/denote these deprecation warnings, that way 'grep' can still be a very valuable tool here. Ben Root > DG > > > >> >> On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith > > wrote: >> >>> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis wrote: >>> >>>> For example scipy.stats.stats.cov when you view source has >>>> "scipy.stats.cov is deprecated; please update your code to use >>>> numpy.cov." Should this be in the docs ? 
and is there an example of >>>> how this should be pointed out. >>>> This is something I actually implemented in a program then discovered >>>> that is was deprecated. I would have like that to be in the online >>>> docs. >>>> >>>> Thanks >>>> Vincent >>>> >>> >>> I vaguely recollect this being discussed before, but I can't find >>> anything about it in our docstring Standard, in our Q+A section, nor >>> (easily) at the Python site (generally, when in doubt, we default to Python >>> docstring standards); so, how 'bout it guys and gals: should deprecation be >>> noted in docstrings and if so, where and how? >>> >>> DG >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu Jun 3 00:32:26 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 2 Jun 2010 22:32:26 -0600 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 10:05 PM, David Goldsmith wrote: > On Wed, Jun 2, 2010 at 8:28 PM, Benjamin Root wrote: >> >> As a power user of these tools, I often will encounter these warnings >> while bulding my code piece-wise, however, I can easily imagine a case where >> a regular user simply seeing a useful feature and spending time coding >> around it, only to discover that it will soon be deprecated.? I would >> certainly be annoyed in such a case. >> >> A quick and easy way to list deprecations would be towards the end of the >> docstring, but the user might not scroll all the way down past the feature >> that they found.? So, to raise visibility, such deprecation warnings should >> be towards the beginning of the docstring. ? So, so far we > have one "vote" for "yes, near the beginning." :-) > I vote near the beginning for the reasons Benjamin notes "the user might not scroll all the way down past the feature that they found" And including as much reference to the replacement as possible (a link to the doc?, function name......) Make it easy to find its replacement. Vincent > DG > > >> >> On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith >> wrote: >>> >>> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis >>> wrote: >>>> >>>> For example scipy.stats.stats.cov when you view source has >>>> "scipy.stats.cov is deprecated; please update your code to use >>>> numpy.cov." Should this be in the docs ? and is there an example of >>>> how this should be pointed out. >>>> This is something I actually implemented in a program then discovered >>>> that is was deprecated. I would have like that to be in the online >>>> docs. 
>>>> >>>> Thanks >>>> Vincent >>> >>> I vaguely recollect this being discussed before, but I can't find >>> anything about it in our docstring Standard, in our Q+A section, nor >>> (easily) at the Python site (generally, when in doubt, we default to Python >>> docstring standards); so, how 'bout it guys and gals: should deprecation be >>> noted in docstrings and if so, where and how? >>> >>> DG >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. ?(As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From vincent at vincentdavis.net Thu Jun 3 00:34:40 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 2 Jun 2010 22:34:40 -0600 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: Message-ID: On Wed, Jun 2, 2010 at 10:17 PM, Benjamin Root wrote: > I will look into that tomorrow.? And I certainly agree that we should not > wait until pydocweb presents us a solution.? We should certainly follow some > sort of standard way to mark/tag/denote these deprecation warnings, that way > 'grep' can still be a very valuable tool here. Also when it will be deprecated would be good to know/document. Thanks Vincent > > Ben Root > > >> >> DG >> >> >>> >>> On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith >>> wrote: >>>> >>>> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis >>>> wrote: >>>>> >>>>> For example scipy.stats.stats.cov when you view source has >>>>> "scipy.stats.cov is deprecated; please update your code to use >>>>> numpy.cov." Should this be in the docs ? and is there an example of >>>>> how this should be pointed out. >>>>> This is something I actually implemented in a program then discovered >>>>> that is was deprecated. I would have like that to be in the online >>>>> docs. >>>>> >>>>> Thanks >>>>> Vincent >>>> >>>> I vaguely recollect this being discussed before, but I can't find >>>> anything about it in our docstring Standard, in our Q+A section, nor >>>> (easily) at the Python site (generally, when in doubt, we default to Python >>>> docstring standards); so, how 'bout it guys and gals: should deprecation be >>>> noted in docstrings and if so, where and how? >>>> >>>> DG >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> >> >> >> -- >> Mathematician: noun, someone who disavows certainty when their uncertainty >> set is non-empty, even if that set has measure zero. >> >> Hope: noun, that delusive spirit which escaped Pandora's jar and, with her >> lies, prevents mankind from committing a general suicide. 
?(As interpreted >> by Robert Graves) >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From nmb at wartburg.edu Thu Jun 3 00:47:50 2010 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Wed, 02 Jun 2010 23:47:50 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C06B8FB.8080806@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> Message-ID: <4C0733F6.7040608@wartburg.edu> On 2010-06-02 15:03 , Bruce Southey wrote: > On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: >> On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell wrote: >> >>> On 2010-06-02 13:10 , Bruce Southey wrote: [...] >> I agree with Neil that this is a very useful convenience function. >> > My problem with the chisquare_twoway is that it should not call another > function to finish two lines of code. It is just an excessive waste of > resources. Do you mean that you would rather see the equivalent of chisq = (table - expected)**2 / expected return chisq, chisqprob(chisq, dof) at the bottom of chisquare_contingency than the current call to chisquare? I'm certainly okay with that. >> I never heard of a one-way contingency table, my question was whether >> the function should also handle 3-way or 4-way tables, additional to >> two-way. >> > Correct to both of these as I just consider these as n-way tables. I > think that contingency tables by definition only applies to the 2-d > case. Pivot tables are essentially the same thing. I would have to > lookup on how to get the expected number of outcomes but probably of the > form Ni.. * N.j. *N..k/N... for the 3-way (the 2-way table is of the > form Ni.*N.j/N..) for i=rows, j=columns, k=3rd axis and '.' means sum > for that axis. That is the correct (tensor) formula for higher dimensional tables. Pragmatically, since the number of cells climbs so rapidly with increasing dimension, there are more problems with small expected counts. If we thought people would be interested in using it, we could certainly define a chisquare_nway function as well. >> I thought about the question how the input should be specified for my >> initial response, the alternative would be to use the original data or >> a "long" format instead of a table. But I thought that as a >> convenience function using the table format will be the most common >> use. >> I have written in the past functions that calculate the contingency >> table, and would be very useful to have a more complete coverage of >> tools to work with contingency tables in scipy.stats (or temporarily >> in statsmodels, where we are working also on the anova type of >> analysis) >> > It depends on what tasks are needed. Really there are two steps: > 1) Cross-tabulation that summarized the data from whatever input > (groupby would help here). > 2) Statistical tests - series of functions that accept summarized data only. > > If you have separate functions then the burden is on the user to find > and call all the desired functions. You can also provide a single helper > function to do all that because you don't want to repeat unnecessary calls. 
The facilities for handling raw, frame-style data in scipy.stats are not too strong. A tabulation function that we could stick together with the chisquare* functions to make a single helper would certainly be convenient. -Neil From d.l.goldsmith at gmail.com Thu Jun 3 01:04:27 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 2 Jun 2010 22:04:27 -0700 Subject: [SciPy-Dev] {True, False} should be replaced w/ bool, correct? Message-ID: Just checking; see, e.g., scipy .io .matlab .mio.savemat appendmat parameter. (Or is it possible that the function really needs to see either the word True or the word False?) DG -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 3 02:09:56 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 02:09:56 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C0733F6.7040608@wartburg.edu> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C0733F6.7040608@wartburg.edu> Message-ID: On Thu, Jun 3, 2010 at 12:47 AM, Neil Martinsen-Burrell wrote: > On 2010-06-02 15:03 , Bruce Southey wrote: >> On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: >>> On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell ?wrote: >>> >>>> On 2010-06-02 13:10 , Bruce Southey wrote: > > [...] > >>> I agree with Neil that this is a very useful convenience function. >>> >> My problem with the chisquare_twoway is that it should not call another >> function to finish two lines of code. It is just an excessive waste of >> resources. > > Do you mean that you would rather see the equivalent of > > chisq = (table - expected)**2 / expected > return chisq, chisqprob(chisq, dof) > > at the bottom of chisquare_contingency than the current call to > chisquare? ?I'm certainly okay with that. But don't forget to ravel or you get cell-wise chisquare :) For non-performance sensitive parts, as in this case I usually go by how easy the function is to understand and to test. for example I prefer distributions.chi2.sf(chisq, dof) to chisqprob(chisq, dof) (I haven't checked if it is correct because I immediately see that it is a one-sided pvalue. inlining in this case might be nicer because of dof (when inlining) versus ddof (when calling chisquare), I found the ddof confusing to read related: while I was skimming Bruce's reference http://faculty.vassar.edu/lowry/ch8pt2.html I saw that they recommend continuity correction for the 2by2 case. Do you know what the common position on continuity correction is in this case? (In something vaguely related to this, I read recently that some continuity correction make the test too conservative and are not recommended. But I don't remember for which test I read this.) If there is test specific continuity correction, then chisquare will have to be inlined. > >>> I never heard of a one-way contingency table, my question was whether >>> the function should also handle 3-way or 4-way tables, additional to >>> two-way. 
>>> >> Correct to both of these as I just consider these as n-way tables. I >> think that contingency tables by definition only applies to the 2-d >> case. Pivot tables are essentially the same thing. I would have to >> lookup on how to get the expected number of outcomes but probably of the >> form Ni.. * N.j. *N..k/N... for the 3-way (the 2-way table is of the >> form Ni.*N.j/N..) for i=rows, j=columns, k=3rd axis and '.' means sum >> for that axis. > > That is the correct (tensor) formula for higher dimensional tables. > Pragmatically, since the number of cells climbs so rapidly with > increasing dimension, there are more problems with small expected > counts. ?If we thought people would be interested in using it, we could > certainly define a chisquare_nway function as well. I'm not too happy about having a large number of small functions especially if they have code duplication and need to be separately maintained. When there is a demand for a convenient special case, then it could just call the more general function. For testing distribution, the common approach in the case when there are too few expected counts in some cells, is, to combine several cells together in one bin. I guess, there might be something like this also feasible for nway, i.e. coarsen the grid, or not? > >>> I thought about the question how the input should be specified for my >>> initial response, the alternative would be to use the original data or >>> a "long" format instead of a table. But I thought that as a >>> convenience function using the table format will be the most common >>> use. >>> I have written in the past functions that calculate the contingency >>> table, and would be very useful to have a more complete coverage of >>> tools to work with contingency tables in scipy.stats (or temporarily >>> in statsmodels, where we are working also on the anova type of >>> analysis) >>> >> It depends on what tasks are needed. Really there are two steps: >> 1) Cross-tabulation that summarized the data from whatever input >> (groupby would help here). >> 2) Statistical tests - series of functions that accept summarized data only. >> >> If you have separate functions then the burden is on the user to find >> and call all the desired functions. You can also provide a single helper >> function to do all that because you don't want to repeat unnecessary calls. > > The facilities for handling raw, frame-style data in scipy.stats are not > too strong. ?A tabulation function that we could stick together with the > chisquare* functions to make a single helper would certainly be convenient. Since broader coverage of contingency tables with all the data handling, bincount and table conversions would a much larger set of functions. I think our still evolving design for statistics (including test) in statsmodels is to move to a more object oriented design, to keep things together, and to take advantage of reusing previous calculations. In this case it could be a ContingencyTable class that could combine creating the countdata from raw data (with or without missing values), marginalization if it's 3-way or higher, attach several tests, create a nice string that can be printed, and so on. With lazy evaluation and reuse of previous calculations, we think this would be a better design than only having standalone functions. grouping functions together: While statisticians might have a good overview of all the different test, I found the "laundry list" of functions in scipy.stats for a long time pretty confusing. 
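To make the Ni.. * N.j. * N..k / N... formula quoted above concrete, a rough sketch for an arbitrary number of axes could look like this (expected_nway is only an illustrative name, nothing like it exists in scipy yet):

import numpy as np

def expected_nway(table):
    # expected cell counts under mutual independence for an n-way table:
    # the outer product of the one-way margins divided by N**(ndim - 1)
    table = np.asarray(table, dtype=float)
    N = table.sum()
    expected = None
    for axis in range(table.ndim):
        margin = table
        # sum out every axis except `axis`, highest axis first so that the
        # remaining axis numbers stay valid
        for other in reversed([a for a in range(table.ndim) if a != axis]):
            margin = margin.sum(axis=other)
        expected = margin if expected is None else np.multiply.outer(expected, margin)
    return expected / N ** (table.ndim - 1)

For a two-way table this reduces to np.outer(rowsums, colsums) / N, i.e. the Ni.*N.j/N.. case.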
Instead of having group of functions fisherexact, chisquare_twoway, chisquare_nway, and several other possible candidates for independence tests in contingency tables, we are starting to combine them together, e.g independence_tests, mean_tests, variance_tests and correlation_test We were discussing this in statsmodels in a different context, mainly diagnostic tests for regression, e.g. heteroscedasticity, autocorrelation tests or more recently post-hoc tests. In the current case, I also thought that combining with a fisherexact or other tests would potentially be useful, with a keyword argument that selects "chisquare", "exact", "..." Which is in this case not yet relevant because fisherexact, even when it works, is only for 2by2, and I don't think mixing them together is very useful. Josef > -Neil > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From stefan at sun.ac.za Thu Jun 3 02:29:30 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 23:29:30 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 2 June 2010 17:11, Vincent Davis wrote: > I just setup an account. vincentdavis Thanks, Vincent. I gave you editing permission. Guidelines are accessible from the front page, let me know if you get stuck. Regards St?fan From stefan at sun.ac.za Thu Jun 3 02:33:39 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 Jun 2010 23:33:39 -0700 Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? In-Reply-To: References: Message-ID: On 2 June 2010 19:09, Vincent Davis wrote: > As I am always interested in learning new things is there any help I > can offer in getting the wiki review feature implemented? Sure, have a look at: http://code.google.com/p/pydocweb/ There are many issues that require attention, and all help is appreciated. Regards St?fan From josef.pktd at gmail.com Thu Jun 3 02:39:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 02:39:01 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C0733F6.7040608@wartburg.edu> Message-ID: On Thu, Jun 3, 2010 at 2:09 AM, wrote: > On Thu, Jun 3, 2010 at 12:47 AM, Neil Martinsen-Burrell > wrote: >> On 2010-06-02 15:03 , Bruce Southey wrote: >>> On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: >>>> On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell ?wrote: >>>> >>>>> On 2010-06-02 13:10 , Bruce Southey wrote: >> >> [...] >> >>>> I agree with Neil that this is a very useful convenience function. >>>> >>> My problem with the chisquare_twoway is that it should not call another >>> function to finish two lines of code. It is just an excessive waste of >>> resources. >> >> Do you mean that you would rather see the equivalent of >> >> chisq = (table - expected)**2 / expected >> return chisq, chisqprob(chisq, dof) >> >> at the bottom of chisquare_contingency than the current call to >> chisquare? ?I'm certainly okay with that. 
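Spelled out, the two-liner quoted above amounts to something like this (only a sketch; the name chisquare_contingency and the exact return values are precisely what is being discussed in this thread):

import numpy as np
from scipy import stats

def chisquare_contingency(table):
    # R x C test of independence: expected counts from the row/column
    # margins, chi-square statistic summed over all cells
    table = np.asarray(table, dtype=float)
    N = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / N
    chisq = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return chisq, stats.chi2.sf(chisq, dof)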
> > But don't forget to ravel or you get cell-wise chisquare :) > For non-performance sensitive parts, as in this case I usually go by > how easy the function is to understand and to test. > for example I prefer distributions.chi2.sf(chisq, dof) to > chisqprob(chisq, dof) (I haven't checked if it is correct because I > immediately see that it is a one-sided pvalue. > > inlining in this case might be nicer because of dof (when inlining) > versus ddof (when calling chisquare), I found the ddof confusing to > read > > related: while I was skimming Bruce's reference > http://faculty.vassar.edu/lowry/ch8pt2.html > I saw that they recommend continuity correction for the 2by2 case. > Do you know what the common position on continuity correction is in this case? > > (In something vaguely related to this, I read recently that some > continuity correction make the test too conservative and are not > recommended. But I don't remember for which test I read this.) It actually is for chisquare http://en.wikipedia.org/wiki/Yates%27_correction_for_continuity Josef > > If there is test specific continuity correction, then chisquare will > have to be inlined. > >> >>>> I never heard of a one-way contingency table, my question was whether >>>> the function should also handle 3-way or 4-way tables, additional to >>>> two-way. >>>> >>> Correct to both of these as I just consider these as n-way tables. I >>> think that contingency tables by definition only applies to the 2-d >>> case. Pivot tables are essentially the same thing. I would have to >>> lookup on how to get the expected number of outcomes but probably of the >>> form Ni.. * N.j. *N..k/N... for the 3-way (the 2-way table is of the >>> form Ni.*N.j/N..) for i=rows, j=columns, k=3rd axis and '.' means sum >>> for that axis. >> >> That is the correct (tensor) formula for higher dimensional tables. >> Pragmatically, since the number of cells climbs so rapidly with >> increasing dimension, there are more problems with small expected >> counts. ?If we thought people would be interested in using it, we could >> certainly define a chisquare_nway function as well. > > I'm not too happy about having a large number of small functions > especially if they have code duplication and need to be separately > maintained. > When there is a demand for a convenient special case, then it could > just call the more general function. > > For testing distribution, the common approach in the case when there > are too few expected counts in some cells, is, to combine several > cells together in one bin. > I guess, there might be something like this also feasible for nway, > i.e. coarsen the grid, or not? > >> >>>> I thought about the question how the input should be specified for my >>>> initial response, the alternative would be to use the original data or >>>> a "long" format instead of a table. But I thought that as a >>>> convenience function using the table format will be the most common >>>> use. >>>> I have written in the past functions that calculate the contingency >>>> table, and would be very useful to have a more complete coverage of >>>> tools to work with contingency tables in scipy.stats (or temporarily >>>> in statsmodels, where we are working also on the anova type of >>>> analysis) >>>> >>> It depends on what tasks are needed. Really there are two steps: >>> 1) Cross-tabulation that summarized the data from whatever input >>> (groupby would help here). >>> 2) Statistical tests - series of functions that accept summarized data only. 
>>> >>> If you have separate functions then the burden is on the user to find >>> and call all the desired functions. You can also provide a single helper >>> function to do all that because you don't want to repeat unnecessary calls. >> >> The facilities for handling raw, frame-style data in scipy.stats are not >> too strong. ?A tabulation function that we could stick together with the >> chisquare* functions to make a single helper would certainly be convenient. > > Since broader coverage of contingency tables with all the data > handling, bincount and table conversions would a much larger set of > functions. > > I think our still evolving design for statistics (including test) in > statsmodels is to move to a more object oriented design, to keep > things together, and to take advantage of reusing previous > calculations. > > In this case it could be a ContingencyTable class that could combine > creating the countdata from raw data (with or without missing values), > marginalization if it's 3-way or higher, attach several tests, create > a nice string that can be printed, and so on. With lazy evaluation and > reuse of previous calculations, we think this would be a better design > than only having standalone functions. > > grouping functions together: > While statisticians might have a good overview of all the different > test, I found the "laundry list" of functions in scipy.stats for a > long time pretty confusing. > Instead of having group of functions fisherexact, chisquare_twoway, > chisquare_nway, and several other possible candidates for independence > tests in contingency tables, we are starting to combine them together, > e.g independence_tests, mean_tests, variance_tests and > correlation_test > > We were discussing this in statsmodels in a different context, mainly > diagnostic tests for regression, e.g. heteroscedasticity, > autocorrelation tests or more recently post-hoc tests. > > In the current case, I also thought that combining with a fisherexact > or other tests would potentially be useful, with a keyword argument > that selects "chisquare", "exact", "..." > Which is in this case not yet relevant because fisherexact, even when > it works, is only for 2by2, and I don't think mixing them together is > very useful. > > Josef > > > >> -Neil >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > From josef.pktd at gmail.com Thu Jun 3 02:48:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 02:48:12 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C06B8FB.8080806@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> Message-ID: On Wed, Jun 2, 2010 at 4:03 PM, Bruce Southey wrote: > On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: > > On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell > wrote: > > > On 2010-06-02 13:10 , Bruce Southey wrote: > [...] > > > > However, this code is the chi-squared test part as SAS will compute the > actual cell numbers. Also an extension to scipy.stats.chisquare() so we > can not have both functions. > > > Again, I don't understand what you mean that we can't have both > functions? 
I believe (from a statistics teacher's point of view) that > the Chi-Squared goodness of fit test (which is stats.chisquare) is a > different beast from the Chi-Square test for independence (which is > stats.chisquare_contingency). The fact that the distribution of the > test statistic is the same should not tempt us to put them into the > same function. > > > Please read scipy.stats.chisquare() because scipy.stats.chisquare() is > the 1-d case of yours. > Quote from the docstring: > " The chi square test tests the null hypothesis that the categorical data > has the given frequencies." > Also go the web site provided in the docstring. > > By default you get the expected frequencies but you can also put in your > own using the f_exp variable. You could do the same in your code. > > > In fact, Warren correctly used stats.chisquare with the expected > frequencies calculated from the null hypothesis and the corrected > degrees of freedom. ?chisquare_contingency is in some sense a > convenience method for taking care of these pre-calculations before > calling stats.chisquare. ?Can you explain more clearly to me why we > should not include such a convenience function? > > > Just a clarification, before I find time to work my way through the > other comments > > stats.chisquare is a generic test for goodness-of-fit for discreted or > binned distributions. > and from the docstring of it > "If no expected frequencies are given, the total > N is assumed to be equally distributed across all groups." > > default is uniform distribution > > > > Try: > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > The use of the uniform distribution is rather misleading and technically > wrong as it does not help address the expected number of outcomes in a cell: quote from the wikipedia page: "A simple example is the hypothesis that an ordinary six-sided dice is "fair", i.e., all six outcomes are equally likely to occur." I don't see anything misleading or technically wrong with the uniform distributions, or if they come from a Poisson, Hypergeometric, binned Normal or any of number of other distributions. > http://en.wikipedia.org/wiki/Discrete_uniform_distribution > > > chisquare_twoway is a special case that additional calculates the > correct expected frequencies for the test of independencs based on the > margin totals. The resulting distribution is not uniform. > > > Actually the null hypothesis is rather different between 1-way and 2-way > tables so you can not say that chisquare_twoway is a special case of > chisquare. What is the Null hypothesis in a one-way table? Josef > > I am not sure what you mean by the 'resulting distribution is not uniform'. > The distribution of the cells values has nothing to do with the uniform > distribution in either case because it is not used in the data nor in the > formulation of the test. (And, yes, I have had to do the proof that the test > statistic is Chi-squared - which is why there is the warning about small > cells...). > > I agree with Neil that this is a very useful convenience function. > > > My problem with the chisquare_twoway is that it should not call another > function to finish two lines of code. It is just an excessive waste of > resources. > > I never heard of a one-way contingency table, my question was whether > the function should also handle 3-way or 4-way tables, additional to > two-way. > > > Correct to both of these as I just consider these as n-way tables. I think > that contingency tables by definition only applies to the 2-d case. 
Pivot > tables are essentially the same thing. I would have to lookup on how to get > the expected number of outcomes but probably of the form Ni.. * N.j. > *N..k/N... for the 3-way (the 2-way table is of the form Ni.*N.j/N..) for > i=rows, j=columns, k=3rd axis and '.' means sum for that axis. > > I thought about the question how the input should be specified for my > initial response, the alternative would be to use the original data or > a "long" format instead of a table. But I thought that as a > convenience function using the table format will be the most common > use. > > I have written in the past functions that calculate the contingency > table, and would be very useful to have a more complete coverage of > tools to work with contingency tables in scipy.stats (or temporarily > in statsmodels, where we are working also on the anova type of > analysis) > > > It depends on what tasks are needed.? Really there are two steps: > 1) Cross-tabulation that summarized the data from whatever input (groupby > would help here). > 2) Statistical tests - series of functions that accept summarized data only. > > If you have separate functions then the burden is on the user to find and > call all the desired functions. You can also provide a single helper > function to do all that because you don't want to repeat unnecessary calls. > > So, I think the way it is it is a nice function and we don't have to > put all contingency table analysis into this function. > > Josef > > > Bruce > > > > > > Really this should be combined with fisher.py in ticket 956: > http://projects.scipy.org/scipy/ticket/956 > > > Wow, apparently I have lots of disagreements today, but I don't think > that this should be combined with Fisher's Exact test. (I would like > to see that ticket mature to the point where it can be added to > scipy.stats.) I like the functions in scipy.stats to correspond in a > one-to-one manner with the statistical tests. I think that the docs > should "See Also" the appropriate exact (and non-parametric) tests, > but I think that one function/one test is a good rule. This is > particularly true for people (like me) who would like to someday be > able to use scipy.stats in a pedagogical context. > > -Neil > > > I don't see any 'disagreements' rather just different ways to do things > and identifying areas that need to be addressed for more general use. > > > Agreed. :) > > [...] > > -Neil > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From ralf.gommers at googlemail.com Thu Jun 3 06:40:08 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 3 Jun 2010 18:40:08 +0800 Subject: [SciPy-Dev] {True, False} should be replaced w/ bool, correct? In-Reply-To: References: Message-ID: On Thu, Jun 3, 2010 at 1:04 PM, David Goldsmith wrote: > Just checking; see, e.g., scipy . > io .matlab > .mio .savemat > appendmat parameter. (Or is it possible that the function really needs to > see either the word True or the word False?) > Correct, {True, False} should always be changed to bool in the docs. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at enthought.com Thu Jun 3 08:50:53 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 03 Jun 2010 07:50:53 -0500 Subject: [SciPy-Dev] Deprecate stats.glm? Message-ID: <4C07A52D.30503@enthought.com> stats.glm looks like it was started and then abandoned without being finished. It was last touched in November 2007. Should this function be deprecated so it can eventually be removed? Warren From warren.weckesser at enthought.com Thu Jun 3 09:27:29 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 03 Jun 2010 08:27:29 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> Message-ID: <4C07ADC1.6040504@enthought.com> Just letting you know that I'm not ignoring all the great comments from josef, Neil and Bruce about my suggestion for chisquare_contingency. Unfortunately, I won't have time to think about all the deeper suggestions for another week or so. For now, I'll just say that I agree with josef's and Neil's suggestions for the docstring, and that Neil's summary of the function as simply a convenience function that calls stats.chisquare with appropriate arguments to perform a test of independence on a contingency table is exactly what I had in mind. Warren josef.pktd at gmail.com wrote: > On Wed, Jun 2, 2010 at 4:03 PM, Bruce Southey wrote: > >> On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: >> >> On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell >> wrote: >> >> >> On 2010-06-02 13:10 , Bruce Southey wrote: >> [...] >> >> >> >> However, this code is the chi-squared test part as SAS will compute the >> actual cell numbers. Also an extension to scipy.stats.chisquare() so we >> can not have both functions. >> >> >> Again, I don't understand what you mean that we can't have both >> functions? I believe (from a statistics teacher's point of view) that >> the Chi-Squared goodness of fit test (which is stats.chisquare) is a >> different beast from the Chi-Square test for independence (which is >> stats.chisquare_contingency). The fact that the distribution of the >> test statistic is the same should not tempt us to put them into the >> same function. >> >> >> Please read scipy.stats.chisquare() because scipy.stats.chisquare() is >> the 1-d case of yours. >> Quote from the docstring: >> " The chi square test tests the null hypothesis that the categorical data >> has the given frequencies." >> Also go the web site provided in the docstring. >> >> By default you get the expected frequencies but you can also put in your >> own using the f_exp variable. You could do the same in your code. >> >> >> In fact, Warren correctly used stats.chisquare with the expected >> frequencies calculated from the null hypothesis and the corrected >> degrees of freedom. chisquare_contingency is in some sense a >> convenience method for taking care of these pre-calculations before >> calling stats.chisquare. Can you explain more clearly to me why we >> should not include such a convenience function? >> >> >> Just a clarification, before I find time to work my way through the >> other comments >> >> stats.chisquare is a generic test for goodness-of-fit for discreted or >> binned distributions. 
>> and from the docstring of it >> "If no expected frequencies are given, the total >> N is assumed to be equally distributed across all groups." >> >> default is uniform distribution >> >> >> >> Try: >> http://en.wikipedia.org/wiki/Pearson's_chi-square_test >> >> The use of the uniform distribution is rather misleading and technically >> wrong as it does not help address the expected number of outcomes in a cell: >> > > quote from the wikipedia page: > "A simple example is the hypothesis that an ordinary six-sided dice is > "fair", i.e., all six outcomes are equally likely to occur." > > I don't see anything misleading or technically wrong with the uniform > distributions, > or if they come from a Poisson, Hypergeometric, binned Normal or any > of number of other distributions. > > > >> http://en.wikipedia.org/wiki/Discrete_uniform_distribution >> >> >> chisquare_twoway is a special case that additional calculates the >> correct expected frequencies for the test of independencs based on the >> margin totals. The resulting distribution is not uniform. >> >> >> Actually the null hypothesis is rather different between 1-way and 2-way >> tables so you can not say that chisquare_twoway is a special case of >> chisquare. >> > > What is the Null hypothesis in a one-way table? > > Josef > > >> I am not sure what you mean by the 'resulting distribution is not uniform'. >> The distribution of the cells values has nothing to do with the uniform >> distribution in either case because it is not used in the data nor in the >> formulation of the test. (And, yes, I have had to do the proof that the test >> statistic is Chi-squared - which is why there is the warning about small >> cells...). >> >> I agree with Neil that this is a very useful convenience function. >> >> >> My problem with the chisquare_twoway is that it should not call another >> function to finish two lines of code. It is just an excessive waste of >> resources. >> >> I never heard of a one-way contingency table, my question was whether >> the function should also handle 3-way or 4-way tables, additional to >> two-way. >> >> >> Correct to both of these as I just consider these as n-way tables. I think >> that contingency tables by definition only applies to the 2-d case. Pivot >> tables are essentially the same thing. I would have to lookup on how to get >> the expected number of outcomes but probably of the form Ni.. * N.j. >> *N..k/N... for the 3-way (the 2-way table is of the form Ni.*N.j/N..) for >> i=rows, j=columns, k=3rd axis and '.' means sum for that axis. >> >> I thought about the question how the input should be specified for my >> initial response, the alternative would be to use the original data or >> a "long" format instead of a table. But I thought that as a >> convenience function using the table format will be the most common >> use. >> >> I have written in the past functions that calculate the contingency >> table, and would be very useful to have a more complete coverage of >> tools to work with contingency tables in scipy.stats (or temporarily >> in statsmodels, where we are working also on the anova type of >> analysis) >> >> >> It depends on what tasks are needed. Really there are two steps: >> 1) Cross-tabulation that summarized the data from whatever input (groupby >> would help here). >> 2) Statistical tests - series of functions that accept summarized data only. >> >> If you have separate functions then the burden is on the user to find and >> call all the desired functions. 
You can also provide a single helper >> function to do all that because you don't want to repeat unnecessary calls. >> >> So, I think the way it is it is a nice function and we don't have to >> put all contingency table analysis into this function. >> >> Josef >> >> >> Bruce >> >> >> >> >> >> Really this should be combined with fisher.py in ticket 956: >> http://projects.scipy.org/scipy/ticket/956 >> >> >> Wow, apparently I have lots of disagreements today, but I don't think >> that this should be combined with Fisher's Exact test. (I would like >> to see that ticket mature to the point where it can be added to >> scipy.stats.) I like the functions in scipy.stats to correspond in a >> one-to-one manner with the statistical tests. I think that the docs >> should "See Also" the appropriate exact (and non-parametric) tests, >> but I think that one function/one test is a good rule. This is >> particularly true for people (like me) who would like to someday be >> able to use scipy.stats in a pedagogical context. >> >> -Neil >> >> >> I don't see any 'disagreements' rather just different ways to do things >> and identifying areas that need to be addressed for more general use. >> >> >> Agreed. :) >> >> [...] >> >> -Neil >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From bsouthey at gmail.com Thu Jun 3 09:27:26 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 03 Jun 2010 08:27:26 -0500 Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? In-Reply-To: References: <4C066DA3.8010609@gmail.com> Message-ID: <4C07ADBE.3050407@gmail.com> On 06/02/2010 12:14 PM, Stefan wrote: > >> Not that I am complaining rather trying to understand what is expected >> to happen. >> Under the patch, it is very much user beware. The header argument can >> be anything or nothing. There is no check for the contents or if the >> delimiter used is the same as the rest of the output. Further with the >> newline option there is no guarantee that the lines in the header will >> have the same line endings throughout the file. >> So what should a user be allowed to use as a header? >> You could write a whole program there or an explanation of the >> following output - which is very appealing. You could force a list of >> strings so that you print out newline.join(header) - okay not quite >> because it should include the comment argument. >> Should savetxt be restricted to something that loadtxt can read? >> This is potentially problematic if you want a header line. Although it >> could return the number of header lines. >> [savetxt should also be updated to allow bz2 as loadtxt handles those >> now - not that I have used it] >> >> >> >> >> Also note that since that patch was written, savetxt takes a user >> supplied newline keyword, so you can just append that to the header >> string. >> >> >> >> True, we were not aware of this, but this does not help much for the >> comment/header. 
>> >> >> >> Entered as ~3 months ago:http://projects.scipy.org/numpy/changeset/8180 >> Should this be forced to check for valid options for new lines? >> Otherwise you from this 'np.savetxt('junk.text', [1,2,3,4,5], >> newline='what')' you get: >> >> > 1.000000000000000000e+00what2.000000000000000000e+00what > 3.000000000000000000e+00what4.000000000000000000e+00 > what5.000000000000000000e+00what > >> Which is not going to be read back by loadtxt. >> >> >> >> As numpy.loadtxt has a default comment character ('#'), the same may be >> implemented for numpy.savetxt. In this case, numpy.savetxt would get two >> additional keywords (e.g. header, comment(character)), which bloats the >> interface, but potentially provides more safety. >> >> >> >> >> FWIW, I ended up rolling my own using the most recent pre-Python 3 >> changes for savetxt that accepts a list of names instead of one string >> or if the provided array has the attribute dtype.names (non-nested rec >> or structured arrays) it uses those. Whatever is done I think the >> support for structured arrays is nice, and I think having this >> functionality is a no-brainer. I need it quite often. >> >> >> >> Although, we have not been using record arrays too often, we see their >> advantages and agree that it should be possible to use them as you described >> it. >> We also thought about a solution, using the __str__ method for the 'header >> object'. In this vain, an arbitrary header class (including a plane string) >> providing an __str__ member may be handed to numpy.savetxt, >> which can use it to write the header. >> >> > > So let us briefly summarize whats on the table. It appears to us that > there are basically three open issues: > (1) a csv like header for savetxt written files (first line contains column > names) > (2) comments (introduced by comment character e.g. '#') at the beginning > of the file (preceding the data) > (3) the role of the 'newline' option > > As was noted, the patch (ticket 1079) enables both to write a csv like > header (1) and comment line(s) introduced by a comment character (e.g. '#'). > Nonetheless, this solution is quite unsatisfactory > in our opinion, because it may be error prone, > as the user is in charge of the entire formatting. Despite this, we think > that it should be up to the user what amount of information is to be put > at the top of the file, but the format should be checked as far as possible. > > Using either a string or a list/tuple of strings, as proposed by Bruce, > seems to be a reasonable possibility to implement the desired functionality. > Maybe two individual keywords ('header' and 'comment') should exist to > distinguish whether the the user requests case (1) or (2). As for loadtxt > the default comment character should be '#', but it may be changed by the > user. > > We think that savetxt should not be restricted to output, which can be read > by loadtxt. Although it should be possible to add commments to the output > file, so that it remains readable by loadtxt (without tweaking it > e.g. with the skiprows keyword). > > We agree that the newline keyword may cause inconsistencies in the file > (if ticket 1079 were applied), > and possibly strange behavior such as when newline='what' is specified. > Yet, this question does not only concern the header/comments. 
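As a rough illustration of the header/comment behaviour summarized above (this is only a sketch of the idea, not the actual ticket-1079 patch, and savetxt_with_header is a made-up name; the real change would live inside np.savetxt itself):

import numpy as np

def savetxt_with_header(fname, X, header=None, comment='# ', **kwargs):
    # write comment-prefixed header lines, then let np.savetxt write the
    # data; np.loadtxt can read the result back without skiprows
    X = np.asarray(X)
    if header is None and X.dtype.names:
        # column names from a structured/record array; writing the
        # structured array itself may still require an explicit fmt
        header = [' '.join(X.dtype.names)]
    elif isinstance(header, str):
        header = header.splitlines()
    fh = open(fname, 'w')
    try:
        for line in header or []:
            fh.write(comment + line + '\n')
        np.savetxt(fh, X, **kwargs)   # savetxt accepts an open file handle
    finally:
        fh.close()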
> > Stefan& Christian > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > I am in agreement with what you suggest so post a patch. :-) Some of what I suggested was over thinking what can really be done and keep the function relatively simple and easy to use. My wish list would be that: 1) If the header is added that it allows names from structured/record arrays to be used and perhaps autogenerated (such as var1, var2, ..., varn). 2) That the dtype of the array_like input be used in the fmt when fmt is not provided. Bruce From josef.pktd at gmail.com Thu Jun 3 09:38:32 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 09:38:32 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: <4C07A52D.30503@enthought.com> References: <4C07A52D.30503@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser wrote: > stats.glm looks like it was started and then abandoned without being > finished. ?It was last touched in November 2007. ?Should this function > be deprecated so it can eventually be removed? My thoughts when I looked at it was roughly: leave it alone since it's working, but don't "advertise" it because we should get a better replacement. similar to linregress the more general version will be available when scipy.stats gets the full OLS model. >>> x = (np.arange(20)>9).astype(int) >>> y = x + np.random.randn(20) >>> stats.glm(y,x) (-1.7684287512254859, 0.093933208147769023) >>> stats.ttest_ind(y[:10], y[10:]) (-1.7684287512254859, 0.093933208147768926) In the current form it doesn't do much different than ttest_ind except for different argument structure. I think it could be made to work on string labels if _support.unique is replaced by np.unique (which we are doing in statsmodels) >>> x = (np.arange(20)>9).astype(str) >>> x array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'T'], dtype='|S1') >>> stats.glm(y,x) Traceback (most recent call last): File "", line 1, in stats.glm(y,x) File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", line 3315, in glm p = _support.unique(para) File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py", line 45, in unique if np.add.reduce(np.equal(uniques,item).flat) == 0: AttributeError: 'NotImplementedType' object has no attribute 'flat' Josef > > Warren > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From bsouthey at gmail.com Thu Jun 3 10:07:42 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 03 Jun 2010 09:07:42 -0500 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: Message-ID: <4C07B72E.2050504@gmail.com> On 06/02/2010 10:28 PM, Benjamin Root wrote: > As a power user of these tools, I often will encounter these warnings > while bulding my code piece-wise, however, I can easily imagine a case > where a regular user simply seeing a useful feature and spending time > coding around it, only to discover that it will soon be deprecated. I > would certainly be annoyed in such a case. 
> > A quick and easy way to list deprecations would be towards the end of > the docstring, but the user might not scroll all the way down past the > feature that they found. So, to raise visibility, such deprecation > warnings should be towards the beginning of the docstring. > > Just a thought... is it feasible for the doc building system to scan > through the function code and spot a deprecation warning and thereby > be able to add a list of deprecation warnings to the docstring? > Obviously, such warnings would have to follow some standard format, > but it would be neat if such things could be automated. > > Just my 2 cents, > Ben Root > > On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith > > wrote: > > On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis > > wrote: > > For example scipy.stats.stats.cov when you view source has > "scipy.stats.cov is deprecated; please update your code to use > numpy.cov." Should this be in the docs ? and is there an > example of > how this should be pointed out. > This is something I actually implemented in a program then > discovered > that is was deprecated. I would have like that to be in the online > docs. > > Thanks > Vincent > > > I vaguely recollect this being discussed before, but I can't find > anything about it in our docstring Standard, in our Q+A section, > nor (easily) at the Python site (generally, when in doubt, we > default to Python docstring standards); so, how 'bout it guys and > gals: should deprecation be noted in docstrings and if so, where > and how? > > DG > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Users should first check that numpy does not have the functionality that a user needs. Duplicated functionality between numpy and scipy is or was a main reason for depreciation. There are or were cases where numpy is different than scipy but I think these are being corrected as when these are found. Many of the warnings predate the numpy and scipy documentation marathon efforts and some depreciations may still be in tickets so it is very doubtful that an automated system will detect either of these cases anyhow. In the doc marathon someone will have to find these cases and deal with them appropriately - noting, as the person who created the ticket, that some of the scipy.stats should be gone in the tentative scipy 0.9 release. In the future, someone will have to come up with a rule to force documentation change when a depreciation event occurs and then enforce it. In fact, for numpy (as scipy does not yet have the same policy) the desired documentation changes should be added to: http://projects.scipy.org/numpy/wiki/ApiDeprecation Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu Jun 3 10:10:25 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 3 Jun 2010 08:10:25 -0600 Subject: [SciPy-Dev] {True, False} should be replaced w/ bool, correct? In-Reply-To: References: Message-ID: On Thu, Jun 3, 2010 at 4:40 AM, Ralf Gommers wrote: > > > On Thu, Jun 3, 2010 at 1:04 PM, David Goldsmith > wrote: >> >> Just checking; see, e.g., scipy.io.matlab.mio.savemat appendmat >> parameter.? (Or is it possible that the function really needs to see either >> the word True or the word False?) 
> > Correct, {True, False} should always be changed to bool in the docs. I didn't see how the "defualt" should be noted on bool options. I think in most cases it should be clear but it might be nice is it was explicit. Vincent > > Cheers, > Ralf > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From vincent at vincentdavis.net Thu Jun 3 10:14:55 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 3 Jun 2010 08:14:55 -0600 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: <4C07B72E.2050504@gmail.com> References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 8:07 AM, Bruce Southey wrote: > On 06/02/2010 10:28 PM, Benjamin Root wrote: > > As a power user of these tools, I often will encounter these warnings while > bulding my code piece-wise, however, I can easily imagine a case where a > regular user simply seeing a useful feature and spending time coding around > it, only to discover that it will soon be deprecated.? I would certainly be > annoyed in such a case. > > A quick and easy way to list deprecations would be towards the end of the > docstring, but the user might not scroll all the way down past the feature > that they found.? So, to raise visibility, such deprecation warnings should > be towards the beginning of the docstring. > > Just a thought... is it feasible for the doc building system to scan through > the function code and spot a deprecation warning and thereby be able to add > a list of deprecation warnings to the docstring?? Obviously, such warnings > would have to follow some standard format, but it would be neat if such > things could be automated. > > Just my 2 cents, > Ben Root > > On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith > wrote: >> >> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis >> wrote: >>> >>> For example scipy.stats.stats.cov when you view source has >>> "scipy.stats.cov is deprecated; please update your code to use >>> numpy.cov." Should this be in the docs ? and is there an example of >>> how this should be pointed out. >>> This is something I actually implemented in a program then discovered >>> that is was deprecated. I would have like that to be in the online >>> docs. >>> >>> Thanks >>> Vincent >> >> I vaguely recollect this being discussed before, but I can't find anything >> about it in our docstring Standard, in our Q+A section, nor (easily) at the >> Python site (generally, when in doubt, we default to Python docstring >> standards); so, how 'bout it guys and gals: should deprecation be noted in >> docstrings and if so, where and how? >> >> DG >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > Users should first check that numpy does not have the functionality that a > user needs. This is news to me, My point is that unless this is a very clearly and obviously presented in scipy it is an assumption only you know about :) Vincent Duplicated functionality between numpy and scipy is or was a > main reason for depreciation. There are or were cases where numpy is > different than scipy but I think these are being corrected as when these are > found. 
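(A small user-side aside while the documentation side gets sorted out: deprecations like the scipy.stats.cov case mentioned earlier can at least be made impossible to miss during development by turning the warnings into errors. This is plain standard-library usage, nothing scipy-specific:)

import warnings

# fail loudly on any DeprecationWarning while developing or running tests,
# so a deprecated function is noticed before code gets built around it
warnings.simplefilter('error', DeprecationWarning)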
> > Many of the warnings predate the numpy and scipy documentation marathon > efforts and some depreciations may still be in tickets so it is very > doubtful that an automated system will detect either of these cases anyhow. > In the doc marathon someone will have to find these cases and deal with them > appropriately - noting, as the person who created the ticket, that some of > the scipy.stats should be gone in the tentative scipy 0.9 release. > > In the future, someone will have to come up with a rule to force > documentation change when a depreciation event occurs and then enforce it. > In fact, for numpy (as scipy does not yet have the same policy) the desired > documentation changes should be added to: > http://projects.scipy.org/numpy/wiki/ApiDeprecation > > > Bruce > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From josef.pktd at gmail.com Thu Jun 3 10:15:32 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 10:15:32 -0400 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: <4C07B72E.2050504@gmail.com> References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 10:07 AM, Bruce Southey wrote: > On 06/02/2010 10:28 PM, Benjamin Root wrote: > > As a power user of these tools, I often will encounter these warnings while > bulding my code piece-wise, however, I can easily imagine a case where a > regular user simply seeing a useful feature and spending time coding around > it, only to discover that it will soon be deprecated.? I would certainly be > annoyed in such a case. > > A quick and easy way to list deprecations would be towards the end of the > docstring, but the user might not scroll all the way down past the feature > that they found.? So, to raise visibility, such deprecation warnings should > be towards the beginning of the docstring. > > Just a thought... is it feasible for the doc building system to scan through > the function code and spot a deprecation warning and thereby be able to add > a list of deprecation warnings to the docstring?? Obviously, such warnings > would have to follow some standard format, but it would be neat if such > things could be automated. > > Just my 2 cents, > Ben Root > > On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith > wrote: >> >> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis >> wrote: >>> >>> For example scipy.stats.stats.cov when you view source has >>> "scipy.stats.cov is deprecated; please update your code to use >>> numpy.cov." Should this be in the docs ? and is there an example of >>> how this should be pointed out. >>> This is something I actually implemented in a program then discovered >>> that is was deprecated. I would have like that to be in the online >>> docs. >>> >>> Thanks >>> Vincent >> >> I vaguely recollect this being discussed before, but I can't find anything >> about it in our docstring Standard, in our Q+A section, nor (easily) at the >> Python site (generally, when in doubt, we default to Python docstring >> standards); so, how 'bout it guys and gals: should deprecation be noted in >> docstrings and if so, where and how? 
>> >> DG >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > Users should first check that numpy does not have the functionality that a > user needs. Duplicated functionality between numpy and scipy is or was a > main reason for depreciation. There are or were cases where numpy is > different than scipy but I think these are being corrected as when these are > found. > > Many of the warnings predate the numpy and scipy documentation marathon > efforts and some depreciations may still be in tickets so it is very > doubtful that an automated system will detect either of these cases anyhow. > In the doc marathon someone will have to find these cases and deal with them > appropriately - noting, as the person who created the ticket, that some of > the scipy.stats should be gone in the tentative scipy 0.9 release. > > In the future, someone will have to come up with a rule to force > documentation change when a depreciation event occurs and then enforce it. > In fact, for numpy (as scipy does not yet have the same policy) the desired > documentation changes should be added to: > http://projects.scipy.org/numpy/wiki/ApiDeprecation I have never seen any guidelines or rules to add Deprecation Warnings into the docstrings. It would be good to define a standard for the docstrings first. For scipy.stats, I just copied recently the deprecation warnings to the notes section, because the notes section does not have rules for it's content. Josef > > Bruce > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From warren.weckesser at enthought.com Thu Jun 3 10:18:02 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 03 Jun 2010 09:18:02 -0500 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: <4C07B99A.8080101@enthought.com> josef.pktd at gmail.com wrote: > On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser > wrote: > >> stats.glm looks like it was started and then abandoned without being >> finished. It was last touched in November 2007. Should this function >> be deprecated so it can eventually be removed? >> > > My thoughts when I looked at it was roughly: > leave it alone since it's working, but don't "advertise" it because we > should get a better replacement. > How does one not advertise it? The docstring is wrong, incomplete, and not useful. It has no tests. Currently, it appears that it just duplicates ttest_ind. As far as I know, no one is working on it. Leaving it in wastes users' time reading about it. It erodes confidence in other functions in scipy: "Is foo() a good function, or has it been abandoned, like glm()?" To me, it is an ideal candidate for removal. Warren > similar to linregress the more general version will be available when > scipy.stats gets the full OLS model. > > >>>> x = (np.arange(20)>9).astype(int) >>>> y = x + np.random.randn(20) >>>> stats.glm(y,x) >>>> > (-1.7684287512254859, 0.093933208147769023) > >>>> stats.ttest_ind(y[:10], y[10:]) >>>> > (-1.7684287512254859, 0.093933208147768926) > > In the current form it doesn't do much different than ttest_ind except > for different argument structure. 
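Spelled out, the two-group case above is essentially the following; two_group_ttest is only meant to illustrate the equivalence, not a proposed replacement for glm, and np.unique handles string labels directly:

import numpy as np
from scipy import stats

def two_group_ttest(data, groups):
    # split the data by group label (integers or strings both work) and
    # delegate to ttest_ind, which is what stats.glm reduces to here
    data = np.asarray(data)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    assert len(levels) == 2
    return stats.ttest_ind(data[groups == levels[0]], data[groups == levels[1]])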
> > I think it could be made to work on string labels if _support.unique > is replaced by np.unique (which we are doing in statsmodels) > > >>>> x = (np.arange(20)>9).astype(str) >>>> x >>>> > array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T', > 'T', 'T', 'T', 'T', 'T', 'T', 'T'], > dtype='|S1') > >>>> stats.glm(y,x) >>>> > Traceback (most recent call last): > File "", line 1, in > stats.glm(y,x) > File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", > line 3315, in glm > p = _support.unique(para) > File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py", > line 45, in unique > if np.add.reduce(np.equal(uniques,item).flat) == 0: > AttributeError: 'NotImplementedType' object has no attribute 'flat' > > Josef > > >> Warren >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ralf.gommers at googlemail.com Thu Jun 3 10:19:52 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 3 Jun 2010 22:19:52 +0800 Subject: [SciPy-Dev] [SciPy-User] log pdf, cdf, etc In-Reply-To: References: <6D1C6011-7B0A-45DB-9B54-6CAE1FA38F71@enthought.com> <12883887-E601-467B-9C56-55BDA8169C19@enthought.com> Message-ID: On Wed, Jun 2, 2010 at 7:25 AM, Travis Oliphant wrote: > > On Jun 1, 2010, at 8:19 AM, Ralf Gommers wrote: > > In summary, I see quite a few reasons why this shouldn't go in and don't > see a compelling reason to release it right now. The 0.9 release is > (tentatively) planned for September, so you don't have to worry that your > changes sit in trunk unreleased for 1.5 years. > > > As the one doing the work of release manager, you have a lot of latitude in > making this decision, of course. The compelling reason to release it > right now is to get the improved features which nobody has actually voiced > specific concerns about. > > Travis, I just removed the code from 0.8.x. It's still in trunk, and with the tests and docs you added for me that is fine. With a few months to shake out possible bugs and agree on the API it will be a very useful improvement for 0.9. > Suggestions about how to give gamma.fit and beta.fit the docstring of it's > parent would be appreciated. > How about (I didn't test this): self.fit.__doc__ = rv_continuous.fit.__doc__ Best regards, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jun 3 10:26:00 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 3 Jun 2010 22:26:00 +0800 Subject: [SciPy-Dev] {True, False} should be replaced w/ bool, correct? In-Reply-To: References: Message-ID: On Thu, Jun 3, 2010 at 10:10 PM, Vincent Davis wrote: > On Thu, Jun 3, 2010 at 4:40 AM, Ralf Gommers > wrote: > > > > > > On Thu, Jun 3, 2010 at 1:04 PM, David Goldsmith > > > wrote: > >> > >> Just checking; see, e.g., scipy.io.matlab.mio.savemat appendmat > >> parameter. (Or is it possible that the function really needs to see > either > >> the word True or the word False?) > > > > Correct, {True, False} should always be changed to bool in the docs. > > I didn't see how the "defualt" should be noted on bool options. 
I > think in most cases it should be clear but it might be nice is it was > explicit. > > In the description of the parameter, for example: cap : bool, optional Whether to return this string in capital letters. Default is True. Noting defaults should be done not only for bool args, but for everything that has a default. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jun 3 10:43:02 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 3 Jun 2010 22:43:02 +0800 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 10:15 PM, wrote: > On Thu, Jun 3, 2010 at 10:07 AM, Bruce Southey wrote: > > On 06/02/2010 10:28 PM, Benjamin Root wrote: > > > > > Just a thought... is it feasible for the doc building system to scan > through > > the function code and spot a deprecation warning and thereby be able to > add > > a list of deprecation warnings to the docstring? Obviously, such > warnings > > would have to follow some standard format, but it would be neat if such > > things could be automated. > There's enough docstring manipulation going on already I think, this is not that much work so manual would be better. It should be put in at the moment the deprecation takes place. > > In the future, someone will have to come up with a rule to force > documentation change when a depreciation event occurs and then enforce it. > In fact, for numpy (as scipy does not yet have the same policy) the desired > documentation changes should be added to: > http://projects.scipy.org/numpy/wiki/ApiDeprecation I have never seen any guidelines or rules to add Deprecation Warnings > into the docstrings. It would be good to define a standard for the > docstrings first. It should be made as visible as possible in my opinion. A reST warning in between summary and extended summary would work. It should clearly state in which version it will be removed. Best to keep the text identical to the one passed to the deprecate decorator. A reason or alternative should be given as well. .. warning:: `myfunc` is deprecated and will be removed in SciPy 0.9. Look at `thatfunc` for equivalent functionality. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 3 10:49:21 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 10:49:21 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: <4C07B99A.8080101@enthought.com> References: <4C07A52D.30503@enthought.com> <4C07B99A.8080101@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 10:18 AM, Warren Weckesser wrote: > josef.pktd at gmail.com wrote: >> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >> wrote: >> >>> stats.glm looks like it was started and then abandoned without being >>> finished. ?It was last touched in November 2007. ?Should this function >>> be deprecated so it can eventually be removed? >>> >> >> My thoughts when I looked at it was roughly: >> leave it alone since it's working, but don't "advertise" it because we >> should get a better replacement. >> > > How does one not advertise it? > > The docstring is wrong, incomplete, and not useful. That's it's not advertised > It has no tests. It has no tests (except for examples on my computer), but the results (for the basic case that I looked at) are correct. 
If we increase test coverage or start removing functions that don't have tests yet, I would work on box-cox, and several other functions in morestats.py . Mainly a question of priorities. > Currently, it appears that it just duplicates ttest_ind. ?As far as I > know, no one is working on it. > > Leaving it in wastes users' time reading about it. ?It erodes confidence > in other functions in scipy: ?"Is foo() a good function, or has it been > abandoned, like glm()?" > > To me, it is an ideal candidate for removal. If we apply strict criteria along those lines, we can reduce the size of scipy.stats.stats and scipy.stats.morestats, I guess, by at least a third. (Which I would do if I could start from scratch). A big fraction of functions in scipy.stats are in the category "no one is working on it". For glm specifically, I don't see any big cost of leaving it in, nor for deprecating it, and then I usually stick to the status-quo. But you can as well deprecate it, and point to ttest_ind. And for "bigger fish" like pdfmoments and pdf_approx, I never received a reply or opinion on the mailing list. statsmodels will have (or better, has in the sandbox) a generalization for glm, that works for any number of groups and includes both t_test and f_test. Josef > > Warren > >> similar to linregress the more general version will be available when >> scipy.stats gets the full OLS model. >> >> >>>>> x = (np.arange(20)>9).astype(int) >>>>> y = x + np.random.randn(20) >>>>> stats.glm(y,x) >>>>> >> (-1.7684287512254859, 0.093933208147769023) >> >>>>> stats.ttest_ind(y[:10], y[10:]) >>>>> >> (-1.7684287512254859, 0.093933208147768926) >> >> In the current form it doesn't do much different than ttest_ind except >> for different argument structure. >> >> I think it could be made to work on string labels if _support.unique >> is replaced by np.unique (which we are doing in statsmodels) >> >> >>>>> x = (np.arange(20)>9).astype(str) >>>>> x >>>>> >> array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T', >> ? ? ? ?'T', 'T', 'T', 'T', 'T', 'T', 'T'], >> ? ? ? dtype='|S1') >> >>>>> stats.glm(y,x) >>>>> >> Traceback (most recent call last): >> ? File "", line 1, in >> ? ? stats.glm(y,x) >> ? File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", >> line 3315, in glm >> ? ? p = _support.unique(para) >> ? File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py", >> line 45, in unique >> ? ? if np.add.reduce(np.equal(uniques,item).flat) == 0: >> AttributeError: 'NotImplementedType' object has no attribute 'flat' >> >> Josef >> >> >>> Warren >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ben.root at ou.edu Thu Jun 3 10:49:53 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 3 Jun 2010 09:49:53 -0500 Subject: [SciPy-Dev] Warning of deprecation in doc's ? 
In-Reply-To: <4C07B72E.2050504@gmail.com> References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 9:07 AM, Bruce Southey wrote: > On 06/02/2010 10:28 PM, Benjamin Root wrote: > > As a power user of these tools, I often will encounter these warnings while > bulding my code piece-wise, however, I can easily imagine a case where a > regular user simply seeing a useful feature and spending time coding around > it, only to discover that it will soon be deprecated. I would certainly be > annoyed in such a case. > > A quick and easy way to list deprecations would be towards the end of the > docstring, but the user might not scroll all the way down past the feature > that they found. So, to raise visibility, such deprecation warnings should > be towards the beginning of the docstring. > > Just a thought... is it feasible for the doc building system to scan > through the function code and spot a deprecation warning and thereby be able > to add a list of deprecation warnings to the docstring? Obviously, such > warnings would have to follow some standard format, but it would be neat if > such things could be automated. > > Just my 2 cents, > Ben Root > > On Wed, Jun 2, 2010 at 10:07 PM, David Goldsmith wrote: > >> On Wed, Jun 2, 2010 at 7:22 PM, Vincent Davis wrote: >> >>> For example scipy.stats.stats.cov when you view source has >>> "scipy.stats.cov is deprecated; please update your code to use >>> numpy.cov." Should this be in the docs ? and is there an example of >>> how this should be pointed out. >>> This is something I actually implemented in a program then discovered >>> that is was deprecated. I would have like that to be in the online >>> docs. >>> >>> Thanks >>> Vincent >>> >> >> I vaguely recollect this being discussed before, but I can't find anything >> about it in our docstring Standard, in our Q+A section, nor (easily) at the >> Python site (generally, when in doubt, we default to Python docstring >> standards); so, how 'bout it guys and gals: should deprecation be noted in >> docstrings and if so, where and how? >> >> DG >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing listSciPy-Dev at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-dev > > Users should first check that numpy does not have the functionality that a > user needs. Duplicated functionality between numpy and scipy is or was a > main reason for depreciation. There are or were cases where numpy is > different than scipy but I think these are being corrected as when these are > found. > > I don't think that is a reasonable assumption to make for someone just learning how to use these packages. When I started using these packages myself about a year and a half ago, I remember not understanding the difference between scipy and numpy (and pylab... and matplotlib...) because they presented many of the same functions to me. At the time, I figured that I really was calling the same functions, just merely wrapped around the other, or something like that. It was quite confusing. A time evolution of my scripts would probably reveal some interesting insights into how my understanding of scipy/numpy changed. 
My point is that because there is so much shared functionality to the newbie user, that they will tend to treat scipy and numpy as synonymous, and the thought to check numpy's documentation will never even enter their minds. Therefore, one should be careful to note in deprecation warnings that a particular function is being deprecated because the functionality belongs in another package. That should raise awareness of the roles of the packages. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 3 10:52:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 10:52:24 -0400 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 10:43 AM, Ralf Gommers wrote: > > > On Thu, Jun 3, 2010 at 10:15 PM, wrote: >> >> On Thu, Jun 3, 2010 at 10:07 AM, Bruce Southey wrote: >> > On 06/02/2010 10:28 PM, Benjamin Root wrote: >> >> > >> > Just a thought... is it feasible for the doc building system to scan >> > through >> > the function code and spot a deprecation warning and thereby be able to >> > add >> > a list of deprecation warnings to the docstring?? Obviously, such >> > warnings >> > would have to follow some standard format, but it would be neat if such >> > things could be automated. > > There's enough docstring manipulation going on already I think, this is not > that much work so manual would be better. It should be put in at the moment > the deprecation takes place. > >> >> In the future, someone will have to come up with a rule to force >> documentation change when a depreciation event occurs and then enforce it. >> In fact, for numpy (as scipy does not yet have the same policy) the >> desired >> documentation changes should be added to: >> http://projects.scipy.org/numpy/wiki/ApiDeprecation > >> I have never seen any guidelines or rules to add Deprecation Warnings >> into the docstrings. It would be good to define a standard for the >> docstrings first. > > It should be made as visible as possible in my opinion. A reST warning in > between summary and extended summary would work. It should clearly state in > which version it will be removed. Best to keep the text identical to the one > passed to the deprecate decorator. A reason or alternative should be given > as well. > > .. warning:: > ??? `myfunc` is deprecated and will be removed in SciPy 0.9. Look at > `thatfunc` for equivalent functionality. Sounds good to me, Does Sphinx and the webeditor accept warnings at that location? Josef > > Cheers, > Ralf > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From pav at iki.fi Thu Jun 3 10:52:48 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 3 Jun 2010 14:52:48 +0000 (UTC) Subject: [SciPy-Dev] Warning of deprecation in doc's ? References: <4C07B72E.2050504@gmail.com> Message-ID: Thu, 03 Jun 2010 22:43:02 +0800, Ralf Gommers wrote: [clip] > It should be made as visible as possible in my opinion. A reST warning > in between summary and extended summary would work. It should clearly > state in which version it will be removed. Best to keep the text > identical to the one passed to the deprecate decorator. A reason or > alternative should be given as well. > > .. warning:: > `myfunc` is deprecated and will be removed in SciPy 0.9. Look at > `thatfunc` for equivalent functionality. 
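For concreteness, a sketch of how the proposal quoted above might look on a function (the names `myfunc` and `thatfunc` are placeholders, as in the example): the docstring note carries the same text as the runtime DeprecationWarning.

    import warnings

    def myfunc(x):
        """
        Do something with `x`.

        .. warning::
            `myfunc` is deprecated and will be removed in SciPy 0.9.
            Look at `thatfunc` for equivalent functionality.
        """
        warnings.warn("myfunc is deprecated and will be removed in SciPy 0.9; "
                      "use thatfunc instead", DeprecationWarning)
        return x
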
Sphinx probably has a special format for deprecations. Best to use that, I believe. -- Pauli Virtanen From ralf.gommers at googlemail.com Thu Jun 3 11:01:17 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 3 Jun 2010 23:01:17 +0800 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 10:52 PM, Pauli Virtanen wrote: > Thu, 03 Jun 2010 22:43:02 +0800, Ralf Gommers wrote: > [clip] > > It should be made as visible as possible in my opinion. A reST warning > > in between summary and extended summary would work. It should clearly > > state in which version it will be removed. Best to keep the text > > identical to the one passed to the deprecate decorator. A reason or > > alternative should be given as well. > > > > .. warning:: > > `myfunc` is deprecated and will be removed in SciPy 0.9. Look at > > `thatfunc` for equivalent functionality. > > Sphinx probably has a special format for deprecations. Best to use that, > I believe. > > Good point. In the Sphinx 0.6.6 docs I can't find it, but it seems there is indeed a ".. deprecated::" directive, https://bitbucket.org/birkenfeld/sphinx/issue/92/deprecated-options-not-working-in Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Jun 3 11:05:45 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 03 Jun 2010 10:05:45 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> Message-ID: <4C07C4C9.6050104@gmail.com> On 06/03/2010 01:48 AM, josef.pktd at gmail.com wrote: > On Wed, Jun 2, 2010 at 4:03 PM, Bruce Southey wrote: > >> On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: >> >> On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell >> wrote: >> >> >> On 2010-06-02 13:10 , Bruce Southey wrote: >> [...] >> >> >> >> However, this code is the chi-squared test part as SAS will compute the >> actual cell numbers. Also an extension to scipy.stats.chisquare() so we >> can not have both functions. >> >> >> Again, I don't understand what you mean that we can't have both >> functions? I believe (from a statistics teacher's point of view) that >> the Chi-Squared goodness of fit test (which is stats.chisquare) is a >> different beast from the Chi-Square test for independence (which is >> stats.chisquare_contingency). The fact that the distribution of the >> test statistic is the same should not tempt us to put them into the >> same function. >> >> >> Please read scipy.stats.chisquare() because scipy.stats.chisquare() is >> the 1-d case of yours. >> Quote from the docstring: >> " The chi square test tests the null hypothesis that the categorical data >> has the given frequencies." >> Also go the web site provided in the docstring. >> >> By default you get the expected frequencies but you can also put in your >> own using the f_exp variable. You could do the same in your code. >> >> >> In fact, Warren correctly used stats.chisquare with the expected >> frequencies calculated from the null hypothesis and the corrected >> degrees of freedom. chisquare_contingency is in some sense a >> convenience method for taking care of these pre-calculations before >> calling stats.chisquare. 
Can you explain more clearly to me why we >> should not include such a convenience function? >> >> >> Just a clarification, before I find time to work my way through the >> other comments >> >> stats.chisquare is a generic test for goodness-of-fit for discreted or >> binned distributions. >> and from the docstring of it >> "If no expected frequencies are given, the total >> N is assumed to be equally distributed across all groups." >> >> default is uniform distribution >> >> >> >> Try: >> http://en.wikipedia.org/wiki/Pearson's_chi-square_test >> >> The use of the uniform distribution is rather misleading and technically >> wrong as it does not help address the expected number of outcomes in a cell: >> > quote from the wikipedia page: > "A simple example is the hypothesis that an ordinary six-sided dice is > "fair", i.e., all six outcomes are equally likely to occur." > > I don't see anything misleading or technically wrong with the uniform > distributions, > or if they come from a Poisson, Hypergeometric, binned Normal or any > of number of other distributions. > Okay this must be only for the 1-way table as it does not apply to the 2-way or higher tables where the test is for independence between variables. There are valid technical reasons why it is misleading because saying that a random variable comes from some distribution has immutable meaning. Obviously if a random variable comes from the discrete uniform distribution then that random variable also must have a mean (N+1)/2, variance (N+1)*(N-1)/12 etc. There is nothing provided about the moments of the random variable provided under the null hypothesis so you can not say what distribution that a random variable is from. For example, the random variable could be from a beta-binomial distribution (as when alpha=beta=1 this is the discrete uniform) or binomial/multinomial with equal probabilities such that the statement 'all [the] outcomes are equally likely to occur' remains true. If you assume that your random variables are discrete uniform or any other distribution (except normal) then in general you can not assume that the Pearson's chi-squared test statistic has a specific distribution. However, in this case the Pearson's chi-squared test statistic is asymptotically chi-squared because of the normality assumption. So provided the central limit theorem is valid (not necessarily true for all distributions and for 'small' sample sizes) then this test will be asymptotically valid regardless of the assumption of the random variables in this case. >> http://en.wikipedia.org/wiki/Discrete_uniform_distribution >> >> >> chisquare_twoway is a special case that additional calculates the >> correct expected frequencies for the test of independencs based on the >> margin totals. The resulting distribution is not uniform. >> >> >> Actually the null hypothesis is rather different between 1-way and 2-way >> tables so you can not say that chisquare_twoway is a special case of >> chisquare. >> > What is the Null hypothesis in a one-way table? > > Josef > > SAS definition for 1-way table: "the null hypothesis specifies equal proportions of the total sample size for each class". This is not the same as saying a discrete uniform distribution as you are not directly testing that the cells have equal probability. But the ultimate outcome is probably not any different. Bruce >> I am not sure what you mean by the 'resulting distribution is not uniform'. 
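To make the one-way case concrete (counts made up, not from the thread): with no f_exp, stats.chisquare tests exactly the "equal proportions per class" null discussed here, and the same test can be written with the expected frequencies spelled out.

    import numpy as np
    from scipy import stats

    observed = np.array([18, 22, 16, 25, 20, 19])   # counts for the six faces
    chi2, p = stats.chisquare(observed)             # expected: equal frequencies
    # identical test with explicit expected frequencies
    expected = np.ones(6) * observed.sum() / 6.0
    chi2, p = stats.chisquare(observed, f_exp=expected)
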
>> The distribution of the cells values has nothing to do with the uniform >> distribution in either case because it is not used in the data nor in the >> formulation of the test. (And, yes, I have had to do the proof that the test >> statistic is Chi-squared - which is why there is the warning about small >> cells...). >> >> I agree with Neil that this is a very useful convenience function. >> >> >> My problem with the chisquare_twoway is that it should not call another >> function to finish two lines of code. It is just an excessive waste of >> resources. >> >> I never heard of a one-way contingency table, my question was whether >> the function should also handle 3-way or 4-way tables, additional to >> two-way. >> >> >> Correct to both of these as I just consider these as n-way tables. I think >> that contingency tables by definition only applies to the 2-d case. Pivot >> tables are essentially the same thing. I would have to lookup on how to get >> the expected number of outcomes but probably of the form Ni.. * N.j. >> *N..k/N... for the 3-way (the 2-way table is of the form Ni.*N.j/N..) for >> i=rows, j=columns, k=3rd axis and '.' means sum for that axis. >> >> I thought about the question how the input should be specified for my >> initial response, the alternative would be to use the original data or >> a "long" format instead of a table. But I thought that as a >> convenience function using the table format will be the most common >> use. >> >> I have written in the past functions that calculate the contingency >> table, and would be very useful to have a more complete coverage of >> tools to work with contingency tables in scipy.stats (or temporarily >> in statsmodels, where we are working also on the anova type of >> analysis) >> >> >> It depends on what tasks are needed. Really there are two steps: >> 1) Cross-tabulation that summarized the data from whatever input (groupby >> would help here). >> 2) Statistical tests - series of functions that accept summarized data only. >> >> If you have separate functions then the burden is on the user to find and >> call all the desired functions. You can also provide a single helper >> function to do all that because you don't want to repeat unnecessary calls. >> >> So, I think the way it is it is a nice function and we don't have to >> put all contingency table analysis into this function. >> >> Josef >> >> >> Bruce >> >> >> >> >> >> Really this should be combined with fisher.py in ticket 956: >> http://projects.scipy.org/scipy/ticket/956 >> >> >> Wow, apparently I have lots of disagreements today, but I don't think >> that this should be combined with Fisher's Exact test. (I would like >> to see that ticket mature to the point where it can be added to >> scipy.stats.) I like the functions in scipy.stats to correspond in a >> one-to-one manner with the statistical tests. I think that the docs >> should "See Also" the appropriate exact (and non-parametric) tests, >> but I think that one function/one test is a good rule. This is >> particularly true for people (like me) who would like to someday be >> able to use scipy.stats in a pedagogical context. >> >> -Neil >> >> >> I don't see any 'disagreements' rather just different ways to do things >> and identifying areas that need to be addressed for more general use. >> >> >> Agreed. :) >> >> [...] 
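A sketch of the margin-based expected counts described above (Ni.*N.j/N.. for the two-way case) and the resulting chi-square statistic with (R-1)*(C-1) degrees of freedom; the table values here are made up.

    import numpy as np
    from scipy import stats

    observed = np.array([[10., 20., 30.],
                         [ 6., 15.,  9.]])
    n = observed.sum()
    # outer product of row and column totals, divided by the grand total
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    pvalue = stats.chi2.sf(chi2, dof)
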
>> >> -Neil >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Jun 3 11:06:40 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 3 Jun 2010 11:06:40 -0400 Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? In-Reply-To: References: <4C066DA3.8010609@gmail.com> Message-ID: On Wed, Jun 2, 2010 at 1:14 PM, Stefan wrote: > >> Not that I am complaining rather trying to understand what is expected >> to happen. >> Under the patch, it is very much user beware.? The header argument can >> be anything or nothing. There is no check for the contents or if the >> delimiter used is the same as the rest of the output. Further with the >> newline option there is no guarantee that the lines in the header will >> have the same line endings throughout the file. >> So what should a user be allowed to use as a header? >> You could write a whole program there or an explanation of the >> following output - which is very appealing. You could force a list of >> strings so that you print out newline.join(header) - okay not quite >> because it should include the comment argument. >> Should savetxt be restricted to something that loadtxt can read? >> This is potentially problematic if you want a header line. Although it >> could return the number of header lines. >> [savetxt should also be updated to allow bz2 as loadtxt handles those >> now - not that I have used it] >> >> >> >> >> Also note that since that patch was written, savetxt takes a user >> supplied newline keyword, so you can just append that to the header >> string. >> >> >> >> ? True, we were not aware of this, but this does not help much for the >> comment/header. >> >> >> >> Entered as ~3 months ago:http://projects.scipy.org/numpy/changeset/8180 >> Should this be forced to check for valid options for new lines? >> Otherwise you from this? 'np.savetxt('junk.text', [1,2,3,4,5], >> newline='what')' you get: >> > 1.000000000000000000e+00what2.000000000000000000e+00what > 3.000000000000000000e+00what4.000000000000000000e+00 > what5.000000000000000000e+00what >> Which is not going to be read back by loadtxt. >> >> >> >> As numpy.loadtxt has a default comment character ('#'), the same may be >> implemented for numpy.savetxt. In this case, numpy.savetxt would get two >> additional keywords (e.g. header, comment(character)), which bloats the >> interface, but potentially provides more safety. >> >> >> >> >> FWIW, I ended up rolling my own using the most recent pre-Python 3 >> changes for savetxt that accepts a list of names instead of one string >> or if the provided array has the attribute dtype.names (non-nested rec >> or structured arrays) it uses those. ?Whatever is done I think the >> support for structured arrays is nice, and I think having this >> functionality is a no-brainer. ?I need it quite often. >> >> >> >> ? 
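A rough sketch of the "roll your own" header approach mentioned above (file name and column names are made up): write a commented header line first, then hand the open file object to np.savetxt; np.loadtxt skips the '#' line by default, so the file stays readable without skiprows.

    import numpy as np

    data = np.column_stack((np.arange(5.0), np.arange(5.0) ** 2))
    names = ['x', 'x_squared']

    fh = open('example.txt', 'w')
    fh.write('# ' + ' '.join(names) + '\n')   # header written as a comment line
    np.savetxt(fh, data)
    fh.close()

    back = np.loadtxt('example.txt')          # the header line is ignored
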
Although, we have not been using record arrays too often, we see their >> advantages and agree that it should be possible to use them as you described >> it. >> We also thought about a solution, using the __str__ method for the 'header >> object'. In this vain, an arbitrary header class (including a plane string) >> providing an __str__ member may be handed to numpy.savetxt, >> which can use it to write the header. >> > > > So let us briefly summarize whats on the table. It appears to us that > there are basically three open issues: > (1) a csv like header for savetxt written files (first line contains column > ? ?names) > (2) comments (introduced by comment character e.g. '#') at the beginning > ? ?of the file (preceding the data) > (3) the role of the 'newline' option > > As was noted, the patch (ticket 1079) enables both to write a csv like > header (1) and comment line(s) introduced by a comment character (e.g. '#'). > Nonetheless, this solution is quite unsatisfactory > in our opinion, because it may be error prone, > as the user is in charge of the entire formatting. Despite this, we think > that it should be up to the user what amount of information is to be put > at the top of the file, but the format should be checked as far as possible. > > Using either a string or a list/tuple of strings, as proposed by Bruce, > seems to be a reasonable possibility to implement the desired functionality. > Maybe two individual keywords ('header' and 'comment') should exist to > distinguish whether the the user requests case (1) or (2). As for loadtxt > the default comment character should be '#', but it may be changed by the > user. > > We think that savetxt should not be restricted to output, which can be read > by loadtxt. Although it should be possible to add commments to the output > file, so that it remains readable by loadtxt (without tweaking it > e.g. with the skiprows keyword). > Thanks. This does clear up my confusion and I think having both a header and a comments keyword makes sense. For the form, as I said, I went with a list of strings, as I encounter this more often than one string, but in the end it's all the same to me. Glad this is getting some attention. > We agree that the newline keyword may cause inconsistencies in the file > (if ticket 1079 were applied), > and possibly strange behavior such as when newline='what' is specified. > Yet, this question does not only concern the header/comments. > > Stefan & Christian > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Thu Jun 3 11:03:34 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 11:03:34 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> <4C07B99A.8080101@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 10:49 AM, wrote: > On Thu, Jun 3, 2010 at 10:18 AM, Warren Weckesser > wrote: >> josef.pktd at gmail.com wrote: >>> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >>> wrote: >>> >>>> stats.glm looks like it was started and then abandoned without being >>>> finished. ?It was last touched in November 2007. ?Should this function >>>> be deprecated so it can eventually be removed? >>>> >>> >>> My thoughts when I looked at it was roughly: >>> leave it alone since it's working, but don't "advertise" it because we >>> should get a better replacement. >>> >> >> How does one not advertise it? 
>> >> The docstring is wrong, incomplete, and not useful. > > That's it's not advertised > >> It has no tests. > > It has no tests (except for examples on my computer), but the results > (for the basic case that I looked at) are correct. > If we increase test coverage or start removing functions that don't > have tests yet, I would work on box-cox, and several other functions > in morestats.py . Mainly a question of priorities. > >> Currently, it appears that it just duplicates ttest_ind. ?As far as I >> know, no one is working on it. >> >> Leaving it in wastes users' time reading about it. ?It erodes confidence >> in other functions in scipy: ?"Is foo() a good function, or has it been >> abandoned, like glm()?" >> >> To me, it is an ideal candidate for removal. > > If we apply strict criteria along those lines, we can reduce the size > of scipy.stats.stats and scipy.stats.morestats, I guess, by at least a > third. (Which I would do if I could start from scratch). > A big fraction of functions in scipy.stats are in the category "no one > is working on it". > > For glm specifically, I don't see any big cost of leaving it in, nor > for deprecating it, and then I usually stick to the status-quo. But > you can as well deprecate it, and point to ttest_ind. > > And for "bigger fish" like pdfmoments and pdf_approx, I never received > a reply or opinion on the mailing list. > > statsmodels will have (or better, has in the sandbox) a generalization > for glm, that works for any number of groups and includes both t_test > and f_test. Actually, now that I have to think about glm again, I'm also in favor of deprecating it, since I can always point to the general version in statsmodels. Josef > > Josef > >> >> Warren >> >>> similar to linregress the more general version will be available when >>> scipy.stats gets the full OLS model. >>> >>> >>>>>> x = (np.arange(20)>9).astype(int) >>>>>> y = x + np.random.randn(20) >>>>>> stats.glm(y,x) >>>>>> >>> (-1.7684287512254859, 0.093933208147769023) >>> >>>>>> stats.ttest_ind(y[:10], y[10:]) >>>>>> >>> (-1.7684287512254859, 0.093933208147768926) >>> >>> In the current form it doesn't do much different than ttest_ind except >>> for different argument structure. >>> >>> I think it could be made to work on string labels if _support.unique >>> is replaced by np.unique (which we are doing in statsmodels) >>> >>> >>>>>> x = (np.arange(20)>9).astype(str) >>>>>> x >>>>>> >>> array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T', >>> ? ? ? ?'T', 'T', 'T', 'T', 'T', 'T', 'T'], >>> ? ? ? dtype='|S1') >>> >>>>>> stats.glm(y,x) >>>>>> >>> Traceback (most recent call last): >>> ? File "", line 1, in >>> ? ? stats.glm(y,x) >>> ? File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", >>> line 3315, in glm >>> ? ? p = _support.unique(para) >>> ? File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py", >>> line 45, in unique >>> ? ? 
if np.add.reduce(np.equal(uniques,item).flat) == 0: >>> AttributeError: 'NotImplementedType' object has no attribute 'flat' >>> >>> Josef >>> >>> >>>> Warren >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > From josef.pktd at gmail.com Thu Jun 3 11:22:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 11:22:42 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C07C4C9.6050104@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07C4C9.6050104@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 11:05 AM, Bruce Southey wrote: > On 06/03/2010 01:48 AM, josef.pktd at gmail.com wrote: > > On Wed, Jun 2, 2010 at 4:03 PM, Bruce Southey wrote: > > > On 06/02/2010 01:41 PM, josef.pktd at gmail.com wrote: > > On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell > wrote: > > > On 2010-06-02 13:10 , Bruce Southey wrote: > [...] > > > > However, this code is the chi-squared test part as SAS will compute the > actual cell numbers. Also an extension to scipy.stats.chisquare() so we > can not have both functions. > > > Again, I don't understand what you mean that we can't have both > functions? I believe (from a statistics teacher's point of view) that > the Chi-Squared goodness of fit test (which is stats.chisquare) is a > different beast from the Chi-Square test for independence (which is > stats.chisquare_contingency). The fact that the distribution of the > test statistic is the same should not tempt us to put them into the > same function. > > > Please read scipy.stats.chisquare() because scipy.stats.chisquare() is > the 1-d case of yours. > Quote from the docstring: > " The chi square test tests the null hypothesis that the categorical data > has the given frequencies." > Also go the web site provided in the docstring. > > By default you get the expected frequencies but you can also put in your > own using the f_exp variable. You could do the same in your code. > > > In fact, Warren correctly used stats.chisquare with the expected > frequencies calculated from the null hypothesis and the corrected > degrees of freedom. ?chisquare_contingency is in some sense a > convenience method for taking care of these pre-calculations before > calling stats.chisquare. ?Can you explain more clearly to me why we > should not include such a convenience function? > > > Just a clarification, before I find time to work my way through the > other comments > > stats.chisquare is a generic test for goodness-of-fit for discreted or > binned distributions. > and from the docstring of it > "If no expected frequencies are given, the total > N is assumed to be equally distributed across all groups." 
> > default is uniform distribution > > > > Try: > http://en.wikipedia.org/wiki/Pearson's_chi-square_test > > The use of the uniform distribution is rather misleading and technically > wrong as it does not help address the expected number of outcomes in a cell: > > > quote from the wikipedia page: > "A simple example is the hypothesis that an ordinary six-sided dice is > "fair", i.e., all six outcomes are equally likely to occur." > > I don't see anything misleading or technically wrong with the uniform > distributions, > or if they come from a Poisson, Hypergeometric, binned Normal or any > of number of other distributions. > > > Okay this must be only for the 1-way table as it does not apply to the 2-way > or higher tables where the test is for independence between variables. I'm talking about a completely different strand of literature, e.g. a commercial program specialized on this http://www.mathwave.com/articles/goodness_of_fit.html#cs And never think of tables when I look at goodness-of-fit tests. I haven't seen yet a case where the asymptotic results for the chisquare test doesn't apply. > > There are valid technical reasons why it is misleading because saying that a > random variable comes from some distribution has immutable meaning. > Obviously if a random variable comes from the discrete uniform distribution > then that random variable also must have a mean (N+1)/2,? variance > (N+1)*(N-1)/12 etc. There is nothing provided about the moments of the > random variable provided under the null hypothesis so you can not say what > distribution that a random variable is from. For example, the random > variable could be from a beta-binomial distribution (as when alpha=beta=1 > this is the discrete uniform) or binomial/multinomial with equal > probabilities such that the statement 'all [the] outcomes are equally likely > to occur' remains true. > > If you assume that your random variables are discrete uniform or any other > distribution (except normal) then in general you can not assume that the > Pearson's chi-squared test statistic has a specific distribution. However, > in this case the Pearson's chi-squared test statistic is asymptotically > chi-squared because of the normality assumption. So provided the central > limit theorem is valid (not necessarily true for all distributions and for > 'small' sample sizes) then this test will be asymptotically valid regardless > of the assumption of the random variables in this case. > > http://en.wikipedia.org/wiki/Discrete_uniform_distribution > > > chisquare_twoway is a special case that additional calculates the > correct expected frequencies for the test of independencs based on the > margin totals. The resulting distribution is not uniform. > > > Actually the null hypothesis is rather different between 1-way and 2-way > tables so you can not say that chisquare_twoway is a special case of > chisquare. > > > What is the Null hypothesis in a one-way table? > > Josef > > > > SAS definition for 1-way table: "the null hypothesis specifies equal > proportions of the total sample size for each class". This is not the same > as saying a discrete uniform distribution as you are not directly testing > that the cells have equal probability. But the ultimate outcome is probably > not any different. Ok, I will have to look at this (when I have time), in my opinion this is inconsistent with the interpretation of a test for independence in a two-way or three-way table. 
Josef > > Bruce > > > I am not sure what you mean by the 'resulting distribution is not uniform'. > The distribution of the cells values has nothing to do with the uniform > distribution in either case because it is not used in the data nor in the > formulation of the test. (And, yes, I have had to do the proof that the test > statistic is Chi-squared - which is why there is the warning about small > cells...). > > I agree with Neil that this is a very useful convenience function. > > > My problem with the chisquare_twoway is that it should not call another > function to finish two lines of code. It is just an excessive waste of > resources. > > I never heard of a one-way contingency table, my question was whether > the function should also handle 3-way or 4-way tables, additional to > two-way. > > > Correct to both of these as I just consider these as n-way tables. I think > that contingency tables by definition only applies to the 2-d case. Pivot > tables are essentially the same thing. I would have to lookup on how to get > the expected number of outcomes but probably of the form Ni.. * N.j. > *N..k/N... for the 3-way (the 2-way table is of the form Ni.*N.j/N..) for > i=rows, j=columns, k=3rd axis and '.' means sum for that axis. > > I thought about the question how the input should be specified for my > initial response, the alternative would be to use the original data or > a "long" format instead of a table. But I thought that as a > convenience function using the table format will be the most common > use. > > I have written in the past functions that calculate the contingency > table, and would be very useful to have a more complete coverage of > tools to work with contingency tables in scipy.stats (or temporarily > in statsmodels, where we are working also on the anova type of > analysis) > > > It depends on what tasks are needed.? Really there are two steps: > 1) Cross-tabulation that summarized the data from whatever input (groupby > would help here). > 2) Statistical tests - series of functions that accept summarized data only. > > If you have separate functions then the burden is on the user to find and > call all the desired functions. You can also provide a single helper > function to do all that because you don't want to repeat unnecessary calls. > > So, I think the way it is it is a nice function and we don't have to > put all contingency table analysis into this function. > > Josef > > > Bruce > > > > > > Really this should be combined with fisher.py in ticket 956: > http://projects.scipy.org/scipy/ticket/956 > > > Wow, apparently I have lots of disagreements today, but I don't think > that this should be combined with Fisher's Exact test. (I would like > to see that ticket mature to the point where it can be added to > scipy.stats.) I like the functions in scipy.stats to correspond in a > one-to-one manner with the statistical tests. I think that the docs > should "See Also" the appropriate exact (and non-parametric) tests, > but I think that one function/one test is a good rule. This is > particularly true for people (like me) who would like to someday be > able to use scipy.stats in a pedagogical context. > > -Neil > > > I don't see any 'disagreements' rather just different ways to do things > and identifying areas that need to be addressed for more general use. > > > Agreed. :) > > [...] 
> > -Neil > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From njs at pobox.com Thu Jun 3 11:32:59 2010 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 3 Jun 2010 08:32:59 -0700 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 6:38 AM, wrote: > On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser > wrote: >> stats.glm looks like it was started and then abandoned without being >> finished. ?It was last touched in November 2007. ?Should this function >> be deprecated so it can eventually be removed? > > My thoughts when I looked at it was roughly: > leave it alone since it's working, but don't "advertise" it because we > should get a better replacement. > similar to linregress the more general version will be available when > scipy.stats gets the full OLS model. Wait, what does 'glm' have to do with OLS (or t-tests) anyway? Surely if anything it *should* be a function that fits, you know, GLMs (generalized linear models)? I guess this is a vote for removing it, because GLMs are one of the fundamental stats models that people will look for, and having some weird, broken, other thing in the obvious place is just confusing and looks really bad. -- Nathaniel From warren.weckesser at enthought.com Thu Jun 3 11:51:42 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 03 Jun 2010 10:51:42 -0500 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> <4C07B99A.8080101@enthought.com> Message-ID: <4C07CF8E.4000906@enthought.com> josef.pktd at gmail.com wrote: > On Thu, Jun 3, 2010 at 10:49 AM, wrote: > >> On Thu, Jun 3, 2010 at 10:18 AM, Warren Weckesser >> wrote: >> >>> josef.pktd at gmail.com wrote: >>> >>>> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >>>> wrote: >>>> >>>> >>>>> stats.glm looks like it was started and then abandoned without being >>>>> finished. It was last touched in November 2007. Should this function >>>>> be deprecated so it can eventually be removed? >>>>> >>>>> >>>> My thoughts when I looked at it was roughly: >>>> leave it alone since it's working, but don't "advertise" it because we >>>> should get a better replacement. >>>> >>>> >>> How does one not advertise it? >>> >>> The docstring is wrong, incomplete, and not useful. >>> >> That's it's not advertised >> >> >>> It has no tests. >>> >> It has no tests (except for examples on my computer), but the results >> (for the basic case that I looked at) are correct. >> If we increase test coverage or start removing functions that don't >> have tests yet, I would work on box-cox, and several other functions >> in morestats.py . Mainly a question of priorities. >> >> >>> Currently, it appears that it just duplicates ttest_ind. 
As far as I >>> know, no one is working on it. >>> >>> Leaving it in wastes users' time reading about it. It erodes confidence >>> in other functions in scipy: "Is foo() a good function, or has it been >>> abandoned, like glm()?" >>> >>> To me, it is an ideal candidate for removal. >>> >> If we apply strict criteria along those lines, we can reduce the size >> of scipy.stats.stats and scipy.stats.morestats, I guess, by at least a >> third. (Which I would do if I could start from scratch). >> A big fraction of functions in scipy.stats are in the category "no one >> is working on it". >> >> For glm specifically, I don't see any big cost of leaving it in, nor >> for deprecating it, and then I usually stick to the status-quo. But >> you can as well deprecate it, and point to ttest_ind. >> >> And for "bigger fish" like pdfmoments and pdf_approx, I never received >> a reply or opinion on the mailing list. >> >> statsmodels will have (or better, has in the sandbox) a generalization >> for glm, that works for any number of groups and includes both t_test >> and f_test. >> > > Actually, now that I have to think about glm again, I'm also in favor > of deprecating it, since I can always point to the general version in > statsmodels. > > Josef > > Heh... meanwhile I'm starting to think that my call for deprecation was premature, and maybe all it really needs is an updated, accurate docstring that explains what the current implementation does. :) Warren > > > >> Josef >> >> >>> Warren >>> >>> >>>> similar to linregress the more general version will be available when >>>> scipy.stats gets the full OLS model. >>>> >>>> >>>> >>>>>>> x = (np.arange(20)>9).astype(int) >>>>>>> y = x + np.random.randn(20) >>>>>>> stats.glm(y,x) >>>>>>> >>>>>>> >>>> (-1.7684287512254859, 0.093933208147769023) >>>> >>>> >>>>>>> stats.ttest_ind(y[:10], y[10:]) >>>>>>> >>>>>>> >>>> (-1.7684287512254859, 0.093933208147768926) >>>> >>>> In the current form it doesn't do much different than ttest_ind except >>>> for different argument structure. 
>>>> >>>> I think it could be made to work on string labels if _support.unique >>>> is replaced by np.unique (which we are doing in statsmodels) >>>> >>>> >>>> >>>>>>> x = (np.arange(20)>9).astype(str) >>>>>>> x >>>>>>> >>>>>>> >>>> array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T', >>>> 'T', 'T', 'T', 'T', 'T', 'T', 'T'], >>>> dtype='|S1') >>>> >>>> >>>>>>> stats.glm(y,x) >>>>>>> >>>>>>> >>>> Traceback (most recent call last): >>>> File "", line 1, in >>>> stats.glm(y,x) >>>> File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", >>>> line 3315, in glm >>>> p = _support.unique(para) >>>> File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py", >>>> line 45, in unique >>>> if np.add.reduce(np.equal(uniques,item).flat) == 0: >>>> AttributeError: 'NotImplementedType' object has no attribute 'flat' >>>> >>>> Josef >>>> >>>> >>>> >>>>> Warren >>>>> >>>>> _______________________________________________ >>>>> SciPy-Dev mailing list >>>>> SciPy-Dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Thu Jun 3 11:53:41 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 11:53:41 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 11:32 AM, Nathaniel Smith wrote: > On Thu, Jun 3, 2010 at 6:38 AM, ? wrote: >> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >> wrote: >>> stats.glm looks like it was started and then abandoned without being >>> finished. ?It was last touched in November 2007. ?Should this function >>> be deprecated so it can eventually be removed? >> >> My thoughts when I looked at it was roughly: >> leave it alone since it's working, but don't "advertise" it because we >> should get a better replacement. >> similar to linregress the more general version will be available when >> scipy.stats gets the full OLS model. > > Wait, what does 'glm' have to do with OLS (or t-tests) anyway? Surely > if anything it *should* be a function that fits, you know, GLMs > (generalized linear models)? > > I guess this is a vote for removing it, because GLMs are one of the > fundamental stats models that people will look for, and having some > weird, broken, other thing in the obvious place is just confusing and > looks really bad. That was my initial impression a long time ago. GLM as in general linear model not generalized. (It's the worst conflicting acronym in stats). The function actually estimates a GLM, it construct a binary dummy variable from the label data to get the design matrix, estimates it with OLS, calculates the t-statistic and the corresponding p-value. But then it becomes like the ttest_ind because it only returns the t-statistic and the corresponding p-value. 
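A rough sketch of the np.unique-based dummy coding mentioned above, for string group labels (illustrative only, with made-up labels; not the actual stats.glm or statsmodels code):

    import numpy as np

    labels = np.array(['F'] * 10 + ['T'] * 10)
    # return_inverse maps each label to an integer group code (0, 1, ...)
    groups, code = np.unique(labels, return_inverse=True)
    dummy = (code == 1).astype(float)               # 0/1 indicator for the second group
    X = np.column_stack((np.ones(len(labels)), dummy))
    # X can then be handed to an ordinary least-squares fit as sketched earlier
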
I don't remember seeing any previous comments about it on the mailing list, but it would be a prime candidate for "finishing" it. (except finishing it requires a full module on it's own.) The discussion what glm (general linear model) has to do with ols fills already many pages on the pystatsmodels mailing list. (GLM in statsmodels is generalized linear model) Josef > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Thu Jun 3 12:03:23 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 12:03:23 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: <4C07CF8E.4000906@enthought.com> References: <4C07A52D.30503@enthought.com> <4C07B99A.8080101@enthought.com> <4C07CF8E.4000906@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 11:51 AM, Warren Weckesser wrote: > josef.pktd at gmail.com wrote: >> On Thu, Jun 3, 2010 at 10:49 AM, ? wrote: >> >>> On Thu, Jun 3, 2010 at 10:18 AM, Warren Weckesser >>> wrote: >>> >>>> josef.pktd at gmail.com wrote: >>>> >>>>> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >>>>> wrote: >>>>> >>>>> >>>>>> stats.glm looks like it was started and then abandoned without being >>>>>> finished. ?It was last touched in November 2007. ?Should this function >>>>>> be deprecated so it can eventually be removed? >>>>>> >>>>>> >>>>> My thoughts when I looked at it was roughly: >>>>> leave it alone since it's working, but don't "advertise" it because we >>>>> should get a better replacement. >>>>> >>>>> >>>> How does one not advertise it? >>>> >>>> The docstring is wrong, incomplete, and not useful. >>>> >>> That's it's not advertised >>> >>> >>>> It has no tests. >>>> >>> It has no tests (except for examples on my computer), but the results >>> (for the basic case that I looked at) are correct. >>> If we increase test coverage or start removing functions that don't >>> have tests yet, I would work on box-cox, and several other functions >>> in morestats.py . Mainly a question of priorities. >>> >>> >>>> Currently, it appears that it just duplicates ttest_ind. ?As far as I >>>> know, no one is working on it. >>>> >>>> Leaving it in wastes users' time reading about it. ?It erodes confidence >>>> in other functions in scipy: ?"Is foo() a good function, or has it been >>>> abandoned, like glm()?" >>>> >>>> To me, it is an ideal candidate for removal. >>>> >>> If we apply strict criteria along those lines, we can reduce the size >>> of scipy.stats.stats and scipy.stats.morestats, I guess, by at least a >>> third. (Which I would do if I could start from scratch). >>> A big fraction of functions in scipy.stats are in the category "no one >>> is working on it". >>> >>> For glm specifically, I don't see any big cost of leaving it in, nor >>> for deprecating it, and then I usually stick to the status-quo. But >>> you can as well deprecate it, and point to ttest_ind. >>> >>> And for "bigger fish" like pdfmoments and pdf_approx, I never received >>> a reply or opinion on the mailing list. >>> >>> statsmodels will have (or better, has in the sandbox) a generalization >>> for glm, that works for any number of groups and includes both t_test >>> and f_test. >>> >> >> Actually, now that I have to think about glm again, I'm also in favor >> of deprecating it, since I can always point to the general version in >> statsmodels. >> >> Josef >> >> > > Heh... 
meanwhile I'm starting to think that my call for deprecation was > premature, and maybe all it really needs is an updated, accurate > docstring that explains what the current implementation does. ?:) You should stay firm to compensate for my reluctance to change things that are not (obviously or really) broken. :) As, I said I'm really pretty indifferent in this case. (But I wouldn't want to see wide spread use of it, because as Nathaniel said, the name is very misleading for the current result.) So, if you want to keep it mention clearly that it only does a ttest. Josef > > Warren > >> >> >> >>> Josef >>> >>> >>>> Warren >>>> >>>> >>>>> similar to linregress the more general version will be available when >>>>> scipy.stats gets the full OLS model. >>>>> >>>>> >>>>> >>>>>>>> x = (np.arange(20)>9).astype(int) >>>>>>>> y = x + np.random.randn(20) >>>>>>>> stats.glm(y,x) >>>>>>>> >>>>>>>> >>>>> (-1.7684287512254859, 0.093933208147769023) >>>>> >>>>> >>>>>>>> stats.ttest_ind(y[:10], y[10:]) >>>>>>>> >>>>>>>> >>>>> (-1.7684287512254859, 0.093933208147768926) >>>>> >>>>> In the current form it doesn't do much different than ttest_ind except >>>>> for different argument structure. >>>>> >>>>> I think it could be made to work on string labels if _support.unique >>>>> is replaced by np.unique (which we are doing in statsmodels) >>>>> >>>>> >>>>> >>>>>>>> x = (np.arange(20)>9).astype(str) >>>>>>>> x >>>>>>>> >>>>>>>> >>>>> array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T', >>>>> ? ? ? ?'T', 'T', 'T', 'T', 'T', 'T', 'T'], >>>>> ? ? ? dtype='|S1') >>>>> >>>>> >>>>>>>> stats.glm(y,x) >>>>>>>> >>>>>>>> >>>>> Traceback (most recent call last): >>>>> ? File "", line 1, in >>>>> ? ? stats.glm(y,x) >>>>> ? File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", >>>>> line 3315, in glm >>>>> ? ? p = _support.unique(para) >>>>> ? File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py", >>>>> line 45, in unique >>>>> ? ? if np.add.reduce(np.equal(uniques,item).flat) == 0: >>>>> AttributeError: 'NotImplementedType' object has no attribute 'flat' >>>>> >>>>> Josef >>>>> >>>>> >>>>> >>>>>> Warren >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-Dev mailing list >>>>>> SciPy-Dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> SciPy-Dev mailing list >>>>> SciPy-Dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>>> >>>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From njs at pobox.com Thu Jun 3 12:16:22 2010 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 3 Jun 2010 09:16:22 -0700 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 8:53 AM, wrote: > GLM as in general linear model not generalized. (It's the worst > conflicting acronym in stats). 
Sure, and lets not even talk about generalized least squares (unrelated to both!). But the general linear model is basically identical to a simple linear model, both in interface and implementation. There's no reason to have a separate function for it, one should just accept a matrix for the "y" variable in the OLS code. But *generalized* linear models are different in interface, implementation, and are almost as much of a stats workhorse as standard linear models. So every book I've ever seen uses the abbreviation "glm" to refer to the generalized version. (Also, this is what R calls the function ;-).) The implementation of dummy coding is kind of useful, but this is the wrong place and the wrong name... (Also, its least squares implementation calls inv -- the textbook example of bad numerics!) ...Okay, you know all that anyway, the question is what to do with it. If the problem were just that it needed a better implementation and some new features added, then maybe we would keep it and let it be improved incrementally. But the interface is just wrong, so we'll be removing it sooner or later, and it might as well be sooner, rather than prolong the agony. -- Nathaniel From josef.pktd at gmail.com Thu Jun 3 12:31:25 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 12:31:25 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 12:16 PM, Nathaniel Smith wrote: > On Thu, Jun 3, 2010 at 8:53 AM, ? wrote: >> GLM as in general linear model not generalized. (It's the worst >> conflicting acronym in stats). > > Sure, and lets not even talk about generalized least squares > (unrelated to both!). > > But the general linear model is basically identical to a simple linear > model, both in interface and implementation. There's no reason to have > a separate function for it, one should just accept a matrix for the > "y" variable in the OLS code. But *generalized* linear models are > different in interface, implementation, and are almost as much of a > stats workhorse as standard linear models. So every book I've ever > seen uses the abbreviation "glm" to refer to the generalized version. > (Also, this is what R calls the function ;-).) coming more from the econometrics side, I never heard of "generalized" until two years ago, and glm was always general linear model, (scikits.learn and many other packages use it in this definition) > > The implementation of dummy coding is kind of useful, but this is the > wrong place and the wrong name... > > (Also, its least squares implementation calls inv -- the textbook > example of bad numerics!) > > ...Okay, you know all that anyway, the question is what to do with it. > If the problem were just that it needed a better implementation and > some new features added, then maybe we would keep it and let it be > improved incrementally. But the interface is just wrong, so we'll be > removing it sooner or later, and it might as well be sooner, rather > than prolong the agony. Actually my version for stats.glm, as a test not as an estimation model uses least squares in the name, but has a similar interface http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head%3A/scikits/statsmodels/sandbox/regression/onewaygls.py class OneWayLS(object): '''Class to test equality of regression coefficients across groups This class performs tests whether the linear regression coefficients are the same across pre-specified groups. 
This can be used to test for structural breaks at given change points, or for ANOVA style analysis of differences in the effect of explanatory variables across groups. I don't see a way to provide a "better implementation and add some new features" without going full scale. That's why I agree now with deprecation, since after this thread it's not a hidden legacy/fossil anymore. Josef > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vincent at vincentdavis.net Thu Jun 3 12:58:25 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 3 Jun 2010 10:58:25 -0600 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: <4C07B72E.2050504@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 8:43 AM, Ralf Gommers wrote: > > > On Thu, Jun 3, 2010 at 10:15 PM, wrote: >> >> On Thu, Jun 3, 2010 at 10:07 AM, Bruce Southey wrote: >> > On 06/02/2010 10:28 PM, Benjamin Root wrote: >> >> > >> > Just a thought... is it feasible for the doc building system to scan >> > through >> > the function code and spot a deprecation warning and thereby be able to >> > add >> > a list of deprecation warnings to the docstring?? Obviously, such >> > warnings >> > would have to follow some standard format, but it would be neat if such >> > things could be automated. > > There's enough docstring manipulation going on already I think, this is not > that much work so manual would be better. It should be put in at the moment > the deprecation takes place. > >> >> In the future, someone will have to come up with a rule to force >> documentation change when a depreciation event occurs and then enforce it. >> In fact, for numpy (as scipy does not yet have the same policy) the >> desired >> documentation changes should be added to: >> http://projects.scipy.org/numpy/wiki/ApiDeprecation > >> I have never seen any guidelines or rules to add Deprecation Warnings >> into the docstrings. It would be good to define a standard for the >> docstrings first. > > It should be made as visible as possible in my opinion. A reST warning in > between summary and extended summary would work. It should clearly state in > which version it will be removed. Best to keep the text identical to the one > passed to the deprecate decorator. A reason or alternative should be given > as well. I would prefer to see it at the very top. If there is an easily available alternative why would I as a user not what to immediately view that alternative? If I am already using it then it is a good remider. Why put it after the summary? Vincent > .. warning:: > ??? `myfunc` is deprecated and will be removed in SciPy 0.9. Look at > `thatfunc` for equivalent functionality. > > Cheers, > Ralf > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From josef.pktd at gmail.com Thu Jun 3 12:59:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 12:59:01 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: On Thu, Jun 3, 2010 at 12:31 PM, wrote: > On Thu, Jun 3, 2010 at 12:16 PM, Nathaniel Smith wrote: >> On Thu, Jun 3, 2010 at 8:53 AM, ? wrote: >>> GLM as in general linear model not generalized. (It's the worst >>> conflicting acronym in stats). 
>> >> Sure, and lets not even talk about generalized least squares >> (unrelated to both!). >> >> But the general linear model is basically identical to a simple linear >> model, both in interface and implementation. There's no reason to have >> a separate function for it, one should just accept a matrix for the >> "y" variable in the OLS code. But *generalized* linear models are >> different in interface, implementation, and are almost as much of a >> stats workhorse as standard linear models. So every book I've ever >> seen uses the abbreviation "glm" to refer to the generalized version. >> (Also, this is what R calls the function ;-).) > > coming more from the econometrics side, I never heard of "generalized" > until two years ago, and glm was always general linear model, > (scikits.learn and many other packages use it in this definition) > > >> >> The implementation of dummy coding is kind of useful, but this is the >> wrong place and the wrong name... >> >> (Also, its least squares implementation calls inv -- the textbook >> example of bad numerics!) >> >> ...Okay, you know all that anyway, the question is what to do with it. >> If the problem were just that it needed a better implementation and >> some new features added, then maybe we would keep it and let it be >> improved incrementally. But the interface is just wrong, so we'll be >> removing it sooner or later, and it might as well be sooner, rather >> than prolong the agony. > > Actually my version for stats.glm, as a test not as an estimation > model uses least squares in the name, but has a similar interface > > http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head%3A/scikits/statsmodels/sandbox/regression/onewaygls.py > > class OneWayLS(object): > '''Class to test equality of regression coefficients across groups > > This class performs tests whether the linear regression coefficients are > the same across pre-specified groups. This can be used to test for > structural breaks at given change points, or for ANOVA style analysis of > differences in the effect of explanatory variables across groups. Actually, I don't have ttest results, because I only look at the general case with two or more groups and only ftest is relevant in this case, so the simplest case of it is similar to stats.f_oneway not stats.glm http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head%3A/scikits/statsmodels/sandbox/examples/ex_onewaygls.py#L99 And thanks Warren and Nathaniel for voicing some strong opinions, it's very useful to break my indifference (economic utility definition). Josef > > I don't see a way to provide a "better implementation and add some new > features" without going full scale. > > That's why I agree now with deprecation, since after this thread it's > not a hidden legacy/fossil anymore. > > Josef > >> >> -- Nathaniel >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > From bsouthey at gmail.com Thu Jun 3 13:14:44 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 03 Jun 2010 12:14:44 -0500 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: <4C07E304.8040503@gmail.com> On 06/03/2010 10:32 AM, Nathaniel Smith wrote: > On Thu, Jun 3, 2010 at 6:38 AM, wrote: > >> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >> wrote: >> >>> stats.glm looks like it was started and then abandoned without being >>> finished. 
It was last touched in November 2007. Should this function >>> be deprecated so it can eventually be removed? >>> >> My thoughts when I looked at it was roughly: >> leave it alone since it's working, but don't "advertise" it because we >> should get a better replacement. >> similar to linregress the more general version will be available when >> scipy.stats gets the full OLS model. >> > Wait, what does 'glm' have to do with OLS (or t-tests) anyway? Surely > if anything it *should* be a function that fits, you know, GLMs > (generalized linear models)? > > I guess this is a vote for removing it, because GLMs are one of the > fundamental stats models that people will look for, and having some > weird, broken, other thing in the obvious place is just confusing and > looks really bad. > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Perhaps people should actually read the code before jumping to incorrect conclusions. It is not similar to linregress unless you know how to 'trick' linreg. Granted that stats.glm is a crippled but it is well intended (like most things in scipy.stats). The docstring intended it to general linear models such as SAS's glm procedure and R's glm function (without generalized part). At present is just does 1-way anova with only two levels but could do more. >>> drug=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] >>> postrt=[6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, 13, 10, 18, 5, 23, 12, 5, 16, 1, 20] >>> t_val,t_probs=stats.glm(postrt,drug) >>> t_val -1.5463854661015379 >>> t_probs 0.13324062984741347 >>> idrug=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] #create dummies to trick linreg >>> print stats.linregress(idrug, postrt) (-3.9000000000000044, 9.2000000000000011, -0.280506586484015, 0.13324062984741378, 2.5220102526131258) >>> -3.9000000000000044/2.5220102526131258 #this is the t-value of stats.glm -1.5463854661015373 I have major concerns about depreciating code when there is no alternative proposed for such an important statistical function. As David has said elsewhere, this is just Python code and has little or no maintenance cost. The full solution is probably Jonathan Taylor's glm class but that uses the formula class and is for generalized linear models. However, I don't see that in scipy anywhere soon. So the options are: 1) Rewrite the internals to fix address the current limitation - not hard but would need an API change and more importantly better options exist. 2) OLS is a superior version to linregress but needs changes to get ANOVA etc added http://www.scipy.org/Cookbook/OLS 3) The best candidate that I know that can replace both stats.linregress and stats.glm is Skipper's try_ols_anova.py code from pystatsmodel (at least posted on the list). But I am not sure what the current state of that is. 4) Some other option? Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Thu Jun 3 13:35:37 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 3 Jun 2010 10:35:37 -0700 Subject: [SciPy-Dev] {True, False} should be replaced w/ bool, correct? 
In-Reply-To: References: Message-ID: On Thu, Jun 3, 2010 at 7:26 AM, Ralf Gommers wrote: > > > On Thu, Jun 3, 2010 at 10:10 PM, Vincent Davis wrote: > >> On Thu, Jun 3, 2010 at 4:40 AM, Ralf Gommers >> wrote: >> > >> > >> > On Thu, Jun 3, 2010 at 1:04 PM, David Goldsmith < >> d.l.goldsmith at gmail.com> >> > wrote: >> >> >> >> Just checking; see, e.g., scipy.io.matlab.mio.savemat appendmat >> >> parameter. (Or is it possible that the function really needs to see >> either >> >> the word True or the word False?) >> > >> > Correct, {True, False} should always be changed to bool in the docs. >> >> I didn't see how the "defualt" should be noted on bool options. I >> think in most cases it should be clear but it might be nice is it was >> explicit. >> >> In the description of the parameter, for example: > cap : bool, optional > Whether to return this string in capital letters. Default is True. > > Noting defaults should be done not only for bool args, but for everything > that has a default. > > Cheers, > Ralf > Thanks, Ralf. Let me just add that if the default isn't clear in the existing docs, it may be necessary, as the editor, to look at the source. Thanks again. DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu Jun 3 13:48:06 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 3 Jun 2010 11:48:06 -0600 Subject: [SciPy-Dev] {True, False} should be replaced w/ bool, correct? In-Reply-To: References: Message-ID: n Thursday, June 3, 2010, David Goldsmith wrote: > On Thu, Jun 3, 2010 at 7:26 AM, Ralf Gommers wrote: > > > > On Thu, Jun 3, 2010 at 10:10 PM, Vincent Davis wrote: > > On Thu, Jun 3, 2010 at 4:40 AM, Ralf Gommers > wrote: >> >> >> On Thu, Jun 3, 2010 at 1:04 PM, David Goldsmith >> wrote: >>> >>> Just checking; see, e.g., scipy.io.matlab.mio.savemat appendmat >>> parameter.? (Or is it possible that the function really needs to see either >>> the word True or the word False?) >> >> Correct, {True, False} should always be changed to bool in the docs. > > I didn't see how the "defualt" should be noted on bool options. I > think in most cases it should be clear but it might be nice is it was > explicit. > > In the description of the parameter, for example: > cap : bool, optional > ? ? Whether to return this string in capital letters. Default is True. > > Noting defaults should be done not only for bool args, but for everything that has a default. Not sure what I was looking at but it did not state the default, that's way I ask and I didn't see anything in the guide but I might have missed it. vincent > > Cheers, > Ralf > > Thanks, Ralf.? Let me just add that if the default isn't clear in the existing docs, it may be necessary, as the editor, to look at the source.? Thanks again. > > DG > > From d.l.goldsmith at gmail.com Thu Jun 3 13:49:15 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 3 Jun 2010 10:49:15 -0700 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: <4C07B72E.2050504@gmail.com> Message-ID: OK, we're getting enough proposed content here that I think a formal modification of the docstring Standard is warranted; accordingly, I'm going to file a ticket. I'll post the link here and then if you want to be on the notification-of-ticket-changes list you can go there and add yourself. 
That way, this discussion of where this entry should live, what it should contain, how it should be formatted, etc., etc., will be in a more appropriate, easier to find place. Back shortly. DG On Thu, Jun 3, 2010 at 9:58 AM, Vincent Davis wrote: > On Thu, Jun 3, 2010 at 8:43 AM, Ralf Gommers > wrote: > > > > > > On Thu, Jun 3, 2010 at 10:15 PM, wrote: > >> > >> On Thu, Jun 3, 2010 at 10:07 AM, Bruce Southey > wrote: > >> > On 06/02/2010 10:28 PM, Benjamin Root wrote: > >> > >> > > >> > Just a thought... is it feasible for the doc building system to scan > >> > through > >> > the function code and spot a deprecation warning and thereby be able > to > >> > add > >> > a list of deprecation warnings to the docstring? Obviously, such > >> > warnings > >> > would have to follow some standard format, but it would be neat if > such > >> > things could be automated. > > > > There's enough docstring manipulation going on already I think, this is > not > > that much work so manual would be better. It should be put in at the > moment > > the deprecation takes place. > > > >> > >> In the future, someone will have to come up with a rule to force > >> documentation change when a depreciation event occurs and then enforce > it. > >> In fact, for numpy (as scipy does not yet have the same policy) the > >> desired > >> documentation changes should be added to: > >> http://projects.scipy.org/numpy/wiki/ApiDeprecation > > > >> I have never seen any guidelines or rules to add Deprecation Warnings > >> into the docstrings. It would be good to define a standard for the > >> docstrings first. > > > > It should be made as visible as possible in my opinion. A reST warning in > > between summary and extended summary would work. It should clearly state > in > > which version it will be removed. Best to keep the text identical to the > one > > passed to the deprecate decorator. A reason or alternative should be > given > > as well. > > I would prefer to see it at the very top. > If there is an easily available alternative why would I as a user not > what to immediately view that alternative? > If I am already using it then it is a good remider. Why put it after > the summary? > > Vincent > > > .. warning:: > > `myfunc` is deprecated and will be removed in SciPy 0.9. Look at > > `thatfunc` for equivalent functionality. > > > > Cheers, > > Ralf > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 3 13:53:08 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jun 2010 13:53:08 -0400 Subject: [SciPy-Dev] Deprecate stats.glm? 
In-Reply-To: <4C07E304.8040503@gmail.com> References: <4C07A52D.30503@enthought.com> <4C07E304.8040503@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 1:14 PM, Bruce Southey wrote: > On 06/03/2010 10:32 AM, Nathaniel Smith wrote: > > On Thu, Jun 3, 2010 at 6:38 AM, wrote: > > > On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser > wrote: > > > stats.glm looks like it was started and then abandoned without being > finished. ?It was last touched in November 2007. ?Should this function > be deprecated so it can eventually be removed? > > > My thoughts when I looked at it was roughly: > leave it alone since it's working, but don't "advertise" it because we > should get a better replacement. > similar to linregress the more general version will be available when > scipy.stats gets the full OLS model. > > > Wait, what does 'glm' have to do with OLS (or t-tests) anyway? Surely > if anything it *should* be a function that fits, you know, GLMs > (generalized linear models)? > > I guess this is a vote for removing it, because GLMs are one of the > fundamental stats models that people will look for, and having some > weird, broken, other thing in the obvious place is just confusing and > looks really bad. > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > Perhaps people should actually read the code before jumping to incorrect > conclusions. It is not similar to linregress unless you know how to 'trick' > linreg. It's similar in the sense that it promises a lot, but is very limited or "crippled", and that the replacement is not just a quick rewrite. > > Granted that stats.glm is a crippled but it is well intended (like most > things in scipy.stats). The docstring intended it to general linear models > such as SAS's glm procedure and R's glm function (without generalized part). > At present is just does 1-way anova with only two levels but could do more. > >>>> drug=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, >>>> 2, 2, 2, 2, 2, 2, 2, 2] >>>> postrt=[6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, >>>> 13, 10, 18, 5, 23, 12, 5, 16, 1, 20] >>>> t_val,t_probs=stats.glm(postrt,drug) >>>> t_val > -1.5463854661015379 >>>> t_probs > 0.13324062984741347 >>>> idrug=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >>>> 0, 0, 0, 0, 0, 0, 0, 0] #create dummies to trick linreg >>>> print stats.linregress(idrug, postrt) > (-3.9000000000000044, 9.2000000000000011, -0.280506586484015, > 0.13324062984741378, 2.5220102526131258) >>>> -3.9000000000000044/2.5220102526131258 #this is the t-value of stats.glm > -1.5463854661015373 > > > I have major concerns about depreciating code when there is no alternative > proposed for such an important statistical function. As David has said > elsewhere, this is just Python code and has little or no maintenance cost. > The full solution is probably Jonathan Taylor's glm class but that uses the > formula class and is for generalized linear models. However, I don't see > that in scipy anywhere soon. Currently the alternative is using ttest_ind, which produces the same result. The cost of glm is the confusion that it creates if there is such a big mismatch between name and result, which is exactly the response Nathaniel and I had. And Warren was proposing to deprecate it not to delete it right away. 
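For what it's worth, deprecating rather than deleting would just mean a warning shim, something along the lines of the sketch below (the argument names are guessed from the traceback earlier in the thread, and this is not scipy's actual deprecation machinery):

import warnings
from scipy import stats

def glm(data, para):
    """Two-sample t-test on dummy-coded groups (all the old function did).

    .. deprecated:: 0.8
        `glm` is deprecated and will be removed in SciPy 0.9;
        use `scipy.stats.ttest_ind` instead.
    """
    warnings.warn("stats.glm is deprecated; use stats.ttest_ind instead",
                  DeprecationWarning, stacklevel=2)
    groups = sorted(set(para))
    if len(groups) != 2:
        raise ValueError("glm only ever handled two groups")
    a = [d for d, g in zip(data, para) if g == groups[0]]
    b = [d for d, g in zip(data, para) if g == groups[1]]
    # sign conventions may differ from the old function
    return stats.ttest_ind(a, b)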
> > So the options are: > > 1) Rewrite the internals to fix address the current limitation - not hard > but would need an API change and more importantly better options exist. > 2) OLS is a superior version to linregress but needs changes to get ANOVA > etc added > http://www.scipy.org/Cookbook/OLS > 3) The best candidate that I know that can replace both stats.linregress and > stats.glm is Skipper's try_ols_anova.py code from pystatsmodel (at least > posted on the list).? But I am not sure what the current state of that is. > 4) Some other option? Yes, move the OLS model and associated code from statsmodels to scipy.stats (maybe we can discuss this after Skipper's gsoc), or use statsmodels as addition to scipy.stats. http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head%3A/scikits/statsmodels/sandbox/regression/try_ols_anova.py was just my initial experimental script, and I think we might still need a few versions (with Skipper's data and dummy handling and maybe Jonathan's formula framework) before we come to a final design. I don't think any duplication of effort to expand on stats.linregress or stats.glm is productive. Josef > > > Bruce > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From d.l.goldsmith at gmail.com Thu Jun 3 14:18:05 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 3 Jun 2010 11:18:05 -0700 Subject: [SciPy-Dev] Warning of deprecation in doc's ? In-Reply-To: References: <4C07B72E.2050504@gmail.com> Message-ID: http://projects.scipy.org/numpy/ticket/1501 Filed Description: "Presently, the docstring standard does not specify how to note that an object is to be deprecated; it has been proposed that this needs to be rectified. "Obviously, this should be an optional section in general, but required for objects once it is decided that they are to be deprecated. "Discussion on scipy-dev agreed that this section should be at or near the top, but at the top or between the One-line and Extended Summaries have both been proposed - we will try to reach a consensus [in the ticket comments]. "Proposed format is to utilize Sphinx' .. deprecated:: directive; someone please provide a concrete example of what this looks like (for example, does this directive support multi-line content, and if so, what does that look like). "Proposed content: summaries of deprecation schedule (in version number time, not real time) and justification for deprecation (e.g., being replaced, duplicates extant functionality elsewhere); existing alternatives to obtain the same functionality. (Feel strongly that it should contain something else? Add it below as a comment.) "IMO, we should try to decide on this and update the standard by June 15 at the latest. "Have I forgotten anything" DG On Thu, Jun 3, 2010 at 10:49 AM, David Goldsmith wrote: > OK, we're getting enough proposed content here that I think a formal > modification of the docstring Standard is warranted; accordingly, I'm going > to file a ticket. I'll post the link here and then if you want to be on the > notification-of-ticket-changes list you can go there and add yourself. That > way, this discussion of where this entry should live, what it should > contain, how it should be formatted, etc., etc., will be in a more > appropriate, easier to find place. Back shortly. 
> > DG > > > On Thu, Jun 3, 2010 at 9:58 AM, Vincent Davis wrote: > >> On Thu, Jun 3, 2010 at 8:43 AM, Ralf Gommers >> wrote: >> > >> > >> > On Thu, Jun 3, 2010 at 10:15 PM, wrote: >> >> >> >> On Thu, Jun 3, 2010 at 10:07 AM, Bruce Southey >> wrote: >> >> > On 06/02/2010 10:28 PM, Benjamin Root wrote: >> >> >> >> > >> >> > Just a thought... is it feasible for the doc building system to scan >> >> > through >> >> > the function code and spot a deprecation warning and thereby be able >> to >> >> > add >> >> > a list of deprecation warnings to the docstring? Obviously, such >> >> > warnings >> >> > would have to follow some standard format, but it would be neat if >> such >> >> > things could be automated. >> > >> > There's enough docstring manipulation going on already I think, this is >> not >> > that much work so manual would be better. It should be put in at the >> moment >> > the deprecation takes place. >> > >> >> >> >> In the future, someone will have to come up with a rule to force >> >> documentation change when a depreciation event occurs and then enforce >> it. >> >> In fact, for numpy (as scipy does not yet have the same policy) the >> >> desired >> >> documentation changes should be added to: >> >> http://projects.scipy.org/numpy/wiki/ApiDeprecation >> > >> >> I have never seen any guidelines or rules to add Deprecation Warnings >> >> into the docstrings. It would be good to define a standard for the >> >> docstrings first. >> > >> > It should be made as visible as possible in my opinion. A reST warning >> in >> > between summary and extended summary would work. It should clearly state >> in >> > which version it will be removed. Best to keep the text identical to the >> one >> > passed to the deprecate decorator. A reason or alternative should be >> given >> > as well. >> >> I would prefer to see it at the very top. >> If there is an easily available alternative why would I as a user not >> what to immediately view that alternative? >> If I am already using it then it is a good remider. Why put it after >> the summary? >> >> Vincent >> >> > .. warning:: >> > `myfunc` is deprecated and will be removed in SciPy 0.9. Look at >> > `thatfunc` for equivalent functionality. >> > >> > Cheers, >> > Ralf >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Jun 3 15:15:10 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 03 Jun 2010 14:15:10 -0500 Subject: [SciPy-Dev] Deprecate stats.glm? 
In-Reply-To: References: <4C07A52D.30503@enthought.com> <4C07E304.8040503@gmail.com> Message-ID: <4C07FF3E.2020505@gmail.com> On 06/03/2010 12:53 PM, josef.pktd at gmail.com wrote: > On Thu, Jun 3, 2010 at 1:14 PM, Bruce Southey wrote: > >> On 06/03/2010 10:32 AM, Nathaniel Smith wrote: >> >> On Thu, Jun 3, 2010 at 6:38 AM, wrote: >> >> >> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser >> wrote: >> >> >> stats.glm looks like it was started and then abandoned without being >> finished. It was last touched in November 2007. Should this function >> be deprecated so it can eventually be removed? >> >> >> My thoughts when I looked at it was roughly: >> leave it alone since it's working, but don't "advertise" it because we >> should get a better replacement. >> similar to linregress the more general version will be available when >> scipy.stats gets the full OLS model. >> >> >> Wait, what does 'glm' have to do with OLS (or t-tests) anyway? Surely >> if anything it *should* be a function that fits, you know, GLMs >> (generalized linear models)? >> >> I guess this is a vote for removing it, because GLMs are one of the >> fundamental stats models that people will look for, and having some >> weird, broken, other thing in the obvious place is just confusing and >> looks really bad. >> >> -- Nathaniel >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> Perhaps people should actually read the code before jumping to incorrect >> conclusions. It is not similar to linregress unless you know how to 'trick' >> linreg. >> > It's similar in the sense that it promises a lot, but is very limited > or "crippled", and that the replacement is not just a quick rewrite. > > >> Granted that stats.glm is a crippled but it is well intended (like most >> things in scipy.stats). The docstring intended it to general linear models >> such as SAS's glm procedure and R's glm function (without generalized part). >> At present is just does 1-way anova with only two levels but could do more. >> >> >>>>> drug=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, >>>>> 2, 2, 2, 2, 2, 2, 2, 2] >>>>> postrt=[6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, >>>>> 13, 10, 18, 5, 23, 12, 5, 16, 1, 20] >>>>> t_val,t_probs=stats.glm(postrt,drug) >>>>> t_val >>>>> >> -1.5463854661015379 >> >>>>> t_probs >>>>> >> 0.13324062984741347 >> >>>>> idrug=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >>>>> 0, 0, 0, 0, 0, 0, 0, 0] #create dummies to trick linreg >>>>> print stats.linregress(idrug, postrt) >>>>> >> (-3.9000000000000044, 9.2000000000000011, -0.280506586484015, >> 0.13324062984741378, 2.5220102526131258) >> >>>>> -3.9000000000000044/2.5220102526131258 #this is the t-value of stats.glm >>>>> >> -1.5463854661015373 >> >> >> I have major concerns about depreciating code when there is no alternative >> proposed for such an important statistical function. As David has said >> elsewhere, this is just Python code and has little or no maintenance cost. >> The full solution is probably Jonathan Taylor's glm class but that uses the >> formula class and is for generalized linear models. However, I don't see >> that in scipy anywhere soon. >> > Currently the alternative is using ttest_ind, which produces the same result. > Not exactly since you have to reformat the input. Also you can do ttest_ind with linregress... 
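For example, on made-up data:

import numpy as np
from scipy import stats

x = (np.arange(20) > 9).astype(int)      # 0/1 dummy marking the two groups
y = x + np.random.randn(20)

slope, intercept, r, p, stderr = stats.linregress(x, y)
print(slope / stderr, p)                        # t-statistic of the slope and its p-value
print(stats.ttest_ind(y[x == 0], y[x == 1]))    # same p-value; t agrees up to sign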
> The cost of glm is the confusion that it creates if there is such a > big mismatch between name and result, which is exactly the response > Nathaniel and I had. > Generalized linear models is 'new' (so 1972) but general linear models is older (I think back to the 1950's when it was shown the relationship between ANOVA and regression). Yet both got back to the 1800's. But sure anyone is going to get confused if they come from the S/R world and don't check to see if the function at least has distribution and link arguments/options. > And Warren was proposing to deprecate it not to delete it right away. > > >> So the options are: >> >> 1) Rewrite the internals to fix address the current limitation - not hard >> but would need an API change and more importantly better options exist. >> 2) OLS is a superior version to linregress but needs changes to get ANOVA >> etc added >> http://www.scipy.org/Cookbook/OLS >> 3) The best candidate that I know that can replace both stats.linregress and >> stats.glm is Skipper's try_ols_anova.py code from pystatsmodel (at least >> posted on the list). But I am not sure what the current state of that is. >> 4) Some other option? >> > Yes, move the OLS model and associated code from statsmodels to > scipy.stats (maybe we can discuss this after Skipper's gsoc), or use > statsmodels as addition to scipy.stats. > > http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head%3A/scikits/statsmodels/sandbox/regression/try_ols_anova.py > was just my initial experimental script, Sorry - I just recalled his script but not the history. > and I think we might still > need a few versions (with Skipper's data and dummy handling and maybe > Jonathan's formula framework) before we come to a final design. > > I don't think any duplication of effort to expand on stats.linregress > or stats.glm is productive. > > Josef > > I totally agree as adding that at the same time justifies depreciation of both functions. Bruce >> >> Bruce >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From bsouthey at gmail.com Thu Jun 3 15:55:07 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 03 Jun 2010 14:55:07 -0500 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: References: <4C07A52D.30503@enthought.com> Message-ID: <4C08089B.8040601@gmail.com> On 06/03/2010 11:16 AM, Nathaniel Smith wrote: > On Thu, Jun 3, 2010 at 8:53 AM, wrote: > >> GLM as in general linear model not generalized. (It's the worst >> conflicting acronym in stats). >> > Sure, and lets not even talk about generalized least squares > (unrelated to both!). > > But the general linear model is basically identical to a simple linear > model, both in interface and implementation. Depends what you mean by 'simple'. Stealing from the SAS manual, these are some of the models fitted by the GLM procedure which I would not call simple: simple regression multiple regression analysis of variance (ANOVA), especially for unbalanced data analysis of covariance response surface models weighted regression polynomial regression partial correlation multivariate analysis of variance (MANOVA) repeated measures analysis of variance These include interactions... 
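(For reference, the one-way ANOVA entry in that list reduces to dummy coding plus least squares. A sketch on made-up data with three groups, checked against stats.f_oneway; illustration only, not library code:)

import numpy as np
from scipy import stats

# made-up data: three groups, one-way ANOVA written as a linear model on dummies
labels = np.repeat([0, 1, 2], 8)
y = labels + np.random.randn(24)

n = len(y)
levels, codes = np.unique(labels, return_inverse=True)
k = len(levels)

# full design: intercept plus k-1 dummies; restricted design: intercept only
dummies = (codes[:, None] == np.arange(1, k)).astype(float)
X_full = np.column_stack([np.ones(n), dummies])
X_restr = np.ones((n, 1))

def rss(X, y):
    # residual sum of squares of a least-squares fit
    beta = np.linalg.lstsq(X, y)[0]
    resid = y - np.dot(X, beta)
    return np.dot(resid, resid)

F = ((rss(X_restr, y) - rss(X_full, y)) / (k - 1)) / (rss(X_full, y) / (n - k))
p = stats.f.sf(F, k - 1, n - k)

print(F, p)
print(stats.f_oneway(*[y[codes == i] for i in range(k)]))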
> There's no reason to have > a separate function for it, one should just accept a matrix for the > "y" variable in the OLS code. But *generalized* linear models are > different in interface, implementation, and are almost as much of a > stats workhorse as standard linear models. So every book I've ever > seen uses the abbreviation "glm" to refer to the generalized version. > (Also, this is what R calls the function ;-).) > Yeah, it is interesting that you forget older statistical packages (SAS, SPSS, don't remember what Genstat did ) and the first GLIM (the first? generalized linear model package). > The implementation of dummy coding is kind of useful, but this is the > wrong place and the wrong name... > Why? That is exactly what is needed and what stats.glm does. > (Also, its least squares implementation calls inv -- the textbook > example of bad numerics!) > Actually it should call pinv() here but you going to have to prove that this is 'bad numerics'! Especially given how the numpy computes it and that design matrices tend to have poor numerics to start with (especially if you do anova and use condition number to assess numerics). [I strong dislike people complaining of the apparent bad numerics just because they see the word inverse.] > ...Okay, you know all that anyway, the question is what to do with it. > If the problem were just that it needed a better implementation and > some new features added, then maybe we would keep it and let it be > improved incrementally. But the interface is just wrong, so we'll be > removing it sooner or later, and it might as well be sooner, rather > than prolong the agony. > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > The simple reason is that there is no alternative for users to use yet such as pystatsmodels. Bruce From ljosa at broad.mit.edu Thu Jun 3 16:15:30 2010 From: ljosa at broad.mit.edu (Vebjorn Ljosa) Date: Thu, 3 Jun 2010 16:15:30 -0400 Subject: [SciPy-Dev] License for parts of CellProfiler changed to BSD to allow incorporation into SciPy Message-ID: We have changed the license of some parts of CellProfiler from GNU GPL to BSD. It has previously been proposed [1] that some of the image processing code in CellProfiler be merged into SciPy, and the license change makes this possible. The CellProfiler SVN repository is at https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/. The file LICENSE [2] contains a list of BSD-licensed subdirectories as well as other license details. The rest of CellProfiler continues to be licensed under the GNU GPL. The BSD-licensed subdirectories are: * CellProfiler/cpmath [3]: image processing algorithms * CellProfiler/utilities [4]: contains a Java bridge, making it possible to call Java functions from Python * bioformats [5]: wrapper that uses the Java bridge to have Bioformats [6] read or write an image file Good luck with the upcoming scikits.image sprint. I don't think anyone from the CellProfiler team will be able to take part in the sprint this time, but don't hesitate to ask on the cellprofiler-dev at broadinstitute.org mailing list. 
Thanks, Vebjorn [1] http://stefanv.github.com/scikits.image/contribute.html [2] https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/LICENSE [3] https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/cellprofiler/cpmath/ [4] https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/cellprofiler/utilities/ [5] https://svn.broadinstitute.org/CellProfiler/trunk/CellProfiler/bioformats/ [6] http://www.loci.wisc.edu/software/bio-formats -- Vebjorn Ljosa, PhD Computational Biologist Broad Institute of MIT and Harvard From stefan at sun.ac.za Thu Jun 3 16:24:04 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 3 Jun 2010 13:24:04 -0700 Subject: [SciPy-Dev] License for parts of CellProfiler changed to BSD to allow incorporation into SciPy In-Reply-To: References: Message-ID: Vebjorn, 2010/6/3 Vebjorn Ljosa : > We have changed the license of some parts of CellProfiler from GNU GPL > to BSD. ?It has previously been proposed [1] that some of the image > processing code in CellProfiler be merged into SciPy, and the license > change makes this possible. Thanks a lot for your effort, and for this highly anticipated outcome! At SciPy2010, the scikits.image team will make a concerted effort to include many of these algorithms into our code-base. Kind regards St?fan From njs at pobox.com Thu Jun 3 17:20:40 2010 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 3 Jun 2010 14:20:40 -0700 Subject: [SciPy-Dev] Deprecate stats.glm? In-Reply-To: <4C08089B.8040601@gmail.com> References: <4C07A52D.30503@enthought.com> <4C08089B.8040601@gmail.com> Message-ID: On Thu, Jun 3, 2010 at 12:55 PM, Bruce Southey wrote: > On 06/03/2010 11:16 AM, Nathaniel Smith wrote: >> But the general linear model is basically identical to a simple linear >> model, both in interface and implementation. > Depends what you mean by 'simple'. Stealing from the SAS manual, these > are some of the models fitted by the GLM procedure which I would not > call simple: > simple regression > multiple regression > analysis of variance (ANOVA), especially for unbalanced data > analysis of covariance > multivariate analysis of variance (MANOVA) > weighted regression > polynomial regression Well, I didn't mean to start an argument; certainly 'simple' is underdefined, and there's a lot of conceptual richness to the linear model framework. (And perhaps some obfuscation from the historical tendency to use different names for mathematically equivalent ideas when used in different contexts.) But at the implementation level, everything in the above list is (1) solved in about 2 lines of code, (2) they're the same 2 lines of code for all of them. Making a friendly interface is more complicated than that, of course, but that's just more reason to do it once for all of them, instead of piece-meal. > response surface models > partial correlation These I'm not sure about off-hand. > repeated measures analysis of variance And this is a very complicated area; I know of at least 3 totally different approaches (traditional repeated measures ANOVA with or without sphericity corrections, MANOVAs on contrasts, and multi-level mixed-effect modelling), and none is very simple. I assume SAS has picked one to implement (probably the first?). But these issues are totally orthogonal to whether a "linear model" is "general" (in fact, I don't know how to apply *any* of these techniques to *general* linear models, i.e., multivariate ones, and would very much appreciate references if you have them!). 
So I don't see how this argues for treating "general linear models" separately from "linear models". > These include interactions... >> ? There's no reason to have >> a separate function for it, one should just accept a matrix for the >> "y" variable in the OLS code. But *generalized* linear models are >> different in interface, implementation, and are almost as much of a >> stats workhorse as standard linear models. So every book I've ever >> seen uses the abbreviation "glm" to refer to the generalized version. >> (Also, this is what R calls the function ;-).) >> > Yeah, it is interesting that you forget older statistical packages (SAS, > SPSS, don't remember what Genstat did ) and the first GLIM (the first? > generalized linear model package). I didn't forget them; I've just never used them. Can I also mention that I'm finding your tone quite combative and off-putting? If I've offended you somehow then I apologize, and would appreciate hearing why. If those packages have useful ideas, then I'm interested in hearing them, but just hearing the list of names unfortunately doesn't give me much to go on. >> The implementation of dummy coding is kind of useful, but this is the >> wrong place and the wrong name... >> > Why? > That is exactly what is needed and what stats.glm does. I'm sorry, I don't know how to explain better. My statement is that dummy coding is (1) useful, (2) neither called "glm" in any context, nor in any way specific to the general linear model. Do you disagree with any of this...? >> (Also, its least squares implementation calls inv -- the textbook >> example of bad numerics!) >> > Actually it should call pinv() here but you going to have to prove that > this is 'bad numerics'! Especially given how the numpy computes it and > that design matrices tend to have poor numerics to start with > (especially if you do anova and use condition number to assess > numerics). [I strong dislike people complaining of the apparent bad > numerics just because they see the word inverse.] Not sure I follow here either. If design matrices have poor numerics to start with, then that's exactly the case where forming the inverse is *bad*! If not, then it doesn't make much difference either way, but since it's no more effort to write code that is both faster and more robust, doing otherwise is just irresponsible in a widely-used library. But in any case, this was a side point. >> ...Okay, you know all that anyway, the question is what to do with it. >> If the problem were just that it needed a better implementation and >> some new features added, then maybe we would keep it and let it be >> improved incrementally. But the interface is just wrong, so we'll be >> removing it sooner or later, and it might as well be sooner, rather >> than prolong the agony. > > The simple reason is that there is no alternative for users to use yet > such as pystatsmodels. Well, and this is a philosophical difference, I guess. Personally, as a user, if given the choice between a stats library that was missing many things, but everything there was well-engineered, reliable, documented, etc., versus one that technically had more code but half the things I started to use turned out to be broken, or do something similar-but-different from what I expected, or just weren't documented, then I would choose the first library, no question. And I'd be more likely to contribute to make it more complete, too. It's just easier to work in an area that's not cluttered with broken machinery. 
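(On the numerics aside: the point is only that the normal-equations-plus-inv route squares the condition number of the design, while lstsq solves the original problem through an SVD. A sketch with a deliberately ill-conditioned, made-up design; no specific numbers claimed, just run it and compare:)

import numpy as np

# deliberately ill-conditioned design: a high-order polynomial basis
n = 50
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([t ** i for i in range(8)])
beta_true = np.ones(8)
y = np.dot(X, beta_true)

# textbook-but-fragile route: explicitly invert the normal equations
xtx = np.dot(X.T, X)                      # cond(X'X) is roughly cond(X)**2
beta_inv = np.dot(np.linalg.inv(xtx), np.dot(X.T, y))

# preferred route: an SVD-based solve of the original least-squares problem
beta_lstsq = np.linalg.lstsq(X, y)[0]

print(np.linalg.cond(X), np.linalg.cond(xtx))
print(np.abs(beta_inv - beta_true).max())      # error from the inv route
print(np.abs(beta_lstsq - beta_true).max())    # error from lstsq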
"Add missing stuff following existing style" is a much easier goal to work on than "pick your way through the rubble to find usable pieces and cobble stuff out of them". But that's just me; I can see your perspective too, and don't know the SciPy community's preference. -- Nathaniel From ilanschnell at gmail.com Thu Jun 3 19:28:05 2010 From: ilanschnell at gmail.com (Ilan Schnell) Date: Thu, 3 Jun 2010 18:28:05 -0500 Subject: [SciPy-Dev] import error in scipy.stats on RH3 32-bit In-Reply-To: References: Message-ID: I've just found the problem, and it had nothing to do with checkins that were being made to the 0.8.x branch. After spending may hours on this problem, I'm now very happy that I found the problem. It tured out that the machine I use to build scipy on RH3 32-bit had it's clock set in the past. So the new brach source tarball I made (which is being used on all the build machines), had timestamps which appeared to be in the future on that machine. I'm not sure why exactly timestamps in the future can cause problems when building scipy, but after setting the clock on the machine things work fine now. So it was only by accident this happend on RH3 32-bit, it might have been any other system as well. - Ilan On Wed, Jun 2, 2010 at 10:29 PM, Ilan Schnell wrote: > Not yet. ?I'll look more into it tomorrow. ?:-) > > - Ilan > > On Wed, Jun 2, 2010 at 10:19 PM, Charles R Harris > wrote: >> >> >> On Wed, Jun 2, 2010 at 9:03 PM, Ilan Schnell wrote: >>> >>> Hello Chuck, >>> yes 6446 works. ?Actually, as the error indicates, the unresolved >>> symbol in is linalg/clapack.so, it just happened that during my >>> testing the stats package was imported first, so I initially thought >>> the error was there. >>> However, something has changed between 6446 and 6476, as >>> I wasn't seeing this error before. ?Looking at the revision log of >>> the 0.8.x branch, but I cannot see any obvious. ?And I'm also >>> puzzled why this only happens on one particular platform. >>> To make sure the build environment hasn't changed, I rebuild 6446 >>> on the same system, and it still works. >>> >> >> I hate to ask this of anyone, but... could you determine which revision >> caused the problem? >> >> Sadistical Chuck >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > From d.l.goldsmith at gmail.com Thu Jun 3 22:12:11 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 3 Jun 2010 19:12:11 -0700 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... Message-ID: ...everywhere it occurs? DG -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Jun 3 23:52:10 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 3 Jun 2010 23:52:10 -0400 Subject: [SciPy-Dev] Building docs in scipy? Message-ID: Should numpy/doc/sphinxext be distributed with scipy/doc/ or is this user error? I couldn't get the scipy docs to build until I copied it over. Skipper -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.l.goldsmith at gmail.com Fri Jun 4 01:04:21 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 3 Jun 2010 22:04:21 -0700 Subject: [SciPy-Dev] Marathon Skypecon tomorrow? Message-ID: Email me your Skype ID if you want to participate tomorrow, noon EDT. If no one emails me, I'll post a cancellation notice around 11:50 am EDT. DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From klrkdekira at gmail.com Fri Jun 4 01:40:41 2010 From: klrkdekira at gmail.com (CL Chow) Date: Fri, 4 Jun 2010 13:40:41 +0800 Subject: [SciPy-Dev] Marathon Skypecon tomorrow? In-Reply-To: References: Message-ID: My Skype ID is klrk_c You can ignore mine if no one else emails you, because I'll only be there as audience. Regards, CL Chow "Please do not send me Microsoft Office/Apple iWork documents. Send OpenDocument instead! http://fsf.org/campaigns/opendocument/" On Fri, Jun 4, 2010 at 1:04 PM, David Goldsmith wrote: > Email me your Skype ID if you want to participate tomorrow, noon EDT. If > no one emails me, I'll post a cancellation notice around 11:50 am EDT. > > DG > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jun 4 02:01:18 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 3 Jun 2010 23:01:18 -0700 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: Message-ID: Hi, On Thu, Jun 3, 2010 at 7:12 PM, David Goldsmith wrote: > ...everywhere it occurs? I seem to remember I did put TMs all over the place in the matlab reader code. I did a brief scan of: http://en.wikipedia.org/wiki/Trademark http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style_%28trademarks%29 I see that the wikipedia style guide says (above) not to use TM etc. I guess the principle is that, when we use the term 'matlab' it should always be clear we are referring to the product made by Mathworks. I don't think adding TM will have much impact on that and it looks a bit goofy. My vote would be to remove all the TMs, and maybe add a couple of footnotes in sensible places with 'matlab is a trademark of Mathworks'. Best, Matthew From david.kirkby at onetel.net Fri Jun 4 03:03:33 2010 From: david.kirkby at onetel.net (Dr. David Kirkby) Date: Fri, 04 Jun 2010 08:03:33 +0100 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: Message-ID: <4C08A545.1040604@onetel.net> On 06/ 4/10 07:01 AM, Matthew Brett wrote: > Hi, > > On Thu, Jun 3, 2010 at 7:12 PM, David Goldsmith wrote: >> ...everywhere it occurs? > > I seem to remember I did put TMs all over the place in the matlab reader code. > IMHO it should be called MATLAB and not Matlab since that is what Mathworks call it. As for the TM, I tend to agree, it is pretty irrelevant, though I am not a lawyer. It might be better from a legal point to leave them there. Just my 2p Dave From matthew.brett at gmail.com Fri Jun 4 03:07:52 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 4 Jun 2010 00:07:52 -0700 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: <4C08A545.1040604@onetel.net> References: <4C08A545.1040604@onetel.net> Message-ID: Hi, > IMHO it should be called MATLAB and not Matlab since that is what Mathworks call > it. 
>From the wikipedia style guide, I don't think we're obliged to capitalize the way the Mathworks would like, and we can choose whatever reads better. > As for the TM, I tend to agree, it is pretty irrelevant, though I am not a > lawyer. It might be better from a legal point to leave them there. I am not a lawyer either, but it looks as though the key principle is fair use. Fair use means - in our case - that when we say 'matlab' - we mean the Matlab program written by the Mathworks. If that's obvious from the context, I don't think we need the TM, and if it isn't, I don't think the TM helps much (whose TM?). I might be wrong though, Cheers, Matthew From d.l.goldsmith at gmail.com Fri Jun 4 03:18:52 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 4 Jun 2010 00:18:52 -0700 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: <4C08A545.1040604@onetel.net> Message-ID: Thanks, guys. Not that your opinions aren't valuable, but it is a matter of legality that I'm concerned about, and we do have people on-list who seem to make it their business to worry about these things, so hopefully one of them will chime in as well. DG On Fri, Jun 4, 2010 at 12:07 AM, Matthew Brett wrote: > Hi, > > > IMHO it should be called MATLAB and not Matlab since that is what > Mathworks call > > it. > > >From the wikipedia style guide, I don't think we're obliged to > capitalize the way the Mathworks would like, and we can choose > whatever reads better. > > > As for the TM, I tend to agree, it is pretty irrelevant, though I am not > a > > lawyer. It might be better from a legal point to leave them there. > > I am not a lawyer either, but it looks as though the key principle is > fair use. Fair use means - in our case - that when we say 'matlab' - > we mean the Matlab program written by the Mathworks. If that's > obvious from the context, I don't think we need the TM, and if it > isn't, I don't think the TM helps much (whose TM?). I might be wrong > though, > > Cheers, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jun 4 03:56:14 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 4 Jun 2010 07:56:14 +0000 (UTC) Subject: [SciPy-Dev] Building docs in scipy? References: Message-ID: Thu, 03 Jun 2010 23:52:10 -0400, Skipper Seabold wrote: > Should numpy/doc/sphinxext be distributed with scipy/doc/ or is this > user error? I couldn't get the scipy docs to build until I copied it > over. It's pulled in by svn:externals. With git, ymmv. It probably should be included in the distribution tarballs, nevertheless. -- Pauli Virtanen From david.kirkby at onetel.net Fri Jun 4 04:41:42 2010 From: david.kirkby at onetel.net (Dr. David Kirkby) Date: Fri, 04 Jun 2010 09:41:42 +0100 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: <4C08A545.1040604@onetel.net> Message-ID: <4C08BC46.4030006@onetel.net> On 06/ 4/10 08:07 AM, Matthew Brett wrote: > Hi, > >> IMHO it should be called MATLAB and not Matlab since that is what Mathworks call >> it. 
> >> From the wikipedia style guide, I don't think we're obliged to > capitalize the way the Mathworks would like, and we can choose > whatever reads better. If someone came along and changed SciPy to scipy, would you feel it appropriate to change it back? I suspect "yes" is the answer. As such, why not respect Mathworks and write MATLAB the prefer to write it? BTW, I don't even use MATLAB, so I'm not a MATLAB employee or similar! Dave From matthew.brett at gmail.com Fri Jun 4 06:30:36 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 4 Jun 2010 11:30:36 +0100 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: <4C08A545.1040604@onetel.net> Message-ID: Hi, On Fri, Jun 4, 2010 at 8:18 AM, David Goldsmith wrote: > Thanks, guys.? Not that your opinions aren't valuable, but it is a matter of > legality that I'm concerned about, and we do have people on-list who seem to > make it their business to worry about these things, so hopefully one of them > will chime in as well. ;) - ah yes - it is an art that can take an age to learn, to distinguish signal from noise ! Matthew From pav at iki.fi Fri Jun 4 06:53:23 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 4 Jun 2010 10:53:23 +0000 (UTC) Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... References: <4C08A545.1040604@onetel.net> <4C08BC46.4030006@onetel.net> Message-ID: Fri, 04 Jun 2010 09:41:42 +0100, Dr. David Kirkby wrote: > On 06/ 4/10 08:07 AM, Matthew Brett wrote: >> Hi, >> >>> IMHO it should be called MATLAB and not Matlab since that is what >>> Mathworks call it. >> >>> From the wikipedia style guide, I don't think we're obliged to >> capitalize the way the Mathworks would like, and we can choose whatever >> reads better. > > If someone came along and changed SciPy to scipy, would you feel it > appropriate to change it back? I suspect "yes" is the answer. As such, > why not respect Mathworks and write MATLAB the prefer to write it? If Mathworks decided to refer to SciPy as SCIPY or Scipy or scipy, I would hardly be inclined to correct them, much less raise a lawsuit. The point is that as long as with "Matlab" we are referring to the "MATLAB" produced by Mathworks, not a hypotethical product of our own called "Matlab" or "scipy.matlab", this cannot be a trademark infringement. IANAL, of course, but it seems clear that the exact spelling is hardly an issue of concern. IMHO English proper name capitalization trumps the "official" spelling, but this is not very important. -- Pauli Virtanen From josef.pktd at gmail.com Fri Jun 4 07:21:50 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 4 Jun 2010 07:21:50 -0400 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: <4C08A545.1040604@onetel.net> <4C08BC46.4030006@onetel.net> Message-ID: On Fri, Jun 4, 2010 at 6:53 AM, Pauli Virtanen wrote: > Fri, 04 Jun 2010 09:41:42 +0100, Dr. David Kirkby wrote: > >> On 06/ 4/10 08:07 AM, Matthew Brett wrote: >>> Hi, >>> >>>> IMHO it should be called MATLAB and not Matlab since that is what >>>> Mathworks call it. >>> >>>> From the wikipedia style guide, I don't think we're obliged to >>> capitalize the way the Mathworks would like, and we can choose whatever >>> reads better. >> >> If someone came along and changed SciPy to scipy, would you feel it >> appropriate to change it back? I suspect "yes" is the answer. As such, >> why not respect Mathworks and write MATLAB the prefer to write it? 
> > If Mathworks decided to refer to SciPy as SCIPY or Scipy or scipy, I > would hardly be inclined to correct them, much less raise a lawsuit. > > The point is that as long as with "Matlab" we are referring to the > "MATLAB" produced by Mathworks, not a hypotethical product of our own > called "Matlab" or "scipy.matlab", this cannot be a trademark > infringement. IANAL, of course, but it seems clear that the exact > spelling is hardly an issue of concern. > > IMHO English proper name capitalization trumps the "official" spelling, > but this is not very important. Additionally, the reference is often to matlab as a programming language, such as programs written in matlab, where the author is not Mathworks. We don't add a trademark sign to C# or Java, or Gauss or .. either. Josef > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ralf.gommers at googlemail.com Fri Jun 4 07:30:57 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 4 Jun 2010 19:30:57 +0800 Subject: [SciPy-Dev] Building docs in scipy? In-Reply-To: References: Message-ID: On Fri, Jun 4, 2010 at 3:56 PM, Pauli Virtanen wrote: > Thu, 03 Jun 2010 23:52:10 -0400, Skipper Seabold wrote: > > Should numpy/doc/sphinxext be distributed with scipy/doc/ or is this > > user error? I couldn't get the scipy docs to build until I copied it > > over. > > It's pulled in by svn:externals. With git, ymmv. It probably should be > included in the distribution tarballs, nevertheless. > > Matthew helpfully pointed out some options (thanks!) to do this in git: http://news.gmane.org/gmane.comp.python.scientific.devel However, none of the options he gave are automatic, so people will keep running into this. I think building docs should work out of the box, so I see 2 options: 1. we copy sphinxext in scipy. it's not like it changes often, so keeping things in sync manually is doable. 2. sphinxext becomes a numpy module that can be imported and used from scipy. Including it in the tarballs while it's not in the repo is not a good idea imho - tarballs should be exactly the same as the svn tag. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Fri Jun 4 08:26:31 2010 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 04 Jun 2010 08:26:31 -0400 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: <4C08A545.1040604@onetel.net> References: <4C08A545.1040604@onetel.net> Message-ID: <4C08F0F7.3070204@american.edu> On 6/4/2010 3:03 AM, Dr. David Kirkby wrote: > As for the TM, I tend to agree, it is pretty irrelevant, though I am not a > lawyer. It might be better from a legal point to leave them there. I think the usual rule is that the first use should show the trademark. I cannot find that explicitly as law, but see e.g. http://www.filemaker.com/company/legal/trademark_guidelines.html The key role of the TM symbol is to avoid confusion about branding, so in the case of MATLAB, this actually should be more than adequate, since it is a clear case of nominative use: http://en.wikipedia.org/wiki/Nominative_use fwiw, Alan Isaac From pav at iki.fi Fri Jun 4 08:47:02 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 4 Jun 2010 12:47:02 +0000 (UTC) Subject: [SciPy-Dev] Building docs in scipy? References: Message-ID: Fri, 04 Jun 2010 19:30:57 +0800, Ralf Gommers wrote: [clip] >> Matthew helpfully pointed out some options (thanks!) 
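In the meantime, the usual workaround for Skipper's original problem is to point the Sphinx conf.py at a local copy or checkout of numpy/doc/sphinxext before listing the extension. A rough sketch only -- the '../sphinxext' path is an assumption, adjust it to wherever that checkout actually lives:

    # sketch of the relevant part of a doc/source/conf.py
    import os
    import sys

    # assumed location of a copy or checkout of numpy/doc/sphinxext
    sys.path.insert(0, os.path.abspath('../sphinxext'))

    extensions = [
        'sphinx.ext.autodoc',
        'numpydoc',   # provided by numpy/doc/sphinxext
    ]
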
to do this in git: > http://news.gmane.org/gmane.comp.python.scientific.devel > > However, none of the options he gave are automatic, so people will keep > running into this. I think building docs should work out of the box, so > I see 2 options: > > 1. we copy sphinxext in scipy. it's not like it changes often, so > keeping things in sync manually is doable. I'd say this is bad practice, and we should not do this. Better to have a Makefile rule that checks it out from git before building the documents, if it comes to that. > 2. sphinxext becomes a numpy module that can be imported and used from > scipy. > > Including it in the tarballs while it's not in the repo is not a good > idea imho - tarballs should be exactly the same as the svn tag. Matplotlib moved their sphinx stuff into a submodule. I'm a bit leery doing that either, since the Sphinx stuff has not much to do with Numpy itself... -- Pauli Virtanen From vincent at vincentdavis.net Fri Jun 4 08:51:42 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Fri, 4 Jun 2010 06:51:42 -0600 Subject: [SciPy-Dev] Marathon Skypecon tomorrow? In-Reply-To: References: Message-ID: I will not be able to be there. Thanks Vincent On Thu, Jun 3, 2010 at 11:04 PM, David Goldsmith wrote: > Email me your Skype ID if you want to participate tomorrow, noon EDT.? If no > one emails me, I'll post a cancellation notice around 11:50 am EDT. > > DG > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From jsseabold at gmail.com Fri Jun 4 09:05:25 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 4 Jun 2010 09:05:25 -0400 Subject: [SciPy-Dev] Building docs in scipy? In-Reply-To: References: Message-ID: On Fri, Jun 4, 2010 at 7:30 AM, Ralf Gommers wrote: > > > On Fri, Jun 4, 2010 at 3:56 PM, Pauli Virtanen wrote: > >> Thu, 03 Jun 2010 23:52:10 -0400, Skipper Seabold wrote: >> > Should numpy/doc/sphinxext be distributed with scipy/doc/ or is this >> > user error? I couldn't get the scipy docs to build until I copied it >> > over. >> >> Ah, ok. I just got one of Pauli's branches (or whatever they are called in git!) off github to look at the optimization rewrite, so that explains that. I didn't think I had to to this before. > It's pulled in by svn:externals. With git, ymmv. It probably should be >> included in the distribution tarballs, nevertheless. >> >> Matthew helpfully pointed out some options (thanks!) to do this in git: > http://news.gmane.org/gmane.comp.python.scientific.devel > > However, none of the options he gave are automatic, so people will keep > running into this. I think building docs should work out of the box, so I > see 2 options: > 1. we copy sphinxext in scipy. it's not like it changes often, so keeping > things in sync manually is doable. > 2. sphinxext becomes a numpy module that can be imported and used from > scipy. > I would vote for 2 if possible because I also use this stuff and have just been copying it over by hand for now. Thanks, Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Jun 4 09:06:39 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 4 Jun 2010 15:06:39 +0200 Subject: [SciPy-Dev] Building docs in scipy? In-Reply-To: References: Message-ID: <20100604130639.GD29814@phare.normalesup.org> On Fri, Jun 04, 2010 at 09:05:25AM -0400, Skipper Seabold wrote: > 2. 
sphinxext becomes a numpy module that can be imported and used from > scipy. > I would vote for 2 if possible because I also use this stuff and have just > been copying it over by hand for now. +1. Ga?l From jdh2358 at gmail.com Fri Jun 4 09:24:51 2010 From: jdh2358 at gmail.com (John Hunter) Date: Fri, 4 Jun 2010 08:24:51 -0500 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: <4C08A545.1040604@onetel.net> Message-ID: On Fri, Jun 4, 2010 at 5:30 AM, Matthew Brett wrote: > Hi, > > On Fri, Jun 4, 2010 at 8:18 AM, David Goldsmith wrote: >> Thanks, guys.? Not that your opinions aren't valuable, but it is a matter of >> legality that I'm concerned about, and we do have people on-list who seem to >> make it their business to worry about these things, so hopefully one of them >> will chime in as well. > > ;) - ah yes - it is an art that can take an age to learn, to > distinguish signal from noise ! Yes, young Jedi, we are not seeking your Wikipedia skills here. Please do not bother to research and answer the questions we pose. We are looking for an official IANAL/YMMV judgment. JDH From ralf.gommers at googlemail.com Fri Jun 4 10:11:42 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 4 Jun 2010 22:11:42 +0800 Subject: [SciPy-Dev] signal.ltisys test crashes on windows Message-ID: The offending test, (0, 3, 3) of: class TestSS2TF: def tst_matrix_shapes(self, p, q, r): ss2tf(np.zeros((p, p)), np.zeros((p, q)), np.zeros((r, p)), np.zeros((r, q)), 0) def test_basic(self): for p, q, r in [ (3, 3, 3), (0, 3, 3), (1, 1, 1)]: yield self.tst_matrix_shapes, p, q, r The 0 causes an empty array to be passed to ss2tf, which crashes the interpreter on Windows XP for both 2.5 and 2.6. Is the empty array really what was intended with this test? Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jun 4 10:20:01 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 4 Jun 2010 10:20:01 -0400 Subject: [SciPy-Dev] signal.ltisys test crashes on windows In-Reply-To: References: Message-ID: On Fri, Jun 4, 2010 at 10:11 AM, Ralf Gommers wrote: > The offending test, (0, 3, 3) of: > > class TestSS2TF: > ??? def tst_matrix_shapes(self, p, q, r): > ??????? ss2tf(np.zeros((p, p)), > ????????????? np.zeros((p, q)), > ????????????? np.zeros((r, p)), > ????????????? np.zeros((r, q)), 0) > > ??? def test_basic(self): > ??????? for p, q, r in [ > ??????????? (3, 3, 3), > ??????????? (0, 3, 3), > ??????????? (1, 1, 1)]: > ??????????? yield self.tst_matrix_shapes, p, q, r > > > The 0 causes an empty array to be passed to ss2tf, which crashes the > interpreter on Windows XP for both 2.5 and 2.6. Is the empty array really > what was intended with this test? replace > (0, 3, 3), by > (1, 3, 3), We had a recent threads about this, and the crash is avoided with numpy trunk (raises an exception instead) But I don't think the empty array is an appropriate test. Josef > > Thanks, > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From guyer at nist.gov Fri Jun 4 10:21:05 2010 From: guyer at nist.gov (Jonathan Guyer) Date: Fri, 4 Jun 2010 10:21:05 -0400 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: Message-ID: On Jun 3, 2010, at 10:12 PM, David Goldsmith wrote: > ...everywhere it occurs? 
Our rules are obviously not your rules, but the NIST Editorial Review Board explicitly prohibits the use if (TM) and the like in NIST publications (we are discouraged from using trade names at all, unless necessary to specify the "experimental" apparatus (I'm pretty sure {s,S}ci{p,P}y's usage of "{m,M}{atlab,ATLAB}" would be considered acceptable, since {m,M}{atlab,ATLAB} compatibility is the point)). My understanding from when I served on the Board is that "(TM)" carries no legal weight at all (anybody can affix it to a name they "claim") and that although "(R)" does carry legal weight, it was not considered our responsibility to defend other people's trademarks. In fact, we are emphatically required to use "(R)" with NIST registered trademarks (e.g., "Standard Reference Material(R)") and banned from using it with anybody else's trademarks, registered or otherwise. None of this should be construed as any official NIST guidance as to what *you* should do, only my understanding of what *I* am supposed to do. From warren.weckesser at enthought.com Fri Jun 4 10:31:05 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 04 Jun 2010 09:31:05 -0500 Subject: [SciPy-Dev] signal.ltisys test crashes on windows In-Reply-To: References: Message-ID: <4C090E29.1050506@enthought.com> josef.pktd at gmail.com wrote: > On Fri, Jun 4, 2010 at 10:11 AM, Ralf Gommers > wrote: > >> The offending test, (0, 3, 3) of: >> >> class TestSS2TF: >> def tst_matrix_shapes(self, p, q, r): >> ss2tf(np.zeros((p, p)), >> np.zeros((p, q)), >> np.zeros((r, p)), >> np.zeros((r, q)), 0) >> >> def test_basic(self): >> for p, q, r in [ >> (3, 3, 3), >> (0, 3, 3), >> (1, 1, 1)]: >> yield self.tst_matrix_shapes, p, q, r >> >> >> The 0 causes an empty array to be passed to ss2tf, which crashes the >> interpreter on Windows XP for both 2.5 and 2.6. Is the empty array really >> what was intended with this test? >> > > replace > > >> (0, 3, 3), >> > by > >> (1, 3, 3), >> > > Agreed. 0 is a degenerate case. Perhaps the original author of the test expected ss2tf to handle this case cleanly, but it currently doesn't. Warren > We had a recent threads about this, and the crash is avoided with > numpy trunk (raises an exception instead) > > But I don't think the empty array is an appropriate test. > > Josef > > > >> Thanks, >> Ralf >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From d.l.goldsmith at gmail.com Fri Jun 4 11:11:53 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 4 Jun 2010 08:11:53 -0700 Subject: [SciPy-Dev] Does Matlab need to be MATLAB(TM)... In-Reply-To: References: Message-ID: On Fri, Jun 4, 2010 at 7:21 AM, Jonathan Guyer wrote: > > On Jun 3, 2010, at 10:12 PM, David Goldsmith wrote: > > > ...everywhere it occurs? > > Our rules are obviously not your rules, but the NIST Editorial Review Board > explicitly prohibits the use if (TM) and the like in NIST publications (we > are discouraged from using trade names at all, unless necessary to specify > the "experimental" apparatus (I'm pretty sure {s,S}ci{p,P}y's usage of > "{m,M}{atlab,ATLAB}" would be considered acceptable, since > {m,M}{atlab,ATLAB} compatibility is the point)). 
> > My understanding from when I served on the Board is that "(TM)" carries no > legal weight at all (anybody can affix it to a name they "claim") and that > although "(R)" does carry legal weight, it was not considered our > responsibility to defend other people's trademarks. In fact, we are > emphatically required to use "(R)" with NIST registered trademarks (e.g., > "Standard Reference Material(R)") and banned from using it with anybody > else's trademarks, registered or otherwise. > > None of this should be construed as any official NIST guidance as to what > *you* should do, only my understanding of what *I* am supposed to do. > Understood, but you guys are the "National Institute of Standards and Technology," which is good enough for me - I'm going to cease to worry about it (and in fact take out the TM where I see it). :-) DG > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.czesla at hs.uni-hamburg.de Fri Jun 4 11:49:37 2010 From: stefan.czesla at hs.uni-hamburg.de (Stefan) Date: Fri, 4 Jun 2010 15:49:37 +0000 (UTC) Subject: [SciPy-Dev] =?utf-8?q?np=2Esavetxt=3A_apply_patch_in_enhancement_?= =?utf-8?q?ticket_1079=09to_add_headers=3F?= References: <4C066DA3.8010609@gmail.com> Message-ID: Dear all, as a consequence of our discussion, we developed a patch (attached to ticket 1079), which implements some of the features discussed here. We concentrated on comments and the header. Please have a look at the patch. We are looking forward to hearing your opinion and suggestions, and whether you see any problems, which could prevent it from entering the official release. We agree with Bruce that the format string should be inferred from the data type of the array. Yet, we believe that this point should be addressed in a different patch focussing on that topic. Also we noted that there is no error checking, when an array of dimension larger 2 is handed to np.savetxt, which may be implemented easily. Stefan & Christian From d.l.goldsmith at gmail.com Fri Jun 4 11:56:57 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 4 Jun 2010 08:56:57 -0700 Subject: [SciPy-Dev] Canceling Skypecon again Message-ID: Due to lack of issues requiring live discussion, and illness of host. :-( DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Fri Jun 4 12:03:33 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 4 Jun 2010 12:03:33 -0400 Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? In-Reply-To: References: <4C066DA3.8010609@gmail.com> Message-ID: On Fri, Jun 4, 2010 at 11:49 AM, Stefan wrote: > Dear all, > > as a consequence of our discussion, we developed a patch (attached to > ticket 1079), which implements some of the features discussed here. > We concentrated on comments and the header. Please have a look at the > patch. We are looking forward to hearing your opinion and suggestions, > and whether you see any problems, which could prevent it from entering the > official release. 
> Link: http://projects.scipy.org/numpy/ticket/1079 One comment. Maybe you can add in the notes that the comment keyword can be used to write a header and still preserve compatibility with loadtxt. This wasn't obvious to me at first, though maybe that's just me. Other than that I think it looks like a good first effort towards making this a better function and I appreciate the attention here. Skipper > We agree with Bruce that the format string should be inferred from the > data type of the array. Yet, we believe that this point should be > addressed in a different patch focussing on that topic. > > Also we noted that there is no error checking, when an array of dimension > larger 2 is handed to np.savetxt, which may be implemented easily. > > Stefan & Christian > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From warren.weckesser at enthought.com Fri Jun 4 12:29:53 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 04 Jun 2010 11:29:53 -0500 Subject: [SciPy-Dev] ODEINT/ODE solvers redesign--anyone for a sprint at SciPy 2010? Message-ID: <4C092A01.9040905@enthought.com> It's about time we tackled the issue of the ODE solvers in SciPy. Some notes about the issue are on the wiki: http://projects.scipy.org/scipy/wiki/OdeintRedesign This would be a great topic for a sprint at the SciPy conference. I just added it to the list of suggested sprint topics, so give it a vote if you are going to be there and are interested in working on this. Warren From bsouthey at gmail.com Fri Jun 4 13:08:12 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 04 Jun 2010 12:08:12 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C07ADC1.6040504@enthought.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> Message-ID: <4C0932FC.2020108@gmail.com> On 06/03/2010 08:27 AM, Warren Weckesser wrote: > Just letting you know that I'm not ignoring all the great comments from > josef, Neil and Bruce about my suggestion for chisquare_contingency. > Unfortunately, I won't have time to think about all the deeper > suggestions for another week or so. For now, I'll just say that I > agree with josef's and Neil's suggestions for the docstring, and that > Neil's summary of the function as simply a convenience function that > calls stats.chisquare with appropriate arguments to perform a test of > independence on a contingency table is exactly what I had in mind. > > Warren > > > Hi, I looked at how SAS handles n-way tables. What it appears to do is break the original table down into a set of 2-way tables and does the analysis on each of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the results of each 4 by 5 table presented. I do not know how Stata and R analysis analyze n-way tables. Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way tables by using recursion. There should be some Python way to do that recursion for any number of dimensions. I also added the 1-way table (but that has a different hypothesis than the 2-way table) so users can send a 1-d table. The data used is from two SAS examples and I added a dimension to get a 4-way table. 
I included the SAS values but these are only to 4 decimal places for reference. http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm What is missing: 1) Docstring and tests but those are dependent what is ultimately decided 2) Other test statistics but scipy.stats versions are not very friendly in that these do not accept a 2-d array 3) A way to do recursion 4) Ability to label the levels etc. 5) Correct handling of input types. Bruce -------------- next part -------------- A non-text attachment was scrubbed... Name: cont_table.py Type: text/x-python Size: 4300 bytes Desc: not available URL: From bsouthey at gmail.com Fri Jun 4 14:08:15 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 04 Jun 2010 13:08:15 -0500 Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? In-Reply-To: References: <4C066DA3.8010609@gmail.com> Message-ID: <4C09410F.1010900@gmail.com> On 06/04/2010 11:03 AM, Skipper Seabold wrote: > On Fri, Jun 4, 2010 at 11:49 AM, Stefan wrote: > >> Dear all, >> >> as a consequence of our discussion, we developed a patch (attached to >> ticket 1079), which implements some of the features discussed here. >> We concentrated on comments and the header. Please have a look at the >> patch. We are looking forward to hearing your opinion and suggestions, >> and whether you see any problems, which could prevent it from entering the >> official release. >> >> > Link: http://projects.scipy.org/numpy/ticket/1079 > > One comment. Maybe you can add in the notes that the comment keyword > can be used to write a header and still preserve compatibility with > loadtxt. This wasn't obvious to me at first, though maybe that's just > me. > > Other than that I think it looks like a good first effort towards > making this a better function and I appreciate the attention here. > > Skipper > > >> We agree with Bruce that the format string should be inferred from the >> data type of the array. Yet, we believe that this point should be >> addressed in a different patch focussing on that topic. >> >> Also we noted that there is no error checking, when an array of dimension >> larger 2 is handed to np.savetxt, which may be implemented easily. >> >> Stefan& Christian >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Hi, For the sake of similarity to loadtxt keywords (because loadtxt has them and changing those is harder than adding new ones to savetxt): 1) 'comment_character' should be 'comments' 2) instead of 'comment' perhaps use 'preamble' Thanks for doing the patch so quickly! 
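As a reference point while the patch is reviewed, the header-plus-loadtxt round trip Skipper mentions can already be approximated with the released functions, by writing the commented header by hand before calling savetxt. A minimal sketch (file name and column labels are made up):

    import numpy as np

    data = np.column_stack((np.arange(5), np.arange(5) ** 2))

    # Write a '#'-prefixed header line first, then let savetxt append the data.
    # Because '#' is loadtxt's default comment character, the header is
    # skipped on reading and the file round-trips cleanly.
    fh = open('example.txt', 'w')
    fh.write('# x  x_squared\n')
    np.savetxt(fh, data, fmt='%d')
    fh.close()

    back = np.loadtxt('example.txt')   # shape (5, 2), header line ignored
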
Bruce From josef.pktd at gmail.com Fri Jun 4 14:12:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 4 Jun 2010 14:12:06 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C0932FC.2020108@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> <4C0932FC.2020108@gmail.com> Message-ID: On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey wrote: > On 06/03/2010 08:27 AM, Warren Weckesser wrote: >> >> Just letting you know that I'm not ignoring all the great comments from >> josef, Neil and Bruce about my suggestion for chisquare_contingency. >> Unfortunately, I won't have time to think about all the deeper >> suggestions for another week or so. ? For now, I'll just say that I >> agree with josef's and Neil's suggestions for the docstring, and that >> Neil's summary of the function as simply a convenience function that >> calls stats.chisquare with appropriate arguments to perform a test of >> independence on a contingency table is exactly what I had in mind. >> >> Warren >> >> >> > > Hi, > I looked at how SAS handles n-way tables. What it appears to do is break the > original table down into a set of 2-way tables and does the analysis on each > of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the > results of each 4 by 5 table presented. I do not know how Stata and R > analysis analyze n-way tables. > > Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way > tables by using recursion. There should be some Python way to do that > recursion for any number of dimensions. I also added the 1-way table (but > that has a different hypothesis than the 2-way table) so users can send a > 1-d table. (very briefly because I don't have much time today) I think, these are good extensions, but to handle all cases, the function is getting too large and would need several options. On your code and SAS, Z(correct me if my quick reading is wrong) You seem to be calculating conditional independence for the last two variables conditional on the values of the first variables. I think this could be generalized to all pairwise independence tests. Similar, I'm a bit surprised that SAS uses conditional and not marginal independence, I would have thought that the test for marginal independence (aggregate out all but 2 variables) would be the more common use case. Initially, I was thinking just about independence of all variables in a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z) My opinion is that these variations of tests would fit better in a class where all pairwise conditional, and marginal and joint hypotheses can be supplied as methods, or split it up into a group of functions. Thanks, Josef > > The data used is from two SAS examples and I added a dimension to get a > 4-way table. I included the SAS values but these are only to 4 decimal > places for reference. 
> > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm > > What is missing: > 1) Docstring and tests but those are dependent what is ultimately decided > 2) Other test statistics but scipy.stats versions are not very friendly in > that these do not accept a 2-d array > 3) A way to do recursion > 4) Ability to label the levels etc. > 5) Correct handling of input types. > > Bruce > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From vincent at vincentdavis.net Fri Jun 4 20:49:27 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Fri, 4 Jun 2010 18:49:27 -0600 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array Message-ID: Is there a reason not to have this return an error. >>> a1 = np.array([(1,3.3),(2,4.4)], dtype=[('a', int),('b', float)]) >>> a1 array([(1, 3.2999999999999998), (2, 4.4000000000000004)], dtype=[('a', '>> a1.dtype=float >>> a1 array([ 4.94065646e-324, 3.30000000e+000, 9.88131292e-324, 4.40000000e+000]) It seems that this could really cause problems if you did not notice what was going on. Vincent From ralf.gommers at googlemail.com Sat Jun 5 06:45:48 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 5 Jun 2010 18:45:48 +0800 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 Message-ID: I'm pleased to announce the first beta release of SciPy 0.8.0. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. This beta release comes almost one and a half year after the 0.7.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.8.0b1 requires Python 2.4 or greater and NumPy 1.4.1 or greater. For information, please see the release notes: http://sourceforge.net/projects/scipy/files/scipy/0.8.0b1/NOTES.txt/view You can download the release from here: https://sourceforge.net/projects/scipy/ Python 2.5/2.6 binaries for Windows and OS X are available as well as source tarballs for other platforms. Thank you to everybody who contributed to this release. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sat Jun 5 04:22:04 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 5 Jun 2010 01:22:04 -0700 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array In-Reply-To: References: Message-ID: On 4 June 2010 17:49, Vincent Davis wrote: > Is there a reason not to have this return an error. >>>> a1 = np.array([(1,3.3),(2,4.4)], dtype=[('a', int),('b', float)]) >>>> a1 > array([(1, 3.2999999999999998), (2, 4.4000000000000004)], > ? ? ?dtype=[('a', '>>> a1.dtype=float >>>> a1 > array([ ?4.94065646e-324, ? 3.30000000e+000, ? 9.88131292e-324, > ? ? ? ? 4.40000000e+000]) This is a feature! Sometimes, it is handy to view the raw memory in different ways. You are probably looking for the "astype" method. 
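For what it's worth, a minimal sketch of the difference, reusing the structured array from above (nothing here is new API, just the two operations side by side):

    import numpy as np

    a1 = np.array([(1, 3.3), (2, 4.4)], dtype=[('a', int), ('b', float)])

    # view: reinterpret the same bytes with another dtype, no copy is made.
    # The integer field's bytes are read back as float64, which is where
    # "garbage" values like 4.94e-324 come from.
    raw = a1.view(np.float64)

    # conversion: pull the fields out and cast them, which copies the data
    values = np.column_stack((a1['a'].astype(np.float64), a1['b']))
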
Regards St?fan From d.l.goldsmith at gmail.com Sat Jun 5 03:11:02 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 5 Jun 2010 00:11:02 -0700 Subject: [SciPy-Dev] A doc-related, check-in-related request Message-ID: Hi! If you add documented code to NumPy or SciPy, the Wiki will pull the docstring and will give it the status of "Needs editing," even if you have supplied a "Needs review"-quality docstring. Also, even if your docstring isn't "Needs review"-quality, you, as code writer, are presumably the best person to "own" the docstring, be it for the purpose of finishing it later or serving as a reference for someone else to do so. So, I make the following general request: a few days after you commit your code (give it a few days because the Wiki doesn't always pull right away), please visit your new committed objects in the Wiki and do one of two things: if you feel the docstring is "finished," please go ahead and promote it to "Needs review" status; if you feel the docstring is unfinished, please "claim" it by editing it (if you don't have time for substantive edits, you can just add a line break or something similarly trivial, just something so that the Wiki will record you as having made an edit), which in turn will automatically promote it to "Being written" (which alerts others to check the log to see if someone else is working on the docstring). This way, new docstrings don't make our progress look, in the Wiki, like regress. Thanks! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Sat Jun 5 11:09:01 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 5 Jun 2010 09:09:01 -0600 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array In-Reply-To: References: Message-ID: 2010/6/5 St?fan van der Walt : > On 4 June 2010 17:49, Vincent Davis wrote: >> Is there a reason not to have this return an error. >>>>> a1 = np.array([(1,3.3),(2,4.4)], dtype=[('a', int),('b', float)]) >>>>> a1 >> array([(1, 3.2999999999999998), (2, 4.4000000000000004)], >> ? ? ?dtype=[('a', '>>>> a1.dtype=float >>>>> a1 >> array([ ?4.94065646e-324, ? 3.30000000e+000, ? 9.88131292e-324, >> ? ? ? ? 4.40000000e+000]) > > This is a feature! ?Sometimes, it is handy to view the raw memory in > different ways. Out of curiosity how would I use this? Thanks Vincent > > You are probably looking for the "astype" method. > > Regards > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pav at iki.fi Sat Jun 5 11:17:15 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 5 Jun 2010 15:17:15 +0000 (UTC) Subject: [SciPy-Dev] Clarification: is the Extended Summary section optional? References: Message-ID: Wed, 02 Jun 2010 20:09:39 -0600, Vincent Davis wrote: [clip] > As I am always interested in learning new things is there any help I can > offer in getting the wiki review feature implemented? Yes, definitely, any help here is appreciated! The list of issues to do has grown a bit long, as I haven't found sufficient time to tackle them :/ If you are not yet familiar with Django, the following will be helpful: http://docs.djangoproject.com/en/1.2/ Myself, I'd start by going hands-on through their excellent tutorial before diving into Pydocweb. 
The doc editor itself is not a very special as a Django app, and follows the usual Django conventions, so the tutorial should make several things more clear. (Unfortunately the app was a bit hastily cobbled together, and this shows at some points.) If you have specific questions, feel free to ask! -- Pauli Virtanen From jsseabold at gmail.com Sat Jun 5 11:20:05 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 5 Jun 2010 11:20:05 -0400 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array In-Reply-To: References: Message-ID: On Sat, Jun 5, 2010 at 11:09 AM, Vincent Davis wrote: > 2010/6/5 St?fan van der Walt : >> On 4 June 2010 17:49, Vincent Davis wrote: >>> Is there a reason not to have this return an error. >>>>>> a1 = np.array([(1,3.3),(2,4.4)], dtype=[('a', int),('b', float)]) >>>>>> a1 >>> array([(1, 3.2999999999999998), (2, 4.4000000000000004)], >>> ? ? ?dtype=[('a', '>>>>> a1.dtype=float >>>>>> a1 >>> array([ ?4.94065646e-324, ? 3.30000000e+000, ? 9.88131292e-324, >>> ? ? ? ? 4.40000000e+000]) >> >> This is a feature! ?Sometimes, it is handy to view the raw memory in >> different ways. > You might find this thread helpful. Especially, Chris's reply. http://thread.gmane.org/gmane.comp.python.numeric.general/32664/ Skipper From vincent at vincentdavis.net Sat Jun 5 11:49:09 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 5 Jun 2010 09:49:09 -0600 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array In-Reply-To: References: Message-ID: On Sat, Jun 5, 2010 at 9:20 AM, Skipper Seabold wrote: > On Sat, Jun 5, 2010 at 11:09 AM, Vincent Davis wrote: >> 2010/6/5 St?fan van der Walt : >>> On 4 June 2010 17:49, Vincent Davis wrote: >>>> Is there a reason not to have this return an error. >>>>>>> a1 = np.array([(1,3.3),(2,4.4)], dtype=[('a', int),('b', float)]) >>>>>>> a1 >>>> array([(1, 3.2999999999999998), (2, 4.4000000000000004)], >>>> ? ? ?dtype=[('a', '>>>>>> a1.dtype=float >>>>>>> a1 >>>> array([ ?4.94065646e-324, ? 3.30000000e+000, ? 9.88131292e-324, >>>> ? ? ? ? 4.40000000e+000]) >>> >>> This is a feature! ?Sometimes, it is handy to view the raw memory in >>> different ways. >> > > You might find this thread helpful. ?Especially, Chris's reply. > > http://thread.gmane.org/gmane.comp.python.numeric.general/32664/ Ok now I understand why, that is why the numbers are a mess but not why it is a feature :) I guess I am try to think of why I would use this. I might be completely wrong but if most users would expect a different behavior and don't notice what is actually happening then maybe there should be a warning and a different way to get the current results. Vincent > > Skipper > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From stefan at sun.ac.za Sat Jun 5 23:08:18 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 5 Jun 2010 20:08:18 -0700 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array In-Reply-To: References: Message-ID: On 5 June 2010 08:49, Vincent Davis wrote: >> You might find this thread helpful. ?Especially, Chris's reply. >> >> http://thread.gmane.org/gmane.comp.python.numeric.general/32664/ > > Ok now I understand why, that is why the numbers are a mess but not > why it is a feature :) I guess I am try to think of why I would use > this. 
I might be completely wrong but if most users would expect a > different behavior and don't notice what is actually happening then > maybe there should be a warning and a different way to get the current > results. There are many uses for 'view', such as examining underlying bytes or changing the subclass of an array without copying. I'm not sure I follow your argument, though. 'view' and 'astype' do distinctly different things (well defined), and are both necessary for advanced array computation. An array is simply a wrapper around memory, and it should not be too magical. Regards St?fan From vincent at vincentdavis.net Sat Jun 5 23:22:09 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 5 Jun 2010 21:22:09 -0600 Subject: [SciPy-Dev] why this not report an error for a.dtype=float when a is mixed struc array In-Reply-To: References: Message-ID: 2010/6/5 St?fan van der Walt : > On 5 June 2010 08:49, Vincent Davis wrote: >>> You might find this thread helpful. ?Especially, Chris's reply. >>> >>> http://thread.gmane.org/gmane.comp.python.numeric.general/32664/ >> >> Ok now I understand why, that is why the numbers are a mess but not >> why it is a feature :) I guess I am try to think of why I would use >> this. I might be completely wrong but if most users would expect a >> different behavior and don't notice what is actually happening then >> maybe there should be a warning and a different way to get the current >> results. > > There are many uses for 'view', such as examining underlying bytes or > changing the subclass of an array without copying. > > I'm not sure I follow your argument, though. ?'view' and 'astype' do > distinctly different things (well defined), and are both necessary for > advanced array computation. ?An array is simply a wrapper around > memory, and it should not be too magical. i.e. I lack knowledge and experience, no thats not what you said but it is probably the correct assessment. Thanks Vincent > Regards > St?fan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ralf.gommers at googlemail.com Sun Jun 6 11:53:52 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 6 Jun 2010 23:53:52 +0800 Subject: [SciPy-Dev] SSE instruction in arpack file - f2py issue? Message-ID: When checking the 0.8.0b1 superpacks I found a single file with SSE instructions, sparse/linalg/eigen/arpack/_arpack.pyd. The only thing possible explanation I found is that f2py can add SSE instructions by default, as claimed here: http://thread.gmane.org/gmane.comp.python.f2py.user/712/focus=6882. Is this correct? Anyone have any other suggestions on where to look? Ticket and history: http://projects.scipy.org/scipy/ticket/1170 Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neilcrighton at gmail.com Mon Jun 7 07:34:04 2010 From: neilcrighton at gmail.com (Neil Crighton) Date: Mon, 7 Jun 2010 11:34:04 +0000 (UTC) Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 References: Message-ID: Ralf Gommers googlemail.com> writes: > I'm pleased to announce the first beta release of SciPy > 0.8.0.SciPy is a package of tools for science and engineering > for Python.It includes modules for statistics, optimization, > integration, linearalgebra, Fourier transforms, signal and > image processing, ODE solvers, and more.This beta release comes > almost one and a half year after the 0.7.0 release andcontains > many new features, numerous bug-fixes, improved testcoverage, > and better documentation. Please note that SciPy 0.8.0b1 > requires Python 2.4 or greater and NumPy 1.4.1 or greater. Thanks for getting the beta out! The release notes say Numpy 1.3 or greater is needed - is this correct? Above you say 1.4.1 is needed. I think "support for Python 3 in Scipy might not yet be included in Scipy 0.8" is too ambiguous. Just say 0.8 will not be compatible with Python 3, but we expect the next version (0.9?) to be compatible, if that's the case. Cheers, Neil From ralf.gommers at googlemail.com Mon Jun 7 07:55:24 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 7 Jun 2010 19:55:24 +0800 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: References: Message-ID: On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton wrote: > Ralf Gommers googlemail.com> writes: > > > I'm pleased to announce the first beta release of SciPy > > 0.8.0.SciPy is a package of tools for science and engineering > > for Python.It includes modules for statistics, optimization, > > integration, linearalgebra, Fourier transforms, signal and > > image processing, ODE solvers, and more.This beta release comes > > almost one and a half year after the 0.7.0 release andcontains > > many new features, numerous bug-fixes, improved testcoverage, > > and better documentation. Please note that SciPy 0.8.0b1 > > requires Python 2.4 or greater and NumPy 1.4.1 or greater. > > Thanks for getting the beta out! > > The release notes say Numpy 1.3 or greater is needed - is this > correct? Above you say 1.4.1 is needed. No, 1.4.1 is needed. Notes are fixed now. > I think "support for > Python 3 in Scipy might not yet be included in Scipy 0.8" is too > ambiguous. Just say 0.8 will not be compatible with Python 3, but > we expect the next version (0.9?) to be compatible, if that's the > case. > > Reworded as: "Python 3 compatibility is planned and is currently technically feasible, since Numpy has been ported. However, since the Python 3 compatible Numpy 2.0 has not been released yet, support for Python 3 in Scipy is not yet included in Scipy 0.8. SciPy 0.9, planned for fall 2010, will very likely include experimental support for Python 3." Thanks for reporting, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Mon Jun 7 10:15:56 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 7 Jun 2010 10:15:56 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> <4C0932FC.2020108@gmail.com> Message-ID: On Fri, Jun 4, 2010 at 2:12 PM, wrote: > On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey wrote: >> On 06/03/2010 08:27 AM, Warren Weckesser wrote: >>> >>> Just letting you know that I'm not ignoring all the great comments from >>> josef, Neil and Bruce about my suggestion for chisquare_contingency. >>> Unfortunately, I won't have time to think about all the deeper >>> suggestions for another week or so. ? For now, I'll just say that I >>> agree with josef's and Neil's suggestions for the docstring, and that >>> Neil's summary of the function as simply a convenience function that >>> calls stats.chisquare with appropriate arguments to perform a test of >>> independence on a contingency table is exactly what I had in mind. >>> >>> Warren >>> >>> >>> >> >> Hi, >> I looked at how SAS handles n-way tables. What it appears to do is break the >> original table down into a set of 2-way tables and does the analysis on each >> of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the >> results of each 4 by 5 table presented. I do not know how Stata and R >> analysis analyze n-way tables. >> >> Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way >> tables by using recursion. There should be some Python way to do that >> recursion for any number of dimensions. I also added the 1-way table (but >> that has a different hypothesis than the 2-way table) so users can send a >> 1-d table. > > (very briefly because I don't have much time today) > > I think, these are good extensions, but to handle all cases, the > function is getting too large and would need several options. > > On your code and SAS, Z(correct me if my quick reading is wrong) > You seem to be calculating conditional independence for the last two > variables conditional on the values of the first variables. I think > this could be generalized to all pairwise independence tests. > > Similar, I'm a bit surprised that SAS uses conditional and not > marginal independence, I would have thought that the test for marginal > independence (aggregate out all but 2 variables) would be the more > common use case. just some more questions and comments (until I have time to check this) looking at conditional independence looks similar to linear regression models, where the effect of other variables is taken out. However, looking at all chisquare tests (conditional on all possible other values) runs into the multiple test problem. Is the some kind of post-hoc or Bonferroni correction or is there a distribution for eg. the max of all chisquare test statistics. with an iterator (numpy mailinglist), my version for the conditional independence of the last two variables for all values of the earlier variables looks like for ind in allbut2ax_iterator(table3, axes=(-2,-1)): print chisquare_contingency(table3[ind]) Josef > > Initially, I was thinking just about independence of all variables in > a 3 or more way table, i.e. 
P(x,y,z)=P(x)*P(y)*P(z) > > My opinion is that these variations of tests would fit better in a > class where all pairwise conditional, and marginal and joint > hypotheses can be supplied as methods, or split it up into a group of > functions. > > Thanks, > > Josef > >> >> The data used is from two SAS examples and I added a dimension to get a >> 4-way table. I included the SAS values but these are only to 4 decimal >> places for reference. >> >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm >> >> What is missing: >> 1) Docstring and tests but those are dependent what is ultimately decided >> 2) Other test statistics but scipy.stats versions are not very friendly in >> that these do not accept a 2-d array >> 3) A way to do recursion >> 4) Ability to label the levels etc. >> 5) Correct handling of input types. >> >> Bruce >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > From charlesr.harris at gmail.com Mon Jun 7 10:20:01 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 Jun 2010 08:20:01 -0600 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: References: Message-ID: On Mon, Jun 7, 2010 at 5:55 AM, Ralf Gommers wrote: > > > On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton wrote: > >> Ralf Gommers googlemail.com> writes: >> >> > I'm pleased to announce the first beta release of SciPy >> > 0.8.0.SciPy is a package of tools for science and engineering >> > for Python.It includes modules for statistics, optimization, >> > integration, linearalgebra, Fourier transforms, signal and >> > image processing, ODE solvers, and more.This beta release comes >> > almost one and a half year after the 0.7.0 release andcontains >> > many new features, numerous bug-fixes, improved testcoverage, >> > and better documentation. Please note that SciPy 0.8.0b1 >> > requires Python 2.4 or greater and NumPy 1.4.1 or greater. >> >> Thanks for getting the beta out! >> >> The release notes say Numpy 1.3 or greater is needed - is this >> correct? Above you say 1.4.1 is needed. > > > No, 1.4.1 is needed. Notes are fixed now. > > >> I think "support for >> Python 3 in Scipy might not yet be included in Scipy 0.8" is too >> ambiguous. Just say 0.8 will not be compatible with Python 3, but >> we expect the next version (0.9?) to be compatible, if that's the >> case. >> >> Reworded as: > "Python 3 compatibility is planned and is currently technically > feasible, since Numpy has been ported. However, since the Python 3 > compatible Numpy 2.0 has not been released yet, support for Python 3 > in Scipy is not yet included in Scipy 0.8. SciPy 0.9, planned for fall > 2010, will very likely include experimental support for Python 3." > > Are we going to release a Numpy 1.5? Also, the beta release should be noted on the SciPy home page. Maybe adding such notes needs to be part of the how-to-release checklist since it tends to be forgotten. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsouthey at gmail.com Mon Jun 7 11:00:35 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 07 Jun 2010 10:00:35 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> <4C0932FC.2020108@gmail.com> Message-ID: <4C0D0993.1080206@gmail.com> On 06/07/2010 09:15 AM, josef.pktd at gmail.com wrote: > On Fri, Jun 4, 2010 at 2:12 PM, wrote: > >> On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey wrote: >> >>> On 06/03/2010 08:27 AM, Warren Weckesser wrote: >>> >>>> Just letting you know that I'm not ignoring all the great comments from >>>> josef, Neil and Bruce about my suggestion for chisquare_contingency. >>>> Unfortunately, I won't have time to think about all the deeper >>>> suggestions for another week or so. For now, I'll just say that I >>>> agree with josef's and Neil's suggestions for the docstring, and that >>>> Neil's summary of the function as simply a convenience function that >>>> calls stats.chisquare with appropriate arguments to perform a test of >>>> independence on a contingency table is exactly what I had in mind. >>>> >>>> Warren >>>> >>>> >>>> >>>> >>> Hi, >>> I looked at how SAS handles n-way tables. What it appears to do is break the >>> original table down into a set of 2-way tables and does the analysis on each >>> of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the >>> results of each 4 by 5 table presented. I do not know how Stata and R >>> analysis analyze n-way tables. >>> >>> Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way >>> tables by using recursion. There should be some Python way to do that >>> recursion for any number of dimensions. I also added the 1-way table (but >>> that has a different hypothesis than the 2-way table) so users can send a >>> 1-d table. >>> >> (very briefly because I don't have much time today) >> >> I think, these are good extensions, but to handle all cases, the >> function is getting too large and would need several options. >> >> On your code and SAS, Z(correct me if my quick reading is wrong) >> You seem to be calculating conditional independence for the last two >> variables conditional on the values of the first variables. I think >> this could be generalized to all pairwise independence tests. >> >> Similar, I'm a bit surprised that SAS uses conditional and not >> marginal independence, I would have thought that the test for marginal >> independence (aggregate out all but 2 variables) would be the more >> common use case. >> You can argue SAS's formulation relates to how the table is constructed because the hypothesis associated with the table is dependent on how the user constructs it. For example, the 3-way table A by (B by C) is very different from the 3-way table C by (B by A) yet these involve the same underlying numbers. If a user did not specify an order then considering all possible hypotheses is an option. Really log-linear models are a better approach to analysis n-way tables because these allow you to examine all these different hypotheses. > just some more questions and comments (until I have time to check this) > > looking at conditional independence looks similar to linear regression > models, where the effect of other variables is taken out. 
However, > looking at all chisquare tests (conditional on all possible other > values) runs into the multiple test problem. Is the some kind of > post-hoc or Bonferroni correction or is there a distribution for eg. > the max of all chisquare test statistics. > Ignoring my views on this, first 'multiple test problems' do not change the probability calculation for most approaches to compute the 'raw' p-value as the vast majority of the approaches require the 'raw' p-value. Second, it is very easy to say 'correct for multiple tests' but that is pure ignorance when 'what' you are correcting is for is not stated. If you are correcting the 'family-wise error rate' then you need to correctly define 'family-wise' in this situation especially to address at least one other assumption being made. > with an iterator (numpy mailinglist), my version for the conditional > independence of the last two variables for all values of the earlier > variables looks like > > for ind in allbut2ax_iterator(table3, axes=(-2,-1)): > print chisquare_contingency(table3[ind]) > > Josef > > A link: http://article.gmane.org/gmane.comp.python.numeric.general/38352 I would have to see. Bruce >> Initially, I was thinking just about independence of all variables in >> a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z) >> >> My opinion is that these variations of tests would fit better in a >> class where all pairwise conditional, and marginal and joint >> hypotheses can be supplied as methods, or split it up into a group of >> functions. >> >> Thanks, >> >> Josef >> >> >>> The data used is from two SAS examples and I added a dimension to get a >>> 4-way table. I included the SAS values but these are only to 4 decimal >>> places for reference. >>> >>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm >>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm >>> >>> What is missing: >>> 1) Docstring and tests but those are dependent what is ultimately decided >>> 2) Other test statistics but scipy.stats versions are not very friendly in >>> that these do not accept a 2-d array >>> 3) A way to do recursion >>> 4) Ability to label the levels etc. >>> 5) Correct handling of input types. >>> >>> Bruce >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >>> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
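The 2-way case that the proposed convenience function covers reduces to a few lines of numpy: form the expected counts from the row and column marginals, compute the Pearson statistic, and compare it with a chi-square distribution on (R-1)*(C-1) degrees of freedom. The sketch below only illustrates that idea; the helper name chisquare_contingency_2d and its exact signature are assumptions for illustration, not the patch under review.

import numpy as np
from scipy import stats

def chisquare_contingency_2d(table):
    # Pearson chi-square test of independence for an R x C table of counts.
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # expected counts under independence: outer product of the marginals / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return chi2, stats.chi2.sf(chi2, dof), dof, expected

obs = np.array([[10, 20, 30],
                [15, 15, 40]])
print(chisquare_contingency_2d(obs)[:3])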
URL: From josef.pktd at gmail.com Mon Jun 7 11:45:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 7 Jun 2010 11:45:06 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C0D0993.1080206@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> <4C0932FC.2020108@gmail.com> <4C0D0993.1080206@gmail.com> Message-ID: On Mon, Jun 7, 2010 at 11:00 AM, Bruce Southey wrote: > On 06/07/2010 09:15 AM, josef.pktd at gmail.com wrote: > > On Fri, Jun 4, 2010 at 2:12 PM, wrote: > > > On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey wrote: > > > On 06/03/2010 08:27 AM, Warren Weckesser wrote: > > > Just letting you know that I'm not ignoring all the great comments from > josef, Neil and Bruce about my suggestion for chisquare_contingency. > Unfortunately, I won't have time to think about all the deeper > suggestions for another week or so. ? For now, I'll just say that I > agree with josef's and Neil's suggestions for the docstring, and that > Neil's summary of the function as simply a convenience function that > calls stats.chisquare with appropriate arguments to perform a test of > independence on a contingency table is exactly what I had in mind. > > Warren > > > > > > Hi, > I looked at how SAS handles n-way tables. What it appears to do is break the > original table down into a set of 2-way tables and does the analysis on each > of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the > results of each 4 by 5 table presented. I do not know how Stata and R > analysis analyze n-way tables. > > Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way > tables by using recursion. There should be some Python way to do that > recursion for any number of dimensions. I also added the 1-way table (but > that has a different hypothesis than the 2-way table) so users can send a > 1-d table. > > > (very briefly because I don't have much time today) > > I think, these are good extensions, but to handle all cases, the > function is getting too large and would need several options. > > On your code and SAS, Z(correct me if my quick reading is wrong) > You seem to be calculating conditional independence for the last two > variables conditional on the values of the first variables. I think > this could be generalized to all pairwise independence tests. > > Similar, I'm a bit surprised that SAS uses conditional and not > marginal independence, I would have thought that the test for marginal > independence (aggregate out all but 2 variables) would be the more > common use case. > > > You can argue SAS's formulation relates to how the table is constructed > because the hypothesis associated with the table is dependent on how the > user constructs it. For example, the 3-way table A by (B by C) is very > different from the 3-way table C by (B by A) yet these involve the same > underlying numbers. If a user did not specify an order then considering all > possible hypotheses is an option. I don't know the SAS notation, what I thought in analogy to regression analysis, is that if one variable is considered as endogenous, then only pairwise tests with this variable need to be included. 
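One plausible reading of "only pairwise tests with the endogenous variable", sketched below for an n-way count array: pair the designated axis with each other axis in turn, aggregate the remaining axes out, and run an ordinary 2-way test on each marginal table. The helper name pairwise_with and the use of scipy.stats.chi2 are assumptions for illustration, not part of any proposed patch.

import numpy as np
from scipy import stats

def pairwise_with(table, axis=0):
    # Test the variable on `axis` against each other variable, after
    # aggregating all remaining axes out (marginal 2-way tables).
    table = np.asarray(table, dtype=float)
    results = []
    for other in range(table.ndim):
        if other == axis:
            continue
        t2 = table
        # sum out every axis except `axis` and `other`, highest index first
        # so that the remaining axis numbers stay valid
        for ax in sorted(set(range(table.ndim)) - set([axis, other]), reverse=True):
            t2 = t2.sum(axis=ax)
        n = t2.sum()
        expected = np.outer(t2.sum(axis=1), t2.sum(axis=0)) / n
        chi2 = ((t2 - expected) ** 2 / expected).sum()
        dof = (t2.shape[0] - 1) * (t2.shape[1] - 1)
        results.append((other, chi2, stats.chi2.sf(chi2, dof)))
    return results

table3 = np.random.randint(1, 20, size=(3, 4, 5))
for other, chi2, p in pairwise_with(table3, axis=0):
    print(other, chi2, p)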
> > Really log-linear models are a better approach to analysis n-way tables > because these allow you to examine all these different hypotheses. > > just some more questions and comments (until I have time to check this) > > looking at conditional independence looks similar to linear regression > models, where the effect of other variables is taken out. However, > looking at all chisquare tests (conditional on all possible other > values) runs into the multiple test problem. Is the some kind of > post-hoc or Bonferroni correction or is there a distribution for eg. > the max of all chisquare test statistics. > > > Ignoring my views on this, first 'multiple test problems' do not change the > probability calculation for most approaches to compute the 'raw' p-value as > the vast majority of the approaches require the 'raw' p-value. > > Second, it is very easy to say 'correct for multiple tests' but that is pure > ignorance when 'what' you are correcting is for is not stated. If you are > correcting the 'family-wise error rate' then you need to correctly define > 'family-wise' in this situation especially to address at least one other > assumption being made. I know nothing about this in the context of contingency tables. We recently had the discussion about multiple tests in the context of post-hoc tests for anova, where I had to read up. In econometrics, there is an extensive literature on this, and some cases like structural change tests with unknown change points I know pretty well. The main point that I wanted to make is, that multiple change tests need more attention and at least a warning in the docstring which (raw) p-values are reported, since it is easy for unwary users to misinterpret the reported p-values. But hopefully this could be extended to provide the user with options to do an appropriate correction. Josef > > with an iterator (numpy mailinglist), my version for the conditional > independence of the last two variables for all values of the earlier > variables looks like > > for ind in allbut2ax_iterator(table3, axes=(-2,-1)): > print chisquare_contingency(table3[ind]) > > Josef > > > > A link: > http://article.gmane.org/gmane.comp.python.numeric.general/38352 > > I would have to see. > > Bruce > > Initially, I was thinking just about independence of all variables in > a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z) > > My opinion is that these variations of tests would fit better in a > class where all pairwise conditional, and marginal and joint > hypotheses can be supplied as methods, or split it up into a group of > functions. > > Thanks, > > Josef > > > > The data used is from two SAS examples and I added a dimension to get a > 4-way table. I included the SAS values but these are only to 4 decimal > places for reference. > > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm > > What is missing: > 1) Docstring and tests but those are dependent what is ultimately decided > 2) Other test statistics but scipy.stats versions are not very friendly in > that these do not accept a 2-d array > 3) A way to do recursion > 4) Ability to label the levels etc. > 5) Correct handling of input types. 
> > Bruce > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From bsouthey at gmail.com Mon Jun 7 12:45:07 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 07 Jun 2010 11:45:07 -0500 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> <4C0932FC.2020108@gmail.com> <4C0D0993.1080206@gmail.com> Message-ID: <4C0D2213.7020302@gmail.com> On 06/07/2010 10:45 AM, josef.pktd at gmail.com wrote: > On Mon, Jun 7, 2010 at 11:00 AM, Bruce Southey wrote: > >> On 06/07/2010 09:15 AM, josef.pktd at gmail.com wrote: >> >> On Fri, Jun 4, 2010 at 2:12 PM, wrote: >> >> >> On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey wrote: >> >> >> On 06/03/2010 08:27 AM, Warren Weckesser wrote: >> >> >> Just letting you know that I'm not ignoring all the great comments from >> josef, Neil and Bruce about my suggestion for chisquare_contingency. >> Unfortunately, I won't have time to think about all the deeper >> suggestions for another week or so. For now, I'll just say that I >> agree with josef's and Neil's suggestions for the docstring, and that >> Neil's summary of the function as simply a convenience function that >> calls stats.chisquare with appropriate arguments to perform a test of >> independence on a contingency table is exactly what I had in mind. >> >> Warren >> >> >> >> >> >> Hi, >> I looked at how SAS handles n-way tables. What it appears to do is break the >> original table down into a set of 2-way tables and does the analysis on each >> of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the >> results of each 4 by 5 table presented. I do not know how Stata and R >> analysis analyze n-way tables. >> >> Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way >> tables by using recursion. There should be some Python way to do that >> recursion for any number of dimensions. I also added the 1-way table (but >> that has a different hypothesis than the 2-way table) so users can send a >> 1-d table. >> >> >> (very briefly because I don't have much time today) >> >> I think, these are good extensions, but to handle all cases, the >> function is getting too large and would need several options. >> >> On your code and SAS, Z(correct me if my quick reading is wrong) >> You seem to be calculating conditional independence for the last two >> variables conditional on the values of the first variables. I think >> this could be generalized to all pairwise independence tests. >> >> Similar, I'm a bit surprised that SAS uses conditional and not >> marginal independence, I would have thought that the test for marginal >> independence (aggregate out all but 2 variables) would be the more >> common use case. 
>> >> >> You can argue SAS's formulation relates to how the table is constructed >> because the hypothesis associated with the table is dependent on how the >> user constructs it. For example, the 3-way table A by (B by C) is very >> different from the 3-way table C by (B by A) yet these involve the same >> underlying numbers. If a user did not specify an order then considering all >> possible hypotheses is an option. >> > I don't know the SAS notation, what I thought in analogy to regression > analysis, is that if one variable is considered as endogenous, then > only pairwise tests with this variable need to be included. > This is not the same as regression for multiple reasons. Here we are testing independence without any distribution assumption associated with the actual data. (Of course under the normality assumption then these are the same. ) > >> Really log-linear models are a better approach to analysis n-way tables >> because these allow you to examine all these different hypotheses. >> >> just some more questions and comments (until I have time to check this) >> >> looking at conditional independence looks similar to linear regression >> models, where the effect of other variables is taken out. However, >> looking at all chisquare tests (conditional on all possible other >> values) runs into the multiple test problem. Is the some kind of >> post-hoc or Bonferroni correction or is there a distribution for eg. >> the max of all chisquare test statistics. >> >> >> Ignoring my views on this, first 'multiple test problems' do not change the >> probability calculation for most approaches to compute the 'raw' p-value as >> the vast majority of the approaches require the 'raw' p-value. >> >> Second, it is very easy to say 'correct for multiple tests' but that is pure >> ignorance when 'what' you are correcting is for is not stated. If you are >> correcting the 'family-wise error rate' then you need to correctly define >> 'family-wise' in this situation especially to address at least one other >> assumption being made. >> > I know nothing about this in the context of contingency tables. In a 2-way table there is no need for any correction so it is pointless to say 'correct for multiple tests'. In a 3-way or higher table, as you indicated, is essentially a test of conditional independence as I implemented it. It is also pointless to say 'correct for multiple tests' because you are first assuming conditional independence between say A by B given C=1 and A by B for C=2. So what happens when C=1 is independent of when C=2 so these do belong to different 'families'. Second, there is nothing said about the relation of either A or B with C - which may be a more critical problem. > We > recently had the discussion about multiple tests in the context of > post-hoc tests for anova, where I had to read up. > I am perhaps too aware of multiple testing and unfortunately these types of discussions go on and on and on. A lot depends on which of many 'schools' of thought you subscribe to. It basically amounts to 'hand waving' with no solution because these schools are defined by different fundamental assumptions that can not be challenged. Ultimately none are correct because we never know the true situation - if we did we would not be doing it. > In econometrics, there is an extensive literature on this, and some > cases like structural change tests with unknown change points I know > pretty well. 
> > The main point that I wanted to make is, that multiple change tests > need more attention and at least a warning in the docstring which > (raw) p-values are reported, since it is easy for unwary users to > misinterpret the reported p-values. But hopefully this could be > extended to provide the user with options to do an appropriate > correction. > > Josef > This is pointless because you are misunderstanding what is meant by 'multiple test correction'. Placing those kinds of statements in the wrong places also reflects ignorance especially when the correct value maybe given and there is no 'appropriate' correction possible. Further no statement is ever going to protect users from misinterpreting p-values. Bruce > > >> with an iterator (numpy mailinglist), my version for the conditional >> independence of the last two variables for all values of the earlier >> variables looks like >> >> for ind in allbut2ax_iterator(table3, axes=(-2,-1)): >> print chisquare_contingency(table3[ind]) >> >> Josef >> >> >> >> A link: >> http://article.gmane.org/gmane.comp.python.numeric.general/38352 >> >> I would have to see. >> >> Bruce >> >> Initially, I was thinking just about independence of all variables in >> a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z) >> >> My opinion is that these variations of tests would fit better in a >> class where all pairwise conditional, and marginal and joint >> hypotheses can be supplied as methods, or split it up into a group of >> functions. >> >> Thanks, >> >> Josef >> >> >> >> The data used is from two SAS examples and I added a dimension to get a >> 4-way table. I included the SAS values but these are only to 4 decimal >> places for reference. >> >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm >> >> What is missing: >> 1) Docstring and tests but those are dependent what is ultimately decided >> 2) Other test statistics but scipy.stats versions are not very friendly in >> that these do not accept a 2-d array >> 3) A way to do recursion >> 4) Ability to label the levels etc. >> 5) Correct handling of input types. 
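For the joint hypothesis P(x,y,z) = P(x)*P(y)*P(z) quoted above, the expected counts are the total count times the outer product of the one-dimensional marginal proportions, and the degrees of freedom are prod(shape) - 1 - sum(shape_i - 1), which reduces to (R-1)*(C-1) in the 2-way case. A minimal numpy sketch of that computation; the function name is illustrative only and not an existing or proposed scipy API.

import numpy as np
from scipy import stats

def chisquare_mutual_independence(table):
    # H0: all variables in the n-way count table are jointly independent,
    # i.e. P(x, y, z, ...) = P(x) * P(y) * P(z) * ...
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.ones_like(table)
    for ax in range(table.ndim):
        # one-dimensional marginal proportions for this axis
        marg = table
        for other in sorted(set(range(table.ndim)) - set([ax]), reverse=True):
            marg = marg.sum(axis=other)
        shape = [1] * table.ndim
        shape[ax] = table.shape[ax]
        expected = expected * (marg / n).reshape(shape)
    expected *= n
    chi2 = ((table - expected) ** 2 / expected).sum()
    dof = np.prod(table.shape) - 1 - sum(s - 1 for s in table.shape)
    return chi2, stats.chi2.sf(chi2, dof), dof

table3 = np.random.randint(1, 20, size=(3, 4, 5))
print(chisquare_mutual_independence(table3))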
>> >> Bruce >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ralf.gommers at googlemail.com Mon Jun 7 12:56:10 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 8 Jun 2010 00:56:10 +0800 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: References: Message-ID: On Mon, Jun 7, 2010 at 10:20 PM, Charles R Harris wrote: > > > On Mon, Jun 7, 2010 at 5:55 AM, Ralf Gommers wrote: > >> >> >> On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton wrote: >> >>> Ralf Gommers googlemail.com> writes: >>> >>> > I'm pleased to announce the first beta release of SciPy >>> > 0.8.0.SciPy is a package of tools for science and engineering >>> > for Python.It includes modules for statistics, optimization, >>> > integration, linearalgebra, Fourier transforms, signal and >>> > image processing, ODE solvers, and more.This beta release comes >>> > almost one and a half year after the 0.7.0 release andcontains >>> > many new features, numerous bug-fixes, improved testcoverage, >>> > and better documentation. Please note that SciPy 0.8.0b1 >>> > requires Python 2.4 or greater and NumPy 1.4.1 or greater. >>> >>> Thanks for getting the beta out! >>> >>> The release notes say Numpy 1.3 or greater is needed - is this >>> correct? Above you say 1.4.1 is needed. >> >> >> No, 1.4.1 is needed. Notes are fixed now. >> >> >>> I think "support for >>> Python 3 in Scipy might not yet be included in Scipy 0.8" is too >>> ambiguous. Just say 0.8 will not be compatible with Python 3, but >>> we expect the next version (0.9?) to be compatible, if that's the >>> case. >>> >>> Reworded as: >> "Python 3 compatibility is planned and is currently technically >> feasible, since Numpy has been ported. However, since the Python 3 >> compatible Numpy 2.0 has not been released yet, support for Python 3 >> in Scipy is not yet included in Scipy 0.8. SciPy 0.9, planned for fall >> 2010, will very likely include experimental support for Python 3." >> >> > Are we going to release a Numpy 1.5? > Yes. Guess I should reread such a paragraph a few times before committing. The only reason I've not made a 1.5 branch yet is I will only have time for a numpy release cycle at or towards the end of this scipy release cycle. Saves some backporting. If you think it'd be useful to do it now please let me know. > Also, the beta release should be noted on the SciPy home page. > Done. For the previous releases I put only the final release there. Maybe good to announce beta/rc releases but then just update the announcement instead of adding new items each time. It's a small sidebar after all. > Maybe adding such notes needs to be part of the how-to-release checklist > since it tends to be forgotten. > The scipy.org announcement is in there. I'll add a "check the release notes for ..." item. Here is my checklist of things to do before 0.8.0rc1. 
I'm traveling (without computer) for the next week, so if anyone wants to tackle any of these items, that would be be very helpful. - remove stuff in scipy.io as explained in 0.7.0 release notes - fix paver dmg task to include docs - add sphinxext to tarballs? - check numscons works - SSE instruction ticket: http://projects.scipy.org/scipy/ticket/1170 - linalg.qr: http://projects.scipy.org/scipy/ticket/243 - windows crash: http://projects.scipy.org/scipy/ticket/1102 - invalid 2.6 syntax: http://projects.scipy.org/scipy/ticket/1193 Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jun 7 14:30:57 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 7 Jun 2010 14:30:57 -0400 Subject: [SciPy-Dev] chi-square test for a contingency (R x C) table In-Reply-To: <4C0D2213.7020302@gmail.com> References: <4C05DDF3.9010206@enthought.com> <4C064D79.4030106@wartburg.edu> <4C06807A.40301@gmail.com> <4C06861C.1060401@wartburg.edu> <4C069E84.4020308@gmail.com> <4C06A059.6020901@wartburg.edu> <4C06B8FB.8080806@gmail.com> <4C07ADC1.6040504@enthought.com> <4C0932FC.2020108@gmail.com> <4C0D0993.1080206@gmail.com> <4C0D2213.7020302@gmail.com> Message-ID: On Mon, Jun 7, 2010 at 12:45 PM, Bruce Southey wrote: > On 06/07/2010 10:45 AM, josef.pktd at gmail.com wrote: >> On Mon, Jun 7, 2010 at 11:00 AM, Bruce Southey ?wrote: >> >>> On 06/07/2010 09:15 AM, josef.pktd at gmail.com wrote: >>> >>> On Fri, Jun 4, 2010 at 2:12 PM, ?wrote: >>> >>> >>> On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey ?wrote: >>> >>> >>> On 06/03/2010 08:27 AM, Warren Weckesser wrote: >>> >>> >>> Just letting you know that I'm not ignoring all the great comments from >>> josef, Neil and Bruce about my suggestion for chisquare_contingency. >>> Unfortunately, I won't have time to think about all the deeper >>> suggestions for another week or so. ? For now, I'll just say that I >>> agree with josef's and Neil's suggestions for the docstring, and that >>> Neil's summary of the function as simply a convenience function that >>> calls stats.chisquare with appropriate arguments to perform a test of >>> independence on a contingency table is exactly what I had in mind. >>> >>> Warren >>> >>> >>> >>> >>> >>> Hi, >>> I looked at how SAS handles n-way tables. What it appears to do is break the >>> original table down into a set of 2-way tables and does the analysis on each >>> of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the >>> results of each 4 by 5 table presented. I do not know how Stata and R >>> analysis analyze n-way tables. >>> >>> Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way >>> tables by using recursion. There should be some Python way to do that >>> recursion for any number of dimensions. I also added the 1-way table (but >>> that has a different hypothesis than the 2-way table) so users can send a >>> 1-d table. >>> >>> >>> (very briefly because I don't have much time today) >>> >>> I think, these are good extensions, but to handle all cases, the >>> function is getting too large and would need several options. >>> >>> On your code and SAS, Z(correct me if my quick reading is wrong) >>> You seem to be calculating conditional independence for the last two >>> variables conditional on the values of the first variables. I think >>> this could be generalized to all pairwise independence tests. 
>>> >>> Similar, I'm a bit surprised that SAS uses conditional and not >>> marginal independence, I would have thought that the test for marginal >>> independence (aggregate out all but 2 variables) would be the more >>> common use case. >>> >>> >>> You can argue SAS's formulation relates to how the table is constructed >>> because the hypothesis associated with the table is dependent on how the >>> user constructs it. For example, the 3-way table A by (B by C) is very >>> different from the 3-way table C by (B by A) yet these involve the same >>> underlying numbers. If a user did not specify an order then considering all >>> possible hypotheses is an option. >>> >> I don't know the SAS notation, what I thought in analogy to regression >> analysis, is that if one variable is considered as endogenous, then >> only pairwise tests with this variable need to be included. >> > This is not the same as regression for multiple reasons. Here we are > testing independence without any distribution assumption associated with > the actual data. (Of course under the normality assumption then these > are the same. ) > >> >>> Really log-linear models are a better approach to analysis n-way tables >>> because these allow you to examine all these different hypotheses. >>> >>> just some more questions and comments (until I have time to check this) >>> >>> looking at conditional independence looks similar to linear regression >>> models, where the effect of other variables is taken out. However, >>> looking at all chisquare tests (conditional on all possible other >>> values) runs into the multiple test problem. Is the some kind of >>> post-hoc or Bonferroni correction or is there a distribution for eg. >>> the max of all chisquare test statistics. >>> >>> >>> Ignoring my views on this, first 'multiple test problems' do not change the >>> probability calculation for most approaches to compute the 'raw' p-value as >>> the vast majority of the approaches require the 'raw' p-value. >>> >>> Second, it is very easy to say 'correct for multiple tests' but that is pure >>> ignorance when 'what' you are correcting is for is not stated. If you are >>> correcting the 'family-wise error rate' then you need to correctly define >>> 'family-wise' in this situation especially to address at least one other >>> assumption being made. >>> >> I know nothing about this in the context of contingency tables. > In a 2-way table there is no need for any correction so it is pointless > to say 'correct for multiple tests'. In a 3-way or higher table, as you > indicated, is essentially a test of conditional independence as I > implemented it. It is also pointless to say 'correct for multiple tests' > because you are first assuming conditional independence between say A by > B given C=1 and A by B for C=2. So what happens when C=1 is independent > of when C=2 so these do belong to different 'families'. Second, there is > nothing said about the relation of either A ?or B with C - which may be > a more critical problem. > >> We >> recently had the discussion about multiple tests in the context of >> post-hoc tests for anova, where I had to read up. >> > I am perhaps too aware of multiple testing and unfortunately these types > of discussions go on and on and on. A lot depends on which of many > 'schools' of thought you subscribe to. It basically amounts to 'hand > waving' ?with no solution because these schools are defined by different > fundamental ?assumptions that can not be challenged. 
Ultimately none are > correct because we never know the true situation - if we did we would > not be doing it. I think it depends on the hypothesis and the general statistical theory is relatively clear, but maybe some people prefer a "test-mining" approach. >> In econometrics, there is an extensive literature on this, and some >> cases like structural change tests with unknown change points I know >> pretty well. >> >> The main point that I wanted to make is, that multiple change tests >> need more attention and at least a warning in the docstring which >> (raw) p-values are reported, since it is easy for unwary users to >> misinterpret the reported p-values. But hopefully this could be >> extended to provide the user with options to do an appropriate >> correction. >> >> Josef >> > This is pointless because you are misunderstanding what is meant by > 'multiple test correction'. ??? > Placing those kinds of statements in the > wrong places also reflects ignorance especially when the correct value > maybe given and there is no 'appropriate' correction possible. Further > no statement is ever going to protect users from misinterpreting p-values. Doing a quick search on the recent literature, it seems there is a lot going on in doing proper multiple test correction, additional to more traditional tests, that I haven't tried you to really understand or where I don't know how well they generalize, e.g. (generalized) Cochran-Mantel-Haenszel Chi-Squared Test, Cochran?s Q test. I only read the abstract of this: http://jnci.oxfordjournals.org/cgi/content/abstract/99/2/147 "Twenty-one (50%) of them contained at least one of the following three basic flaws: 1) in outcome-related gene finding, an unstated, unclear, or inadequate control for multiple testing; 2) ....." Josef > > Bruce > > >> >> >>> with an iterator (numpy mailinglist), my version for the conditional >>> independence of the last two variables for all values of the earlier >>> variables looks like >>> >>> for ind in allbut2ax_iterator(table3, axes=(-2,-1)): >>> ? ? ?print chisquare_contingency(table3[ind]) >>> >>> Josef >>> >>> >>> >>> A link: >>> http://article.gmane.org/gmane.comp.python.numeric.general/38352 >>> >>> I would have to see. >>> >>> Bruce >>> >>> Initially, I was thinking just about independence of all variables in >>> a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z) >>> >>> My opinion is that these variations of tests would fit better in a >>> class where all pairwise conditional, and marginal and joint >>> hypotheses can be supplied as methods, or split it up into a group of >>> functions. >>> >>> Thanks, >>> >>> Josef >>> >>> >>> >>> The data used is from two SAS examples and I added a dimension to get a >>> 4-way table. I included the SAS values but these are only to 4 decimal >>> places for reference. >>> >>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm >>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm >>> >>> What is missing: >>> 1) Docstring and tests but those are dependent what is ultimately decided >>> 2) Other test statistics but scipy.stats versions are not very friendly in >>> that these do not accept a 2-d array >>> 3) A way to do recursion >>> 4) Ability to label the levels etc. >>> 5) Correct handling of input types. 
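Purely to illustrate the mechanics under debate, and not to take a side on whether a family-wise correction is appropriate for conditional tables: the simplest convention, Bonferroni, multiplies each raw p-value by the number of tests in the declared family and caps the result at 1. A sketch, assuming the raw p-values from the per-slice tests have already been collected:

import numpy as np

def bonferroni(pvalues):
    # family-wise adjustment: p_adj = min(1, m * p) for a family of m tests
    pvalues = np.asarray(pvalues, dtype=float)
    return np.minimum(len(pvalues) * pvalues, 1.0)

raw = [0.003, 0.04, 0.20, 0.51]   # e.g. one raw p-value per conditional slice
print(bonferroni(raw))            # 0.012, 0.16, 0.8, 1.0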
>>> >>> Bruce >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From david at silveregg.co.jp Mon Jun 7 21:03:51 2010 From: david at silveregg.co.jp (David) Date: Tue, 08 Jun 2010 10:03:51 +0900 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: References: Message-ID: <4C0D96F7.1000406@silveregg.co.jp> On 06/08/2010 01:56 AM, Ralf Gommers wrote: > > > On Mon, Jun 7, 2010 at 10:20 PM, Charles R Harris > > wrote: > > > > On Mon, Jun 7, 2010 at 5:55 AM, Ralf Gommers > > > wrote: > > > > On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton > > wrote: > > Ralf Gommers googlemail.com > > writes: > > > I'm pleased to announce the first beta release of SciPy > > 0.8.0.SciPy is a package of tools for science and engineering > > for Python.It includes modules for statistics, optimization, > > integration, linearalgebra, Fourier transforms, signal and > > image processing, ODE solvers, and more.This beta release > comes > > almost one and a half year after the 0.7.0 release > andcontains > > many new features, numerous bug-fixes, improved testcoverage, > > and better documentation. Please note that SciPy 0.8.0b1 > > requires Python 2.4 or greater and NumPy 1.4.1 or greater. > > Thanks for getting the beta out! > > The release notes say Numpy 1.3 or greater is needed - is this > correct? Above you say 1.4.1 is needed. > > > No, 1.4.1 is needed. Notes are fixed now. > > I think "support for > Python 3 in Scipy might not yet be included in Scipy 0.8" is too > ambiguous. Just say 0.8 will not be compatible with Python > 3, but > we expect the next version (0.9?) to be compatible, if > that's the > case. > > Reworded as: > "Python 3 compatibility is planned and is currently technically > feasible, since Numpy has been ported. However, since the Python 3 > compatible Numpy 2.0 has not been released yet, support for Python 3 > in Scipy is not yet included in Scipy 0.8. SciPy 0.9, planned > for fall > 2010, will very likely include experimental support for Python 3." > > > Are we going to release a Numpy 1.5? > > > Yes. Guess I should reread such a paragraph a few times before committing. > > The only reason I've not made a 1.5 branch yet is I will only have time > for a numpy release cycle at or towards the end of this scipy release > cycle. Saves some backporting. If you think it'd be useful to do it now > please let me know. I don't think we should make the 1.5 branch now - there is a lot of things missing, and I would really like to put everything that is needed for python 3.x support in scipy in the 1.5 release. And AFAIK, we have not cleaned up the branch to make it ABI compatible with 1.4.x. 
I can't give a hard timeline, but I hope to have some time during euroscipy, cheers, David From ralf.gommers at googlemail.com Mon Jun 7 21:10:58 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 8 Jun 2010 09:10:58 +0800 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: <4C0D96F7.1000406@silveregg.co.jp> References: <4C0D96F7.1000406@silveregg.co.jp> Message-ID: On Tue, Jun 8, 2010 at 9:03 AM, David wrote: > On 06/08/2010 01:56 AM, Ralf Gommers wrote: > > > > > > On Mon, Jun 7, 2010 at 10:20 PM, Charles R Harris > > > wrote: > > > > > > > > On Mon, Jun 7, 2010 at 5:55 AM, Ralf Gommers > > > > > wrote: > > > > > > > > On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton > > > wrote: > > > > Ralf Gommers googlemail.com > > > writes: > > > > > I'm pleased to announce the first beta release of SciPy > > > 0.8.0.SciPy is a package of tools for science and > engineering > > > for Python.It includes modules for statistics, > optimization, > > > integration, linearalgebra, Fourier transforms, signal and > > > image processing, ODE solvers, and more.This beta release > > comes > > > almost one and a half year after the 0.7.0 release > > andcontains > > > many new features, numerous bug-fixes, improved > testcoverage, > > > and better documentation. Please note that SciPy 0.8.0b1 > > > requires Python 2.4 or greater and NumPy 1.4.1 or greater. > > > > Thanks for getting the beta out! > > > > The release notes say Numpy 1.3 or greater is needed - is > this > > correct? Above you say 1.4.1 is needed. > > > > > > No, 1.4.1 is needed. Notes are fixed now. > > > > I think "support for > > Python 3 in Scipy might not yet be included in Scipy 0.8" is > too > > ambiguous. Just say 0.8 will not be compatible with Python > > 3, but > > we expect the next version (0.9?) to be compatible, if > > that's the > > case. > > > > Reworded as: > > "Python 3 compatibility is planned and is currently technically > > feasible, since Numpy has been ported. However, since the Python > 3 > > compatible Numpy 2.0 has not been released yet, support for > Python 3 > > in Scipy is not yet included in Scipy 0.8. SciPy 0.9, planned > > for fall > > 2010, will very likely include experimental support for Python > 3." > > > > > > Are we going to release a Numpy 1.5? > > > > > > Yes. Guess I should reread such a paragraph a few times before > committing. > > > > The only reason I've not made a 1.5 branch yet is I will only have time > > for a numpy release cycle at or towards the end of this scipy release > > cycle. Saves some backporting. If you think it'd be useful to do it now > > please let me know. > > I don't think we should make the 1.5 branch now - there is a lot of > things missing, and I would really like to put everything that is needed > for python 3.x support in scipy in the 1.5 release. Threads are mixing a bit, but we're talking about numpy here. I thought numpy 3.x support was pretty much finished? > And AFAIK, we have > not cleaned up the branch to make it ABI compatible with 1.4.x. > > That should be done after making the branch right? If you remove datetime in trunk you're just going to have to put it back later. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david at silveregg.co.jp Mon Jun 7 21:15:10 2010 From: david at silveregg.co.jp (David) Date: Tue, 08 Jun 2010 10:15:10 +0900 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: References: <4C0D96F7.1000406@silveregg.co.jp> Message-ID: <4C0D999E.9040704@silveregg.co.jp> On 06/08/2010 10:10 AM, Ralf Gommers wrote: > > > On Tue, Jun 8, 2010 at 9:03 AM, David > wrote: > > On 06/08/2010 01:56 AM, Ralf Gommers wrote: > > > > > > On Mon, Jun 7, 2010 at 10:20 PM, Charles R Harris > > > >> wrote: > > > > > > > > On Mon, Jun 7, 2010 at 5:55 AM, Ralf Gommers > > > >> > > wrote: > > > > > > > > On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton > > > >> wrote: > > > > Ralf Gommers googlemail.com > > > > writes: > > > > > I'm pleased to announce the first beta release of SciPy > > > 0.8.0.SciPy is a package of tools for science and engineering > > > for Python.It includes modules for statistics, optimization, > > > integration, linearalgebra, Fourier transforms, signal and > > > image processing, ODE solvers, and more.This beta release > > comes > > > almost one and a half year after the 0.7.0 release > > andcontains > > > many new features, numerous bug-fixes, improved testcoverage, > > > and better documentation. Please note that SciPy 0.8.0b1 > > > requires Python 2.4 or greater and NumPy 1.4.1 or greater. > > > > Thanks for getting the beta out! > > > > The release notes say Numpy 1.3 or greater is needed > - is this > > correct? Above you say 1.4.1 is needed. > > > > > > No, 1.4.1 is needed. Notes are fixed now. > > > > I think "support for > > Python 3 in Scipy might not yet be included in Scipy > 0.8" is too > > ambiguous. Just say 0.8 will not be compatible with > Python > > 3, but > > we expect the next version (0.9?) to be compatible, if > > that's the > > case. > > > > Reworded as: > > "Python 3 compatibility is planned and is currently technically > > feasible, since Numpy has been ported. However, since the > Python 3 > > compatible Numpy 2.0 has not been released yet, support > for Python 3 > > in Scipy is not yet included in Scipy 0.8. SciPy 0.9, > planned > > for fall > > 2010, will very likely include experimental support for > Python 3." > > > > > > Are we going to release a Numpy 1.5? > > > > > > Yes. Guess I should reread such a paragraph a few times before > committing. > > > > The only reason I've not made a 1.5 branch yet is I will only > have time > > for a numpy release cycle at or towards the end of this scipy release > > cycle. Saves some backporting. If you think it'd be useful to do > it now > > please let me know. > > I don't think we should make the 1.5 branch now - there is a lot of > things missing, and I would really like to put everything that is needed > for python 3.x support in scipy in the 1.5 release. > > > Threads are mixing a bit, but we're talking about numpy here. I thought > numpy 3.x support was pretty much finished? Yes, but to make scipy compatible with 3.x, it is easier to add some stuff in numpy.distutils, etc... for scipy. > > And AFAIK, we have > not cleaned up the branch to make it ABI compatible with 1.4.x. > > That should be done after making the branch right? If you remove > datetime in trunk you're just going to have to put it back later. But we agreed to remove it, right ? What if we decide to have a 1.6, etc... ? 
cheers, David From charlesr.harris at gmail.com Mon Jun 7 21:20:22 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 Jun 2010 19:20:22 -0600 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: <4C0D999E.9040704@silveregg.co.jp> References: <4C0D96F7.1000406@silveregg.co.jp> <4C0D999E.9040704@silveregg.co.jp> Message-ID: On Mon, Jun 7, 2010 at 7:15 PM, David wrote: > On 06/08/2010 10:10 AM, Ralf Gommers wrote: > > > > > > On Tue, Jun 8, 2010 at 9:03 AM, David > > wrote: > > > > On 06/08/2010 01:56 AM, Ralf Gommers wrote: > > > > > > > > > On Mon, Jun 7, 2010 at 10:20 PM, Charles R Harris > > > > > > >> wrote: > > > > > > > > > > > > On Mon, Jun 7, 2010 at 5:55 AM, Ralf Gommers > > > > > > >> > > > wrote: > > > > > > > > > > > > On Mon, Jun 7, 2010 at 7:34 PM, Neil Crighton > > > > > >> > wrote: > > > > > > Ralf Gommers googlemail.com > > > > > > writes: > > > > > > > I'm pleased to announce the first beta release of SciPy > > > > 0.8.0.SciPy is a package of tools for science and engineering > > > > for Python.It includes modules for statistics, optimization, > > > > integration, linearalgebra, Fourier transforms, signal and > > > > image processing, ODE solvers, and more.This beta release > > > comes > > > > almost one and a half year after the 0.7.0 release > > > andcontains > > > > many new features, numerous bug-fixes, improved testcoverage, > > > > and better documentation. Please note that SciPy 0.8.0b1 > > > > requires Python 2.4 or greater and NumPy 1.4.1 or greater. > > > > > > Thanks for getting the beta out! > > > > > > The release notes say Numpy 1.3 or greater is needed > > - is this > > > correct? Above you say 1.4.1 is needed. > > > > > > > > > No, 1.4.1 is needed. Notes are fixed now. > > > > > > I think "support for > > > Python 3 in Scipy might not yet be included in Scipy > > 0.8" is too > > > ambiguous. Just say 0.8 will not be compatible with > > Python > > > 3, but > > > we expect the next version (0.9?) to be compatible, if > > > that's the > > > case. > > > > > > Reworded as: > > > "Python 3 compatibility is planned and is currently technically > > > feasible, since Numpy has been ported. However, since the > > Python 3 > > > compatible Numpy 2.0 has not been released yet, support > > for Python 3 > > > in Scipy is not yet included in Scipy 0.8. SciPy 0.9, > > planned > > > for fall > > > 2010, will very likely include experimental support for > > Python 3." > > > > > > > > > Are we going to release a Numpy 1.5? > > > > > > > > > Yes. Guess I should reread such a paragraph a few times before > > committing. > > > > > > The only reason I've not made a 1.5 branch yet is I will only > > have time > > > for a numpy release cycle at or towards the end of this scipy > release > > > cycle. Saves some backporting. If you think it'd be useful to do > > it now > > > please let me know. > > > > I don't think we should make the 1.5 branch now - there is a lot of > > things missing, and I would really like to put everything that is > needed > > for python 3.x support in scipy in the 1.5 release. > > > > > > Threads are mixing a bit, but we're talking about numpy here. I thought > > numpy 3.x support was pretty much finished? > > Yes, but to make scipy compatible with 3.x, it is easier to add some > stuff in numpy.distutils, etc... for scipy. > > > > > And AFAIK, we have > > not cleaned up the branch to make it ABI compatible with 1.4.x. > > > > That should be done after making the branch right? 
If you remove > > datetime in trunk you're just going to have to put it back later. > > But we agreed to remove it, right ? What if we decide to have a 1.6, > etc... ? > > I've been toying with the idea that the trunk should be branched, with one branch for the datetime and other API changes and another that is compatible with 1.4, 1.5, etc. When the changes are ready, they can then be merged back in. Of course, this will all be easier when the GIT transition is finished. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Tue Jun 8 04:55:25 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 08 Jun 2010 10:55:25 +0200 Subject: [SciPy-Dev] ERROR: Failure: SyntaxError (invalid syntax (test_distributions.py, line 391) Message-ID: Hi all, I am using >>> numpy.__version__ '2.0.0.dev8460' >>> import scipy >>> scipy.__version__ '0.9.0.dev6493' and I found some (new) errors ====================================================================== ERROR: test_continuous_basic.test_cont_basic(, (), 'wald') ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/tests/test_continuous_basic.py", line 291, in check_cdf_ppf npt.assert_almost_equal(distfn.cdf(distfn.ppf([0.001,0.5,0.999], *arg), *arg), File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", line 1324, in ppf place(output,cond,self._ppf(*goodargs)*scale + loc) File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", line 1028, in _ppf return self.vecfunc(q,*args) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/function_base.py", line 1794, in __call__ theout = self.thefunc(*newargs) File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", line 974, in _ppf_single_call return optimize.brentq(self._ppf_to_solve, self.xa, self.xb, args=(q,)+args, xtol=self.xtol) File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/optimize/zeros.py", line 262, in brentq r = _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp) ValueError: f(a) and f(b) must have different signs ====================================================================== ERROR: Failure: SyntaxError (invalid syntax (test_distributions.py, line 391)) ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/tests/test_distributions.py", line 391 res = distfunc.rvs(*args, size=200) ^ SyntaxError: invalid syntax ====================================================================== ERROR: test_mpmath.test_expi_complex ---------------------------------------------------------------------- Traceback (most 
recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/testing/decorators.py", line 146, in skipper_func return f(*args, **kwargs) File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/special/tests/test_mpmath.py", line 46, in test_expi_complex dataset = np.array(dataset, dtype=np.complex_) TypeError: a float is required Nils From bsouthey at gmail.com Tue Jun 8 11:40:59 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 08 Jun 2010 10:40:59 -0500 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: References: <4C0D96F7.1000406@silveregg.co.jp> Message-ID: <4C0E648B.10204@gmail.com> Hi, I got 2 errors and 1 failure when I installed the beta using Python 2.6 (Linux 64-bit) with numpy '2.0.0.dev8445' . Can we get a fix for ticket 1152 or at least mark it as known? http://projects.scipy.org/scipy/ticket/1152 The others are below. There are also a number of overflow warnings that should be checked and avoided. The same warnings also occur in test_continuous_basic for certain distributions. test_iv_cephes_vs_amos (test_basic.TestBessel) ... Warning: overflow encountered in iv Warning: overflow encountered in iv Warning: invalid value encountered in isinf Should these be tickets? Bruce ====================================================================== ERROR: Ticket #1124. ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 287, in test_none signal.medfilt(None) File "/usr/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 317, in medfilt return sigtools._order_filterND(volume,domain,order) ValueError: order_filterND not available for this type ====================================================================== FAIL: test_random_real (test_basic.TestSingleIFFT) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python2.6/site-packages/scipy/fftpack/tests/test_basic.py", line 205, in test_random_real assert_array_almost_equal (y1, x) File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 774, in assert_array_almost_equal header='Arrays are not almost equal') File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 618, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal (mismatch 0.900900900901%) x: array([ 0.41364330 +5.90676663e-09j, 0.80715483 +2.64462052e-08j, 0.05271048 -3.67830459e-08j, 0.72591031 -9.31092980e-09j, 0.35162351 +1.40012923e-09j, 0.17632297 -1.25899486e-08j,... y: array([ 0.41364321, 0.80715483, 0.05271063, 0.72591019, 0.35162321, 0.17632306, 0.3850981 , 0.75712842, 0.68898875, 0.52632052, 0.69728118, 0.68721569, 0.69135427, 0.34033701, 0.65788335,... 
>> raise AssertionError('\nArrays are not almost equal\n\n(mismatch 0.900900900901%)\n x: array([ 0.41364330 +5.90676663e-09j, 0.80715483 +2.64462052e-08j,\n 0.05271048 -3.67830459e-08j, 0.72591031 -9.31092980e-09j,\n 0.35162351 +1.40012923e-09j, 0.17632297 -1.25899486e-08j,...\n y: array([ 0.41364321, 0.80715483, 0.05271063, 0.72591019, 0.35162321,\n 0.17632306, 0.3850981 , 0.75712842, 0.68898875, 0.52632052,\n 0.69728118, 0.68721569, 0.69135427, 0.34033701, 0.65788335,...') ---------------------------------------------------------------------- From stefan.czesla at hs.uni-hamburg.de Tue Jun 8 11:51:33 2010 From: stefan.czesla at hs.uni-hamburg.de (Stefan) Date: Tue, 8 Jun 2010 15:51:33 +0000 (UTC) Subject: [SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers? References: <4C066DA3.8010609@gmail.com> <4C09410F.1010900@gmail.com> Message-ID: Hi all, dear Bruce and Skipper, we very much appreciate your feedback. In response to Skipper's annotation we added a paragraph in the notes section and also tried to indicate the purpose of the keywords more precisely in the parameter section. The keyword renaming suggested by Bruce lead to some internal discussions here. We also were not 100% satisfied with the 'comments-comment_character' solution proposed in the first patch, and we see the conflict with loadtxt. Yet, also the combination of 'Preamble-Comments' appears, somewhat, awkward, because both seem to indicate the same, at least in our opinion. We appreciate Bruce's suggestion to call the keyword Preamble, because it expresses its purpose much more clearly than 'Comments' did. For the same reason, we decided to stay with 'comment_character' instead of 'Comments'. For the sake of clarity, this solution sacrifices full compatibility with np.loadtxt, but it does not create a conflict either. An adapted patch is available via ticket 1079 at: http://projects.scipy.org/numpy/ticket/1079 Christian & Stefan From charlesr.harris at gmail.com Tue Jun 8 11:57:04 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 8 Jun 2010 09:57:04 -0600 Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 In-Reply-To: <4C0E648B.10204@gmail.com> References: <4C0D96F7.1000406@silveregg.co.jp> <4C0E648B.10204@gmail.com> Message-ID: On Tue, Jun 8, 2010 at 9:40 AM, Bruce Southey wrote: > Hi, > I got 2 errors and 1 failure when I installed the beta using Python 2.6 > (Linux 64-bit) with numpy '2.0.0.dev8445' . > > Can we get a fix for ticket 1152 or at least mark it as known? > http://projects.scipy.org/scipy/ticket/1152 > > The others are below. > > There are also a number of overflow warnings that should be checked and > avoided. The same warnings also occur in test_continuous_basic for > certain distributions. > test_iv_cephes_vs_amos (test_basic.TestBessel) ... Warning: overflow > encountered in iv > Warning: overflow encountered in iv > Warning: invalid value encountered in isinf > > Numpy revision r8455 fixes the isinf warnings for most platforms. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilanschnell at gmail.com Tue Jun 8 13:51:01 2010 From: ilanschnell at gmail.com (Ilan Schnell) Date: Tue, 8 Jun 2010 12:51:01 -0500 Subject: [SciPy-Dev] ANN: EPD 6.2 released Message-ID: Hello, I am pleased to announce that EPD (Enthought Python Distribution) version 6.2 has been released. This release includes an update to Python 2.6.5, SciPy 0.8.0beta1, as well updates to many other packages and bug fixes. 
You can find a complete list of updates in the change log: http://www.enthought.com/EPDChangelog.html To find more information about EPD, as well as download a 30 day free trial, visit this page: http://www.enthought.com/products/epd.php In order to be able to serve the Python community better, we made a small survey. Please consider taking a few minutes: http://www.surveygizmo.com/s/307237/epd-user-feedback About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python Programming Language, including over 80 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, and many other tools. http://www.enthought.com/products/epdlibraries.php It is currently available as a single-click installer for Windows XP, Vista and 7, MacOS (10.5 and 10.6), RedHat 3, 4 and 5, as well as Solaris 10 (x86 and x86_64/amd64 on all platforms). The 32-bit EPD is free for academic use. An annual subscription including installation support is available for individual and commercial use. Additional support options, including customization, bug fixes and training classes are also available: http://www.enthought.com/products/support_level_table.php - Ilan From josef.pktd at gmail.com Tue Jun 8 15:38:59 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 8 Jun 2010 15:38:59 -0400 Subject: [SciPy-Dev] ERROR: Failure: SyntaxError (invalid syntax (test_distributions.py, line 391) In-Reply-To: References: Message-ID: On Tue, Jun 8, 2010 at 4:55 AM, Nils Wagner wrote: > Hi all, > > I am using > >>>> numpy.__version__ > '2.0.0.dev8460' >>>> import scipy >>>> scipy.__version__ > '0.9.0.dev6493' > > and I found some (new) errors > > ====================================================================== > ERROR: > test_continuous_basic.test_cont_basic( object at 0x4cb5c90>, (), 'wald') > ---------------------------------------------------------------------- > Traceback (most recent call last): > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/case.py", > line 183, in runTest > ? ? self.test(*self.arg) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/tests/test_continuous_basic.py", > line 291, in check_cdf_ppf > ? ? npt.assert_almost_equal(distfn.cdf(distfn.ppf([0.001,0.5,0.999], > *arg), *arg), > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", > line 1324, in ppf > ? ? place(output,cond,self._ppf(*goodargs)*scale + loc) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", > line 1028, in _ppf > ? ? return self.vecfunc(q,*args) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/function_base.py", > line 1794, in __call__ > ? ? theout = self.thefunc(*newargs) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", > line 974, in _ppf_single_call > ? ? return optimize.brentq(self._ppf_to_solve, self.xa, > self.xb, args=(q,)+args, xtol=self.xtol) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/optimize/zeros.py", > line 262, in brentq > ? ? 
r = > _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp) > ValueError: f(a) and f(b) must have different signs looking at changeset 6472, it looks like there are two possible errors using logcdf instead of cdf and switching to the internal method (underline) which might not do correct bounds handling (but I'm not sure about the latter) 4289 return invnorm.cdf(x,1,0) 4291 return invnorm._logcdf(x, 1.0) reverting this line, I guess, fixes it > > ====================================================================== > ERROR: Failure: SyntaxError (invalid syntax > (test_distributions.py, line 391)) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/loader.py", > line 379, in loadTestsFromName > ? ? addr.filename, addr.module) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/importer.py", > line 39, in importFromPath > ? ? return self.importFromDir(dir_path, fqname) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/importer.py", > line 86, in importFromDir > ? ? mod = load_module(part_fqname, fh, filename, desc) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/tests/test_distributions.py", > line 391 > ? ? res = distfunc.rvs(*args, size=200) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?^ > SyntaxError: invalid syntax I think here the arguments need to be reversed res = distfunc.rvs(size=200, *args) Josef > > ====================================================================== > ERROR: test_mpmath.test_expi_complex > ---------------------------------------------------------------------- > Traceback (most recent call last): > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/case.py", > line 183, in runTest > ? ? self.test(*self.arg) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/testing/decorators.py", > line 146, in skipper_func > ? ? return f(*args, **kwargs) > ? File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/special/tests/test_mpmath.py", > line 46, in test_expi_complex > ? ? 
dataset = np.array(dataset, dtype=np.complex_) > TypeError: a float is required > > Nils > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From oliphant at enthought.com Wed Jun 9 00:29:25 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 8 Jun 2010 23:29:25 -0500 Subject: [SciPy-Dev] ERROR: Failure: SyntaxError (invalid syntax (test_distributions.py, line 391) In-Reply-To: References: Message-ID: <3520C593-6833-4071-8CC1-85D44C7A12FF@enthought.com> On Jun 8, 2010, at 3:55 AM, Nils Wagner wrote: > Hi all, > > I am using > >>>> numpy.__version__ > '2.0.0.dev8460' >>>> import scipy >>>> scipy.__version__ > '0.9.0.dev6493' > > and I found some (new) errors > > ====================================================================== > ERROR: > test_continuous_basic.test_cont_basic( object at 0x4cb5c90>, (), 'wald') > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/case.py", > line 183, in runTest > self.test(*self.arg) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/tests/test_continuous_basic.py", > line 291, in check_cdf_ppf > npt.assert_almost_equal(distfn.cdf(distfn.ppf([0.001,0.5,0.999], > *arg), *arg), > File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", > line 1324, in ppf > place(output,cond,self._ppf(*goodargs)*scale + loc) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", > line 1028, in _ppf > return self.vecfunc(q,*args) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/function_base.py", > line 1794, in __call__ > theout = self.thefunc(*newargs) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/distributions.py", > line 974, in _ppf_single_call > return optimize.brentq(self._ppf_to_solve, self.xa, > self.xb, args=(q,)+args, xtol=self.xtol) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/optimize/zeros.py", > line 262, in brentq > r = > _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp) > ValueError: f(a) and f(b) must have different signs > > ====================================================================== > ERROR: Failure: SyntaxError (invalid syntax > (test_distributions.py, line 391)) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/loader.py", > line 379, in loadTestsFromName > addr.filename, addr.module) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/importer.py", > line 39, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/importer.py", > line 86, in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/stats/tests/test_distributions.py", > line 391 > res = distfunc.rvs(*args, size=200) > ^ > SyntaxError: invalid syntax The above two should be fixed in trunk. This last one is an old syntax issue with not being able to pass keyword arguments after *args without building the dictionary. I don't know what is causing the error below. 
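A minimal sketch of the call-syntax rule being discussed, for illustration only (the function rvs_demo below is a hypothetical stand-in, not scipy code; only the call forms matter):

    def rvs_demo(*shape_args, **kwds):
        # stand-in for a distribution's rvs(); just echoes what it received
        return shape_args, kwds.get('size')

    args = (2.5,)

    # SyntaxError on Python 2.5: a keyword argument may not follow *args in a call
    # res = rvs_demo(*args, size=200)

    # Both forms below are accepted by Python 2.5 and later:
    res = rvs_demo(size=200, *args)          # keyword first, then *args (the reordering Josef suggests)
    res = rvs_demo(*args, **{'size': 200})   # or build the keyword dictionary explicitly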
-Travis From nwagner at iam.uni-stuttgart.de Wed Jun 9 03:25:46 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 09 Jun 2010 09:25:46 +0200 Subject: [SciPy-Dev] test_complex_dotc (test_blas.TestFBLAS1Simple) ... Message-ID: Hi all, I installed numpy and scipy via svn on CentOS release 5.2 I have used the prebuild blas and lapack libraries (see below). scipy.test('1','10') segfaults in test_complex_dotc (test_blas.TestFBLAS1Simple) ... Program received signal SIGSEGV, Segmentation fault. 0x00002aaab719a257 in cdotc_ () from /usr/lib64/libblas.so.3 (gdb) bt #0 0x00002aaab719a257 in cdotc_ () from /usr/lib64/libblas.so.3 #1 0x00002aaab913d0c1 in f2py_rout_fblas_cdotc (capi_self=, capi_args=, capi_keywds=, f2py_func=0x2aaab9140720 ) at build/src.linux-x86_64-2.4/build/src.linux-x86_64-2.4/scipy/linalg/fblasmodule.c:5310 Is this a known issue ? Any pointer would be appreciated. Cheers, Nils rpm -qi blas Name : blas Relocations: (not relocatable) Version : 3.0 Vendor: CentOS Release : 37.el5 Build Date: Sa 06 Jan 2007 17:21:23 CET Install Date: Di 08 Jun 2010 15:57:57 CEST Build Host: builder5.centos.org Group : Development/Libraries Source RPM: lapack-3.0-37.el5.src.rpm Size : 695196 License: Freely distributable Signature : DSA/SHA1, Mi 04 Apr 2007 02:22:00 CEST, Key ID a8a447dce8562897 URL : http://www.netlib.org/lapack/ Summary : Die BLAS (Basic Linear Algebra Subprograms)-Bibliothek. Description : BLAS (Basic Linear Algebra Subprograms) is a standard library which provides a number of basic algorithms for numerical algebra. Man pages for blas are available in the blas-man package. rpm -qi lapack Name : lapack Relocations: (not relocatable) Version : 3.0 Vendor: CentOS Release : 37.el5 Build Date: Sa 06 Jan 2007 17:21:23 CET Install Date: Di 08 Jun 2010 15:58:12 CEST Build Host: builder5.centos.org Group : Development/Libraries Source RPM: lapack-3.0-37.el5.src.rpm Size : 5910874 License: Freely distributable Signature : DSA/SHA1, Mi 04 Apr 2007 02:24:47 CEST, Key ID a8a447dce8562897 URL : http://www.netlib.org/lapack/ From nwagner at iam.uni-stuttgart.de Wed Jun 9 04:05:53 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 09 Jun 2010 10:05:53 +0200 Subject: [SciPy-Dev] Building rpms Message-ID: Hi all, I tried to build rpms from numpy and scipy. It failed with tar -cf dist/numpy-2.0.0.dev8460.tar numpy-2.0.0.dev8460 gzip -f9 dist/numpy-2.0.0.dev8460.tar removing 'numpy-2.0.0.dev8460' (and everything under it) copying dist/numpy-2.0.0.dev8460.tar.gz -> build/bdist.linux-x86_64/rpm/SOURCES building RPMs rpm -ba --define _topdir /data/home/nwagner/svn/numpy/build/bdist.linux-x86_64/rpm --clean build/bdist.linux-x86_64/rpm/SPECS/numpy.spec -ba: unknown option error: command 'rpm' failed with exit status 1 Any idea ? Nils From cournape at gmail.com Wed Jun 9 05:55:32 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 9 Jun 2010 18:55:32 +0900 Subject: [SciPy-Dev] Building rpms In-Reply-To: References: Message-ID: On Wed, Jun 9, 2010 at 5:05 PM, Nils Wagner wrote: > Hi all, > > I tried to build rpms from numpy and scipy. 
> > It failed with > > tar -cf dist/numpy-2.0.0.dev8460.tar numpy-2.0.0.dev8460 > gzip -f9 dist/numpy-2.0.0.dev8460.tar > removing 'numpy-2.0.0.dev8460' (and everything under it) > copying dist/numpy-2.0.0.dev8460.tar.gz -> > build/bdist.linux-x86_64/rpm/SOURCES > building RPMs > rpm -ba --define _topdir > /data/home/nwagner/svn/numpy/build/bdist.linux-x86_64/rpm > --clean build/bdist.linux-x86_64/rpm/SPECS/numpy.spec > -ba: unknown option -ba should be an option for rpmbuild, not rpm. I don't know why distutils calls rpm here, David From nwagner at iam.uni-stuttgart.de Wed Jun 9 07:32:28 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 09 Jun 2010 13:32:28 +0200 Subject: [SciPy-Dev] Building rpms In-Reply-To: References: Message-ID: On Wed, 9 Jun 2010 18:55:32 +0900 David Cournapeau wrote: > On Wed, Jun 9, 2010 at 5:05 PM, Nils Wagner > wrote: >> Hi all, >> >> I tried to build rpms from numpy and scipy. >> >> It failed with >> >> tar -cf dist/numpy-2.0.0.dev8460.tar numpy-2.0.0.dev8460 >> gzip -f9 dist/numpy-2.0.0.dev8460.tar >> removing 'numpy-2.0.0.dev8460' (and everything under it) >> copying dist/numpy-2.0.0.dev8460.tar.gz -> >> build/bdist.linux-x86_64/rpm/SOURCES >> building RPMs >> rpm -ba --define _topdir >> /data/home/nwagner/svn/numpy/build/bdist.linux-x86_64/rpm >> --clean build/bdist.linux-x86_64/rpm/SPECS/numpy.spec >> -ba: unknown option > > -ba should be an option for rpmbuild, not rpm. I don't >know why > distutils calls rpm here, > > David Exactly. It looks like a bug in distutils. However, as soon as I have installed the rpm-build.rpm package on CentOS it works for me. Nils From m.boumans at gmx.net Sun Jun 6 01:45:28 2010 From: m.boumans at gmx.net (bowie_22) Date: Sun, 6 Jun 2010 05:45:28 +0000 (UTC) Subject: [SciPy-Dev] ANN: SciPy 0.8.0 beta 1 References: Message-ID: Ralf Gommers googlemail.com> writes: > > I'm pleased to announce the first beta release of SciPy 0.8.0.SciPy is a package of tools for science and engineering for Python.It includes modules for statistics, optimization, integration, linearalgebra, Fourier transforms, signal and image processing, ODE solvers, > and more.This beta release comes almost one and a half year after the 0.7.0 release andcontains many new features, numerous bug-fixes, improved testcoverage, and better documentation. ?Please note that SciPy 0.8.0b1 > requires Python 2.4 or greater and NumPy 1.4.1 or greater.For information, please see the release notes:http://sourceforge.net/projects/scipy/files/scipy/0.8.0b1/NOTES.txt/viewYou can download the release from here:https://sourceforge.net/projects/scipy/Python 2.5/2.6 binaries for Windows and OS X are available as well as source tarballs for other platforms. Thank you to everybody who contributed to this release.Cheers,Ralf > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Hello everybody, I just have read the announcement for SciPy 0.8.0 and of course I have installed it immediatly. I am quite new in Scipy (coming from Matlab) and I thought a good starting point for a contribution would be to review and check the Scipy documentation. I added some hints in http://docs.scipy.org/numpy/Front%20Page/ and now I ask myself how the release of the documentation is conntected to the release of a new scipy version. Is it connected at all? Browsing throw the docs give at http://docs.scipy.org/doc shows a documentation for scipy 0.7. Does "...and better documentation..." 
mean an improvement in the docstrings (As I am still not sure which place is the best to look at)? As a scipy rookie I would appreciate same information about this topics (release of documentation and release of a new scipy package) Thank you! Regs Marcus From amcmorl at gmail.com Thu Jun 10 10:22:24 2010 From: amcmorl at gmail.com (Angus McMorland) Date: Thu, 10 Jun 2010 10:22:24 -0400 Subject: [SciPy-Dev] Docstrings permissions Message-ID: I've found a scipy docstring that needs slight adjustment. Please can someone give me edit permissions on the docstring site: I'll try to make this the impetus I need to get contributing in general. I've registered an account under the name amcmorl. Thanks all, Angus. -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From d.l.goldsmith at gmail.com Thu Jun 10 17:21:56 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 10 Jun 2010 14:21:56 -0700 Subject: [SciPy-Dev] Marathon skypecon tomorrow and an agenda item Message-ID: Agenda item: no one has "registered" themselves to work on any of the Milestones - was that a bad idea? If so, what are some other things we can do to kick-start this thing? DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu Jun 10 19:26:15 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 10 Jun 2010 17:26:15 -0600 Subject: [SciPy-Dev] Marathon skypecon tomorrow and an agenda item In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 3:21 PM, David Goldsmith wrote: > Agenda item: no one has "registered" themselves to work on any of the > Milestones - was that a bad idea?? If so, what are some other things we can > do to kick-start this thing? Well guess I either missed the list or looked and didn't think I was capable of any of them. So could you point me to the list (again?). > > DG > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From d.l.goldsmith at gmail.com Thu Jun 10 19:31:01 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 10 Jun 2010 16:31:01 -0700 Subject: [SciPy-Dev] Marathon skypecon tomorrow and an agenda item In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 4:26 PM, Vincent Davis wrote: > On Thu, Jun 10, 2010 at 3:21 PM, David Goldsmith > wrote: > > Agenda item: no one has "registered" themselves to work on any of the > > Milestones - was that a bad idea? If so, what are some other things we > can > > do to kick-start this thing? > > Well guess I either missed the list or looked and didn't think I was > capable of any of them. So could you point me to the list (again?). > http://docs.scipy.org/scipy/Milestones/ DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Thu Jun 10 19:32:06 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 10 Jun 2010 17:32:06 -0600 Subject: [SciPy-Dev] Data Types documentation page questions? Message-ID: Regarding this page http://docs.scipy.org/doc/numpy/user/basics.types.html I assume there is a "-" missing here int64 Integer (9223372036854775808 to 9223372036854775807) Also I would suggest that the intervals on these data type use standard mathematical notations for open and closed interval. 
"(", and "[" Vincent From d.l.goldsmith at gmail.com Thu Jun 10 19:52:14 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 10 Jun 2010 16:52:14 -0700 Subject: [SciPy-Dev] Data Types documentation page questions? In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 4:32 PM, Vincent Davis wrote: > Regarding this page > http://docs.scipy.org/doc/numpy/user/basics.types.html > > I assume there is a "-" missing here > int64 Integer (9223372036854775808 to 9223372036854775807) > Yes, the left number should be negative. > Also I would suggest that the intervals on these data type use > standard mathematical notations for open and closed interval. "(", and > "[" > Except that notation is used for real number intervals, not sets of integers (at least, I've never seen it used w/ sets of integers). And were you to make the change, it would simply be replacing every "(" with a "[" and every ")" with a "]" (because, the way the ranges are given, they all include their endpoints) - I think the risk of misunderstanding on the part of the reader is minimal and thus any benefit is hardly worth the effort. DG > > > Vincent > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Thu Jun 10 21:45:14 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 10 Jun 2010 18:45:14 -0700 Subject: [SciPy-Dev] How to document parameters *args and **kwds Message-ID: We've kind of discussed *args before (see http://docs.scipy.org/numpy/Questions+Answers/#variable-arguments), though we didn't note a "canonical answer." Can we: A) agree on such, and B) extend it to parameter **kwds? Thanks! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jun 10 21:50:27 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 10 Jun 2010 21:50:27 -0400 Subject: [SciPy-Dev] How to document parameters *args and **kwds In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 21:45, David Goldsmith wrote: > We've kind of discussed *args before (see > http://docs.scipy.org/numpy/Questions+Answers/#variable-arguments), though > we didn't note a "canonical answer."? Can we: > > A) agree on such, and > > B) extend it to parameter **kwds? In most cases, I would leave the type field blank unless if they happen to be homogeneous. They often aren't. *args : Arguments to pass to the callback. **kwds : Keyword arguments to pass to the callback. But sometimes they are. *indices : ints Possibly multiple indices. I don't think Sphinx has a problem with these constructs, but I could be wrong. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From d.l.goldsmith at gmail.com Thu Jun 10 22:05:53 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 10 Jun 2010 19:05:53 -0700 Subject: [SciPy-Dev] How to document parameters *args and **kwds In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 6:50 PM, Robert Kern wrote: > On Thu, Jun 10, 2010 at 21:45, David Goldsmith > wrote: > > We've kind of discussed *args before (see > > http://docs.scipy.org/numpy/Questions+Answers/#variable-arguments), > though > > we didn't note a "canonical answer." Can we: > > > > A) agree on such, and > > > > B) extend it to parameter **kwds? > > In most cases, I would leave the type field blank unless if they > happen to be homogeneous. They often aren't. > > *args : > Arguments to pass to the callback. > **kwds : > Keyword arguments to pass to the callback. > > But sometimes they are. > > *indices : ints > Possibly multiple indices. > > I don't think Sphinx has a problem with these constructs, but I could be > wrong. > What we were (only slightly) leaning to in the Q+A page discussion, in part because Ralf said there was already precedent for it in the docs, was: \*args : Arguments Explanation of number and type of arguments .... but escaping the '*' with an '\' doesn't appear to be working, but leaving it un-escaped gets misinterpreted, too (as an un-closed emphasis mark-up). So the **kwds analog would be \*\*kwds : Keyword arguments Explanation of number and type... but now that I look at that typed out, I predict that the command-line crowd will protest. :-) Regardless, right now, using * & ** "breaks" the Wiki, whereas \* & \*\* keeps the Wiki from complaining, but the \ aren't removed by it, and look ugly be it in a terminal or rendered. What to do, what to do... DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jun 10 22:36:58 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 10 Jun 2010 22:36:58 -0400 Subject: [SciPy-Dev] How to document parameters *args and **kwds In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 22:05, David Goldsmith wrote: > On Thu, Jun 10, 2010 at 6:50 PM, Robert Kern wrote: >> >> On Thu, Jun 10, 2010 at 21:45, David Goldsmith >> wrote: >> > We've kind of discussed *args before (see >> > http://docs.scipy.org/numpy/Questions+Answers/#variable-arguments), >> > though >> > we didn't note a "canonical answer."? Can we: >> > >> > A) agree on such, and >> > >> > B) extend it to parameter **kwds? >> >> In most cases, I would leave the type field blank unless if they >> happen to be homogeneous. They often aren't. >> >> *args : >> ? ?Arguments to pass to the callback. >> **kwds : >> ? ?Keyword arguments to pass to the callback. >> >> But sometimes they are. >> >> *indices : ints >> ? ?Possibly multiple indices. >> >> I don't think Sphinx has a problem with these constructs, but I could be >> wrong. > > What we were (only slightly) leaning to in the Q+A page discussion, in part > because Ralf said there was already precedent for it in the docs, was: > > \*args : Arguments > > Explanation of number and type of arguments .... > > but escaping the '*' with an '\' doesn't appear to be working, but leaving > it un-escaped gets misinterpreted, too (as an un-closed emphasis mark-up). > So the **kwds analog would be > > \*\*kwds : Keyword arguments > ??? Explanation of number and type... > > but now that I look at that typed out, I predict that the command-line crowd > will protest. :-)? 
Regardless, right now, using * & ** "breaks" the Wiki, > whereas \* & \*\* keeps the Wiki from complaining, but the \ aren't removed > by it, and look ugly be it in a terminal or rendered.? What to do, what to > do... Fix the wiki software. The generated Sphinx docs are fine with the unescaped version. The wiki is a tool to help build the Sphinx docs, not the other way around. However, my typeless examples do not work directly. You need to omit the colon. Shame, because I like the look of the colon. Ah well. The following do work, and I prefer them to the "Arguments" and "Keyword arguments" placeholders. The *, ** and usually the names of those variables usually state clearly that they are arguments or keyword arguments. Stating it a third time just seems weird. I'd say, add real type information if it makes sense, otherwise omit it. But that's just me. *args Arguments to pass to the callback. **kwds Keyword arguments to pass to the callback. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From vincent at vincentdavis.net Thu Jun 10 23:50:02 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 10 Jun 2010 21:50:02 -0600 Subject: [SciPy-Dev] mono space text in document editor. Message-ID: It seems that the text in the document editor is not a mono space text. Can this be changed? Vincent From vincent at vincentdavis.net Fri Jun 11 00:11:57 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 10 Jun 2010 22:11:57 -0600 Subject: [SciPy-Dev] Updating constants Message-ID: The current constants in scipy are from 2002, the newest set available are from 2006. Should they be updated, What are the issues with updating with regard to notifying users ie documenting the update. Vincent From josef.pktd at gmail.com Fri Jun 11 00:38:18 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 11 Jun 2010 00:38:18 -0400 Subject: [SciPy-Dev] Updating constants In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 12:11 AM, Vincent Davis wrote: > The current constants in scipy are from 2002, the newest set available > are from 2006. > Should they be updated, What are the issues with updating with regard > to notifying users ie documenting the update. I thought these are constants. Did they change the value of Pi recently? just curious: What has changed? Josef > > Vincent > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vincent at vincentdavis.net Fri Jun 11 00:54:13 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 10 Jun 2010 22:54:13 -0600 Subject: [SciPy-Dev] Updating constants In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 10:38 PM, wrote: > On Fri, Jun 11, 2010 at 12:11 AM, Vincent Davis > wrote: >> The current constants in scipy are from 2002, the newest set available >> are from 2006. >> Should they be updated, What are the issues with updating with regard >> to notifying users ie documenting the update. > > I thought these are constants. Did they change the value of Pi recently? They found more digits of Pi :) I was referring to "Fundamental Physical Constants" scipy.constants.codata I don't know, it's not easy to compare. I assume there was a reason they updated the list. 
There is an Uncertainty value on many of them so I assume the actual value don't change but our estimate does. http://physics.nist.gov/cuu/Constants/index.html release dates 1986, 1998, 2002, 2006 Vincent > > just curious: What has changed? > > Josef > >> >> Vincent >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vincent at vincentdavis.net Fri Jun 11 01:09:22 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 10 Jun 2010 23:09:22 -0600 Subject: [SciPy-Dev] Updating constants In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 10:54 PM, Vincent Davis wrote: > On Thu, Jun 10, 2010 at 10:38 PM, ? wrote: >> On Fri, Jun 11, 2010 at 12:11 AM, Vincent Davis >> wrote: >>> The current constants in scipy are from 2002, the newest set available >>> are from 2006. >>> Should they be updated, What are the issues with updating with regard >>> to notifying users ie documenting the update. >> >> I thought these are constants. Did they change the value of Pi recently? > > They found more digits of Pi :) > I was referring to "Fundamental Physical Constants" scipy.constants.codata > I don't know, it's not easy to compare. I assume there was a reason > they updated the list. There is an Uncertainty value on many of them > so I assume the actual value don't change but our estimate does. > > http://physics.nist.gov/cuu/Constants/index.html > release dates 1986, 1998, 2002, 2006 > > Vincent > >> >> just curious: What has changed? >> I just relived I can update the constants and then compare them with the old. Should be quick, I will send out the diff in the morning. Vincent >> Josef >> >>> >>> Vincent >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > From josef.pktd at gmail.com Fri Jun 11 01:29:39 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 11 Jun 2010 01:29:39 -0400 Subject: [SciPy-Dev] Updating constants In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 1:09 AM, Vincent Davis wrote: > On Thu, Jun 10, 2010 at 10:54 PM, Vincent Davis > wrote: >> On Thu, Jun 10, 2010 at 10:38 PM, ? wrote: >>> On Fri, Jun 11, 2010 at 12:11 AM, Vincent Davis >>> wrote: >>>> The current constants in scipy are from 2002, the newest set available >>>> are from 2006. >>>> Should they be updated, What are the issues with updating with regard >>>> to notifying users ie documenting the update. >>> >>> I thought these are constants. Did they change the value of Pi recently? >> >> They found more digits of Pi :) >> I was referring to "Fundamental Physical Constants" scipy.constants.codata >> I don't know, it's not easy to compare. I assume there was a reason >> they updated the list. There is an Uncertainty value on many of them >> so I assume the actual value don't change but our estimate does. >> >> http://physics.nist.gov/cuu/Constants/index.html >> release dates 1986, 1998, 2002, 2006 >> >> Vincent >> >>> >>> just curious: What has changed? >>> > > I just relived I can update the constants and then compare them with > the old. 
Should be quick, I will send out the diff in the morning. quote from the REVIEWS OF MODERN PHYSICS paper: "Although just four years separate the 31 December closing dates of the 2002 and 2006 adjustments, there are a number of important new results to consider. Experimental advances include the 2003 Atomic Mass Evaluation from the Atomic Mass Data Center (AMDC),which provides new values for the relative atomic masses Ar(X) of a number of relevant atoms; a new value of ..." Josef ?Curiouser and curiouser!? > > Vincent > > >>> Josef >>> >>>> >>>> Vincent >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pav at iki.fi Fri Jun 11 06:25:08 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 11 Jun 2010 10:25:08 +0000 (UTC) Subject: [SciPy-Dev] mono space text in document editor. References: Message-ID: Thu, 10 Jun 2010 21:50:02 -0600, Vincent Davis wrote: > It seems that the text in the document editor is not a mono space text. > Can this be changed? What text? The text in the edit window is monospace. The output should not be made monospaced; it is intended for Sphinx, and plain text should be variable spaced. -- Pauli Virtanen From tpk at kraussfamily.org Fri Jun 11 08:50:49 2010 From: tpk at kraussfamily.org (Tom K.) Date: Fri, 11 Jun 2010 12:50:49 +0000 (UTC) Subject: [SciPy-Dev] [SciPy] #902: need high, stop, pass options to signal.firwin Message-ID: Looks like this patch for firwin might have been overlooked. Posting here as requested. #902: need high, stop, pass options to signal.firwin ----------------------------------+---------------------------- Reporter: tpk@? | Owner: somebody Type: enhancement | Status: new Priority: normal | Milestone: 0.8.0 Component: scipy.signal | Version: 0.7.0 Keywords: | ----------------------------------+---------------------------- Comment(by charris): It probably got overlooked. Send a note to the list. -- Ticket URL: SciPy SciPy is open-source software for mathematics, science, and engineering. From d.l.goldsmith at gmail.com Fri Jun 11 11:19:34 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 11 Jun 2010 08:19:34 -0700 Subject: [SciPy-Dev] Marathon Skypecon in 45 minutes Message-ID: My Skype ID is d.l.goldsmith - message me and I'll add you. DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Fri Jun 11 11:45:03 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 11 Jun 2010 08:45:03 -0700 Subject: [SciPy-Dev] How to document parameters *args and **kwds In-Reply-To: References: Message-ID: On Thu, Jun 10, 2010 at 7:36 PM, Robert Kern wrote: > On Thu, Jun 10, 2010 at 22:05, David Goldsmith > wrote: > > On Thu, Jun 10, 2010 at 6:50 PM, Robert Kern > wrote: > >> > >> On Thu, Jun 10, 2010 at 21:45, David Goldsmith > > >> wrote: > >> > We've kind of discussed *args before (see > >> > http://docs.scipy.org/numpy/Questions+Answers/#variable-arguments), > >> > though > >> > we didn't note a "canonical answer." Can we: > >> > > >> > A) agree on such, and > >> > > >> > B) extend it to parameter **kwds? 
> >> > >> In most cases, I would leave the type field blank unless if they > >> happen to be homogeneous. They often aren't. > >> > >> *args : > >> Arguments to pass to the callback. > >> **kwds : > >> Keyword arguments to pass to the callback. > >> > >> But sometimes they are. > >> > >> *indices : ints > >> Possibly multiple indices. > >> > >> I don't think Sphinx has a problem with these constructs, but I could be > >> wrong. > > > > What we were (only slightly) leaning to in the Q+A page discussion, in > part > > because Ralf said there was already precedent for it in the docs, was: > > > > \*args : Arguments > > > > Explanation of number and type of arguments .... > > > > but escaping the '*' with an '\' doesn't appear to be working, but > leaving > > it un-escaped gets misinterpreted, too (as an un-closed emphasis > mark-up). > > So the **kwds analog would be > > > > \*\*kwds : Keyword arguments > > Explanation of number and type... > > > > but now that I look at that typed out, I predict that the command-line > crowd > > will protest. :-) Regardless, right now, using * & ** "breaks" the Wiki, > > whereas \* & \*\* keeps the Wiki from complaining, but the \ aren't > removed > > by it, and look ugly be it in a terminal or rendered. What to do, what > to > > do... > > Fix the wiki software. The generated Sphinx docs are fine with the > unescaped version. The wiki is a tool to help build the Sphinx docs, > not the other way around. > > However, my typeless examples do not work directly. You need to omit > the colon. Shame, because I like the look of the colon. Ah well. The > following do work, and I prefer them to the "Arguments" and "Keyword > arguments" placeholders. The *, ** and usually the names of those > variables usually state clearly that they are arguments or keyword > arguments. Stating it a third time just seems weird. I'd say, add real > type information if it makes sense, otherwise omit it. But that's just > me. > > *args > Arguments to pass to the callback. > **kwds > Keyword arguments to pass to the callback. > > -- > Robert Kern > I excerpted this over on the Q+A page; please go check to confirm that I haven't misrepresented you. And thanks for your input! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Fri Jun 11 12:07:59 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Fri, 11 Jun 2010 10:07:59 -0600 Subject: [SciPy-Dev] mono space text in document editor. In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 4:25 AM, Pauli Virtanen wrote: > Thu, 10 Jun 2010 21:50:02 -0600, Vincent Davis wrote: >> It seems that the text in the document editor is not a mono space text. >> Can this be changed? > > What text? On this page for example. http://docs.scipy.org/scipy/docs/scipy.constants.codata.precision/edit/ When you edit the text it is not monospace and there is no 75 char guide. This makes both staying within the 75 char limit and getting the rst syntax correct difficult. (for example putting "-" under a word to get bold font.) I could use an external editor and copy paste. See attached images for you don't know what I mean. Thanks Vincent > > The text in the edit window is monospace. The output should not be made > monospaced; it is intended for Sphinx, and plain text should be variable > spaced. 
> > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- A non-text attachment was scrubbed... Name: scipy doc edit.tiff Type: image/tiff Size: 96158 bytes Desc: not available URL: From d.l.goldsmith at gmail.com Fri Jun 11 12:30:23 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 11 Jun 2010 09:30:23 -0700 Subject: [SciPy-Dev] mono space text in document editor. In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 9:07 AM, Vincent Davis wrote: > On Fri, Jun 11, 2010 at 4:25 AM, Pauli Virtanen wrote: > > Thu, 10 Jun 2010 21:50:02 -0600, Vincent Davis wrote: > >> It seems that the text in the document editor is not a mono space text. > >> Can this be changed? > > > > What text? > On this page for example. > http://docs.scipy.org/scipy/docs/scipy.constants.codata.precision/edit/ > When you edit the text it is not monospace and there is no 75 char guide. > This makes both staying within the 75 char limit Just err on the side of being too short (i.e., if in doubt, break the line). > and getting the rst > syntax correct difficult. (for example putting "-" under a word to get > bold font.) **** gives us bold font. ("-" under a word makes it a section heading; only Parameters, Returns, Other parameters, Raises, See also, Notes, References, and Examples, starting in the first column and on a line by themselves, should have "-" under them.) > I could use an external editor and copy paste. > Actually, nominally, that _is_ what we'd prefer writer/editors to do: (from http://docs.scipy.org/numpy/Front%20Page/#roles-reviewing) "**It is best to grab the whole existing page [from the Edit window, not the View window], or a template, edit it on your computer, return [and] check that nobody else has edited first, and then upload your document. Please do not edit incrementally, unless making trivial changes like fixing markup or reformatting." (Full disclosure: I "cheat," but since you brought it up, I figured I should emphasize that what you state is the preferred modus operendi.) DG See attached images for you don't know what I mean. > > Thanks > Vincent > > > > > > > The text in the edit window is monospace. The output should not be made > > monospaced; it is intended for Sphinx, and plain text should be variable > > spaced. > > > > -- > > Pauli Virtanen > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero. Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves) -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jun 11 12:41:17 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 11 Jun 2010 12:41:17 -0400 Subject: [SciPy-Dev] mono space text in document editor. 
In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 12:30 PM, David Goldsmith wrote: > On Fri, Jun 11, 2010 at 9:07 AM, Vincent Davis > wrote: >> >> On Fri, Jun 11, 2010 at 4:25 AM, Pauli Virtanen wrote: >> > Thu, 10 Jun 2010 21:50:02 -0600, Vincent Davis wrote: >> >> It seems that the text in the document editor is not a mono space text. >> >> Can this be changed? >> > >> > What text? >> On this page for example. >> http://docs.scipy.org/scipy/docs/scipy.constants.codata.precision/edit/ >> When you edit the text it is not monospace and there is no 75 char guide. >> This makes both staying within the 75 char limit it's monospaced for me in firefox, underlines line up with header. maybe it's a browser setting. Josef > > Just err on the side of being too short (i.e., if in doubt, break the line). > >> >> and getting the rst >> syntax correct difficult. (for example putting "-" under a word to get >> bold font.) > > **** gives us bold font.? ("-" under a word makes it a section > heading; only Parameters, Returns, Other parameters, Raises, See also, > Notes, References, and Examples, starting in the first column and on a line > by themselves, should have "-" under them.) > >> >> I could use an external editor and copy paste. > > Actually, nominally, that _is_ what we'd prefer writer/editors to do: > > (from http://docs.scipy.org/numpy/Front%20Page/#roles-reviewing) "It is best > to grab the whole existing page [from the Edit window, not the View window], > or a template, edit it on your computer, return [and] check that nobody else > has edited first, and then upload your document. Please do not edit > incrementally, unless making trivial changes like fixing markup or > reformatting." > > (Full disclosure: I "cheat," but since you brought it up, I figured I should > emphasize that what you state is the preferred modus operendi.) > > DG > >> See attached images for you don't know what I mean. >> >> Thanks >> Vincent >> >> >> >> > >> > The text in the edit window is monospace. The output should not be made >> > monospaced; it is intended for Sphinx, and plain text should be variable >> > spaced. >> > >> > -- >> > Pauli Virtanen >> > >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. ?(As interpreted > by Robert Graves) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From jsseabold at gmail.com Fri Jun 11 12:45:05 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 11 Jun 2010 12:45:05 -0400 Subject: [SciPy-Dev] Warnings raised (from fit in scipy.stats) Message-ID: Since the raising of warning behavior has been changed (I believe), I have been running into a lot of warnings in my code when say I do something like In [120]: from scipy import stats In [121]: y = [-45, -3, 1, 0, 1, 3] In [122]: v = stats.norm.pdf(y)/stats.norm.cdf(y) Warning: invalid value encountered in divide Sometimes, this is useful to know. 
Sometimes, though, it's very disturbing when it's encountered in some kind of iteration or optimization. I have been using numpy.clip to get around this in my own code, but when it's buried a bit deeper, it's not quite so simple. Take this example. In [123]: import numpy as np In [124]: np.random.seed(12345) In [125]: B = 6.0 In [126]: x = np.random.exponential(scale=B, size=5000) In [127]: from scipy.stats import expon In [128]: expon.fit(x) Out[128]: (0.21874043533906118, 5.7122829778172939) The fit is achieved by fmin (as far as I know, since disp=0 in the rv_continuous.fit...), but there are a number of warnings emitted. Is there any middle ground to be had in these type of situations via context management perhaps? Should I file a ticket? Skipper From pav at iki.fi Fri Jun 11 12:50:41 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 11 Jun 2010 16:50:41 +0000 (UTC) Subject: [SciPy-Dev] mono space text in document editor. References: Message-ID: Fri, 11 Jun 2010 10:07:59 -0600, Vincent Davis wrote: [clip] > On this page for example. > http://docs.scipy.org/scipy/docs/scipy.constants.codata.precision/edit/ > When you edit the text it is not monospace and there is no 75 char > guide. This makes both staying within the 75 char limit and getting the > rst syntax correct difficult. (for example putting "-" under a word to > get bold font.) I could use an external editor and copy paste. See > attached images for you don't know what I mean. That's specific to the browser you are using, and possibly also user- specific customizations -- is it Safari on OSX? Anyway, there's nothing in the CSS forcing