From josef.pktd at gmail.com Sat Jun 1 11:35:33 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 1 Jun 2013 11:35:33 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: On Tue, May 28, 2013 at 10:34 PM, Matthew Brett wrote: > Hi, > > On Tue, May 28, 2013 at 7:18 PM, Paulo Jabardo wrote: >> I'm an engineer working in research but I spend a good deal of time coding. >> What I've seen with most of my colleagues and friends is that they will only >> code whenever it is extremely necessary for an immediate application in an >> experiment or for their PhD. The problem starts very early, when I was >> beginning my studies, we were taught C (and that is still the case almost 20 >> years later). A small percentage of the students (10%?) enjoy programming >> and they will profit. I really loved pointers and doing neat tricks. For the >> rest it was torture, plain and simple torture. And completely useless. Most >> students couldn't do anything useful with programming. All their suffering >> was for nothing. What happened later was obvious: they would avoid >> programming at all costs and if they had to do something they would use >> MS-Excel. The spreadsheets I've seen... I still have nightmares. The things >> they accomplished humbles me, proves that I'm a lower being. I've seen >> people solve partial differential equations where each cell was an element >> in the solution and it was colored according to the result. Beautiful but >> I'd rather suffer accute physical pain than to do something like that, or >> worse, debug such a "program". By the way, this sort of application was not >> a joke or a neat hack, it was actually the only way those guys knew how to >> solve a problem. >> >> 15 years later... I have a physics undergraduate student working with me. >> Very smart and interested. They still learn C and later on when they need to >> do something, what is it they do? Most professors use Origin. A huge >> improvement over Excel, but still. A couple of months ago, he had to turn in >> a report and since we don't have Origin, he was using Excel. I kind of felt >> sorry for him and I helped him out to do it in Python. He couldn't believe >> it. > > Oh - dear; you probably saw this stuff? > > http://blog.stodden.net/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/ I think that's a good example that peer review works. > >> I did my Masters and PhD in CFD. Most other students had almost no >> background in programming and did most things using Excel! When they had to >> modify some code, it was almost by accident that things worked. You can >> imagine what sort of code comes out of this. The professors didn't know >> programming much better. Just getting them to understand the concept of >> version control took a while. >> >> In my opinion, If schools taught, at the begining, something like >> Python/Octave/R instead of C, students would be able to use this knowledge >> easily and productively throughout their courses and eventually learn C when >> they really needed it. > > That's surely one of the big arguments for Python - it is a great > first language, and it is capable across a wider range than Octave or > R - or even Excel :) We can mistake in any language I just read this """ Abstract [Correction Notice: An Erratum for this article was reported in Vol 17(4) of Psychological Methods (see record 2012-33502-001). The R code for arriving at adjusted p values for one of the methods is incorrect. The specific changes that need to be made are provided in the erratum.] """ It's still functioning peer review if a mistake is found after an article has been published, or after a pull request has landed in master. --------- in general: in the research areas that I know, the vast majority of researchers use Windows, and everything that is not core task is point and click. As long as Matlab, Stata and GAUSS, or whatever else, doesn't have version control build in, VC won't be used by the majority of researchers that I know. We didn't grow up when version control was popular. And we don't have IT guys to manage it for us. (There is the old fashioned version control of starting new directories at crucial stages, or for specific conference talks and paper submissions.) (DVCS are only a few years old, and it will take a few more years for diffusion to "non-programmers" to happen.) Even after using git some time, I only find it usable because I can do all the regular stuff with git gui (and for unusual stuff I can use commandline and git gui at the same time). --------- (just in case I'm misunderstood: I'm all in favor of best practices and unit and functional tests, but I don't expect that researchers will adopt it (fast) if it goes against their usual pattern of using tools. example: If you teach a software carpentry course that uses Linux, then I wouldn't be surprised if some users go back to their office and the first thing they do is use Excel. :) Josef (I used a virtual Debian for one month.) > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From matthew.brett at gmail.com Sat Jun 1 17:39:37 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 1 Jun 2013 14:39:37 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: Hi, On Sat, Jun 1, 2013 at 8:35 AM, wrote: > On Tue, May 28, 2013 at 10:34 PM, Matthew Brett wrote: >> Hi, >> >> On Tue, May 28, 2013 at 7:18 PM, Paulo Jabardo wrote: >>> I'm an engineer working in research but I spend a good deal of time coding. >>> What I've seen with most of my colleagues and friends is that they will only >>> code whenever it is extremely necessary for an immediate application in an >>> experiment or for their PhD. The problem starts very early, when I was >>> beginning my studies, we were taught C (and that is still the case almost 20 >>> years later). A small percentage of the students (10%?) enjoy programming >>> and they will profit. I really loved pointers and doing neat tricks. For the >>> rest it was torture, plain and simple torture. And completely useless. Most >>> students couldn't do anything useful with programming. All their suffering >>> was for nothing. What happened later was obvious: they would avoid >>> programming at all costs and if they had to do something they would use >>> MS-Excel. The spreadsheets I've seen... I still have nightmares. The things >>> they accomplished humbles me, proves that I'm a lower being. I've seen >>> people solve partial differential equations where each cell was an element >>> in the solution and it was colored according to the result. Beautiful but >>> I'd rather suffer accute physical pain than to do something like that, or >>> worse, debug such a "program". By the way, this sort of application was not >>> a joke or a neat hack, it was actually the only way those guys knew how to >>> solve a problem. >>> >>> 15 years later... I have a physics undergraduate student working with me. >>> Very smart and interested. They still learn C and later on when they need to >>> do something, what is it they do? Most professors use Origin. A huge >>> improvement over Excel, but still. A couple of months ago, he had to turn in >>> a report and since we don't have Origin, he was using Excel. I kind of felt >>> sorry for him and I helped him out to do it in Python. He couldn't believe >>> it. >> >> Oh - dear; you probably saw this stuff? >> >> http://blog.stodden.net/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/ > > I think that's a good example that peer review works. It's a good example of how peer-review should work, but it's very uncommon for the reviewer to have the original spreadsheet, and that was the key to the problem. >>> I did my Masters and PhD in CFD. Most other students had almost no >>> background in programming and did most things using Excel! When they had to >>> modify some code, it was almost by accident that things worked. You can >>> imagine what sort of code comes out of this. The professors didn't know >>> programming much better. Just getting them to understand the concept of >>> version control took a while. >>> >>> In my opinion, If schools taught, at the begining, something like >>> Python/Octave/R instead of C, students would be able to use this knowledge >>> easily and productively throughout their courses and eventually learn C when >>> they really needed it. >> >> That's surely one of the big arguments for Python - it is a great >> first language, and it is capable across a wider range than Octave or >> R - or even Excel :) > > We can mistake in any language > > I just read this > > """ > Abstract > > [Correction Notice: An Erratum for this article was reported in > Vol 17(4) of Psychological Methods (see record 2012-33502-001). The R > code for arriving at adjusted p values for one of the methods is > incorrect. The specific changes that need to be made are provided in > the erratum.] > """ > > It's still functioning peer review if a mistake is found after an > article has been published, or after a pull request has landed in > master. The problem is that the peers don't get to review what has been done, in general, they get to review what the author said had been done. Donoho's point - about computational science - is that this can be very different. The question is then : does this matter? Are - most published research findings false? > --------- > in general: > > in the research areas that I know, the vast majority of researchers > use Windows, and everything that is not core task is point and click. > As long as Matlab, Stata and GAUSS, or whatever else, doesn't have > version control build in, VC won't be used by the majority of > researchers that I know. We didn't grow up when version control was > popular. And we don't have IT guys to manage it for us. > (There is the old fashioned version control of starting new > directories at crucial stages, or for specific conference talks and > paper submissions.) > (DVCS are only a few years old, and it will take a few more years for > diffusion to "non-programmers" to happen.) We get taught some complicated things when we are training - calculus, algebra... Does it make sense that we don't teach less complicated things like version control and programming? > Even after using git some time, I only find it usable because I can do > all the regular stuff with git gui (and for unusual stuff I can use > commandline and git gui at the same time). > > > --------- > (just in case I'm misunderstood: > I'm all in favor of best practices and unit and functional tests, but > I don't expect that researchers will adopt it (fast) if it goes > against their usual pattern of using tools. > example: If you teach a software carpentry course that uses Linux, > then I wouldn't be surprised if some users go back to their office and > the first thing they do is use Excel. :) In general as you know I agree completely that it doesn't make sense to persuade people to switch from Windows to Linux at the same time as persuading them to use good software tools. We should teach people stuff that they will and can use, and it's a common them among software-carpentry types that it would be better to teach Windows people how to best use Windows rather than teaching them on a virtual machine that they are unlikely to use for their work. Cheers, Matthew From josef.pktd at gmail.com Sat Jun 1 23:29:26 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 1 Jun 2013 23:29:26 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: On Sat, Jun 1, 2013 at 5:39 PM, Matthew Brett wrote: > Hi, > > On Sat, Jun 1, 2013 at 8:35 AM, wrote: >> On Tue, May 28, 2013 at 10:34 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Tue, May 28, 2013 at 7:18 PM, Paulo Jabardo wrote: >>>> I'm an engineer working in research but I spend a good deal of time coding. >>>> What I've seen with most of my colleagues and friends is that they will only >>>> code whenever it is extremely necessary for an immediate application in an >>>> experiment or for their PhD. The problem starts very early, when I was >>>> beginning my studies, we were taught C (and that is still the case almost 20 >>>> years later). A small percentage of the students (10%?) enjoy programming >>>> and they will profit. I really loved pointers and doing neat tricks. For the >>>> rest it was torture, plain and simple torture. And completely useless. Most >>>> students couldn't do anything useful with programming. All their suffering >>>> was for nothing. What happened later was obvious: they would avoid >>>> programming at all costs and if they had to do something they would use >>>> MS-Excel. The spreadsheets I've seen... I still have nightmares. The things >>>> they accomplished humbles me, proves that I'm a lower being. I've seen >>>> people solve partial differential equations where each cell was an element >>>> in the solution and it was colored according to the result. Beautiful but >>>> I'd rather suffer accute physical pain than to do something like that, or >>>> worse, debug such a "program". By the way, this sort of application was not >>>> a joke or a neat hack, it was actually the only way those guys knew how to >>>> solve a problem. >>>> >>>> 15 years later... I have a physics undergraduate student working with me. >>>> Very smart and interested. They still learn C and later on when they need to >>>> do something, what is it they do? Most professors use Origin. A huge >>>> improvement over Excel, but still. A couple of months ago, he had to turn in >>>> a report and since we don't have Origin, he was using Excel. I kind of felt >>>> sorry for him and I helped him out to do it in Python. He couldn't believe >>>> it. >>> >>> Oh - dear; you probably saw this stuff? >>> >>> http://blog.stodden.net/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/ >> >> I think that's a good example that peer review works. > > It's a good example of how peer-review should work, but it's very > uncommon for the reviewer to have the original spreadsheet, and that > was the key to the problem. The spreadsheet mistake was only one point driving the result, the rest were modelling decisions. Even without having access to their original work, the study can be independently redone and show that there is no "big" effect. Even in their results, using robust measures like median doesn't show much of an effect. So it's mainly a few outliers (or coding mistakes) my favorite outside economics http://www.genomesunzipped.org/2012/03/questioning-the-evidence-for-non-canonical-rna-editing-in-humans.php (one advantage of economics is that there have always been "schools of thought" partially lined up with the political orientation. This has the consequence that if one side finds something "good", the other side tries to disprove it. And the compensating bias might uncover what is a robust finding.) > >>>> I did my Masters and PhD in CFD. Most other students had almost no >>>> background in programming and did most things using Excel! When they had to >>>> modify some code, it was almost by accident that things worked. You can >>>> imagine what sort of code comes out of this. The professors didn't know >>>> programming much better. Just getting them to understand the concept of >>>> version control took a while. >>>> >>>> In my opinion, If schools taught, at the begining, something like >>>> Python/Octave/R instead of C, students would be able to use this knowledge >>>> easily and productively throughout their courses and eventually learn C when >>>> they really needed it. >>> >>> That's surely one of the big arguments for Python - it is a great >>> first language, and it is capable across a wider range than Octave or >>> R - or even Excel :) >> >> We can mistake in any language >> >> I just read this >> >> """ >> Abstract >> >> [Correction Notice: An Erratum for this article was reported in >> Vol 17(4) of Psychological Methods (see record 2012-33502-001). The R >> code for arriving at adjusted p values for one of the methods is >> incorrect. The specific changes that need to be made are provided in >> the erratum.] >> """ >> >> It's still functioning peer review if a mistake is found after an >> article has been published, or after a pull request has landed in >> master. > > The problem is that the peers don't get to review what has been done, > in general, they get to review what the author said had been done. > > Donoho's point - about computational science - is that this can be > very different. > > The question is then : does this matter? Are - most published > research findings false? following the link from the PLOS editorial statement http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124 I think the entire premise "are research findings false" is completely misguided. It just continuous the magic 0.05 tradition. (However I think it makes a good polemic to illustrate a point.) Disclaimer: I never read the applied part of any paper outside of economics, and I can only imagine from second hand readings that some articles really only report a p-value or if their result is statistically significant or not. I have been reading now for several months articles criticizing research tradition and editorial recommendations to improve statistical reporting in various fields, starting with psychological methods and behavioral research. The general recommendation is to report effect sizes and confidence intervals instead of, or additional to p-values. So we can actually see what the size of this statistical (non-)significant effect is, and learn from it. Maybe the interval is not completely "false". and there are other problems in some fields with the majority of the research: the studies are underpowered, they ignore multiple testing problems, ... (according to some editorials and reports) Where open access to research methodology comes in is in undermining the reputation of researchers that systematically bias (Ioannidis) their results. In economics this debate happened a few years ago after some famous failures to (independently) replicate results, and now most, I think all, top economics journals require that the data/source is published. > >> --------- >> in general: >> >> in the research areas that I know, the vast majority of researchers >> use Windows, and everything that is not core task is point and click. >> As long as Matlab, Stata and GAUSS, or whatever else, doesn't have >> version control build in, VC won't be used by the majority of >> researchers that I know. We didn't grow up when version control was >> popular. And we don't have IT guys to manage it for us. >> (There is the old fashioned version control of starting new >> directories at crucial stages, or for specific conference talks and >> paper submissions.) >> (DVCS are only a few years old, and it will take a few more years for >> diffusion to "non-programmers" to happen.) > > We get taught some complicated things when we are training - calculus, > algebra... > > Does it make sense that we don't teach less complicated things like > version control and programming? (I don't know about teaching computer programming in American undergraduate programs, I'm a resident alien.) Programming within economics is not directly part of the curriculum. Students (undergraduates once they are beyond Excel!) learn programming in statistics, in my PhD program it was applied statistics/econometrics and computational economics (simulating macroeconomy) where we learned to program, and got paid for it as research assistant (with no requirement for unit tests nor version control.) My impression is that for "non-programmers", the behavioral pattern for using the tools is acquired in the applied fields that use computer programming. Once unit/functional testing and version control is used there by professors and teaching assistants and required as part of the best practice for doing your work, then it will stick. Otherwise it's like calculus. Some need it most of their life, the other ones forget about it as soon as the exams are over. (But you cannot learn calculus and statistics by doing, and there is only limited amount of time students have. More statistics please.) ----- Two more Version control systems are not available for Word Processing which rules out version control for large parts of the actual work. network and peer effects: One reason I think that version control will be the standard in a few more years (if usability gets better) is that you just need one or a few "programming types" in a group to spread it like an infection. You need those guys both as an advertising to see how to do things in a better way, but also as a support when a new user gets lost. I only found git acceptable because I knew that I have the rescue and support team on the mailing lists. (Thanks for that.) Josef > >> Even after using git some time, I only find it usable because I can do >> all the regular stuff with git gui (and for unusual stuff I can use >> commandline and git gui at the same time). >> >> >> --------- >> (just in case I'm misunderstood: >> I'm all in favor of best practices and unit and functional tests, but >> I don't expect that researchers will adopt it (fast) if it goes >> against their usual pattern of using tools. >> example: If you teach a software carpentry course that uses Linux, >> then I wouldn't be surprised if some users go back to their office and >> the first thing they do is use Excel. :) > > In general as you know I agree completely that it doesn't make sense > to persuade people to switch from Windows to Linux at the same time as > persuading them to use good software tools. We should teach people > stuff that they will and can use, and it's a common them among > software-carpentry types that it would be better to teach Windows > people how to best use Windows rather than teaching them on a virtual > machine that they are unlikely to use for their work. > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From matthew.brett at gmail.com Sun Jun 2 01:47:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 1 Jun 2013 22:47:17 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: Hi, On Sat, Jun 1, 2013 at 8:29 PM, wrote: > following the link from the PLOS editorial statement > http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124 > > I think the entire premise "are research findings false" is completely > misguided. It just continuous the magic 0.05 tradition. I don't think it is as simple as that. For example, one of the studies I cited before was only able to replicate 6 / 53 'landmark' studies in hematological oncology. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html "Clearly there are fundamental problems in both academia and industry in the way such research is conducted and reported. Addressing these systemic issues will require tremendous commitment and a desire to change the prevalent culture. Perhaps the most crucial element for change is to acknowledge that the bar for reproducibility in performing and presenting preclinical studies must be raised." I've been canvassing my colleagues over the last year or so about what replication rate they would guess in brain imaging, and the answers are rather variable, but have a mean around 30 percent. These estimates are from people running brain imaging centers or very experienced in the field. If these estimates are correct, the waste is enormous, overwhelming. > Otherwise it's like calculus. Some need it most of their life, the > other ones forget about it as soon as the exams are over. > (But you cannot learn calculus and statistics by doing, and there is > only limited amount of time students have. More statistics please.) The person who is trying to do work in Excel, that should be done in a programming language, needed that training. They will be doing slower work. and make more errors for the lack of a small amount of training. For sure the tech-smart guy or gal in the lab makes a big difference, but not every lab has such a person, and it's common (believe me) for researchers who don't know this stuff to assume it's only for nerds and that it only slows down getting real work done. That's largely a function of lack of training in how easy it is to make mistakes, and therefore the necessity of using tools to reduce mistakes and improve transparency. I've also noticed that when people are not comfortable with their tools, they often fail to notice obvious statistical issues that they would normally expect to spot at once. Here's an obvious example from brain imaging: http://www.edvul.com/voodoocorr.php So, if you teach people statistics and you don't teach them how and when to program, and they have to do anything other than point and click in SPSS, you'll often get bad statistics none the less. Cheers, Matthew From takowl at gmail.com Sun Jun 2 07:00:25 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sun, 2 Jun 2013 12:00:25 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: On 2 June 2013 06:47, Matthew Brett wrote: > The person who is trying to do work in Excel, that should be done in a > programming language, needed that training. They will be doing slower > work. and make more errors for the lack of a small amount of training. > I agree with the argument, but let's not understate the amount of learning involved. Here, all new PhD students are given a seven day intensive R course, by a lecturer who's good enough at teaching R that he makes money from running the course elsewhere. That covers the basics, but it certainly doesn't mean that they can do anything in R that they would otherwise do in Excel. And it doesn't even touch on version control or writing tests. I found one of my labmates editing the copy of a modelling script that she'd named 'foobar_DONOTEDIT', but I still couldn't persuade her to use version control. I think there's a fascinating question as to why people find Excel so much easier than a 'real' programming language, even if they create really complex spreadsheets. I think it's a combination of: - Familiarity: people are taught spreadsheets, and often Excel specifically, at school, whereas 'programming' is seen as a kind of geek sorcery. - Mingling code and data: I think it's conceptually harder to have your data in one place and your analysis in another, even though that's ultimately good practice - Seeing what you're doing: In Excel, you calculate something by putting a formula in a cell. You press enter, and there's the result. In code, you store it in a variable, and you have to explicitly ask for it to be displayed. If you're calculating 1000 variables in a loop, then it's not obvious from the display which one corresponds to which input. Can we mix some of that comfort with the robustness we're used to in conventional code? E.g. I can imagine a different kind of spreadsheet tool, where instead of putting formulae in cells, you define new columns and tables, and where you can save the steps you've done to apply to another data file in the same format. Perhaps it could even naturally progress to real code so that it acts as a kind of gateway drug for programming. Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 2 07:49:47 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 2 Jun 2013 07:49:47 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: On Sun, Jun 2, 2013 at 7:00 AM, Thomas Kluyver wrote: > On 2 June 2013 06:47, Matthew Brett wrote: >> >> The person who is trying to do work in Excel, that should be done in a >> programming language, needed that training. They will be doing slower >> work. and make more errors for the lack of a small amount of training. > > > I agree with the argument, but let's not understate the amount of learning > involved. Here, all new PhD students are given a seven day intensive R > course, by a lecturer who's good enough at teaching R that he makes money > from running the course elsewhere. That covers the basics, but it certainly > doesn't mean that they can do anything in R that they would otherwise do in > Excel. And it doesn't even touch on version control or writing tests. I > found one of my labmates editing the copy of a modelling script that she'd > named 'foobar_DONOTEDIT', but I still couldn't persuade her to use version > control. > > I think there's a fascinating question as to why people find Excel so much > easier than a 'real' programming language, even if they create really > complex spreadsheets. I think it's a combination of: > > - Familiarity: people are taught spreadsheets, and often Excel specifically, > at school, whereas 'programming' is seen as a kind of geek sorcery. > - Mingling code and data: I think it's conceptually harder to have your data > in one place and your analysis in another, even though that's ultimately > good practice > - Seeing what you're doing: In Excel, you calculate something by putting a > formula in a cell. You press enter, and there's the result. In code, you > store it in a variable, and you have to explicitly ask for it to be > displayed. If you're calculating 1000 variables in a loop, then it's not > obvious from the display which one corresponds to which input. The last point is where I still use Excel or OpenOffice calc. Visual inspection of a larger amount of heterogeneous data. for another area where excel use is still very heavy http://robertkugel.ventanaresearch.com/2013/01/29/the-spreadsheet-and-the-whale/ via http://blog.enthought.com/?p=113067 (the advantages and perils of using Excel when you bet a few million dollars on the outcome.) > > Can we mix some of that comfort with the robustness we're used to in > conventional code? E.g. I can imagine a different kind of spreadsheet tool, > where instead of putting formulae in cells, you define new columns and > tables, and where you can save the steps you've done to apply to another > data file in the same format. Perhaps it could even naturally progress to > real code so that it acts as a kind of gateway drug for programming. Stata has a very good combination, point and click for the commands that do the statistics or data handling, then the commands are printed to the console. The results can be seen in the console or the dataframe viewer. (in matlab the plot wizard works similar, point and click and save to script) It's easy to build up a collection of reproducable, reusable scripts this way. This was great for me as beginner or when I use parts that I don't know or remember (or the syntax and options for it). In contrast, a new plot in matplotlib is a few hours of reading documentation and googling for examples. Josef > > Thomas > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From otrov at hush.ai Sun Jun 2 08:21:04 2013 From: otrov at hush.ai (zetah) Date: Sun, 02 Jun 2013 14:21:04 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: <20130602122104.E09CEA6E40@smtp.hushmail.com> Matthew Brett wrote: >The person who is trying to do work in Excel, that should be done in a >programming language, needed that training. They will be doing slower >work. and make more errors for the lack of a small amount of training. Thomas Kluyver wrote: >I think there's a fascinating question as to why people find Excel so much >easier than a 'real' programming language, even if they create really >complex spreadsheets. I find you use term "Excel" too vague. One way to look at Excel is as visual interface to your data, that you can slice and dice and apply most common tools in least amount of time, even if you are average user. But Excel data is also available to you in object oriented VBA programming and then also VSTO (.NET Framework) if VBA is too coarse for your sensitive project. So Excel (and good part of Office) expose it's interface and data to both VBA (builtin programming interface) and Visual Studio. It's as real programming as you are up to. As for Python applicability in scientific software, I find it most useful in environments similar to Matlab/R/IDL. I guess that's the SciPy paradigm after all I feel that Python can offer new and original possibilities, or adapt to new trends like Mathematica and IPython Notebook, but Excel is just different league From takowl at gmail.com Sun Jun 2 08:48:37 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sun, 2 Jun 2013 13:48:37 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: On 2 June 2013 12:49, wrote: > This was great for me as beginner or when I use parts that I don't > know or remember (or the syntax and options for it). > In contrast, a new plot in matplotlib is a few hours of reading > documentation and googling for examples. > Yes, I've thought this about plotting as well. If I want to, say, rotate the labels on the x axis by 45 degrees, I'd much rather just click on them and edit a number in a text box, rather than googling what function and parameter will make it look like I want. In the Python world, Veusz (http://home.gna.org/veusz/ ) has some of this capability, and it can save a Python script to recreate a plot that you've produced interactively. But I've not really used it in earnest, because it would be awkward to integrate with my usual tools. zetah: > But Excel data is also available to you in object oriented VBA programming and then also VSTO (.NET Framework) if VBA is too coarse for your sensitive project. So Excel (and good part of Office) expose it's interface and data to both VBA (builtin programming interface) and Visual Studio. It's as real programming as you are up to. You're technically correct... the best kind of correct. ;-) I wrote VBA macros once years ago. But the group of users we're discussing don't use those features. It's not a natural extension of making spreadsheets, but a completely different set of skills to learn. And I don't think we want to encourage them down that route - data analysis in VBA or even .NET would be much more offputting than using Python/R/Matlab. Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From otrov at hush.ai Sun Jun 2 09:29:27 2013 From: otrov at hush.ai (zetah) Date: Sun, 02 Jun 2013 15:29:27 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: <20130602132927.B6CC4A6E40@smtp.hushmail.com> Thomas Kluyver wrote: >You're technically correct... the best kind of correct. ;-) I wrote VBA >macros once years ago. But the group of users we're discussing >don't use those features. It's not a natural extension of making >spreadsheets, but a completely different set of skills to learn. And I don't think we >want to encourage them down that route - data analysis in VBA or even .NET >would be much more offputting than using Python/R/Matlab. Oops... did I jump in semi-private discussion? Apologies, I wasn't aware, I thought you guys discuss scientific software generally, and I just read last couple of emails in my inbox. Cheers From takowl at gmail.com Sun Jun 2 10:33:12 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sun, 2 Jun 2013 15:33:12 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <20130602132927.B6CC4A6E40@smtp.hushmail.com> References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> Message-ID: On 2 June 2013 14:29, zetah wrote: > Oops... did I jump in semi-private discussion? No, sorry if I gave that impression. This is all intended to be public, as far as I'm aware. Best wishes, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From otrov at hush.ai Sun Jun 2 10:59:58 2013 From: otrov at hush.ai (zetah) Date: Sun, 02 Jun 2013 16:59:58 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> Message-ID: <20130602145958.DB291A6E38@smtp.hushmail.com> Thomas Kluyver wrote: >> Oops... did I jump in semi-private discussion? > >No, sorry if I gave that impression. This is all intended to be public, as >far as I'm aware. All right, thanks. You were mentioning a group of users, and encouraging some routes, so I thought your initial discussion was concerning some group known to you. Never mind, I don't plan to advocate Excel in Python user group, but to raise concern about terminology used. "Excel > spreadsheet" Cheers From takowl at gmail.com Sun Jun 2 11:51:02 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sun, 2 Jun 2013 16:51:02 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <20130602145958.DB291A6E38@smtp.hushmail.com> References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> Message-ID: On 2 June 2013 15:59, zetah wrote: > You were mentioning a group of users, and encouraging some routes, so I > thought your initial discussion was concerning some group known to you. > 'type of users' might have been a more accurate phrase, but it has an unfortunate negative ring that I wanted to avoid. There are a lot of people doing important data analysis in quite risky and hard-to-maintain ways. Using spreadsheets where some simple code might be more reliable is one symptom of that, and there have been a couple of major examples from economics where spreadsheet errors led to serious mistakes. The discussion is revolving roughly around whether and how we can push those users towards better tools and methods, like coding, version control and testing. Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From otrov at hush.ai Sun Jun 2 14:00:54 2013 From: otrov at hush.ai (zetah) Date: Sun, 02 Jun 2013 20:00:54 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> Message-ID: <20130602180055.5CC67A6E40@smtp.hushmail.com> Thomas Kluyver wrote: >'type of users' might have been a more accurate phrase, but it has an >unfortunate negative ring that I wanted to avoid. There are a lot of people >doing important data analysis in quite risky and hard-to-maintain ways. >Using spreadsheets where some simple code might be more reliable is one >symptom of that, and there have been a couple of major examples from >economics where spreadsheet errors led to serious mistakes. >The discussion is revolving roughly around whether and how we can push >those users towards better tools and methods, like coding, version control >and testing. Thanks for overview Thomas, I read all emails on the subject and will comment briefly, for the sake of my participation, although topic is huge I don't have experience with critical modeling, but I do and learn data analysis with historical data and generally. If we speak about errors, I think that most of it, like taught in Numerical analysis course, are due to human factor not understanding data types and also variety of data sources representing data differently. Trivial example that sql and netcdf databases represent same data in different format. Similarly for other data sources which in turn can be just plain text dumps. If that is handled correctly and user is familiar with the tool used, there shouldn't be any surprises. If it is of any interest, I thought to generalize my usual workflow, as single user example (hope it's not useless): - collecting data: if not directly available I use Python, and depending on source do validation. I don't change format if it's not necessary. - pre-processing: if I preprocess (usually with Python), I store data to sql server. - using data: single set or multiple datasets in PowerPivot (limited just by amount of RAM), where DAX allows calculations on pivoted views values. I haven't yet found any other tool that allows such diverse views in such short time. - post-processing: when needed I export results to CSV. Usually to just load in numpy array and plot with Matplotlib, or 3D viewing in VisIt or Gephi. - versioning: data in source database(s) stays intact, and all calculations can be saved to a file (with values), and then opened again even if datasource is not available. So I use Excel mainly for data manipulation and Python back and forth. Also I use additional tools for 3D visualization. I never liked to learn about versioning systems, and I'm happy with my current scheme From charlesr.harris at gmail.com Sun Jun 2 14:38:09 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 2 Jun 2013 12:38:09 -0600 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <20130602180055.5CC67A6E40@smtp.hushmail.com> References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> <20130602180055.5CC67A6E40@smtp.hushmail.com> Message-ID: On Sun, Jun 2, 2013 at 12:00 PM, zetah wrote: > Thomas Kluyver wrote: > >'type of users' might have been a more accurate phrase, but it has an > >unfortunate negative ring that I wanted to avoid. There are a lot of > people > >doing important data analysis in quite risky and hard-to-maintain ways. > >Using spreadsheets where some simple code might be more reliable is one > >symptom of that, and there have been a couple of major examples from > >economics where spreadsheet errors led to serious mistakes. > >The discussion is revolving roughly around whether and how we can push > >those users towards better tools and methods, like coding, version control > >and testing. > > Thanks for overview Thomas, I read all emails on the subject and will > comment briefly, for the sake of my participation, although topic is huge > > I don't have experience with critical modeling, but I do and learn data > analysis with historical data and generally. > > If we speak about errors, I think that most of it, like taught in > Numerical analysis course, are due to human factor not understanding data > types and also variety of data sources representing data differently. > Trivial example that sql and netcdf databases represent same data in > different format. Similarly for other data sources which in turn can be > just plain text dumps. If that is handled correctly and user is familiar > with the tool used, there shouldn't be any surprises. > At least when no one checks ;) The errors that the gods of analysis gift to us are often hidden away and are easy to overlook. They also tend to creep in when one is overconfident. It's all part of the devine sense of humor. > > If it is of any interest, I thought to generalize my usual workflow, as > single user example (hope it's not useless): > - collecting data: if not directly available I use Python, and depending > on source do validation. I don't change format if it's not necessary. > - pre-processing: if I preprocess (usually with Python), I store data to > sql server. > - using data: single set or multiple datasets in PowerPivot (limited just > by amount of RAM), where DAX allows calculations on pivoted views values. I > haven't yet found any other tool that allows such diverse views in such > short time. > - post-processing: when needed I export results to CSV. Usually to just > load in numpy array and plot with Matplotlib, or 3D viewing in VisIt or > Gephi. > - versioning: data in source database(s) stays intact, and all > calculations can be saved to a file (with values), and then opened again > even if datasource is not available. > > So I use Excel mainly for data manipulation and Python back and forth. > Also I use additional tools for 3D visualization. > I never liked to learn about versioning systems, and I'm happy with my > current scheme > I confess to my shame that I have never learned to use a spreadsheet for any but the simplest things. It's just so darn complicated ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Jun 2 15:51:00 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 2 Jun 2013 12:51:00 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> <20130602180055.5CC67A6E40@smtp.hushmail.com> Message-ID: Hi, On Sun, Jun 2, 2013 at 11:38 AM, Charles R Harris wrote: > > > On Sun, Jun 2, 2013 at 12:00 PM, zetah wrote: >> >> Thomas Kluyver wrote: >> >'type of users' might have been a more accurate phrase, but it has an >> >unfortunate negative ring that I wanted to avoid. There are a lot of >> > people >> >doing important data analysis in quite risky and hard-to-maintain ways. >> >Using spreadsheets where some simple code might be more reliable is one >> >symptom of that, and there have been a couple of major examples from >> >economics where spreadsheet errors led to serious mistakes. >> >The discussion is revolving roughly around whether and how we can push >> >those users towards better tools and methods, like coding, version >> > control >> >and testing. >> >> Thanks for overview Thomas, I read all emails on the subject and will >> comment briefly, for the sake of my participation, although topic is huge >> >> I don't have experience with critical modeling, but I do and learn data >> analysis with historical data and generally. >> >> If we speak about errors, I think that most of it, like taught in >> Numerical analysis course, are due to human factor not understanding data >> types and also variety of data sources representing data differently. >> Trivial example that sql and netcdf databases represent same data in >> different format. Similarly for other data sources which in turn can be just >> plain text dumps. If that is handled correctly and user is familiar with the >> tool used, there shouldn't be any surprises. > > > At least when no one checks ;) The errors that the gods of analysis gift to > us are often hidden away and are easy to overlook. They also tend to creep > in when one is overconfident. It's all part of the devine sense of humor. Yes - when no-one checks! I wish I still shared the feeling that mostly when I do stuff it's correct, or mostly correct, or correct enough. It was only when I started checking that I started to worry. I well remember the happier times I'd write a 100 line analysis script with no tests and be "pretty sure" that it was correct. Cheers, Matthew From otrov at hush.ai Sun Jun 2 16:06:01 2013 From: otrov at hush.ai (zetah) Date: Sun, 02 Jun 2013 22:06:01 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> <20130602180055.5CC67A6E40@smtp.hushm ail.com> Message-ID: <20130602200602.49733A6E42@smtp.hushmail.com> Charles R Harris wrote: >> If we speak about errors, I think that most of it, like taught in >> Numerical analysis course, are due to human factor not understanding data >> types and also variety of data sources representing data differently. >> Trivial example that sql and netcdf databases represent same data in >> different format. Similarly for other data sources which in turn can be >> just plain text dumps. If that is handled correctly and user is familiar >> with the tool used, there shouldn't be any surprises. >> > >At least when no one checks ;) The errors that the gods of analysis gift to >us are often hidden away and are easy to overlook. They also tend to creep >in when one is overconfident. It's all part of the devine sense of humor. Probably true. I know this comes from experience that I have not enough >I confess to my shame that I have never learned to use a spreadsheet for >any but the simplest things. It's just so darn complicated ;) That's fine, maybe it's just a legacy habit no one wants to break or preference toward familiar data manipulation environment. For myself, even with all that numpy broadcasting magics, I'd spend much more time slicing data in Python then doing it as I currently prefer, as more abstractions I'd have to use for same outcome. Viewing the values at the same time while calculating feels more natural to me and provides instant "validation" to say. But if I want real validation I can make validation scenario. Earlier my only annoyance with pivoted data was that I couldn't do more then trivial calculations on values in pivoted view, unless using programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive to me. There are many aspects on this subject, and please do continue if I stepped in too carelessly :) Cheers From trive at astro.su.se Sun Jun 2 18:31:34 2013 From: trive at astro.su.se (=?ISO-8859-1?Q?Th=F8ger_Emil_Rivera-Thorsen?=) Date: Mon, 03 Jun 2013 00:31:34 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <20130602200602.49733A6E42@smtp.hushmail.com> References: <51A39B6D.4030607@gmail.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> <20130602180055.5CC67A6E40@smtp.hushm ail.com> <20130602200602.49733A6E42@smtp.hushmail.com> Message-ID: <51ABC7C6.3080506@astro.su.se> On 02-06-2013 22:06, zetah wrote: > Charles R Harris wrote: >>> If we speak about errors, I think that most of it, like taught in >>> Numerical analysis course, are due to human factor not understanding data >>> types and also variety of data sources representing data differently. >>> Trivial example that sql and netcdf databases represent same data in >>> different format. Similarly for other data sources which in turn can be >>> just plain text dumps. If that is handled correctly and user is familiar >>> with the tool used, there shouldn't be any surprises. >>> >> At least when no one checks ;) The errors that the gods of analysis gift to >> us are often hidden away and are easy to overlook. They also tend to creep >> in when one is overconfident. It's all part of the devine sense of humor. > Probably true. I know this comes from experience that I have not enough > > >> I confess to my shame that I have never learned to use a spreadsheet for >> any but the simplest things. It's just so darn complicated ;) > That's fine, maybe it's just a legacy habit no one wants to break or preference toward familiar data manipulation environment. > > For myself, even with all that numpy broadcasting magics, I'd spend much more time slicing data in Python then doing it as I currently prefer, as more abstractions I'd have to use for same outcome. Viewing the values at the same time while calculating feels more natural to me and provides instant "validation" to say. But if I want real validation I can make validation scenario. > > Earlier my only annoyance with pivoted data was that I couldn't do more then trivial calculations on values in pivoted view, unless using programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive to me. > > There are many aspects on this subject, and please do continue if I stepped in too carelessly :) You may of course be perfectly happy with your current work setup, but it seems to me like you could do everything you describe without leaving Python, by using Pandas. Pivot tables, slicing and dicing of heterogenous data types, indexing by multi-layer labels, arbitrary operations on pivoted, sliced and diced data frames, importing/exporting csv, ascii, html and even LaTeX, quick plotting for data ionspection purposes etc. Of course, the interactive element isn't there. On the other hand, it is very powerful, and you don't have to switch between several different environments and tools. The frames are basically enhanced numpy arrays, so the data can be passed directly to numpy or matplotlib. Also, if working in the IPython qtconsole or notebook, simply typing the dataframe's name will show it nicely rendered as an html table. I have definitely enjoyed working with it. Sorry for going slightly off-topic. /Emil > > Cheers > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From otrov at hush.ai Sun Jun 2 20:59:26 2013 From: otrov at hush.ai (zetah) Date: Mon, 03 Jun 2013 02:59:26 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51ABC7C6.3080506@astro.su.se> References: <51A39B6D.4030607@gmail.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> <20130602180055.5CC67A6E40@smtp.hushm ail.com> <20130602200602.49733A6E42@smtp.hushmail.com> <51AB C7C6.3080506@astro.su.se> Message-ID: <20130603005926.A32CEA6E38@smtp.hushmail.com> Th?ger Emil Rivera-Thorsen wrote: >You may of course be perfectly happy with your current work setup, but >it seems to me like you could do everything you describe without leaving >Python, by using Pandas. Pivot tables, slicing and dicing of >heterogenous data types, indexing by multi-layer labels, arbitrary >operations on pivoted, sliced and diced data frames, importing/exporting >csv, ascii, html and even LaTeX, quick plotting for data ionspection >purposes etc. I tried it shortly couple of month ago or so, and it seemed like impressive work in progress. I remember time series handling was very intuitive, but I didn't study the module for some reason... and I didn't know about pivoting dataframes >Of course, the interactive element isn't there. On the >other hand, it is very powerful, and you don't have to switch between >several different environments and tools. You are right there too, reliable interactive interface provides less mental effort, and surely using same environment has it's benefits. UI as extension is important, not many get impressed on command line. I hope to see IPython Notebook drive new ideas and attract more developers, more then I expect Excel on Surface to implement new interface exclusively based on touch and filters. >The frames are basically enhanced numpy arrays, so the data can be >passed directly to numpy or matplotlib. Also, if working in the IPython >qtconsole or notebook, simply typing the dataframe's name will show it >nicely rendered as an html table. >I have definitely enjoyed working with it. Sounds like fun. I'll experiment Thanks From nadavh at visionsense.com Mon Jun 3 01:26:22 2013 From: nadavh at visionsense.com (Nadav Horesh) Date: Mon, 3 Jun 2013 05:26:22 +0000 Subject: [SciPy-User] SciPy-User Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: <520e0eec44af4501b4100c172b6ff08c@BN1PR08MB076.namprd08.prod.outlook.com> For me an important use case is a file transfer over the VPN. Is there any way to test it? Nadav. ________________________________________ From: scipy-user-bounces at scipy.org on behalf of scipy-user-request at scipy.org Sent: 03 June 2013 01:26 To: scipy-user at scipy.org Subject: SciPy-User Digest, Vol 118, Issue 4 Send SciPy-User mailing list submissions to scipy-user at scipy.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.scipy.org/mailman/listinfo/scipy-user or, via email, send a message with subject or body 'help' to scipy-user-request at scipy.org You can reach the person managing the list at scipy-user-owner at scipy.org When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-User digest..." Today's Topics: 1. Re: peer review of scientific software (zetah) 2. Re: peer review of scientific software (Charles R Harris) 3. Re: peer review of scientific software (Matthew Brett) 4. Re: peer review of scientific software (zetah) 5. Re: peer review of scientific software (Th?ger Emil Rivera-Thorsen) ---------------------------------------------------------------------- Message: 1 Date: Sun, 02 Jun 2013 20:00:54 +0200 From: "zetah" Subject: Re: [SciPy-User] peer review of scientific software To: "SciPy Users List" Message-ID: <20130602180055.5CC67A6E40 at smtp.hushmail.com> Content-Type: text/plain; charset="UTF-8" Thomas Kluyver wrote: >'type of users' might have been a more accurate phrase, but it has an >unfortunate negative ring that I wanted to avoid. There are a lot of people >doing important data analysis in quite risky and hard-to-maintain ways. >Using spreadsheets where some simple code might be more reliable is one >symptom of that, and there have been a couple of major examples from >economics where spreadsheet errors led to serious mistakes. >The discussion is revolving roughly around whether and how we can push >those users towards better tools and methods, like coding, version control >and testing. Thanks for overview Thomas, I read all emails on the subject and will comment briefly, for the sake of my participation, although topic is huge I don't have experience with critical modeling, but I do and learn data analysis with historical data and generally. If we speak about errors, I think that most of it, like taught in Numerical analysis course, are due to human factor not understanding data types and also variety of data sources representing data differently. Trivial example that sql and netcdf databases represent same data in different format. Similarly for other data sources which in turn can be just plain text dumps. If that is handled correctly and user is familiar with the tool used, there shouldn't be any surprises. If it is of any interest, I thought to generalize my usual workflow, as single user example (hope it's not useless): - collecting data: if not directly available I use Python, and depending on source do validation. I don't change format if it's not necessary. - pre-processing: if I preprocess (usually with Python), I store data to sql server. - using data: single set or multiple datasets in PowerPivot (limited just by amount of RAM), where DAX allows calculations on pivoted views values. I haven't yet found any other tool that allows such diverse views in such short time. - post-processing: when needed I export results to CSV. Usually to just load in numpy array and plot with Matplotlib, or 3D viewing in VisIt or Gephi. - versioning: data in source database(s) stays intact, and all calculations can be saved to a file (with values), and then opened again even if datasource is not available. So I use Excel mainly for data manipulation and Python back and forth. Also I use additional tools for 3D visualization. I never liked to learn about versioning systems, and I'm happy with my current scheme ------------------------------ Message: 2 Date: Sun, 2 Jun 2013 12:38:09 -0600 From: Charles R Harris Subject: Re: [SciPy-User] peer review of scientific software To: SciPy Users List Message-ID: Content-Type: text/plain; charset="iso-8859-1" On Sun, Jun 2, 2013 at 12:00 PM, zetah wrote: > Thomas Kluyver wrote: > >'type of users' might have been a more accurate phrase, but it has an > >unfortunate negative ring that I wanted to avoid. There are a lot of > people > >doing important data analysis in quite risky and hard-to-maintain ways. > >Using spreadsheets where some simple code might be more reliable is one > >symptom of that, and there have been a couple of major examples from > >economics where spreadsheet errors led to serious mistakes. > >The discussion is revolving roughly around whether and how we can push > >those users towards better tools and methods, like coding, version control > >and testing. > > Thanks for overview Thomas, I read all emails on the subject and will > comment briefly, for the sake of my participation, although topic is huge > > I don't have experience with critical modeling, but I do and learn data > analysis with historical data and generally. > > If we speak about errors, I think that most of it, like taught in > Numerical analysis course, are due to human factor not understanding data > types and also variety of data sources representing data differently. > Trivial example that sql and netcdf databases represent same data in > different format. Similarly for other data sources which in turn can be > just plain text dumps. If that is handled correctly and user is familiar > with the tool used, there shouldn't be any surprises. > At least when no one checks ;) The errors that the gods of analysis gift to us are often hidden away and are easy to overlook. They also tend to creep in when one is overconfident. It's all part of the devine sense of humor. > > If it is of any interest, I thought to generalize my usual workflow, as > single user example (hope it's not useless): > - collecting data: if not directly available I use Python, and depending > on source do validation. I don't change format if it's not necessary. > - pre-processing: if I preprocess (usually with Python), I store data to > sql server. > - using data: single set or multiple datasets in PowerPivot (limited just > by amount of RAM), where DAX allows calculations on pivoted views values. I > haven't yet found any other tool that allows such diverse views in such > short time. > - post-processing: when needed I export results to CSV. Usually to just > load in numpy array and plot with Matplotlib, or 3D viewing in VisIt or > Gephi. > - versioning: data in source database(s) stays intact, and all > calculations can be saved to a file (with values), and then opened again > even if datasource is not available. > > So I use Excel mainly for data manipulation and Python back and forth. > Also I use additional tools for 3D visualization. > I never liked to learn about versioning systems, and I'm happy with my > current scheme > I confess to my shame that I have never learned to use a spreadsheet for any but the simplest things. It's just so darn complicated ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130602/48da6418/attachment-0001.html ------------------------------ Message: 3 Date: Sun, 2 Jun 2013 12:51:00 -0700 From: Matthew Brett Subject: Re: [SciPy-User] peer review of scientific software To: SciPy Users List Message-ID: Content-Type: text/plain; charset=ISO-8859-1 Hi, On Sun, Jun 2, 2013 at 11:38 AM, Charles R Harris wrote: > > > On Sun, Jun 2, 2013 at 12:00 PM, zetah wrote: >> >> Thomas Kluyver wrote: >> >'type of users' might have been a more accurate phrase, but it has an >> >unfortunate negative ring that I wanted to avoid. There are a lot of >> > people >> >doing important data analysis in quite risky and hard-to-maintain ways. >> >Using spreadsheets where some simple code might be more reliable is one >> >symptom of that, and there have been a couple of major examples from >> >economics where spreadsheet errors led to serious mistakes. >> >The discussion is revolving roughly around whether and how we can push >> >those users towards better tools and methods, like coding, version >> > control >> >and testing. >> >> Thanks for overview Thomas, I read all emails on the subject and will >> comment briefly, for the sake of my participation, although topic is huge >> >> I don't have experience with critical modeling, but I do and learn data >> analysis with historical data and generally. >> >> If we speak about errors, I think that most of it, like taught in >> Numerical analysis course, are due to human factor not understanding data >> types and also variety of data sources representing data differently. >> Trivial example that sql and netcdf databases represent same data in >> different format. Similarly for other data sources which in turn can be just >> plain text dumps. If that is handled correctly and user is familiar with the >> tool used, there shouldn't be any surprises. > > > At least when no one checks ;) The errors that the gods of analysis gift to > us are often hidden away and are easy to overlook. They also tend to creep > in when one is overconfident. It's all part of the devine sense of humor. Yes - when no-one checks! I wish I still shared the feeling that mostly when I do stuff it's correct, or mostly correct, or correct enough. It was only when I started checking that I started to worry. I well remember the happier times I'd write a 100 line analysis script with no tests and be "pretty sure" that it was correct. Cheers, Matthew ------------------------------ Message: 4 Date: Sun, 02 Jun 2013 22:06:01 +0200 From: "zetah" Subject: Re: [SciPy-User] peer review of scientific software To: "SciPy Users List" Message-ID: <20130602200602.49733A6E42 at smtp.hushmail.com> Content-Type: text/plain; charset="UTF-8" Charles R Harris wrote: >> If we speak about errors, I think that most of it, like taught in >> Numerical analysis course, are due to human factor not understanding data >> types and also variety of data sources representing data differently. >> Trivial example that sql and netcdf databases represent same data in >> different format. Similarly for other data sources which in turn can be >> just plain text dumps. If that is handled correctly and user is familiar >> with the tool used, there shouldn't be any surprises. >> > >At least when no one checks ;) The errors that the gods of analysis gift to >us are often hidden away and are easy to overlook. They also tend to creep >in when one is overconfident. It's all part of the devine sense of humor. Probably true. I know this comes from experience that I have not enough >I confess to my shame that I have never learned to use a spreadsheet for >any but the simplest things. It's just so darn complicated ;) That's fine, maybe it's just a legacy habit no one wants to break or preference toward familiar data manipulation environment. For myself, even with all that numpy broadcasting magics, I'd spend much more time slicing data in Python then doing it as I currently prefer, as more abstractions I'd have to use for same outcome. Viewing the values at the same time while calculating feels more natural to me and provides instant "validation" to say. But if I want real validation I can make validation scenario. Earlier my only annoyance with pivoted data was that I couldn't do more then trivial calculations on values in pivoted view, unless using programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive to me. There are many aspects on this subject, and please do continue if I stepped in too carelessly :) Cheers ------------------------------ Message: 5 Date: Mon, 03 Jun 2013 00:31:34 +0200 From: Th?ger Emil Rivera-Thorsen Subject: Re: [SciPy-User] peer review of scientific software To: SciPy Users List Message-ID: <51ABC7C6.3080506 at astro.su.se> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 02-06-2013 22:06, zetah wrote: > Charles R Harris wrote: >>> If we speak about errors, I think that most of it, like taught in >>> Numerical analysis course, are due to human factor not understanding data >>> types and also variety of data sources representing data differently. >>> Trivial example that sql and netcdf databases represent same data in >>> different format. Similarly for other data sources which in turn can be >>> just plain text dumps. If that is handled correctly and user is familiar >>> with the tool used, there shouldn't be any surprises. >>> >> At least when no one checks ;) The errors that the gods of analysis gift to >> us are often hidden away and are easy to overlook. They also tend to creep >> in when one is overconfident. It's all part of the devine sense of humor. > Probably true. I know this comes from experience that I have not enough > > >> I confess to my shame that I have never learned to use a spreadsheet for >> any but the simplest things. It's just so darn complicated ;) > That's fine, maybe it's just a legacy habit no one wants to break or preference toward familiar data manipulation environment. > > For myself, even with all that numpy broadcasting magics, I'd spend much more time slicing data in Python then doing it as I currently prefer, as more abstractions I'd have to use for same outcome. Viewing the values at the same time while calculating feels more natural to me and provides instant "validation" to say. But if I want real validation I can make validation scenario. > > Earlier my only annoyance with pivoted data was that I couldn't do more then trivial calculations on values in pivoted view, unless using programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive to me. > > There are many aspects on this subject, and please do continue if I stepped in too carelessly :) You may of course be perfectly happy with your current work setup, but it seems to me like you could do everything you describe without leaving Python, by using Pandas. Pivot tables, slicing and dicing of heterogenous data types, indexing by multi-layer labels, arbitrary operations on pivoted, sliced and diced data frames, importing/exporting csv, ascii, html and even LaTeX, quick plotting for data ionspection purposes etc. Of course, the interactive element isn't there. On the other hand, it is very powerful, and you don't have to switch between several different environments and tools. The frames are basically enhanced numpy arrays, so the data can be passed directly to numpy or matplotlib. Also, if working in the IPython qtconsole or notebook, simply typing the dataframe's name will show it nicely rendered as an html table. I have definitely enjoyed working with it. Sorry for going slightly off-topic. /Emil > > Cheers > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user ------------------------------ _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user End of SciPy-User Digest, Vol 118, Issue 4 ****************************************** From trive at astro.su.se Mon Jun 3 08:35:29 2013 From: trive at astro.su.se (=?UTF-8?B?VGjDuGdlciBFbWlsIFJpdmVyYS1UaG9yc2Vu?=) Date: Mon, 03 Jun 2013 14:35:29 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <20130603005926.A32CEA6E38@smtp.hushmail.com> References: <51A39B6D.4030607@gmail.com> <20130602132927.B6CC4A6E40@smtp.hushmail.com> <20130602145958.DB291A6E38@smtp.hushmail.com> <20130602180055.5CC67A6E40@smtp.hushm ail.com> <20130602200602.49733A6E42@smtp.hushmail.com> <51AB C7C6.3080506@astro.su.se> <20130603005926.A32CEA6E38@smtp.hushmail.c om> Message-ID: <51AC8D91.1050207@astro.su.se> Here's the Pandas docs on reshaping and pivoting: http://pandas.pydata.org/pandas-docs/stable/reshaping.html Otherwise I recommend watching some of Wes McKinney's (the creators) presentations on YouTube, some of them are very in-depth and instructive (although some of them are quite lengthy). Cheers Emil On 03-06-2013 02:59, zetah wrote: > Th?ger Emil Rivera-Thorsen wrote: >> You may of course be perfectly happy with your current work setup, but >> it seems to me like you could do everything you describe without leaving >> Python, by using Pandas. Pivot tables, slicing and dicing of >> heterogenous data types, indexing by multi-layer labels, arbitrary >> operations on pivoted, sliced and diced data frames, importing/exporting >> csv, ascii, html and even LaTeX, quick plotting for data ionspection >> purposes etc. > I tried it shortly couple of month ago or so, and it seemed like impressive work in progress. I remember time series handling was very intuitive, but I didn't study the module for some reason... and I didn't know about pivoting dataframes > > >> Of course, the interactive element isn't there. On the >> other hand, it is very powerful, and you don't have to switch between >> several different environments and tools. > You are right there too, reliable interactive interface provides less mental effort, and surely using same environment has it's benefits. > UI as extension is important, not many get impressed on command line. I hope to see IPython Notebook drive new ideas and attract more developers, more then I expect Excel on Surface to implement new interface exclusively based on touch and filters. > > >> The frames are basically enhanced numpy arrays, so the data can be >> passed directly to numpy or matplotlib. Also, if working in the IPython >> qtconsole or notebook, simply typing the dataframe's name will show it >> nicely rendered as an html table. >> I have definitely enjoyed working with it. > Sounds like fun. I'll experiment > > Thanks > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From jason-sage at creativetrax.com Mon Jun 3 09:20:51 2013 From: jason-sage at creativetrax.com (Jason Grout) Date: Mon, 03 Jun 2013 09:20:51 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: <51AC9833.8040306@creativetrax.com> On 6/2/13 1:47 AM, Matthew Brett wrote: > The person who is trying to do work in Excel, that should be done in a > programming language, needed that training. I'm not sure where in the discussion I should post this, but I wanted to make a comment about the prevalence and power of tools like Excel that I just realized. I've been watching Brett Victor's videos recently, and I just realized that a spreadsheet, with its initial orientation to concrete data, does a good job of implementing his "ladder of abstraction" [1]. You first work with concrete data, then you parametrize the results (e.g., write formulas for the cells), etc. With programming, we need to basically come up with the abstraction right away, which is more difficult. It seems like there is a good tool in the middle ground there that would basically be a spreadsheet that writes your python program for you, letting you play with the data interactively, but the parametrization of your operations writes the python code. Anyways, some thoughts. I realize there are some stats packages that basically do this (write a script as you click through a gui). Thanks, Jason [1] http://worrydream.com/#!/LadderOfAbstraction; another very interesting Bret Victor video is: http://worrydream.com/#!/DrawingDynamicVisualizationsTalkAddendum From evilper at gmail.com Mon Jun 3 09:35:35 2013 From: evilper at gmail.com (Per Nielsen) Date: Mon, 3 Jun 2013 15:35:35 +0200 Subject: [SciPy-User] Unexpectedly large memory usage in scipy.ode class In-Reply-To: References: Message-ID: You are right, I checked the size of the working arrays and they corresponded perfectly with my memory usage. I checked some of the Runge-Kutta based solvers made available through the complex_ode class, and they have a lower memory overhead compared to zvode, about half according to my tests. Thank you for your help :) Per On Fri, May 31, 2013 at 4:23 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > On Fri, May 31, 2013 at 5:58 AM, Per Nielsen wrote: > >> Hi all, >> >> I am solving large linear ODE systems using the QuTip python package ( >> https://code.google.com/p/qutip/) which uses scipy ODE solvers under the >> hood. The system is of he form >> >> dydt = L*y, >> >> where L is a large complex sparse matrix, all pretty standard. In this >> type of problem the matrix L is the biggest memory user, expected to be >> much larger than the solution vector y itself. >> >> Below is the output of a @profile from the memory_profiler package on the >> function setting up the ode object, no actual time-stepping is done (the >> code can be found here: >> https://github.com/qutip/qutip/blob/master/qutip/mesolve.py#L561). >> >> Line # Mem usage Increment Line Contents >> ================================================ >> 562 @profile >> 563 def _mesolve_const(H, rho0, tlist, >> c_op_list, expt_ops, args, opt, >> 564 progress_bar): >> 565 """! >> 566 Evolve the density matrix using an >> ODE solver, for constant hamiltonian >> 567 and collapse operators. >> 568 """ >> 569 61.961 MB 0.000 MB >> 570 61.961 MB 0.000 MB if debug: >> 571 print(inspect.stack()[0][3]) >> 572 >> 573 # >> 574 # check initial state >> 575 # >> 576 61.961 MB 0.000 MB if isket(rho0): >> 577 # if initial state is a ket >> and no collapse operator where given, >> 578 # fallback on the unitary >> schrodinger equation solver >> 579 61.961 MB 0.000 MB if len(c_op_list) == 0 and >> isoper(H): >> 580 return _sesolve_const(H, >> rho0, tlist, expt_ops, args, opt) >> 581 >> 582 # Got a wave function as >> initial state: convert to density matrix. >> 583 61.973 MB 0.012 MB rho0 = rho0 * rho0.dag() >> 584 >> 585 # >> 586 # construct liouvillian >> 587 # >> 588 61.973 MB 0.000 MB if opt.tidy: >> 589 61.973 MB 0.000 MB H = H.tidyup(opt.atol) >> 590 >> 591 327.887 MB 265.914 MB L = liouvillian_fast(H, c_op_list) >> 592 >> 593 # >> 594 # setup integrator >> 595 # >> 596 343.168 MB 15.281 MB initial_vector = >> mat2vec(rho0.full()) >> 597 343.168 MB 0.000 MB r = scipy.integrate.ode(cy_ode_rhs) >> 598 343.168 MB 0.000 MB r.set_f_params(L.data.data, >> L.data.indices, L.data.indptr) >> 599 343.168 MB 0.000 MB r.set_integrator('zvode', >> method=opt.method, order=opt.order, >> 600 343.168 MB 0.000 MB atol=opt.atol, >> rtol=opt.rtol, nsteps=opt.nsteps, >> 601 343.168 MB 0.000 MB >> first_step=opt.first_step, min_step=opt.min_step, >> 602 343.172 MB 0.004 MB >> max_step=opt.max_step) >> 603 572.055 MB 228.883 MB >> r.set_initial_value(initial_vector, tlist[0]) >> 604 >> 605 # >> 606 # call generic ODE code >> 607 # >> 608 602.805 MB 30.750 MB return _generic_ode_solve(r, rho0, >> tlist, expt_ops, opt, progress_bar) >> >> On line 591 the L matrix generated and eats a large chunk of memory, as >> expected. However, on line 603 setting the initial condition eats an almost >> comparable chunk, despite the fact that the initial vector itself only >> takes up ~ 15 MB (line 596). >> >> I find this strange, as I would expect that setting the initial condition >> would at most increase the memory usage by approximately the size of the >> initial vector. >> >> I have tried to reproduce the problem using a minimal script (see >> attachment), but here the memory usage is as expected: >> >> Filename: test_ode2.py >> >> Line # Mem usage Increment Line Contents >> ================================================ >> 7 @profile >> 8 18.707 MB 0.000 MB def runode(): >> 9 18.707 MB 0.000 MB N = 5000 >> 10 >> 11 # M = np.random.rand(N, N) >> 12 111.230 MB 92.523 MB M = sparse.rand(N, N, >> density=0.05, format='csr') \ >> 13 198.797 MB 87.566 MB + 1j * sparse.rand(N, N, >> density=0.05, format='csr') >> 14 199.031 MB 0.234 MB y0 = np.random.rand(N, 1) + 1j * >> np.random.rand(N, 1) >> 15 >> 16 199.031 MB 0.000 MB t0 = 0.0 >> 17 >> 18 199.031 MB 0.000 MB def f(t, y, M): >> 19 # return np.dot(M, y) >> 20 return M.dot(y) >> 21 >> 22 199.031 MB 0.000 MB r = ode(f) >> 23 199.031 MB 0.000 MB r.set_integrator('zvode', >> atol=1e-10) >> 24 199.035 MB 0.004 MB r.set_f_params(M) >> 25 199.035 MB 0.000 MB r.set_initial_value(y0, t0) >> >> Does someone with more insight into the scipy.ode solver might have an >> idea of whats going on? I looked in the file myself but didnt not see any >> indications of large memory consumptions. >> > > > The `set_initial_value` method calls the integrator's `reset` method. The > `reset` method of the 'zvode' integrator allocates three "work" arrays, > `iwork`, `rwork` and `zwork`, whose sizes depends on the size of `y0`. To > verify that these are the cause of the memory growth, you can access these > arrays after calling `r.set_initial_value(y0, t0)` as > `r._integrator.iwork`, etc. > > Warren > > > >> Best, >> Per >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From otrov at hush.ai Mon Jun 3 10:32:08 2013 From: otrov at hush.ai (zetah) Date: Mon, 03 Jun 2013 16:32:08 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51AC9833.8040306@creativetrax.com> References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> <51AC9833.8040306@creativetrax.com> Message-ID: <20130603143209.E27C1A6E40@smtp.hushmail.com> Jason Grout wrote: >I'm not sure where in the discussion I should post this, but I wanted to >make a comment about the prevalence and power of tools like Excel that I >just realized. I've been watching Brett Victor's videos recently, and I >just realized that a spreadsheet, with its initial orientation to >concrete data, does a good job of implementing his "ladder of >abstraction" [1]. You first work with concrete data, then you >parametrize the results (e.g., write formulas for the cells), etc. With >programming, we need to basically come up with the abstraction right >away, which is more difficult. It seems like there is a good tool in >the middle ground there that would basically be a spreadsheet that >writes your python program for you, letting you play with the data >interactively, but the parametrization of your operations writes the >python code. Same thought here, glad you wrote that Following the stream of discussion, pandas dataframe as object is annotated with metadata and allows different ways of manipulating this annotated data. If some higher state of mind (armed with skills and vision) can see this as interactive helper in IPython Notebook (called as magic command), perhaps it can sell itself. So I'm not talking about visible streadsheet of million element array (like array showed in Matlab), but just metadata scheme and numbered axis. Numbered axis could be thorough thought manipulator on basic numpy array, while some advanced featured available to pandas dataframe, as it offers diverse transformation potentials This helper could provide filter, but not filter as in Excel, but filter as numpy array ufuncs... I can't see the sneak preview right away, but hopefully that's not a problem Oh let someone sees this as a challenge :) From scipy at whamra.com Mon Jun 3 15:42:24 2013 From: scipy at whamra.com (Waleed Hamra) Date: Mon, 03 Jun 2013 22:42:24 +0300 Subject: [SciPy-User] how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy? Message-ID: <2213506.DbI8pMOm7X@waleed-virtual-machine> I had a confusion regarding this module (scipy.cluster.hierarchy) ... and still have some ! For example we have this dendrogram: http://img62.imageshack.us/img62/8130/3ieb4.png My question is how can I extract the coloured subtrees (each one represent a cluster) in a nice format, say SIF format ? Now the code to get the plot above is: In [1]: import scipy In [2]: import scipy.cluster.hierarchy as sch In [3]: import matplotlib.pylab as plt In [4]: X = scipy.randn(100,2) In [5]: d = sch.distance.pdist(X) In [6]: Z= sch.linkage(d,method='complete') In [7]: P =sch.dendrogram(Z) In [8]: plt.savefig('plot_dendrogram.png') In [9]: T = sch.fcluster(Z, 0.5*d.max(), 'distance') In [10]: T Out[10]: array([4, 5, 3, 2, 2, 3, 5, 2, 2, 5, 2, 2, 2, 3, 2, 3, 2, 5, 4, 5, 2, 5, 2, 3, 3, 3, 1, 3, 4, 2, 2, 4, 2, 4, 3, 3, 2, 5, 5, 5, 3, 2, 2, 2, 5, 4, 2, 4, 2, 2, 5, 5, 1, 2, 3, 2, 2, 5, 4, 2, 5, 4, 3, 5, 4, 4, 2, 2, 2, 4, 2, 5, 2, 2, 3, 3, 2, 4, 5, 3, 4, 4, 2, 1, 5, 4, 2, 2, 5, 5, 2, 2, 5, 5, 5, 4, 3, 3, 2, 4], dtype=int32) In [11]: sch.leaders(Z,T) Out[11]: (array([190, 191, 182, 193, 194], dtype=int32), array([2, 3, 1, 4,5],dtype=int32)) So now, the output of fcluster() gives the clustering of the nodes (by their id's), and leaders() described here is supposed to return 2 arrays: first one contains the leader nodes of the clusters generated by Z, here we can see we have 5 clusters, as well as in the plot and the second one the id's of these clusters So if this leaders() returns resp. L and M : L[2]=182 and M[2]=1, then cluster 1 is leaded by node id 182, which doesn't exist in the observations set X, the documentation says "... then it corresponds to a non-singleton cluster". But I can't get it ... Also, I converted the Z to a tree by sch.to_tree(Z), that will return an easy- to-use tree object, which I want to visualize, but which tool should I use as a graphical platform that manipulate these kind of tree objects as inputs? thanks in advance :) From msuzen at gmail.com Tue Jun 4 04:07:17 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Tue, 4 Jun 2013 10:07:17 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On 28 May 2013 20:23, Calvin Morrison wrote: > > http://arxiv.org/pdf/1210.0530v3.pdf > > Pissed-off Scientific Programmer, > Calvin Morrison Those recent papers and discussions all talk about good practises. I was thinking today in the bus, why there are not many literature on scientific software development methodologies. One explicit paper I found was from 80s called A Development Methodology for Scientific Software Cort, G. et. al. http://dx.doi.org/10.1109/TNS.1985.4333629 It is pretty classic approach for today's standard, There is also a book about generic style and good practice, its a pretty good book (might be mentioned in this list before): Writing Scientific Software: A Guide to Good Style Suely Oliveira and David E. Stewart http://www.cambridge.org/9780521858960 but I don't see any reference to modern development methodologies specifically address to scientific software. For example: extensions of test driven development, which would suit better than classic specification-design-coding-testing. Test cases would be directly related to what we would like to achieve in the first place. For example a generic density of something etc. I haven't heard anyone developing scientific software in this way...yet. Best, -m From josef.pktd at gmail.com Tue Jun 4 07:27:05 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 Jun 2013 07:27:05 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On Tue, Jun 4, 2013 at 4:07 AM, Suzen, Mehmet wrote: > On 28 May 2013 20:23, Calvin Morrison wrote: >> >> http://arxiv.org/pdf/1210.0530v3.pdf >> >> Pissed-off Scientific Programmer, >> Calvin Morrison > > Those recent papers and discussions all talk about good practises. I > was thinking > today in the bus, why there are not many literature on scientific > software development > methodologies. One explicit paper I found was from 80s called > > A Development Methodology for Scientific Software > Cort, G. et. al. > http://dx.doi.org/10.1109/TNS.1985.4333629 > > It is pretty classic approach for today's standard, There is also a book about > generic style and good practice, its a pretty good book (might be > mentioned in this list before): > > Writing Scientific Software: A Guide to Good Style > Suely Oliveira and David E. Stewart > http://www.cambridge.org/9780521858960 > > but I don't see any reference to modern development methodologies specifically > address to scientific software. For example: extensions of test driven > development, > which would suit better than classic > specification-design-coding-testing. Test cases > would be directly related to what we would like to achieve in the > first place. For example > a generic density of something etc. I haven't heard anyone developing > scientific software > in this way...yet. I think functional (not unit) testing is pretty much the standard in the area of developing statistical algorithms even if nobody calls it that way. And I don't know of any references to software development for it. When writing a library function for existing algorithms, then it is standard to test it against existing results. Many (or most) software packages, or articles that describe the software, show that they reproduce existing results as test cases. (And that's the way we work for statsmodels.) For new algorithms, it is standard to publish Monte Carlo studies that show that the new algorithm is "better" in at least some cases or directions than the existing algorithms (or statistical estimators and tests), and often they use published case studies or applied results to show how the conclusion would differ or be unchanged (Just for illustration: the workflow of some friends of mine that are theoretical econometricians. First write the paper with the heavy theory and proofs, then start to write the MonteCarlo, the first version doesn't deliver the results that can be expected based on the theory, look for bugs and fix those, rerun MonteCarlo, iterate, then find different test cases, simulated data generating processes, and show where it works and where it doesn't, and check the theoretical explanation/intuition why it doesn't work in some cases. Submit only cases that work, and write a footnote for the other cases.) And after, that there are many published articles that present MonteCarlo studies to show that an algorithm does not work properly if some assumptions are violated, and that something else is better. (This doesn't mean that they produce a "pretty" piece of software, but it shows that it works as advertised.) I don't think I ever heard of unit or functional testing for applied research, that is testing the workflow and not the computational tools. Josef > > Best, > -m > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Tue Jun 4 07:51:13 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 Jun 2013 07:51:13 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On Tue, Jun 4, 2013 at 7:27 AM, wrote: > On Tue, Jun 4, 2013 at 4:07 AM, Suzen, Mehmet wrote: >> On 28 May 2013 20:23, Calvin Morrison wrote: >>> >>> http://arxiv.org/pdf/1210.0530v3.pdf >>> >>> Pissed-off Scientific Programmer, >>> Calvin Morrison >> >> Those recent papers and discussions all talk about good practises. I >> was thinking >> today in the bus, why there are not many literature on scientific >> software development >> methodologies. One explicit paper I found was from 80s called >> >> A Development Methodology for Scientific Software >> Cort, G. et. al. >> http://dx.doi.org/10.1109/TNS.1985.4333629 >> >> It is pretty classic approach for today's standard, There is also a book about >> generic style and good practice, its a pretty good book (might be >> mentioned in this list before): >> >> Writing Scientific Software: A Guide to Good Style >> Suely Oliveira and David E. Stewart >> http://www.cambridge.org/9780521858960 >> >> but I don't see any reference to modern development methodologies specifically >> address to scientific software. For example: extensions of test driven >> development, >> which would suit better than classic >> specification-design-coding-testing. Test cases >> would be directly related to what we would like to achieve in the >> first place. For example >> a generic density of something etc. I haven't heard anyone developing >> scientific software >> in this way...yet. > > I think functional (not unit) testing is pretty much the standard in > the area of developing statistical algorithms even if nobody calls it > that way. And I don't know of any references to software development > for it. > > When writing a library function for existing algorithms, then it is > standard to test it against existing results. Many (or most) software > packages, or articles that describe the software, show that they > reproduce existing results as test cases. > (And that's the way we work for statsmodels.) > > For new algorithms, it is standard to publish Monte Carlo studies that > show that the new algorithm is "better" in at least some cases or > directions than the existing algorithms (or statistical estimators and > tests), and often they use published case studies or applied results > to show how the conclusion would differ or be unchanged > > (Just for illustration: the workflow of some friends of mine that are > theoretical econometricians. > First write the paper with the heavy theory and proofs, then start to > write the MonteCarlo, the first version doesn't deliver the results > that can be expected based on the theory, look for bugs and fix those, > rerun MonteCarlo, iterate, then find different test cases, simulated > data generating processes, and show where it works and where it > doesn't, and check the theoretical explanation/intuition why it > doesn't work in some cases. Submit only cases that work, and write a > footnote for the other cases. Sorry I forgot one step After the submission, one referee of the paper doesn't like some parts or wants additional simulations. Iterate until publication or rejection. If rejection, then submit to another journal, and iterate. By the time the article is finally published, other researchers already started to use the algorithm and possibly the code. Sounds partially like functional test driven developement to me. ) > > And after, that there are many published articles that present > MonteCarlo studies to show that an algorithm does not work properly if > some assumptions are violated, and that something else is better. > > (This doesn't mean that they produce a "pretty" piece of software, but > it shows that it works as advertised.) > > > I don't think I ever heard of unit or functional testing for applied > research, that is testing the workflow and not the computational > tools. > > Josef > >> >> Best, >> -m >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From matthew.brett at gmail.com Wed Jun 5 05:05:23 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 5 Jun 2013 02:05:23 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, On Tue, Jun 4, 2013 at 4:27 AM, wrote: > On Tue, Jun 4, 2013 at 4:07 AM, Suzen, Mehmet wrote: >> On 28 May 2013 20:23, Calvin Morrison wrote: >>> >>> http://arxiv.org/pdf/1210.0530v3.pdf >>> >>> Pissed-off Scientific Programmer, >>> Calvin Morrison >> >> Those recent papers and discussions all talk about good practises. I >> was thinking >> today in the bus, why there are not many literature on scientific >> software development >> methodologies. One explicit paper I found was from 80s called >> >> A Development Methodology for Scientific Software >> Cort, G. et. al. >> http://dx.doi.org/10.1109/TNS.1985.4333629 >> >> It is pretty classic approach for today's standard, There is also a book about >> generic style and good practice, its a pretty good book (might be >> mentioned in this list before): >> >> Writing Scientific Software: A Guide to Good Style >> Suely Oliveira and David E. Stewart >> http://www.cambridge.org/9780521858960 >> >> but I don't see any reference to modern development methodologies specifically >> address to scientific software. For example: extensions of test driven >> development, >> which would suit better than classic >> specification-design-coding-testing. Test cases >> would be directly related to what we would like to achieve in the >> first place. For example >> a generic density of something etc. I haven't heard anyone developing >> scientific software >> in this way...yet. > > I think functional (not unit) testing is pretty much the standard in > the area of developing statistical algorithms even if nobody calls it > that way. And I don't know of any references to software development > for it. > > When writing a library function for existing algorithms, then it is > standard to test it against existing results. Many (or most) software > packages, or articles that describe the software, show that they > reproduce existing results as test cases. > (And that's the way we work for statsmodels.) > > For new algorithms, it is standard to publish Monte Carlo studies that > show that the new algorithm is "better" in at least some cases or > directions than the existing algorithms (or statistical estimators and > tests), and often they use published case studies or applied results > to show how the conclusion would differ or be unchanged > > (Just for illustration: the workflow of some friends of mine that are > theoretical econometricians. > First write the paper with the heavy theory and proofs, then start to > write the MonteCarlo, the first version doesn't deliver the results > that can be expected based on the theory, look for bugs and fix those, > rerun MonteCarlo, iterate, then find different test cases, simulated > data generating processes, and show where it works and where it > doesn't, and check the theoretical explanation/intuition why it > doesn't work in some cases. Submit only cases that work, and write a > footnote for the other cases.) Here is an example of some incorrect theory combined with a simulation showing correct results. It turned out there were two separate errors in theory which balanced each other out in the particular case used for the simulation. This paper reviews and corrects the previous paper: http://www.math.mcgill.ca/keith/fmriagain/fmriagain.abstract.html Quote from section 2.2: "In general the variance of the parameter estimates is underestimated by equation (3) but the estimator of the variance is overestimated by equation (6), so that the two tend to cancel each other out in the T statistic (5). It can be shown that they do cancel out almost exactly for the random regressors that were chosen for validating the methods, which explains why the biases were not obsereved. However for other non-random regressors these e?ects do not cancel and large discrepancies can occur." I think that points at the need to write tests for all parts not just the whole. Cheers, Matthew From msuzen at gmail.com Wed Jun 5 07:24:42 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Wed, 5 Jun 2013 13:24:42 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On 4 June 2013 13:27, wrote: > I think functional (not unit) testing is pretty much the standard in > the area of developing statistical algorithms even if nobody calls it > that way. And I don't know of any references to software development > for it. Yes, functional and unit testing of an existing implimentations appears to be, as you said, pretty much common practice. But What I had in mind was a methodology of first designing and coding with the functional tests (initially failing) c.f. TDD http://en.wikipedia.org/wiki/Test-driven_development Your MC workflow seems to follow this logic though. Regarding the iteration process you refer:since we are doing many iterations, at some point we lose track of where did we start in the process: but I think TDD could help us to focus on the result in the scientific software projects. > research, that is testing the workflow and not the computational > tools. This is very curicial point I think. Workflow and computational tools are two seperate things; If we think tools as APIs. A workflow may require to use an API but it isn't our responsibility to test the API. Best, -m From newville at cars.uchicago.edu Wed Jun 5 17:36:54 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Wed, 5 Jun 2013 16:36:54 -0500 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, On Wed, Jun 5, 2013 at 4:05 AM, Matthew Brett wrote: > Hi, > > On Tue, Jun 4, 2013 at 4:27 AM, wrote: >> On Tue, Jun 4, 2013 at 4:07 AM, Suzen, Mehmet wrote: >>> On 28 May 2013 20:23, Calvin Morrison wrote: >>>> >>>> http://arxiv.org/pdf/1210.0530v3.pdf >>>> >>>> Pissed-off Scientific Programmer, >>>> Calvin Morrison >>> >>> Those recent papers and discussions all talk about good practises. I >>> was thinking >>> today in the bus, why there are not many literature on scientific >>> software development >>> methodologies. One explicit paper I found was from 80s called >>> >>> A Development Methodology for Scientific Software >>> Cort, G. et. al. >>> http://dx.doi.org/10.1109/TNS.1985.4333629 >>> >>> It is pretty classic approach for today's standard, There is also a book about >>> generic style and good practice, its a pretty good book (might be >>> mentioned in this list before): >>> >>> Writing Scientific Software: A Guide to Good Style >>> Suely Oliveira and David E. Stewart >>> http://www.cambridge.org/9780521858960 >>> >>> but I don't see any reference to modern development methodologies specifically >>> address to scientific software. For example: extensions of test driven >>> development, >>> which would suit better than classic >>> specification-design-coding-testing. Test cases >>> would be directly related to what we would like to achieve in the >>> first place. For example >>> a generic density of something etc. I haven't heard anyone developing >>> scientific software >>> in this way...yet. >> >> I think functional (not unit) testing is pretty much the standard in >> the area of developing statistical algorithms even if nobody calls it >> that way. And I don't know of any references to software development >> for it. >> >> When writing a library function for existing algorithms, then it is >> standard to test it against existing results. Many (or most) software >> packages, or articles that describe the software, show that they >> reproduce existing results as test cases. >> (And that's the way we work for statsmodels.) >> >> For new algorithms, it is standard to publish Monte Carlo studies that >> show that the new algorithm is "better" in at least some cases or >> directions than the existing algorithms (or statistical estimators and >> tests), and often they use published case studies or applied results >> to show how the conclusion would differ or be unchanged >> >> (Just for illustration: the workflow of some friends of mine that are >> theoretical econometricians. >> First write the paper with the heavy theory and proofs, then start to >> write the MonteCarlo, the first version doesn't deliver the results >> that can be expected based on the theory, look for bugs and fix those, >> rerun MonteCarlo, iterate, then find different test cases, simulated >> data generating processes, and show where it works and where it >> doesn't, and check the theoretical explanation/intuition why it >> doesn't work in some cases. Submit only cases that work, and write a >> footnote for the other cases.) > > Here is an example of some incorrect theory combined with a simulation > showing correct results. It turned out there were two separate errors > in theory which balanced each other out in the particular case used > for the simulation. > > This paper reviews and corrects the previous paper: > > http://www.math.mcgill.ca/keith/fmriagain/fmriagain.abstract.html > > Quote from section 2.2: > > "In general the variance of the parameter estimates is underestimated > by equation (3) but the > estimator of the variance is overestimated by equation (6), so that > the two tend to cancel > each other out in the T statistic (5). It can be shown that they do > cancel out almost exactly > for the random regressors that were chosen for validating the methods, > which explains why > the biases were not obsereved. However for other non-random regressors > these e?ects do not > cancel and large discrepancies can occur." > > I think that points at the need to write tests for all parts not just the whole. > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > This has been a fairly wide-ranging conversation, but I want to say that I agree completely with Josef's views here. Functional testing has been the norm for scientific software, and, I would say, justifiably so. It is how scientists are trained to think and work. Yes, one must question and understand the details of the instruments and algorithms you use and keep them well-calibrated (which is approximately the meaning of the currently fashionable "unit testing"). But at some point, you trust that your instruments are calibrated and working, and the algorithms you're using are mostly bug-free, and build on these to make real measurements and do real analyses on new, untested systems. The paper that Alan Isaac referred to that started this conversation seemed to advocate for unit testing in the sense of "don't trust the codes you're using, always test them". At first reading, this seems like good advice. Since unit testing (or, at least, the phrase) is relatively new for software development, it gives the appearance of being new advice. But the authors damage their case by continuing on by saying not to trust analysis tools built by other scientists based on the reputation and prior use of thse tools. Here, they expose the weakness of favoring "unit tests" over "functional tests". They are essentially advocating throwing out decades of proven, tested work (and claiming that the use of this work to date is not justified, as it derives from un-due reputation of the authors of prior work) for a fashionable new trend. Science is deliberately conservative, and telling scientists that unit testing is all the rage among the cool programmers and they should jump on that bandwagon is not likely to gain much traction. To be clear, and to use an example I am familiar with, these authors imply "don't trust scipy.optimize.leastsq() -- test it and be skeptical before using it". The problem is that this is easily read as "if you write your own minimization code and write unit tests, you're doing better than this older, outdated work. That piece of junk was used by others based solely on the reputation of the authors, they didn't have any unit tests at all!". One of the key features of scipy is that it reuses well-tested work (LAPACK, MINPACK-1, FFTPACK, and similar well-tested approaches). Now, there might be a bug in these (and, there might be a bug in the scipy wrapper), but the likelihood of finding a new bug with any particular use case is vanishingly small. No one is saying MINPACK-1 (or scipy.optimize.leastsq, or the Standard Model) is perfect and complete. But it works well on one heck of a lot of cases. At some point you *must* rely on these to make progress. In fact, doing so (applying existing models to new problems and using the results, that is, functional testing) is the classic way in which flaws in the underlying models are found. Yes, calibrating the wazoo out of instruments and algorithms, and pushing every button independently (unit testing) is very useful. I don't think anyone is advocating against doing this. But doing this to the exclusion of existing methodologies is going to meet resistance among scientists, and for good reason. The main problem with the Reinhart and Rogoff paper (and the success of the re-interpretation by Herndon, Ash, and Pollin) is a good example of how science works, and what to avoid. And that is *not* (as some seemed to have suggested) to avoid Excel or spreadsheets in favor of procedural programming approaches, but rather to not use home-built, poorly-tested and poorly-described algorithms. Yes, if they had used R or scipy they may have been better off. Unit testing would have helped them. Any testing would have helped them, as would better explanation of their methods. I'm sorry to admit that I read only the abstract, but I would not be surprised if Matthew Brett's example also fell into this category. That is, were the nearly-cancelling mistakes discovered because of unit testing or because of tests of the whole? Obviously, if two functions were always (always!) used together, and had canceling errors (say, one function "incorrectly" scaled by a factor of 2 and the other incorrectly scaled by a factor or 1/2), unit testing might show flaws that never, ever changed the end results. Functional testing (applying a set of analysis tools to a wide range of data, as with a Monte Carlo approach), seems completely sensible to me. You would not be saying that every component is independently checked and proven reliable on its own, but you are testing the whole. Sorry this was so long, --Matt From matthew.brett at gmail.com Wed Jun 5 17:47:47 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 5 Jun 2013 14:47:47 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, On Wed, Jun 5, 2013 at 2:36 PM, Matt Newville wrote: > I'm sorry to admit that I read only the abstract, but I would not be > surprised if Matthew Brett's example also fell into this category. > That is, were the nearly-cancelling mistakes discovered because of > unit testing or because of tests of the whole? Obviously, if two > functions were always (always!) used together, and had canceling > errors (say, one function "incorrectly" scaled by a factor of 2 and > the other incorrectly scaled by a factor or 1/2), unit testing might > show flaws that never, ever changed the end results. I believe what happened was that the first author of the paper read the previous paper and saw the errors in the math. As with your example, if the previous paper's algorithms had only been run on similar data then we would never have had a problem. If you had two functions both off by a factor of two you will have to hope that no-one is calling only one of those functions. If we want to provide a library that our users can trust, we must test the whole public API of our code. Of course even then we've only got 'I don't know of any bugs for the ranges of parameters I've tested'. Cheers, Matthew From njs at pobox.com Wed Jun 5 18:08:10 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 5 Jun 2013 23:08:10 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On Wed, Jun 5, 2013 at 10:36 PM, Matt Newville wrote: > The paper that Alan Isaac referred to that started this conversation > seemed to advocate for unit testing in the sense of "don't trust the > codes you're using, always test them". At first reading, this seems > like good advice. Since unit testing (or, at least, the phrase) is > relatively new for software development, it gives the appearance of > being new advice. But the authors damage their case by continuing on > by saying not to trust analysis tools built by other scientists based > on the reputation and prior use of thse tools. Here, they expose the > weakness of favoring "unit tests" over "functional tests". They are > essentially advocating throwing out decades of proven, tested work > (and claiming that the use of this work to date is not justified, as > it derives from un-due reputation of the authors of prior work) for a > fashionable new trend. Science is deliberately conservative, and > telling scientists that unit testing is all the rage among the cool > programmers and they should jump on that bandwagon is not likely to > gain much traction. But... have you ever sat down and written tests for a piece of widely used academic software? (Not LAPACK, but some random large package that's widely used within a field but doesn't have a comprehensive test suite of its own.) Everyone I've heard of who's done this discovers bugs all over the place. Would you personally trip over them if you didn't test the code? Who knows, maybe not. And probably most of the rest -- off by one errors here and there, maybe an incorrect normalizing constant, etc., -- end up not mattering too much. Or maybe they do. How could you even tell? You should absolutely check scipy.optimize.leastsq before using it! You could rewrite it too if you want, I guess, and if you write a thorough test suite it might even work out. But it's pretty bizarre to me to think that someone is going to think "ah-hah, writing my own code + test suite will be easier than just writing a test suite!" Sure some people are going to find ways to procrastinate on the real problem (*cough*grad students*cough*) and NIH ain't just a funding body. But that's totally orthogonal to whether tests are good. Honestly I'm not even sure what unit-testing "bandwagon" you're talking about. I insist on unit tests for my code because every time I fail to write them I regret it sooner or later, and I'd rather it be sooner. And because they pay themselves back ridiculously quickly because you never have to debug more than 15 lines of code at a time, you always know that everything the current 15 lines of code depends on is working correctly. Plus, white-box unit-testing can be comprehensive in a way that black-box functional testing just can't be. The code paths in a system grow like 2**n; you can reasonably test all of them for a short function with n < 5, but not for a whole system with n >> 100. And white-box unit-testing is what lets you move quickly when programming, because you can quickly isolate errors instead of spending all your time tracing through stuff in a debugger. If you want to *know* your code is correct, this kind of thorough testing is just a necessary (not sufficient!) condition. (Building on libraries that have large user bases is also very helpful!) -n From Phillip.M.Feldman at gmail.com Wed Jun 5 18:44:03 2013 From: Phillip.M.Feldman at gmail.com (pfeldman) Date: Wed, 5 Jun 2013 15:44:03 -0700 (PDT) Subject: [SciPy-User] ftol and xtol Message-ID: <1370472243657-18355.post@n7.nabble.com> It would be very helpful if one could specify `ftol` and `xtol` with any of the optimization algorithms. How difficult would it be to implement this? Phillip -- View this message in context: http://scipy-user.10969.n7.nabble.com/ftol-and-xtol-tp18355.html Sent from the Scipy-User mailing list archive at Nabble.com. From guziy.sasha at gmail.com Wed Jun 5 18:47:40 2013 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Wed, 5 Jun 2013 18:47:40 -0400 Subject: [SciPy-User] ftol and xtol In-Reply-To: <1370472243657-18355.post@n7.nabble.com> References: <1370472243657-18355.post@n7.nabble.com> Message-ID: It does exist for fmin http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin Which one do you want to use? -- Oleksandr (Sasha) Huziy 2013/6/5 pfeldman > It would be very helpful if one could specify `ftol` and `xtol` with any of > the optimization algorithms. How difficult would it be to implement this? > > Phillip > > > > -- > View this message in context: > http://scipy-user.10969.n7.nabble.com/ftol-and-xtol-tp18355.html > Sent from the Scipy-User mailing list archive at Nabble.com. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jun 5 22:46:00 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 5 Jun 2013 22:46:00 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On Wed, Jun 5, 2013 at 6:08 PM, Nathaniel Smith wrote: > On Wed, Jun 5, 2013 at 10:36 PM, Matt Newville > wrote: >> The paper that Alan Isaac referred to that started this conversation >> seemed to advocate for unit testing in the sense of "don't trust the >> codes you're using, always test them". At first reading, this seems >> like good advice. Since unit testing (or, at least, the phrase) is >> relatively new for software development, it gives the appearance of >> being new advice. But the authors damage their case by continuing on >> by saying not to trust analysis tools built by other scientists based >> on the reputation and prior use of thse tools. Here, they expose the >> weakness of favoring "unit tests" over "functional tests". They are >> essentially advocating throwing out decades of proven, tested work >> (and claiming that the use of this work to date is not justified, as >> it derives from un-due reputation of the authors of prior work) for a >> fashionable new trend. Science is deliberately conservative, and >> telling scientists that unit testing is all the rage among the cool >> programmers and they should jump on that bandwagon is not likely to >> gain much traction. > > But... have you ever sat down and written tests for a piece of widely > used academic software? (Not LAPACK, but some random large package > that's widely used within a field but doesn't have a comprehensive > test suite of its own.) Everyone I've heard of who's done this > discovers bugs all over the place. Would you personally trip over them > if you didn't test the code? Who knows, maybe not. And probably most > of the rest -- off by one errors here and there, maybe an incorrect > normalizing constant, etc., -- end up not mattering too much. Or maybe > they do. How could you even tell? > > You should absolutely check scipy.optimize.leastsq before using it! But leastsq has seen it's uses and we "know" it works. My main work for scipy.stats has been to make reasonably sure it works as advertised, (with adding sometimes "don't trust those results"). Optimizers either work or they don't work, and we see whether they work for our problems in the "functional" testing, in statsmodels for example. (notwithstanding that many bugs have been fixed in scipy.optimize where the optimizers did not work correctly and someone went to see why.) The recent discussion on global optimizers was on how successful they are for different problems, not whether each individual piece is unit tested. > You could rewrite it too if you want, I guess, and if you write a > thorough test suite it might even work out. But it's pretty bizarre to > me to think that someone is going to think "ah-hah, writing my own > code + test suite will be easier than just writing a test suite!" Sure > some people are going to find ways to procrastinate on the real > problem (*cough*grad students*cough*) and NIH ain't just a funding > body. But that's totally orthogonal to whether tests are good. > > Honestly I'm not even sure what unit-testing "bandwagon" you're > talking about. I insist on unit tests for my code because every time I > fail to write them I regret it sooner or later, and I'd rather it be > sooner. And because they pay themselves back ridiculously quickly > because you never have to debug more than 15 lines of code at a time, > you always know that everything the current 15 lines of code depends > on is working correctly. > > Plus, white-box unit-testing can be comprehensive in a way that > black-box functional testing just can't be. The code paths in a system > grow like 2**n; you can reasonably test all of them for a short > function with n < 5, but not for a whole system with n >> 100. And > white-box unit-testing is what lets you move quickly when programming, > because you can quickly isolate errors instead of spending all your > time tracing through stuff in a debugger. If you want to *know* your > code is correct, this kind of thorough testing is just a necessary > (not sufficient!) condition. (Building on libraries that have large > user bases is also very helpful!) I think there are two different areas, one is writing library code for general usage, and the other is code that is written for (initially) one time usage as part of the research. Library code is reasonable tested, either by usage or unit/functional tests. If it passes functional test and usage, it should be reasonably "safe". However, competition among packages (in statistics/econometrics) for example creates a large incentive for software developers to make sure the code is correct, and respond to any reports of imprecision. For example, nonlinear least squares, optimize.leastsq and packages that use it have the NIST test cases. Either we pass them or we don't. Another example, every once in a while a journal article is published on the quality of an estimation in the most popular statistical software packages. If one commercial package gets a bad report, then it is usually quickly fixed, or they show that the default values that the author of the paper used are not correct. (Last example that I remember is GARCH fitting where most packages didn't do well because of numerical derivative problems. The author that criticized this also showed how to do it better, which was then quickly adapted by all major packages.) If users cannot trust a package, then there is a big incentives for users to switch to a different one. But if everyone else is using one package in a field, then an individual researcher cannot be "blamed" for using it also, and has little incentive to unit test. How do you know what's the correct result for a code unit to write a unit test? Often I don't know, and all I can do is wait for the functional test results. Minor bugs are indistinguishable from minor design decisions, and it's not worth a huge amount of effort to find them all: example: in time series analysis, an indexing mistake at the beginning of the time series changes some decimals and the mistake hides behind other decisions for how initial conditions are treated across methods and packages. An indexing mistake at the end of the time series screws up the forecasting and will be visible the first time someone looks at the forecasts. example: degrees of freedom and small sample corrections: I have no idea what different packages use, until I specifically read the docs for this (if there are any, which is true for Stata and SAS, but false for R), and test and verify against it. If I don't have exactly the same algorithm, then I don't see minor bugs/design choices because it doesn't make much difference in most Monte Carlos and applications. example: Is the low precision of the Jacobian of scipy.optimize.leastsq a feature or a bug? I don't use it. example: Is the limited precision of scipy.special in some parameter ranges a feature or a bug? It's a feature for me, but Pauli and some other contributors consider them as bugs, if they can do something about it, and have removed many of them. example: scipy.stats.distributions numerical problems and low precision for unusual cases. bug or feature. I can work with 6 to 10 decimals precision in most cases, but sometimes I or some users would like to have a lot more, or want to evaluate the distributions at some "weird" parameters. example: I implemented 11 methods to correct p-values for multiple testing, 9 are verified against R, 2 are not available in R and I have to trust my code and that they are doing well in the Monte Carlo (although slightly worse than I would have expected). The other part: code written for one time research: Why would you spend a lot of time unit testing, if all you are interested is the functional test that it works for the given application? And, as above, how would you know what the correct result should be, besides "works for me". bottom line: I think unit testing and functional testing for scientific code is pretty different from many other areas of software development. It's easy to write a unit test that the right record is retrieved from a database. It's a lot more difficult to write a unit test that .... (I coded correctly the asymptotic distribution for a new estimator or test statistic.) (How did I end up at the wrong side of the argument? I have been advocating TDD, unit tests and verified functional tests for five years on this mailing list.) Josef > > -n > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From Phillip.M.Feldman at gmail.com Thu Jun 6 00:08:46 2013 From: Phillip.M.Feldman at gmail.com (pfeldman) Date: Wed, 5 Jun 2013 21:08:46 -0700 (PDT) Subject: [SciPy-User] ftol and xtol In-Reply-To: References: <1370472243657-18355.post@n7.nabble.com> Message-ID: <1370491726005-18358.post@n7.nabble.com> In particular, I'd like to be able to specify `ftol` and `xtol` with optimize.fmin_bfgs, optimize.fmin_l_bfgs_b, and optimize.anneal. optimize.anneal has a parameter called `feps` that seems similar to `ftol`, but there is no parameter comparable to `xtol`. Also, it would be great if the parameter names were the same across the board--to the extent possible--because that would make it much easier to compare alternative optimization algorithms. -- View this message in context: http://scipy-user.10969.n7.nabble.com/ftol-and-xtol-tp18355p18358.html Sent from the Scipy-User mailing list archive at Nabble.com. From Jerome.Kieffer at esrf.fr Thu Jun 6 01:23:27 2013 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Thu, 6 Jun 2013 07:23:27 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> On Wed, 5 Jun 2013 23:08:10 +0100 Nathaniel Smith wrote: > But... have you ever sat down and written tests for a piece of widely > used academic software? (Not LAPACK, but some random large package > that's widely used within a field but doesn't have a comprehensive > test suite of its own.) Everyone I've heard of who's done this > discovers bugs all over the place. Would you personally trip over them > if you didn't test the code? Who knows, maybe not. And probably most > of the rest -- off by one errors here and there, maybe an incorrect > normalizing constant, etc., -- end up not mattering too much. Or maybe > they do. How could you even tell? I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. The first took me ages to be spotted as I was assuming the error was on my side as scipy was seen as a "large library widely used". Cheers, -- J?r?me Kieffer Data analysis unit - ESRF PS: I blame nobody: I probably write more bugs than most of you. From msuzen at gmail.com Thu Jun 6 02:00:58 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Thu, 6 Jun 2013 08:00:58 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On 6 June 2013 00:08, Nathaniel Smith wrote: > time tracing through stuff in a debugger. If you want to *know* your > code is correct, this kind of thorough testing is just a necessary > (not sufficient!) condition. (Building on libraries that have large > user bases is also very helpful!) Good point. I don't think a standard user of a well established API should do unit testing or something similar on the library; except maybe running 'intall test'. After usage, correctness of outputs has to be checked against the overall science behind the code i.e. functional testing. Healthy scepticism is good, more of it would constitute paranoia. -m From matthew.brett at gmail.com Thu Jun 6 07:21:52 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 6 Jun 2013 12:21:52 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: Hi, On Thu, Jun 6, 2013 at 6:23 AM, Jerome Kieffer wrote: > On Wed, 5 Jun 2013 23:08:10 +0100 > Nathaniel Smith wrote: > >> But... have you ever sat down and written tests for a piece of widely >> used academic software? (Not LAPACK, but some random large package >> that's widely used within a field but doesn't have a comprehensive >> test suite of its own.) Everyone I've heard of who's done this >> discovers bugs all over the place. Would you personally trip over them >> if you didn't test the code? Who knows, maybe not. And probably most >> of the rest -- off by one errors here and there, maybe an incorrect >> normalizing constant, etc., -- end up not mattering too much. Or maybe >> they do. How could you even tell? > > I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. > The first took me ages to be spotted as I was assuming the error was on > my side as scipy was seen as a "large library widely used". Well said. See also Blake Griffith's current struggles with scipy.sparse (last message title "parametric tests, known failures and skipped tests"). If it's not tested - assume it's broken. If it's not tested and it's not broken, assume it will break soon. Don't use anything for serious work that isn't tested. At least - that has been my experience. Cheers, Matthew From josef.pktd at gmail.com Thu Jun 6 08:19:03 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jun 2013 08:19:03 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: On Thu, Jun 6, 2013 at 7:21 AM, Matthew Brett wrote: > Hi, > > On Thu, Jun 6, 2013 at 6:23 AM, Jerome Kieffer wrote: >> On Wed, 5 Jun 2013 23:08:10 +0100 >> Nathaniel Smith wrote: >> >>> But... have you ever sat down and written tests for a piece of widely >>> used academic software? (Not LAPACK, but some random large package >>> that's widely used within a field but doesn't have a comprehensive >>> test suite of its own.) Everyone I've heard of who's done this >>> discovers bugs all over the place. Would you personally trip over them >>> if you didn't test the code? Who knows, maybe not. And probably most >>> of the rest -- off by one errors here and there, maybe an incorrect >>> normalizing constant, etc., -- end up not mattering too much. Or maybe >>> they do. How could you even tell? >> >> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >> The first took me ages to be spotted as I was assuming the error was on >> my side as scipy was seen as a "large library widely used". > > Well said. See also Blake Griffith's current struggles with > scipy.sparse (last message title "parametric tests, known failures and > skipped tests"). As far as I understand these are not BUGs. These are TDD test failures during development while adding support to additional dtypes. Daniel Smith was adding better indexing support, with many TDD test failures. But that doesn't mean scipy.sparse didn't work correctly for the initial implementation for float matrices. I think scipy is overall in a very good shape now. Most bugs are enhancements or refactorings and cleanup. (Of course it doesn't mean that there are no real bugs.) > > If it's not tested - assume it's broken. > > If it's not tested and it's not broken, assume it will break soon. > > Don't use anything for serious work that isn't tested. > > At least - that has been my experience. I agree. But in the heavily used parts of a library, we get the bug reports by users very fast for cases that are not covered by the unit tests. (It took 1 to 2 years to fix all bugs in the distribution fit with some fixed parameters, for all different combinations of fixed and not fixed parameters.) Josef > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Thu Jun 6 08:56:35 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jun 2013 08:56:35 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: >>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>> The first took me ages to be spotted as I was assuming the error was on >>> my side as scipy was seen as a "large library widely used". Ok, I found the stats.linregress case https://github.com/scipy/scipy/pull/433 There is no way I write unit tests for all edge cases that I never expect to show up. For sure you find bugs/behavior like this in many packages, and I wouldn't trust any package for extreme cases, no matter what their test suite is. Josef From matthew.brett at gmail.com Thu Jun 6 08:57:28 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 6 Jun 2013 13:57:28 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: Hi, On Thu, Jun 6, 2013 at 1:19 PM, wrote: > On Thu, Jun 6, 2013 at 7:21 AM, Matthew Brett wrote: >> Hi, >> >> On Thu, Jun 6, 2013 at 6:23 AM, Jerome Kieffer wrote: >>> On Wed, 5 Jun 2013 23:08:10 +0100 >>> Nathaniel Smith wrote: >>> >>>> But... have you ever sat down and written tests for a piece of widely >>>> used academic software? (Not LAPACK, but some random large package >>>> that's widely used within a field but doesn't have a comprehensive >>>> test suite of its own.) Everyone I've heard of who's done this >>>> discovers bugs all over the place. Would you personally trip over them >>>> if you didn't test the code? Who knows, maybe not. And probably most >>>> of the rest -- off by one errors here and there, maybe an incorrect >>>> normalizing constant, etc., -- end up not mattering too much. Or maybe >>>> they do. How could you even tell? >>> >>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>> The first took me ages to be spotted as I was assuming the error was on >>> my side as scipy was seen as a "large library widely used". >> >> Well said. See also Blake Griffith's current struggles with >> scipy.sparse (last message title "parametric tests, known failures and >> skipped tests"). > > As far as I understand these are not BUGs. > These are TDD test failures during development while adding support to > additional dtypes. See for example : https://github.com/scipy/scipy/issues/2542 In particular that ticket ends with "Existing tests only tested lil with float data." Not that this is surprising - I learned to test the hell out of everything by finding how often I wrote broken code myself. Cheers, Matthew From matthew.brett at gmail.com Thu Jun 6 08:59:44 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 6 Jun 2013 13:59:44 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: Hi, On Thu, Jun 6, 2013 at 1:56 PM, wrote: >>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>>> The first took me ages to be spotted as I was assuming the error was on >>>> my side as scipy was seen as a "large library widely used". > > Ok, I found the stats.linregress case > https://github.com/scipy/scipy/pull/433 > > There is no way I write unit tests for all edge cases that I never > expect to show up. > For sure you find bugs/behavior like this in many packages, and I > wouldn't trust any package for extreme cases, no matter what their > test suite is. I guess that means the user has to know what you thought an extreme case was? I think the point of test driven development is precisely in order to specify the edges before you've locked yourself down to an implementation. If one write's the implementation first one often does forget the edges. Cheers, Matthew From Valene.Pellissier at cedrat.com Thu Jun 6 09:05:47 2013 From: Valene.Pellissier at cedrat.com (=?iso-8859-1?Q?Val=E8ne_Pellissier?=) Date: Thu, 6 Jun 2013 13:05:47 +0000 Subject: [SciPy-User] Read matrix from matrix market format file Message-ID: <7BE88C19F22BFE44979FC114121314D2413EAF2E@MBX2.OPENHOST.FR> Hi, I've got some problems reading a matrix from a matrix market format file. I have Python 3.2 and installed Numpy 1.7.1, Scipy 0.12.0 and matplotlib 1.2.1 on Windows 64. I tried the scipy.io.mmread function but got an error I don't understand. >>> B=scipy.io.mmread("my_matrix.mtx") Traceback (most recent call last): File "", line 1, in File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 70, in mmread return MMFile().read(source) File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 301, in read self._parse_header(stream) File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 337, in _parse_he der self.__class__.info(stream) File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 208, in info raise ValueError("Header line not of length 3: " + line) It is a big matrix. Is there someone who has a clue about what I'm doing wrong ? Your help will be much appreciated. Thanks Valene [cid:image001.jpg at 01CE62C7.51BB5AD0]Val?ne PELLISSIER - R&D Engineer CEDRAT S.A. 15 Chemin de Malacher - Inovall?e - 38246 MEYLAN cedex - FRANCE Phone: +33 (0)4 76 90 50 45 - Fax: +33 (0)4 56 38 08 30 valene.pellissier at cedrat.com - www.cedrat.com [youtube] [youtube] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1014 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1130 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 4777 bytes Desc: image001.jpg URL: From robanhk at gmail.com Thu Jun 6 09:46:54 2013 From: robanhk at gmail.com (Roban Kramer) Date: Thu, 6 Jun 2013 09:46:54 -0400 Subject: [SciPy-User] Read matrix from matrix market format file In-Reply-To: <7BE88C19F22BFE44979FC114121314D2413EAF2E@MBX2.OPENHOST.FR> References: <7BE88C19F22BFE44979FC114121314D2413EAF2E@MBX2.OPENHOST.FR> Message-ID: Can you send the first few lines of the my_matrix.mtx file? On Thu, Jun 6, 2013 at 9:05 AM, Val?ne Pellissier < Valene.Pellissier at cedrat.com> wrote: > Hi, **** > > ** ** > > I've got some problems reading a matrix from a matrix market format file. > **** > > I have Python 3.2 and installed Numpy 1.7.1, Scipy 0.12.0 and matplotlib > 1.2.1 on Windows 64. **** > > I tried the scipy.io.mmread function but got an error I don't understand. > **** > > ** ** > > >>> B=scipy.io.mmread("my_matrix.mtx")**** > > Traceback (most recent call last):**** > > File "", line 1, in **** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 70, in mmread > **** > > return MMFile().read(source)**** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 301, in read > **** > > self._parse_header(stream)**** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 337, in > _parse_he**** > > der**** > > self.__class__.info(stream)**** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 208, in info > **** > > raise ValueError("Header line not of length 3: " + line)**** > > ** ** > > It is a big matrix. **** > > Is there someone who has a clue about what I'm doing wrong ? **** > > ** ** > > Your help will be much appreciated.**** > > Thanks**** > > Valene**** > > ** ** > > *****Val?ne PELLISSIER *- R&D Engineer > *CEDRAT S.A.* > 15 Chemin de Malacher - Inovall?e - 38246 MEYLAN cedex - FRANCE > Phone: +33 (0)4 76 90 50 45 - Fax: +33 (0)4 56 38 08 30**** > > valene.pellissier at cedrat.com - www.cedrat.com [image: youtube][image: > youtube] *** > * > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1130 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 4777 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1014 bytes Desc: not available URL: From josef.pktd at gmail.com Thu Jun 6 09:49:31 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jun 2013 09:49:31 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: On Thu, Jun 6, 2013 at 8:57 AM, Matthew Brett wrote: > Hi, > > On Thu, Jun 6, 2013 at 1:19 PM, wrote: >> On Thu, Jun 6, 2013 at 7:21 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Thu, Jun 6, 2013 at 6:23 AM, Jerome Kieffer wrote: >>>> On Wed, 5 Jun 2013 23:08:10 +0100 >>>> Nathaniel Smith wrote: >>>> >>>>> But... have you ever sat down and written tests for a piece of widely >>>>> used academic software? (Not LAPACK, but some random large package >>>>> that's widely used within a field but doesn't have a comprehensive >>>>> test suite of its own.) Everyone I've heard of who's done this >>>>> discovers bugs all over the place. Would you personally trip over them >>>>> if you didn't test the code? Who knows, maybe not. And probably most >>>>> of the rest -- off by one errors here and there, maybe an incorrect >>>>> normalizing constant, etc., -- end up not mattering too much. Or maybe >>>>> they do. How could you even tell? >>>> >>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>>> The first took me ages to be spotted as I was assuming the error was on >>>> my side as scipy was seen as a "large library widely used". >>> >>> Well said. See also Blake Griffith's current struggles with >>> scipy.sparse (last message title "parametric tests, known failures and >>> skipped tests"). >> >> As far as I understand these are not BUGs. >> These are TDD test failures during development while adding support to >> additional dtypes. > > See for example : https://github.com/scipy/scipy/issues/2542 > > In particular that ticket ends with "Existing tests only tested lil > with float data." you cut off the other part of my statement But that doesn't mean scipy.sparse didn't work correctly for the initial implementation for float matrices. *float* Josef > > Not that this is surprising - I learned to test the hell out of > everything by finding how often I wrote broken code myself. > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From matthew.brett at gmail.com Thu Jun 6 10:33:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 6 Jun 2013 07:33:17 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: Hi, On Thu, Jun 6, 2013 at 6:49 AM, wrote: > On Thu, Jun 6, 2013 at 8:57 AM, Matthew Brett wrote: >> Hi, >> >> On Thu, Jun 6, 2013 at 1:19 PM, wrote: >>> On Thu, Jun 6, 2013 at 7:21 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Thu, Jun 6, 2013 at 6:23 AM, Jerome Kieffer wrote: >>>>> On Wed, 5 Jun 2013 23:08:10 +0100 >>>>> Nathaniel Smith wrote: >>>>> >>>>>> But... have you ever sat down and written tests for a piece of widely >>>>>> used academic software? (Not LAPACK, but some random large package >>>>>> that's widely used within a field but doesn't have a comprehensive >>>>>> test suite of its own.) Everyone I've heard of who's done this >>>>>> discovers bugs all over the place. Would you personally trip over them >>>>>> if you didn't test the code? Who knows, maybe not. And probably most >>>>>> of the rest -- off by one errors here and there, maybe an incorrect >>>>>> normalizing constant, etc., -- end up not mattering too much. Or maybe >>>>>> they do. How could you even tell? >>>>> >>>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>>>> The first took me ages to be spotted as I was assuming the error was on >>>>> my side as scipy was seen as a "large library widely used". >>>> >>>> Well said. See also Blake Griffith's current struggles with >>>> scipy.sparse (last message title "parametric tests, known failures and >>>> skipped tests"). >>> >>> As far as I understand these are not BUGs. >>> These are TDD test failures during development while adding support to >>> additional dtypes. >> >> See for example : https://github.com/scipy/scipy/issues/2542 >> >> In particular that ticket ends with "Existing tests only tested lil >> with float data." > > you cut off the other part of my statement > > But that doesn't mean scipy.sparse didn't work correctly for the > initial implementation for float matrices. > > *float* Sorry - I think I read your message too quickly. On the other hand that neatly points out the problem that the user would be unlikely to guess that sparse would only work correctly for floats. Cheers, Matthew From josef.pktd at gmail.com Thu Jun 6 10:44:20 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jun 2013 10:44:20 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: On Thu, Jun 6, 2013 at 8:59 AM, Matthew Brett wrote: > Hi, > > On Thu, Jun 6, 2013 at 1:56 PM, wrote: >>>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>>>> The first took me ages to be spotted as I was assuming the error was on >>>>> my side as scipy was seen as a "large library widely used". >> >> Ok, I found the stats.linregress case >> https://github.com/scipy/scipy/pull/433 >> >> There is no way I write unit tests for all edge cases that I never >> expect to show up. >> For sure you find bugs/behavior like this in many packages, and I >> wouldn't trust any package for extreme cases, no matter what their >> test suite is. > > I guess that means the user has to know what you thought an extreme case was? Anything that gets close to machine precision in a special case requires special attention. I assume many scipy.special distribution functions where written with statistical tests in mind, with maybe good accuracy in the 0.0001 to 0.5 percentiles. I wouldn't trust any of them for extreme tails 1e-30 until I have verified them. And I know in which cases Pauli and others expanded the range with good precision. https://github.com/scipy/scipy/issues/1489 fixed by https://github.com/scipy/scipy/pull/2494 but never went high on *my* priorities > > I think the point of test driven development is precisely in order to > specify the edges before you've locked yourself down to an > implementation. If one write's the implementation first one often > does forget the edges. "A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools." DA It's a question of priorities, I don't spend my time coming up with edge cases where something might fail, and then still only cover 50% of things users might run into. Some edge cases are important, some are just a numerical curiosity. example: minimum sample size for time series analysis in statsmodels is not checked http://groups.google.com/group/pystatsmodels/browse_thread/thread/15bba79f7474e1b3 I have an open issue for it, but I have no idea why someone would do time series analysis with 5 observations. It doesn't worry me enough to drop everything and fix the "bug". skew and kurtosis tests in scipy.stats now enforce the correct minimum sample size. example almost perfect collinearity in estimating a linear regression: the model produces nonsense, but what a statistical package is doing in this case and how close to perfect collinearity it can get without breaking down varies widely. my priorities are usually: check that something is correct for 99.5% of use cases and worry about the other 0.5% when they actually show up. And sometimes we have to revise our evaluation, when an edge case that we never thought off actually occurs pretty regularly. (if you want an example: problems with perfect prediction in Logit that neither Skipper nor I knew about until someone ran into it.) http://statsmodels.sourceforge.net/stable/pitfalls.html#unidentified-parameters to come back to the original point: I think edge cases are an area where having a large user base, that does implicit functional testing, is an advantage, and where I would trust packages that are popular more than those that have a larger test suite (when that's not the same). Josef > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From matthew.brett at gmail.com Thu Jun 6 11:30:57 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 6 Jun 2013 08:30:57 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: Hi, On Thu, Jun 6, 2013 at 7:44 AM, wrote: > On Thu, Jun 6, 2013 at 8:59 AM, Matthew Brett wrote: >> Hi, >> >> On Thu, Jun 6, 2013 at 1:56 PM, wrote: >>>>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress. >>>>>> The first took me ages to be spotted as I was assuming the error was on >>>>>> my side as scipy was seen as a "large library widely used". >>> >>> Ok, I found the stats.linregress case >>> https://github.com/scipy/scipy/pull/433 >>> >>> There is no way I write unit tests for all edge cases that I never >>> expect to show up. >>> For sure you find bugs/behavior like this in many packages, and I >>> wouldn't trust any package for extreme cases, no matter what their >>> test suite is. >> >> I guess that means the user has to know what you thought an extreme case was? > > Anything that gets close to machine precision in a special case > requires special attention. > > I assume many scipy.special distribution functions where written with > statistical tests in mind, with maybe good accuracy in the 0.0001 to > 0.5 percentiles. I wouldn't trust any of them for extreme tails 1e-30 > until I have verified them. And I know in which cases Pauli and others > expanded the range with good precision. > > https://github.com/scipy/scipy/issues/1489 > fixed by https://github.com/scipy/scipy/pull/2494 > but never went high on *my* priorities > >> >> I think the point of test driven development is precisely in order to >> specify the edges before you've locked yourself down to an >> implementation. If one write's the implementation first one often >> does forget the edges. > > "A common mistake that people make when trying to design something > completely foolproof is to underestimate the ingenuity of complete > fools." DA The complete fool I am writing the tests for is me :) > It's a question of priorities, I don't spend my time coming up with > edge cases where something might fail, and then still only cover 50% > of things users might run into. Some edge cases are important, some > are just a numerical curiosity. I personally find that writing the tests first - and considering the edge cases first - is quicker in the end. I find that it's easier to think clearly about the implementation before it's written. Having said that, about half the time I don't write the tests first, but I almost always regret it in due course. Cheers, Matthew From josef.pktd at gmail.com Thu Jun 6 12:06:38 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 6 Jun 2013 12:06:38 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: > > On the other hand that neatly points out the problem that the user > would be unlikely to guess that sparse would only work correctly for > floats. Sorry, I guess the arguments got a bit twisted. I'm not arguing against unit tests. The point of Matt and me was that functional testing and a large user base is important, and that we can rely on packages that have those as users (for the usual usage). When I use Stata or numpy, then I don't check whether the mean of 100 variables is correctly calculated (unless my numbers go from 1e-30 to 1e+30). When I use pandas, then I quickly realize that ddof=1 for standard deviation and I cannot use it as plug-in for numpy. scipy is a library that needs unit tests: When I started with scipy 5 years ago, there were huge gaps in test coverage in many sub-packages, especially in the less popular areas. There was not even a minimal test coverage for some modules or functions, and I wouldn't trust any of those results. Bugs have mainly been fixed in response to bug reports. So, the popular functions were pretty safe and bugfree. Everything else was a gamble. I think all the major gaps in test coverage have been closed by now. for example linalg and fftpack were always pretty good (based on underlying libraries and heavy usage) special, signal, and stats got lot's of attention (I'm not sure how far signal is, stats still has some problems) ndimage got a partial makeover, and might still have rough edges sparse got partial improvement and is on the schedule for this year optimize got a refactoring, but there are still problems in some algorithms that are mainly found by functional testing. interpolate a mixed bag, and splines are a bit messy. integrate: I don't remember any problems there weave: dead maxentropy: removed because of lack of users and maintainers I don't know much about the other ones. Now, there are very few bugs that show up in scipy.stats that I feel urgent enough to prepare a pull request myself. The last pull requests of mine that I merged into statsmodels had around 95% test coverage, almost all verified against other statistical packages, but no unit tests for dtypes, pandas dataframes or anything "weird". (I will need to go back and add tests for pandas dataframes.) Josef From paulo.ortins at gmail.com Thu Jun 6 12:19:13 2013 From: paulo.ortins at gmail.com (Paulo Ortins) Date: Thu, 6 Jun 2013 13:19:13 -0300 Subject: [SciPy-User] indices in scipy.ndimage.morphology.distance_transform_edt Message-ID: Hello guys, When i use distance_transform_edt, it returned a indices matrix. What this matrix means ? -- Atenciosamente, Paulo Ortins. (71) 8834 - 0628 Check out my blog ! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Valene.Pellissier at cedrat.com Thu Jun 6 12:48:12 2013 From: Valene.Pellissier at cedrat.com (=?iso-8859-1?Q?Val=E8ne_Pellissier?=) Date: Thu, 6 Jun 2013 16:48:12 +0000 Subject: [SciPy-User] Read matrix from matrix market format file In-Reply-To: References: <7BE88C19F22BFE44979FC114121314D2413EAF2E@MBX2.OPENHOST.FR> Message-ID: <7BE88C19F22BFE44979FC114121314D2413EB1E9@MBX2.OPENHOST.FR> Sorry I found out the problem. Blanks ... Do you know an easy and fast way to delete the useless blanks, and still keep the ones to seperate the variables ? My file looks like that : %%MatrixMarket matrix coordinate real symmetric 8176 8176 50823 1 1 1.00000000000000 2 1 3.217746558609426E-002 3 1 -5.773555991094832E-002 15 1 5.773555991029918E-002 16 1 -2.670364769384909E-002 50 1 2.666074407522187E-002 51 1 0.203221218356395 58 1 -0.203221218355300 65 1 5.555410250046564E-004 66 1 0.147340602419819 69 1 -3.939755803017718E-003 122 1 -0.147340602419818 123 1 3.939755803017390E-003 124 1 -5.359343846466105E-004 125 1 -0.175978855055765 3897 1 1.844398988268397E-003 3931 1 -6.804457799602319E-004 4146 1 6.804457843076172E-004 Thanks a lot for you help ! [cid:image001.jpg at 01CE62E6.63D1D860]Val?ne PELLISSIER - R&D Engineer CEDRAT S.A. 15 Chemin de Malacher - Inovall?e - 38246 MEYLAN cedex - FRANCE Phone: +33 (0)4 76 90 50 45 - Fax: +33 (0)4 56 38 08 30 valene.pellissier at cedrat.com - www.cedrat.com [youtube] [youtube] De : scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] De la part de Roban Kramer Envoy? : jeudi 6 juin 2013 15:47 ? : SciPy Users List Objet : Re: [SciPy-User] Read matrix from matrix market format file Can you send the first few lines of the my_matrix.mtx file? On Thu, Jun 6, 2013 at 9:05 AM, Val?ne Pellissier > wrote: Hi, I've got some problems reading a matrix from a matrix market format file. I have Python 3.2 and installed Numpy 1.7.1, Scipy 0.12.0 and matplotlib 1.2.1 on Windows 64. I tried the scipy.io.mmread function but got an error I don't understand. >>> B=scipy.io.mmread("my_matrix.mtx") Traceback (most recent call last): File "", line 1, in File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 70, in mmread return MMFile().read(source) File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 301, in read self._parse_header(stream) File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 337, in _parse_he der self.__class__.info(stream) File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 208, in info raise ValueError("Header line not of length 3: " + line) It is a big matrix. Is there someone who has a clue about what I'm doing wrong ? Your help will be much appreciated. Thanks Valene [cid:image001.jpg at 01CE62E6.63D1D860]Val?ne PELLISSIER - R&D Engineer CEDRAT S.A. 15 Chemin de Malacher - Inovall?e - 38246 MEYLAN cedex - FRANCE Phone: +33 (0)4 76 90 50 45 - Fax: +33 (0)4 56 38 08 30 valene.pellissier at cedrat.com - www.cedrat.com [youtube] [youtube] _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1014 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1130 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 4777 bytes Desc: image001.jpg URL: From msuzen at gmail.com Thu Jun 6 13:10:43 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Thu, 6 Jun 2013 19:10:43 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <20130606072327.cf0288bce43ffbae6947d7ca@esrf.fr> Message-ID: On 6 June 2013 17:30, Matthew Brett wrote: > I personally find that writing the tests first - and considering the > edge cases first - is quicker in the end. I find that it's easier to > think clearly about the implementation before it's written. Having > said that, about half the time I don't write the tests first, but I > almost always regret it in due course. TDD is quite suited here. There is also BDD (Behaviour-driven development) c.f. http://lettuce.it/intro/overview.html Best, -m From zachary.pincus at yale.edu Thu Jun 6 16:37:00 2013 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 6 Jun 2013 16:37:00 -0400 Subject: [SciPy-User] indices in scipy.ndimage.morphology.distance_transform_edt In-Reply-To: References: Message-ID: <80EB5D41-CAB2-4E43-BB5B-B2F4225A1881@yale.edu> > Hello guys, > > When i use distance_transform_edt, it returned a indices matrix. > > What this matrix means ? According to the docstring: > In addition to the distance transform, the feature transform can > be calculated. In this case the index of the closest background > element is returned along the first axis of the result. The examples in the docstring might help illustrate this a bit. Basically, the distance transform gives the distance to the closest background element. The indices specify *which* background element it was that was closest... Zach From newville at cars.uchicago.edu Fri Jun 7 08:03:01 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Fri, 7 Jun 2013 07:03:01 -0500 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On Wed, Jun 5, 2013 at 5:08 PM, Nathaniel Smith wrote: > On Wed, Jun 5, 2013 at 10:36 PM, Matt Newville > wrote: >> The paper that Alan Isaac referred to that started this conversation >> seemed to advocate for unit testing in the sense of "don't trust the >> codes you're using, always test them". At first reading, this seems >> like good advice. Since unit testing (or, at least, the phrase) is >> relatively new for software development, it gives the appearance of >> being new advice. But the authors damage their case by continuing on >> by saying not to trust analysis tools built by other scientists based >> on the reputation and prior use of thse tools. Here, they expose the >> weakness of favoring "unit tests" over "functional tests". They are >> essentially advocating throwing out decades of proven, tested work >> (and claiming that the use of this work to date is not justified, as >> it derives from un-due reputation of the authors of prior work) for a >> fashionable new trend. Science is deliberately conservative, and >> telling scientists that unit testing is all the rage among the cool >> programmers and they should jump on that bandwagon is not likely to >> gain much traction. > > But... have you ever sat down and written tests for a piece of widely > used academic software? (Not LAPACK, but some random large package > that's widely used within a field but doesn't have a comprehensive > test suite of its own.) Everyone I've heard of who's done this > discovers bugs all over the place. Would you personally trip over them > if you didn't test the code? Who knows, maybe not. And probably most > of the rest -- off by one errors here and there, maybe an incorrect > normalizing constant, etc., -- end up not mattering too much. Or maybe > they do. How could you even tell? Sorry for the delay in responding. For some definition of 'widely used academic software' and some definition of "unit testing", why yes, I have. And I have found many errors. I use unit tests, and I am not saying they are bad. I'm saying that other testing methods are valid too. Advocating for one testing method to the exclusion of others is not a good idea. I'm also defending the conservative scientist who finds the errors that "end up not mattering much" as, well, not mattering so much, until they are important, which would probably be a case where the code was applied to a new category or range of problem, which previous tests may not have covered. The non-software analogy is have a very well calibrated and tested meter over some range, and applying it to a new range. It might fail spectacularly, it might work very well, and it might work partially. Applying tools to new problems is what scientific instruments (and software) have to try to do. They might not work as expected. Unit testing with inputs over the expected are not entirely useful here. > You should absolutely check scipy.optimize.leastsq before using it! Really? Are you sure that is what you mean to say? By "you" do you mean me, personally, or do you mean everyone using scipy.optimize.leastsq? If you mean me, personally, It turns out I have written tests (functional) against the NIST test suite that Josef mentioned: https://github.com/newville/lmfit-py/blob/master/tests/fit_NIST_leastsq.py The results are actually not so clear. Most tests "pass" to very high precision, some "pass", but at lower precision than the certified values, and some do not do very well at all. But then, the NIST test suite is especially grueling. I also believe that the certified NIST values may have actually come from MINPACK-1. In that case, the test shows roughly that scipy.optimize.leastsq is as good as MINPACK-1, which is saying something, but not very much. But if you mean everyone, then I completely disagree. The point of using scipy is that one can be reasonably sure it has already been tested. Of course there may be bugs. What would the tests that *everyone* writes be testing anyway? If they just repeat other tests, it proves little more than that they can write a test. Should they also absolutely check numpy.sqrt? Does numpy use the underlying implementation from the C standard library for sqrt, or does it have its own? I don't know, but if you're suggesting that everyone should test everything, I'm sure you can tell us where these stray from the correct values by more than machine precision. What should one not absolutely check? I suspect you don't mean that everyone should absolutely test everything, but what you wrote could easily be read that way. > You could rewrite it too if you want, I guess, and if you write a > thorough test suite it might even work out. But it's pretty bizarre to > me to think that someone is going to think "ah-hah, writing my own > code + test suite will be easier than just writing a test suite!" Sure > some people are going to find ways to procrastinate on the real > problem (*cough*grad students*cough*) and NIH ain't just a funding > body. But that's totally orthogonal to whether tests are good. But if you don't trust the other person's code, why would you even bother testing it? And yes, I think many people would think that writing their own code would be easier and better than writing tests for someone else's buggy code. My reading of the Joppa et al paper is that a principal complaint of theirs is that people use existing software packages based on things like "reputation of the package author(s)", and "how many times its been used in the literature". They advocate being very skeptical of such software. This ignores any testing that has already gone into the existing package -- indeed they imply that there probably is none. But, the uses in the literature demonstrate that the results of the library or package can work well, at least in some cases. This is "prior work", and ignoring it is not good. Ignoring the existing literature is a very common problem in science, as many people prefer to spend a week in the lab to save them that hour in the library. But actually *advocating* for others to not use the existing literature or existing packages is a terrible idea. The balance, the pH meter, and the thermocouple were each, at one point in time, sophisticated devices. Now, not so much. You check the label, check that it is not obviously wrong, and believe its results. Of course, these instruments have intrinsic uncertainties, and can be just wrong in certain cases, but you are not (usually) better off building your own. Similarly, the C compiler, the quick-sort algorithm, and the fast Fourier transform. The Joppa et al paper can easily be read to say that scientists should not trust LAPACK, FFTPACK, MINPACK-1. It sounds very close to your saying "you should absolutely check scipy.optimize.leastsq" and leaving it unclear whether you mean "every scientist who ever uses it". This "trust nothing" approach could easily throw out the baby with the bathwater. It is certainly not how science is actually done, because science attempts to apply previous knowledge and methods to new problems, while maintaining a healthy skepticism that previous knowledge and methods may be flawed. Again, unit testing is akin to checking your instruments are working correctly. Yes, this is important. Functional testing *is* the scientific method. > Honestly I'm not even sure what unit-testing "bandwagon" you're > talking about. Again, I'm not opposed to unit testing, or any other testing method at all, and find unit testing very useful (I was writing unit tests yesterday, in fact, and may write more today). But it appears to me that some people are under the impression that a) if code has unit tests it is bug free, and b) if code does not have unit tests, it is full of bugs. Both are wrong. I take Jerome Kieffer's (always great to see synchrotron people here!) story as a good illustration. He didn't test before using scipy. When he found a problem, he first assumed it was in his code, and only after some work found the problem was in scipy itself. This is how science works. Yes, it would have been better if the problem hadn't existed, but now the problem has been fixed for later users. If Jerome had trusted nothing, he would have had no reason to trust scipy, and the bug in scipy may not have been found. Finally, the fact that his story of finding a bug in scipy was worth repeating suggest that the number of bugs found per user is very low. --Matt Newville From matthew.brett at gmail.com Fri Jun 7 08:17:54 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 7 Jun 2013 05:17:54 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, On Fri, Jun 7, 2013 at 5:03 AM, Matt Newville wrote: > Again, I'm not opposed to unit testing, or any other testing method at > all, and find unit testing very useful (I was writing unit tests > yesterday, in fact, and may write more today). But it appears to me > that some people are under the impression that a) if code has unit > tests it is bug free, and b) if code does not have unit tests, it is > full of bugs. Both are wrong. I don't suppose anyone thinks that exactly. If they've been writing and using software for a while they probably think that code with unit tests is more likely to be reliable than code without, and that code without unit tests should be treated with great caution. I think it would be very hard to argue that the need for unit tests had been overplayed. I haven't seen anyone test code too much, and I have often seen people (myself included) test code too little. Best, Matthew From pav at iki.fi Fri Jun 7 09:20:54 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 7 Jun 2013 13:20:54 +0000 (UTC) Subject: [SciPy-User] peer review of scientific software References: <51A39B6D.4030607@gmail.com> Message-ID: Matt Newville cars.uchicago.edu> writes: [clip] > But it appears to me > that some people are under the impression that a) if code has unit > tests it is bug free, and b) if code does not have unit tests, it is > full of bugs. Both are wrong. Every scientist worth their salt of course tests whether their results are robust. When this involves programming, the typical approach I've seen is to run with inputs for which the expected outputs are known in some other way, and compare results, check conservation laws etc. The first issue of course is that this testing is done manually, which is what you are inclined to do if you have not heard about automated testing (which nobody bothers to teach you about during your studies in non-IT fields). For small code bases, this probably scales. For larger projects or a longer time span, the amount of effort in testing increases. The second issue is that typically what has been tested is not recorded anywhere. (I have *never* seen anyone keep a programming notebook, whereas a lab notebook is quite mandatory...) As a consequence, you keep forgetting what you did. This wastes time, and you cannot communicate to other people what you have actually tested, even if they take you by your word. Of course, you can maintain a set of test cases and expected results, but this is essentially unit/functional testing. *** I'd argue that a) and b) become more likely as the size of the code base and development history grows bigger, unless you throw proportional manpower to manual testing. Manpower however does not remove the issue that being unable to communicate to other people in a detailed way what you actually did is unprofessional in science. -- Pauli Virtanen From msuzen at gmail.com Fri Jun 7 18:14:33 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Sat, 8 Jun 2013 00:14:33 +0200 Subject: [SciPy-User] Linux Journal Article Message-ID: This might be interesting. Article by Joey Bernard: Running scientific code using IPython and SciPy http://dl.acm.org/citation.cfm?id=2492105 From ronogara at yahoo.com Sun Jun 9 15:47:34 2013 From: ronogara at yahoo.com (R. O'Gara) Date: Sun, 9 Jun 2013 12:47:34 -0700 (PDT) Subject: [SciPy-User] FW: breaking news R. O'Gara Message-ID: <1370807254.31232.YahooMailNeo@web126102.mail.ne1.yahoo.com> http://www.izberidasportuvash.org/jheqxcsf.php -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuzzyview at gmail.com Mon Jun 10 10:56:03 2013 From: wuzzyview at gmail.com (Ahmed Fasih) Date: Mon, 10 Jun 2013 10:56:03 -0400 Subject: [SciPy-User] Question about Scipy tutorial relating QR decomposition and SVD Message-ID: In the Scipy tutorial's discussion of linear algebra, specifically the QR decomposition [1], the claim is made that the QR decomposition can be found via the SVD, i.e., rather than doing >> Q, R = scipy.linalg.qr(A) one may use the SVD to get a QR decomposition: >> U, S, Vh = scipy.linalg.svd(A) >> Q2 = U >> R2 = numpy.dot(numpy.diag(S), Vh) However, having just tried this for a random square matrix `A`, I can verify that `R2` above is not upper-triangular, and (Q2, R2) isn't quite a QR decomposition. Should the tutorial be updated to excise this from its discussion, or am I doing something wrong? Thanks, Ahmed [1] http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#qr-decomposition From robert.kern at gmail.com Mon Jun 10 11:09:47 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 10 Jun 2013 16:09:47 +0100 Subject: [SciPy-User] Question about Scipy tutorial relating QR decomposition and SVD In-Reply-To: References: Message-ID: On Mon, Jun 10, 2013 at 3:56 PM, Ahmed Fasih wrote: > In the Scipy tutorial's discussion of linear algebra, specifically the > QR decomposition [1], the claim is made that the QR decomposition can > be found via the SVD, i.e., rather than doing > >>> Q, R = scipy.linalg.qr(A) > > one may use the SVD to get a QR decomposition: > >>> U, S, Vh = scipy.linalg.svd(A) >>> Q2 = U >>> R2 = numpy.dot(numpy.diag(S), Vh) > > However, having just tried this for a random square matrix `A`, I can > verify that `R2` above is not upper-triangular, and (Q2, R2) isn't > quite a QR decomposition. Should the tutorial be updated to excise > this from its discussion, or am I doing something wrong? The tutorial is wrong. The SVD and the QR decomposition do not have that relationship. -- Robert Kern From ralf.gommers at gmail.com Mon Jun 10 15:23:48 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 10 Jun 2013 21:23:48 +0200 Subject: [SciPy-User] Python GSoC Planet Message-ID: This may be interesting to follow, roughly half the students are working on science & engineering topics: http://terri.toybox.ca/python-soc/ Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From dkajah at gmail.com Mon Jun 10 15:55:48 2013 From: dkajah at gmail.com (Daniel Penalva) Date: Mon, 10 Jun 2013 16:55:48 -0300 Subject: [SciPy-User] Python GSoC Planet In-Reply-To: References: Message-ID: Many thanks Ralf, i will spread the word in my net. On Mon, Jun 10, 2013 at 4:23 PM, Ralf Gommers wrote: > This may be interesting to follow, roughly half the students are working > on science & engineering topics: http://terri.toybox.ca/python-soc/ > > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jun 11 11:51:31 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Jun 2013 11:51:31 -0400 Subject: [SciPy-User] bugs in scipy.stats Message-ID: Follow-up on recent thread on unit and functional testing. scipy.stats has still some bugs, at least with some options. Problems: - functions that are not heavily used - functions without unit tests - functions where unit tests only cover a special case I'm looking for the first time more seriously at tests for equality of variances: levene `trimmed` returns wrong numbers if data is not sorted (`trimmed` is doing better in Monte Carlo studies than `median`) trim_mean returns wrong numbers if data is 2-d obrientransform raises an exception if not all arrays have the same length any other broken corners ??? stats review has still some way to go. volunteers for checking some less used corners ? Josef From guziy.sasha at gmail.com Tue Jun 11 13:24:32 2013 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Tue, 11 Jun 2013 13:24:32 -0400 Subject: [SciPy-User] bugs in scipy.stats In-Reply-To: References: Message-ID: Hi Josef, could you, please, list the functions which need to be tested? ? And the link to the testing approach that you'd prefer me to use unittest, nose, doctest? I am not experienced tester but really want to help. Cheers -- Oleksandr (Sasha) Huziy -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jun 11 13:58:54 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Jun 2013 13:58:54 -0400 Subject: [SciPy-User] bugs in scipy.stats In-Reply-To: References: Message-ID: On Tue, Jun 11, 2013 at 1:24 PM, Oleksandr Huziy wrote: > Hi Josef, > > could you, please, list the functions which need to be tested? > And the link to the testing approach that you'd prefer me to use unittest, > nose, doctest? I am not experienced tester but really want to help. Hi Sasha, I didn't run the test coverage on scipy.stats in a long time This was my old list (2009) which is very outdated https://github.com/scipy/scipy/issues/1554 All tests are run with nose, scipy doesn't have doctests. The testing guidelines are at https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt The pattern for the tests can be seen in the test suite https://github.com/scipy/scipy/tree/master/scipy/stats/tests especially test_stats.py and test_morestats.py, and those for mstats One check that would also be very helpful is to try out different kinds of arguments. For example, I think there might still be problems with 2d arrays in some functions. Some will raise ValueErrors if they cannot handle 2d arrays, but some might just return incorrect numbers. Example: I never looked closely at `mood` which has unit tests, so a quick try: >>> stats.mood(np.random.randn(10,2), np.random.randn(15,2)) (26.664783935766987, 1.2060935978310698e-156) >>> stats.mood(np.random.randn(10), np.random.randn(15)) (-0.46553454010068451, 0.64154870791874163) the first result looks pretty weird In these cases we should add a `raise ValueError` or try to enhance it to 2d. Thank you, Josef > > Cheers > -- > Oleksandr (Sasha) Huziy > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Tue Jun 11 14:21:18 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Jun 2013 14:21:18 -0400 Subject: [SciPy-User] bugs in scipy.stats In-Reply-To: References: Message-ID: On Tue, Jun 11, 2013 at 1:58 PM, wrote: > On Tue, Jun 11, 2013 at 1:24 PM, Oleksandr Huziy wrote: >> Hi Josef, >> >> could you, please, list the functions which need to be tested? >> And the link to the testing approach that you'd prefer me to use unittest, >> nose, doctest? I am not experienced tester but really want to help. > > Hi Sasha, > > I didn't run the test coverage on scipy.stats in a long time > This was my old list (2009) which is very outdated > https://github.com/scipy/scipy/issues/1554 > > All tests are run with nose, scipy doesn't have doctests. The testing > guidelines are at > https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt > > The pattern for the tests can be seen in the test suite > https://github.com/scipy/scipy/tree/master/scipy/stats/tests > especially test_stats.py and test_morestats.py, and those for mstats > > One check that would also be very helpful is to try out different > kinds of arguments. > For example, I think there might still be problems with 2d arrays in > some functions. Some will raise ValueErrors if they cannot handle 2d > arrays, but some might just return incorrect numbers. > > Example: I never looked closely at `mood` which has unit tests, so a quick try: >>>> stats.mood(np.random.randn(10,2), np.random.randn(15,2)) > (26.664783935766987, 1.2060935978310698e-156) >>>> stats.mood(np.random.randn(10), np.random.randn(15)) > (-0.46553454010068451, 0.64154870791874163) > > the first result looks pretty weird > > In these cases we should add a `raise ValueError` or try to enhance it to 2d. If you check a function and it works as advertised, then this would also be good to know We still have an old milestone for the stats review, where we can also note that everything is fine https://github.com/scipy/scipy/issues?milestone=4&state=open or open a new issue and note the functions that you checked there, so we have a record. Other hypothesis test that I never looked at in detail and tried out only with "nice" numbers are fligner, ansari, bartlett, ... >>> x, y = np.random.randn(10,2), np.random.randn(15,2) >>> stats.bartlett(x, y) (4.7695839486013287e-05, 0.99448967952524281) >>> stats.bartlett(x.ravel(), y.ravel()) (0.00010231554134127248, 0.99192944393408666) looks also wrong (ttests are vectorized, ks tests raise an exception somewhere in the code with 2d) Josef > > Thank you, > > Josef > >> >> Cheers >> -- >> Oleksandr (Sasha) Huziy >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From guziy.sasha at gmail.com Wed Jun 12 00:37:33 2013 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Wed, 12 Jun 2013 00:37:33 -0400 Subject: [SciPy-User] bugs in scipy.stats In-Reply-To: References: Message-ID: Thank you Josef, for the detailed description. Though I have one more question concerning the workflow. After I've cloned the project locally can I develop and run tests from inside the directory without reinstalling it? (I suppose build is necessary when fortran code changes), but I would like to be able to edit and run the code from the same place.. (this would permit for example to use it as a project in my favourite IDE). But when I try to run test_morestats.py from inside the project I get the following exception: Traceback (most recent call last): File "/home/san/.IntelliJIdea11/config/plugins/python/helpers/pycharm/utrunner.py", line 113, in modules = [loadSource(a[0])] File "/home/san/.IntelliJIdea11/config/plugins/python/helpers/pycharm/utrunner.py", line 44, in loadSource module = imp.load_source(moduleName, fileName) File "/home/san/Python/scipy_project/scipy/stats/tests/test_morestats.py", line 14, in import scipy.stats as stats File "/home/san/Python/scipy_project/scipy/__init__.py", line 120, in raise ImportError(msg) ImportError: Error importing scipy: you cannot import scipy while being in scipy source directory; please exit the scipy source tree first, and relaunch your python intepreter. How do you work on it? I understand that installing it (using virtualenv, which I am currently using for scipy testing) and then running tests in a different place would work, but is it the only way, it does not seem efficient. Cheers -- Sasha 2013/6/11 > On Tue, Jun 11, 2013 at 1:58 PM, wrote: > > On Tue, Jun 11, 2013 at 1:24 PM, Oleksandr Huziy > wrote: > >> Hi Josef, > >> > >> could you, please, list the functions which need to be tested? > >> And the link to the testing approach that you'd prefer me to use > unittest, > >> nose, doctest? I am not experienced tester but really want to help. > > > > Hi Sasha, > > > > I didn't run the test coverage on scipy.stats in a long time > > This was my old list (2009) which is very outdated > > https://github.com/scipy/scipy/issues/1554 > > > > All tests are run with nose, scipy doesn't have doctests. The testing > > guidelines are at > > https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt > > > > The pattern for the tests can be seen in the test suite > > https://github.com/scipy/scipy/tree/master/scipy/stats/tests > > especially test_stats.py and test_morestats.py, and those for mstats > > > > One check that would also be very helpful is to try out different > > kinds of arguments. > > For example, I think there might still be problems with 2d arrays in > > some functions. Some will raise ValueErrors if they cannot handle 2d > > arrays, but some might just return incorrect numbers. > > > > Example: I never looked closely at `mood` which has unit tests, so a > quick try: > >>>> stats.mood(np.random.randn(10,2), np.random.randn(15,2)) > > (26.664783935766987, 1.2060935978310698e-156) > >>>> stats.mood(np.random.randn(10), np.random.randn(15)) > > (-0.46553454010068451, 0.64154870791874163) > > > > the first result looks pretty weird > > > > In these cases we should add a `raise ValueError` or try to enhance it > to 2d. > > If you check a function and it works as advertised, then this would > also be good to know > We still have an old milestone for the stats review, where we can also > note that everything is fine > https://github.com/scipy/scipy/issues?milestone=4&state=open > > or open a new issue and note the functions that you checked there, so > we have a record. > > Other hypothesis test that I never looked at in detail and tried out > only with "nice" numbers are > fligner, ansari, bartlett, ... > > >>> x, y = np.random.randn(10,2), np.random.randn(15,2) > >>> stats.bartlett(x, y) > (4.7695839486013287e-05, 0.99448967952524281) > >>> stats.bartlett(x.ravel(), y.ravel()) > (0.00010231554134127248, 0.99192944393408666) > > looks also wrong > > (ttests are vectorized, ks tests raise an exception somewhere in the > code with 2d) > > Josef > > > > > > Thank you, > > > > Josef > > > >> > >> Cheers > >> -- > >> Oleksandr (Sasha) Huziy > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jun 12 01:26:57 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 12 Jun 2013 01:26:57 -0400 Subject: [SciPy-User] bugs in scipy.stats In-Reply-To: References: Message-ID: On Wed, Jun 12, 2013 at 12:37 AM, Oleksandr Huziy wrote: > Thank you Josef, > > for the detailed description. Though I have one more question concerning the > workflow. > > After I've cloned the project locally can I develop and run tests from > inside the directory without reinstalling it? (I suppose build is necessary > when fortran code changes), but I would like to be able to edit and run the > code from the same place.. (this would permit for example to use it as a > project in my favourite IDE). But when I try to run test_morestats.py from > inside the project I get the following exception: > > Traceback (most recent call last): > File > "/home/san/.IntelliJIdea11/config/plugins/python/helpers/pycharm/utrunner.py", > line 113, in > modules = [loadSource(a[0])] > File > "/home/san/.IntelliJIdea11/config/plugins/python/helpers/pycharm/utrunner.py", > line 44, in loadSource > module = imp.load_source(moduleName, fileName) > File "/home/san/Python/scipy_project/scipy/stats/tests/test_morestats.py", > line 14, in > import scipy.stats as stats > File "/home/san/Python/scipy_project/scipy/__init__.py", line 120, in > > raise ImportError(msg) > ImportError: Error importing scipy: you cannot import scipy while > being in scipy source directory; please exit the scipy source > tree first, and relaunch your python intepreter. > > How do you work on it? I understand that installing it (using virtualenv, > which I am currently using for scipy testing) and then running tests in a > different place would work, but is it the only way, it does not seem > efficient. I usually run nosetests directly in a separate shell window, which makes it easy to run just specific test modules or individual tests nosetests path_to_test_module nosetests path_to_test_module:testfunction_name (IIRC) It uses whichever scipy is on the python path. So if you build inplace, then you should be able to edit, run nosetests, commit for pure python code. And the test_module can be, but does not need to be inside scipy, for example run a test module that is in a source tree when the build/installed scipy is somewhere else. The alternative using scipy.stats.tests in a separate python interpreter is much slower, since it's running all the tests for scipy.stats. Full test without skipping the slow tests takes 5 minutes or so. For me working on scipy is more complicated, since I cannot build it on Windows, but that's what I do for statsmodels. I usually use Eclipse for editing the source with inplace build of extensions, spyder to run some examples, and nosetests in a shell for the tests. There are some utility scripts for compiling and testing scipy, but for pure python code it's a detour that's not necessary. Cheers, Josef > > Cheers > -- > Sasha > > > > > 2013/6/11 > >> On Tue, Jun 11, 2013 at 1:58 PM, wrote: >> > On Tue, Jun 11, 2013 at 1:24 PM, Oleksandr Huziy >> > wrote: >> >> Hi Josef, >> >> >> >> could you, please, list the functions which need to be tested? >> >> And the link to the testing approach that you'd prefer me to use >> >> unittest, >> >> nose, doctest? I am not experienced tester but really want to help. >> > >> > Hi Sasha, >> > >> > I didn't run the test coverage on scipy.stats in a long time >> > This was my old list (2009) which is very outdated >> > https://github.com/scipy/scipy/issues/1554 >> > >> > All tests are run with nose, scipy doesn't have doctests. The testing >> > guidelines are at >> > https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt >> > >> > The pattern for the tests can be seen in the test suite >> > https://github.com/scipy/scipy/tree/master/scipy/stats/tests >> > especially test_stats.py and test_morestats.py, and those for mstats >> > >> > One check that would also be very helpful is to try out different >> > kinds of arguments. >> > For example, I think there might still be problems with 2d arrays in >> > some functions. Some will raise ValueErrors if they cannot handle 2d >> > arrays, but some might just return incorrect numbers. >> > >> > Example: I never looked closely at `mood` which has unit tests, so a >> > quick try: >> >>>> stats.mood(np.random.randn(10,2), np.random.randn(15,2)) >> > (26.664783935766987, 1.2060935978310698e-156) >> >>>> stats.mood(np.random.randn(10), np.random.randn(15)) >> > (-0.46553454010068451, 0.64154870791874163) >> > >> > the first result looks pretty weird >> > >> > In these cases we should add a `raise ValueError` or try to enhance it >> > to 2d. >> >> If you check a function and it works as advertised, then this would >> also be good to know >> We still have an old milestone for the stats review, where we can also >> note that everything is fine >> https://github.com/scipy/scipy/issues?milestone=4&state=open >> >> or open a new issue and note the functions that you checked there, so >> we have a record. >> >> Other hypothesis test that I never looked at in detail and tried out >> only with "nice" numbers are >> fligner, ansari, bartlett, ... >> >> >>> x, y = np.random.randn(10,2), np.random.randn(15,2) >> >>> stats.bartlett(x, y) >> (4.7695839486013287e-05, 0.99448967952524281) >> >>> stats.bartlett(x.ravel(), y.ravel()) >> (0.00010231554134127248, 0.99192944393408666) >> >> looks also wrong >> >> (ttests are vectorized, ks tests raise an exception somewhere in the >> code with 2d) >> >> Josef >> >> >> > >> > Thank you, >> > >> > Josef >> > >> >> >> >> Cheers >> >> -- >> >> Oleksandr (Sasha) Huziy >> >> >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pav at iki.fi Wed Jun 12 06:15:34 2013 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 12 Jun 2013 10:15:34 +0000 (UTC) Subject: [SciPy-User] bugs in scipy.stats References: Message-ID: gmail.com> writes: [clip] > There are some utility scripts for compiling and testing scipy, but > for pure python code it's a detour that's not necessary. If you on the other hand are able to build scipy, it's also easy to just run: python runtests.py -vg -t \ scipy/stats/tests/test_stats.py:TestRound.test_rounding0 -- Pauli Virtanen From sudheer.joseph at yahoo.com Wed Jun 12 20:48:26 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Thu, 13 Jun 2013 08:48:26 +0800 (SGT) Subject: [SciPy-User] Fw: [SciPy-Dev] t-statistic In-Reply-To: <1371034285.13509.YahooMailNeo@web193403.mail.sg3.yahoo.com> References: <1371034285.13509.YahooMailNeo@web193403.mail.sg3.yahoo.com> Message-ID: <1371084506.22161.YahooMailNeo@web193404.mail.sg3.yahoo.com> ?Dear experts, ? ????????????????? I am doing a project involving ?regression of a model variable with observed variable and wanted to find ?t-values dynamically as the number of available observations involved ?in comparison changes. Is there a tool in numpy/scipy which gives the ?appropriate t-value if we give number of samples ? ? ?t = 2.31??? ??? ??? ??? # appropriate t value (where n=9, two tailed 95%) ? ?with best regards, ?Sudheer From josef.pktd at gmail.com Wed Jun 12 23:07:00 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 12 Jun 2013 23:07:00 -0400 Subject: [SciPy-User] Welch's Anova (unequal variances) Message-ID: Question out of curiosity scipy stats has f_oneway which does the standard one-way ANOVA that assumes equal variances across groups. Similar to Welch's t-test, Welch's ANOVA allows for different variances across groups. I don't find anything with a Google search for "Welch's Anova in python". Does everyone who uses ANOVA have data with equal variances across groups, or is there something that google didn't find, or is everyone using resampling methods? Josef From jsalvatier at gmail.com Thu Jun 13 04:13:51 2013 From: jsalvatier at gmail.com (John Salvatier) Date: Thu, 13 Jun 2013 01:13:51 -0700 Subject: [SciPy-User] Gradient of spline interpolation at a point wrt changing the knot points Message-ID: I'm using scipy.interpolate.InterpolatedUnivariateSpline in a statistical application. y = InterpolatedUnivariateSpline(x0,y0)(x) I need the derivatives of the spline evaluated at a given point with respect to changing the knot values (y0). That is, I need the derivative of y wrt to y0. Is there a way I could compute this? Thank you, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgarcia at olfac.univ-lyon1.fr Thu Jun 13 05:40:09 2013 From: sgarcia at olfac.univ-lyon1.fr (Samuel Garcia) Date: Thu, 13 Jun 2013 11:40:09 +0200 Subject: [SciPy-User] ANN: neo 0.3.0 release Message-ID: <51B99379.6060001@olfac.univ-lyon1.fr> We are pleased to announce the 0.3.0 release of the neo. Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats, including Spike2, NeuroExplorer, AlphaOmega, Axon, Blackrock, Plexon, Tdt, and support for writing to a subset of these formats plus non-proprietary formats including HDF5. The goal of Neo is to improve interoperability between Python tools for analyzing, visualizing and generating electrophysiology data (such as OpenElectrophy, NeuroTools, G-node, Helmholtz, PyNN) by providing a common, shared object model. In order to be as lightweight a dependency as possible, Neo is deliberately limited to represention of data, with no functions for data analysis or visualization. Neo implements a hierarchical data model well adapted to intracellular and extracellular electrophysiology and EEG data with support for multi-electrodes (for example tetrodes). Neo's data objects build on the quantities_ package, which in turn builds on NumPy by adding support for physical dimensions. Thus neo objects behave just like normal NumPy arrays, but with additional metadata, checks for dimensional consistency and automatic unit conversion. Release 0.3.0 notes: * various bug fixes in neo.io * added ElphyIO * SpikeTrain performence improved * An IO class now can return a list of Block (see read_all_blocks in IOs) * python3 compatibility improved Home page: http://neuralensemble.org/neo Mailing list: https://groups.google.com/forum/?fromgroups#!forum/neuralensemble Documentation: http://packages.python.org/neo/ The neo team From gkclri at yahoo.com Thu Jun 13 11:40:48 2013 From: gkclri at yahoo.com (Gopalakrishnan Ravimohan) Date: Thu, 13 Jun 2013 08:40:48 -0700 (PDT) Subject: [SciPy-User] hey Message-ID: <1371138048.98459.YahooMailNeo@web125106.mail.ne1.yahoo.com> http://www.moreaux.com.au/wp-content/themes/toolbox/youtube.php?bnflasex892weqv.html gkclri Gopalakrishnan Ravimohan ....................... If at first you don't succeed, give up; no use being a damn fool. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at xgm.de Fri Jun 14 06:50:17 2013 From: mailinglists at xgm.de (Florian Lindner) Date: Fri, 14 Jun 2013 12:50:17 +0200 Subject: [SciPy-User] Ignore characters while reading text Message-ID: <5789326.FQK5ToiTHf@horus> Hello, I have a text file with data like 1 (2 3 4) (5 6 7) (8 9 10) 2 (4 5 1) (3 6 8) (1 6 45) How can I read that file into an array? [ [1, 2, 3, 5, 6, 7, 8, 9, 10] [2, 4, 1, 3, ... ] ] I tried genfromtxt with deletechars="()" but that seems to affect only 'names'. I also tried delimiter="() " but that didn't work either. Thanks! Florian From newville at cars.uchicago.edu Fri Jun 14 08:32:09 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Fri, 14 Jun 2013 07:32:09 -0500 Subject: [SciPy-User] Ignore characters while reading text In-Reply-To: <5789326.FQK5ToiTHf@horus> References: <5789326.FQK5ToiTHf@horus> Message-ID: On Fri, Jun 14, 2013 at 5:50 AM, Florian Lindner wrote: > Hello, > > I have a text file with data like > > 1 (2 3 4) (5 6 7) (8 9 10) > 2 (4 5 1) (3 6 8) (1 6 45) > > How can I read that file into an array? > > [ > [1, 2, 3, 5, 6, 7, 8, 9, 10] > [2, 4, 1, 3, ... ] > ] > > I tried genfromtxt with deletechars="()" but that seems to affect only 'names'. > I also tried delimiter="() " but that didn't work either. Would this do? import numpy as np from cStringIO import StringIO txt= '1 (2 3 4) (5 6 7) (8 9 10)' np.loadtxt(StringIO(txt.replace('(', '').replace(')', ''))) --Matt From davidmenhur at gmail.com Fri Jun 14 08:46:22 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Fri, 14 Jun 2013 14:46:22 +0200 Subject: [SciPy-User] Ignore characters while reading text In-Reply-To: References: <5789326.FQK5ToiTHf@horus> Message-ID: On 14 June 2013 14:32, Matt Newville wrote: > Would this do? > > import numpy as np > from cStringIO import StringIO > txt= '1 (2 3 4) (5 6 7) (8 9 10)' > np.loadtxt(StringIO(txt.replace('(', '').replace(')', ''))) If I am not mistaken, then you are reading the data twice or thrice. If this is big, and performance is critical, you may be better off doing the loadtxt yourself. The core of np.loadtxt is essencially a "from line in file: data.append(parse(line))", with some wrapping intelligence, that probably is not needed in your case. https://github.com/numpy/numpy/blob/v1.7.0/numpy/lib/npyio.py#L610 ---->loadtxt https://github.com/numpy/numpy/blob/v1.7.0/numpy/lib/npyio.py#L1573 --->genfromtxt David. From scopatz at gmail.com Sun Jun 2 11:59:17 2013 From: scopatz at gmail.com (Anthony Scopatz) Date: Sun, 2 Jun 2013 10:59:17 -0500 Subject: [SciPy-User] ANN: PyTables 3.0 Message-ID: =========================== Announcing PyTables 3.0.0 =========================== We are happy to announce PyTables 3.0.0. PyTables 3.0.0 comes after about 5 years from the last major release (2.0) and 7 months since the last stable release (2.4.0). This is new major release and an important milestone for the PyTables project since it provides the long waited support for Python 3.x, which has been around for 4 years. Almost all of the core numeric/scientific packages for Python already support Python 3 so we are very happy that now also PyTables can provide this important feature. What's new ========== A short summary of main new features: - Since this release, PyTables now provides full support to Python 3 - The entire code base is now more compliant with coding style guidelines described in PEP8. - Basic support for HDF5 drivers. It now is possible to open/create an HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE drivers. - Basic support for in-memory image files. An HDF5 file can be set from or copied into a memory buffer. - Implemented methods to get/set the user block size in a HDF5 file. - All read methods now have an optional *out* argument that allows to pass a pre-allocated array to store data. - Added support for the floating point data types with extended precision (Float96, Float128, Complex192 and Complex256). - Consistent ``create_xxx()`` signatures. Now it is possible to create all data sets Array, CArray, EArray, VLArray, and Table from existing Python objects. - Complete rewrite of the `nodes.filenode` module. Now it is fully compliant with the interfaces defined in the standard `io` module. Only non-buffered binary I/O is supported currently. Please refer to the RELEASE_NOTES document for a more detailed list of changes in this release. As always, a large amount of bugs have been addressed and squashed as well. In case you want to know more in detail what has changed in this version, please refer to: http://pytables.github.io/release_notes.html You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/3.0.0 For an online version of the manual, visit: http://pytables.github.io/usersguide/index.html What it is? =========== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology, allowing to perform data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second. Resources ========= About PyTables: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Developers -------------- next part -------------- An HTML attachment was scrubbed... URL: From hamra at whamra.com Mon Jun 3 14:55:53 2013 From: hamra at whamra.com (Waleed Hamra) Date: Mon, 03 Jun 2013 21:55:53 +0300 Subject: [SciPy-User] how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy? Message-ID: <1649728.LLlivvqnTC@waleed-virtual-machine> I had a confusion regarding this module (scipy.cluster.hierarchy) ... and still have some ! For example we have this dendrogram: http://img62.imageshack.us/img62/8130/3ieb4.png My question is how can I extract the coloured subtrees (each one represent a cluster) in a nice format, say SIF format ? Now the code to get the plot above is: In [1]: import scipy In [2]: import scipy.cluster.hierarchy as sch In [3]: import matplotlib.pylab as plt In [4]: X = scipy.randn(100,2) In [5]: d = sch.distance.pdist(X) In [6]: Z= sch.linkage(d,method='complete') In [7]: P =sch.dendrogram(Z) In [8]: plt.savefig('plot_dendrogram.png') In [9]: T = sch.fcluster(Z, 0.5*d.max(), 'distance') In [10]: T Out[10]: array([4, 5, 3, 2, 2, 3, 5, 2, 2, 5, 2, 2, 2, 3, 2, 3, 2, 5, 4, 5, 2, 5, 2, 3, 3, 3, 1, 3, 4, 2, 2, 4, 2, 4, 3, 3, 2, 5, 5, 5, 3, 2, 2, 2, 5, 4, 2, 4, 2, 2, 5, 5, 1, 2, 3, 2, 2, 5, 4, 2, 5, 4, 3, 5, 4, 4, 2, 2, 2, 4, 2, 5, 2, 2, 3, 3, 2, 4, 5, 3, 4, 4, 2, 1, 5, 4, 2, 2, 5, 5, 2, 2, 5, 5, 5, 4, 3, 3, 2, 4], dtype=int32) In [11]: sch.leaders(Z,T) Out[11]: (array([190, 191, 182, 193, 194], dtype=int32), array([2, 3, 1, 4,5],dtype=int32)) So now, the output of fcluster() gives the clustering of the nodes (by their id's), and leaders() described here is supposed to return 2 arrays: first one contains the leader nodes of the clusters generated by Z, here we can see we have 5 clusters, as well as in the plot and the second one the id's of these clusters So if this leaders() returns resp. L and M : L[2]=182 and M[2]=1, then cluster 1 is leaded by node id 182, which doesn't exist in the observations set X, the documentation says "... then it corresponds to a non-singleton cluster". But I can't get it ... Also, I converted the Z to a tree by sch.to_tree(Z), that will return an easy- to-use tree object, which I want to visualize, but which tool should I use as a graphical platform that manipulate these kind of tree objects as inputs? thanks in advance :) From hamra at whamra.com Mon Jun 3 15:44:25 2013 From: hamra at whamra.com (Waleed Hamra) Date: Mon, 03 Jun 2013 22:44:25 +0300 Subject: [SciPy-User] how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy? Message-ID: <1470482.sEqcgp5PqC@waleed-virtual-machine> I had a confusion regarding this module (scipy.cluster.hierarchy) ... and still have some ! For example we have this dendrogram: http://img62.imageshack.us/img62/8130/3ieb4.png My question is how can I extract the coloured subtrees (each one represent a cluster) in a nice format, say SIF format ? Now the code to get the plot above is: In [1]: import scipy In [2]: import scipy.cluster.hierarchy as sch In [3]: import matplotlib.pylab as plt In [4]: X = scipy.randn(100,2) In [5]: d = sch.distance.pdist(X) In [6]: Z= sch.linkage(d,method='complete') In [7]: P =sch.dendrogram(Z) In [8]: plt.savefig('plot_dendrogram.png') In [9]: T = sch.fcluster(Z, 0.5*d.max(), 'distance') In [10]: T Out[10]: array([4, 5, 3, 2, 2, 3, 5, 2, 2, 5, 2, 2, 2, 3, 2, 3, 2, 5, 4, 5, 2, 5, 2, 3, 3, 3, 1, 3, 4, 2, 2, 4, 2, 4, 3, 3, 2, 5, 5, 5, 3, 2, 2, 2, 5, 4, 2, 4, 2, 2, 5, 5, 1, 2, 3, 2, 2, 5, 4, 2, 5, 4, 3, 5, 4, 4, 2, 2, 2, 4, 2, 5, 2, 2, 3, 3, 2, 4, 5, 3, 4, 4, 2, 1, 5, 4, 2, 2, 5, 5, 2, 2, 5, 5, 5, 4, 3, 3, 2, 4], dtype=int32) In [11]: sch.leaders(Z,T) Out[11]: (array([190, 191, 182, 193, 194], dtype=int32), array([2, 3, 1, 4,5],dtype=int32)) So now, the output of fcluster() gives the clustering of the nodes (by their id's), and leaders() described here is supposed to return 2 arrays: first one contains the leader nodes of the clusters generated by Z, here we can see we have 5 clusters, as well as in the plot and the second one the id's of these clusters So if this leaders() returns resp. L and M : L[2]=182 and M[2]=1, then cluster 1 is leaded by node id 182, which doesn't exist in the observations set X, the documentation says "... then it corresponds to a non-singleton cluster". But I can't get it ... Also, I converted the Z to a tree by sch.to_tree(Z), that will return an easy- to-use tree object, which I want to visualize, but which tool should I use as a graphical platform that manipulate these kind of tree objects as inputs? thanks in advance :) From nils106 at googlemail.com Thu Jun 6 09:15:59 2013 From: nils106 at googlemail.com (Nils Wagner) Date: Thu, 6 Jun 2013 15:15:59 +0200 Subject: [SciPy-User] Read matrix from matrix market format file In-Reply-To: <7BE88C19F22BFE44979FC114121314D2413EAF2E@MBX2.OPENHOST.FR> References: <7BE88C19F22BFE44979FC114121314D2413EAF2E@MBX2.OPENHOST.FR> Message-ID: Please, can you provide the first three lines of your *.mtx file ? Nils On Thu, Jun 6, 2013 at 3:05 PM, Val?ne Pellissier < Valene.Pellissier at cedrat.com> wrote: > Hi, **** > > ** ** > > I've got some problems reading a matrix from a matrix market format file. > **** > > I have Python 3.2 and installed Numpy 1.7.1, Scipy 0.12.0 and matplotlib > 1.2.1 on Windows 64. **** > > I tried the scipy.io.mmread function but got an error I don't understand. > **** > > ** ** > > >>> B=scipy.io.mmread("my_matrix.mtx")**** > > Traceback (most recent call last):**** > > File "", line 1, in **** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 70, in mmread > **** > > return MMFile().read(source)**** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 301, in read > **** > > self._parse_header(stream)**** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 337, in > _parse_he**** > > der**** > > self.__class__.info(stream)**** > > File "C:\Python32\lib\site-packages\scipy\io\mmio.py", line 208, in info > **** > > raise ValueError("Header line not of length 3: " + line)**** > > ** ** > > It is a big matrix. **** > > Is there someone who has a clue about what I'm doing wrong ? **** > > ** ** > > Your help will be much appreciated.**** > > Thanks**** > > Valene**** > > ** ** > > *****Val?ne PELLISSIER *- R&D Engineer > *CEDRAT S.A.* > 15 Chemin de Malacher - Inovall?e - 38246 MEYLAN cedex - FRANCE > Phone: +33 (0)4 76 90 50 45 - Fax: +33 (0)4 56 38 08 30**** > > valene.pellissier at cedrat.com - www.cedrat.com [image: youtube][image: > youtube] *** > * > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.blelloch at gmail.com Sat Jun 8 23:46:03 2013 From: paul.blelloch at gmail.com (Paul Blelloch) Date: Sat, 8 Jun 2013 20:46:03 -0700 (PDT) Subject: [SciPy-User] Numpy 1.7.1 Crashing with MKL and AVX instructions In-Reply-To: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com> References: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com> Message-ID: <32f8e11e-7565-48bf-bb3b-fe4be2bee9cd@googlegroups.com> Christoph Gohlke figured this out and has updated the MKL 64-bit builds of both numpy 1.7.1 and scipy 0.12.0 on his site. On Friday, May 24, 2013 8:24:37 AM UTC-7, Paul Blelloch wrote: > > I've tried posting this on the numpy list, but it keeps on getting > bounced, so I'll try here: > > I found that when I went from numpy 1.7.0 to 1.7.1 I get a crash whenever > I try an eigenvalue calculation (or any other linalg calculation) on > matrices bigger than about 200x200. This happens with both the latest > Anaconda and WinPython 64-bit Windows distributions (both of which use > numpy 1.7.1) and occurs on all the HP workstations at my company. The > folks at Continuum helped me debug this and determined that the failure was > most likely in use of AVX instructions in MKL, but we haven't found a > workaround yet short of sticking to numpy 1.7.0. The problem is apparently > specific to some combination of processor and OS. Here's what I have: > > > > OS Name Microsoft Windows 7 Professional > Version 6.1.7601 Service Pack 1 Build 7601 > OS Manufacturer Microsoft Corporation > System Name Z420-6 > System Manufacturer Hewlett-Packard > System Model HP Z420 Workstation > System Type x64-based PC > Processor Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 3201 Mhz, 6 Core(s), > 12 Logical Processor(s) > BIOS Version/Date Hewlett-Packard J61 v01.14, 7/17/2012 > SMBIOS Version 2.7 > Windows Directory C:\Windows > System Directory C:\Windows\system32 > Boot Device \Device\HarddiskVolume1 > Locale United States > Hardware Abstraction Layer Version = "6.1.7601.17514" > Installed Physical Memory (RAM) 32.0 GB > Total Physical Memory 31.9 GB > Available Physical Memory 28.4 GB > Total Virtual Memory 95.8 GB > Available Virtual Memory 92.1 GB > Page File Space 63.9 GB > Page File C:\pagefile.sys > > > > I'm surprised that I'm the only person out there running into this. Is > there anyone else who's running into this problem with numpy 1.7.1 with MKL > on 64-bit Windows? > > > > -Paul Blelloch > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacdup at yahoo.com Thu Jun 13 04:07:42 2013 From: zacdup at yahoo.com (Museful) Date: Thu, 13 Jun 2013 01:07:42 -0700 (PDT) Subject: [SciPy-User] Linear interpolation in 3D In-Reply-To: References: Message-ID: <1371110862392-18395.post@n7.nabble.com> See this article on multilinear interpolation . -- View this message in context: http://scipy-user.10969.n7.nabble.com/Linear-interpolation-in-3D-tp12989p18395.html Sent from the Scipy-User mailing list archive at Nabble.com. From njs at pobox.com Fri Jun 14 10:07:50 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 14 Jun 2013 15:07:50 +0100 Subject: [SciPy-User] sqrtm is too slow for matrices of size 1000 In-Reply-To: <63b35b81-023d-4950-a584-7b2bcbfee247@googlegroups.com> References: <63b35b81-023d-4950-a584-7b2bcbfee247@googlegroups.com> Message-ID: On 14 Jun 2013 14:46, "Paul Blelloch" wrote: > > I think that I found the problem. It was in not recognizing that the inner for loop is actually a dot product. If I replace the following lines of code: > > > s = 0 > > for k in range(i+1,j): > > s = s + R[i,k]*R[k,j] > > > with > > > s = np.dot(R[i,(i+1):j],R[(i+1):j,j]) > > > Run time decreases from 367 to 15.6 seconds. My guess is that you could get considerable further speedup, but I'm pleased with the 15.6 seconds. If you copy the sqrtm function from scipy and make that change I think that you'll see considerable improvement. If you'd like to submit a pull request with this change then I bet the scipy developers will be very interested... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Fri Jun 14 10:31:13 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 14 Jun 2013 10:31:13 -0400 Subject: [SciPy-User] Storing return values of optimize.fmin() In-Reply-To: References: Message-ID: On Mon, Apr 22, 2013 at 6:12 AM, Jeroen Meidam wrote: > Hi, > > I am using optimize.fmin to minimize a function over 2 parameters. > In the documentation it says that the output is: > (xopt, {fopt, iter, funcalls, warnflag}) > > I have no problem putting xopt into a variable, because this is simply > done by writing: > xopt = fmin(function,x0) > After which I can use xopt for anything I need it for. > > What I want however, is to store "fopt" into a variable, like I did with > xopt. In the standard case, fopt is only returned as text in the output > stream: > " > Optimization terminated successfully. > Current function value: -0.995801 <--- This is what I'm > interested in > Iterations: 35 > Function evaluations: 71 > " > > How can I store it into a variable? Is it possible? > > Your email is dated April 22, so you might already have the answer by now, but in case not: Use the argument `full_output=True`. For example: In [97]: xopt, fopt, iter, funcalls, warnflag = fmin(func, 0, full_output=True) Optimization terminated successfully. Current function value: 3.000000 Iterations: 31 Function evaluations: 62 In [98]: xopt Out[98]: array([ 10.]) In [99]: fopt Out[99]: 3.0 Warren > Thanks, > Jeroen > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantturkey at gmail.com Fri Jun 14 10:36:34 2013 From: mutantturkey at gmail.com (Calvin Morrison) Date: Fri, 14 Jun 2013 10:36:34 -0400 Subject: [SciPy-User] Sparse Matricies and NNLS In-Reply-To: References: Message-ID: Ariel, Thank you! I will sure to take a look at it! Calvin On 15 April 2013 12:39, Ariel Rokem wrote: > Hey Calvin, > > > On Mon, Apr 1, 2013 at 6:07 AM, Calvin Morrison > wrote: >> >> Unforunately, >> >> Tsnnls might have been fast in 2001, trying it on a moderatley sized >> dataset is beyond slow >> >> Calvin >> >> On Apr 1, 2013 8:57 AM, "Jonathan Guyer" wrote: >>> >>> >>> On Mar 28, 2013, at 5:33 PM, Calvin Morrison wrote: >>> >>> > It seems nobody wants to touch the nnls algorithm because the only >>> > implementation that is floating around is the one from the original >>> > publication or automatic conversions of it. >>> >>> For whatever it's worth, the second google hit for "nnls sparse" is >>> >>> http://www.michaelpiatek.com/papers/tsnnls.pdf >>> >>> "tsnnls: A solver for large sparse least squares problems with >>> non-negative variables >>> >>> The solution of large, sparse constrained least-squares problems is a >>> staple in scientific and engineering applications. However, currently >>> available codes for such problems are proprietary or based on MATLAB. We >>> announce a freely available C implementation of the fast block pivoting >>> algorithm of Portugal, Judice, and Vicente. Our version is several times >>> faster than Matstoms? MATLAB implementation of the same algorithm. Further, >>> our code matches the accuracy of MATLAB?s built-in lsqnonneg function." >>> >>> All links to the code seem to be dead, but it's probably worth contacting >>> the authors. >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > I've found stochastic gradient descent to be very useful for this kind of > thing. > > Here's an implementation, adapted from a colleague's Matlab implementation: > > https://gist.github.com/arokem/5389417 > > HTH, > > Ariel > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From newville at cars.uchicago.edu Fri Jun 14 11:59:07 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Fri, 14 Jun 2013 10:59:07 -0500 Subject: [SciPy-User] Ignore characters while reading text In-Reply-To: References: <5789326.FQK5ToiTHf@horus> Message-ID: On Fri, Jun 14, 2013 at 7:46 AM, Da?id wrote: > On 14 June 2013 14:32, Matt Newville wrote: >> Would this do? >> >> import numpy as np >> from cStringIO import StringIO >> txt= '1 (2 3 4) (5 6 7) (8 9 10)' >> np.loadtxt(StringIO(txt.replace('(', '').replace(')', ''))) > > If I am not mistaken, then you are reading the data twice or thrice. > If this is big, and performance is critical, you may be better off > doing the loadtxt yourself. The core of np.loadtxt is essencially a > "from line in file: data.append(parse(line))", with some wrapping > intelligence, that probably is not needed in your case. > > https://github.com/numpy/numpy/blob/v1.7.0/numpy/lib/npyio.py#L610 ---->loadtxt > https://github.com/numpy/numpy/blob/v1.7.0/numpy/lib/npyio.py#L1573 > --->genfromtxt > > > David. What do you mean by "reading the data twice or thrice"? I would have said text data in this snippet is stored in a string, but never read from a disk. Once read from disk, the string.replace() method is fast, and StringIO makes a string look like a file-like structure, so I don't see how data is "read" multiple times. You are right that numpy.loadtxt is slightly slower than rolling your own. For 2Mb files with 20000 lines of 12 columns (all integers), the test code below gives: Array size = (20000, 12) Results are equivalent? True True True Time, numpy.loadtxt, parens not allowed: 13.5254 sec Time, numpy.loadtxt, parens allowed: 13.7965 sec Time, python list, parens not allowed: 10.1362 sec Time, python list, parens allowed: 10.7494 sec Allowing for parens with text.replace('(', '') etc is not significant -- certainly less time than pre-processing the files in any way. Using numpy.loadtxt is 30% slower than a direct read to a list, then conversion to an array, which might make a difference in some cases, but involves less code, and is more robust against unexpected input. import timeit import numpy as np from cStringIO import StringIO def f2arr_np0(fname): txt = open(fname, 'r').read() return np.loadtxt(StringIO(txt)) def f2arr_np1(fname): txt = open(fname, 'r').read() return np.loadtxt(StringIO(txt.replace('(', '').replace(')', ''))) def f2arr_py0(fname): fh = open(fname, 'r') tmp = [] for line in fh.readlines(): tmp.append([int(word) for word in line.split()]) return np.array(tmp) def f2arr_py1(fname): fh = open(fname, 'r') tmp = [] for line in fh.readlines(): tmp.append([int(word) for word in line.replace('(', '').replace(')', '').split()]) return np.array(tmp) # the file test0.dat has embedded parens, test1.dat does not p0 = f2arr_py0('test0.dat') p1 = f2arr_py1('test1.dat') n0 = f2arr_np0('test0.dat') n1 = f2arr_np1('test1.dat') print 'Array size = ', n1.shape print 'Results are equivalent? ', np.all(p0 == p1), np.all(p0 == n0), np.all(p0 == n1) rnp0 = timeit.timeit("f2arr_np0('test0.dat')", setup='from __main__ import f2arr_np0', number=25) rnp1 = timeit.timeit("f2arr_np1('test1.dat')", setup='from __main__ import f2arr_np1', number=25) rpy0 = timeit.timeit("f2arr_py0('test0.dat')", setup='from __main__ import f2arr_py0', number=25) rpy1 = timeit.timeit("f2arr_py1('test1.dat')", setup='from __main__ import f2arr_py1', number=25) print 'Time, numpy.loadtxt, parens not allowed: %.4f sec' % rnp0 print 'Time, numpy.loadtxt, parens allowed: %.4f sec' % rnp1 print 'Time, python list, parens not allowed: %.4f sec' % rpy0 print 'Time, python list, parens allowed: %.4f sec' % rpy1 --Matt From pav at iki.fi Fri Jun 14 17:45:15 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 14 Jun 2013 21:45:15 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Gradient_of_spline_interpolation_at_a_poin?= =?utf-8?q?t_wrt=09changing_the_knot_points?= References: Message-ID: John Salvatier gmail.com> writes: > > I'm using scipy.interpolate.InterpolatedUnivariateSpline in a statistical application.? > y = InterpolatedUnivariateSpline(x0,y0)(x) > > I need the derivatives of the spline evaluated at > a given point with respect to changing the knot values (y0). > ?That is, I need the derivative of y wrt to y0. Is there > a way I could compute this? You can work it out from the B-spline representation: https://github.com/pv/scipy-work/blob/spline- unify/scipy/interpolate/_bspline.py#L29 Not that there is ready-made code for it, but it should be possible to write it. -- Pauli Virtanen From pav at iki.fi Fri Jun 14 17:47:05 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 14 Jun 2013 21:47:05 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Gradient_of_spline_interpolation_at_a_poin?= =?utf-8?q?t_wrt=09changing_the_knot_points?= References: Message-ID: Pauli Virtanen iki.fi> writes: > John Salvatier gmail.com> writes: > > > > I'm using scipy.interpolate.InterpolatedUnivariateSpline in a statistical > application.? > > y = InterpolatedUnivariateSpline(x0,y0)(x) > > > > I need the derivatives of the spline evaluated at > > a given point with respect to changing the knot values (y0). > > ?That is, I need the derivative of y wrt to y0. Is there > > a way I could compute this? > > You can work it out from the B-spline representation: > > https://github.com/pv/scipy-work/blob/spline- > unify/scipy/interpolate/_bspline.py#L29 > > Not that there is ready-made code for it, but it should be > possible to write it. Sorry, wrote it too fast. Computing the spline coefficients is IIRC is a global process, and it is affected by knot locations, so I doubt there is a simple formula for the derivative. -- Pauli Virtanen From davidmenhur at gmail.com Fri Jun 14 19:03:09 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sat, 15 Jun 2013 01:03:09 +0200 Subject: [SciPy-User] Gradient of spline interpolation at a point wrt changing the knot points In-Reply-To: References: Message-ID: On 14 June 2013 23:47, Pauli Virtanen wrote: > Sorry, wrote it too fast. Computing the spline coefficients > is IIRC is a global process, and it is affected by knot locations, > so I doubt there is a simple formula for the derivative. Actually, it only depends on the surrounding points, and with linear properties. See, for example, its matrix form [1]: a multi-diagonal fixed matrix A multiplying a vector of coefficients b. A b = y So, one could solve the vector of coeficients: b = A^-1 y Now, the definition of derivative with respect to y_i (the value of y at point i): d b/dy_i = limit h->0 ((A^-1 (y + h)-A^-1 y)/h) = limit h->0 A^-1 h /h where h = [0, 0 .... h ....0] in the i-th position. And this is the sum of the i-th row of the matrix. In the case of cubic interpolation, that would be 0.507 for any point not in the borders or next to one. David. [1] Formula 9: http://www.mechanicaldust.com/UCB/math128a/cubic.pdf From davidmenhur at gmail.com Fri Jun 14 19:15:28 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sat, 15 Jun 2013 01:15:28 +0200 Subject: [SciPy-User] Ignore characters while reading text In-Reply-To: References: <5789326.FQK5ToiTHf@horus> Message-ID: On 14 June 2013 17:59, Matt Newville wrote: > What do you mean by "reading the data twice or thrice"? I would have > said text data in this snippet is stored in a string, but never read > from a disk. Once read from disk, the string.replace() method is > fast, and StringIO makes a string look like a file-like structure, so > I don't see how data is "read" multiple times. Bad choice of words on my part, sorry. You are right you read from disk only once, but you loop on it twice replacing (although this loop is implemented in C, so probably quite fast). Actually, at this point, it would be time to do some measurements. I have created a random string of numbers, including some parenthesis, of one million elements. One pass of replace takes ~1.2 ms, where the two replaces together are 3 ms. So you were right, this is most probably fast enough, and quite close to the most you can get. From josef.pktd at gmail.com Sat Jun 15 08:26:59 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 Jun 2013 08:26:59 -0400 Subject: [SciPy-User] Welch's Anova (unequal variances) In-Reply-To: References: Message-ID: On Wed, Jun 12, 2013 at 11:07 PM, wrote: > Question out of curiosity > > scipy stats has f_oneway which does the standard one-way ANOVA that > assumes equal variances across groups. > Similar to Welch's t-test, Welch's ANOVA allows for different > variances across groups. > > I don't find anything with a Google search for "Welch's Anova in python". > > Does everyone who uses ANOVA have data with equal variances across > groups, or is there something that google didn't find, or is everyone > using resampling methods? I misspelled stats.f_oneway and got stats.oneway instead. (I never seen that one.) it has an 'equal_var' option. However, either all numbers are wrong, or I don't understand what it's doing. Josef > > Josef From pav at iki.fi Sat Jun 15 09:23:06 2013 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 15 Jun 2013 16:23:06 +0300 Subject: [SciPy-User] Gradient of spline interpolation at a point wrt changing the knot points In-Reply-To: References: Message-ID: 15.06.2013 02:03, Da?id kirjoitti: > On 14 June 2013 23:47, Pauli Virtanen wrote: >> Sorry, wrote it too fast. Computing the spline coefficients >> is IIRC is a global process, and it is affected by knot locations, >> so I doubt there is a simple formula for the derivative. > > Actually, it only depends on the surrounding points, and with linear > properties. See, for example, its matrix form [1]: a multi-diagonal > fixed matrix A multiplying a vector of coefficients b. > > A b = y The matrix A depends here on the knot locations, so taking derivative with respect to them is not so straightforward? Ah, but I now read the original mail and it says the derivatives should be taken with respect to knot values. This is not so much of a problem... -- Pauli Virtanen From tmp50 at ukr.net Sat Jun 15 11:02:01 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 15 Jun 2013 18:02:01 +0300 Subject: [SciPy-User] new OpenOpt Suite release 0.50 Message-ID: <91637.1371308521.15870062949008277504@ffe6.ukr.net> Hi all, > I'm glad to inform you about new OpenOpt Suite release 0.50 (2013-June-15): > > * interalg (solver with specifiable accuracy) now works many times (sometimes orders) faster on (possibly multidimensional) integration problems (IP) and on some optimization problems * Add modeling dense (MI)(QC)QP in FuncDesigner (alpha-version, rendering may work slowly yet) * Bugfix for cplex wrapper * Some improvements for FuncDesigner interval analysis (and thus interalg) * Add FuncDesigner interval analysis for tan in range(-pi/2,pi/2) * Some other bugfixes and improvements * (Proprietary) FuncDesigner stochastic addon now is available as standalone pyc-file, became available for Python3 as well > > Regards, Dmitrey. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Sun Jun 16 03:24:28 2013 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Sun, 16 Jun 2013 09:24:28 +0200 Subject: [SciPy-User] Covariance matrix from curve_fit Message-ID: Hi everyone, I have a question regarding the output from the scipy.optimize.curve_fit function - in the following example: """ In [1]: import numpy as np In [2]: from scipy.optimize import curve_fit In [3]: f = lambda x, a, b: a * x + b In [4]: x = np.array([0., 1., 2.]) In [5]: y = np.array([1.2, 4.6, 7.8]) In [6]: e = np.array([1., 1., 1.]) In [7]: curve_fit(f, x, y, sigma=e) Out[7]: (array([ 3.3 , 1.23333333]), array([[ 0.00333333, -0.00333333], [-0.00333333, 0.00555556]])) In [8]: curve_fit(f, x, y, sigma=e * 100) Out[8]: (array([ 3.3 , 1.23333333]), array([[ 0.00333333, -0.00333333], [-0.00333333, 0.00555556]])) """ it's clear that the covariance matrix does not take into account the uncertainties on the data points. If I do: """ popt, pcov = curve_fit(...) """ Then pcov[0,0]**0.5 is therefore not the uncertainty on the parameter, so I was wondering how this should be scaled to give the actual uncertainty on the parameter? Thanks! Tom From ralf.gommers at gmail.com Sun Jun 16 05:15:16 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 16 Jun 2013 11:15:16 +0200 Subject: [SciPy-User] problem with scipy.io.wavfile (urgent) In-Reply-To: References: Message-ID: On Mon, May 13, 2013 at 1:00 PM, rohan wadnerkar wrote: Hello all, > I am trying to read .wav file using scipy.io.wavfile.read(). It reads some > file properly. For some files its giving following error... > Hi, can you open an issue https://github.com/scipy/scipy/issues and include a link to a (preferably small) wav file that's failing for, plus the exact code to reproduce your issue? Without those details it's not possible to figure out what's going wrong here. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Sun Jun 16 06:57:17 2013 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Sun, 16 Jun 2013 06:57:17 -0400 Subject: [SciPy-User] Covariance matrix from curve_fit In-Reply-To: References: Message-ID: On Sun, Jun 16, 2013 at 3:24 AM, Thomas Robitaille < thomas.robitaille at gmail.com> wrote: > Hi everyone, > > I have a question regarding the output from the > scipy.optimize.curve_fit function - in the following example: > > """ > In [1]: import numpy as np > > In [2]: from scipy.optimize import curve_fit > > In [3]: f = lambda x, a, b: a * x + b > > In [4]: x = np.array([0., 1., 2.]) > > In [5]: y = np.array([1.2, 4.6, 7.8]) > > In [6]: e = np.array([1., 1., 1.]) > > In [7]: curve_fit(f, x, y, sigma=e) > Out[7]: > (array([ 3.3 , 1.23333333]), > array([[ 0.00333333, -0.00333333], > [-0.00333333, 0.00555556]])) > > In [8]: curve_fit(f, x, y, sigma=e * 100) > Out[8]: > (array([ 3.3 , 1.23333333]), > array([[ 0.00333333, -0.00333333], > [-0.00333333, 0.00555556]])) > """ > > it's clear that the covariance matrix does not take into account the > uncertainties on the data points. If I do: > > """ > popt, pcov = curve_fit(...) > """ > > Then pcov[0,0]**0.5 is therefore not the uncertainty on the parameter, > so I was wondering how this should be scaled to give the actual > uncertainty on the parameter? > There was a long discussion by email and then github on this: http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html https://github.com/scipy/scipy/pull/448 The open pull request has the code to do the scaling you want. - Tom > > Thanks! > Tom > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Sun Jun 16 15:33:59 2013 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Sun, 16 Jun 2013 21:33:59 +0200 Subject: [SciPy-User] Covariance matrix from curve_fit In-Reply-To: References: Message-ID: Hi Tom, On 16 June 2013 12:57, Aldcroft, Thomas wrote: > > > > On Sun, Jun 16, 2013 at 3:24 AM, Thomas Robitaille > wrote: >> >> Hi everyone, >> >> I have a question regarding the output from the >> scipy.optimize.curve_fit function - in the following example: >> >> """ >> In [1]: import numpy as np >> >> In [2]: from scipy.optimize import curve_fit >> >> In [3]: f = lambda x, a, b: a * x + b >> >> In [4]: x = np.array([0., 1., 2.]) >> >> In [5]: y = np.array([1.2, 4.6, 7.8]) >> >> In [6]: e = np.array([1., 1., 1.]) >> >> In [7]: curve_fit(f, x, y, sigma=e) >> Out[7]: >> (array([ 3.3 , 1.23333333]), >> array([[ 0.00333333, -0.00333333], >> [-0.00333333, 0.00555556]])) >> >> In [8]: curve_fit(f, x, y, sigma=e * 100) >> Out[8]: >> (array([ 3.3 , 1.23333333]), >> array([[ 0.00333333, -0.00333333], >> [-0.00333333, 0.00555556]])) >> """ >> >> it's clear that the covariance matrix does not take into account the >> uncertainties on the data points. If I do: >> >> """ >> popt, pcov = curve_fit(...) >> """ >> >> Then pcov[0,0]**0.5 is therefore not the uncertainty on the parameter, >> so I was wondering how this should be scaled to give the actual >> uncertainty on the parameter? > > > There was a long discussion by email and then github on this: > > http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html > https://github.com/scipy/scipy/pull/448 Thanks for pointing me to this discussion and pull request - I think this pull request should be finalized, and most importantly, the documentation of curve_fit improved - at the moment, the name ``sigma`` implies that the uncertainties are 1-sigma normal deviations, which to me (and a number of other Python users I know) implies that the covariance matrix takes this into account in the parameter uncertainties. I understand that the new (lack of) scaling will have to be optional for backward-compatibility reasons, but it's unfortunate given the connotations a variable like ``sigma`` has... Cheers, Tom > > The open pull request has the code to do the scaling you want. > > - Tom > >> >> >> Thanks! >> Tom >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mresimulator at yahoo.com.ar Sun Jun 16 16:51:04 2013 From: mresimulator at yahoo.com.ar (MRE Simulator) Date: Sun, 16 Jun 2013 13:51:04 -0700 (PDT) Subject: [SciPy-User] Build a new continuous probability density function Message-ID: <1371415864.77934.YahooMailNeo@web161505.mail.bf1.yahoo.com> Hi experts! Im a newby user of Python, sage and Scipy. Using scipy.stats module, i'm trying to build a new probability density function, f(x) (not include in scipy module). I wanna call this function in similar way that other probability density function in scipy, i.e.: from scipy.stats import new_function. And do some math with it: new_function.mean(loc=....., scale= -----), etc. ?What must i do (step by step, including definition of scale and loc)? Waiting for your answers. Thanks a lot! -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 16 17:04:41 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Jun 2013 17:04:41 -0400 Subject: [SciPy-User] Build a new continuous probability density function In-Reply-To: <1371415864.77934.YahooMailNeo@web161505.mail.bf1.yahoo.com> References: <1371415864.77934.YahooMailNeo@web161505.mail.bf1.yahoo.com> Message-ID: On Sun, Jun 16, 2013 at 4:51 PM, MRE Simulator wrote: > Hi experts! > Im a newby user of Python, sage and Scipy. > Using scipy.stats module, i'm trying to build a new probability density > function, f(x) (not include in scipy module). I wanna call this function in > similar way that other probability density function in scipy, i.e.: > from scipy.stats import new_function. > And do some math with it: > new_function.mean(loc=....., scale= -----), etc. > ?What must i do (step by step, including definition of scale and loc)? > Waiting for your answers. a bit of explanation https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L944 you can look at any distribution as example, like https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L3581 most things are handled generically, the more ._xxx methods you can specify the better is the performance because you don't have to rely on the generic, often slow implementation. loc and scale are handled completely generically and cannot be overwritten by the ._xxx methods. The only tricky part might be if you have bound support or need to override shape parameter restrictions. If you have specific questions, I can answer those. Best if you describe more details of the distribution or show pdf and cdf and parameter restrictions. Josef > Thanks a lot! > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From mailinglists at xgm.de Mon Jun 17 16:46:04 2013 From: mailinglists at xgm.de (Florian Lindner) Date: Mon, 17 Jun 2013 22:46:04 +0200 Subject: [SciPy-User] Ignore characters while reading text In-Reply-To: References: <5789326.FQK5ToiTHf@horus> Message-ID: <1754187.Hls0pJYs3D@horus> Am Freitag, 14. Juni 2013, 07:32:09 schrieb Matt Newville: > On Fri, Jun 14, 2013 at 5:50 AM, Florian Lindner wrote: > > Hello, > > > > I have a text file with data like > > > > 1 (2 3 4) (5 6 7) (8 9 10) > > 2 (4 5 1) (3 6 8) (1 6 45) > > > > How can I read that file into an array? > > > > [ > > [1, 2, 3, 5, 6, 7, 8, 9, 10] > > [2, 4, 1, 3, ... ] > > ] > > > > I tried genfromtxt with deletechars="()" but that seems to affect only > > 'names'. I also tried delimiter="() " but that didn't work either. > > Would this do? > > import numpy as np > from cStringIO import StringIO > txt= '1 (2 3 4) (5 6 7) (8 9 10)' > np.loadtxt(StringIO(txt.replace('(', '').replace(')', ''))) Thanks! Speed is not an issue here, it's a just a (8000, 20) array. From josef.pktd at gmail.com Mon Jun 17 23:25:15 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Jun 2013 23:25:15 -0400 Subject: [SciPy-User] quasi random, Halton sequence Message-ID: I didn't find any quasi-random sequences in python that is BSD compatible. The question shows up every few years. Is there anything now? a quick translation from c to python, (to be translated to cython and to c (going in a circle)) maybe there is something slightly off (e.g. gap in circle) Josef ----------------- # -*- coding: utf-8 -*- """ Created on Mon Jun 17 22:12:21 2013 Author: Sebastien Paris Josef Perktold translation from c http://www.mathworks.com/matlabcentral/fileexchange/17457-quasi-montecarlo-halton-sequence-generator """ #void halton(int dim , int nbpts, double *h , double *p ) #{ # # double lognbpts , d , sum; # # int i , j , n , t , b; # # static int P[11] = {2 ,3 ,5 , 7 , 11 , 13 , 17 , 19 , 23 , 29 , 31}; # # # lognbpts = log(nbpts + 1); # # # for(i = 0 ; i < dim ; i++) # # { # # b = P[i]; # # n = (int) ceil(lognbpts/log(b)); # # # for(t = 0 ; t < n ; t++) # # { # p[t] = pow(b , -(t + 1) ); # } # # # for (j = 0 ; j < nbpts ; j++) # # { # # d = j + 1; # # sum = fmod(d , b)*p[0]; # # # for (t = 1 ; t < n ; t++) # # { # # d = floor(d/b); # # sum += fmod(d , b)*p[t]; # # } # # # h[j*dim + i] = sum; # # } # # } # #} from math import log, floor, ceil, fmod import numpy as np def halton(dim, nbpts): h = np.empty(nbpts * dim) h.fill(np.nan) p = np.empty(nbpts) p.fill(np.nan) P = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31] lognbpts = log(nbpts + 1) for i in range(dim): b = P[i] n = int(ceil(lognbpts / log(b))) for t in range(n): p[t] = pow(b, -(t + 1) ) for j in range(nbpts): d = j + 1 sum_ = fmod(d, b) * p[0] for t in range(1, n): d = floor(d / b) sum_ += fmod(d, b) * p[t] h[j*dim + i] = sum_ return h.reshape(nbpts, dim) x = halton(2, 5000); #plot(x(1 , :) , x(2 , :) , '+') print x[:5] import matplotlib.pyplot as plt plt.figure() plt.plot(x[:500, 0], x[:500, 1], '+') plt.title('uniform-distribution (500)') plt.figure() plt.plot(x[:, 0], x[:, 1], '+') plt.title('uniform-distribution') from scipy import stats plt.figure() xn = stats.norm._ppf(x) plt.plot(xn[:, 0], xn[:, 1], '+') plt.title('normal-distribution') plt.figure() plt.plot(stats.t._ppf(x[:, 0], 3), stats.t._ppf(x[:, 1], 3), '+') plt.title('t-distribution') plt.figure() x0 = xn[:100] x0 /= np.sqrt((x0*x0 + 1e-100).sum(1))[:,None] plt.plot(x0[:, 0] , x0[:, 1], '+') plt.xlim(-1.1, 1.1) plt.ylim(-1.1, 1.1) plt.title('uniform on circle') plt.show() ----------------- From elmar at net4werling.de Thu Jun 20 15:23:08 2013 From: elmar at net4werling.de (elmar werling) Date: Thu, 20 Jun 2013 21:23:08 +0200 Subject: [SciPy-User] pandas: strange result using df.date.tolist Message-ID: Hello, I get rather strage results using pandas "tolist" with date time values. As an examples 2013-01-15 13:56:44 is converted to 1970-01-16 141:56:44. The following script: -------------------------------------------------------------- import pandas as pd from datetime import datetime file_name = 'test_file.xlsx' reader = pd.ExcelFile(file_name) sheets = reader.sheet_names df = reader.parse(sheets[0], header=0,parse_cols='A,B') date = [] for i in range(len(df)): yr = df.date[i].year mo = df.date[i].month dy = df.date[i].day hr = df.time[i].hour mi = df.time[i].minute sc = df.time[i].second _date = datetime(yr, mo, dy, hr, mi, sc) date.append(_date) df['date2'] = date print 'date from list' print date print print 'date from pd.DataFrame' print df['date2'] print print 'date from df.date.values' print df.date2.values print print 'pandas version: ', pd.__version__ -------------------------------------------------------------- gives -------------------------------------------------------------- Python 2.7.3 (default, Jan 2 2013, 13:56:14) [GCC 4.7.2] on linux2 Type "copyright", "credits" or "license()" for more information. >>> date from list [datetime.datetime(2013, 1, 15, 13, 56, 44), datetime.datetime(2013, 1, 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13), datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1, 28, 9, 21, 12)] date from pd.DataFrame 0 2013-01-15 13:56:44 1 2013-01-18 08:17:13 2 2013-01-18 09:17:13 3 2013-01-23 11:12:02 4 2013-01-28 09:21:12 Name: date2, dtype: datetime64[ns] date from df.date.values [1970-01-16 141:56:44 1970-01-16 208:17:13 1970-01-16 209:17:13 1970-01-16 75:12:02 1970-01-16 193:21:12] pandas version: 0.11.0 -------------------------------------------------------------- Any help is wellcome Elmar -------------- next part -------------- A non-text attachment was scrubbed... Name: test_file.xlsx Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet Size: 9513 bytes Desc: not available URL: From jreback at yahoo.com Thu Jun 20 15:35:11 2013 From: jreback at yahoo.com (Jeff Reback) Date: Thu, 20 Jun 2013 12:35:11 -0700 (PDT) Subject: [SciPy-User] pandas: strange result using df.date.tolist In-Reply-To: References: Message-ID: <1371756911.23463.YahooMailNeo@web142701.mail.bf1.yahoo.com> this is a numpy < 1.7.0 issue, see here (a little down): http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#potential-porting-issues-for-pandas-0-7-3-users values are usable, just a printing issue ? In [2]: x = [datetime.datetime(2013, 1, 15, 13, 56, 44), datetime.datetime(2013, 1,? 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13),? datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1,? 28, 9, 21, 12)] In [4]: Series(x).values Out[4]:? array(['2013-01-15T08:56:44.000000000-0500', ? ? ? ?'2013-01-18T03:17:13.000000000-0500', ? ? ? ?'2013-01-18T04:17:13.000000000-0500', ? ? ? ?'2013-01-23T06:12:02.000000000-0500', ? ? ? ?'2013-01-28T04:21:12.000000000-0500'], dtype='datetime64[ns]') In [5]: np.__version__ Out[5]: '1.7.1' ------------------------------- In [3]: np.__version__ Out[3]: '1.6.1' In [5]: x = [datetime.datetime(2013, 1, 15, 13, 56, 44), datetime.datetime(2013, 1,? ? ?...: 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13),? ? ?...: datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1,? ? ?...: 28, 9, 21, 12)] In [7]: pd.Series(x).values Out[7]:? array([1970-01-16 141:56:44, 1970-01-16 208:17:13, 1970-01-16 209:17:13, ? ? ? ?1970-01-16 75:12:02, 1970-01-16 193:21:12], dtype=datetime64[ns]) ________________________________ From: elmar werling To: scipy-user at scipy.org Sent: Thursday, June 20, 2013 3:23 PM Subject: [SciPy-User] pandas: strange result using df.date.tolist Hello, I get rather strage results using pandas "tolist" with date time values. As an examples 2013-01-15 13:56:44 is converted to 1970-01-16 141:56:44. The following script: -------------------------------------------------------------- import pandas as pd from datetime import datetime file_name = 'test_file.xlsx' reader = pd.ExcelFile(file_name) sheets = reader.sheet_names df = reader.parse(sheets[0], header=0,parse_cols='A,B') date = [] for i in range(len(df)): ? ? yr = df.date[i].year ? ? mo = df.date[i].month ? ? dy = df.date[i].day ? ? hr = df.time[i].hour ? ? mi = df.time[i].minute ? ? sc = df.time[i].second ? ? _date = datetime(yr, mo, dy, hr, mi, sc) ? ? date.append(_date) df['date2'] = date print 'date from list' print date print print 'date from pd.DataFrame' print df['date2'] print print 'date from df.date.values' print df.date2.values print print 'pandas version: ', pd.__version__ -------------------------------------------------------------- gives -------------------------------------------------------------- Python 2.7.3 (default, Jan? 2 2013, 13:56:14) [GCC 4.7.2] on linux2 Type "copyright", "credits" or "license()" for more information. >>> date from list [datetime.datetime(2013, 1, 15, 13, 56, 44), datetime.datetime(2013, 1, 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13), datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1, 28, 9, 21, 12)] date from pd.DataFrame 0? 2013-01-15 13:56:44 1? 2013-01-18 08:17:13 2? 2013-01-18 09:17:13 3? 2013-01-23 11:12:02 4? 2013-01-28 09:21:12 Name: date2, dtype: datetime64[ns] date from df.date.values [1970-01-16 141:56:44 1970-01-16 208:17:13 1970-01-16 209:17:13 ? 1970-01-16 75:12:02 1970-01-16 193:21:12] pandas version:? 0.11.0 -------------------------------------------------------------- Any help is wellcome Elmar _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From elmar at net4werling.de Thu Jun 20 15:59:11 2013 From: elmar at net4werling.de (elmar werling) Date: Thu, 20 Jun 2013 21:59:11 +0200 Subject: [SciPy-User] pandas: strange result using df.date.tolist In-Reply-To: <1371756911.23463.YahooMailNeo@web142701.mail.bf1.yahoo.com> References: <1371756911.23463.YahooMailNeo@web142701.mail.bf1.yahoo.com> Message-ID: thank you, problem solved with numpy 1,7,1 Elmar Am 20.06.2013 21:35, schrieb Jeff Reback: > this is a numpy < 1.7.0 issue, see here (a little down): > > http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#potential-porting-issues-for-pandas-0-7-3-users > > values are usable, just a printing issue > In [2]: x = [datetime.datetime(2013, 1, 15, 13, 56, 44), > datetime.datetime(2013, 1, > 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13), > datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1, > 28, 9, 21, 12)] > > In [4]: Series(x).values > Out[4]: > array(['2013-01-15T08:56:44.000000000-0500', > '2013-01-18T03:17:13.000000000-0500', > '2013-01-18T04:17:13.000000000-0500', > '2013-01-23T06:12:02.000000000-0500', > '2013-01-28T04:21:12.000000000-0500'], dtype='datetime64[ns]') > > In [5]: np.__version__ > Out[5]: '1.7.1' > > ------------------------------- > > In [3]: np.__version__ > Out[3]: '1.6.1' > > In [5]: x = [datetime.datetime(2013, 1, 15, 13, 56, 44), > datetime.datetime(2013, 1, > ...: 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13), > ...: datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1, > ...: 28, 9, 21, 12)] > > In [7]: pd.Series(x).values > Out[7]: > array([1970-01-16 141:56:44, 1970-01-16 208:17:13, 1970-01-16 209:17:13, > 1970-01-16 75:12:02, 1970-01-16 193:21:12], dtype=datetime64[ns]) > > ------------------------------------------------------------------------ > *From:* elmar werling > *To:* scipy-user at scipy.org > *Sent:* Thursday, June 20, 2013 3:23 PM > *Subject:* [SciPy-User] pandas: strange result using df.date.tolist > > Hello, > > I get rather strage results using pandas "tolist" with date time values. > As an examples 2013-01-15 13:56:44 is converted to 1970-01-16 141:56:44. > > The following script: > -------------------------------------------------------------- > import pandas as pd > from datetime import datetime > > file_name = 'test_file.xlsx' > reader = pd.ExcelFile(file_name) > sheets = reader.sheet_names > df = reader.parse(sheets[0], header=0,parse_cols='A,B') > > date = [] > for i in range(len(df)): > yr = df.date[i].year > mo = df.date[i].month > dy = df.date[i].day > hr = df.time[i].hour > mi = df.time[i].minute > sc = df.time[i].second > _date = datetime(yr, mo, dy, hr, mi, sc) > date.append(_date) > > df['date2'] = date > > print 'date from list' > print date > print > print 'date from pd.DataFrame' > print df['date2'] > print > print 'date from df.date.values' > print df.date2.values > print > print 'pandas version: ', pd.__version__ > > -------------------------------------------------------------- > gives > > -------------------------------------------------------------- > Python 2.7.3 (default, Jan 2 2013, 13:56:14) > [GCC 4.7.2] on linux2 > Type "copyright", "credits" or "license()" for more information. > > >>> > > date from list > [datetime.datetime(2013, 1, 15, 13, 56, 44), datetime.datetime(2013, 1, > 18, 8, 17, 13), datetime.datetime(2013, 1, 18, 9, 17, 13), > datetime.datetime(2013, 1, 23, 11, 12, 2), datetime.datetime(2013, 1, > 28, 9, 21, 12)] > > date from pd.DataFrame > 0 2013-01-15 13:56:44 > 1 2013-01-18 08:17:13 > 2 2013-01-18 09:17:13 > 3 2013-01-23 11:12:02 > 4 2013-01-28 09:21:12 > Name: date2, dtype: datetime64[ns] > > date from df.date.values > [1970-01-16 141:56:44 1970-01-16 208:17:13 1970-01-16 209:17:13 > 1970-01-16 75:12:02 1970-01-16 193:21:12] > > pandas version: 0.11.0 > > -------------------------------------------------------------- > > Any help is wellcome > Elmar > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From franz_lambert_engel at yahoo.de Sat Jun 22 10:14:12 2013 From: franz_lambert_engel at yahoo.de (Franz Engel) Date: Sat, 22 Jun 2013 16:14:12 +0200 Subject: [SciPy-User] Spline Interpolation with non continuous data Message-ID: <001601ce6f52$c5aaa760$50fff620$@de> Hi, I try to interpolated a spline throw a dataset (the record of a robot motion path). Usually it works really good, always when the robot drives without stops. But if the robot stops he moves a little bit backwards. If this happens I can't use my "normal" method with the interpolate.UnivariateSpline-function( http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.Univar iateSpline.html), because the robot motion is not longer continuously. Does somebody has an idea which function could solve my problem? Or is there a good filter to reduce the redundant robot path. (the backwards path is only a little bit displaced relative to path without stops) Regards, Franz -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Tue Jun 25 06:27:35 2013 From: tmp50 at ukr.net (Dmitrey) Date: Tue, 25 Jun 2013 13:27:35 +0300 Subject: [SciPy-User] [ANN] Using some MATLAB optimization solvers from Python (OpenOpt/FuncDesigner) Message-ID: <99974.1372156055.8305133911023747072@ffe16.ukr.net> Hi all, > > FYI some MATLAB solvers now can be involved with OpenOpt or FuncDesigner : > * LP linprog * QP quadprog * LLSP lsqlin * MILP bintprog > > Sparsity handling is supported. > > You should have * MATLAB (or MATLAB Component Runtime) * mlabwrap > Unfortunately, it will hardly work out-of-the-box, you have to adjust some paths and some environment variables. > > As for nonlinear solvers, e.g. fmincon, probably they could be connected via involving C MEX files, but it is not possible with current state of mlabwrap yet. > > Read MATLAB entry for details. > > Regards, D. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joonhyoung.ro at gmail.com Tue Jun 25 17:10:12 2013 From: joonhyoung.ro at gmail.com (Joon Ro) Date: Tue, 25 Jun 2013 16:10:12 -0500 Subject: [SciPy-User] Spline Interpolation with non continuous data In-Reply-To: <001601ce6f52$c5aaa760$50fff620$@de> References: <001601ce6f52$c5aaa760$50fff620$@de> Message-ID: On Sat, Jun 22, 2013 at 9:14 AM, Franz Engel wrote: > > > I try to interpolated a spline throw a dataset (the record of a robot > motion path). Usually it works really good, always when the robot drives > without stops. But if the robot stops he moves a little bit backwards. If > this happens I can?t use my ?normal? method with the > interpolate.UnivariateSpline-function( > http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html), > because the robot motion is not longer continuously. > Hi, I am not familiar how your function looks like, but it sounds like you should look for shape-preserving interpolation like monotone cubic Hermite interpolation. Best, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From xabart at gmail.com Tue Jun 25 19:18:10 2013 From: xabart at gmail.com (Xavier Barthelemy) Date: Wed, 26 Jun 2013 09:18:10 +1000 Subject: [SciPy-User] Spline Interpolation with non continuous data In-Reply-To: References: <001601ce6f52$c5aaa760$50fff620$@de> Message-ID: Yes, or use tension spline, with the W parameter. It will stop (well, not stop, but limit) the natural spline to oscillate and have the gibbs phenomena. I found that some time ago on stackoverflow. when I want matlab-like medium tension I use W=sqrt(weights) when I want high tension, I use W=weights. #In short, to match matlab's error calculation, you need to pass "w" to splrep or UnivariateSpline, where w = np.sqrt(trapz_weights(x)) # def trapz_weights(x): dx = np.diff(x) w = np.empty(x.shape) w[1:-1] = (dx[1:] + dx[:-1])/2. w[0] = dx[0] / 2. w[-1] = dx[-1] / 2. return w Xavier 2013/6/26 Joon Ro > On Sat, Jun 22, 2013 at 9:14 AM, Franz Engel > wrote: > >> >> >> I try to interpolated a spline throw a dataset (the record of a robot >> motion path). Usually it works really good, always when the robot drives >> without stops. But if the robot stops he moves a little bit backwards. If >> this happens I can?t use my ?normal? method with the >> interpolate.UnivariateSpline-function( >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html), >> because the robot motion is not longer continuously. >> > Hi, > > I am not familiar how your function looks like, but it sounds like you > should look for shape-preserving interpolation like monotone cubic Hermite > interpolation. > > Best, > Joon > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- ? Quand le gouvernement viole les droits du peuple, l'insurrection est, pour le peuple et pour chaque portion du peuple, le plus sacr? des droits et le plus indispensable des devoirs ? D?claration des droits de l'homme et du citoyen, article 35, 1793 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dkajah at gmail.com Tue Jun 25 23:19:46 2013 From: dkajah at gmail.com (Daniel Penalva) Date: Wed, 26 Jun 2013 00:19:46 -0300 Subject: [SciPy-User] Passing keyword argment to functions in odeint Message-ID: Is it possible to pass keyword arguments to functions like func(x, t, K = False), while using odeint to integrate ? i have tried usual way odeint(func, x, t, (K) ), but it did not work. I know that it will work if i pass all the parameters in the order, but i dont wanna to do that weird thing cause in my function i have plenty of keyword params to use... thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juanlu001 at gmail.com Wed Jun 26 06:47:22 2013 From: juanlu001 at gmail.com (Juan Luis Cano) Date: Wed, 26 Jun 2013 12:47:22 +0200 Subject: [SciPy-User] Passing keyword argment to functions in odeint In-Reply-To: References: Message-ID: <51CAC6BA.9020003@gmail.com> On 06/26/2013 05:19 AM, Daniel Penalva wrote: > Is it possible to pass keyword arguments to functions like > > func(x, t, K = False), > > while using odeint to integrate ? > > i have tried usual way odeint(func, x, t, (K) ), but it did not work. Probably because you have to pass them as a tuple (K,) Juanlu From parrenin.ujf at gmail.com Wed Jun 26 12:26:10 2013 From: parrenin.ujf at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Parrenin?=) Date: Wed, 26 Jun 2013 18:26:10 +0200 Subject: [SciPy-User] leastsq Message-ID: Dear all, I am experimenting the optimize module of scipy. My optimization problem is a leastsq problem. However, the leastsq function seems to be not appropriate for two reasons: - there is no possibility to specify a covariance matrix between the leastsq terms. They are supposed to be independent, which is a too strong assumption in my case. - the analyzed covariance matrix (i.e. the inverse of the jacobian of the cost function) cannot be simply outputed. Of course I could use a more generic optimization function, like the minimize one. However this seems sub-optimal because the minimisation of a least squares problem can dealt more efficiently (the jacobian of the cost function can be approximated using the jacobian of the terms to minimize). Can anybody help me? Are there plans to improve the leastsq function? Best regards, Fr?d?ric Parrenin -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jun 26 12:47:07 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 Jun 2013 12:47:07 -0400 Subject: [SciPy-User] leastsq In-Reply-To: References: Message-ID: On Wed, Jun 26, 2013 at 12:26 PM, Fr?d?ric Parrenin wrote: > Dear all, > > I am experimenting the optimize module of scipy. > My optimization problem is a leastsq problem. > However, the leastsq function seems to be not appropriate for two reasons: > - there is no possibility to specify a covariance matrix between the leastsq > terms. They are supposed to be independent, which is a too strong assumption > in my case. > - the analyzed covariance matrix (i.e. the inverse of the jacobian of the > cost function) cannot be simply outputed. > > Of course I could use a more generic optimization function, like the > minimize one. > However this seems sub-optimal because the minimisation of a least squares > problem can dealt more efficiently (the jacobian of the cost function can be > approximated using the jacobian of the terms to minimize). > > Can anybody help me? > Are there plans to improve the leastsq function? leastsq is a low level function and I think we should not load it up with any options. for weighted least-squares the more highlevel interface with additional results is optimize.curve_fit. However it doesn't allow for a full covariance matrix for the errors. If you want to use leastsq with a full covariance matrix, then you could transform both sides yourself, similar to what is done in curve_fit, but with the cholesky of the inverse covariance matrix. We use that in statsmodels.GLS, but only for linear models. But, if there a large number of observations, then using the full covariance matrix is inefficient, and in many cases a more direct transformation can be used. nonlinear least squares is still largely missing in statsmodels. I don't know if any of the other packages that are based on leastsq have the option. Josef > > Best regards, > > Fr?d?ric Parrenin > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dkajah at gmail.com Wed Jun 26 16:31:24 2013 From: dkajah at gmail.com (Daniel Penalva) Date: Wed, 26 Jun 2013 17:31:24 -0300 Subject: [SciPy-User] Passing keyword argment to functions in odeint In-Reply-To: <51CAC6BA.9020003@gmail.com> References: <51CAC6BA.9020003@gmail.com> Message-ID: Yah, you are right, i've done that but it not work. Sorry for my late example, a more apropriated one is: func(x, t, K = False, boundary = 'free', mimese = False); and than : odeint(func, x, t, ( K, mimese)); but it will not work as expected. It means that, in odeint, i cant ignore some of keyword argument as in a usual function evaluation. Should this be tagged as a bug ? On Wed, Jun 26, 2013 at 7:47 AM, Juan Luis Cano wrote: > On 06/26/2013 05:19 AM, Daniel Penalva wrote: > > Is it possible to pass keyword arguments to functions like > > > > func(x, t, K = False), > > > > while using odeint to integrate ? > > > > i have tried usual way odeint(func, x, t, (K) ), but it did not work. > > Probably because you have to pass them as a tuple > > (K,) > > Juanlu > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dkajah at gmail.com Wed Jun 26 16:31:44 2013 From: dkajah at gmail.com (Daniel Penalva) Date: Wed, 26 Jun 2013 17:31:44 -0300 Subject: [SciPy-User] Passing keyword argment to functions in odeint In-Reply-To: References: <51CAC6BA.9020003@gmail.com> Message-ID: thank you for your answers :-D On Wed, Jun 26, 2013 at 5:31 PM, Daniel Penalva wrote: > Yah, you are right, i've done that but it not work. Sorry for my late > example, a more apropriated one is: > > func(x, t, K = False, boundary = 'free', mimese = False); > > and than : > > odeint(func, x, t, ( K, mimese)); > > but it will not work as expected. It means that, in odeint, i cant ignore > some of keyword argument as in a usual function evaluation. Should this be > tagged as a bug ? > > > > > On Wed, Jun 26, 2013 at 7:47 AM, Juan Luis Cano wrote: > >> On 06/26/2013 05:19 AM, Daniel Penalva wrote: >> > Is it possible to pass keyword arguments to functions like >> > >> > func(x, t, K = False), >> > >> > while using odeint to integrate ? >> > >> > i have tried usual way odeint(func, x, t, (K) ), but it did not work. >> >> Probably because you have to pass them as a tuple >> >> (K,) >> >> Juanlu >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.haslwanter at gmail.com Tue Jun 18 11:53:58 2013 From: thomas.haslwanter at gmail.com (Thomas Haslwanter) Date: Tue, 18 Jun 2013 08:53:58 -0700 (PDT) Subject: [SciPy-User] Filtering history in ipython Message-ID: <7db3f6dd-2d0c-411c-ada2-f06bd9533a22@googlegroups.com> Since I could not find an "ipython" group, I thought I dare to ask the following question here: I am trying to find out how to filter out ipython commands from previous sessions. *What works:* If I want to see all the commands from the current session that contain the word ?plot?, I type In [xx]: %hist ?g plot If I want to see all the commands from the last session, I type In [xx]: %hist ~1/1-~1/1000 *What does NOT work:* What do I have to type to find all the commands from the last session that contain the word ?plot?? In [xx]: %hist ?g plot ~1/1-~1/1000 Should work ? but it does not! Any help would be appreciated. thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecarlson at eng.ua.edu Mon Jun 24 16:44:34 2013 From: ecarlson at eng.ua.edu (Eric Carlson) Date: Mon, 24 Jun 2013 15:44:34 -0500 Subject: [SciPy-User] Spline Interpolation with non continuous data In-Reply-To: <001601ce6f52$c5aaa760$50fff620$@de> References: <001601ce6f52$c5aaa760$50fff620$@de> Message-ID: <51C8AFB2.5050606@eng.ua.edu> Hello, pchip sometimes does wonders ("wonders"=="oscillation-free") on data with piecewise continuous derivatives. Spline fits of degree 1 should also work. OTOH, sometimes pchip makes little difference from regular cubic splines, and in those cases the filtering may be your only option import scipy.interpolate from numpy import linspace xdata = ... ydata = ... f_approx = scipy.interpolate.pchip(xdata,ydata) ####Now that you have continuous function, use it many ways, ### for example: xeval = linspace(x_low, x_high, 201) #set evaluation points yeval = f_approx(xeval) Cheers, Eric On 6/22/2013 9:14 AM, Franz Engel wrote: > Hi, > > I try to interpolated a spline throw a dataset (the record of a robot > motion path). Usually it works really good, always when the robot drives > without stops. But if the robot stops he moves a little bit backwards. > If this happens I can?t use my ?normal? method with the > interpolate.UnivariateSpline-function(http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html), > because the robot motion is not longer continuously. Does somebody has > an idea which function could solve my problem? Or is there a good filter > to reduce the redundant robot path. (the backwards path is only a little > bit displaced relative to path without stops) > > Regards, > > Franz > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From Phillip.M.Feldman at gmail.com Mon Jun 24 16:52:14 2013 From: Phillip.M.Feldman at gmail.com (pfeldman) Date: Mon, 24 Jun 2013 13:52:14 -0700 (PDT) Subject: [SciPy-User] ftol and xtol In-Reply-To: <1370491726005-18358.post@n7.nabble.com> References: <1370472243657-18355.post@n7.nabble.com> <1370491726005-18358.post@n7.nabble.com> Message-ID: <1372107134975-18455.post@n7.nabble.com> I believe that making the optimization interfaces more uniform would be a substantial improvement, and I'm somewhat disappointed that there has been no discussion on this topic. -- View this message in context: http://scipy-user.10969.n7.nabble.com/ftol-and-xtol-tp18355p18455.html Sent from the Scipy-User mailing list archive at Nabble.com. From ralf.gommers at gmail.com Wed Jun 26 17:05:02 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 26 Jun 2013 23:05:02 +0200 Subject: [SciPy-User] ftol and xtol In-Reply-To: <1372107134975-18455.post@n7.nabble.com> References: <1370472243657-18355.post@n7.nabble.com> <1370491726005-18358.post@n7.nabble.com> <1372107134975-18455.post@n7.nabble.com> Message-ID: On Mon, Jun 24, 2013 at 10:52 PM, pfeldman wrote: > I believe that making the optimization interfaces more uniform would be a > substantial improvement, and I'm somewhat disappointed that there has been > no discussion on this topic. > Scipy 0.11 made a major step in unifying the interfaces: http://docs.scipy.org/doc/scipy-dev/reference/release.0.11.0.html#scipy-optimize-improvements That was extensively discussed. More (backwards-compatible) improvements in this direction are of course very welcome. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jun 26 17:24:07 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Jun 2013 22:24:07 +0100 Subject: [SciPy-User] Filtering history in ipython In-Reply-To: <7db3f6dd-2d0c-411c-ada2-f06bd9533a22@googlegroups.com> References: <7db3f6dd-2d0c-411c-ada2-f06bd9533a22@googlegroups.com> Message-ID: On Tue, Jun 18, 2013 at 4:53 PM, Thomas Haslwanter < thomas.haslwanter at gmail.com> wrote: > Since I could not find an "ipython" group, http://mail.scipy.org/mailman/listinfo/ipython-user -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From helmrp at yahoo.com Wed Jun 26 19:07:23 2013 From: helmrp at yahoo.com (The Helmbolds) Date: Wed, 26 Jun 2013 16:07:23 -0700 (PDT) Subject: [SciPy-User] SciPy-User Digest, Vol 118, Issue 37 In-Reply-To: References: Message-ID: <1372288043.99323.YahooMailNeo@web142801.mail.bf1.yahoo.com> Subject: Re: [SciPy-User] Passing keyword argument to functions in ??? odeint ? The facts are that in `odeint` the?use of "*" is mandatory; while in `ode` it is forbidden. ? Call it a bug or a gotcha or whatever pejorative term you prefer, it's inconsistent, confusing, bad, wrong, inexcusable, violates all kinds of good coding theory and practice, and should be fixed immediately. There's no excuse for this kind of?nonsense. ? Bob -------------- next part -------------- An HTML attachment was scrubbed... URL: From parrenin.ujf at gmail.com Thu Jun 27 09:04:16 2013 From: parrenin.ujf at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Parrenin?=) Date: Thu, 27 Jun 2013 15:04:16 +0200 Subject: [SciPy-User] leastsq In-Reply-To: References: Message-ID: Dear Josef, Thank you for your answer. OK to use the curve_fit function with a change of variables to have a diagonal covariance matrix. However, here are two questions/remarks: - Curve_fit takes as input both the parameters to fit and a variable x where the data are 'located'. This approach seems sub-optimal since in many inverse problems, the function is evaluated for all x at a time. Running the function independently N times will significantly decrease the computation time. Maybe in this case the best thing to do is to declare that x is empty, but how to do that in practice? - It is not very clear from the scipy doc http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html what the function f is supposed to return. Is it just a scalar function or can it be a ndarray or even something else? Some more complex examples in the doc would really help to better understand how it works. Best regards, Fr?d?ric Parrenin 2013/6/26 > On Wed, Jun 26, 2013 at 12:26 PM, Fr?d?ric Parrenin > wrote: > > Dear all, > > > > I am experimenting the optimize module of scipy. > > My optimization problem is a leastsq problem. > > However, the leastsq function seems to be not appropriate for two > reasons: > > - there is no possibility to specify a covariance matrix between the > leastsq > > terms. They are supposed to be independent, which is a too strong > assumption > > in my case. > > - the analyzed covariance matrix (i.e. the inverse of the jacobian of the > > cost function) cannot be simply outputed. > > > > Of course I could use a more generic optimization function, like the > > minimize one. > > However this seems sub-optimal because the minimisation of a least > squares > > problem can dealt more efficiently (the jacobian of the cost function > can be > > approximated using the jacobian of the terms to minimize). > > > > Can anybody help me? > > Are there plans to improve the leastsq function? > > leastsq is a low level function and I think we should not load it up > with any options. > > for weighted least-squares the more highlevel interface with > additional results is optimize.curve_fit. > However it doesn't allow for a full covariance matrix for the errors. > > If you want to use leastsq with a full covariance matrix, then you > could transform both sides yourself, similar to what is done in > curve_fit, but with the cholesky of the inverse covariance matrix. > We use that in statsmodels.GLS, but only for linear models. > But, if there a large number of observations, then using the full > covariance matrix is inefficient, and in many cases a more direct > transformation can be used. > > nonlinear least squares is still largely missing in statsmodels. > > I don't know if any of the other packages that are based on leastsq > have the option. > > Josef > > > > > > > Best regards, > > > > Fr?d?ric Parrenin > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 27 09:21:27 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Jun 2013 09:21:27 -0400 Subject: [SciPy-User] leastsq In-Reply-To: References: Message-ID: On Thu, Jun 27, 2013 at 9:04 AM, Fr?d?ric Parrenin wrote: > Dear Josef, > > Thank you for your answer. > OK to use the curve_fit function with a change of variables to have a > diagonal covariance matrix. > > However, here are two questions/remarks: > - Curve_fit takes as input both the parameters to fit and a variable x where > the data are 'located'. This approach seems sub-optimal since in many > inverse problems, the function is evaluated for all x at a time. Running the > function independently N times will significantly decrease the computation > time. I don't understand this part. We are fitting a curve to N observations. We need all of them to calculate the residual sum of squares. > Maybe in this case the best thing to do is to declare that x is empty, but > how to do that in practice? You don't need to use x, you can just write f as a method in a class and attach whatever attributes you want to reuse in the f method. (I'm not completely remember how this was implemented, and no time to look it up right now.) > - It is not very clear from the scipy doc > http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html > what the function f is supposed to return. Is it just a scalar function or > can it be a ndarray or even something else? the function should return an array of predicted values, one element for each observation > > Some more complex examples in the doc would really help to better understand > how it works. There are several examples on stackoverflow, including the case when f is a method in a class Josef > > Best regards, > > Fr?d?ric Parrenin > > > > > > 2013/6/26 > >> On Wed, Jun 26, 2013 at 12:26 PM, Fr?d?ric Parrenin >> wrote: >> > Dear all, >> > >> > I am experimenting the optimize module of scipy. >> > My optimization problem is a leastsq problem. >> > However, the leastsq function seems to be not appropriate for two >> > reasons: >> > - there is no possibility to specify a covariance matrix between the >> > leastsq >> > terms. They are supposed to be independent, which is a too strong >> > assumption >> > in my case. >> > - the analyzed covariance matrix (i.e. the inverse of the jacobian of >> > the >> > cost function) cannot be simply outputed. >> > >> > Of course I could use a more generic optimization function, like the >> > minimize one. >> > However this seems sub-optimal because the minimisation of a least >> > squares >> > problem can dealt more efficiently (the jacobian of the cost function >> > can be >> > approximated using the jacobian of the terms to minimize). >> > >> > Can anybody help me? >> > Are there plans to improve the leastsq function? >> >> leastsq is a low level function and I think we should not load it up >> with any options. >> >> for weighted least-squares the more highlevel interface with >> additional results is optimize.curve_fit. >> However it doesn't allow for a full covariance matrix for the errors. >> >> If you want to use leastsq with a full covariance matrix, then you >> could transform both sides yourself, similar to what is done in >> curve_fit, but with the cholesky of the inverse covariance matrix. >> We use that in statsmodels.GLS, but only for linear models. >> But, if there a large number of observations, then using the full >> covariance matrix is inefficient, and in many cases a more direct >> transformation can be used. >> >> nonlinear least squares is still largely missing in statsmodels. >> >> I don't know if any of the other packages that are based on leastsq >> have the option. >> >> Josef >> >> >> >> > >> > Best regards, >> > >> > Fr?d?ric Parrenin >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From newville at cars.uchicago.edu Thu Jun 27 09:35:27 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Thu, 27 Jun 2013 08:35:27 -0500 Subject: [SciPy-User] leastsq In-Reply-To: References: Message-ID: Hi, I'm pretty baffled by these questions. optimize.leastsq() does not take a covariance matrix as input, but can give one as output. It can take functions used to compute the Jacobian... Perhaps that would accomplish what you're trying to do? optimize.curve_fit() is a wrapper around leastsq() for the common case of "fitting data" in which one has a set of observations at a set of sampled "data points", and a set of variables used in a model for the data. Like leastsq(), it returns the covariance. If curve_fit() does what you need but seems sup-optimal, than leastsq() is probably what you want to use. Hope that helps, but maybe I'm not understanding what you're trying to do. --Matt Newville From parrenin.ujf at gmail.com Thu Jun 27 10:13:33 2013 From: parrenin.ujf at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Parrenin?=) Date: Thu, 27 Jun 2013 16:13:33 +0200 Subject: [SciPy-User] leastsq In-Reply-To: References: Message-ID: Dear Matt, Yes, leastsq is probably what I need. As Josef suggested, I can decompose the observation covariance matrix using Choleski to transform the model into one with independent observations. It is still not very clear how to obtain the analyzed (or posterior) covariance matrix around the solution. At first glance, cov_x is what we are looking for but when looking at the doc, it specifies: Uses the fjac and ipvt optional outputs to construct an estimate of the jacobian around the solution. None if a singular matrix encountered (indicates very flat curvature in some direction). This matrix must be multiplied by the residual variance to get the covariance of the parameter estimates ? see curve_fit. Is not jacobian an error in the documentation? I would have expected 'covariance'. Best regards, Fr?d?ric 2013/6/27 Matt Newville > Hi, > > I'm pretty baffled by these questions. optimize.leastsq() does not > take a covariance matrix as input, but can give one as output. It > can take functions used to compute the Jacobian... Perhaps that would > accomplish what you're trying to do? > > optimize.curve_fit() is a wrapper around leastsq() for the common case > of "fitting data" in which one has a set of observations at a set of > sampled "data points", and a set of variables used in a model for the > data. Like leastsq(), it returns the covariance. If curve_fit() > does what you need but seems sup-optimal, than leastsq() is probably > what you want to use. > > Hope that helps, but maybe I'm not understanding what you're trying to do. > > --Matt Newville > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 27 10:37:35 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Jun 2013 10:37:35 -0400 Subject: [SciPy-User] leastsq In-Reply-To: References: Message-ID: On Thu, Jun 27, 2013 at 10:13 AM, Fr?d?ric Parrenin wrote: > Dear Matt, > > Yes, leastsq is probably what I need. > As Josef suggested, I can decompose the observation covariance matrix using > Choleski to transform the model into one with independent observations. > > It is still not very clear how to obtain the analyzed (or posterior) > covariance matrix around the solution. > At first glance, cov_x is what we are looking for but when looking at the > doc, it specifies: > > Uses the fjac and ipvt optional outputs to construct an estimate of the > jacobian around the solution. None if a singular matrix encountered > (indicates very flat curvature in some direction). This matrix must be > multiplied by the residual variance to get the covariance of the parameter > estimates ? see curve_fit. > > Is not jacobian an error in the documentation? I would have expected > 'covariance'. I always have problems with this part (I read what I want to hear not what is written) As far as I understand It uses the outerproduct of the jacobian as an estimator for the raw covariance. If the error function is a standard least squares problem, then this is also the Hessian (matrix of second derivatives of the objective function). The raw covariance corresponds to inv(X'X) in a linear regression problem (where the X could be the transformed, whitened observations) The jacobian takes the place of X in the non-linear least squares problem. Unless the equation is already prewhitened correctly with the variance of the error (a different set of long threads on the mailing list and github issue), then we need to multiply the raw covariance matrix by an estimate of the error variance. As far as I remember; Even if you use cholsigmainv as transformation (prewithening), the calculations for the covariance is exactly the same as in curve_fit because everything already uses the whitened terms. So I think you could just copy the parts from curve_fit to get the covariance for the more general correlated error case. Josef > > Best regards, > > Fr?d?ric > > > > > > 2013/6/27 Matt Newville >> >> Hi, >> >> I'm pretty baffled by these questions. optimize.leastsq() does not >> take a covariance matrix as input, but can give one as output. It >> can take functions used to compute the Jacobian... Perhaps that would >> accomplish what you're trying to do? >> >> optimize.curve_fit() is a wrapper around leastsq() for the common case >> of "fitting data" in which one has a set of observations at a set of >> sampled "data points", and a set of variables used in a model for the >> data. Like leastsq(), it returns the covariance. If curve_fit() >> does what you need but seems sup-optimal, than leastsq() is probably >> what you want to use. >> >> Hope that helps, but maybe I'm not understanding what you're trying to do. >> >> --Matt Newville >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From takowl at gmail.com Thu Jun 27 20:22:37 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Fri, 28 Jun 2013 01:22:37 +0100 Subject: [SciPy-User] SciPy ecosystem and Python 3 Message-ID: At a conversation over lunch here at the SciPy conference, a few of us mentioned that we're starting to use Python 3 in earnest for our work. For new users, the choice of two major Python versions is confusing and offputting, and we're not going to completely get rid of that confusion until we can simply point new users to Python 3. Most of our introductions, like the SciPy stack install page, point to Python 2 because of the ecosystem, but more and more packages now support Python 3, and we're reaching the point where we could reasonably recommend Python 3 for new users. The aim of this post is to get an overview of where the ecosystem is with: - What packages don't yet support Python 3, or are still too unstable? - How important are each of those: how widely relevant are they, and are substitutes available? - What other conditions need to be met to recommend Python 3? E.g. Scientific Python distros, Linux distro packaging, documentation, etc. Thanks, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From joonhyoung.ro at gmail.com Fri Jun 28 00:19:00 2013 From: joonhyoung.ro at gmail.com (Joon Ro) Date: Thu, 27 Jun 2013 23:19:00 -0500 Subject: [SciPy-User] noob question: numpy copy vs standard lib copy In-Reply-To: References: Message-ID: On Mon, May 13, 2013 at 3:23 PM, psoriasis wrote: I'm new to python. As I understand it, assignment copies by reference > and to do otherwise requires a function like the standard library's > copy or deepcopy functions. However, from what I see numpy has it's > own copy function and using it on a random object (instance of a test > class I made up not an array etc) doesn't seem to return the expected > copy object. I did try importing the copy module and that worked > but then the numpy copy module was "shadowed" but I don't know if > that's a problem. > > Still, I'm sure numpy users need to copy regular objects so what's the > standard solution to this? > > Hi, If you import those modules like import copy and import numpy as np, then you would use those functions with copy.copy() and np.copy() so you would not have the issue. If you import those modules by from copy import * and from numpy import *, and you would have the problem. The first importing method is recommended one since by looking at your code it is explicit where the function comes from. But, if you use numpy functions a lot (especially if you are interactively exploring), then I would import numpy with from numpy import * and import copy module with import copy (or import copy as cp) and make copy.copy()explicit. Let me know if this is not clear. Best, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From tristan.strange at gmail.com Fri Jun 28 06:08:32 2013 From: tristan.strange at gmail.com (Tristan Strange) Date: Fri, 28 Jun 2013 11:08:32 +0100 Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy Message-ID: Hi all, I'm porting a script form MATLAB to Python and am getting very different results form the butter functions in the languages. In MATLAB when I do the following: w=2/(256/2); [b,a]=butter(9,w,'low'); b comes out as a matrix containing : 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 When done in Python using scipy.signal's butter like so: w = 2.0 / (256.0 / 2.0) b, a = butter(9, w, 'low') I get the following array with only a single value for b: array([ 2.81094410e-15]) and the following warning is issued: /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: BadCoefficients: Badly conditioned filter coefficients (numerator): the results may be meaningless "results may be meaningless", BadCoefficients) Both functions in MATLAB and Python output the same a. When using these values in filtfilt() I get totally different results. I've tried exporting b from MATLAB and loading it in to Python and passing that in to filtfilt() but still get totally diffreent results. Can anyone tell me how to port this MATLAB code to Python such that the results are the same or explain what the problem is? Many thanks, Tristan -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Fri Jun 28 07:18:12 2013 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Fri, 28 Jun 2013 13:18:12 +0200 Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy In-Reply-To: References: Message-ID: <1372418292.5177.16.camel@laptop-101> Tristan Strange a ?crit : > In MATLAB [...] b comes out as a matrix containing : > > 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 > 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 > > When done in Python using scipy.signal's butter like so: > [...] the following warning is issued: > > /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: > BadCoefficients: Badly conditioned filter coefficients (numerator): the > results may be meaningless "results may be meaningless", BadCoefficients) You should maybe worry about getting (in matlab) such values for b. All are pretty close to 0. This is what scipy implementation is warning you about ("Badly conditioned filter coefficients (numerator): the results may be meaningless"). Using this b vector may lead to output signals prone to numerical noise... From roger.fearick at uct.ac.za Fri Jun 28 07:23:42 2013 From: roger.fearick at uct.ac.za (Roger Fearick) Date: Fri, 28 Jun 2013 11:23:42 +0000 Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy In-Reply-To: References: Message-ID: You're using Python 2.7: maybe 2/(256/2) = 0. ________________________________ Hi all, I'm porting a script form MATLAB to Python and am getting very different results form the butter functions in the languages. In MATLAB when I do the following: w=2/(256/2); [b,a]=butter(9,w,'low'); b comes out as a matrix containing : 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 When done in Python using scipy.signal's butter like so: w = 2.0 / (256.0 / 2.0) b, a = butter(9, w, 'low') I get the following array with only a single value for b: array([ 2.81094410e-15]) and the following warning is issued: /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: BadCoefficients: Badly conditioned filter coefficients (numerator): the results may be meaningless "results may be meaningless", BadCoefficients) Both functions in MATLAB and Python output the same a. When using these values in filtfilt() I get totally different results. I've tried exporting b from MATLAB and loading it in to Python and passing that in to filtfilt() but still get totally diffreent results. Can anyone tell me how to port this MATLAB code to Python such that the results are the same or explain what the problem is? Many thanks, Tristan ________________________________ UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 9111. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tristan.strange at gmail.com Fri Jun 28 07:46:54 2013 From: tristan.strange at gmail.com (Tristan Strange) Date: Fri, 28 Jun 2013 12:46:54 +0100 Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy In-Reply-To: References: Message-ID: On 28 June 2013 12:23, Roger Fearick wrote: > You're using Python 2.7: maybe 2/(256/2) = 0. > It's not this I'm afraid. I import division from __future__ >Tristan Strange a ?crit : >> In MATLAB [...] b comes out as a matrix containing : >> >> 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 >> 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 >> >> When done in Python using scipy.signal's butter like so: >> [...] the following warning is issued: >> >> /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: >> BadCoefficients: Badly conditioned filter coefficients (numerator): the >> results may be meaningless "results may be meaningless", BadCoefficients) > You should maybe worry about getting (in matlab) such values for b. All > are pretty close to 0. This is what scipy implementation is warning you > about ("Badly conditioned filter coefficients (numerator): the results > may be meaningless"). Using this b vector may lead to output signals > prone to numerical noise... Ok, thanks. Apparently the MATLAB implementation functions as expected.... Any one else have any ideas? Cheers, Tristan -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Fri Jun 28 09:00:44 2013 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Fri, 28 Jun 2013 15:00:44 +0200 Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy In-Reply-To: References: Message-ID: <1372424444.5177.18.camel@laptop-101> Tristan Strange a ?crit : > > You should maybe worry about getting (in matlab) such values for b. All > > are pretty close to 0. This is what scipy implementation is warning you > > about ("Badly conditioned filter coefficients (numerator): the results > > may be meaningless"). Using this b vector may lead to output signals > > prone to numerical noise... > > Ok, thanks. Apparently the MATLAB implementation functions as expected.... Does it mean that you checked the frequency response of the filter ? (with scipy.signal.freqz and matlab's freqz) From paul.blelloch at ata-e.com Fri Jun 28 11:45:54 2013 From: paul.blelloch at ata-e.com (Paul Blelloch) Date: Fri, 28 Jun 2013 08:45:54 -0700 Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy Message-ID: I ran the same butterworth filter problem and got the following results: >>> w=2./(256./2.) >>> b,a=butter(9,w) >>> b array([ 2.52984969e-14, 1.01193988e-13, 2.36119304e-13, 3.54178956e-13, 3.54178956e-13, 2.36119304e-13, 1.01193988e-13, 2.52984969e-14, 2.81094410e-15]) >>> a array([ 1. , -8.71731939, 33.77839104, -76.36014168, 110.98476353, -107.55300968, 69.49343715, -28.86903222, 6.99664203, -0.75373077]) These numerator values are different from Matlab's, which are: b = Columns 1 through 6 2.7931e-15 2.5138e-14 1.0055e-13 2.3462e-13 3.5193e-13 3.5193e-13 Columns 7 through 10 2.3462e-13 1.0055e-13 2.5138e-14 2.7931e-15 a = Columns 1 through 6 1.0000e+00 -8.7173e+00 3.3778e+01 -7.6360e+01 1.1098e+02 -1.0755e+02 Columns 7 through 10 6.9493e+01 -2.8869e+01 6.9966e+00 -7.5373e-01 I don't know why my numerator values are so different from yours. I'm using a 64-bit MKL optimized version of scipy 0.12.0. The differences between Matlab and scipy are on very small numbers. If you use a lower order filter, where the denominator coefficients are significant there are no differences between Matlab and scipy. The Matlab results may well be more accurate based on a difference in order of operations or something, but when I compared the 'filtfilt' output from the two sets of coefficients applied to white noise I didn't see much difference. What was more interesting to me was that the filtfilt results from Matlab were quite different in the initial transient than the filtfilt results from scipy using the same coefficients. It does appear to me that there's a difference in the application of the filtfilt function. -Paul -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of scipy-user-request at scipy.org Sent: Friday, June 28, 2013 4:42 AM To: scipy-user at scipy.org Subject: SciPy-User Digest, Vol 118, Issue 40 Send SciPy-User mailing list submissions to scipy-user at scipy.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.scipy.org/mailman/listinfo/scipy-user or, via email, send a message with subject or body 'help' to scipy-user-request at scipy.org You can reach the person managing the list at scipy-user-owner at scipy.org When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-User digest..." Today's Topics: 1. SciPy ecosystem and Python 3 (Thomas Kluyver) 2. Re: noob question: numpy copy vs standard lib copy (Joon Ro) 3. butter() and filtfilt() - differences between MATLAB and scipy (Tristan Strange) 4. Re: butter() and filtfilt() - differences between MATLAB and scipy (Fabrice Silva) 5. Re: butter() and filtfilt() - differences between MATLAB and scipy (Roger Fearick) 6. Re: butter() and filtfilt() - differences between MATLAB and scipy (Tristan Strange) ---------------------------------------------------------------------- Message: 1 Date: Fri, 28 Jun 2013 01:22:37 +0100 From: Thomas Kluyver Subject: [SciPy-User] SciPy ecosystem and Python 3 To: SciPy Users List Message-ID: Content-Type: text/plain; charset="iso-8859-1" At a conversation over lunch here at the SciPy conference, a few of us mentioned that we're starting to use Python 3 in earnest for our work. For new users, the choice of two major Python versions is confusing and offputting, and we're not going to completely get rid of that confusion until we can simply point new users to Python 3. Most of our introductions, like the SciPy stack install page, point to Python 2 because of the ecosystem, but more and more packages now support Python 3, and we're reaching the point where we could reasonably recommend Python 3 for new users. The aim of this post is to get an overview of where the ecosystem is with: - What packages don't yet support Python 3, or are still too unstable? - How important are each of those: how widely relevant are they, and are substitutes available? - What other conditions need to be met to recommend Python 3? E.g. Scientific Python distros, Linux distro packaging, documentation, etc. Thanks, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130628/cfe910ca/attachment-0001.html ------------------------------ Message: 2 Date: Thu, 27 Jun 2013 23:19:00 -0500 From: Joon Ro Subject: Re: [SciPy-User] noob question: numpy copy vs standard lib copy To: SciPy Users List Message-ID: Content-Type: text/plain; charset="utf-8" On Mon, May 13, 2013 at 3:23 PM, psoriasis wrote: I'm new to python. As I understand it, assignment copies by reference > and to do otherwise requires a function like the standard library's > copy or deepcopy functions. However, from what I see numpy has it's > own copy function and using it on a random object (instance of a test > class I made up not an array etc) doesn't seem to return the expected > copy object. I did try importing the copy module and that worked > but then the numpy copy module was "shadowed" but I don't know if > that's a problem. > > Still, I'm sure numpy users need to copy regular objects so what's the > standard solution to this? > > Hi, If you import those modules like import copy and import numpy as np, then you would use those functions with copy.copy() and np.copy() so you would not have the issue. If you import those modules by from copy import * and from numpy import *, and you would have the problem. The first importing method is recommended one since by looking at your code it is explicit where the function comes from. But, if you use numpy functions a lot (especially if you are interactively exploring), then I would import numpy with from numpy import * and import copy module with import copy (or import copy as cp) and make copy.copy()explicit. Let me know if this is not clear. Best, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130627/58a1209d/attachment-0001.html ------------------------------ Message: 3 Date: Fri, 28 Jun 2013 11:08:32 +0100 From: Tristan Strange Subject: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy To: scipy-user at scipy.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hi all, I'm porting a script form MATLAB to Python and am getting very different results form the butter functions in the languages. In MATLAB when I do the following: w=2/(256/2); [b,a]=butter(9,w,'low'); b comes out as a matrix containing : 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 When done in Python using scipy.signal's butter like so: w = 2.0 / (256.0 / 2.0) b, a = butter(9, w, 'low') I get the following array with only a single value for b: array([ 2.81094410e-15]) and the following warning is issued: /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: BadCoefficients: Badly conditioned filter coefficients (numerator): the results may be meaningless "results may be meaningless", BadCoefficients) Both functions in MATLAB and Python output the same a. When using these values in filtfilt() I get totally different results. I've tried exporting b from MATLAB and loading it in to Python and passing that in to filtfilt() but still get totally diffreent results. Can anyone tell me how to port this MATLAB code to Python such that the results are the same or explain what the problem is? Many thanks, Tristan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130628/7f2f80c7/attachment-0001.html ------------------------------ Message: 4 Date: Fri, 28 Jun 2013 13:18:12 +0200 From: Fabrice Silva Subject: Re: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy To: scipy-user at scipy.org Message-ID: <1372418292.5177.16.camel at laptop-101> Content-Type: text/plain; charset="UTF-8" Tristan Strange a ?crit : > In MATLAB [...] b comes out as a matrix containing : > > 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 > 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 > > When done in Python using scipy.signal's butter like so: > [...] the following warning is issued: > > /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: > BadCoefficients: Badly conditioned filter coefficients (numerator): > the results may be meaningless "results may be meaningless", > BadCoefficients) You should maybe worry about getting (in matlab) such values for b. All are pretty close to 0. This is what scipy implementation is warning you about ("Badly conditioned filter coefficients (numerator): the results may be meaningless"). Using this b vector may lead to output signals prone to numerical noise... ------------------------------ Message: 5 Date: Fri, 28 Jun 2013 11:23:42 +0000 From: Roger Fearick Subject: Re: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy To: SciPy Users List Message-ID: Content-Type: text/plain; charset="windows-1252" You're using Python 2.7: maybe 2/(256/2) = 0. ________________________________ Hi all, I'm porting a script form MATLAB to Python and am getting very different results form the butter functions in the languages. In MATLAB when I do the following: w=2/(256/2); [b,a]=butter(9,w,'low'); b comes out as a matrix containing : 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 When done in Python using scipy.signal's butter like so: w = 2.0 / (256.0 / 2.0) b, a = butter(9, w, 'low') I get the following array with only a single value for b: array([ 2.81094410e-15]) and the following warning is issued: /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: BadCoefficients: Badly conditioned filter coefficients (numerator): the results may be meaningless "results may be meaningless", BadCoefficients) Both functions in MATLAB and Python output the same a. When using these values in filtfilt() I get totally different results. I've tried exporting b from MATLAB and loading it in to Python and passing that in to filtfilt() but still get totally diffreent results. Can anyone tell me how to port this MATLAB code to Python such that the results are the same or explain what the problem is? Many thanks, Tristan ________________________________ UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 9111. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130628/2830d273/attachment-0001.html ------------------------------ Message: 6 Date: Fri, 28 Jun 2013 12:46:54 +0100 From: Tristan Strange Subject: Re: [SciPy-User] butter() and filtfilt() - differences between MATLAB and scipy To: SciPy Users List Message-ID: Content-Type: text/plain; charset="iso-8859-1" On 28 June 2013 12:23, Roger Fearick wrote: > You're using Python 2.7: maybe 2/(256/2) = 0. > It's not this I'm afraid. I import division from __future__ >Tristan Strange a ?crit : >> In MATLAB [...] b comes out as a matrix containing : >> >> 2.8109e-15 2.5298e-14 1.0119e-13 2.3612e-13 3.5418e-13 3.5418e-13 >> 2.3612e-13 1.0119e-13 2.5298e-14 2.8109e-15 >> >> When done in Python using scipy.signal's butter like so: >> [...] the following warning is issued: >> >> /usr/lib/python2.7/dist-packages/scipy/signal/filter_design.py:288: >> BadCoefficients: Badly conditioned filter coefficients (numerator): >> the results may be meaningless "results may be meaningless", BadCoefficients) > You should maybe worry about getting (in matlab) such values for b. > All are pretty close to 0. This is what scipy implementation is > warning you about ("Badly conditioned filter coefficients (numerator): > the results may be meaningless"). Using this b vector may lead to > output signals prone to numerical noise... Ok, thanks. Apparently the MATLAB implementation functions as expected.... Any one else have any ideas? Cheers, Tristan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130628/5f00483a/attachment.html ------------------------------ _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user End of SciPy-User Digest, Vol 118, Issue 40 ******************************************* From ralf.gommers at gmail.com Sat Jun 29 17:47:52 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 29 Jun 2013 23:47:52 +0200 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On Fri, Jun 28, 2013 at 2:22 AM, Thomas Kluyver wrote: > At a conversation over lunch here at the SciPy conference, a few of us > mentioned that we're starting to use Python 3 in earnest for our work. > > For new users, the choice of two major Python versions is confusing and > offputting, and we're not going to completely get rid of that confusion > until we can simply point new users to Python 3. Most of our introductions, > like the SciPy stack install page, point to Python 2 because of the > ecosystem, but more and more packages now support Python 3, and we're > reaching the point where we could reasonably recommend Python 3 for new > users. > > The aim of this post is to get an overview of where the ecosystem is with: > - What packages don't yet support Python 3, or are still too unstable? > scikit-learn > - How important are each of those: how widely relevant are they, and are > substitutes available? > - What other conditions need to be met to recommend Python 3? E.g. > Scientific Python distros, Linux distro packaging, documentation, etc. > Before recommending Python 3.x over 2.x I think it's important to not only have the very latest release or master branch of projects support 3.x, but at least 1 or 2 more versions. Reason: a lot of users (I suspect the majority) will not be able to freely upgrade to the latest version of projects. Packaging and documentation are of course also important. No 2to3 perhaps desirable. I think we're still one to two years away from recommending 3.x over 2.x Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From takowl at gmail.com Sat Jun 29 18:14:54 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sat, 29 Jun 2013 23:14:54 +0100 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: > scikit-learn Yes, I saw that the sprint here in Austin was planning to work on Py3 support. Olivier, how did that go? On 29 June 2013 22:47, Ralf Gommers wrote: > Before recommending Python 3.x over 2.x I think it's important to not only > have the very latest release or master branch of projects support 3.x, but > at least 1 or 2 more versions. Reason: a lot of users (I suspect the > majority) will not be able to freely upgrade to the latest version of > projects. > I certainly wouldn't count a project as having Py3 support until there's a released version. But if they can't upgrade to the latest version, chances are that they also don't have a choice between Py2 and Py3, so our recommendation doesn't matter much to them. In that case, the recommendation is targeting the sysadmin who will be deciding what to install next year. > Packaging and documentation are of course also important. No 2to3 perhaps > desirable. > Packaging: Debian/Ubuntu already have all the core SciPy stack packages except sympy for Python 3 (and I'm working on getting sympy done). Do we know where other distros are? Docs: I suspect there's still some way to go, but a lot of that will probably be quite mechanical print -> print(). Some of this will have to be part of the 'Python 3 D-day', because we probably don't want to make all the docs assume Python 3 while we're still recommending Python 2. No 2to3: Desirable, but not essential, I think. It's more of a developer problem than a user problem. Thanks, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From joonhyoung.ro at gmail.com Sat Jun 29 19:48:21 2013 From: joonhyoung.ro at gmail.com (Joon Ro) Date: Sat, 29 Jun 2013 18:48:21 -0500 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On Thu, Jun 27, 2013 at 7:22 PM, Thomas Kluyver wrote: > - What other conditions need to be met to recommend Python 3? E.g. > Scientific Python distros, Linux distro packaging, documentation, etc. > > For me having a cross-platform scientific python distro with python 3 would make a big difference - because even with Python 2, individually installing different scientific packages can be challenging. Also with a distro it would be much easier to check if some of the libraries one uses are available or not - I don't want keep checking this for each library that I use, it would be easier to just check a disto's available library list. Also with a distro I can selectively start using Python 3 for the projects which does not depend on missing libraries. -Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat Jun 29 20:01:23 2013 From: travis at continuum.io (Travis Oliphant) Date: Sat, 29 Jun 2013 19:01:23 -0500 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: Anaconda makes it easy to create an environment with Python 3 packages. Many are already available. Try conda create -n py33 python=3.3 numpy look at anaconda packages to see all the available free binaries for python 3.3 On Jun 29, 2013 6:48 PM, "Joon Ro" wrote: > On Thu, Jun 27, 2013 at 7:22 PM, Thomas Kluyver wrote: > >> - What other conditions need to be met to recommend Python 3? E.g. >> Scientific Python distros, Linux distro packaging, documentation, etc. >> >> > For me having a cross-platform scientific python distro with python 3 > would make a big difference - because even with Python 2, individually > installing different scientific packages can be challenging. > > Also with a distro it would be much easier to check if some of the > libraries one uses are available or not - I don't want keep checking this > for each library that I use, it would be easier to just check a disto's > available library list. Also with a distro I can selectively start using > Python 3 for the projects which does not depend on missing libraries. > > -Joon > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joonhyoung.ro at gmail.com Sat Jun 29 20:11:44 2013 From: joonhyoung.ro at gmail.com (Joon Ro) Date: Sat, 29 Jun 2013 19:11:44 -0500 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On Sat, Jun 29, 2013 at 7:01 PM, Travis Oliphant wrote: > Anaconda makes it easy to create an environment with Python 3 packages. > Many are already available. > > Try > > conda create -n py33 python=3.3 numpy > > look at anaconda packages to see all the available free binaries for > python 3.3 > Thanks! I will give it a try. -Joon > On Jun 29, 2013 6:48 PM, "Joon Ro" wrote: > >> On Thu, Jun 27, 2013 at 7:22 PM, Thomas Kluyver wrote: >> >>> - What other conditions need to be met to recommend Python 3? E.g. >>> Scientific Python distros, Linux distro packaging, documentation, etc. >>> >>> >> For me having a cross-platform scientific python distro with python 3 >> would make a big difference - because even with Python 2, individually >> installing different scientific packages can be challenging. >> >> Also with a distro it would be much easier to check if some of the >> libraries one uses are available or not - I don't want keep checking this >> for each library that I use, it would be easier to just check a disto's >> available library list. Also with a distro I can selectively start using >> Python 3 for the projects which does not depend on missing libraries. >> >> -Joon >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From takowl at gmail.com Sat Jun 29 23:48:59 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sun, 30 Jun 2013 04:48:59 +0100 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On 30 June 2013 00:48, Joon Ro wrote: > For me having a cross-platform scientific python distro with python 3 > would make a big difference - because even with Python 2, individually > installing different scientific packages can be challenging. Almar Klein also has his Pyzo distro based on Python 3. The list of packages can be seen here, though I guess Anaconda probably has more: http://www.pyzo.org/packages.html Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Sun Jun 30 10:08:27 2013 From: pmhobson at gmail.com (Paul Hobson) Date: Sun, 30 Jun 2013 07:08:27 -0700 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On Sat, Jun 29, 2013 at 5:01 PM, Travis Oliphant wrote: > Anaconda makes it easy to create an environment with Python 3 packages. > Many are already available. > > Try > > conda create -n py33 python=3.3 numpy > > look at anaconda packages to see all the available free binaries for > python 3.3 > At the risk of high jacking the thread, I'll ask you, Travis: I just created oa Python 3.3 environment on my Windows work machine. Worked wonderfully and it's a testament to how hard y'all are working over at Continuum. Thanks for all of that. I point all our new staff to your Python 2.7 distro. However, the 3.3 environment is not in the registry, so if there's a package (particularly pyodbc in this case) that I *need* for work, I can't install it from the Golhke collection. Any plan to be able to register different python environments in Windows? Thanks, -paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Sun Jun 30 10:11:41 2013 From: pmhobson at gmail.com (Paul Hobson) Date: Sun, 30 Jun 2013 07:11:41 -0700 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: The main road block holding me back, just as a user in the environmental consulting world, is the shapely/decartes combo. I believe that I only have two project relying on them, and I could probably work around that in some way, but I have a notebook in "production" for one project and no budget to make that switch. Luckily though, that's on a separate machine from my main working environment, so I actually plan on making the switch soon (I've installed everything so far). On Thu, Jun 27, 2013 at 5:22 PM, Thomas Kluyver wrote: > At a conversation over lunch here at the SciPy conference, a few of us > mentioned that we're starting to use Python 3 in earnest for our work. > > For new users, the choice of two major Python versions is confusing and > offputting, and we're not going to completely get rid of that confusion > until we can simply point new users to Python 3. Most of our introductions, > like the SciPy stack install page, point to Python 2 because of the > ecosystem, but more and more packages now support Python 3, and we're > reaching the point where we could reasonably recommend Python 3 for new > users. > > The aim of this post is to get an overview of where the ecosystem is with: > - What packages don't yet support Python 3, or are still too unstable? > - How important are each of those: how widely relevant are they, and are > substitutes available? > - What other conditions need to be met to recommend Python 3? E.g. > Scientific Python distros, Linux distro packaging, documentation, etc. > > Thanks, > Thomas > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From takowl at gmail.com Sun Jun 30 11:29:35 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Sun, 30 Jun 2013 16:29:35 +0100 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On 30 June 2013 15:11, Paul Hobson wrote: > The main road block holding me back, just as a user in the environmental > consulting world, is the shapely/decartes combo. I believe that I only have > two project relying on them, and I could probably work around that in some > way, but I have a notebook in "production" for one project and no budget to > make that switch. Thanks. I've started an Etherpad to track the key points of this discussion - feel free to add to it: https://etherpad.mozilla.org/JdAHGQihei Best wishes, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jun 30 15:04:25 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 30 Jun 2013 21:04:25 +0200 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: On Sun, Jun 30, 2013 at 12:14 AM, Thomas Kluyver wrote: > > On 29 June 2013 22:47, Ralf Gommers wrote: > >> Before recommending Python 3.x over 2.x I think it's important to not >> only have the very latest release or master branch of projects support 3.x, >> but at least 1 or 2 more versions. Reason: a lot of users (I suspect the >> majority) will not be able to freely upgrade to the latest version of >> projects. >> > > I certainly wouldn't count a project as having Py3 support until there's a > released version. But if they can't upgrade to the latest version, chances > are that they also don't have a choice between Py2 and Py3, so our > recommendation doesn't matter much to them. In that case, the > recommendation is targeting the sysadmin who will be deciding what to > install next year. > That's not quite what I meant. Even on a work pc on which I don't have admin rights I will be able to install Anaconda or another distribution. All the basic examples in Python and numpy/scipy docs will work. But I don't work in a vacuum, so I'll find out at some later stage that some code that my co-workers wrote depends on version (current minus 2) of some package that only supports 3.x in version (current). This should be the exception and not the norm before recommending 3.x imho. Also, if many of the active developers haven't yet moved to 3.x (and yes that includes me) then it's most definitely too early to recommend said move to people who aren't very familiar with Python yet. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From joseluismietta at yahoo.com.ar Wed Jun 26 19:00:29 2013 From: joseluismietta at yahoo.com.ar (=?iso-8859-1?Q?Jos=E8_Luis_Mietta?=) Date: Wed, 26 Jun 2013 16:00:29 -0700 (PDT) Subject: [SciPy-User] At.: question about refresh numpy array in a for-cycle Message-ID: <1372287629.89197.YahooMailNeo@web142306.mail.bf1.yahoo.com> Hi experts! Im writing a code with a numpy array L, the numpy matrix M and the next script: for x in L:? for l in srange(N):? z= l in L? if z is False and M[x,l] != 0:? L=np.append(L,l) here, in the end of the cycle, new elements are incorporated to the array 'L'. I want these new elements be considered as 'x' index in the cycle. When I execute the script I see that only the 'originals' elements of L are considered as 'x'. How can i fix it? Waiting for your answers. Thanks a lot! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Phillip.M.Feldman at gmail.com Thu Jun 27 20:33:00 2013 From: Phillip.M.Feldman at gmail.com (pfeldman) Date: Thu, 27 Jun 2013 17:33:00 -0700 (PDT) Subject: [SciPy-User] ftol and xtol In-Reply-To: References: <1370472243657-18355.post@n7.nabble.com> <1370491726005-18358.post@n7.nabble.com> <1372107134975-18455.post@n7.nabble.com> Message-ID: I am using the latest released version of SciPy, and I agree that the interfaces are much improved. The termination conditions for the optimizers is an area where the interfaces are still far from uniform. To allow termination conditions to be specified in a uniform fashion without breaking backwards compatibility, one would probably have to support two interfaces. Phillip On Wed, Jun 26, 2013 at 2:05 PM, Ralf Gommers-3 [via Scipy-User] < ml-node+s10969n18466h44 at n7.nabble.com> wrote: > > > > On Mon, Jun 24, 2013 at 10:52 PM, pfeldman <[hidden email] > > wrote: > >> I believe that making the optimization interfaces more uniform would be a >> substantial improvement, and I'm somewhat disappointed that there has been >> no discussion on this topic. >> > > Scipy 0.11 made a major step in unifying the interfaces: > http://docs.scipy.org/doc/scipy-dev/reference/release.0.11.0.html#scipy-optimize-improvements > > That was extensively discussed. More (backwards-compatible) improvements > in this direction are of course very welcome. > > Ralf > > > > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user > > > ------------------------------ > If you reply to this email, your message will be added to the discussion > below: > http://scipy-user.10969.n7.nabble.com/ftol-and-xtol-tp18355p18466.html > To unsubscribe from ftol and xtol, click here > . > NAML > -- View this message in context: http://scipy-user.10969.n7.nabble.com/ftol-and-xtol-tp18355p18476.html Sent from the Scipy-User mailing list archive at Nabble.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Sat Jun 29 18:41:29 2013 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sat, 29 Jun 2013 17:41:29 -0500 Subject: [SciPy-User] SciPy ecosystem and Python 3 In-Reply-To: References: Message-ID: 2013/6/29 Thomas Kluyver : >> scikit-learn > > Yes, I saw that the sprint here in Austin was planning to work on Py3 > support. Olivier, how did that go? We made some progress but it's not complete yet. I had to work on other issues as well. The Python 3 port of scikit-learn will probably get completed during the Paris sprint in July though. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel