From fausto_barbuto at yahoo.ca  Sat Oct  1 23:44:45 2016
From: fausto_barbuto at yahoo.ca (Fausto Arinos de A. Barbuto)
Date: Sun, 2 Oct 2016 03:44:45 +0000 (UTC)
Subject: [SciPy-User] Scipy on Cygwin
Message-ID: <1272389650.3975386.1475379885399@mail.yahoo.com>

Is it possible to build Scipy on Cygwin? How?

Thanks.

F.

From jorisvandenbossche at gmail.com  Mon Oct  3 05:48:06 2016
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 3 Oct 2016 11:48:06 +0200
Subject: [SciPy-User] ANN: pandas v0.19.0 released

Hi all,

I'm happy to announce pandas 0.19.0 has been released. This is a major
release from 0.18.1 and includes a number of API changes, several new
features, enhancements, and performance improvements along with a large
number of bug fixes. See the Whatsnew file for more information.

We recommend that all users upgrade to this version.

This is the work of 5 months of development by 117 contributors. A big
thank you to all contributors!

Joris

---

*What is it:*

pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data
both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real world data analysis in Python.
Additionally, it has the broader goal of becoming the most powerful and
flexible open source data analysis / manipulation tool available in any
language.

*Highlights of the 0.19.0 release include:*

- New method merge_asof for asof-style time-series joining, see here
- The .rolling() method is now time-series aware, see here
- read_csv now supports parsing Categorical data, see here
- A function union_categorical has been added for combining categoricals,
  see here
- PeriodIndex now has its own period dtype, and changed to be more
  consistent with other Index classes. See here
- Sparse data structures gained enhanced support of int and bool dtypes,
  see here
- Comparison operations with Series no longer ignore the index, see here
  for an overview of the API changes.
- Introduction of a pandas development API for utility functions, see here.
- Deprecation of Panel4D and PanelND. We recommend representing these
  types of n-dimensional data with the xarray package.
- Removal of the previously deprecated modules pandas.io.data,
  pandas.io.wb, pandas.tools.rplot.

See the Whatsnew file for more information.

*How to get it:*

Source tarballs and windows/mac/linux wheels are available on PyPI (thanks
to Christoph Gohlke for the windows wheels, and to Matthew Brett for
setting up the mac/linux wheels). Conda packages are already available via
the conda-forge channel (conda install pandas -c conda-forge). It will be
available on the main channel shortly.

*Issues:*

Please report any issues on our issue tracker:
https://github.com/pydata/pandas/issues

*Thanks to all the contributors:*

adneu, Adrien Emery, agraboso, Alex Alekseyev, Alex Vig, Allen Riddell,
Amol, Amol Agrawal, Andy R. Terrel, Anthonios Partheniou, babakkeyvani,
Ben Kandel, Bob Baxley, Brett Rosen, c123w, Camilo Cota, Chris, chris-b1,
Chris Grinolds, Christian Hudon, Christopher C. Aycock, Chris Warth,
cmazzullo, conquistador1492, cr3, Daniel Siladji, Douglas McNeil,
Drewrey Lupton, dsm054, Eduardo Blancas Reyes, Elliot Marsden, Evan
Wright, Felix Marczinowski, Francis T. O'Donovan, Gábor Lipták, Geraint
Duck, gfyoung, Giacomo Ferroni, Grant Roch, Haleemur Ali, harshul1610,
Hassan Shamim, iamsimha, Iulius Curt, Ivan Nazarov, jackieleng, Jeff
Reback, Jeffrey Gerard, Jenn Olsen, Jim Crist, Joe Jevnik, John Evans,
John Freeman, John Liekezer, Johnny Gill, John W. O'Brien, John Zwinck,
Jordan Erenrich, Joris Van den Bossche, Josh Howes, Jozef Brandys, Kamil
Sindi, Ka Wo Chen, Kerby Shedden, Kernc, Kevin Sheppard, Matthieu
Brucher, Maximilian Roos, Michael Scherer, Mike Graham, Mortada Mehyar,
mpuels, Muhammad Haseeb Tariq, Nate George, Neil Parley, Nicolas
Bonnotte, OXPHOS, Pan Deng / Zora, Paul, Pauli Virtanen, Paul Mestemaker,
Pawel Kordek, Pietro Battiston, pijucha, Piotr Jucha, priyankjain, Ravi
Kumar Nimmi, Robert Gieseke, Robert Kern, Roger Thomas, Roy Keyes,
Russell Smith, Sahil Dua, Sanjiv Lobo, Sašo Stanovnik, Shawn Heide,
sinhrks, Sinhrks, Stephen Kappel, Steve Choi, Stewart Henderson,
Sudarshan Konge, Thomas A Caswell, Tom Augspurger, Tom Bird, Uwe
Hoffmann, wcwagner, WillAyd, Xiang Zhang, Yadunandan, Yaroslav Halchenko,
YG-Riku, Yuichiro Kaneko, yui-knk, zhangjinjie, znmean, 颜发才 (Yan Facai)
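A minimal sketch of the new merge_asof join mentioned in the highlights,
with made-up data (both frames must be sorted on the key; each trade picks
up the most recent quote at or before its timestamp):

    import pandas as pd

    trades = pd.DataFrame({'time': pd.to_datetime(['2016-05-25 13:30:00.023',
                                                   '2016-05-25 13:30:00.038']),
                           'price': [51.95, 52.00]})
    quotes = pd.DataFrame({'time': pd.to_datetime(['2016-05-25 13:30:00.010',
                                                   '2016-05-25 13:30:00.030']),
                           'bid': [720.50, 720.51]})

    # asof join: match each trade with the last quote whose time <= trade time
    print(pd.merge_asof(trades, quotes, on='time'))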
From charlesr.harris at gmail.com  Mon Oct  3 22:15:24 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 3 Oct 2016 20:15:24 -0600
Subject: [SciPy-User] NumPy 1.11.2 released

Hi All,

I'm pleased to announce the release of Numpy 1.11.2. This release supports
Python 2.6 - 2.7, and 3.2 - 3.5 and fixes bugs and regressions found in
Numpy 1.11.1. Wheels for Linux, Windows, and OSX can be found on PyPI.
Sources are available on both PyPI and Sourceforge.

Thanks to all who were involved in this release. Contributors and merged
pull requests are listed below.

*Contributors to v1.11.2*

- Allan Haldane
- Bertrand Lefebvre
- Charles Harris
- Julian Taylor
- Loïc Estève
- Marshall Bockrath-Vandegrift +
- Michael Seifert +
- Pauli Virtanen
- Ralf Gommers
- Sebastian Berg
- Shota Kawabuchi +
- Thomas A Caswell
- Valentin Valls +
- Xavier Abellan Ecija +

A total of 14 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

*Pull requests merged for v1.11.2*

- #7736: Backport 4619, BUG: many functions silently drop keepdims kwarg
- #7738: Backport 5706, ENH: add extra kwargs and update doc of many MA...
- #7778: DOC: Update Numpy 1.11.1 release notes.
- #7793: Backport 7515, BUG: MaskedArray.count treats negative axes incorrectly
- #7816: Backport 7463, BUG: fix array too big error for wide dtypes.
- #7821: Backport 7817, BUG: Make sure npy_mul_with_overflow_ detects...
- #7824: Backport 7820, MAINT: Allocate fewer bytes for empty arrays.
- #7847: Backport 7791, MAINT,DOC: Fix some imp module uses and update...
- #7849: Backport 7848, MAINT: Fix remaining uses of deprecated Python...
- #7851: Backport 7840, Fix ATLAS version detection
- #7870: Backport 7853, BUG: Raise RuntimeError when reloading numpy is...
- #7896: Backport 7894, BUG: construct ma.array from np.array which contains...
- #7904: Backport 7903, BUG: fix float16 type not being called due to...
- #7917: BUG: Production install of numpy should not require nose.
- #7919: Backport 7908, BLD: Fixed MKL detection for recent versions of...
- #7920: Backport #7911: BUG: fix for issue #7835 (ma.median of 1d)
- #7932: Backport 7925, Monkey-patch _msvccompile.gen_lib_option like...
- #7939: Backport 7931, BUG: Check for HAVE_LDOUBLE_DOUBLE_DOUBLE_LE in...
- #7953: Backport 7937, BUG: Guard against buggy comparisons in generic...
- #7954: Backport 7952, BUG: Use keyword arguments to initialize Extension...
- #7955: Backport 7941, BUG: Make sure numpy globals keep identity after...
- #7972: Backport 7963, BUG: MSVCCompiler grows 'lib' & 'include' env...
- #7990: Backport 7977, DOC: Create 1.11.2 release notes.
- #8005: Backport 7956, BLD: remove __NUMPY_SETUP__ from builtins at end...
- #8007: Backport 8006, DOC: Update 1.11.2 release notes.
- #8010: Backport 8008, MAINT: Remove leftover imp module imports.
- #8012: Backport 8011, DOC: Update 1.11.2 release notes.
- #8020: Backport 8018, BUG: Fixes return for np.ma.count if keepdims...
- #8024: Backport 8016, BUG: Fix numpy.ma.median.
- #8031: Backport 8030, BUG: fix np.ma.median with only one non-masked...
- #8032: Backport 8028, DOC: Update 1.11.2 release notes.
- #8044: Backport 8042, BUG: core: fix bug in NpyIter buffering with discontinuous...
- #8046: Backport 8045, DOC: Update 1.11.2 release notes.

Enjoy,
Chuck
From chris.barker at noaa.gov  Tue Oct  4 14:44:35 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 4 Oct 2016 11:44:35 -0700
Subject: Re: [SciPy-User] [Numpy-discussion] NumPy 1.11.2 released

> I'm pleased to announce the release of Numpy 1.11.2. This release
> supports Python 2.6 - 2.7, and 3.2 - 3.5 and fixes bugs and regressions
> found in Numpy 1.11.1. Wheels for Linux, Windows, and OSX can be found
> on PyPI. Sources are available on both PyPI and Sourceforge.

and on conda-forge: https://anaconda.org/conda-forge/numpy

Hmm, not Windows (darn fortran and openblas!) -- but thanks for getting
that up fast!

And of course, thanks to all in the numpy community for getting this
build out.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker at noaa.gov

From prabhu at aero.iitb.ac.in  Mon Oct 17 02:19:36 2016
From: prabhu at aero.iitb.ac.in (Prabhu Ramachandran)
Date: Mon, 17 Oct 2016 11:49:36 +0530
Subject: [SciPy-User] [ANN] SciPy India 2016: call for papers
Message-ID: <54902ef9-7612-4c2a-ac77-7bb114fad057@aero.iitb.ac.in>

Hello,

We are pleased to announce the SciPy India conference 2016. SciPy India is
an annual conference on using Python for research and education. The
conference is currently in its eighth year and will be held at IIT Bombay
on 10th and 11th December, 2016. The registration and call for papers are
open. Please visit http://scipy.in to register and submit your proposals.

Please spread the word!

Call for Papers
=============

We look forward to your submissions on the use of Python for scientific
computing and education. This includes pedagogy, exploration, modeling,
and analysis from both applied and developmental perspectives. We welcome
contributions from academia as well as industry.

For details on the paper submission please see here:

http://scipy.in/2016/cfp/

Important Dates
================

- Call for proposals end: 20th November 2016
- List of accepted proposals will be published: 1st December 2016.

We look forward to seeing you at SciPy India.

Regards,
Prabhu Ramachandran
(For the SciPy organizing team)
From mansingtakale at gmail.com  Mon Oct 17 04:27:06 2016
From: mansingtakale at gmail.com (mansing takale)
Date: Mon, 17 Oct 2016 13:57:06 +0530
Subject: Re: [SciPy-User] [ANN] SciPy India 2016: call for papers

Dear Sir,

Thank you for the mail. We are interested in participating in SciPy India
2016 along with my students.

Bye, with regards,

Sincerely Yours,
Dr. Mansing V. Takale,
Assistant Professor, Department of Physics,
Shivaji University, Kolhapur-416004, India (M.S.)
Contact: +91-9673041222

From sjm.guzman at gmail.com  Tue Oct 18 16:25:18 2016
From: sjm.guzman at gmail.com (Jose Guzman)
Date: Tue, 18 Oct 2016 22:25:18 +0200
Subject: [SciPy-User] return values of scipy.signal.deconvolve
Message-ID: <0840DF47-0B6C-44B7-9F1E-41EB5D25F7D8@gmail.com>

Dear Scipy users,

I'm trying to understand what scipy.signal.deconvolve is doing, so I may
not have the concepts very clear here. Basically, I'm trying to compute
the times at which an event occurs, based on a template. I created a
notebook to illustrate it:

https://github.com/JoseGuzman/myIPythonNotebooks/blob/master/NEURON/Deconvolution%20example.ipynb

I have a trace where the signal is repeated several times. When I
deconvolve the trace with the signal, I cannot see a relationship between
the number of events and the quotient that scipy.signal.deconvolve is
returning.

I would appreciate some hints here.

Thank you,

Jose
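For reference, a toy sketch of what deconvolve's quotient contains when
the trace is an exact convolution of a template with an impulse train
(made-up data, not Jose's notebook):

    import numpy as np
    from scipy import signal

    template = np.array([1., 2., 1.])        # shape of one event
    impulses = np.zeros(20)
    impulses[[3, 10, 15]] = 1.               # three events at known times
    trace = np.convolve(impulses, template)  # the recorded trace

    quotient, remainder = signal.deconvolve(trace, template)
    # the quotient recovers the impulse train, i.e. the event times
    print(np.nonzero(np.round(quotient, 6))[0])   # -> [ 3 10 15]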
From denis.akhiyarov at gmail.com  Wed Oct 19 00:21:44 2016
From: denis.akhiyarov at gmail.com (Denis Akhiyarov)
Date: Wed, 19 Oct 2016 04:21:44 +0000
Subject: [SciPy-User] Abandoned SciPy Google Group

Why is this google group for scipy not synced with this mailing list?
There is a long list of unresolved questions.

https://groups.google.com/forum/m/#!forum/scipy-user

From robert.kern at gmail.com  Wed Oct 19 00:25:13 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 18 Oct 2016 21:25:13 -0700
Subject: Re: [SciPy-User] Abandoned SciPy Google Group

On Tue, Oct 18, 2016 at 9:21 PM, Denis Akhiyarov wrote:
>
> Why is this google group for scipy not synced with this mailing list?
> There is a long list of unresolved questions.

That group was created by a third party. Presumably, they stopped
maintaining it and keeping the synchronization up to date.

https://mail.scipy.org/pipermail/scipy-user/2007-July/013121.html

--
Robert Kern

From rays at blue-cove.com  Fri Oct 21 15:34:45 2016
From: rays at blue-cove.com (R Schumacher)
Date: Fri, 21 Oct 2016 12:34:45 -0700
Subject: [SciPy-User] how to treat an invalid value, in signal/filter_design.py
Message-ID: <201610211935.u9LJZ7FX002096@blue-cove.com>

In an attempt to computationally invert the effect of an analog RC filter
on a data set and reconstruct the signal prior to the analog front end, a
co-worker suggested: "Mathematically, you just reverse the a and b
parameters. Then the zeros become the poles, but if the new poles are not
inside the unit circle, the filter is not stable."

So then, to "stabilize" the poles issue seen, I test for the DIV/0 error
and set it to 2./N+0.j in scipy/signal/filter_design.py, ~ line 244:

    d = polyval(a[::-1], zm1)
    if d[0] == 0.0+0.j:
        d[0] = 2./N+0.j
    h = polyval(b[::-1], zm1) / d

- Question is, is this a mathematically valid treatment?
- Is there a better way to invert a Butterworth filter, or work with the
  DIV/0 that occurs, without modifying the signal library?

I noted d[0] > 2./N+0.j makes the zero bin result spike low; 2/N gives a
reasonable "extension" of the response curve. The process in general
causes a near-zero offset, however, which I remove with a high pass now;
in a full FFT of a ~megasample one can see that the first 5 bins have run
away.

An example is attached...

Ray Schumacher
Programmer/Consultant

-------------- next part --------------
import numpy as np
from scipy.signal import butter, lfilter, freqz, filtfilt, iirdesign, cheby1
from scipy.ndimage.interpolation import zoom
import matplotlib.pyplot as plt

def butter_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, Wn=normal_cutoff, btype='highpass', analog=False)
    return b, a

def butter_inv_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, Wn=normal_cutoff, btype='highpass', analog=False)
    ## swap the components
    return a, b

def butter_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_highpass(cutoff, fs, order=order)
    y = lfilter(b, a, data)
    return y

def butter_inv_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_inv_highpass(cutoff, fs, order=1)
    offset = data.mean()
    y = lfilter(b, a, data)
    ## remove new offset
    y -= (y.mean() - offset)
    ## high pass filter 3x
    ## make a low pass to subtract
    hpf = .05
    nyq_rate = (fs/2.) / 80.
    gstop, gpass = 20, .1
    b, a = iirdesign(wp=(hpf/2.)/nyq_rate, ws=hpf/nyq_rate,
                     gstop=gstop, gpass=gpass, ftype='cheby1')
    for x in range(3):
        print 'len y', len(y)
        yd = decimate(decimate(decimate(y, 5), 4), 4)
        filtered = filtfilt(b, a, yd, method='gust')
        y = y - rebin(filtered, len(y))
    return y

def decimate(x, q, n=None, axis=-1):
    """
    Downsample the signal by using a filter.

    By default, an order 8 Chebyshev type I filter is used.

    Parameters
    ----------
    x : ndarray
        The signal to be downsampled, as an N-dimensional array.
    q : int
        The downsampling factor.
    n : int, optional
        The order of the filter (1 less than the length for 'fir').
    axis : int, optional
        The axis along which to decimate.
    zero_phase :
        Prevent phase shift by filtering with ``filtfilt`` instead of
        ``lfilter``.

    Returns
    -------
    y : ndarray
        The down-sampled signal.
    """
    if not isinstance(q, int):
        raise TypeError("q must be an integer")
    if n is None:
        n = 8
    b, a = cheby1(n, 0.05, 0.8 / q)
    y = filtfilt(b, a, x, axis=axis)
    sl = [slice(None)] * y.ndim
    sl[axis] = slice(None, None, q)
    return y[sl]

def rebin(oldData, newLen):
    """
    linear transform using scipy interp zoom

    Ex: rebin([1,2,3,4,5], [0,0,0,0,0,0,0,0])

    scipy.ndimage.interpolation.zoom(input, zoom, output=None, order=3,
                                     mode='constant', cval=0.0,
                                     prefilter=True)
    """
    ## there are 1/ratio new bins per old
    ratio = newLen / float(len(oldData))
    newData = zoom(oldData, ratio, output=None, order=3, mode='nearest',
                   prefilter=True)
    return newData

# Filter requirements.
order = 1
fs = 1024.0     # sample rate, Hz
cutoff = 11.6   # desired cutoff frequency of the filter, Hz
nyquist = fs/2.

# Get the filter coefficients so we can check its frequency response.
b, a = butter_highpass(cutoff, fs, order)
bi, ai = butter_inv_highpass(cutoff, fs, order)

# Plot the frequency response.
plt.subplot(2, 1, 1)
w, h = freqz(b, a, worN=8000)
plt.plot(0.5*fs*w/np.pi, np.abs(h), 'g', label='high pass resp')
wi, hi = freqz(bi, ai, worN=8000)
plt.plot(0.5*fs*wi/np.pi, np.abs(hi), 'r', label='inv. high pass resp')
plt.plot(cutoff, 0.5*np.sqrt(2), 'ko')
plt.axvline(cutoff, color='k')
plt.xlim(0, 0.05*fs)
plt.ylim(0, 5)
plt.title("Lowpass Filter Frequency Response")
plt.xlabel('Frequency [Hz]')
# add the legend in the middle of the plot
leg = plt.legend(fancybox=True)
# set the alpha value of the legend: it will be translucent
leg.get_frame().set_alpha(0.5)
plt.subplots_adjust(hspace=0.35)
plt.grid()

# Demonstrate the use of the filter.
# First make some data to be filtered.
T = 5.0             # seconds
n = int(T * fs)     # total number of samples
t = np.linspace(0, T, n, endpoint=False)
# "Noisy" data.  We want to recover the 1.2 Hz signal from this.
data = np.sin(1.2*2*np.pi*t)# + 1.5*np.cos(9*2*np.pi*t) + 0.5*np.sin(12.0*2*np.pi*t)

# Filter the data, and plot both the original and filtered signals.
y = butter_highpass_filter(data, cutoff, fs, order)
yi = butter_inv_highpass_filter(y, cutoff, fs, order)

plt.subplot(2, 1, 2)
plt.plot(t, data, 'b-', label='1.2Hz "real" data')
plt.plot(t, y, 'g-', linewidth=2, label='blue box data')
plt.plot(t, yi, 'r--', linewidth=2, label='round-trip data')
plt.xlabel('Time [sec]')
plt.grid()
# add the legend in the middle of the plot
leg = plt.legend(fancybox=True)
# set the alpha value of the legend: it will be translucent
leg.get_frame().set_alpha(0.5)
plt.subplots_adjust(hspace=0.35)
plt.show()

-------------- next part --------------
Ray Schumacher
Programmer/Consultant
PO Box 182, Pine Valley, CA 91962
(858)248-7232
http://rjs.org/
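As a side note on the stability caveat in Ray's message: after swapping b
and a, the poles of the inverse filter are the roots of the original b, so
the instability can be checked directly (a small sketch, using the same
design parameters as the attachment):

    import numpy as np
    from scipy.signal import butter

    fs, cutoff = 1024.0, 11.6
    b, a = butter(1, cutoff / (fs / 2.), btype='highpass')
    # after swapping b and a, the inverse filter's poles are the roots of b
    poles = np.roots(b)
    print(np.abs(poles))   # a modulus >= 1 (here a pole at z = 1) means unstable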
From ndbecker2 at gmail.com  Tue Oct 25 08:54:59 2016
From: ndbecker2 at gmail.com (Neal Becker)
Date: Tue, 25 Oct 2016 08:54:59 -0400
Subject: [SciPy-User] is doc for stats.chi2 correct?

The expression for the pdf given here:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html#scipy.stats.chi2

doesn't seem to match that given in other references:

https://en.wikipedia.org/wiki/Chi-squared_distribution

or Proakis: Digital Communications, 3rd edition, pg 42.

From josef.pktd at gmail.com  Tue Oct 25 09:23:51 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 25 Oct 2016 09:23:51 -0400
Subject: Re: [SciPy-User] is doc for stats.chi2 correct?

On Tue, Oct 25, 2016 at 8:54 AM, Neal Becker wrote:

> The expression for the pdf given here
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html#scipy.stats.chi2
> doesn't seem to match that given in other references:
> https://en.wikipedia.org/wiki/Chi-squared_distribution

looks the same to me, collect powers of 2

Josef
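Spelling out Josef's remark: with k degrees of freedom, the form in the
scipy docs rearranges into the Wikipedia one by collecting the powers of 2,

    \frac{1}{2\,\Gamma(k/2)} \left(\frac{x}{2}\right)^{k/2-1} e^{-x/2}
      = \frac{x^{k/2-1}}{2 \cdot 2^{k/2-1}\,\Gamma(k/2)}\, e^{-x/2}
      = \frac{x^{k/2-1}\, e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}.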
From ndbecker2 at gmail.com  Tue Oct 25 09:59:47 2016
From: ndbecker2 at gmail.com (Neal Becker)
Date: Tue, 25 Oct 2016 09:59:47 -0400
Subject: Re: [SciPy-User] is doc for stats.chi2 correct?

Of course, you are correct. Sorry for the noise.

From jaredvacanti at gmail.com  Tue Oct 25 17:16:52 2016
From: jaredvacanti at gmail.com (Jared Vacanti)
Date: Tue, 25 Oct 2016 16:16:52 -0500
Subject: [SciPy-User] Generating PDF from 'sampled' pdf

I have an approximation of a PDF (by taking the derivative of an
approximation of the CDF) but can't get scipy to 'interpolate' a
distribution from this data. I conceptually understand the difficulty,
because I'm not looking at observations but at an existing attempt at
the PDF.

I wrote an SO question here asking the same thing:

http://scicomp.stackexchange.com/questions/25311/python-differentiating-cubic-spline-numerically-or-analytically

The link contains a SSCCE with actual data, but I would like to be able
to apply this to other areas of research as well.

Can I fit a probability density function to my attempted "sampled"
collection of one?

Jared Vacanti

From josef.pktd at gmail.com  Tue Oct 25 17:33:01 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 25 Oct 2016 17:33:01 -0400
Subject: Re: [SciPy-User] Generating PDF from 'sampled' pdf

On Tue, Oct 25, 2016 at 5:16 PM, Jared Vacanti wrote:

> Can I fit a probability density function to my attempted "sampled"
> collection of one?

I didn't look carefully, but the first thing I would try is giving up on
interpolation and using a smoothing spline instead, trying some s > 0.

If you want to have derivatives without a lot of variation, then I think
reducing the number of knots would help, given that your underlying
function looks pretty smooth.

scipy has monotonic splines also, if needed.

Josef
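For instance, something along these lines (a sketch with made-up data;
s > 0 trades fidelity for smoothness, and the spline's derivative gives
the pdf):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    x = np.linspace(0., 1., 50)
    # noisy approximation of a CDF (toy data, kept monotone-ish)
    cdf = np.clip(np.sort(x + 0.01 * np.random.randn(50)), 0., 1.)

    spl = UnivariateSpline(x, cdf, s=1e-3)  # smoothing, not interpolation
    pdf = spl.derivative()                  # analytic derivative of the spline
    print(pdf(0.5))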
From nicolas.chopin at ensae.fr  Fri Oct 28 12:53:21 2016
From: nicolas.chopin at ensae.fr (Nicolas Chopin)
Date: Fri, 28 Oct 2016 16:53:21 +0000
Subject: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Hi list,

I'm working on a package that does some complicated Monte Carlo
experiments. The package passes around frozen distributions quite a lot.
Trying to understand why certain parts were so slow, I did a bit of
profiling, and stumbled upon this:

> %timeit x = scipy.stats.norm.rvs(size=1000)
> 10000 loops, best of 3: 49.3 µs per loop

> %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> 1000 loops, best of 3: 512 µs per loop

So a x10 penalty when using a frozen dist, even if the size of the
simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I
cannot replicate this problem on another machine with scipy 0.13.3 and
Ubuntu 14.04 (there is a penalty, but it's much smaller).

In the profiler, I can see that a lot of time is spent doing string
operations (such as expand_tabs) in order to generate the doc. In the
source, I see that this may depend on a certain -OO flag???

I do realise that instantiating a frozen distribution requires some
argument checking and what not, but here it looks too expensive. For my
package, this amounts to hours spent on ... tab extensions?

Anyway, I'd like to ask:
(a) is this a known problem? I could not find anything on-line about this.
(b) is this going to be fixed in some future version of scipy?
(c) is there a way to fix this with *this* version of scipy using the
    flag mentioned in the source, and then how?
(d) or should I instead re-define manually my own distribution objects?
    (it's really convenient for what I'm trying to do to define
    distributions as objects with methods rvs, logpdf, and so on).

Many thanks for reading this! :-)
All the best
From josef.pktd at gmail.com  Fri Oct 28 13:12:41 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 28 Oct 2016 13:12:41 -0400
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Fri, Oct 28, 2016 at 12:53 PM, Nicolas Chopin wrote:

> > %timeit x = scipy.stats.norm.rvs(size=1000)
> > 10000 loops, best of 3: 49.3 µs per loop
>
> > %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> > 1000 loops, best of 3: 512 µs per loop

Can you time here just the rvs call and not the instantiation of the
frozen distribution?

Frozen distributions now have more overhead in the construction, because
a new instance of the distribution is created instead of reusing the
global instance as in older scipy versions. That might still have an
effect in the µs range. (The reason was to avoid the possibility of
spillover of attributes across instances.)

I think we never had any discussion on timing details. Overall, the
overhead of scipy.stats.distributions is not small when the underlying
calculation is fast; e.g. using numpy.random directly for rvs is quite a
bit faster, when the function is available in numpy.

Josef
From nicolas.chopin at ensae.fr  Fri Oct 28 13:21:45 2016
From: nicolas.chopin at ensae.fr (Nicolas Chopin)
Date: Fri, 28 Oct 2016 17:21:45 +0000
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

If I time just the rvs call, then I get essentially the same time as with

> x = scipy.stats.norm.rvs(size=1000)

so yes, it's the initialisation of the frozen distribution that costs so
much. And, in my case, it seems it adds up to quite a lot.

So what you're saying is that there was indeed a recent change that makes
frozen dist creation more expensive? So that's "a feature, not a bug"? In
that case, I will create my own classes. A pity, but well...

Thanks a lot for your prompt answer,
Nicolas
From evgeny.burovskiy at gmail.com  Fri Oct 28 13:28:55 2016
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Fri, 28 Oct 2016 20:28:55 +0300
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Fri, Oct 28, 2016 at 7:53 PM, Nicolas Chopin wrote:

> So a x10 penalty when using a frozen dist, even if the size of the
> simulated vector is 1000.

Why are you including the construction time into your timings? Surely,
if you use frozen distributions for some MC work, you're not recreating
frozen instances in hot loops?

In [4]: %timeit norm.rvs(size=100, random_state=123)
The slowest run took 142.68 times longer than the fastest. This could
mean that an intermediate result is being cached.
10000 loops, best of 3: 74.2 µs per loop

In [5]: %timeit dist = norm(); dist.rvs(size=100, random_state=123)
The slowest run took 4.40 times longer than the fastest. This could
mean that an intermediate result is being cached.
1000 loops, best of 3: 796 µs per loop

In [6]: %timeit dist = norm()
The slowest run took 4.89 times longer than the fastest. This could
mean that an intermediate result is being cached.
1000 loops, best of 3: 672 µs per loop

> (b) is this going to be fixed in some future version of scipy?
> (c) is there a way to fix this with *this* version of scipy using the
>     flag mentioned in the source, and then how?

You could of course try reverting
https://github.com/scipy/scipy/pull/3245 for your local copy of scipy.
It went into scipy 0.14, so this is the likely suspect.
From josef.pktd at gmail.com  Fri Oct 28 13:37:08 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 28 Oct 2016 13:37:08 -0400
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Fri, Oct 28, 2016 at 1:21 PM, Nicolas Chopin wrote:

> So what you're saying is that there was indeed a recent change that
> makes frozen dist creation more expensive? So that's "a feature, not a
> bug"? In that case, I will create my own classes. A pity, but well...

Creating a new instance is a feature. It's still possible that there is
some speedup to be had in the implementation, but AFAIR I didn't see
anything that would have been obvious (a few µs up or down?).

However, given your description that you pass the frozen instances
around, there shouldn't be so much instance creation; otherwise you could
also use the unfrozen global instance of the distributions.

In general, I avoid scipy.stats.distributions in loops for restricted
cases when I don't need the flexibility and input checking, but I don't
think it's worth the effort when we would have to replicate most of
what's already there.

Josef
From nicolas.chopin at ensae.fr  Fri Oct 28 13:37:34 2016
From: nicolas.chopin at ensae.fr (Nicolas Chopin)
Date: Fri, 28 Oct 2016 17:37:34 +0000
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Yes, as I have just said, I agree that it is the creation of the frozen
dist that explains the difference.

I do need to create a *lot* of frozen distributions; there is no way
around that in what I do. Typically, one run may involve O(10^8) frozen
distributions; for each of these I may either simulate a vector (of size
10^2-10^3), or compute the log-pdf of a vector of the same size, or both.
From ralf.gommers at gmail.com  Fri Oct 28 15:03:19 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 29 Oct 2016 08:03:19 +1300
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Sat, Oct 29, 2016 at 6:37 AM, Nicolas Chopin wrote:

> I do need to create a *lot* of frozen distributions; there is no way
> around that in what I do.

Whatever you can do with frozen distributions you can also do with the
regular non-frozen ones, so I doubt that that's true.

> Typically, one run may involve O(10^8) frozen distributions; for each
> of these I may either simulate a vector (of size 10^2-10^3), or compute
> the log-pdf of a vector of the same size, or both.

You haven't explained what's wrong with simply using the rvs() and
logpdf() methods from the distribution instances provided in the stats
namespace.

Ralf
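Concretely, Ralf's suggestion amounts to passing the parameters to each
method call instead of freezing them first, e.g. (a minimal sketch):

    from scipy import stats

    # no frozen instance is created per parameter value
    x = stats.norm.rvs(loc=2.5, scale=0.8, size=1000)
    logp = stats.norm.logpdf(x, loc=2.5, scale=0.8)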
From charlesr.harris at gmail.com  Sat Oct 29 09:06:43 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2016 07:06:43 -0600
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Fri, Oct 28, 2016 at 10:53 AM, Nicolas Chopin wrote:

> In the profiler, I can see that a lot of time is spent doing string
> operations (such as expand_tabs) in order to generate the doc. In the
> source, I see that this may depend on a certain -OO flag???

Did you try running with the -OO flag? Anyone know how well that works?

Chuck
From nicolas.chopin at ensae.fr  Sat Oct 29 10:51:42 2016
From: nicolas.chopin at ensae.fr (Nicolas Chopin)
Date: Sat, 29 Oct 2016 14:51:42 +0000
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

hi,

Charles: no, I didn't, I'm not clear on how to use this flag?

Ralf: since you're asking, I may as well give you more details about my
stuff. Basically, I'd like to do some basic probabilistic programming,
i.e. to give the user the ability to define stochastic models as Python
objects; e.g.

    class MarkovChain(object):
        "abstract class"
        def simulate(self, T):
            path = [0.]  # some arbitrary starting point
            for t in range(1, T):
                path.append(self.M(t, path[t - 1]).rvs())
            return path

    class RandomWalk(MarkovChain):
        def __init__(self, sigma=1.):
            self.sigma = sigma

        def M(self, t, xp):
            return stats.norm(loc=xp, scale=self.sigma)

Here, I define a base class for Markov chains, with a method simulate
that can simulate a trajectory. Then I define a particular (parametric)
sub-class, that of Gaussian random walks.

One part of my package defines an algorithm that takes as an argument
such a *class*, generates many possible parameters (above, sigma), and
for each parameter generates trajectories; sometimes the logpdf or the
ppf functions must be computed as well. Of course, I could ask the user
to provide as an input a function for generating rvs, but then I would
also need to ask for a function for computing the log-pdf, and so on.

In fact, I have a few ideas (and prototype code) on how to extend frozen
distributions so as to do more advanced probabilistic programming, such
as:

* product distributions: prod_dist(stats.beta(3, 2), norm(loc=3)) returns
  an object that corresponds to the distribution of (X, Y), where
  X ~ Beta(3, 2), Y ~ N(3, 1); for instance, if you apply method rvs, you
  obtain a [N, 2] numpy array

* dict distribution: same idea, but returns a record array (or takes a
  record array for logpdf, etc.)

But I'm not sure there's much interest in extending scipy distributions
in this way?

Best

From charlesr.harris at gmail.com  Sat Oct 29 11:49:25 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2016 09:49:25 -0600
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Sat, Oct 29, 2016 at 8:51 AM, Nicolas Chopin wrote:

> hi,
> Charles: no, I didn't, I'm not clear on how to use this flag?

It is passed to cpython and produces *.pyo files without docstrings. It
probably doesn't do what you want if the docstrings are dynamically
generated (I don't know), but it can be checked if the flag was passed to
python, so it should be possible to make docstring generation depend on
it, and it probably should.

Chuck
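For what it's worth, a rough sketch of the prod_dist idea Nicolas
describes above (a hypothetical helper, not something in scipy):

    import numpy as np
    from scipy import stats

    class prod_dist(object):
        "product of independent frozen dists, i.e. the law of (X_1, ..., X_d)"
        def __init__(self, *dists):
            self.dists = dists

        def rvs(self, size=1):
            # one column per component: shape (size, d)
            return np.stack([d.rvs(size=size) for d in self.dists], axis=1)

        def logpdf(self, x):
            # x has shape (N, d); independence means the log-densities add up
            return sum(d.logpdf(x[:, i]) for i, d in enumerate(self.dists))

    xy = prod_dist(stats.beta(3, 2), stats.norm(loc=3))
    samples = xy.rvs(size=100)          # shape (100, 2)
    print(xy.logpdf(samples)[:3])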
From ralf.gommers at gmail.com  Sun Oct 30 03:48:00 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 30 Oct 2016 20:48:00 +1300
Subject: Re: [SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

On Sun, Oct 30, 2016 at 4:49 AM, Charles R Harris wrote:

> It is passed to cpython and produces *.pyo files without docstrings. It
> probably doesn't do what you want if the docstrings are dynamically
> generated (I don't know),

That is handled by doing docstring manipulation inside
``if __doc__ is None:`` checks.

Ralf

> but it can be checked if the flag was passed to python, so it should be
> possible to make docstring generation depend on it, and it probably
> should.
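i.e., roughly this pattern (a sketch of the general idiom, not scipy's
actual code):

    def _finalize_doc(func, params):
        if func.__doc__ is None:
            # running under python -OO: docstrings were stripped,
            # so skip the (expensive) templating work entirely
            return
        func.__doc__ = func.__doc__ % params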