[SciPy-Dev] Stochastic calculus package

Fri Jun 15 13:51:34 EDT 2018

Hi everybody,

following up on this conversation of some time ago, a pre-release of the
package, renamed SdePy, is now available on pip (
https://pypi.org/project/sdepy  ) and as a project on github (
https://github.com/sdepy/sdepy ), version 1.0.0rc3. A short quickstart
guide (https://sdepy.readthedocs.io/en/latest/intro.html#id2 ) is the
easiest path to see what it does and what it does not. Any feedback,
comments and suggestions are very welcome!

Since last September, the package has evolved quite a bit and had its fair
share of refactoring, going through several 0.x unpublished versions before
stabilizing in its current form (marked as beta). Here are a few takeaways
I got from this thread, which I acted upon:

-  Integrating user defined SDEs turned out to be a main point of interest:
this ‘odeint’ like functionality has been considerably streamlined and
  simplified - formerly, preset processes were mainly in focus, and getting
a new process up and running took a deep dive in a not-so-welcoming API,
now upgraded as well (the former ‘integrator’ has been superseded by a
‘paths_generator’ parent class, and cooperating ‘integrator’ and ‘SDE’
classes).

-  The ‘kfunc’ beast was admittedly a hard sell: all undue dependencies of
the core modules on its machinery have been eliminated, it plays now the
role of an optional decorator for the shortcuts of frequently used stuff
(it’s fully functional though and, in my experience, quite handy).

-  The question “Do you REALLY need a ‘process’ class” clearly stood out in
your comments: after some experimentation, the main reason for answering in
the yes turned out, in my view, to be its seamless interoperability with
the rest of the package, hardly achievable without a dedicated design. Now
process instances can indeed be fed to integrators, both as time-dependent
and/or path-dependent SDE parameters, and as stochasticity sources: this
opened countless hacking opportunities when doing computations, and
spoiling them seemed a pity.

In case any of you is willing to take a look, feel free to get in touch for
any problems/bugs/questions you might have

Cheers
Maurizio

On Mon, 11 Sep 2017 at 19:39, <josef.pktd at gmail.com> wrote:

>
>
> On Mon, Sep 11, 2017 at 12:35 PM, Maurizio Cipollina <
> maurizio.cipollina at gmail.com> wrote:
>
>> Hi Ralf,
>>
>>
>> thanks for going through this, here are some follow up comments on your
>> points:
>>
>>
>>
>> On the two packages you found:
>>
>>
>>    - Sdeint: provides several SDE integration algorithms but no specific
>>    processes; does not cover processes with jumps; offers no explicit
>>    provisions for handling solutions in a number of paths, which is what you
>>    need since the integration output is random variables, not numbers.
>>    - SDE-higham is not a package or tool, but rather a collection of
>>    recipes for a few textbook examples (no jumps either), in the form that an
>>    interactive session might take (set globals, do calculations, plot and
>>    print results).
>>
>>
>>
>> About your questions/concerns:
>>
>>
>>    1. ndarray subclassing: the integrator does not depend on the process
>>    class, and process realizations may be easily tweaked to return an ndarray
>>    instead of a process instance; however, a main point of the package is not
>>    just to produce SDE solutions, but to allow the user to handle them
>>    (almost) effortlessly once obtained: so one might possibly hide the process
>>    class inside the testing suite, which depends on it, but hardly get rid of
>>    it altogether. Indeed, I subclassed ndarray as safely as I could figure
>>    out, but I will surely go through the scipy-dev archives and find out what
>>    advised you against doing so.
>>    2. odeint and scipy.integrate: my view is that odeint churns out
>>    numbers dependant on numbers, while SDE integration churns out random
>>    variables dependant on random variables (the stochasticity sources), so
>>    there is a shift of paradigm and the two schemes do not fit smoothly one
>>    within the other. The step by step integration of SDEs makes sense, and
>>    gets useful, if you can easily control the inputs (eg. correlations among
>>    stochasticity sources fed into different SDEs), generate the output in a
>>    decent number of paths, and easily extract statistics of expressions
>>    dependent on it (hence the flow ‘get a number of paths packaged as a
>>    process instance’ -> ‘do numpy algebra with it’ -> ‘estimate probability
>>    distributions etc. using the process class methods’ -> ‘scale it up with
>>    the montecarlo class in case the needed paths do not fit into memory’).
>>    3. Montecarlo simulations is a big topic indeed, that goes far beyond
>>    this package; what is called the montecarlo class is in fact a tool to
>>    cumulate statistical information extracted from a number of realizations of
>>    a random variable, and is focused on making easy to interpret the output of
>>    SDE integration.
>>    4. Black-Scholes assists one more tests in a rather extensive testing
>>    suite, and might be painlessly dropped.
>>
>>
>>
>> This said, I see your general feeling is that this whole subject is too
>> specialized to fit into scipy at this stage. I suggested otherwise, in my
>> proposal, because stochastic calculus is regarded as one of the pillars of
>> modern probability theory, and covering it into scipy might be relevant to
>> lots of mathematicians and physicists, both practitioners and students,
>> that might otherwise hesitate to adopt, and build dependencies upon, a
>> standalone package. In case you reconsider let me know, I’d be happy to
>> help (now or in the future…).
>>
>>
>>
>> If you have further questions, let me know as well.
>>
>> Cheers
>>
>> Maurizio
>>
>
>
> To partially follow up, similar to Ralph's view.
>
> I think it would be good to have this as a separate package, or at least
> the code on github.
> With distributions like conda it is now much easier to install additional
> packages, once they are established enough and are included in
> distributions or easily pip installable.
>
> The main reason that I think scipy is currently not the appropriate
> package for it is that the design and review would require a lot of work. I
> guess that there are not many maintainers and reviewers that are familiar
> with this area. I also know only of applications in finance and it is not
> clear whether or which parts would be of more general interest.
>
> In my opinion, some core tools for SDE and jump diffusions would fit in
> scipy. But the application support will not or might not fit well in a
> general purpose scientific numerical library. This would be similar to
> other areas where scipy provides the core computational tools, but
> applications are supported by other packages.
>
> For example, the process class might be similar in magnitude to scipy
> stats distributions or signal statespace classes, but I have no idea what
> design for it would fit in scipy.
> Some of the time aggregation properties sound similar to the corresponding
> pandas functions, and it is not clear whether users would want to use a
> separate class or just use pandas instead.
>
> Similarly, the kfunc class sounds like something that doesn't have a
> similar pattern in scipy.
>
> Design changes can be made more easily in a standalone package, while all
> refactorings of classes in scipy are difficult issues because of the
> backwards compatibility policy which requires that the design should be
> largely settled before a merge.
>
> Josef
>
> https://groups.google.com/d/msg/pystatsmodels/xsttiEiyqAg/maR6n4EeAgAJ
> (stackoverflow question has been deleted "Is there a library out there to
> calibrate an Ornstein-Uhlenbeck model?")
>
>
> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/tsa/diffusion.py
> and diffusion2.py
>
>
>
>
>>
>>
>>
>> On 9 September 2017 at 03:31, Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>> Hi Maurizio,
>>>
>>>
>>> On Fri, Sep 8, 2017 at 7:55 PM, Maurizio Cipollina <
>>> maurizio.cipollina at gmail.com> wrote:
>>>
>>>> Hi everybody, and first of all thanks for the great job you have been
>>>> doing, on which I relied *A LOT* over time.
>>>>
>>>> Stochastic calculus and Stochastic Differential Equations are a
>>>> significant branch of mathematics that to my understanding is
>>>> underrepresented in scipy, as well as in the open source python stack at
>>>> large. I may well be missing something, but as I can see one can find out
>>>> there many recipies or collections of recipies, and a number of
>>>> packages/projects skewed towards finance, but no structured framework for
>>>> generating and handling stochastic processes and numerically solving SDEs.
>>>>
>>>
>>> I've never used these, but a quick search turns up two packages:
>>>   https://pypi.python.org/pypi/sdeint
>>>   https://github.com/alu042/SDE-higham
>>> Not very mature probably, but would be good to compare your code with
>>> those.
>>>
>>>
>>>> This is a pity, since scipy/numpy provide efficen implementations of
>>>> all the basic ingredients needed to do the trick. Out of professional need,
>>>> and prompted by this gap, I have developed a general purpose package of
>>>> which I own the copyright, and which I would be happy to release under the
>>>> BSD license as part of scipy, in case there is a consensus among the scipy
>>>> community that it makes sense to do so (the package has no dependencies
>>>> beyond standard library, numpy and scipy, and might probably fit best as a
>>>> subpackage of scipy.stats).
>>>>
>>>> Before setting up a PR, I would be happy to hear your thoughts: I have
>>>> pasted below the package docstring, that should give an overview of its
>>>> scope and limitations. Please note that some of the realized processes
>>>> stray into mathematical finance territory, somehow inevitably since finance
>>>> is one major field of application of stochastic calculus, but the focus is
>>>> the latter, not the former, at least in my intentions.
>>>>
>>>
>>> First thought: this looks like a useful addition to the scientific
>>> Python ecosystem, but is probably too specialized for SciPy (at least at
>>> this stage).
>>>
>>> Based on your docstring below, here are some questions/concerns:
>>> 1. We definitely don't want a new ndarray subclass (there's a pretty
>>> strong consensus at this point that subclassing ndarray is usually not a
>>> good solution), so your `process` doesn't sound attractive.
>>> 2. The `integrator` class sounds a lot like odeint, so maybe this would
>>> fit better with scipy.integrate?
>>> 3. Monte-Carlo simulation is a big and fairly technical topic. There are
>>> whole packages (e.g. PyMC3, emcee) dedicated to this. It may not fit well
>>> with SciPy.
>>> 4. Things specific to finance like Black-Scholes and put/call options
>>> are not a good fit.
>>> 5. Maintainers for this added code. The ODE integrators in
>>> scipy.integrate already suffer from lack of a dedicated maintainer, SDEs
>>> are more specialized so are likely to not be very well-maintained by other
>>> devs than you.
>>>
>>> Without seeing your code it's hard to say for sure, but it looks to me
>>> like it would be better to release your code as a standalone package.
>>>
>>> Cheers,
>>> Ralf
>>>
>>>
>>>
>>>> Thanks in advance for taking your time to go through this, and for your
>>>> comments and suggestions.
>>>> Maurizio
>>>>
>>>> “””
>>>> ===========================================================
>>>> Stochastic - Monte Carlo simulation of stochastic processes
>>>> ===========================================================
>>>>
>>>> This package provides:
>>>>
>>>> 1.  A `process` class, a subclass of `numpy.ndarray` representing
>>>>     a sequence of values in time, realized in one or several paths.
>>>>     Algebraic manipulations and ufunc computations are supported for
>>>>     instances that share the same timeline and comply with numpy
>>>>     broadcasting rules. Interpolation along the timeline is supported
>>>>     via callability of `process` instances. Process-specific
>>>> functionalities,
>>>>     such as averaging and indexing along time or across paths, are
>>>> delegated
>>>>     to process-specific methods, attributes and properties (no
>>>> overriding
>>>>     of `numpy.ndarray` operations).
>>>>
>>>> 2.  The `source` class and its subclasses, stochasticity sources
>>>> providing
>>>>     numerical realizations of the differentials commonly found
>>>>     in stochastic differential equations (SDEs), with or without
>>>>     memory of formerly invoked realizations.
>>>>
>>>> 3.  The `integrator` class, a general framework for numerical SDE
>>>>     integration, intended for subclassing, to assist the definition of
>>>> step by
>>>>     step integration algorithms and the computation of numerical
>>>> realizations of
>>>>     stochastic processes.
>>>>
>>>> 4.  Several objects providing realizations of specific stochastic
>>>> processes
>>>>     along a given timeline and across a requested number of paths.
>>>>     Realizations are obtained via explicit formulae, in case the
>>>> relevant SDE
>>>>     has a closed form solution, or otherwise as `integrator` subclasses
>>>> performing
>>>>     Euler-Maruyama numerical integration.
>>>>
>>>> 5.  A `montecarlo` class, as an aid to cumulate the results of several
>>>>     Monte Carlo simulations of a given stochastic variable, and to
>>>> extract
>>>>     summary estimates for its probability distribution function and
>>>>     statistics, thus supporting simulations across a number of paths
>>>> that exceeds
>>>>     the maximum allowed by available memory.
>>>>
>>>> 6.  Several analytical results relating to the supported stochastic
>>>>     processes, as a general reference and for testing purpouses.
>>>>
>>>> For all sources and realizations, process values can take any shape,
>>>> scalar or multidimensional. Correlated multivariate stochasticity
>>>> sources are
>>>> supported. Time-varying process parameters (correlations, intensity of
>>>> Poisson
>>>> processes, volatilities etc.) are allowed whenever applicable.
>>>> `process` instances act as valid stochasticity source realizations (as
>>>> does
>>>> any callable object complying with a `source` protocol), and may be
>>>> passed as a source specification when computing the realization of a
>>>> given
>>>> process.
>>>>
>>>> Computations are fully vectorized across paths, providing an efficient
>>>> infrastructure for simulating a large number of process realizations.
>>>> Less so, for large number of time steps: integrating 1000 time steps
>>>> across 100000 paths takes seconds, one million time steps across
>>>> 100 paths takes minutes.
>>>>
>>>> General infrastructure
>>>> ======================
>>>> .. autosummary::
>>>>    :toctree: generated/
>>>>
>>>>    process     -- Array representation of a process (a subclass of
>>>>                   -- numpy.ndarray).
>>>>    kfunc       -- Base class for functions with managed keyword
>>>> arguments.
>>>>    source      -- Base class for stochasticity sources.
>>>>    true_source -- Base class for stochasticity sources with memory.
>>>>    integrator  -- General framework for numerical SDE integration.
>>>>    montecarlo  -- Summmary statistics of Monte Carlo simulations.
>>>>
>>>>
>>>> Stochasticity sources
>>>> =====================
>>>> .. autosummary::
>>>>    :toctree: generated/
>>>>
>>>>    wiener              -- dW, a source of standard Wiener process
>>>> (Brownian
>>>>                           -- motion) increments.
>>>>    symmetric_wiener    -- as above, with symmetric paths (averages
>>>> exactly 0).
>>>>    true_wiener         -- dW, a source of standard Wiener process
>>>> (Brownian
>>>>                           -- motion) increments, with memory.
>>>>    poisson             -- dN, a source of Poisson process increments.
>>>>    compound_poisson    -- dJ, a source of compound Poisson process
>>>> increments.
>>>>                           -- (jumps distributed in time as a Poisson
>>>> process,
>>>>                           -- and in size as a given `scipy.stats`
>>>> distribution).
>>>>    compound_poisson_rv -- Preset jump size distributions for compound
>>>> Poisson
>>>>                           -- process increments.
>>>>
>>>>
>>>> Stochastic process realizations
>>>> ===============================
>>>> .. autosummary::
>>>>    :toctree: generated/
>>>>
>>>>    const_wiener_process          -- Wiener process (Brownian motion),
>>>> with
>>>>                                     -- time-independent parameters.
>>>>    const_lognorm_process         -- Lognormal process, with
>>>> time-independent
>>>>                                     -- parameters.
>>>>    wiener_process                -- Wiener process (Brownian motion).
>>>>    lognorm_process               -- Lognormal process.
>>>>    ornstein_uhlenbeck_process    -- Ornstein-Uhlenbeck process
>>>> (mean-reverting
>>>>                                     -- Brownian motion).
>>>>    hull_white_process            -- F-factors Hull-White process (sum
>>>> of F
>>>>                                     -- correlated mean-reverting
>>>> Brownian
>>>>                                     -- motions).
>>>>    hull_white_1factor_process    -- 1-factor Hull-White process (F=1
>>>> Hull-White
>>>>                                     -- process with F-index collapsed
>>>> to a
>>>>                                     -- scalar).
>>>>    cox_ingersoll_ross_process    -- Cox-Ingersoll-Ross mean-reverting
>>>> process.
>>>>    full_heston_process           -- Heston stochastic volatility process
>>>>                                     -- (returns both process and
>>>> volatility).
>>>>    heston_process                -- Heston stochastic volatility process
>>>>                                     -- (stores and returns process
>>>> only).
>>>>    jump_diffusion_process        -- Jump-diffusion process (lognormal
>>>> process
>>>>                                     -- with compound Poisson jumps).
>>>>    merton_jump_diffusion_process -- Merton jump-diffusion process
>>>>                                     -- (jump-diffusion process with
>>>> normal
>>>>                                     -- jump size distribution).
>>>>    kou_jump_diffusion_process    -- Kou jump-diffusion process
>>>> (jump-diffusion
>>>>                                     -- process with double exponential
>>>>                                     -- jump size distribution).
>>>>
>>>> Shortcuts::
>>>>
>>>>    lognorm -- lognorm_process
>>>>    oruh    -- ornstein_uhlenbeck_process
>>>>    hwp     -- hull_white_process
>>>>    hw1f    -- hull_white_1factor_process
>>>>    cir     -- cox_ingersoll_ross_process
>>>>    heston  -- heston_process
>>>>    mjd     -- merton_jump_diffusion_process
>>>>    kou     -- kou_jump_diffusion_process
>>>>
>>>>
>>>> Analytical results
>>>> ==================
>>>> Exact statistics for the realized stochastic processes
>>>> are listed below, limited to the case of constant process parameters
>>>> and with
>>>> some further limitations to the parameters' domains.
>>>> Function arguments are consistent with those of the corresponding
>>>> processes.
>>>> Suffixes `pdf`, `cdf` and `chf` stand respectively for probability
>>>> distribution
>>>> function, cumulative probability distribution function, and
>>>> characteristic
>>>> function. Black-Scholes formulae for the valuation of call and put
>>>> options have been
>>>> included (prefixed with `bs` below).
>>>>
>>>> .. autosummary::
>>>>    :toctree: generated/
>>>>
>>>>    wiener_mean
>>>>    wiener_var
>>>>    wiener_std
>>>>    wiener_pdf
>>>>    wiener_cdf
>>>>    wiener_chf
>>>>
>>>>    lognorm_mean
>>>>    lognorm_var
>>>>    lognorm_std
>>>>    lognorm_pdf
>>>>    lognorm_cdf
>>>>    lognorm_log_chf
>>>>
>>>>    oruh_mean
>>>>    oruh_var
>>>>    oruh_std
>>>>    oruh_pdf
>>>>    oruh_cdf
>>>>
>>>>    hw2f_mean
>>>>    hw2f_var
>>>>    hw2f_std
>>>>    hw2f_pdf
>>>>    hw2f_cdf
>>>>
>>>>    cir_mean
>>>>    cir_var
>>>>    cir_std
>>>>    cir_pdf
>>>>
>>>>    heston_log_mean
>>>>    heston_log_var
>>>>    heston_log_std
>>>>    heston_log_pdf
>>>>    heston_log_chf
>>>>
>>>>    mjd_log_pdf
>>>>    mjd_log_chf
>>>>
>>>>    kou_mean
>>>>    kou_log_pdf
>>>>    kou_log_chf
>>>>
>>>>    bsd1d2
>>>>    bscall
>>>>    bscall_delta
>>>>    bsput
>>>>    bsput_delta
>>>>
>>>> Notes
>>>> =====
>>>> To the benefit of interactive and notebook sessions, and as an aid to
>>>> fluently
>>>> freeze or vary the plurality of parameters that define each stochastic
>>>> process,
>>>> all sources, process realizations and analytical results are objects
>>>> with managed keywords (subclasses of `kfunc`): if the corresponding
>>>> classes
>>>> are instantiated with both variables (non keyword arguments) and
>>>> parameters
>>>> (keyword arguments), they behave as functions returning the
>>>> computations'
>>>> result; if called with parameters only, return an instance that stores
>>>> the set
>>>> parameters, and exposes the same calling pattern (call with variables
>>>> and optionally with parameters to get results, call with parameters only
>>>> to get a new instance storing modified parameters).
>>>> Parameters that are not specified fall back to the class defaults, if
>>>> calling
>>>> the class, or to the instance's stored values, if calling an instance.
>>>> """
>>>>
>>>> _______________________________________________
>>>> SciPy-Dev mailing list
>>>> SciPy-Dev at python.org
>>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>>
>>>>
>>>
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>>
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>
>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20180615/fe7a450f/attachment-0001.html>