This package implements tools for working with Stochastic Differential Equation (SDE) models.
It includes simulation routines as well as estimation methods based on observed time series.
Conceptually, the information required to describe an SDE can be divided into three groups: model, sampling and data.
The sdelearn class is the main class of this package, dedicated to the interaction with the user.
How to create a sdelearn class?
A sdelearn class is based on three dedicated subclasses, SdeModel, SdeSampling and SdeData, containing information about the model, the sampling structure and the observed data respectively. First, these three classes must be created:

SdeModel: contains information about the SDE model, in particular the “actual” SDE formula. It is assumed to be a parametric model, i.e. the functional form of the model is known up to some parameters. To construct this class the user is required to supply: two functions, a drift function (drift) and a diffusion function (diff); an array-like object mod_shape containing the dimensions of the model in the form [n_var, n_noise], where the first dimension is the number of variables and the second the number of Gaussian noises; a dictionary par_names with keys "drift" and "diffusion" whose values are lists of strings containing all the parameter names appearing in the corresponding drift and diffusion functions, e.g. par_names = {"drift": ["par_dr1", "par_dr2", ...], "diffusion": ["par_di1", "par_di2", ...]} (this argument is optional and parameter names can be set later using the function set_param); a list of strings var_names containing the variable names, automatically set to X0, X1 … X[n_var] if missing.
The mode argument controls the way the model is specified. There are two ways to supply the drift and diffusion components of the model: “symbolic” or “functional” mode.
Symbolic mode. In symbolic mode (mode="sym", the default) the drift and diffusion are supplied as lists of sympy expressions, where all non-constant values, i.e. parameters and state variables, are expressed as sympy symbols. All the mathematical functions used in the expressions have to be imported from sympy, e.g. use sympy.sqrt instead of math.sqrt. The length of the drift list has to match the number of variables in the model, n_var. Similarly, the diff argument has to be a matrix-like object or nested list of length n_var, where the length of diff[0] is n_noise.
Function mode. This is specified by mode="fun". The drift function must be a vector-valued function taking two arguments as input: the state value and the parameters. The input state should be a numeric vector or list, and the parameters should be a dictionary. The length of the value returned by this function must match the number of variables n_var in the model. Similarly, the diffusion function of the model must be supplied as a matrix-valued function, which takes as input the current state and a dictionary containing the parameters. The dimensions of the output of the diffusion function must match the number of variables and noises supplied, i.e. it must be an n_var x n_noise matrix. Drift and diffusion functions can be scalar valued. The parameters must be addressed by name in both these functions, i.e. as keys in a dictionary. Note that names are important here: names used in the drift and diffusion function definitions must be consistent with those supplied as initial values for estimation or simulation (simulate). See the examples for details. As a rule of thumb the models should be supplied as you’d write them with “pen and paper”;
SdeSampling: contains information about the temporal sampling of the data. It is constructed by supplying the time of the initial observation initial (typically initial=0), the last observed time terminal, and one of either delta, the time span between each pair of consecutive observations (assumed constant), or n, the number of points in the grid (including endpoints). If delta is given, the terminal value might not be matched exactly and will be replaced by the largest value in the grid <= terminal. A time grid corresponding to the observation times is automatically generated;
SdeData: contains empirically observed or simulated data. It should be a data frame where each row corresponds to an observation of the time series. The observation times should match the time grid supplied in the sampling information: that is, the number of rows in SdeData.data should be equal to the length of the grid SdeSampling.grid.
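The function-mode conventions for SdeModel described above can be illustrated with a small, self-contained sketch in plain Python. It defines a drift and a diffusion function for a model with mod_shape = [2, 1] and checks their output shapes. The helper check_shapes and the parameter names (a, b, c, sigma1, sigma2) are hypothetical and are not part of sdelearn; the sketch only mirrors the requirements stated in the text.

```python
def drift(x, param):
    # vector-valued: one entry per state variable (n_var = 2)
    return [param["a"] - param["b"] * x[0],
            -param["c"] * x[1]]

def diff(x, param):
    # matrix-valued: n_var rows and n_noise columns (here 2 x 1)
    return [[param["sigma1"]],
            [param["sigma2"] * x[1]]]

def check_shapes(drift, diff, mod_shape, x, param):
    """Hypothetical helper: do the outputs agree with mod_shape = [n_var, n_noise]?"""
    n_var, n_noise = mod_shape
    b = drift(x, param)
    A = diff(x, param)
    return (len(b) == n_var
            and len(A) == n_var
            and all(len(row) == n_noise for row in A))

# parameters are addressed by name, as keys of a single shared dictionary
param = {"a": 1.0, "b": 0.5, "c": 2.0, "sigma1": 0.3, "sigma2": 0.1}
ok = check_shapes(drift, diff, [2, 1], [1.0, 2.0], param)
print(ok)
```

Both functions receive the full parameter dictionary and simply pick out the keys they need, which is why consistent naming matters.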
Finally, an instance of sdelearn can be created as Sde(model=SdeModel, sampling=SdeSampling, data=SdeData), where the value of each of the three arguments is an instance of the corresponding class above. The data argument is optional: data can be added later, e.g. by simulation or by using the setData function.
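The time grid described for SdeSampling can be sketched in a few lines of plain Python. The function make_grid below is a hypothetical helper, not the package's implementation; it only illustrates the two ways of specifying the grid (delta or n) and why, when delta is given, the last grid point may fall short of terminal.

```python
import math

def make_grid(initial, terminal, delta=None, n=None):
    """Hypothetical helper: equally spaced grid built from either delta or n."""
    if (delta is None) == (n is None):
        raise ValueError("supply exactly one of delta or n")
    if n is None:
        # number of whole steps that fit in [initial, terminal];
        # the small tolerance guards against floating point round-off
        steps = math.floor((terminal - initial) / delta + 1e-9)
        n = steps + 1
    else:
        # n counts both endpoints, so there are n - 1 intervals
        delta = (terminal - initial) / (n - 1)
    return [initial + i * delta for i in range(n)]

grid_n = make_grid(0, 1, n=5)             # terminal is matched exactly
grid_delta = make_grid(0, 1, delta=0.3)   # last point is about 0.9, the largest grid value <= 1
print(grid_n)
print(len(grid_delta), grid_delta[-1])
```

With delta=0.3 only three whole steps fit into [0, 1], so the grid has four points and ends at roughly 0.9 rather than at 1.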
Learning model parameters using a SdeLearner
The parameters of an SDE can be estimated using an object of class SdeLearner. Currently available learners are Qmle and Adalasso.
Technical details
This section contains some information about the internal structure of the package
(if you are getting unexpected errors this is a good place to start).

param: when in mode="fun", the typical name for the parameter argument of the drift and diffusion functions. Both functions share the same parameter dictionary, and the full parameter dictionary is passed to both functions; which parameters each function uses is determined by the names referenced inside it. Initially, if the par_names argument is left blank, the model is not aware of what its parameters are. They will be inferred when simulation takes place, without distinction between drift and diffusion parameters. When the simulate method or an estimation method is called, the user has to supply a truep parameter or a starting parameter for the optimization, which acts as a template for the parameter space of the model. Before any estimation takes place the parameter names should be explicitly set.
The SdeLearner class is generic (“abstract”): the user should never use it directly, but should instead use one of the subclasses implementing specific methods.
In numerical computation the dictionary of parameters is converted to arrays. These arrays must match the order of the parameters in the model, which is drift first then diffusion, in lexicographic order. Fit and loss functions should automatically match the supplied values with the order specified in the model: currently automatic reordering is done for the argument param of the loss function, and for start and bounds in model fitting. Note that bounds do not have names, so they are assumed to have the same order as start. The ordered list of parameters can be accessed by Sde.model.param.
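The conversion from a parameter dictionary to an ordered array can be pictured with the following sketch. It reads "drift first then diffusion, in lexicographic order" as sorting the names within each group, which is an assumption about the exact rule; ordered_names and to_array are hypothetical helpers and not functions of the package.

```python
def ordered_names(par_names):
    """Drift names first, then diffusion names, each group sorted lexicographically."""
    return sorted(par_names["drift"]) + sorted(par_names["diffusion"])

def to_array(param, par_names):
    """Hypothetical helper: flatten a parameter dictionary into a list in model order."""
    return [param[name] for name in ordered_names(par_names)]

par_names = {"drift": ["theta.dr10", "theta.dr00"],
             "diffusion": ["theta.di01", "theta.di00"]}
# the dictionary may list parameters in any order
start = {"theta.di00": 0.1, "theta.dr00": 1.0, "theta.di01": 0.2, "theta.dr10": 2.0}

names = ordered_names(par_names)
values = to_array(start, par_names)
print(names)
print(values)
```

Because bounds carry no names, a list of bounds would have to be written in the same order as the entries of start for a reordering of this kind to pair them up correctly.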
Examples
A multivariate model.
Functional mode. This is the direct way to approach SDE modeling with sdelearn.
Define the drift function:
def b(x, param):
    # drift of each component: a constant minus a linear term in the state
    out = [0, 0]
    out[0] = param["theta.dr00"] - param["theta.dr01"] * x[0]
    out[1] = param["theta.dr10"] - param["theta.dr11"] * x[1]
    return out
Define the diffusion function:
def A(x, param):
    # 2 x 2 diffusion matrix with zero off-diagonal entries
    out = [[0, 0], [0, 0]]
    out[0][0] = param["theta.di00"] + param["theta.di01"] * x[0]
    out[1][1] = param["theta.di10"] + param["theta.di11"] * x[0]
    out[1][0] = 0
    out[0][1] = 0
    return out
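Create the Sde object:
# assumes Sde, SdeSampling and SdeModel have been imported from the sdelearn package
sde = Sde(sampling=SdeSampling(initial=0, terminal=1, delta=0.01),
          model=SdeModel(b, A, mod_shape=[2, 2],
                         par_names={"drift": ["theta.dr00", "theta.dr01", "theta.dr10", "theta.dr11"],
                                    "diffusion": ["theta.di00", "theta.di01", "theta.di10", "theta.di11"]},
                         mode="fun"  # function mode, since the default is mode="sym"
                         )
          )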
Set the true value of the parameter and simulate a sample path of the process:
truep = {"theta.dr00": 0, "theta.dr01": 0.5, "theta.dr10": 0, "theta.dr11": 0.5, "theta.di00": 0, "theta.di01": 1, "theta.di10": 0, "theta.di11": 1}
sde.simulate(truep=truep, x0=[1, 2])
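To give an idea of what simulating a sample path involves, here is a minimal Euler-Maruyama sketch in plain Python. The text does not state which discretization scheme simulate uses, so the choice of Euler-Maruyama is an assumption made for illustration; the function euler_maruyama, the small model and its parameter names (k0, k1, s0, s1) are all hypothetical.

```python
import math
import random

def euler_maruyama(drift, diff, param, x0, initial, terminal, delta, seed=0):
    """Simulate one path with the Euler-Maruyama scheme on an equally spaced grid."""
    rng = random.Random(seed)
    n_steps = math.floor((terminal - initial) / delta + 1e-9)
    path = [list(x0)]
    for _ in range(n_steps):
        x = path[-1]
        b = drift(x, param)   # list of length n_var
        A = diff(x, param)    # nested list, n_var x n_noise
        n_noise = len(A[0])
        # independent Gaussian increments with mean 0 and variance delta
        dW = [rng.gauss(0.0, math.sqrt(delta)) for _ in range(n_noise)]
        x_next = [x[i] + b[i] * delta + sum(A[i][j] * dW[j] for j in range(n_noise))
                  for i in range(len(x))]
        path.append(x_next)
    return path

def b(x, param):
    # mean-reverting drift for each component
    return [-param["k0"] * x[0], -param["k1"] * x[1]]

def A(x, param):
    # constant diagonal diffusion matrix
    return [[param["s0"], 0.0], [0.0, param["s1"]]]

truep = {"k0": 0.5, "k1": 0.5, "s0": 1.0, "s1": 1.0}
path = euler_maruyama(b, A, truep, x0=[1, 2], initial=0, terminal=1, delta=0.01)
print(len(path), path[0])
```

The path has one state per grid point, so with initial=0, terminal=1 and delta=0.01 it contains 101 states starting from x0.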
Plot the simulated path:
Symbolic mode.