Temporal Point Processes (TPP) with JumpProcesses and PointProcesses
JumpProcesses was initially developed to simulate the trajectories of jump processes, so those with a background in point processes might find the nomenclature in the library documentation confusing. In reality, jump and point processes have much in common but diverge in scope. This tutorial covers JumpProcesses from the perspective of point process theory.
In this application tutorial, we show how to interface JumpProcesses with the PointProcesses.jl library, and we leverage this interface to illustrate many aspects usually discussed in point process theory. PointProcesses.jl offers an interface for defining marked temporal point processes (TPPs). We will link JumpProcesses into this interface by implementing SciMLPointProcess, a concrete implementation of PointProcesses' AbstractPointProcess that uses solvers from JumpProcesses and SciML.
TPP Theory
TPPs describe a set of discrete points over continuous time. Conventionally, we assume that time starts at $0$. We can represent a TPP as a random integer measure $N( \cdot )$: this random function counts the number of points in any interval over the real line. For instance, $N([5, 10])$ denotes the number of points (or events) between time $5$ and $10$ inclusive. The number of points in this interval is a random variable. If $N$ is a Poisson process with conditional intensity (or rate) equal to $1$, then $N([5, 10])$ is distributed according to a Poisson distribution with parameter $\lambda = 5$.
For convenience, we denote $N(t) \equiv N[0, t)$ as the number of points since the start of time until $t$, exclusive of $t$. A TPP is simple if at most a single event can occur at any time $t$, that is, $\Pr(N[t, t + \epsilon) > 1) \to 0$ as $\epsilon \downarrow 0$. We can then define the differential of $N$, $dN$, which describes the change in $N(t)$ over an infinitesimal amount of time.
\[dN(t) = \begin{cases} 1 & \text{if } N[t, t + dt) = 1 \\ 0 & \text{if } N[t, t + dt) = 0. \end{cases}\]
Therefore, we can use any TPP to define a stochastic differential equation (SDE) with discontinuous jumps.
\[du = f(u,p,t)dt + g(u,p,t) dW(t) + h(u,p,t) dN(t)\]
In the jump literature, $N$ is usually a Poisson process, which allows for a number of convenient properties for jump modelling. See Björk [2] and Hanson [3] for more theoretical discussions of SDEs with discontinuous jumps.
We can start implementing our TPP interface in Julia by working with the following SDE with discontinuous jumps.
\[du = dN(t)\]
In this case, $u(t)$ is a monotonic function which counts the total number of points since the start of time. Therefore, there is a one-to-one map between $N$ and $u$.
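As a quick illustration of this correspondence, we can sample a path of $u$ for a homogeneous Poisson process by drawing exponential inter-arrival times. This is only a minimal sketch, independent of the interface developed below; the function name `poisson_path` is hypothetical.

```julia
using Random

# Sketch: sample a path of u(t) = N(t) for a homogeneous Poisson process
# with rate λ by drawing Exp(λ) inter-arrival times. u(t) increases by one
# at each point, mirroring du = dN(t).
function poisson_path(λ, tmax; rng = Random.default_rng())
    ts = Float64[]                 # event times
    t = 0.0
    while true
        t += -log(rand(rng)) / λ   # exponential inter-arrival time
        t > tmax && break
        push!(ts, t)
    end
    return ts                      # u(t) is the count of times ≤ t
end

ts = poisson_path(1.0, 50.0)
length(ts)  # close to 50 on average, since E[N[0, 50)] = λ * 50
```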
A TPP is marked if, in addition to the temporal support $\mathbb{R}$, there is a mark space $\mathcal{K}$ such that $N$ is a random integer measure over $\mathbb{R} \times \mathcal{K}$. Intuitively, for every point in the process there is a mark associated with it. If the mark space is discrete, we have a multivariate TPP. There are different ways to interpret TPPs, and we will move between these interpretations throughout the tutorial.
Since TPP sampling is more efficient if we split a marked TPP into a sparsely connected multivariate TPP, we define SciMLPointProcess as a multivariate TPP in which each subcomponent is itself a marked TPP on a continuous mark space. Our structure therefore includes a vector of sub-TPPs, jumps, a vector of mark distributions, mark_dist, and the sparsely connected graph, g.
using JumpProcesses
using PointProcesses
struct SciMLPointProcess{M, J <: JumpProcesses.AbstractJump, G, D, T <: Real} <:
AbstractPointProcess{M}
jumps::Vector{J}
mark_dist::Vector{D}
g::G
p::Any
tmin::T
tmax::T
end
function Base.show(io::IO, pp::SciMLPointProcess)
println(io,
"SciMLPointProcess with $(length(pp.jumps)) processes on the interval [$(pp.tmin), $(pp.tmax)).")
end
In alignment with the PointProcesses API, we define methods for extracting the boundaries of the time interval we will be working with.
PointProcesses.min_time(pp::SciMLPointProcess) = pp.tmin
PointProcesses.max_time(pp::SciMLPointProcess) = pp.tmax
As we want to keep SciMLPointProcess as general as possible, we define two methods for initializing and resetting the parameters of the TPP. The usefulness of these methods will become apparent further below.
params(pp::SciMLPointProcess) = pp.p(pp)
params!(pp::SciMLPointProcess, p) = pp.p(pp, p)
The likelihood of any simple TPP is fully characterized by its conditional intensity $\lambda^\ast (t, k) \equiv \lambda^\ast(t) \times f^\ast (k \mid t)$, where $f^\ast(k \mid t)$ is the mark distribution and $\lambda^\ast(t)$ is the conditional intensity of the ground process which can be further factorized as:
\[\lambda^\ast (t) \equiv \lambda(t \mid H_{t^-} ) = \frac{p^\ast(t)}{1 - \int_{t_n}^{t^-} p^\ast(u) \, du},\]
where $t_n$ is the time of the last event before $t$.
The internal history of the process up to but not including $t$ is $H_{t^-} \equiv \{ (t_n, k_n) \mid 0 \leq t_n < t \}$. Following the convention in the TPP literature, the superscript $\ast$ denotes the conditioning of a function on $H_{t^-}$. So, $p^\ast(t) \equiv p(t \mid H_{t^-})$ is the density function corresponding to the probability of an event taking place at time $t$ given $H_{t^-}$. The mark distribution denotes the density function corresponding to the probability of observing mark $k$ given the occurrence of an event at time $t$ and internal history $H_{t^-}$. In summary, the conditional intensity is the likelihood of observing a point in the next infinitesimal unit of time, given that no point has occurred since the last observed point in $H_{t^-}$. Alternatively, we can interpret it as the number of points we expect to see in the next marginal interval, $E[dN(dt \times d\mu(k))] = \lambda^\ast (t, k) \, dt \, d\mu(k)$. For more details, see Chapter 7 of Daley and Vere-Jones [1].
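As a sanity check of this factorization, consider a homogeneous Poisson process with rate $\lambda$. Given the last event at $t_n$, the waiting time is exponentially distributed, so

\[p^\ast(t) = \lambda e^{-\lambda (t - t_n)}, \qquad \lambda^\ast(t) = \frac{\lambda e^{-\lambda (t - t_n)}}{1 - \left(1 - e^{-\lambda (t - t_n)}\right)} = \lambda,\]

recovering the constant rate, as expected.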
PointProcesses provides a convenient interface for keeping track of the history of a marked TPP called History.
mutable struct History{M, T <: Real}
times::Vector{T}
marks::Vector{M}
tmin::T
tmax::T
end
We define a method for resetting the history, which can be useful during simulation.
function reset!(h::History)
empty!(event_times(h))
empty!(event_marks(h))
end
Marked Hawkes Process
Until now, everything has been fairly theoretical, but we have enough to start working through a concrete example. Throughout this tutorial, we implement one instance of the Hawkes process. Hawkes processes are classic TPP models that show self-exciting behavior, whereby the occurrence of an event increases the likelihood of further events nearby. They are useful for describing earthquakes, gang violence, bank defaults, etc. For a complete treatment of Hawkes processes, see Laub, Lee and Taimre [4].
In this tutorial, we propose a spatial Hawkes process as a simple model for call detail records (CDRs). A CDR registers the time and location at which a user starts a mobile data session. Behind the scenes, any time users see a message from someone in their network, they become more likely to use their phone. In addition, users tend to gravitate around their home, so the locations they visit are spread around their home according to a Gaussian distribution.
Formally, consider a graph $G$ with $V$ nodes. Our Hawkes process is characterized by $V$ TPPs such that the conditional intensity rate of node $i$ connected to a set of nodes $E_i$ in the graph $G$ is given by:
\[\lambda_i^\ast (t) = \lambda + \sum_{j \in E_i} \sum_{t_{n_j} < t} \alpha \exp \left[ -\beta (t - t_{n_j}) \right]\]
This conditional intensity is known as self-exciting because the occurrence of an event at $t_{n_j}$ in a process $j$ will increase the conditional intensity of all processes connected to it by $\alpha$. This influence then decays at rate $\beta$.
The conditional intensity of this process has a recursive formulation, which we can use to our advantage to significantly speed up simulation. Let $t_{N_i} = \max \{t_{n_j} < t \mid j \in E_i\}$ and define $\phi_i^\ast(t)$ as below.
\[\begin{split} \phi_i^\ast (t) &= \sum_{j \in E_i} \sum_{t_{n_j} < t} \alpha \exp \left[-\beta (t - t_{N_i} + t_{N_i} - t_{n_j}) \right] \\ &= \exp \left[ -\beta (t - t_{N_i}) \right] \sum_{j \in E_i} \sum_{t_{n_j} \leq t_{N_i}} \alpha \exp \left[-\beta (t_{N_i} - t_{n_j}) \right] \\ &= \exp \left[ -\beta (t - t_{N_i}) \right] \left( \alpha + \phi_i^\ast (t_{N_i}) \right) \end{split}\]
Then the conditional intensity can be re-written in terms of $\phi_i^\ast (t_{N_i})$.
\[\lambda_i^\ast (t) = \lambda + \phi_i^\ast (t) = \lambda + \exp \left[ -\beta (t - t_{N_i}) \right] \left( \alpha + \phi_i^\ast (t_{N_i}) \right)\]
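As a quick numerical check that the recursion agrees with the direct double sum, consider a single node whose connected history is a handful of event times (the values of $\alpha$, $\beta$ and the event times below are arbitrary assumptions of this sketch, as are the helper names):

```julia
# Check that the recursive form of ϕ agrees with the direct double sum
# for one node; every event in ts is assumed connected to the node.
α, β = 0.1, 2.0
ts = [0.3, 1.1, 2.5, 4.0]  # hypothetical past event times

ϕ_direct(t) = sum(α * exp(-β * (t - tn)) for tn in ts if tn < t; init = 0.0)

function ϕ_recursive(t)
    ϕ, T = 0.0, 0.0
    for tn in ts
        tn >= t && break
        ϕ = α + exp(-β * (tn - T)) * ϕ  # fold the new event into ϕ
        T = tn                          # remember the last event time
    end
    return exp(-β * (t - T)) * ϕ        # decay from the last event to t
end

ϕ_direct(5.0) ≈ ϕ_recursive(5.0)  # true
```

This is exactly the bookkeeping the tutorial's `affect!` performs with the vectors `ϕ` and `T`.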
We translate these expressions to Julia by employing a closure which will allow us to obtain the rate for each node $i$ in $G$.
function hawkes_rate(i::Int, g)
function rate(u, p, t)
(; λ, α, β, ϕ, T) = p
return λ + exp(-β * (t - T[i])) * ϕ[i]
end
return rate
end
We assume that each sub-process $i$ is a marked TPP. With probability $(1 - \omega^\ast(t))$, we draw a mark from a 2-dimensional Gaussian distribution centered at $\mu_i$ with standard deviation $\sigma_1$; and with probability $\omega^\ast(t)$, we draw from a 2-dimensional Gaussian distribution centered at the last location visited by $i$ with standard deviation $\sigma_2$. Like the conditional intensity, $\omega^\ast(t)$ decays exponentially at rate $\beta$.
\[\omega^\ast(t) = \exp [ -\beta (t - t_{n_i}) ]\]
In other words, node $i$ tends to gravitate around its home location, but given some recent activity the node will be more likely to be close to its most recent location. Let $k \equiv (i, m)$, then the mark distribution can be represented as a mixture distribution.
\[\begin{split} f^\ast(k \mid t) = &\frac{\lambda_i^\ast (t)}{\sum_i \lambda_i^\ast (t)} \times \\ &\left[ (1 - \omega^\ast(t)) \frac{1}{2\pi\sigma_1^2} \exp \left( -\frac{(m - \mu_i)^\top(m - \mu_i)}{2 \sigma_1^2} \right) + \right. \\ &\left. \omega^\ast(t) \frac{1}{2\pi\sigma_2^2} \exp \left( -\frac{(m - m_{n_i})^\top(m - m_{n_i})}{2 \sigma_2^2} \right) \right], \end{split}\]
where $m_{n_i}$ denotes the last location visited by node $i$.
In Julia, we define a method for constructing Hawkes jumps. JumpProcesses defines different types of jumps that vary according to the behavior of the conditional intensity and the intended simulation algorithm. Since the conditional intensity of our Hawkes process is not constant, we use VariableRateJump to construct the Hawkes jumps. The structure requires a rate, which we defined above, and an affect!, which tells the program what happens when a jump occurs and is where we draw the marks from our distribution. In addition, since we intend to use the Coevolve aggregator for simulation (see below), we need to define a rate upper bound, the interval over which the upper bound is valid and, optionally, a lower bound for improved simulation efficiency.
function hawkes_jump(i::Int, g, mark_dist)
rate = hawkes_rate(i, g)
urate = rate
lrate(u, p, t) = p.λ
rateinterval = (u, p, t) -> begin
_lrate = lrate(u, p, t)
_urate = urate(u, p, t)
return _urate == _lrate ? typemax(t) : 1 / (2 * _urate)
end
function affect!(integrator)
(; λ, α, β, ϕ, M, T, h) = integrator.p
ω = exp(-β * (integrator.t - T[i]))
for j in g[i]
ϕ[j] = α + exp(-β * (integrator.t - T[j])) * ϕ[j]
T[j] = integrator.t
end
# draw the mark: home distribution w.p. (1 - ω), near the last location w.p. ω
m = rand() > ω ? rand(mark_dist[i]) : rand(MvNormal(collect(M[i]), [0.01, 0.01]))
M[i] = tuple(m...)  # track the last location visited by node i
push!(h, integrator.t, (i, m); check = false)
end
return VariableRateJump(rate, affect!; lrate, urate, rateinterval)
end
To initialize SciMLPointProcess, we also need to define a function for returning and resetting the parameters of our model.
function hawkes_p(pp::SciMLPointProcess{M, J, G, D, T}) where {M, J, G, D, T}
g = pp.g
tmin = pp.tmin
tmax = pp.tmax
h = History(; times = T[], marks = Tuple{Int, M}[], tmin = tmin, tmax = tmax)
return (λ = 0.5,
α = 0.1,
β = 2.0,
ϕ = zeros(T, length(g)),
M = map(d -> tuple(mean(d)...), pp.mark_dist),
T = zeros(T, length(g)),
h = h)
end
function hawkes_p(pp::SciMLPointProcess{M, J, G, D, T}, p) where {M, J, G, D, T}
reset!(p.h)
p.ϕ .= zero(p.ϕ)
p.M .= map(d -> tuple(mean(d)...), pp.mark_dist)
p.T .= zero(p.T)
end
Now, we are ready to initialize our SciMLPointProcess as a Hawkes process.
using Graphs
using Distributions
V = 10
G = erdos_renyi(V, 0.2)
g = [[[i]; neighbors(G, i)] for i in 1:nv(G)]
mark_dist = [MvNormal(rand(2), [0.2, 0.2]) for i in 1:nv(G)]
jumps = [hawkes_jump(i, g, mark_dist) for i in 1:nv(G)]
tspan = (0.0, 50.0)
hawkes = SciMLPointProcess{
Vector{Real},
eltype(jumps),
typeof(g),
eltype(mark_dist),
eltype(tspan)
}(jumps,
mark_dist,
g,
hawkes_p,
tspan[1],
tspan[2])
SciMLPointProcess with 10 processes on the interval [0.0, 50.0).
Sampling
JumpProcesses shines in the simulation of SDEs with discontinuous jumps. The mapping introduced in the first section, whereby $du = dN(t)$, implies that JumpProcesses also excels at simulating TPPs.
JumpProcesses offers a plethora of simulation algorithms for TPPs. The library calls them aggregators because these algorithms are methods for aggregating a set of jumps to determine the next jump time. In Jump Aggregators for Exact Simulation, we discuss the trade-offs between different simulation algorithms.
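To build intuition for how a bounded-rate aggregator works, here is a bare-bones sketch of Ogata's thinning algorithm for a single exponential Hawkes node. It exploits the fact that the intensity decays between events, so its current value is a valid upper bound for rejection sampling. This is only an illustration under those assumptions (the function name `thinning_hawkes` is hypothetical); Coevolve is more general and more efficient.

```julia
using Random

# Ogata-style thinning for one Hawkes node with intensity
# λ*(t) = λ + Σ α exp(-β (t - tn)). Between events λ* is decreasing,
# so its value at the current time bounds it until the next acceptance.
function thinning_hawkes(λ, α, β, tmax; rng = Random.default_rng())
    ts = Float64[]
    λstar(t) = λ + sum(α * exp(-β * (t - tn)) for tn in ts; init = 0.0)
    t = 0.0
    while t < tmax
        B = λstar(t)                 # upper bound, valid until the next event
        t += -log(rand(rng)) / B     # candidate from a rate-B Poisson process
        t > tmax && break
        rand(rng) * B <= λstar(t) && push!(ts, t)  # accept w.p. λ*(t)/B
    end
    return ts
end

ts = thinning_hawkes(0.5, 0.1, 2.0, 50.0)
```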
To simulate a SciMLPointProcess, we start by overloading Base.rand. In our implementation, we initialize a JumpProblem with the jumps and parameters passed to the SciMLPointProcess, as well as the desired simulation algorithm. In this tutorial, we use the Coevolve aggregator, an algorithm inspired by Ogata's thinning algorithm for bounded TPPs, with modifications to improve efficiency and to allow the TPP to evolve concurrently with simulation time.
Finally, we sample a path from our JumpProblem using the SSAStepper. A stepper tells the solver how to step through time. When simulating TPPs, we do not need to evolve time incrementally by small deltas; the SSAStepper allows us to step through time one candidate point at a time.
using OrdinaryDiffEq
using Random
function Base.rand(rng::AbstractRNG, pp::SciMLPointProcess)
return rand(rng, pp, min_time(pp), max_time(pp), 1)[1]
end
function Base.rand(rng::AbstractRNG, pp::SciMLPointProcess, n::Int)
return rand(rng, pp, min_time(pp), max_time(pp), n)
end
function Base.rand(pp::SciMLPointProcess, n::Int)
return rand(Random.default_rng(), pp, min_time(pp), max_time(pp), n)
end
function Base.rand(rng::AbstractRNG,
pp::SciMLPointProcess{M, J, G, D, T},
tmin::T,
tmax::T,
n::Int) where {M, J, G, D, T <: Real}
tspan = (tmin, tmax)
save_positions = (false, false)
out = Array{History, 1}(undef, n)
p = params(pp)
dprob = DiscreteProblem([0], tspan, p)
jprob = JumpProblem(dprob, Coevolve(), pp.jumps...; dep_graph = pp.g, save_positions, rng)
for i in 1:n
params!(pp, p)
solve(jprob, SSAStepper())
out[i] = deepcopy(p.h)
end
return out
end
We can easily sample the Hawkes process introduced in the previous section.
h = rand(hawkes)
History{Tuple{Int64, Vector{Real}},Float64} with 285 events on interval [0.0, 50.0)
It would be useful to visualize our sampled points to get a feel for the simulation. Since PointProcesses does not offer recipes for visualizing TPP samples, we propose a few options. First, we visualize our sample as a sequence of points through time; each line represents the realization of one of the sub-TPPs of SciMLPointProcess.
using Plots
@userplot BarcodePlot
@recipe function f(x::BarcodePlot)
h, ix = x.args
times = event_times(h)
marks = event_marks(h)
histories = [times[filter(n -> marks[n][1] == i, 1:length(times))] for i in ix]
seriestype := :scatter
yticks --> (1:length(ix), ix)
alpha --> 0.5
markerstrokewidth --> 0
markerstrokealpha --> 0
markershape --> :circle
ylims --> (0.0, length(ix) + 1.0)
for i in 1:length(ix)
@series begin
label := ix[i]
histories[i], fill(i, length(histories[i]))
end
end
end
Let's visualize our data as a barcode.
barcodeplot(h, 1:10; legend = false)
In reality, we do not observe which component generated which point. Therefore, a fairer representation of the data is a scatter plot. We also plot the latent home locations, which show that points are indeed clustered around them.
scatter(map(m -> tuple(m[2]...), event_marks(h)), label = "events")
scatter!(map(d -> tuple(mean(d)...), mark_dist), label = "cluster epicentre")
The Conditional Intensity
The first section of this tutorial introduced the conditional intensity. Here, we delve deeper into this concept, which plays a prominent role in TPP theory due to its ability to fully characterize any TPP.
The conditional intensity is a stochastic process since it depends on the realization of $N$. Given a history of events $H_{t^-}$, we can recover the conditional intensity of any jump by evaluating its rate function. We take advantage of OrdinaryDiffEq to build a DiscreteProblem, which we solve with the FunctionMap stepper; this stepper steps through time, stopping at each event and at any additional times specified by the user. Our method computes the conditional intensity of each sub-TPP of the SciMLPointProcess and returns the solution of the DiscreteProblem.
function intensity(pp::SciMLPointProcess, t, h; saveat = [], save_positions = (true, true))
p = params(pp)
times = event_times(h)
marks = event_marks(h)
tmin, tmax = min_time(pp), t
tstops = typeof(saveat) <: Number ? collect(tmin:saveat:tmax) : copy(saveat)
append!(tstops, times)
sort!(tstops)
unique!(tstops)
rates(u, p, t) = [jump.rate(nothing, p, t) for jump in pp.jumps]
condition(u, t, integrator) = t ∈ times
function affect!(integrator)
n = searchsortedfirst(times, integrator.t)
mark = marks[n]
ix = mark[1]
pp.jumps[ix].affect!(integrator)
# overwrite history with true mark
integrator.p.h.marks[n] = mark
integrator.u = rates(integrator.u, integrator.p, integrator.t)
end
callback = DiscreteCallback(condition, affect!; save_positions)
dprob = DiscreteProblem(rates, rates(nothing, p, tmin), (tmin, tmax), p; callback,
tstops, saveat)
sol = solve(dprob, FunctionMap())
return sol
end
As an illustration, we plot the conditional intensity of the first sub-TPP of our sampled Hawkes process.
λ = intensity(hawkes, max_time(hawkes), h; saveat = 0.1)
plot(λ, idxs = 1)
Now, we can specialize a number of functions required by the interface defined in PointProcesses.
function ground_intensity(pp::SciMLPointProcess, t, h)
λ = intensity(pp, t, h; saveat = [t], save_positions = (false, false))
ground_intensity(pp, t, λ)
end
function ground_intensity(pp::SciMLPointProcess, t, λ::ODESolution)
return vec(sum(λ(t), dims = 1))
end
function ground_intensity(pp::SciMLPointProcess, t::T, λ::ODESolution) where {T <: Number}
return sum(λ(t))
end
function mark_distribution(pp::SciMLPointProcess, t, h)
λ = intensity(pp, t, h)
mark_distribution(pp, t, λ)
end
function mark_distribution(pp::SciMLPointProcess, t, λ)
λt = λ(t)
d = MixtureModel(pp.mark_dist, λt ./ sum(λt))
return d
end
function intensity(pp::SciMLPointProcess, m, t, h)
λ = intensity(pp, t, h)
intensity(pp, m, t, λ)
end
function intensity(pp::SciMLPointProcess, m, t, λ::ODESolution)
λt = λ(t)
return sum([densityof(pp.mark_dist[i], m) * λt[i] for i in 1:length(pp.jumps)])
end
function log_intensity(pp::SciMLPointProcess, m, t, h)
return log(intensity(pp, m, t, h))
end
We can then visualize the ground intensity.
plot(ground_intensity(hawkes, λ.t, λ))
The Compensator
The compensator is defined as the integral of the conditional intensity.
\[\Lambda(t, k) \equiv \int_0^t \lambda^\ast (u, k) du\]
With some abuse of notation, we obtain the compensator of the ground process by integrating over the marks.
\[\Lambda(t) \equiv \int_0^t \sum_{k \in \mathcal{K}} \lambda^\ast (u, k) du\]
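For the exponential kernel of our Hawkes process, each term of the intensity integrates in closed form, since $\int_{t_{n_j}}^t \alpha e^{-\beta (u - t_{n_j})} \, du = \frac{\alpha}{\beta} \left( 1 - e^{-\beta (t - t_{n_j})} \right)$. The compensator of node $i$ is therefore

\[\Lambda_i(t) = \int_0^t \lambda_i^\ast (u) \, du = \lambda t + \sum_{j \in E_i} \sum_{t_{n_j} < t} \frac{\alpha}{\beta} \left( 1 - e^{-\beta (t - t_{n_j})} \right),\]

which is exactly the expression implemented analytically further below.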
In Julia, we can derive the compensator by simply integrating over the conditional intensity using an ODEProblem.
function integrated_intensity(pp::SciMLPointProcess,
t,
h;
alg = nothing,
saveat = [],
save_positions = (true, true))
p = params(pp)
times = event_times(h)
marks = event_marks(h)
tspan = (min_time(pp), t)
rates(u, p, t) = [jump.rate(nothing, p, t) for jump in pp.jumps]
condition(u, t, integrator) = t ∈ times
function affect!(integrator)
n = searchsortedfirst(times, integrator.t)
mark = marks[n]
ix = mark[1]
pp.jumps[ix].affect!(integrator)
integrator.p.h.marks[n] = mark
end
callback = DiscreteCallback(condition, affect!; save_positions)
prob = ODEProblem(rates,
zeros(eltype(times), length(pp.jumps)),
tspan,
p;
tstops = times,
callback,
saveat)
sol = solve(prob, alg)
return sol
end
The PointProcesses interface expects us to define a method for computing the compensator of the ground process over a given interval. This will be useful in the next section when we compute the log-likelihood.
function integrated_ground_intensity(pp::SciMLPointProcess, h, a, b)
Λ = integrated_intensity(pp,
b,
h;
alg = Rodas4P(),
saveat = [a, b],
save_positions = (false, false))
return sum(Λ(b)) - sum(Λ(a))
end
In practice, we must pay close attention to the shape of the conditional intensity. In the Hawkes case, the problem is highly stiff, as events cause the intensity to spike. Therefore, we must select an ODE solver that can handle stiff problems, such as Rodas4P().
Λ = integrated_intensity(hawkes,
max_time(hawkes),
h;
saveat = event_times(h),
alg = Rodas4P())
plot(Λ)
The compensator of the Hawkes process lends itself to an analytical solution, which can be implemented in Julia as a DiscreteProblem much like the conditional intensity above.
function hawkes_integrated_intensity(pp::SciMLPointProcess, t, h; saveat = [])
(; λ, α, β) = params(pp)
p = (λ = λ, α = α, β = β, h = h)
saveat = typeof(saveat) <: Number ? collect(min_time(h):saveat:t) : copy(saveat)
tstops = copy(event_times(h))
function compensator(u, p, t)
(; λ, α, β, h) = p
u = zeros(typeof(t), length(pp.jumps))
for (i, Ei) in enumerate(pp.g)
u[i] += λ * t
for (_t, (_j, _)) in zip(event_times(h), event_marks(h))
_t >= t && break
_j ∉ Ei && continue
u[i] += (α / β) * (1 - exp(-β * (t - _t)))
end
end
return u
end
dprob = DiscreteProblem(compensator, zeros(eltype(event_times(h)), length(pp.jumps)),
(min_time(h), t), p; tstops, saveat)
sol = solve(dprob, FunctionMap())
return sol
end
hawkes_integrated_intensity (generic function with 1 method)
Λ_exact = hawkes_integrated_intensity(hawkes, max_time(hawkes), h)
plot!(Λ_exact)
Time-change theorems for TPPs show that the compensator transforms a complex TPP into a Poisson process with unit rate. We can interpret the compensator as making the TPP as random as possible. We can check that this is indeed the case for our model and simulation algorithm.
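As a toy illustration of the time-change theorem, consider an inhomogeneous Poisson process with rate $\lambda(t) = 2t$ and compensator $\Lambda(t) = t^2$. Sampling it by inversion and then applying $\Lambda$ recovers unit-exponential inter-arrival times. The rate, seed and sample size below are assumptions of this sketch, not part of the tutorial's model.

```julia
using Random, Statistics

# Time-change check for λ(t) = 2t, Λ(t) = t². By inversion, the event
# times are tₙ = √Λₙ where Λₙ are unit-rate Poisson arrival times;
# applying the compensator must give back unit-rate arrivals.
rng = Xoshiro(1)
Λn = cumsum(-log.(rand(rng, 10_000)))  # unit-rate Poisson arrival times
ts = sqrt.(Λn)                         # events of the λ(t) = 2t process
Δt̃ = diff(ts .^ 2)                     # inter-arrivals after the time change
mean(Δt̃)                               # ≈ 1, matching Exponential(1)
```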
First, we define a method for filtering the history.
function Base.filter(f, h::History)
times = event_times(h)
marks = event_marks(h)
filtered_times = eltype(times)[]
filtered_marks = eltype(marks)[]
for (t, mark) in zip(times, marks)
if f(t, mark)
push!(filtered_times, t)
push!(filtered_marks, mark)
end
end
return History(;
times = filtered_times,
marks = filtered_marks,
tmin = min_time(h),
tmax = max_time(h))
end
And a recipe for QQ-plots.
@userplot QQPlot
@recipe function f(x::QQPlot)
empirical_quant, expected_quant = x.args
max_empirical_quant = maximum(maximum, empirical_quant)
max_expected_quant = maximum(expected_quant)
upperlim = ceil(maximum([max_empirical_quant, max_expected_quant]))
@series begin
seriestype := :line
linecolor := :lightgray
label --> ""
(x) -> x
end
@series begin
seriestype := :scatter
aspect_ratio := :equal
xlims := (0.0, upperlim)
ylims := (0.0, upperlim)
xaxis --> "Expected quantile"
yaxis --> "Empirical quantile"
markerstrokewidth --> 0
markerstrokealpha --> 0
markersize --> 1.5
size --> (400, 500)
label --> permutedims(["quantiles $i" for i in 1:length(empirical_quant)])
expected_quant, empirical_quant
end
end
To produce the QQ-plots, we run the simulation a reasonable number of times to ensure an accurate estimate of the empirical quantiles of the inter-arrival times. The time_change method comes from PointProcesses and applies the compensator to the input history. We take advantage of the fact that the compensator is an ODESolution, which is itself callable as an interpolant, to produce the QQ-plot for the ground process.
Δt̃ = []
for _ in 1:250
_h = rand(hawkes)
_Λ = integrated_intensity(hawkes,
max_time(hawkes),
_h;
saveat = event_times(_h),
alg = Rodas4P())
_h̃ = time_change(_h, (t) -> sum(_Λ(t)))
append!(Δt̃, diff(event_times(_h̃)))
end
empirical_quant = quantile(Δt̃, 0.01:0.01:0.99)
expected_quant = quantile(Exponential(1.0), 0.01:0.01:0.99)
qqplot(empirical_quant, expected_quant; legend = false)
Likewise, we can produce the QQ-plot for each sub-TPP.
Δt̃ = [[] for _ in 1:nv(G)]
for _ in 1:250
_h = rand(hawkes)
_Λ = integrated_intensity(hawkes,
max_time(hawkes),
_h;
saveat = event_times(_h),
alg = Rodas4P())
for i in 1:nv(G)
_h̃ = time_change(filter((t, mark) -> mark[1] == i, _h), (t) -> _Λ(t; idxs = i))
append!(Δt̃[i], diff(event_times(_h̃)))
end
end
empirical_quant = map(t -> quantile(t, 0.01:0.01:0.99), Δt̃)
expected_quant = quantile(Exponential(1.0), 0.01:0.01:0.99)
qqplot(empirical_quant, expected_quant)
The Log-Likelihood
Once we know how to compute the conditional intensity and the compensator, it is very easy to obtain the log-likelihood of a TPP: it is simply the sum of the log-conditional intensity at the event times minus the compensator evaluated at the end of the observed period. For a full derivation, see Chapter 7 of Daley and Vere-Jones [1].
\[\ell (H_{T^-}) = \sum_{n=1}^{N(T)} \log \lambda^\ast(t_n, k_n) - \int_0^T \sum_{k \in \mathcal{K}} \lambda^\ast (u, k) \, du\]
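For example, for a homogeneous Poisson process with rate $\lambda$ observed on $[0, T]$, the log-likelihood reduces to a closed form whose maximizer is the familiar event-rate estimator:

\[\ell = N(T) \log \lambda - \lambda T, \qquad \frac{\partial \ell}{\partial \lambda} = \frac{N(T)}{\lambda} - T = 0 \implies \hat{\lambda} = \frac{N(T)}{T}.\]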
In Julia, we follow the PointProcesses API to define a method for computing the log-likelihood.
using DensityInterface
function DensityInterface.logdensityof(pp::SciMLPointProcess, h)
T = max_time(pp)
times = event_times(h)
marks = event_marks(h)
Λ = integrated_ground_intensity(pp, h, min_time(h), T)
λ = intensity(pp, T, h; saveat = times, save_positions = (false, false))
logλ = 0
for (t, (i, m)) in zip(times, marks)
logλ += log_intensity(pp, m, t, λ)
end
return logλ - Λ
end
This allows us to compute the log-likelihood of our sample.
logdensityof(hawkes, h)
100.64328812094442
This tutorial demonstrated the versatility of the SciML libraries for modeling TPPs. We made extensive use of JumpProcesses and OrdinaryDiffEq to fulfill the interface specified by the PointProcesses library. However, we left out a discussion of fitting the parameters of a TPP to data. In many cases, we cannot derive an analytical estimator via maximum likelihood, and we must turn to machine learning tools such as gradient descent and expectation-maximization to maximize the log-likelihood function. Julia offers great tools for these tasks, such as Optimization and Flux, which are outside the scope of this tutorial.
References
1. D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods, Springer-Verlag (2003). doi:10.1007/b97277.
2. T. Björk, Point Processes and Jump Diffusions: An Introduction with Finance Applications, Cambridge University Press (2021). doi:10.1017/9781009002127.
3. F. B. Hanson, Applied Stochastic Processes and Control for Jump-Diffusions: Modeling, Analysis and Computation, Society for Industrial and Applied Mathematics (2007). doi:10.1137/1.9780898718638.
4. P. J. Laub, Y. Lee and T. Taimre, The Elements of Hawkes Processes, Springer International Publishing (2021). doi:10.1007/978-3-030-84639-8.