Structural Identifiability Analysis

During parameter fitting, parameter values are inferred from data. Parameter identifiability refers to whether inferring parameter values for a given model is mathematically feasible. Ideally, parameter fitting should always be accompanied with an identifiability analysis of the problem.

Identifiability can be divided into structural and practical identifiability^[1]. Structural identifiability considers only the mathematical model, and which parameters are and are not inherently identifiable due to model structure. Practical identifiability also considers the available data, and determines what system quantities can be inferred from it. In the idealised case of an infinite amount of non-noisy data, practical identifiability converges to structural identifiability. Generally, structural identifiability is assessed before parameters are fitted, while practical identifiability is assessed afterwards.

Structural identifiability (which is what this tutorial considers) can be illustrated by the following differential equation: ${dx \over dt} = p1*p2*x(t)$ where, however much data is collected on $x$, it is impossible to determine the distinct values of $p1$ and $p2$. Hence, these parameters are non-identifiable (however, their product, $p1*p2$, is identifiable).

Catalyst contains a special extension for carrying out structural identifiability analysis of generated reaction rate equation ODE models using the StructuralIdentifiability.jl package. This enables StructuralIdentifiability's assess_identifiability, assess_local_identifiability, and find_identifiable_functions functions to be called directly on Catalyst ReactionSystems. It also implements specialised routines to make these more efficient when applied to reaction network models (e.g. by improving runtimes). How to use these functions is described in the following tutorial, with StructuralIdentifiability providing a more extensive documentation.

Structural identifiability can be divided into local and global identifiability. If a model quantity is locally identifiable, it means that its true value can be determined down to a finite-number of possible options. This also means that there is some limited region around the quantity's true value where this true value is the only possible value (and hence, within this region, the quantity is fully identifiable). Globally identifiable quantities' values, on the other hand, can be uniquely determined. Again, while identifiability can be confirmed structurally for a quantity, it does not necessarily mean that it is practically identifiable for some given data.

Generally, there are three types of quantities for which identifiability can be assessed.

Parameters (e.g. $p1$ and $p2$).
Full variable trajectories (e.g. $x(t)$).
Variable initial conditions (e.g. $x(0)$).

StructuralIdentifiability currently assesses identifiability for the first two only (however, if $x(t)$ is identifiable, then $x(0)$ will be as well).

Note

Currently, the StructuralIdentifiability.jl extension only considers structural identifiability for the ODE generated by the reaction rate equation. It is possible that for the SDE model (generated by the chemical Langevin equation) and the jump model (generated by stochastic chemical kinetics) the identifiability of model quantities is different.

Global identifiability analysis

Basic example

Global identifiability can be assessed using the assess_identifiability function. For each model quantity (parameters and variables), it will assess whether they are:

Globally identifiable.
Locally identifiable.
Unidentifiable.

To it, we provide our ReactionSystem model and a list of quantities that we are able to measure. Here, we consider a Goodwind oscillator (a simple 3-component model, where the three species $M$, $E$, and $P$ are produced and degraded, which may exhibit oscillations)^[2]. Let us say that we are able to measure the concentration of $M$, we then designate this using the measured_quantities argument. We can now assess identifiability in the following way:

using Catalyst, Logging, StructuralIdentifiability
gwo = @reaction_network begin
    (pₘ/(1+P), dₘ), 0 <--> M
    (pₑ*M,dₑ), 0 <--> E
    (pₚ*E,dₚ), 0 <--> P
end
assess_identifiability(gwo; measured_quantities = [:M], loglevel = Logging.Error)

OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 9 entries:
  M(t) => :globally
  E(t) => :nonidentifiable
  P(t) => :globally
  pₘ   => :globally
  dₘ   => :globally
  pₑ   => :nonidentifiable
  dₑ   => :locally
  pₚ   => :nonidentifiable
  dₚ   => :locally

From the output, we find that E(t), pₑ, and pₚ (the trajectory of $E$, and the production rates of $E$ and $P$, respectively) are non-identifiable. Next, dₑ and dₚ (the degradation rates of $E$ and $P$, respectively) are locally identifiable. Finally, P(t), M(t), pₘ, and dₘ (the trajectories of P and M, and the production and degradation rate of M, respectively) are all globally identifiable. We note that we also imported the Logging.jl package, and provided the loglevel = Logging.Error input argument. StructuralIdentifiability functions generally provide a large number of output messages. Hence, we will use this argument (which requires the Logging package) throughout this tutorial to decrease the amount of printed text.

Next, we also assess identifiability in the case where we can measure all three species concentrations:

assess_identifiability(gwo; measured_quantities = [:M, :P, :E], loglevel = Logging.Error)

OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 9 entries:
  M(t) => :globally
  E(t) => :globally
  P(t) => :globally
  pₘ   => :globally
  dₘ   => :globally
  pₑ   => :globally
  dₑ   => :globally
  pₚ   => :globally
  dₚ   => :globally

in which case all species trajectories and parameters become identifiable.

Indicating known parameters

In the previous case we assumed that all parameters are unknown, however, this is not necessarily true. If there are parameters with known values, we can supply these using the known_p argument. Providing this additional information might also make other, previously unidentifiable, parameters identifiable. Let us consider the previous example, where we measure the concentration of $M$ only, but now assume we also know the production rate of $E$ ($pₑ$):

assess_identifiability(gwo; measured_quantities = [:M], known_p = [:pₑ], loglevel = Logging.Error)

OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 9 entries:
  M(t) => :globally
  E(t) => :locally
  P(t) => :globally
  pₘ   => :globally
  dₘ   => :globally
  pₑ   => :globally
  dₑ   => :locally
  pₚ   => :globally
  dₚ   => :locally

Not only does this turn the previously non-identifiable pₑ (globally) identifiable (which is obvious, given that its value is now known), but this additional information improve identifiability for several other network components.

To, in a similar manner, indicate that certain initial conditions are known is a work in progress. Hopefully this feature should be an available in the near future.

Providing non-trivial measured quantities

Sometimes, ones may not have measurements of species, but rather some combinations of species (or possibly parameters). To account for this, measured_quantities accepts any algebraic expression (and not just single species). To form such expressions, species and parameters have to first be @unpack'ed from the model. Say that we have a model where an enzyme ($E$) is converted between an active and inactive form, which in turns activates the production of a product, $P$:

rs = @reaction_network begin
    (kA,kD), Eᵢ <--> Eₐ
    (Eₐ, d), 0 <-->P
end

\[ \begin{align*} \mathrm{\mathtt{E_i}} &\xrightleftharpoons[\mathtt{kD}]{\mathtt{kA}} \mathrm{\mathtt{E_a}} \\ \varnothing &\xrightleftharpoons[d]{\mathtt{E_a}} \mathrm{P} \end{align*} \]

If we can measure the total amount of $E$ ($=Eᵢ+Eₐ$), as well as the amount of $P$, we can use the following to assess identifiability:

@unpack Eᵢ, Eₐ = rs
assess_identifiability(rs; measured_quantities = [Eᵢ + Eₐ, :P], loglevel = Logging.Error)

Assessing identifiability for specified quantities only

By default, StructuralIdentifiability assesses identifiability for all parameters and variables. It is, however, possible to designate precisely which quantities you want to check using the funcs_to_check option. This both includes selecting a smaller subset of parameters and variables to check, or defining customised expressions. Let us consider the Goodwind from previously, and say that we would like to check whether the production parameters ($pₘ$, $pₑ$, and $pₚ$) and the total amount of the three species ($P + M + E$) are identifiable quantities. Here, we would first unpack these (allowing us to form algebraic expressions) and then use the following code:

@unpack pₘ, pₑ, pₚ, M, E, P = gwo
assess_identifiability(gwo; measured_quantities = [:M], funcs_to_check = [pₘ, pₑ, pₚ, M + E + P], loglevel = Logging.Error)

Probability of correctness

The identifiability methods used can, in theory, produce erroneous results. However, it is possible to adjust the lower bound for the probability of correctness using the argument prob_threshold (by default set to 0.99, that is, at least a $99\%$ chance of correctness). We can e.g. increase the bound through:

assess_identifiability(gwo; measured_quantities=[:M], prob_threshold = 0.999, loglevel = Logging.Error)

giving a minimum bound of $99.9\%$ chance of correctness. In practise, the bounds used by StructuralIdentifiability are very conservative, which means that while the minimum guaranteed probability of correctness in the default case is $99\%$, in practise it is much higher. While increasing the value of prob_threshold increases the certainty of correctness, it will also increase the time required to assess identifiability.

Local identifiability analysis

Local identifiability can be assessed through the assess_local_identifiability function. While this is already determined by assess_identifiability, assessing local identifiability only has the advantage that it is easier to compute. Hence, there might be models where global identifiability analysis fails (or takes a prohibitively long time), where instead assess_local_identifiability can be used. This function takes the same inputs as assess_identifiability and returns, for each quantity, true if it is locally identifiable (or false if it is not). Here, for the Goodwind oscillator, we assesses it for local identifiability only:

assess_local_identifiability(gwo; measured_quantities = [:M], loglevel = Logging.Error)

OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Bool} with 9 entries:
  M(t) => 1
  E(t) => 0
  P(t) => 1
  pₘ   => 1
  dₘ   => 1
  pₑ   => 0
  dₑ   => 1
  pₚ   => 0
  dₚ   => 1

We note that the results are consistent with those produced by assess_identifiability (with globally or locally identifiable quantities here all being assessed as at least locally identifiable).

Finding identifiable functions

Finally, StructuralIdentifiability provides the find_identifiable_functions function. Rather than determining the identifiability of each parameter and unknown of the model, it finds a set of identifiable functions, such as any other identifiable expression of the model can be generated by these. Let us again consider the Goodwind oscillator, using the find_identifiable_functions function we find that identifiability can be reduced to five globally identifiable expressions:

find_identifiable_functions(gwo; measured_quantities = [:M], loglevel = Logging.Error)

\[ \begin{equation} \left[ \begin{array}{c} \mathtt{d{_m}} \\ \mathtt{p{_m}} \\ \mathtt{d{_e}} + \mathtt{d{_p}} \\ \mathtt{d{_e}} \mathtt{d{_p}} \\ \mathtt{p{_e}} \mathtt{p{_p}} \\ \end{array} \right] \end{equation} \]

Again, these results are consistent with those produced by assess_identifiability. There, pₑ and pₚ where found to be globally identifiable. Here, they correspond directly to identifiable expressions. The remaining four parameters (pₘ, dₘ, dₑ, and dₚ) occur as part of more complicated composite expressions.

find_identifiable_functions tries to simplify its output functions to create nice expressions. The degree to which it does this can be adjusted using the simplify keywords. Using the :weak, :standard (default), and :strong arguments, increased simplification can be forced (at the expense of longer runtime).

Creating StructuralIdentifiability compatible ODE models from Catalyst `ReactionSystem`s

While the functionality described above covers the vast majority of analysis that user might want to perform, the StructuralIdentifiability package supports several additional features. While these does not have inherent Catalyst support, we do provide the make_si_ode function to simplify their use. Similar to the previous functions, it takes a ReactionSystem, lists of measured quantities, and known parameter values. The output is a ODE of the standard form supported by StructuralIdentifiability. It can be created using the following syntax:

si_ode = make_si_ode(gwo; measured_quantities = [:M])

and then used as input to various StructuralIdentifiability functions. In the following example we use StructuralIdentifiability's print_for_DAISY function, printing the model as an expression that can be used by the DAISY software for identifiability analysis^[3].

print_for_DAISY(si_ode)

Notes on systems with conservation laws

Several reaction network models, such as

rs = @reaction_network begin
    (k1,k2), X1 <--> X2
end

\[ \begin{align*} \mathrm{\mathtt{X1}} &\xrightleftharpoons[\mathtt{k2}]{\mathtt{k1}} \mathrm{\mathtt{X2}} \end{align*} \]

contain conservation laws (in this case $Γ = X1 + X2$, where $Γ = X1(0) + X2(0)$ is a constant). Because the presence of such conservation laws makes structural identifiability analysis prohibitively computationally expensive (for all but the simplest of cases), these are automatically eliminated by Catalyst. This is handled internally, and should not be noticeable to the user. The exception is the make_si_ode function. For each conservation law, its output will have one ODE removed, and instead have a conservation parameter (of the form Γ[i]) added to its equations. This feature can be disabled through the remove_conserved = false option.

Systems with exponent parameters

Structural identifiability cannot currently be applied to systems with parameters (or species) in exponents. E.g. this

rn_si_impossible = @reaction_network begin
    (hill(X,v,K,n),d), 0 <--> X
end
assess_identifiability(rn; measured_quantities = [:X])

is currently not possible. Hopefully this will be a supported feature in the future. For now, such expressions will have to be rewritten to not include such exponents. For some cases, e.g. 10^k this is trivial. However, it is also possible generally (but more involved and often includes introducing additional variables).

Citation

If you use this functionality in your research, please cite the following paper to support the authors of the StructuralIdentifiability package:

@article{structidjl,
  author  = {Dong, R. and Goodbrake, C. and Harrington, H. and Pogudin G.},
  title   = {Differential Elimination for Dynamical Models via Projections with Applications to Structural Identifiability},
  journal = {SIAM Journal on Applied Algebra and Geometry},
  url     = {https://doi.org/10.1137/22M1469067},
  year    = {2023}
  volume  = {7},
  number  = {1},
  pages   = {194-235}
}