Exposing More Parallelism By Tearing Algebraic Equations in Systems

Sometimes it can be very non-trivial to parallelize a system. In this tutorial, we will demonstrate how to make use of mtkcompile to expose more parallelism in the solution process and parallelize the resulting simulation.

The Component Library

This tutorial uses the following set of components describing electrical circuits:

using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D

# Basic electric components
@connector function Pin(; name)
    @variables v(t)=1.0 i(t)=1.0 [connect=Flow]
    System(Equation[], t, [v, i], [], name = name)
end

function Ground(; name)
    @named g = Pin()
    eqs = [g.v ~ 0]
    compose(System(eqs, t, [], [], name = name), g)
end

function ConstantVoltage(; name, V = 1.0)
    val = V
    @named p = Pin()
    @named n = Pin()
    @parameters V = V
    eqs = [V ~ p.v - n.v
           0 ~ p.i + n.i]
    compose(System(eqs, t, [], [V], name = name), p, n)
end

@connector function HeatPort(; name)
    @variables T(t)=293.15 Q_flow(t)=0.0 [connect=Flow]
    System(Equation[], t, [T, Q_flow], [], name = name)
end

function HeatingResistor(; name, R = 1.0, TAmbient = 293.15, alpha = 1.0)
    @named p = Pin()
    @named n = Pin()
    @named h = HeatPort()
    @variables v(t) RTherm(t)
    @parameters R=R TAmbient=TAmbient alpha=alpha
    eqs = [RTherm ~ R * (1 + alpha * (h.T - TAmbient))
           v ~ p.i * RTherm
           h.Q_flow ~ -v * p.i # -LossPower
           v ~ p.v - n.v
           0 ~ p.i + n.i]
    compose(System(eqs, t, [v, RTherm], [R, TAmbient, alpha],
            name = name), p, n, h)
end

function HeatCapacitor(; name, rho = 8050, V = 1, cp = 460, TAmbient = 293.15)
    @parameters rho=rho V=V cp=cp
    C = rho * V * cp
    @named h = HeatPort()
    eqs = [
        D(h.T) ~ h.Q_flow / C
    ]
    compose(System(eqs, t, [], [rho, V, cp],
            name = name), h)
end

function Capacitor(; name, C = 1.0)
    @named p = Pin()
    @named n = Pin()
    @variables v(t) = 0.0
    @parameters C = C
    eqs = [v ~ p.v - n.v
           0 ~ p.i + n.i
           D(v) ~ p.i / C]
    compose(System(eqs, t, [v], [C],
            name = name), p, n)
end

function parallel_rc_model(i; name, source, ground, R, C)
    resistor = HeatingResistor(name = Symbol(:resistor, i), R = R)
    capacitor = Capacitor(name = Symbol(:capacitor, i), C = C)
    heat_capacitor = HeatCapacitor(name = Symbol(:heat_capacitor, i))

    rc_eqs = [connect(source.p, resistor.p)
              connect(resistor.n, capacitor.p)
              connect(capacitor.n, source.n, ground.g)
              connect(resistor.h, heat_capacitor.h)]

    compose(System(rc_eqs, t, name = Symbol(name, i)),
        [resistor, capacitor, source, ground, heat_capacitor])
end
parallel_rc_model (generic function with 1 method)

The Model

Assuming that the components are defined, our model is 50 resistors and capacitors connected in parallel. Thus, following the acausal components tutorial, we can connect a bunch of RC components as follows:

V = 2.0
@named source = ConstantVoltage(V = V)
@named ground = Ground()
N = 50
Rs = 10 .^ range(0, stop = -4, length = N)
Cs = 10 .^ range(-3, stop = 0, length = N)
rc_systems = map(1:N) do i
    parallel_rc_model(i; name = :rc, source = source, ground = ground, R = Rs[i], C = Cs[i])
end;
@variables E(t) = 0.0
eqs = [
    D(E) ~ sum(((i, sys),) -> getproperty(sys, Symbol(:resistor, i)).h.Q_flow,
    enumerate(rc_systems))
]
@named _big_rc = System(eqs, t, [E], [])
@named big_rc = compose(_big_rc, rc_systems)
Model big_rc:
Subsystems (50): see hierarchy(big_rc)
  rc1
  rc2
  rc3
  rc4
  ⋮
Equations (1251):
  801 standard: see equations(big_rc)
  450 connecting: see equations(expand_connections(big_rc))
Unknowns (1051): see unknowns(big_rc)
  E(t)
  rc1₊resistor1₊v(t)
  rc1₊resistor1₊RTherm(t)
  rc1₊resistor1₊p₊v(t)
  ⋮
Parameters (400): see parameters(big_rc)
  rc1₊resistor1₊R
  rc1₊resistor1₊TAmbient
  rc1₊resistor1₊alpha
  rc1₊capacitor1₊C
  ⋮

Now let's say we want to expose a bit more parallelism via running tearing. How do we do that?

sys = mtkcompile(big_rc)
Model big_rc:
Equations (151):
  151 standard: see equations(big_rc)
Unknowns (151): see unknowns(big_rc)
  rc1₊resistor1₊p₊i(t)
  rc2₊resistor2₊p₊i(t)
  rc3₊resistor3₊p₊i(t)
  rc4₊resistor4₊p₊i(t)
  ⋮
Parameters (400): see parameters(big_rc)
  rc1₊resistor1₊R
  rc1₊resistor1₊TAmbient
  rc1₊resistor1₊alpha
  rc1₊capacitor1₊C
  ⋮
Observed (900): see observed(big_rc)

Done, that's it. There's no more to it.

What Happened?

Yes, that's a good question! Let's investigate a little more what happened. If you look at the system we defined:

length(equations(big_rc))
801
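Note that equations(big_rc) counts only the standard equations; the connection equations appear once the connect statements are expanded. As a quick verification (using expand_connections, which the model printout above already references), we can count the expanded set as well:

length(equations(expand_connections(big_rc)))  # should match the 1251 total reported in the printout above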

You see, it started as a large system: 801 standard equations, or 1251 in total once the connection equations are expanded. However, after eliminating redundancies, we arrive at just 151 equations:

equations(sys)
151-element Vector{Equation}:
 0 ~ -rc50₊resistor50₊v(t) + rc50₊resistor50₊RTherm(t)*rc50₊resistor50₊p₊i(t)
 0 ~ -rc49₊resistor49₊v(t) + rc49₊resistor49₊RTherm(t)*rc49₊resistor49₊p₊i(t)
 0 ~ -rc48₊resistor48₊v(t) + rc48₊resistor48₊p₊i(t)*rc48₊resistor48₊RTherm(t)
 0 ~ -rc47₊resistor47₊v(t) + rc47₊resistor47₊p₊i(t)*rc47₊resistor47₊RTherm(t)
 0 ~ -rc46₊resistor46₊v(t) + rc46₊resistor46₊RTherm(t)*rc46₊resistor46₊p₊i(t)
 0 ~ -rc45₊resistor45₊v(t) + rc45₊resistor45₊p₊i(t)*rc45₊resistor45₊RTherm(t)
 0 ~ -rc44₊resistor44₊v(t) + rc44₊resistor44₊RTherm(t)*rc44₊resistor44₊p₊i(t)
 0 ~ -rc43₊resistor43₊v(t) + rc43₊resistor43₊p₊i(t)*rc43₊resistor43₊RTherm(t)
 0 ~ -rc42₊resistor42₊v(t) + rc42₊resistor42₊p₊i(t)*rc42₊resistor42₊RTherm(t)
 0 ~ -rc41₊resistor41₊v(t) + rc41₊resistor41₊RTherm(t)*rc41₊resistor41₊p₊i(t)
 ⋮
 Differential(t, 1)(rc4₊heat_capacitor4₊h₊T(t)) ~ rc4₊heat_capacitor4₊h₊Q_flow(t) / (rc4₊heat_capacitor4₊V*rc4₊heat_capacitor4₊cp*rc4₊heat_capacitor4₊rho)
 Differential(t, 1)(rc4₊capacitor4₊v(t)) ~ rc4₊capacitor4₊p₊i(t) / rc4₊capacitor4₊C
 Differential(t, 1)(rc3₊heat_capacitor3₊h₊T(t)) ~ rc3₊heat_capacitor3₊h₊Q_flow(t) / (rc3₊heat_capacitor3₊V*rc3₊heat_capacitor3₊cp*rc3₊heat_capacitor3₊rho)
 Differential(t, 1)(rc3₊capacitor3₊v(t)) ~ rc3₊capacitor3₊p₊i(t) / rc3₊capacitor3₊C
 Differential(t, 1)(rc2₊heat_capacitor2₊h₊T(t)) ~ rc2₊heat_capacitor2₊h₊Q_flow(t) / (rc2₊heat_capacitor2₊V*rc2₊heat_capacitor2₊cp*rc2₊heat_capacitor2₊rho)
 Differential(t, 1)(rc2₊capacitor2₊v(t)) ~ rc2₊capacitor2₊p₊i(t) / rc2₊capacitor2₊C
 Differential(t, 1)(rc1₊heat_capacitor1₊h₊T(t)) ~ rc1₊heat_capacitor1₊h₊Q_flow(t) / (rc1₊heat_capacitor1₊V*rc1₊heat_capacitor1₊cp*rc1₊heat_capacitor1₊rho)
 Differential(t, 1)(rc1₊capacitor1₊v(t)) ~ rc1₊capacitor1₊p₊i(t) / rc1₊capacitor1₊C
 Differential(t, 1)(E(t)) ~ -rc49₊heat_capacitor49₊h₊Q_flow(t) - rc16₊heat_capacitor16₊h₊Q_flow(t) - rc27₊heat_capacitor27₊h₊Q_flow(t) - rc28₊heat_capacitor28₊h₊Q_flow(t) - rc32₊heat_capacitor32₊h₊Q_flow(t) - rc37₊heat_capacitor37₊h₊Q_flow(t) - rc43₊heat_capacitor43₊h₊Q_flow(t) - rc1₊heat_capacitor1₊h₊Q_flow(t) - rc4₊heat_capacitor4₊h₊Q_flow(t) - rc29₊heat_capacitor29₊h₊Q_flow(t) - rc13₊heat_capacitor13₊h₊Q_flow(t) - rc10₊heat_capacitor10₊h₊Q_flow(t) - rc36₊heat_capacitor36₊h₊Q_flow(t) - rc22₊heat_capacitor22₊h₊Q_flow(t) - rc46₊heat_capacitor46₊h₊Q_flow(t) - rc5₊heat_capacitor5₊h₊Q_flow(t) - rc38₊heat_capacitor38₊h₊Q_flow(t) - rc8₊heat_capacitor8₊h₊Q_flow(t) - rc25₊heat_capacitor25₊h₊Q_flow(t) - rc20₊heat_capacitor20₊h₊Q_flow(t) - rc48₊heat_capacitor48₊h₊Q_flow(t) - rc18₊heat_capacitor18₊h₊Q_flow(t) - rc12₊heat_capacitor12₊h₊Q_flow(t) - rc17₊heat_capacitor17₊h₊Q_flow(t) - rc26₊heat_capacitor26₊h₊Q_flow(t) - rc45₊heat_capacitor45₊h₊Q_flow(t) - rc3₊heat_capacitor3₊h₊Q_flow(t) - rc35₊heat_capacitor35₊h₊Q_flow(t) - rc15₊heat_capacitor15₊h₊Q_flow(t) - rc34₊heat_capacitor34₊h₊Q_flow(t) - rc7₊heat_capacitor7₊h₊Q_flow(t) - rc39₊heat_capacitor39₊h₊Q_flow(t) - rc33₊heat_capacitor33₊h₊Q_flow(t) - rc30₊heat_capacitor30₊h₊Q_flow(t) - rc24₊heat_capacitor24₊h₊Q_flow(t) - rc47₊heat_capacitor47₊h₊Q_flow(t) - rc21₊heat_capacitor21₊h₊Q_flow(t) - rc40₊heat_capacitor40₊h₊Q_flow(t) - rc42₊heat_capacitor42₊h₊Q_flow(t) - rc2₊heat_capacitor2₊h₊Q_flow(t) - rc11₊heat_capacitor11₊h₊Q_flow(t) - rc31₊heat_capacitor31₊h₊Q_flow(t) - rc23₊heat_capacitor23₊h₊Q_flow(t) - rc19₊heat_capacitor19₊h₊Q_flow(t) - rc50₊heat_capacitor50₊h₊Q_flow(t) - rc44₊heat_capacitor44₊h₊Q_flow(t) - rc9₊heat_capacitor9₊h₊Q_flow(t) - rc14₊heat_capacitor14₊h₊Q_flow(t) - rc41₊heat_capacitor41₊h₊Q_flow(t) - rc6₊heat_capacitor6₊h₊Q_flow(t)

That's not all, though. In addition, the tearing process has turned the sets of nonlinear equations into separate blocks and constructed a DAG of the dependencies between the blocks. We can use the bipartite graph functionality to dig in and investigate what this means:

using ModelingToolkit.BipartiteGraphs
ts = TearingState(expand_connections(big_rc))
inc_org = BipartiteGraphs.incidence_matrix(ts.structure.graph)
# Note: sorted_incidence_matrix requires the system and matrix dimensions to match
# blt_reduced = StructuralTransformations.sorted_incidence_matrix(sys)
1051×1152 SparseArrays.SparseMatrixCSC{Bool, Int64} with 2351 stored entries:
⎡⠀⠀⠀⠀⢳⠀⠀⢰⠃⠀⠀⠀⠀⠀⠛⠶⣄⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠀⢀⠸⡀⠀⢠⣪⡄⠀⠀⠀⠀⠀⠀⠀⠙⠳⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⢆⠀⠑⡅⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠻⠦⣄⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠐⠀⢸⠀⠆⢑⠒⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠻⢦⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⢀⠀⠀⠇⢰⠡⠀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠳⢦⣀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠈⠀⠀⢱⠔⡡⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠓⠦⣄⣀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠠⡀⠀⠘⡣⢐⠀⠠⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠳⣦⠀⠀⎥
⎢⠀⠀⠀⠀⢆⠀⠀⢋⠆⠀⠀⠈⠙⠲⢦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡄⠀⎥
⎢⠀⠀⠀⠀⢸⡀⠀⢈⢔⠀⠀⠀⠀⠀⠀⠈⠙⠳⢦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢱⠀⎥
⎢⠀⠀⠀⠈⠀⡆⠀⢔⠝⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠲⣤⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢹⠀⎥
⎢⠀⠀⠀⠀⠀⢣⠀⡀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠓⢦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⠀⎥
⎢⠀⠀⠀⠀⠀⢰⠀⠆⢨⠈⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠶⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⡇⎥
⎢⠀⠀⠀⠀⠀⠀⡃⢸⠡⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠳⢦⣄⠀⠀⠀⠀⠀⠀⠀⡇⎥
⎢⠀⠀⠀⠀⠀⠀⢳⡰⢁⠀⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⠳⣤⣀⠀⠀⠀⠁⠧⎥
⎢⠀⠀⠀⢀⠀⠀⠐⣆⠡⠀⢀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠳⢦⡄⡀⠸⎥
⎢⠀⠀⠀⠀⡆⠀⠀⡥⠘⠀⠈⠳⣦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⡇⠈⎥
⎢⡍⢢⡀⠀⠀⠀⠀⠉⠉⢄⡀⠀⠀⠈⠉⠓⠒⠒⠒⠢⠤⠤⠤⣄⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⎥
⎣⢦⠀⠙⠦⠀⠀⠀⠀⠀⠀⠲⠤⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠑⠒⠒⠒⠤⠤⠤⠄⠀⎦

The plot above is the incidence matrix of the original system's equations. Notice that the original formulation of the model has dependencies between the different equations, so the full set of equations must be solved together, which exposes no parallelism. However, the Block Lower Triangular (BLT) transformation exposes independent blocks, and the tearing process improves on this further: it eliminates the vast majority of the equations and transforms the remaining nonlinear equations into 50 independent blocks, all of which can be solved in parallel. The conclusion is that parallelism is now within reach: parallelizing after structural simplification yields a far better-structured problem than trying to parallelize the original system by hand.
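We can quantify that reduction directly from the simplified system using the same accessors shown in the printouts above; the counts below simply restate those printouts:

length(equations(sys))  # 151 equations remain to be solved simultaneously
length(unknowns(sys))   # 151 unknowns remain
length(observed(sys))   # 900 eliminated variables, reconstructible from the observed equations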

After performing this simplification, you can construct the ODEProblem and set parallel_form to use the exposed parallelism in the multithreaded function construction, but this showcases why mtkcompile is so important to that process.
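For completeness, here is a minimal sketch of simulating the simplified system. It is not part of the original tutorial: the time span and the Rodas5 solver choice are illustrative assumptions, and the initial conditions come from the defaults declared in the components above.

# ModelingToolkit and OrdinaryDiffEq were loaded at the top of the tutorial.
prob = ODEProblem(sys, [], (0.0, 10.0))  # component defaults supply the initial conditions
sol = solve(prob, Rodas5())              # a stiff solver suited to the mass-matrix form produced by mtkcompile
sol[sys.E]                               # accumulated heat E(t) over the simulation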