Exposing More Parallelism By Tearing Algebraic Equations in Systems
Sometimes it can be very non-trivial to parallelize a system. In this tutorial, we will demonstrate how to make use of mtkcompile to expose more parallelism in the solution process and parallelize the resulting simulation.
The Component Library
The following tutorial will use the following set of components describing electrical circuits:
using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D
# Basic electric components
@connector function Pin(; name)
@variables v(t)=1.0 i(t)=1.0 [connect=Flow]
System(Equation[], t, [v, i], [], name = name)
end
function Ground(; name)
@named g = Pin()
eqs = [g.v ~ 0]
compose(System(eqs, t, [], [], name = name), g)
end
function ConstantVoltage(; name, V = 1.0)
val = V
@named p = Pin()
@named n = Pin()
@parameters V = V
eqs = [V ~ p.v - n.v
0 ~ p.i + n.i]
compose(System(eqs, t, [], [V], name = name), p, n)
end
@connector function HeatPort(; name)
@variables T(t)=293.15 Q_flow(t)=0.0 [connect=Flow]
System(Equation[], t, [T, Q_flow], [], name = name)
end
function HeatingResistor(; name, R = 1.0, TAmbient = 293.15, alpha = 1.0)
@named p = Pin()
@named n = Pin()
@named h = HeatPort()
@variables v(t) RTherm(t)
@parameters R=R TAmbient=TAmbient alpha=alpha
eqs = [RTherm ~ R * (1 + alpha * (h.T - TAmbient))
v ~ p.i * RTherm
h.Q_flow ~ -v * p.i # -LossPower
v ~ p.v - n.v
0 ~ p.i + n.i]
compose(System(eqs, t, [v, RTherm], [R, TAmbient, alpha],
name = name), p, n, h)
end
function HeatCapacitor(; name, rho = 8050, V = 1, cp = 460, TAmbient = 293.15)
@parameters rho=rho V=V cp=cp
C = rho * V * cp
@named h = HeatPort()
eqs = [
D(h.T) ~ h.Q_flow / C
]
compose(System(eqs, t, [], [rho, V, cp],
name = name), h)
end
function Capacitor(; name, C = 1.0)
@named p = Pin()
@named n = Pin()
@variables v(t) = 0.0
@parameters C = C
eqs = [v ~ p.v - n.v
0 ~ p.i + n.i
D(v) ~ p.i / C]
compose(System(eqs, t, [v], [C],
name = name), p, n)
end
function parallel_rc_model(i; name, source, ground, R, C)
resistor = HeatingResistor(name = Symbol(:resistor, i), R = R)
capacitor = Capacitor(name = Symbol(:capacitor, i), C = C)
heat_capacitor = HeatCapacitor(name = Symbol(:heat_capacitor, i))
rc_eqs = [connect(source.p, resistor.p)
connect(resistor.n, capacitor.p)
connect(capacitor.n, source.n, ground.g)
connect(resistor.h, heat_capacitor.h)]
compose(System(rc_eqs, t, name = Symbol(name, i)),
[resistor, capacitor, source, ground, heat_capacitor])
end
parallel_rc_model (generic function with 1 method)
The Model
Assuming that the components are defined, our model is 50 resistor-capacitor circuits connected in parallel. Following the acausal components tutorial, we can connect these RC components as follows:
V = 2.0
@named source = ConstantVoltage(V = V)
@named ground = Ground()
N = 50
Rs = 10 .^ range(0, stop = -4, length = N)
Cs = 10 .^ range(-3, stop = 0, length = N)
rc_systems = map(1:N) do i
parallel_rc_model(i; name = :rc, source = source, ground = ground, R = Rs[i], C = Cs[i])
end;
@variables E(t) = 0.0
eqs = [
D(E) ~ sum(((i, sys),) -> getproperty(sys, Symbol(:resistor, i)).h.Q_flow,
enumerate(rc_systems))
]
@named _big_rc = System(eqs, t, [E], [])
@named big_rc = compose(_big_rc, rc_systems)
Model big_rc:
Subsystems (50): see hierarchy(big_rc)
rc1
rc2
rc3
rc4
⋮
Equations (1251):
801 standard: see equations(big_rc)
450 connecting: see equations(expand_connections(big_rc))
Unknowns (1051): see unknowns(big_rc)
E(t)
rc1₊resistor1₊v(t)
rc1₊resistor1₊RTherm(t)
rc1₊resistor1₊p₊v(t)
⋮
Parameters (400): see parameters(big_rc)
rc1₊resistor1₊R
rc1₊resistor1₊TAmbient
rc1₊resistor1₊alpha
rc1₊capacitor1₊C
⋮
Now let's say we want to expose a bit more parallelism by running tearing. How do we do that?
sys = mtkcompile(big_rc)
Model big_rc:
Equations (151):
151 standard: see equations(big_rc)
Unknowns (151): see unknowns(big_rc)
rc1₊resistor1₊p₊i(t)
rc2₊resistor2₊p₊i(t)
rc3₊resistor3₊p₊i(t)
rc4₊resistor4₊p₊i(t)
⋮
Parameters (400): see parameters(big_rc)
rc1₊resistor1₊R
rc1₊resistor1₊TAmbient
rc1₊resistor1₊alpha
rc1₊capacitor1₊C
⋮
Observed (900): see observed(big_rc)
Done, that's it. There's no more to it.
What Happened?
Yes, that's a good question! Let's investigate in a little more detail what happened. If you look at the system we defined:
length(equations(big_rc))
801
You see, the model starts out massive: it has 801 component-level equations, and once the connections are expanded the flattened system has 1051 equations. However, after eliminating redundancies, mtkcompile arrives at just 151 equations.
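To see where the reduction comes from, you can query the systems directly. The following is a small sketch using the same introspection functions referenced above (equations, observed, unknowns, and expand_connections); the exact counts may vary with your ModelingToolkit version:

length(equations(expand_connections(big_rc)))  # flattened equations before simplification
length(unknowns(big_rc))                       # unknowns before simplification
length(equations(sys))                         # equations that must actually be solved
length(observed(sys))                          # eliminated equations, kept as observed relations

The eliminated relations are not thrown away: they become observed equations that can be reconstructed from the solution on demand. The 151 remaining equations look like this: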
equations(sys)
151-element Vector{Equation}:
0 ~ -rc50₊resistor50₊v(t) + rc50₊resistor50₊RTherm(t)*rc50₊resistor50₊p₊i(t)
0 ~ -rc49₊resistor49₊v(t) + rc49₊resistor49₊RTherm(t)*rc49₊resistor49₊p₊i(t)
0 ~ -rc48₊resistor48₊v(t) + rc48₊resistor48₊p₊i(t)*rc48₊resistor48₊RTherm(t)
0 ~ -rc47₊resistor47₊v(t) + rc47₊resistor47₊p₊i(t)*rc47₊resistor47₊RTherm(t)
0 ~ -rc46₊resistor46₊v(t) + rc46₊resistor46₊RTherm(t)*rc46₊resistor46₊p₊i(t)
0 ~ -rc45₊resistor45₊v(t) + rc45₊resistor45₊p₊i(t)*rc45₊resistor45₊RTherm(t)
0 ~ -rc44₊resistor44₊v(t) + rc44₊resistor44₊RTherm(t)*rc44₊resistor44₊p₊i(t)
0 ~ -rc43₊resistor43₊v(t) + rc43₊resistor43₊p₊i(t)*rc43₊resistor43₊RTherm(t)
0 ~ -rc42₊resistor42₊v(t) + rc42₊resistor42₊p₊i(t)*rc42₊resistor42₊RTherm(t)
0 ~ -rc41₊resistor41₊v(t) + rc41₊resistor41₊RTherm(t)*rc41₊resistor41₊p₊i(t)
⋮
Differential(t, 1)(rc4₊heat_capacitor4₊h₊T(t)) ~ rc4₊heat_capacitor4₊h₊Q_flow(t) / (rc4₊heat_capacitor4₊V*rc4₊heat_capacitor4₊cp*rc4₊heat_capacitor4₊rho)
Differential(t, 1)(rc4₊capacitor4₊v(t)) ~ rc4₊capacitor4₊p₊i(t) / rc4₊capacitor4₊C
Differential(t, 1)(rc3₊heat_capacitor3₊h₊T(t)) ~ rc3₊heat_capacitor3₊h₊Q_flow(t) / (rc3₊heat_capacitor3₊V*rc3₊heat_capacitor3₊cp*rc3₊heat_capacitor3₊rho)
Differential(t, 1)(rc3₊capacitor3₊v(t)) ~ rc3₊capacitor3₊p₊i(t) / rc3₊capacitor3₊C
Differential(t, 1)(rc2₊heat_capacitor2₊h₊T(t)) ~ rc2₊heat_capacitor2₊h₊Q_flow(t) / (rc2₊heat_capacitor2₊V*rc2₊heat_capacitor2₊cp*rc2₊heat_capacitor2₊rho)
Differential(t, 1)(rc2₊capacitor2₊v(t)) ~ rc2₊capacitor2₊p₊i(t) / rc2₊capacitor2₊C
Differential(t, 1)(rc1₊heat_capacitor1₊h₊T(t)) ~ rc1₊heat_capacitor1₊h₊Q_flow(t) / (rc1₊heat_capacitor1₊V*rc1₊heat_capacitor1₊cp*rc1₊heat_capacitor1₊rho)
Differential(t, 1)(rc1₊capacitor1₊v(t)) ~ rc1₊capacitor1₊p₊i(t) / rc1₊capacitor1₊C
Differential(t, 1)(E(t)) ~ -rc49₊heat_capacitor49₊h₊Q_flow(t) - rc16₊heat_capacitor16₊h₊Q_flow(t) - rc27₊heat_capacitor27₊h₊Q_flow(t) - rc28₊heat_capacitor28₊h₊Q_flow(t) - rc32₊heat_capacitor32₊h₊Q_flow(t) - rc37₊heat_capacitor37₊h₊Q_flow(t) - rc43₊heat_capacitor43₊h₊Q_flow(t) - rc1₊heat_capacitor1₊h₊Q_flow(t) - rc4₊heat_capacitor4₊h₊Q_flow(t) - rc29₊heat_capacitor29₊h₊Q_flow(t) - rc13₊heat_capacitor13₊h₊Q_flow(t) - rc10₊heat_capacitor10₊h₊Q_flow(t) - rc36₊heat_capacitor36₊h₊Q_flow(t) - rc22₊heat_capacitor22₊h₊Q_flow(t) - rc46₊heat_capacitor46₊h₊Q_flow(t) - rc5₊heat_capacitor5₊h₊Q_flow(t) - rc38₊heat_capacitor38₊h₊Q_flow(t) - rc8₊heat_capacitor8₊h₊Q_flow(t) - rc25₊heat_capacitor25₊h₊Q_flow(t) - rc20₊heat_capacitor20₊h₊Q_flow(t) - rc48₊heat_capacitor48₊h₊Q_flow(t) - rc18₊heat_capacitor18₊h₊Q_flow(t) - rc12₊heat_capacitor12₊h₊Q_flow(t) - rc17₊heat_capacitor17₊h₊Q_flow(t) - rc26₊heat_capacitor26₊h₊Q_flow(t) - rc45₊heat_capacitor45₊h₊Q_flow(t) - rc3₊heat_capacitor3₊h₊Q_flow(t) - rc35₊heat_capacitor35₊h₊Q_flow(t) - rc15₊heat_capacitor15₊h₊Q_flow(t) - rc34₊heat_capacitor34₊h₊Q_flow(t) - rc7₊heat_capacitor7₊h₊Q_flow(t) - rc39₊heat_capacitor39₊h₊Q_flow(t) - rc33₊heat_capacitor33₊h₊Q_flow(t) - rc30₊heat_capacitor30₊h₊Q_flow(t) - rc24₊heat_capacitor24₊h₊Q_flow(t) - rc47₊heat_capacitor47₊h₊Q_flow(t) - rc21₊heat_capacitor21₊h₊Q_flow(t) - rc40₊heat_capacitor40₊h₊Q_flow(t) - rc42₊heat_capacitor42₊h₊Q_flow(t) - rc2₊heat_capacitor2₊h₊Q_flow(t) - rc11₊heat_capacitor11₊h₊Q_flow(t) - rc31₊heat_capacitor31₊h₊Q_flow(t) - rc23₊heat_capacitor23₊h₊Q_flow(t) - rc19₊heat_capacitor19₊h₊Q_flow(t) - rc50₊heat_capacitor50₊h₊Q_flow(t) - rc44₊heat_capacitor44₊h₊Q_flow(t) - rc9₊heat_capacitor9₊h₊Q_flow(t) - rc14₊heat_capacitor14₊h₊Q_flow(t) - rc41₊heat_capacitor41₊h₊Q_flow(t) - rc6₊heat_capacitor6₊h₊Q_flow(t)
That's not all though. In addition, the tearing process has turned the sets of nonlinear equations into separate blocks and constructed a DAG for the dependencies between the blocks. We can use the bipartite graph functionality to dig in and investigate what this means:
using ModelingToolkit.BipartiteGraphs
ts = TearingState(expand_connections(big_rc))
inc_org = BipartiteGraphs.incidence_matrix(ts.structure.graph)
# Note: sorted_incidence_matrix requires the system and matrix dimensions to match
# blt_reduced = StructuralTransformations.sorted_incidence_matrix(sys)
1051×1152 SparseArrays.SparseMatrixCSC{Bool, Int64} with 2351 stored entries:
[Spy plot of the 1051×1152 incidence matrix omitted; it shows the sparsity pattern of the algebraic equations.]
This is the incidence matrix of the original system's algebraic equations. Notice that the original formulation of the model has dependencies between the different equations, so the full set of equations must be solved together; that exposes no parallelism. However, the Block Lower Triangular (BLT) transformation exposes independent blocks, and the tearing process improves on this further: it removes roughly 90% of the equations and splits the remaining nonlinear equations into 50 independent blocks, which can all be solved in parallel. The conclusion: performing the parallelization after structural simplification greatly increases the portion of the problem that can be parallelized, which is far better than trying to parallelize the original equations by hand.
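To make "independent blocks can be solved in parallel" concrete, here is a toy sketch. This is not ModelingToolkit internals or its generated code; the residual functions are made up for illustration, and each one stands in for one torn algebraic block that can be solved on its own thread:

using Base.Threads

# Fifty independent "blocks", each a scalar residual f(x) = 0 (hypothetical
# stand-ins for the torn algebraic blocks).
residuals = [x -> x^3 + a * x - 1.0 for a in range(1.0, 2.0, length = 50)]
solutions = zeros(length(residuals))

@threads for i in eachindex(residuals)
    f = residuals[i]
    x = 1.0                                # initial guess for this block
    for _ in 1:20                          # a fixed number of Newton steps
        fx = f(x)
        dfdx = (f(x + 1e-8) - fx) / 1e-8   # finite-difference derivative
        x -= fx / dfdx
    end
    solutions[i] = x                       # each block writes only to its own slot
end

Because the blocks do not depend on each other, no synchronization is needed beyond the loop itself; this is exactly the structure that tearing exposes.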
After performing this, you can construct the ODEProblem and set parallel_form to make use of the exposed parallelism in the multithreaded function construction, but this showcases why mtkcompile is so important to that process.
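As a final step, here is a minimal sketch of building and solving the simplified system. It assumes the component defaults supply the initial conditions and uses an illustrative stiff solver; depending on your ModelingToolkit version you may need to pass explicit initial values or guesses for the algebraic unknowns, and the parallelism options mentioned above are omitted here:

using OrdinaryDiffEq  # already loaded at the top of the tutorial

# Build the problem from the simplified system; initial values are assumed to
# come from the component defaults.
prob = ODEProblem(sys, [], (0.0, 10.0))

# The torn system still contains algebraic equations, so pick a solver that can
# handle the mass-matrix form.
sol = solve(prob, Rodas5P())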