Compute efficiencies

Geochemistry simulations use a large amount of memory since they store information about a complete geochemical system at each finite-element node, and, optionally, populate a huge number of AuxVariables with useful information. On the other hand, ignoring transport, they are almost embarrassingly parallel, so may be solved efficiently using a large number of processors.

This page explores computational aspects of the 2D GeoTES simulation, the Weber-Tensleep GeoTES simulation and the FORGE geothermal simulation, which are coupled porous-flow + geochemistry simulations.

Memory usage for pure geochemistry simulations

While spatially-dependent geochemistry simulations are rarely run without transport, it is useful to quantify the amount of memory used by geochemistry models alone (without transport) because memory consumption can easily swamp all other considerations. Table 1 lists the numbers of species in the two simulations used in this section.

Table 1: Size of the two simulations

|                         | 2D  | Weber-Tensleep |
| ----------------------- | --- | -------------- |
| Basis species           | 4   | 20             |
| Secondary species       | 1   | 166            |
| Minerals                | 1   | 15             |
| Total species ($N_{s}$) | 6   | 201            |

The Geochemistry Actions, such as the SpatialReactionSolver, add a large number of AuxVariables by default. These record the species' molality, free mass, free volume, activity, bulk composition, etc., which may be useful but add considerably to the memory consumption. Some examples are quantified in Table 2 and Table 3.

Table 2: Memory used by the 2D geochemistry-only simulation

| Number of nodes ($N$) | MB with zero AuxVariables ($N_{a} = 0$) | MB with 24 AuxVariables ($N_{a} = 24$) |
| --------------------- | --------------------------------------- | -------------------------------------- |
| 1181                  | 56                                       | 61                                     |
| 3333                  | 63                                       | 71                                     |
| 6633                  | 80                                       | 93                                     |
| 13233                 | 114                                      | 139                                    |
| 26433                 | 183                                      | 229                                    |

Table 3: Memory used by the Weber-Tensleep geochemistry-only simulation

| Number of nodes ($N$) | MB with 6 AuxVariables ($N_{a} = 6$) | MB with 770 AuxVariables ($N_{a} = 770$) |
| --------------------- | ------------------------------------- | ----------------------------------------- |
| 480                   | 94                                    | 227                                       |
| 930                   | 130                                   | 290                                       |
| 1830                  | 204                                   | 416                                       |
| 3630                  | 351                                   | 667                                       |
| 7230                  | 582                                   | 1171                                      |

The results are summarised in the approximate formula (with coefficients fitted to the data in Table 2 and Table 3)
\begin{equation}
\mathrm{memory\ (MB)} \approx \left(50 + 3\times 10^{-3}N\right) + 3.5\times 10^{-4}N N_{s} + 10^{-4}N N_{a} \ ,
\end{equation}
where $N$ is the number of nodes, $N_{s}$ the number of species ($N_{s} = 6$ and $N_{s} = 201$ in the two examples) and $N_{a}$ the number of AuxVariables.

  1. The first term, $50 + 3\times 10^{-3}N$, is largely independent of geochemistry so is not too important here.

  2. The second term depends on the size of the geochemical system, $N_{s}$, and quantifies the cost of keeping track of an entire geochemical system at each node, which is approximately $350 N_{s}$ bytes per node. Recall that $N_{s}$ is the total number of species in the geochemical model.

  3. The third term depends on the size of the Auxiliary system, which is approximately $100 N_{a}$ bytes per node. Recall that $N_{a}$ is roughly $4 N_{s}$ if all the default AuxVariables are added by the Action.

Of course, the memory consumption will be dependent on the compiler and architecture, but the above formula should provide an approximate guide.
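
As a quick check of these fitted coefficients, consider the Weber-Tensleep case with $N = 7230$, $N_{s} = 201$ and $N_{a} = 6$:
\begin{equation}
50 + 3\times 10^{-3}\times 7230 + 3.5\times 10^{-4}\times 7230\times 201 + 10^{-4}\times 7230\times 6 \approx 50 + 22 + 509 + 4 = 585\ \mathrm{MB} ,
\end{equation}
which is within a few MB of the 582 MB recorded in Table 3.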

Compute time for pure geochemistry simulations

The geochemistry-only part of the FORGE geothermal simulation (aquifer_geochemistry.i with 15251 nodes and no exodus output) may be run using different numbers of processors in order to demonstrate the computational efficiency of pure geochemistry simulations that have no transport. (In fact, exactly the same computations are performed at each node in this case, but MOOSE does not know this.) As is evident from Table 4 and Figure 1, geochemistry-only simulations scale very well with the number of processors.

Table 4: Compute time (s) for the FORGE geothermal simulation using different numbers of processors

| Number of processors | Time required (s)  |
| -------------------- | ------------------ |
| 1                    | 2001               |
| 2                    | 1010, 1011, 1010   |
| 4                    | 533, 531, 534      |
| 6                    | 387, 409, 399      |
| 8                    | 298, 313, 317      |
| 10                   | 233, 236, 241      |
| 16                   | 156, 157, 158      |
| 20                   | 132, 144, 135, 132 |
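
As a rough measure of the scaling, computed from Table 4, the 20-processor runs are about $2001/132 \approx 15$ times faster than the serial run, corresponding to a parallel efficiency of approximately
\begin{equation}
\frac{2001}{20\times 132} \approx 0.76 \ .
\end{equation}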

Figure 1: CPU time required to solve the FORGE geochemistry-only simulation with no transport.

Relative compute-time spent in transport and reactions

It is common that the "reaction" part of reactive-transport simulations proceeds quickly compared with the transport part. This is because the geochemistry module bypasses a lot of MOOSE overhead and because there are no spatial derivatives that require the use of PETSc. Table 5 quantifies this for the FORGE example and an example from the geochemistry test suite. Although this is just one example, the large cost of a full PorousFlow solve is evident. If possible, compute time will be reduced by pre-computing the Darcy velocity and time-dependent temperature field and using these in the GeochemistryTimeDerivative and ConservativeAdvection Kernels, as in the advection_1.i input file.

Table 5: Compute time (s) for various parts of a typical reactive-transport simulation. The mesh in each case has 1886 nodes, and 20 time steps are run using a single processor.

| Simulation | Time required (s) | Notes |
| ---------- | ----------------- | ----- |
| FORGE aquifer_geochemistry.i | 14 | 11 basis species, 14 kinetic minerals, 181056 AuxVariable DoFs |
| FORGE porous_flow.i | 90 | 12 Variables, 22632 DoFs, 20746 AuxVariable DoFs. The simulation needs approximately 2 Nonlinear iterations per time-step and uses ILU |
| Above two combined in a MultiApp | 97 | Enhanced efficiency compared with 14 + 90 = 104 because porous_flow.i demands small timesteps, so aquifer_geochemistry.i does little work |
| Test advection_1.i with modified mesh and 12 "conc" variables | 28 | 12 Variables, 22632 DoFs, 3600 AuxVariable DoFs. The simulation needs approximately 2 Nonlinear iterations per time-step and uses ILU |
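
For cases where the Darcy velocity and temperature can be pre-computed, the transport Kernels might be set up as in the following minimal sketch, loosely modelled on the advection_1.i test (the variable name conc and the velocity value are illustrative placeholders, and full upwinding is assumed):

    [Kernels]
      [time_derivative]
        # time derivative of the transported concentration, provided by the geochemistry module
        type = GeochemistryTimeDerivative
        variable = conc
      []
      [advection]
        # advection with a prescribed, pre-computed Darcy velocity
        type = ConservativeAdvection
        velocity = '1E-6 0 0'
        upwinding_type = full
        variable = conc
      []
    []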

Solver choices for simulations coupled with PorousFlow

Geochemistry simulations are often coupled with PorousFlow in order to simulate reactive transport. The previous section demonstrated that the "transport" part of such simulations is likely to be much more expensive than the "reactive" part. Hence, it is useful to explore ways of improving its efficiency. One way is to pre-compute the Darcy velocity and temperature field, and use the GeochemistryTimeDerivative and ConservativeAdvection Kernels. However, this is often not possible. Therefore, this section quantifies the compute time required for a typical PorousFlow simulation.

A 2D example

The FORGE porous_flow.i input file is used with a mesh defined by


    [Mesh]
      [gen] # the sub-block name is arbitrary
        type = GeneratedMeshGenerator
        dim = 2
        nx = 135
        ny = 120
        xmin = -400
        xmax = 500
        ymin = -400
        ymax = 400
      []
    []

This leads to 197472 DoFs, which splits nicely over 20 processors to yield about 10000 DoFs per processor. The simulation is run for 20 time-steps, most of which converge in 2 Nonlinear iterations. Table 6 shows that there are a variety of solver and preconditioner choices that provide fairly comparable performance, but that there are some that are quite poor. The results are for only one 2D model, but it is useful to note that hypre, FSP, ASM + ILU and bjacobi all perform similarly, and that complicated features such as the asm_overlap, asm_shift_type, diagonal_scale and diagonal_scale_fix, and very strong preconditioners like MUMPS are not beneficial in these types of simulations, in contrast to multi-phase PorousFlow models.

Table 6: Compute time (s) for 20 time-steps of the 2D porous_flow.i simulations run over 20 processors with about 10000 DoFs per processor. Note that the time required is only accurate to within about 5% due to the vagaries of HPC.

| Time required (s) | Preconditioner | Notes |
| ----------------- | -------------- | ----- |
| 51 | hypre | Using the boomeramg type with default options |
| 59 | hypre | Using boomeramg and choosing max_levels = 2 |
| 51 | hypre | Using boomeramg and choosing max_levels >= 4 |
| 51 – 57 | hypre | Using boomeramg and varying max_iter between 2 (fastest) and 5 (slowest) |
| 51 | hypre | Using boomeramg with various levels of strong_threshold |
| 53 | BJACOBI | With the GMRES KSP method |
| 52 | FSP | Fieldsplit with u being the mass fractions and temperature, and p being porepressure, and using hypre + gmres for each subsystem, and splitting_type additive or multiplicative |
| 56 | ASM + ILU | With the GMRES KSP method. Virtually independent of asm_overlap. Not needed: nonzero shift_type, diagonal_scale and diagonal_scale_fix |
| 56 | ASM + ILU | With the BCGS KSP method |
| 57 | SOR | With default options |
| 58 | FSP | Fieldsplit with u being the mass fractions and temperature, and p being porepressure, and using hypre + gmres for each subsystem, and splitting_type schur |
| 94 | LU | With the GMRES KSP method |
| 94 | LU | Using MUMPS |
| 109 | ASM + LU | With the GMRES KSP method |
| 116 | ASM + ILU | With the TFQMR KSP method |
| 124 | FSP | Fieldsplit with u being the mass fractions, and p being porepressure and temperature, and using hypre + gmres for each subsystem |
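
For reference, the simpler single-matrix choices in Table 6 can be selected with a standard Preconditioning block. A minimal sketch for the hypre + boomeramg row is below (the block name smp is arbitrary, and porous_flow.i may set further options):

    [Preconditioning]
      [smp]
        type = SMP
        full = true
        petsc_options_iname = '-pc_type -pc_hypre_type -ksp_type'
        petsc_options_value = ' hypre    boomeramg      gmres'
      []
    []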

The fieldsplit preconditioning is a little more complicated than simply setting pc_type = hypre, pc_hypre_type = boomeramg. Here is a sample block:


[Preconditioning]
  [FSP]
    type = FSP
    topsplit = 'up' # 'up' should match the following block name
    [up]
      splitting = 'u p' # 'u' and 'p' are the names of subsolvers, below
      splitting_type  = multiplicative
      petsc_options_iname = '-pc_fieldsplit_schur_fact_type  -pc_fieldsplit_schur_precondition'
      petsc_options_value = 'full selfp'
    []
    [u]
      vars = 'f_H f_Na f_K f_Ca f_Mg f_SiO2 f_Al f_Cl f_SO4 f_HCO3 temperature'
      petsc_options_iname = '-pc_type -ksp_type'
      petsc_options_value = ' hypre    gmres'
    []
    [p]
      vars = 'porepressure'
      petsc_options_iname = '-pc_type -ksp_type'
      petsc_options_value = ' hypre    gmres'
    []
  []
[]

A 3D example

The FORGE porous_flow.i input file is used with a mesh defined by


    [Mesh]
      [gen] # the sub-block name is arbitrary
        type = GeneratedMeshGenerator
        dim = 3
        nx = 36
        ny = 32
        nz = 10
        xmin = -400
        xmax = 500
        ymin = -400
        ymax = 400
      []
    []

This leads to 161172 DoFs, which splits nicely over 20 processors to yield about 10000 DoFs per processor. The simulation is run for 10 time-steps, most of which converge in 2 Nonlinear iterations. Table 7 shows that there are a variety of solver and preconditioner choices that provide fairly comparable performance, but some are quite poor. The results are for only one 3D model, but it is useful to note that hypre is the best by a significant margin, followed by FSP + hypre + gmres and ASM + ILU. Complicated features such as the asm_overlap, asm_shift_type, diagonal_scale and diagonal_scale_fix, and strong preconditioners like ASM + LU or MUMPS add extra overhead in these types of simulations, in contrast to multi-phase PorousFlow models.

Table 7: Compute time (s) for 10 time-steps of the 3D porous_flow.i simulations run over 20 processors with about 10000 DoFs per processor. Each simulation uses the GMRES KSP method with restart = 300. Note that the time required is only accurate to within about 5% due to the vagaries of HPC.

| Time required (s) | Preconditioner | Notes |
| ----------------- | -------------- | ----- |
| 100   | hypre     | Using the boomeramg type with default options |
| 170   | FSP       | Fieldsplit with u being the mass fractions and temperature, and p being porepressure, and using hypre + gmres for each subsystem, and splitting_type multiplicative |
| 200   | ASM + ILU | asm_overlap = 0 |
| 250   | ASM + ILU | asm_overlap = 0 and asm_shift_type = NONZERO and -ksp_diagonal_scale -ksp_diagonal_scale_fix |
| 300   | BJACOBI   |  |
| 472   | ASM + LU  | With asm_overlap = 0 |
| 580   | ASM + LU  | With asm_overlap = 1 |
| >1000 | ASM + ILU | asm_overlap = 1 |
| >1000 | ASM + LU  | With asm_overlap = 2 |
| >1000 | LU        | Using MUMPS |
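
These options map onto a standard Preconditioning block in the same way as the 2D case. A minimal sketch of the ASM + ILU row with asm_overlap = 0, using the GMRES restart of 300 noted in the caption, is (the block name asm_ilu is arbitrary):

    [Preconditioning]
      [asm_ilu]
        type = SMP
        full = true
        petsc_options_iname = '-pc_type -pc_asm_overlap -sub_pc_type -ksp_type -ksp_gmres_restart'
        petsc_options_value = ' asm      0               ilu          gmres     300'
      []
    []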

Enthusiastic readers will play with further choices (for instance, Eisenstat-Walker options) but it is hoped the above tables will provide a useful starting point.

Reactive-transport using multiple processors

A large reactive-transport simulation based on the 2D FORGE model is used to study the scaling behaviour with the number of processors. Given the above results, this is mainly testing the scaling behaviour of the linear solver for the PorousFlow simulation. The mesh used is


    [Mesh]
      [gen] # the sub-block name is arbitrary
        type = GeneratedMeshGenerator
        dim = 2
        nx = 450
        ny = 400
        xmin = -400
        xmax = 500
        ymin = -400
        ymax = 400
      []
    []

The magnitude of the problem is tabulated in Table 8. The simulation is run for 100 time-steps using the hypre boomeramg preconditioner in porous_flow.i.

Table 8: Size of the large reactive-transport 2D simulation.

| Parameter | Number |
| --------- | ------ |
| Nodes | 180851 |
| Elements | 180000 |
| PorousFlow DoFs | 2170212 |
| PorousFlow AuxVariable DoFs | 1989361 |
| Geochemistry basis species | 11 |
| Geochemistry kinetic species | 14 |
| Geochemistry AuxVariable DoFs | 17361696 |

The results are tabulated in Table 9 and plotted in Figure 2.

Table 9: Walltime required by the large reactive-transport 2D simulation for different numbers of processors.

| Number of processors | Walltime (s) |
| -------------------- | ------------ |
| 10                   | 6253         |
| 20                   | 3310         |
| 40                   | 2028         |
| 60                   | 1216         |
| 80                   | 960          |
| 100                  | 830          |
| 140                  | 606          |
| 200                  | 470          |
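
Computed from Table 9, increasing from 10 to 200 processors (a factor of 20) reduces the walltime by a factor of $6253/470 \approx 13$, corresponding to a parallel efficiency, relative to the 10-processor run, of approximately
\begin{equation}
\frac{10\times 6253}{200\times 470} \approx 0.67 \ .
\end{equation}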

Figure 2: CPU time required to solve the FORGE reactive-transport simulation involving a MultiApp coupling between PorousFlow and Geochemistry.