Stochastic Tools Batch Mode

The SamplerFullSolveMultiApp and SamplerTransientMultiApp are capable of running sub-applications in one of three different modes:

normal: One sub-application is created for each row of data (num_rows) supplied by the Sampler object.
batch-reset: $N$ sub-applications are created, where the sub-applications are destroyed and re-created (on the same existing MPI communicator) for each row of data supplied by the Sampler object.
batch-restore: $N$ sub-applications are created, where the sub-application is backed up after initialization. Then for each row of data supplied by the Sampler object the sub-application is restored to the initial state prior to execution.

For the two "batch" options, $N$ indicates the number of applications created, which in the most general expression is given by

$$N=\text{min}\left(n_{rows}, \text{floor}\left(\frac{n_{proc}}{n_{min}}\right)\right)$$

where $n_{rows}$ is the num_rows parameter on the Sampler object, $n_{proc}$ is the number of processors (passed to the mpiexec command), and $n_{min}$ is min_procs_per_app, the minimum number of processors that should be used to run each sub-application. If you launch your Stochastic Tools main application with fewer than min_procs_per_app processors, the simulation will still proceed, but it will use the number of ranks you have provided as the "effective" min_procs_per_app. Table 1 illustrates this by listing the number of applications created ($N$) for several choices of num_rows, min_procs_per_app, and number of processors.

Table 1: Number of sub-apps launched for the "batch" modes for different example choices of num_rows and min_procs_per_app

`num_rows`	`min_procs_per_app`	Processors	Number of Sub-Apps
4	2	9	4
20	2	9	4
4	2	3	1
4	2	1	1
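
The formula and the "effective" min_procs_per_app rule can be checked with a few lines of Python. This is a minimal sketch, not MOOSE code: the function name num_batch_apps is hypothetical, and clamping min_procs_per_app to the number of processors is an assumption based on the description above.

```python
def num_batch_apps(num_rows, n_procs, min_procs_per_app):
    """Number of sub-applications N created in the "batch" modes (sketch)."""
    if num_rows < 1 or n_procs < 1 or min_procs_per_app < 1:
        raise ValueError("all arguments must be positive integers")
    # Assumption: with fewer ranks than min_procs_per_app, the available
    # ranks act as the "effective" min_procs_per_app.
    effective_min = min(min_procs_per_app, n_procs)
    # N = min(n_rows, floor(n_proc / n_min)); // is integer floor division.
    return min(num_rows, n_procs // effective_min)


# The rows of Table 1: (num_rows, min_procs_per_app, processors)
table_1 = [(4, 2, 9), (20, 2, 9), (4, 2, 3), (4, 2, 1)]
for num_rows, min_procs, n_procs in table_1:
    n_apps = num_batch_apps(num_rows, n_procs, min_procs)
    print(num_rows, min_procs, n_procs, n_apps)
```

Under these assumptions, the last value printed on each line matches the Number of Sub-Apps column of Table 1 (4, 4, 1, 1).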

All three modes are available when using SamplerFullSolveMultiApp. The "batch-reset" mode is not available for SamplerTransientMultiApp because the sub-application has state that must be maintained as simulation time progresses.

The primary benefit of using a batch mode is improved performance through reduced memory use by the running application. The performance gains depend on the type of sub-application being executed as well as the number of samples being evaluated. The following sections highlight the performance improvements that may be expected for full solve and transient sub-applications.
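
To make the trade-off between the modes concrete, the following toy model mimics the three lifecycles on a single slot (that is, $N=1$). It is only an illustration under stated assumptions: ToyApp and run_mode are hypothetical names, the "state" is a single number, and nothing here calls MOOSE. It counts how many times an application is constructed and how many are alive at once, which stand in for creation cost and memory use.

```python
class ToyApp:
    """Stand-in for a sub-application; records constructions and live count."""

    def __init__(self, stats):
        self.stats = stats
        stats["constructed"] += 1
        stats["live"] += 1
        stats["peak_live"] = max(stats["peak_live"], stats["live"])
        self.state = 0.0  # pretend this is the state after initialization

    def destroy(self):
        self.stats["live"] -= 1

    def run(self, row):
        self.state += sum(row)  # running a sample mutates the state
        return self.state


def run_mode(mode, rows):
    """Return (results, constructions, peak live apps) for one toy run."""
    stats = {"constructed": 0, "live": 0, "peak_live": 0}
    results = []
    if mode == "normal":
        # One application per row, all created up front.
        apps = [ToyApp(stats) for _ in rows]
        results = [app.run(row) for app, row in zip(apps, rows)]
    elif mode == "batch-reset":
        # The application is destroyed and re-created for each row.
        for row in rows:
            app = ToyApp(stats)
            results.append(app.run(row))
            app.destroy()
    elif mode == "batch-restore":
        # One application, backed up once and restored before each row.
        app = ToyApp(stats)
        backup = app.state
        for row in rows:
            app.state = backup
            results.append(app.run(row))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return results, stats["constructed"], stats["peak_live"]


rows = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
for mode in ("normal", "batch-reset", "batch-restore"):
    print(mode, run_mode(mode, rows))
```

In this toy, all three modes return the same results, but "normal" keeps three applications alive at once, "batch-reset" constructs three applications one at a time, and "batch-restore" constructs a single application. The real cost of each step depends on the sub-application.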

Example 1: Full Solve Sub-Application

The first example demonstrates the performance improvements to expect when using SamplerFullSolveMultiApp with sub-applications. In this case, the sub-application solves steady-state diffusion on a unit cube domain with Dirichlet boundary conditions on the left, $x=0$, and right, $x=1$, sides of the domain. The complete input file for this problem is given in Listing 1.

Listing 1: Complete input file for steady-state diffusion problem (modules/stochastic_tools/examples/batch/sub.i).

[Mesh]
  type = GeneratedMesh
  dim = 3
  nx = 10
  ny = 10
  nz = 10
[]

[Variables]
  [u]
  []
[]

[Kernels]
  [diff]
    type = ADDiffusion
    variable = u
  []
  [time]
    type = ADTimeDerivative
    variable = u
  []
[]

[BCs]
  [left]
    type = DirichletBC
    variable = u
    boundary = left
    value = 0
  []
  [right]
    type = DirichletBC
    variable = u
    boundary = right
    value = 1
  []
[]

[Postprocessors]
  [average]
    type = AverageNodalVariableValue
    variable = u
  []
[]

[Executioner]
  type = Transient
  num_steps = 1
  dt = 0.25
  solve_type = NEWTON
[]

[Controls]
  [receiver]
    type = SamplerReceiver
  []
[]

[Outputs]
[]

The master application does not perform a solve. Instead, it performs a stochastic analysis using the MonteCarlo object, which samples the values of the two Dirichlet conditions on the sub-applications from a uniform distribution. The complete input file for the master application is given in Listing 2.

Listing 2: Complete input file for master application (modules/stochastic_tools/examples/batch/full_solve.i) that performs stochastic simulations of the steady-state diffusion problem in Listing 1 using Monte Carlo sampling.

[StochasticTools]
[]

[Distributions]
  [uniform]
    type = Uniform
    lower_bound = 1
    upper_bound = 9
  []
[]

[Samplers]
  [mc]
    type = MonteCarlo
    num_rows = 10
    distributions = 'uniform uniform'
  []
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = mc
    input_files = 'sub.i'
    mode = batch-restore
  []
[]

[Transfers]
  [runner]
    type = SamplerParameterTransfer
    to_multi_app = runner
    parameters = 'BCs/left/value BCs/right/value'
    sampler = mc
  []
  [data]
    type = SamplerPostprocessorTransfer
    from_multi_app = runner
    to_vector_postprocessor = storage
    from_postprocessor = average
    sampler = mc
  []
[]

[VectorPostprocessors]
  [storage]
    type = StochasticResults
  []
[]

[Postprocessors]
  [total]
    type = MemoryUsage
    execute_on = 'INITIAL TIMESTEP_END'
  []
  [per_proc]
    type = MemoryUsage
    value_type = "average"
    execute_on = 'INITIAL TIMESTEP_END'
  []
  [max_proc]
    type = MemoryUsage
    value_type = "max_process"
    execute_on = 'INITIAL TIMESTEP_END'
  []
[]

[Outputs]
  csv = true
  perf_graph = true
[]
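
As a rough picture of the data flowing through this input file, the sketch below builds a sample matrix with one row per sample and one column per distribution, then pairs each column with the corresponding entry of the parameters list. It is a plain Python illustration, not the MOOSE sampler: sample_matrix is a hypothetical helper, and Python's random module is an assumed stand-in for MOOSE's own random number generation, so the values will not match an actual run.

```python
import random


def sample_matrix(num_rows, bounds, seed=0):
    """Return num_rows rows, each with one uniform draw per (lower, upper) pair."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(num_rows)]


# Values taken from the listing: num_rows = 10, two Uniform(1, 9) columns.
parameters = ["BCs/left/value", "BCs/right/value"]
matrix = sample_matrix(10, [(1.0, 9.0), (1.0, 9.0)])

# Each row supplies one value per controlled parameter, in the listed order.
for row in matrix:
    print(dict(zip(parameters, row)))
```

Each printed dictionary corresponds to one row of Sampler data, that is, one execution of the sub-application with those two boundary values.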

The example is executed to demonstrate the memory performance of the three modes of operation: "normal", "batch-reset", and "batch-restore". Each mode is executed with an increasing number of Monte Carlo samples by setting the num_rows parameter of the MonteCarlo Sampler object. Figure 1 and Figure 2 show the resulting memory use at the end of the simulation for each mode of operation with increasing sample numbers in serial and in parallel, respectively.

Figure 1: Total memory at the end of the simulation using a SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the three available modes of operation running on a single processor.

Figure 2: Total memory and maximum memory per processor at the end of the simulation using a SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the three available modes of operation running on 56 processors.

An important feature of the various modes of operation is that run time is not negatively impacted by changing the mode. In some cases, using a batch mode can actually decrease total simulation run time.

The total run time results for the full solve problem in serial and parallel are shown in Figure 3 and Figure 4, respectively. The time shown in these plots is the total simulation time, which encompasses both the simulation initialization and solve. The differences in speed are mainly due to the instantiation and destruction of the sub-applications. When running in 'batch-reset' mode, each data sample causes the sub-application to be created and destroyed during the solve, which results in the slowest performance. The 'normal' mode creates all sub-applications up front. The 'batch-restore' mode uses the backup-restore capability to save the state of the sub-applications, so it does not require as many instantiations and has the lowest run time. For this example, the solve portion is minimal; as such, the sub-application creation time plays a large role. As the solve time increases, the time gains can be expected to become minimal.

Figure 3: Total execution time of a simulation using SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the available modes of operation on a single processor.

Figure 4: Total execution time of a simulation using SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the available modes of operation on 56 processors.

Example 2: Transient Sub-Application

The second example is nearly identical to the first, except that the master application is a transient solve that sets the boundary conditions at the end of each time step. The only difference occurs in the master input file, in the Executioner and MultiApps blocks, as shown in Listing 3.

Listing 3: Complete input file for a transient master application that performs stochastic simulations of a diffusion problem with time varying boundary conditions using Monte Carlo sampling.

[Executioner]
  type = Transient
  num_steps = 10
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = mc
    input_files = 'sub.i'
    execute_on = 'INITIAL TIMESTEP_END'
    mode = batch-restore
  []
[]

The results shown in Figure 5 and Figure 6 include the memory use at the end of the simulation (10 time steps) for each mode of operation with an increasing number of samples in serial and parallel, respectively. Recall, as mentioned above, that the "batch-reset" mode is not available in the SamplerTransientMultiApp.

Figure 5: Total memory at the end of the simulation using a SamplerTransientMultiApp with increasing number of Monte Carlo samples for the two available modes of operation running on a single processor.

Figure 6: Total memory and maximum memory per processor at the end of the simulation using a SamplerTransientMultiApp with increasing number of Monte Carlo samples for the two available modes of operation running on 56 processors.

Again, an important feature of the various modes of operation is that run time is not negatively impacted by changing the mode, as seen in Figure 7 and Figure 8. The solve portion of this example is significantly longer than in the steady-state example. As such, the differences in execution time due to the instantiation of objects are diminished and both modes behave similarly.

Figure 7: Total execution time of a simulation using SamplerTransientMultiApp with increasing number of Monte Carlo samples for the available modes of operation on a single processor.

Figure 8: Total execution time of a simulation using SamplerTransientMultiApp with increasing number of Monte Carlo samples for the available modes of operation on 56 processors.

(modules/stochastic_tools/examples/batch/transient.i)

[StochasticTools]
  auto_create_executioner = false
[]

[Distributions]
  [uniform]
    type = Uniform
    lower_bound = 1
    upper_bound = 9
  []
[]

[Samplers]
  [mc]
    type = MonteCarlo
    num_rows = 10
    distributions = 'uniform uniform'
  []
[]

[Executioner]
  type = Transient
  num_steps = 10
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = mc
    input_files = 'sub.i'
    execute_on = 'INITIAL TIMESTEP_END'
    mode = batch-restore
  []
[]

[Transfers]
  [runner]
    type = SamplerParameterTransfer
    to_multi_app = runner
    parameters = 'BCs/left/value BCs/right/value'
    sampler = mc
  []
  [data]
    type = SamplerPostprocessorTransfer
    from_multi_app = runner
    to_vector_postprocessor = storage
    from_postprocessor = average
    sampler = mc
  []
[]

[VectorPostprocessors]
  [storage]
    type = StochasticResults
  []
[]

[Postprocessors]
  [total]
    type = MemoryUsage
    execute_on = 'INITIAL TIMESTEP_END'
  []
  [per_proc]
    type = MemoryUsage
    value_type = "average"
    execute_on = 'INITIAL TIMESTEP_END'
  []
  [max_proc]
    type = MemoryUsage
    value_type = "max_process"
    execute_on = 'INITIAL TIMESTEP_END'
  []
[]

[Outputs]
  csv = true
  perf_graph = true
[]
