Trainers System

Overview

Objects within the [Trainers] block are derived from SurrogateTrainer and are designed for creating training data for use with a model (see Surrogates System).

Creating a SurrogateTrainer

To create a trainer, the new object should inherit from SurrogateTrainer, which is derived (through SurrogateTrainerBase) from GeneralUserObject. SurrogateTrainer overrides the execute() function to loop through the rows of a given sampler, specified by the "sampler" parameter:

void
SurrogateTrainer::execute()
{
  checkIntegrity();

  _row = _sampler.getLocalRowBegin();
  _local_row = 0;

  preTrain();

  for (_row = _sampler.getLocalRowBegin(); _row < _sampler.getLocalRowEnd(); ++_row)
  {
    // Need to do this manually in order to keep the iterators valid
    const std::vector<Real> data = _sampler.getNextLocalRow();
    for (unsigned int i = 0; i < _row_data.size(); ++i)
      _row_data[i] = data[i];

    // Set training data
    for (auto & pair : _training_data)
      pair.second->setCurrentIndex((pair.second->isDistributed() ? _local_row : _row));

    if (!_skip_unconverged || *_converged)
      train();

    _local_row++;
  }

  postTrain();
}

The method will execute once per execution flag (see SetupInterface (execute_on)) on each processor. There are three virtual functions that derived classes can and should override:

  /*
   * Setup function called before sampler loop
   */
  virtual void preTrain() {}

  /*
   * Function that needs to be overridden, called during sampler loop
   */
  virtual void train() {}

  /*
   * Function called after sampler loop, used for mpi communication mainly
   */
  virtual void postTrain() {}

preTrain() is called before the sampler loop and is typically used for resizing variables for the given number of data points.
train() is called within the sampler loop where member variables _local_row, _row, and those declared with getTrainingData are updated.
postTrain() is called after the sampler loop and is typically used for MPI communication.
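
The division of work between the three hooks can be illustrated outside of MOOSE. The following is a minimal, self-contained sketch, not the actual SurrogateTrainer implementation: ToyTrainerBase, ColumnMeanTrainer, and columnMeans are hypothetical names introduced only for this example, and the sampler is replaced by a plain vector of rows. It assumes a single process, so postTrain() only normalizes the accumulated sums where a real trainer would typically perform its MPI communication.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for SurrogateTrainer (not the MOOSE class): it owns the
// row loop and calls the three hooks in the same order as execute() above.
class ToyTrainerBase
{
public:
  explicit ToyTrainerBase(std::vector<std::vector<double>> rows) : _rows(std::move(rows)) {}
  virtual ~ToyTrainerBase() = default;

  void execute()
  {
    preTrain();
    for (_row = 0; _row < _rows.size(); ++_row)
    {
      _row_data = _rows[_row]; // refresh the "current row" before train()
      train();
    }
    postTrain();
  }

protected:
  virtual void preTrain() {}
  virtual void train() {}
  virtual void postTrain() {}

  // Reference to the current row, analogous in spirit to getSamplerData()
  const std::vector<double> & getSamplerData() const { return _row_data; }

  // Index of the row currently being processed
  std::size_t _row = 0;

private:
  std::vector<std::vector<double>> _rows;
  std::vector<double> _row_data;
};

// Hypothetical derived trainer that computes the mean of each sampler column
class ColumnMeanTrainer : public ToyTrainerBase
{
public:
  using ToyTrainerBase::ToyTrainerBase;
  const std::vector<double> & mean() const { return _mean; }

protected:
  // Reset accumulators before the loop
  void preTrain() override
  {
    _mean.clear();
    _count = 0;
  }

  // Accumulate the current row
  void train() override
  {
    const std::vector<double> & row = getSamplerData();
    if (_mean.empty())
      _mean.assign(row.size(), 0.0);
    assert(row.size() == _mean.size());
    for (std::size_t i = 0; i < row.size(); ++i)
      _mean[i] += row[i];
    ++_count;
  }

  // Finalize after the loop; a parallel trainer would reduce the sums here first
  void postTrain() override
  {
    if (_count > 0)
      for (double & m : _mean)
        m /= static_cast<double>(_count);
  }

private:
  std::vector<double> _mean;
  std::size_t _count = 0;
};

// Convenience wrapper: run the toy trainer over the given rows
std::vector<double> columnMeans(std::vector<std::vector<double>> rows)
{
  ColumnMeanTrainer trainer(std::move(rows));
  trainer.execute();
  return trainer.mean();
}
```

Calling columnMeans with the rows {1, 2} and {3, 6} accumulates both rows in train() and divides by the row count in postTrain().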

Gathering Training Data

To ease the gathering of data needed for training, SurrogateTrainer includes an API for getting reporter data that takes care of the necessary size checks and distributed data indexing. The idea behind this is to emulate the element loop behavior in other MOOSE objects. For instance, in a kernel, the value of _u corresponds to the solution in an element. Here, data referenced with getTrainingData corresponds to the value of the data in the current sampler row. The returned reference is meant to be used in the train() function. There are two functions that derived classes can call to gather training data:

  /*
   * Get a reference to training data given a reporter name
   */
  template <typename T>
  const T & getTrainingData(const ReporterName & rname);

  /*
   * Get a reference to the sampler row data
   */
  const std::vector<Real> & getSamplerData() const { return _row_data; };

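getTrainingData<T>(const ReporterName & rname) returns a reference to a value of type T taken from a reporter value of type std::vector<T>, whose name is defined by rname. During the sampler loop, the reference holds the entry for the current row.
getSamplerData() returns a reference to a vector holding the sampler values of the current row.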
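
The "current value" behavior can be sketched in isolation. The example below is a simplified, self-contained illustration modeled loosely on the TrainingData helper in SurrogateTrainer.h. CurrentValue and sumThroughReference are hypothetical names used only here, and no reporter system or distributed indexing is involved. It shows why a reference obtained once (as a trainer would do in its constructor) can be read on every pass through the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the training data helper: it keeps a reference to
// the full vector and a copy of the entry at the current index.
template <typename T>
class CurrentValue
{
public:
  explicit CurrentValue(const std::vector<T> & values) : _values(values), _value() {}

  std::size_t size() const { return _values.size(); }

  // Called by the loop owner before each train() call
  void setCurrentIndex(std::size_t index) { _value = _values.at(index); }

  // The reference handed to the trainer; it stays valid while its content changes
  const T & get() const { return _value; }

private:
  const std::vector<T> & _values;
  T _value;
};

// Sums all responses by reading through a reference obtained once before the loop
double sumThroughReference(const std::vector<double> & responses)
{
  CurrentValue<double> data(responses);
  const double & current = data.get(); // obtained once, like a constructor-initialized member

  double total = 0.0;
  for (std::size_t row = 0; row < data.size(); ++row)
  {
    data.setCurrentIndex(row); // the role execute() plays before calling train()
    total += current;          // what a train() implementation would read
  }
  return total;
}
```

In the execute() listing above, the index passed to setCurrentIndex is _local_row for distributed data and _row otherwise. This sketch uses a single index because it has no notion of distributed data.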

Declaring Training Data

Model data must be declared in the object constructor using the declareModelData methods, which are defined as follows. The desired type is provided as the template argument (T) and the name of the data is the first input parameter. The second parameter, if provided, is the initial value for the data. The name provided is arbitrary, but is used by the model object(s) designed to work with the training data (see Surrogates System).

  template <typename T>
  T & declareModelData(const std::string & data_name);

  template <typename T>
  T & declareModelData(const std::string & data_name, const T & value);

These methods return a reference to the desired type that should be populated in the aforementioned train() method. For example, in the PolynomialChaosTrainer trainer object a scalar value, "order", is stored by declaring a reference to the desired type in the header.

  const unsigned int & _order;

Within the source the declared references are initialized with a declare method that includes data initialization.

    _order(declareModelData<unsigned int>("_order", getParam<unsigned int>("order"))),

The training data system leverages the Restartable system within MOOSE. As such, the data stored can be of an arbitrary type and is automatically used for restarting simulations.
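
The declare-by-name pattern can be illustrated with a small, self-contained sketch. It does not use the MOOSE Restartable system. ToyModelDataStore, StoredValue, StoredValueBase, and demoCoefficientCount are hypothetical names, and the storage is a plain in-memory map. The point is only to show how a declare call can hand back a typed reference that the trainer fills in, while a consumer later looks the same data up by name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type-erased holder so values of different types share one map
struct StoredValueBase
{
  virtual ~StoredValueBase() = default;
};

template <typename T>
struct StoredValue : StoredValueBase
{
  T value{};
};

// Hypothetical in-memory store (an assumption for illustration, not a MOOSE class)
class ToyModelDataStore
{
public:
  // Declare data by name and return a reference to default-initialized storage
  template <typename T>
  T & declare(const std::string & name)
  {
    if (_data.count(name))
      throw std::runtime_error("Model data '" + name + "' was already declared.");
    auto holder = std::make_unique<StoredValue<T>>();
    T & ref = holder->value; // heap storage, so the reference survives the move below
    _data[name] = std::move(holder);
    return ref;
  }

  // Declare data by name with an initial value
  template <typename T>
  T & declare(const std::string & name, const T & value)
  {
    T & ref = declare<T>(name);
    ref = value;
    return ref;
  }

  // Look up previously declared data, as a consumer of the model data might
  template <typename T>
  const T & get(const std::string & name) const
  {
    const auto * holder = dynamic_cast<const StoredValue<T> *>(_data.at(name).get());
    if (!holder)
      throw std::runtime_error("Model data '" + name + "' has a different type.");
    return holder->value;
  }

private:
  std::map<std::string, std::unique_ptr<StoredValueBase>> _data;
};

// Declares a scalar and a vector, fills the vector, and reads both back by name
std::size_t demoCoefficientCount()
{
  ToyModelDataStore store;
  const unsigned int & order = store.declare<unsigned int>("_order", 5u);
  std::vector<double> & coeff = store.declare<std::vector<double>>("_coeff");

  coeff.assign(order, 0.0); // "training" populates the declared reference

  assert(store.get<unsigned int>("_order") == 5u);
  return store.get<std::vector<double>>("_coeff").size();
}
```

The design choice illustrated here is that the owner of the data (the store) and the code that populates it (the trainer) are decoupled through the name, which is why the name passed to declareModelData must match what the model object expects.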

Output Model Data

Training model data can be output to a binary file using the SurrogateTrainerOutput object.

Example Input File Syntax

The following input file snippet adds a PolynomialChaosTrainer object for training. Please refer to the documentation on the individual models for more details.

[Trainers]
  [poly_chaos]
    type = PolynomialChaosTrainer
    execute_on = timestep_end
    order = 5
    distributions = 'D_dist S_dist'
    sampler = sample
    response = storage/data:avg:value
  []
[]

Available Objects

Stochastic Tools App
GaussianProcessTrainer: Provides data preparation and training for a Gaussian Process surrogate model.
LibtorchANNTrainer: Trains a simple neural network using libtorch.
NearestPointTrainer: Loops over and saves sample values for NearestPointSurrogate.
PODReducedBasisTrainer: Computes the reduced subspace plus the reduced operators for POD-RB surrogate.
PolynomialChaosTrainer: Computes and evaluates polynomial chaos surrogate model.
PolynomialRegressionTrainer: Computes coefficients for polynomial regression model.

Available Actions

Stochastic Tools App
AddSurrogateAction: Adds SurrogateTrainer and SurrogateModel objects contained within the [Trainers] and [Surrogates] input blocks.

(modules/stochastic_tools/src/surrogates/SurrogateTrainer.C)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#include "SurrogateTrainer.h"
#include "Sampler.h"
#include "RestartableDataIO.h"
#include "StochasticToolsApp.h"

InputParameters
SurrogateTrainerBase::validParams()
{
  InputParameters params = GeneralUserObject::validParams();
  params.registerBase("SurrogateTrainer");
  return params;
}

SurrogateTrainerBase::SurrogateTrainerBase(const InputParameters & parameters)
  : GeneralUserObject(parameters), _model_meta_data_name(_type + "_" + name())
{
  _app.registerRestartableDataMapName(_model_meta_data_name, name());
}

InputParameters
SurrogateTrainer::validParams()
{
  InputParameters params = SurrogateTrainerBase::validParams();
  params.addRequiredParam<SamplerName>("sampler",
                                       "Sampler used to create predictor and response data.");
  params.addParam<ReporterName>(
      "converged_reporter",
      "Reporter value used to determine if a sample's multiapp solve converged.");
  params.addParam<bool>("skip_unconverged_samples",
                        false,
                        "True to skip samples where the multiapp did not converge, "
                        "'stochastic_reporter' is required to do this.");
  return params;
}

SurrogateTrainer::SurrogateTrainer(const InputParameters & parameters)
  : SurrogateTrainerBase(parameters),
    _sampler(getSampler("sampler")),
    _row_data(_sampler.getNumberOfCols()),
    _skip_unconverged(getParam<bool>("skip_unconverged_samples")),
    _converged(nullptr)
{
  if (_skip_unconverged)
  {
    if (!isParamValid("converged_reporter"))
      paramError("skip_unconverged_samples",
                 "'converged_reporter' needs to be specified to skip unconverged sample.");
    _converged = &getTrainingData<bool>(getParam<ReporterName>("converged_reporter"));
  }
}

void
SurrogateTrainer::initialize()
{
  // Figure out if data is distributed
  for (auto & pair : _training_data)
  {
    const ReporterName & name = pair.first;
    TrainingDataBase & data = *pair.second;

    const auto & mode = _fe_problem.getReporterData().getReporterMode(name);
    if (mode == REPORTER_MODE_DISTRIBUTED || (mode == REPORTER_MODE_ROOT && processor_id() != 0))
      data.isDistributed() = true;
    else if (mode == REPORTER_MODE_REPLICATED ||
             (mode == REPORTER_MODE_ROOT && processor_id() == 0))
      data.isDistributed() = false;
    else
      mooseError("Predictor reporter value ", name, " is not of supported mode.");
  }
}

void
SurrogateTrainer::execute()
{
  checkIntegrity();

  _row = _sampler.getLocalRowBegin();
  _local_row = 0;

  preTrain();

  for (_row = _sampler.getLocalRowBegin(); _row < _sampler.getLocalRowEnd(); ++_row)
  {
    // Need to do this manually in order to keep the iterators valid
    const std::vector<Real> data = _sampler.getNextLocalRow();
    for (unsigned int i = 0; i < _row_data.size(); ++i)
      _row_data[i] = data[i];

    // Set training data
    for (auto & pair : _training_data)
      pair.second->setCurrentIndex((pair.second->isDistributed() ? _local_row : _row));

    if (!_skip_unconverged || *_converged)
      train();

    _local_row++;
  }

  postTrain();
}

void
SurrogateTrainer::checkIntegrity() const
{
  // Check that the number of sampler columns hasn't changed
  if (_row_data.size() != _sampler.getNumberOfCols())
    mooseError("Number of sampler columns has changed.");

  // Check that training data is correctly sized
  for (auto & pair : _training_data)
  {
    dof_id_type rsize = pair.second->size();
    dof_id_type nrow =
        pair.second->isDistributed() ? _sampler.getNumberOfLocalRows() : _sampler.getNumberOfRows();
    if (rsize != nrow)
      mooseError("Reporter value ",
                 pair.first,
                 " of size ",
                 rsize,
                 " does not match sampler size (",
                 nrow,
                 ").");
  }
}

(modules/stochastic_tools/include/surrogates/SurrogateTrainer.h)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#pragma once

#include "StochasticToolsApp.h"
#include "GeneralUserObject.h"
#include "LoadSurrogateDataAction.h"

#include "Sampler.h"
#include "RestartableDataIO.h"
#include "StochasticToolsApp.h"

class TrainingDataBase;
template <typename T>
class TrainingData;

/**
 * This is the base trainer class whose main functionality is the API for declaring
 * model data. All trainers must at least derive from this. Unless a trainer needs
 * to perform its own loop through data, it is highly recommended to derive from
 * SurrogateTrainer.
 */
class SurrogateTrainerBase : public GeneralUserObject
{
public:
  static InputParameters validParams();
  SurrogateTrainerBase(const InputParameters & parameters);

  virtual void initialize() {}                         // not required, but available
  virtual void finalize() {}                           // not required, but available
  virtual void threadJoin(const UserObject &) final {} // GeneralUserObjects are not threaded

  /**
   * The name for training data stored within the MooseApp
   */
  const std::string & modelMetaDataName() const { return _model_meta_data_name; }

  ///@{
  /**
   * Declare model data for loading from file as well as restart
   */
  // MOOSEDOCS_BEGIN
  template <typename T>
  T & declareModelData(const std::string & data_name);

  template <typename T>
  T & declareModelData(const std::string & data_name, const T & value);
  // MOOSEDOCS_END
  ///@}

private:
  /// Name for the meta data associated with training
  const std::string _model_meta_data_name;

  /**
   * Internal function used by public declareModelData methods.
   */
  template <typename T>
  RestartableData<T> & declareModelDataHelper(const std::string & data_name);
};

template <typename T>
T &
SurrogateTrainerBase::declareModelData(const std::string & data_name)
{
  RestartableData<T> & data_ref = declareModelDataHelper<T>(data_name);
  return data_ref.set();
}

template <typename T>
T &
SurrogateTrainerBase::declareModelData(const std::string & data_name, const T & value)
{
  RestartableData<T> & data_ref = declareModelDataHelper<T>(data_name);
  data_ref.set() = value;
  return data_ref.set();
}

template <typename T>
RestartableData<T> &
SurrogateTrainerBase::declareModelDataHelper(const std::string & data_name)
{
  auto data_ptr = std::make_unique<RestartableData<T>>(data_name, nullptr);
  RestartableDataValue & value =
      _app.registerRestartableData(data_name, std::move(data_ptr), 0, false, _model_meta_data_name);
  RestartableData<T> & data_ref = static_cast<RestartableData<T> &>(value);
  return data_ref;
}

/**
 * This is the main trainer base class. The main purpose is to avoid a lot of code
 * duplication from performing sampler loops and dealing with distributed data. There
 * are three functions that derived trainers should override: preTrain, train, and postTrain.
 * Derived classes should also use the getTrainingData functionality, which provides a
 * reference to vector reporter data in its current state within the sampler loop.
 *
 * The idea behind this is to emulate the element loop behavior in other MOOSE objects.
 * For instance, in a kernel, the value of _u corresponds to the solution in an element.
 * Here data referenced with getTrainingData will correspond to the value of the
 * data in a sampler row.
 */
class SurrogateTrainer : public SurrogateTrainerBase
{
public:
  static InputParameters validParams();
  SurrogateTrainer(const InputParameters & parameters);

  virtual void initialize() final;
  virtual void execute() final;
  virtual void finalize() final{};

protected:
  /*
   * Setup function called before sampler loop
   */
  virtual void preTrain() {}

  /*
   * Function that needs to be overridden, called during sampler loop
   */
  virtual void train() {}

  /*
   * Function called after sampler loop, used for mpi communication mainly
   */
  virtual void postTrain() {}

  // TRAINING_DATA_BEGIN

  /*
   * Get a reference to training data given a reporter name
   */
  template <typename T>
  const T & getTrainingData(const ReporterName & rname);

  /*
   * Get a reference to the sampler row data
   */
  const std::vector<Real> & getSamplerData() const { return _row_data; };

  // TRAINING_DATA_END

  /// Sampler being used for training
  Sampler & _sampler;

  /// During training loop, this is the row index of the data
  dof_id_type _row;
  /// During training loop, this is the local row index of the data
  dof_id_type _local_row;

private:
  /*
   * Called at the beginning of execute() to make sure values are set properly
   */
  void checkIntegrity() const;

  /// Sampler data for the current row
  std::vector<Real> _row_data;

  /// Whether or not we are skipping samples that have unconverged solutions
  const bool _skip_unconverged;

  /// Whether or not the current sample has a converged solution
  const bool * _converged;

  /// Vector of reporter names and their corresponding values (to be filled by getTrainingData)
  std::unordered_map<ReporterName, std::shared_ptr<TrainingDataBase>> _training_data;
};

template <typename T>
const T &
SurrogateTrainer::getTrainingData(const ReporterName & rname)
{
  auto it = _training_data.find(rname);
  if (it != _training_data.end())
  {
    auto data = std::dynamic_pointer_cast<TrainingData<T>>(it->second);
    if (!data)
      mooseError("Reporter value ", rname, " already exists but is of different type.");
    return data->get();
  }
  else
  {
    const std::vector<T> & rval = getReporterValueByName<std::vector<T>>(rname);
    _training_data[rname] = std::make_shared<TrainingData<T>>(rval);
    return std::dynamic_pointer_cast<TrainingData<T>>(_training_data[rname])->get();
  }
}

class TrainingDataBase
{
public:
  TrainingDataBase() : _is_distributed(false) {}

  virtual ~TrainingDataBase() = default;

  virtual dof_id_type size() const = 0;
  virtual void setCurrentIndex(dof_id_type index) = 0;
  bool & isDistributed() { return _is_distributed; }

protected:
  bool _is_distributed;
};

template <typename T>
class TrainingData : public TrainingDataBase
{
public:
  TrainingData(const std::vector<T> & vector) : _vector(vector) {}

  virtual dof_id_type size() const override { return _vector.size(); }
  virtual void setCurrentIndex(dof_id_type index) override { _value = _vector[index]; }

  const T & get() const { return _value; }

private:
  const std::vector<T> & _vector;
  T _value;
};

(modules/stochastic_tools/include/surrogates/PolynomialChaosTrainer.h)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#pragma once

#include "SurrogateTrainer.h"
#include "PolynomialQuadrature.h"
#include "QuadratureSampler.h"
#include "MultiDimPolynomialGenerator.h"

#include "Distribution.h"

class PolynomialChaosTrainer : public SurrogateTrainer
{
public:
  static InputParameters validParams();
  PolynomialChaosTrainer(const InputParameters & parameters);
  virtual void train() override;
  virtual void postTrain() override;

private:
  /// Predictor values (taken from sampler)
  const std::vector<Real> & _pvals;

  /// Response results
  const Real & _rval;

  /// Maximum polynomial order. The sum of 1D polynomial orders does not go above this value.
  const unsigned int & _order;

  /// Total number of parameters/dimensions
  unsigned int & _ndim;

  /// A _ndim-by-_ncoeff matrix containing the appropriate one-dimensional polynomial order
  std::vector<std::vector<unsigned int>> & _tuple;

  /// Total number of coefficient (defined by size of _tuple)
  std::size_t & _ncoeff;

  /// These are the coefficients we are after in the PC expansion
  std::vector<Real> & _coeff;

  /// The distributions used for sampling
  std::vector<std::unique_ptr<const PolynomialQuadrature::Polynomial>> & _poly;

  /// QuadratureSampler pointer, necessary for applying quadrature weights
  QuadratureSampler * _quad_sampler;
};

(modules/stochastic_tools/src/surrogates/PolynomialChaosTrainer.C)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#include "PolynomialChaosTrainer.h"
#include "Sampler.h"
#include "CartesianProduct.h"

registerMooseObject("StochasticToolsApp", PolynomialChaosTrainer);

InputParameters
PolynomialChaosTrainer::validParams()
{
  InputParameters params = SurrogateTrainer::validParams();
  params.addClassDescription("Computes and evaluates polynomial chaos surrogate model.");
  params.addRequiredParam<ReporterName>(
      "response", "Reporter value of response results, can be vpp with <vpp_name>/<vector_name>.");
  params.addRequiredParam<unsigned int>("order", "Maximum polynomial order.");
  params.addRequiredParam<std::vector<DistributionName>>(
      "distributions", "Names of the distributions samples were taken from.");

  return params;
}

PolynomialChaosTrainer::PolynomialChaosTrainer(const InputParameters & parameters)
  : SurrogateTrainer(parameters),
    _pvals(getSamplerData()),
    _rval(getTrainingData<Real>(getParam<ReporterName>("response"))),
    _order(declareModelData<unsigned int>("_order", getParam<unsigned int>("order"))),
    _ndim(declareModelData<unsigned int>("_ndim", _sampler.getNumberOfCols())),
    _tuple(declareModelData<std::vector<std::vector<unsigned int>>>(
        "_tuple", StochasticTools::MultiDimPolynomialGenerator::generateTuple(_ndim, _order))),
    _ncoeff(declareModelData<std::size_t>("_ncoeff", _tuple.size())),
    _coeff(declareModelData<std::vector<Real>>("_coeff")),
    _poly(declareModelData<std::vector<std::unique_ptr<const PolynomialQuadrature::Polynomial>>>(
        "_poly")),
    _quad_sampler(dynamic_cast<QuadratureSampler *>(&_sampler))
{
  // Check if number of distributions is correct
  if (getParam<std::vector<DistributionName>>("distributions").size() != _ndim)
    paramError("distributions",
               "Sampler number of columns does not match number of inputted distributions.");

  // Make polynomials
  for (const auto & nm : getParam<std::vector<DistributionName>>("distributions"))
    _poly.push_back(PolynomialQuadrature::makePolynomial(&getDistributionByName(nm)));

  _coeff.resize(_ncoeff, 0);
}

void
PolynomialChaosTrainer::train()
{
  DenseMatrix<Real> poly_val(_ndim, _order);

  // Evaluate polynomials to avoid duplication
  for (unsigned int d = 0; d < _ndim; ++d)
    for (unsigned int i = 0; i < _order; ++i)
      poly_val(d, i) = _poly[d]->compute(i, _pvals[d]);

  // Loop over coefficients
  for (std::size_t i = 0; i < _ncoeff; ++i)
  {
    Real val = _rval;
    // Loop over parameters
    for (std::size_t d = 0; d < _ndim; ++d)
      val *= poly_val(d, _tuple[i][d]);

    if (_quad_sampler)
      val *= _quad_sampler->getQuadratureWeight(_row);
    _coeff[i] += val;
  }
}

void
PolynomialChaosTrainer::postTrain()
{
  gatherSum(_coeff);

  if (!_quad_sampler)
    for (std::size_t i = 0; i < _ncoeff; ++i)
      _coeff[i] /= _sampler.getNumberOfRows();
}
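
The accumulate-then-average structure of train() and postTrain() for a non-quadrature sampler can be sketched in isolation. This is an illustrative one-dimensional version under stated assumptions, not the MOOSE implementation: estimateCoefficients and legendreOrthonormal are hypothetical names, the input is assumed uniform on [-1, 1], and the basis is assumed orthonormal with respect to that distribution. Whether the MOOSE polynomial classes normalize in this way is not shown in the listing above. There is also no parallel sum here, since the sketch runs on a single process.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical basis signature: value of the i-th basis polynomial at x
using Basis = std::function<double(std::size_t, double)>;

// Legendre polynomial of the given order, scaled to be orthonormal with
// respect to the uniform density on [-1, 1] (an assumption of this sketch).
inline double
legendreOrthonormal(std::size_t order, double x)
{
  double prev = 1.0; // P_0
  if (order == 0)
    return prev;
  double curr = x; // P_1
  for (std::size_t k = 1; k < order; ++k)
  {
    // Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
    const double next = ((2.0 * k + 1.0) * x * curr - k * prev) / (k + 1.0);
    prev = curr;
    curr = next;
  }
  return std::sqrt(2.0 * order + 1.0) * curr;
}

// Sample-average estimate of the expansion coefficients
std::vector<double>
estimateCoefficients(const std::vector<double> & x,
                     const std::vector<double> & y,
                     std::size_t ncoeff,
                     const Basis & basis)
{
  assert(x.size() == y.size());
  std::vector<double> coeff(ncoeff, 0.0);

  // Analogue of train(): add each row's contribution to every coefficient
  for (std::size_t n = 0; n < x.size(); ++n)
    for (std::size_t i = 0; i < ncoeff; ++i)
      coeff[i] += y[n] * basis(i, x[n]);

  // Analogue of postTrain() without quadrature weights: divide by row count
  if (!x.empty())
    for (auto & c : coeff)
      c /= static_cast<double>(x.size());

  return coeff;
}
```

With a quadrature sampler the listing instead multiplies each contribution by its quadrature weight inside train() and skips the final division, because the weights already carry the normalization.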

(modules/stochastic_tools/test/tests/surrogates/poly_chaos/main_2d_mc.i)

[StochasticTools]
[]

[Distributions]
  [D_dist]
    type = Uniform
    lower_bound = 2.5
    upper_bound = 7.5
  []
  [S_dist]
    type = Uniform
    lower_bound = 2.5
    upper_bound = 7.5
  []
[]

[Samplers]
  [sample]
    type = MonteCarlo
    num_rows = 100
    distributions = 'D_dist S_dist'
    execute_on = initial
  []
[]

[MultiApps]
  [quad_sub]
    type = SamplerFullSolveMultiApp
    input_files = sub.i
    sampler = sample
    mode = batch-restore
  []
[]

[Transfers]
  [quad]
    type = SamplerParameterTransfer
    to_multi_app = quad_sub
    sampler = sample
    parameters = 'Materials/diffusivity/prop_values Materials/xs/prop_values'
    to_control = 'stochastic'
  []
  [data]
    type = SamplerReporterTransfer
    from_multi_app = quad_sub
    sampler = sample
    stochastic_reporter = storage
    from_reporter = avg/value
  []
[]

[Reporters]
  [storage]
    type = StochasticReporter
    outputs = none
  []
  [pc_samp]
    type = EvaluateSurrogate
    model = poly_chaos
    sampler = sample
    parallel_type = ROOT
    execute_on = final
  []
[]

[Surrogates]
  [poly_chaos]
    type = PolynomialChaos
    trainer = poly_chaos
  []
[]

[Trainers]
  [poly_chaos]
    type = PolynomialChaosTrainer
    execute_on = timestep_end
    order = 5
    distributions = 'D_dist S_dist'
    sampler = sample
    response = storage/data:avg:value
  []
[]

[Outputs]
  [out]
    type = CSV
    execute_on = FINAL
  []
[]
