Creating a Surrogate Model

This example goes through the process of creating a custom surrogate model, in this case NearestPointSurrogate.

Overview

Building a surrogate model requires two objects: a SurrogateTrainer and a SurrogateModel. The SurrogateTrainer uses information from samplers and results to construct variables that are saved into a .rd file at the conclusion of the training run. The SurrogateModel loads the data from the .rd file and provides a function called evaluate that evaluates the surrogate model at a given input. The SurrogateTrainer and SurrogateModel are tightly coupled: both have the same member variables, the difference being that one saves the data and the other loads it. An interface class containing the functions common to training and evaluating can help avoid duplicate code, but this example does not cover the creation of such a class.

Creating a Trainer

This example will go over the creation of NearestPointTrainer. Trainers are derived from SurrogateTrainer, which performs a loop over the training data and calls virtual functions that derived classes override to perform the actual training.

validParams

The trainer requires the input of a sampler, so that it understands how many data points are included and how they are distributed across processors. The trainer also needs the predictor and response values from the full-order model which are stored in a vector postprocessor or reporter.

InputParameters
SurrogateTrainer::validParams()
{
  InputParameters params = SurrogateTrainerBase::validParams();
  params.addRequiredParam<SamplerName>("sampler",
                                       "Sampler used to create predictor and response data.");
  params.addParam<ReporterName>(
      "converged_reporter",
      "Reporter value used to determine if a sample's multiapp solve converged.");
  params.addParam<bool>("skip_unconverged_samples",
                        false,
                        "True to skip samples where the multiapp did not converge, "
                        "'converged_reporter' is required to do this.");
  return params;
}

InputParameters
NearestPointTrainer::validParams()
{
  InputParameters params = SurrogateTrainer::validParams();
  params.addClassDescription("Loops over and saves sample values for [NearestPointSurrogate.md].");
  params.addRequiredParam<ReporterName>(
      "response",
      "Reporter value of response results, can be vpp with <vpp_name>/<vector_name> or sampler "
      "column with 'sampler/col_<index>'.");
  params.addParam<std::vector<ReporterName>>(
      "predictors",
      std::vector<ReporterName>(),
      "Reporter values used as the independent random variables, If 'predictors' and "
      "'predictor_cols' are both empty, all sampler columns are used.");
  params.addParam<std::vector<unsigned int>>(
      "predictor_cols",
      std::vector<unsigned int>(),
      "Sampler columns used as the independent random variables, If 'predictors' and "
      "'predictor_cols' are both empty, all sampler columns are used.");

  return params;
}

Constructor

All trainers are based on SurrogateTrainer, which provides the necessary interface for saving the surrogate model data and gathering response/predictor data. All the data meant to be saved and gathered is defined in the constructor of the training object. In NearestPointTrainer, the variable _sample_points is declared as the necessary surrogate data; see Trainers for more information on declaring model data. The variables _response, _predictors, and _predictor_cols refer to the data being used for training. _response and _predictors are reporter values gathered through the getTrainingData API, while _predictor_cols lists the sampler columns used for training. The current sampler row itself is accessed through getSamplerData.

NearestPointTrainer::NearestPointTrainer(const InputParameters & parameters)
  : SurrogateTrainer(parameters),
    _sample_points(declareModelData<std::vector<std::vector<Real>>>("_sample_points")),
    _sampler_row(getSamplerData()),
    _response(getTrainingData<Real>(getParam<ReporterName>("response"))),
    _predictor_cols(getParam<std::vector<unsigned int>>("predictor_cols"))
{
  for (const ReporterName & rname : getParam<std::vector<ReporterName>>("predictors"))
    _predictors.push_back(&getTrainingData<Real>(rname));

  // If predictors and predictor_cols are empty, use all sampler columns
  if (_predictors.empty() && _predictor_cols.empty())
  {
    _predictor_cols.resize(_sampler.getNumberOfCols());
    std::iota(_predictor_cols.begin(), _predictor_cols.end(), 0);
  }

  // Resize sample points to number of predictors, plus one entry for the response
  _sample_points.resize(_predictors.size() + _predictor_cols.size() + 1);
}
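The default column selection and the sizing of _sample_points can be illustrated outside of MOOSE. The following is a minimal standalone sketch, not the trainer's implementation; the function names defaultPredictorCols and numSamplePointRows are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: when no predictors are requested, select every sampler
// column, i.e. 0, 1, ..., numCols - 1.
std::vector<unsigned int>
defaultPredictorCols(unsigned int numCols)
{
  std::vector<unsigned int> cols(numCols);
  std::iota(cols.begin(), cols.end(), 0u);
  return cols;
}

// Hypothetical helper: one row per predictor (reporter values first, then
// sampler columns) plus one final row for the response.
std::size_t
numSamplePointRows(std::size_t numReporterPredictors, std::size_t numPredictorCols)
{
  return numReporterPredictors + numPredictorCols + 1;
}
```

With a three-column sampler and no explicit predictors, this selects columns 0, 1, and 2 and yields four rows: three for the predictors and one for the response.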

The member variables _sample_points, _sampler_row, _response, _predictor_cols, and _predictors are defined in the header file:

  /// Array containing sample points and the results
  std::vector<std::vector<Real>> & _sample_points;

  /// Data from the current sampler row
  const std::vector<Real> & _sampler_row;

  /// Response value
  const Real & _response;

  /// Columns from sampler for predictors
  std::vector<unsigned int> _predictor_cols;

  /// Predictor values from reporters
  std::vector<const Real *> _predictors;

preTrain

preTrain() is called before the sampler loop. For NearestPointTrainer, we resize _sample_points appropriately:

void
NearestPointTrainer::preTrain()
{
  // Resize to number of sample points
  for (auto & it : _sample_points)
    it.resize(_sampler.getNumberOfLocalRows());
}

Note that getNumberOfLocalRows() is used to size the array, so each processor initially holds only its own portion of the samples and results. We will gather all samples in postTrain().
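To make the idea of local rows concrete, the sketch below splits a number of sampler rows across processors. It assumes a simple even split with the remainder going to the lowest ranks; the partitioning actually used by a given sampler may differ, and localRowRange is a hypothetical name, not a MOOSE function.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper: split numRows as evenly as possible over numRanks and
// return the half-open range [begin, end) of rows owned by `rank`.
// Assumes numRanks > 0 and rank < numRanks.
std::pair<std::size_t, std::size_t>
localRowRange(std::size_t numRows, std::size_t numRanks, std::size_t rank)
{
  const std::size_t base = numRows / numRanks;  // rows every rank receives
  const std::size_t extra = numRows % numRanks; // leftover rows, one each to the first ranks
  const std::size_t begin = rank * base + std::min(rank, extra);
  const std::size_t count = base + (rank < extra ? 1 : 0);
  return {begin, begin + count};
}
```

Under this assumption, 10 rows on 3 processors are split as [0, 4), [4, 7), and [7, 10), so each entry of _sample_points would be resized to 4, 3, and 3 on the respective processors.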

train

train() is where the actual training occurs. This function is called during the sampler loop for each row, at which time the member variables _row and _local_row, along with the values gathered with getTrainingData, are updated:

void
NearestPointTrainer::train()
{
  unsigned int d = 0;
  // Get predictors from reporter values
  for (const auto & val : _predictors)
    _sample_points[d++][_local_row] = *val;
  // Get predictors from sampler
  for (const auto & col : _predictor_cols)
    _sample_points[d++][_local_row] = _sampler_row[col];

  _sample_points.back()[_local_row] = _response;
}
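The storage layout used by train() is indexed first by dimension and then by sample, with the response kept in the last entry. The following standalone sketch shows that layout under simplified assumptions: all predictors arrive in one vector, and storeSample is a hypothetical helper rather than part of the trainer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the layout used by train():
// samplePoints[d][row] holds predictor d of the sample stored at `row`,
// and the last entry of samplePoints holds the response values.
void
storeSample(std::vector<std::vector<double>> & samplePoints,
            std::size_t localRow,
            const std::vector<double> & predictors,
            double response)
{
  // One entry per predictor plus one for the response
  assert(samplePoints.size() == predictors.size() + 1);

  for (std::size_t d = 0; d < predictors.size(); ++d)
    samplePoints[d][localRow] = predictors[d];
  samplePoints.back()[localRow] = response;
}
```

Storing the samples (1, 2) with response 5 and (3, 4) with response 6 produces the entries {1, 3}, {2, 4}, and {5, 6}: each inner vector holds one dimension across all samples.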

postTrain

postTrain() is called after the sampler loop and is typically where processor communication happens. Here, we use postTrain() to gather the local _sample_points so that each processor has a full copy. _communicator.allgather gives every processor a copy of the full array, whereas _communicator.gather gives the full copy to only one processor. The latter is typically used because outputting only happens on the root processor. See libMesh::Parallel::Communicator for more communication options.

void
NearestPointTrainer::postTrain()
{
  for (auto & it : _sample_points)
    _communicator.allgather(it);
}
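The effect of the gather on a single entry of _sample_points can be emulated serially. The sketch below makes no MPI or libMesh calls; it assumes the gathered data is concatenated in processor order, and emulateAllgather is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Serial emulation of an allgather on vectors: localPieces[r] stands in for the
// data owned by processor r. After the gather, every processor would hold the
// returned vector in place of its local piece.
std::vector<double>
emulateAllgather(const std::vector<std::vector<double>> & localPieces)
{
  std::size_t total = 0;
  for (const auto & piece : localPieces)
    total += piece.size();

  std::vector<double> full;
  full.reserve(total);
  for (const auto & piece : localPieces)
    full.insert(full.end(), piece.begin(), piece.end());
  return full;
}
```

Because the same gather is applied to every entry of _sample_points, and each entry has the same local size on a given processor, sample p refers to the same training point in every dimension after the gather.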

Creating a Surrogate

This example will go over the creation of NearestPointSurrogate. Surrogates are a specialized version of a MooseObject that must have the evaluate public member function. The validParams for a surrogate will generally define how the surrogate is evaluated. NearestPointSurrogate does not have any options for the method of evaluation.

Constructor

In the constructor, the references for the model data are defined, taken from the training data:

NearestPointSurrogate::NearestPointSurrogate(const InputParameters & parameters)
  : SurrogateModel(parameters),
    _sample_points(getModelData<std::vector<std::vector<Real>>>("_sample_points"))
{
}

See Surrogates for more information on the getModelData function. _sample_points in the surrogate is a const reference, since we do not want to modify the training data during evaluation:

  /// Array containing sample points and the results
  const std::vector<std::vector<Real>> & _sample_points;
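The pairing between declaring data in the trainer and getting it in the surrogate relies on both sides using the same name, here "_sample_points". The following standalone sketch illustrates that idea with a plain name-to-value store; ModelStore and its methods are hypothetical and do not reflect how MOOSE stores or restores model data.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Table = std::vector<std::vector<double>>;

// Hypothetical stand-in for saved model data: a name -> value store.
class ModelStore
{
public:
  // Trainer side: create (or fetch) a writable entry under `name`.
  Table & declare(const std::string & name) { return _data[name]; }

  // Surrogate side: read-only access; the name must match the declared one.
  const Table & get(const std::string & name) const
  {
    const auto it = _data.find(name);
    if (it == _data.end())
      throw std::out_of_range("No model data named '" + name + "'");
    return it->second;
  }

private:
  std::map<std::string, Table> _data;
};
```

The trainer side receives a mutable reference and fills it during training, while the surrogate side only receives a const reference, which matches the const qualifier on _sample_points in the surrogate header.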

evaluate

evaluate is a public member function required for all surrogate models. This is where the surrogate model is actually used: evaluate takes in parameter values and returns the surrogate's estimate of the quantity of interest. See EvaluateSurrogate for an example of how the evaluate function is used.

Real
NearestPointSurrogate::evaluate(const std::vector<Real> & x) const
{
  // Check whether input point has same dimensionality as training data
  mooseAssert((_sample_points.size() - 1) == x.size(),
              "Input point does not match dimensionality of training data.");

  // Returned value from training data (first sample is default)
  Real val = _sample_points.back()[0];

  // Container of current minimum distance during training sample loop
  Real dist_min = std::numeric_limits<Real>::max();

  for (dof_id_type p = 0; p < _sample_points[0].size(); ++p)
  {
    // Sum over the distance of each point dimension
    Real dist = 0;
    for (unsigned int i = 0; i < x.size(); ++i)
    {
      Real diff = (x[i] - _sample_points[i][p]);
      dist += diff * diff;
    }

    // Check if this training point distance is smaller than the current minimum
    if (dist < dist_min)
    {
      val = _sample_points.back()[p];
      dist_min = dist;
    }
  }

  return val;
}
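The lookup can be tried on a small data set without MOOSE. The following is a standalone sketch of the same search using only the standard library; nearestPointValue is a hypothetical free function, and double stands in for Real.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// samplePoints[d][p] holds predictor d of sample p; the last entry holds the responses.
double
nearestPointValue(const std::vector<std::vector<double>> & samplePoints,
                  const std::vector<double> & x)
{
  // The input point must have one value per predictor
  assert(samplePoints.size() == x.size() + 1);

  double val = samplePoints.back()[0];
  double distMin = std::numeric_limits<double>::max();

  for (std::size_t p = 0; p < samplePoints[0].size(); ++p)
  {
    // Squared Euclidean distance; the square root is not needed for comparison
    double dist = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      const double diff = x[i] - samplePoints[i][p];
      dist += diff * diff;
    }

    // Strict comparison: on a tie, the earlier sample is kept
    if (dist < distMin)
    {
      val = samplePoints.back()[p];
      distMin = dist;
    }
  }
  return val;
}
```

For the training points (0, 0), (1, 1), and (2, 2) with responses 10, 20, and 30, the input (0.9, 1.2) has squared distances 2.25, 0.05, and 1.85, so the function returns 20. The input (0.5, 0.5) is equally far from the first two points and returns 10, the response of the earlier sample.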

(modules/stochastic_tools/src/surrogates/SurrogateTrainer.C)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#include "SurrogateTrainer.h"
#include "Sampler.h"
#include "RestartableDataIO.h"
#include "StochasticToolsApp.h"

InputParameters
SurrogateTrainerBase::validParams()
{
  InputParameters params = GeneralUserObject::validParams();
  params.registerBase("SurrogateTrainer");
  return params;
}

SurrogateTrainerBase::SurrogateTrainerBase(const InputParameters & parameters)
  : GeneralUserObject(parameters), _model_meta_data_name(_type + "_" + name())
{
  _app.registerRestartableDataMapName(_model_meta_data_name, name());
}

InputParameters
SurrogateTrainer::validParams()
{
  InputParameters params = SurrogateTrainerBase::validParams();
  params.addRequiredParam<SamplerName>("sampler",
                                       "Sampler used to create predictor and response data.");
  params.addParam<ReporterName>(
      "converged_reporter",
      "Reporter value used to determine if a sample's multiapp solve converged.");
  params.addParam<bool>("skip_unconverged_samples",
                        false,
                        "True to skip samples where the multiapp did not converge, "
                        "'converged_reporter' is required to do this.");
  return params;
}

SurrogateTrainer::SurrogateTrainer(const InputParameters & parameters)
  : SurrogateTrainerBase(parameters),
    _sampler(getSampler("sampler")),
    _row_data(_sampler.getNumberOfCols()),
    _skip_unconverged(getParam<bool>("skip_unconverged_samples")),
    _converged(nullptr)
{
  if (_skip_unconverged)
  {
    if (!isParamValid("converged_reporter"))
      paramError("skip_unconverged_samples",
                 "'converged_reporter' needs to be specified to skip unconverged sample.");
    _converged = &getTrainingData<bool>(getParam<ReporterName>("converged_reporter"));
  }
}

void
SurrogateTrainer::initialize()
{
  // Figure out if data is distributed
  for (auto & pair : _training_data)
  {
    const ReporterName & name = pair.first;
    TrainingDataBase & data = *pair.second;

    const auto & mode = _fe_problem.getReporterData().getReporterMode(name);
    if (mode == REPORTER_MODE_DISTRIBUTED || (mode == REPORTER_MODE_ROOT && processor_id() != 0))
      data.isDistributed() = true;
    else if (mode == REPORTER_MODE_REPLICATED ||
             (mode == REPORTER_MODE_ROOT && processor_id() == 0))
      data.isDistributed() = false;
    else
      mooseError("Predictor reporter value ", name, " is not of supported mode.");
  }
}

void
SurrogateTrainer::execute()
{
  checkIntegrity();

  _row = _sampler.getLocalRowBegin();
  _local_row = 0;

  preTrain();

  for (_row = _sampler.getLocalRowBegin(); _row < _sampler.getLocalRowEnd(); ++_row)
  {
    // Need to do this manually in order to keep the iterators valid
    const std::vector<Real> data = _sampler.getNextLocalRow();
    for (unsigned int i = 0; i < _row_data.size(); ++i)
      _row_data[i] = data[i];

    // Set training data
    for (auto & pair : _training_data)
      pair.second->setCurrentIndex((pair.second->isDistributed() ? _local_row : _row));

    if (!_skip_unconverged || *_converged)
      train();

    _local_row++;
  }

  postTrain();
}

void
SurrogateTrainer::checkIntegrity() const
{
  // Check that the number of sampler columns hasn't changed
  if (_row_data.size() != _sampler.getNumberOfCols())
    mooseError("Number of sampler columns has changed.");

  // Check that training data is correctly sized
  for (auto & pair : _training_data)
  {
    dof_id_type rsize = pair.second->size();
    dof_id_type nrow =
        pair.second->isDistributed() ? _sampler.getNumberOfLocalRows() : _sampler.getNumberOfRows();
    if (rsize != nrow)
      mooseError("Reporter value ",
                 pair.first,
                 " of size ",
                 rsize,
                 " does not match sampler size (",
                 nrow,
                 ").");
  }
}

(modules/stochastic_tools/src/surrogates/NearestPointTrainer.C)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#include "NearestPointTrainer.h"
#include "Sampler.h"

#include <numeric>

registerMooseObject("StochasticToolsApp", NearestPointTrainer);

InputParameters
NearestPointTrainer::validParams()
{
  InputParameters params = SurrogateTrainer::validParams();
  params.addClassDescription("Loops over and saves sample values for [NearestPointSurrogate.md].");
  params.addRequiredParam<ReporterName>(
      "response",
      "Reporter value of response results, can be vpp with <vpp_name>/<vector_name> or sampler "
      "column with 'sampler/col_<index>'.");
  params.addParam<std::vector<ReporterName>>(
      "predictors",
      std::vector<ReporterName>(),
      "Reporter values used as the independent random variables, If 'predictors' and "
      "'predictor_cols' are both empty, all sampler columns are used.");
  params.addParam<std::vector<unsigned int>>(
      "predictor_cols",
      std::vector<unsigned int>(),
      "Sampler columns used as the independent random variables, If 'predictors' and "
      "'predictor_cols' are both empty, all sampler columns are used.");

  return params;
}

NearestPointTrainer::NearestPointTrainer(const InputParameters & parameters)
  : SurrogateTrainer(parameters),
    _sample_points(declareModelData<std::vector<std::vector<Real>>>("_sample_points")),
    _sampler_row(getSamplerData()),
    _response(getTrainingData<Real>(getParam<ReporterName>("response"))),
    _predictor_cols(getParam<std::vector<unsigned int>>("predictor_cols"))
{
  for (const ReporterName & rname : getParam<std::vector<ReporterName>>("predictors"))
    _predictors.push_back(&getTrainingData<Real>(rname));

  // If predictors and predictor_cols are empty, use all sampler columns
  if (_predictors.empty() && _predictor_cols.empty())
  {
    _predictor_cols.resize(_sampler.getNumberOfCols());
    std::iota(_predictor_cols.begin(), _predictor_cols.end(), 0);
  }

  // Resize sample points to number of predictors, plus one entry for the response
  _sample_points.resize(_predictors.size() + _predictor_cols.size() + 1);
}

void
NearestPointTrainer::preTrain()
{
  // Resize to number of sample points
  for (auto & it : _sample_points)
    it.resize(_sampler.getNumberOfLocalRows());
}

void
NearestPointTrainer::train()
{
  unsigned int d = 0;
  // Get predictors from reporter values
  for (const auto & val : _predictors)
    _sample_points[d++][_local_row] = *val;
  // Get predictors from sampler
  for (const auto & col : _predictor_cols)
    _sample_points[d++][_local_row] = _sampler_row[col];

  _sample_points.back()[_local_row] = _response;
}

void
NearestPointTrainer::postTrain()
{
  for (auto & it : _sample_points)
    _communicator.allgather(it);
}

(modules/stochastic_tools/include/surrogates/NearestPointTrainer.h)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#pragma once

#include "SurrogateTrainer.h"

class NearestPointTrainer : public SurrogateTrainer
{
public:
  static InputParameters validParams();
  NearestPointTrainer(const InputParameters & parameters);
  virtual void preTrain() override;
  virtual void train() override;
  virtual void postTrain() override;

protected:
  /// Array containing sample points and the results
  std::vector<std::vector<Real>> & _sample_points;

  /// Data from the current sampler row
  const std::vector<Real> & _sampler_row;

  /// Response value
  const Real & _response;

  /// Columns from sampler for predictors
  std::vector<unsigned int> _predictor_cols;

  /// Predictor values from reporters
  std::vector<const Real *> _predictors;
};

(modules/stochastic_tools/src/surrogates/NearestPointSurrogate.C)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#include "NearestPointSurrogate.h"

registerMooseObject("StochasticToolsApp", NearestPointSurrogate);

InputParameters
NearestPointSurrogate::validParams()
{
  InputParameters params = SurrogateModel::validParams();
  params.addClassDescription("Surrogate that evaluates the value from the nearest point from data "
                             "in [NearestPointTrainer.md]");
  return params;
}

NearestPointSurrogate::NearestPointSurrogate(const InputParameters & parameters)
  : SurrogateModel(parameters),
    _sample_points(getModelData<std::vector<std::vector<Real>>>("_sample_points"))
{
}

Real
NearestPointSurrogate::evaluate(const std::vector<Real> & x) const
{
  // Check whether input point has same dimensionality as training data
  mooseAssert((_sample_points.size() - 1) == x.size(),
              "Input point does not match dimensionality of training data.");

  // Returned value from training data (first sample is default)
  Real val = _sample_points.back()[0];

  // Container of current minimum distance during training sample loop
  Real dist_min = std::numeric_limits<Real>::max();

  for (dof_id_type p = 0; p < _sample_points[0].size(); ++p)
  {
    // Sum over the distance of each point dimension
    Real dist = 0;
    for (unsigned int i = 0; i < x.size(); ++i)
    {
      Real diff = (x[i] - _sample_points[i][p]);
      dist += diff * diff;
    }

    // Check if this training point distance is smaller than the current minimum
    if (dist < dist_min)
    {
      val = _sample_points.back()[p];
      dist_min = dist;
    }
  }

  return val;
}

(modules/stochastic_tools/include/surrogates/NearestPointSurrogate.h)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

#pragma once

#include "SurrogateModel.h"

class NearestPointSurrogate : public SurrogateModel
{
public:
  static InputParameters validParams();
  NearestPointSurrogate(const InputParameters & parameters);
  using SurrogateModel::evaluate;
  virtual Real evaluate(const std::vector<Real> & x) const override;

protected:
  /// Array containing sample points and the results
  const std::vector<std::vector<Real>> & _sample_points;
};

(modules/stochastic_tools/src/reporters/EvaluateSurrogate.C)

// This file is part of the MOOSE framework
// https://www.mooseframework.org
//
// All rights reserved, see COPYRIGHT for full restrictions
// https://github.com/idaholab/moose/blob/master/COPYRIGHT
//
// Licensed under LGPL 2.1, please see LICENSE for details
// https://www.gnu.org/licenses/lgpl-2.1.html

// Stochastic Tools Includes
#include "EvaluateSurrogate.h"

#include "Sampler.h"

registerMooseObject("StochasticToolsApp", EvaluateSurrogate);

InputParameters
EvaluateSurrogate::validParams()
{
  InputParameters params = StochasticReporter::validParams();
  params += SurrogateModelInterface::validParams();
  params += SamplerInterface::validParams();
  params.addClassDescription("Tool for sampling surrogate models.");
  params.addRequiredParam<std::vector<UserObjectName>>("model", "Names of surrogate models.");
  params.addRequiredParam<SamplerName>("sampler",
                                       "Sampler to use for evaluating surrogate models.");
  MultiMooseEnum rtypes(SurrogateModel::defaultResponseTypes().getRawNames(), "real");
  params.addParam<MultiMooseEnum>(
      "response_type",
      rtypes,
      "The type of return value expected from the surrogate models, a single entry will use it for "
      "every model. Warning: not every model is able to evaluate every response type.");
  MultiMooseEnum estd("false=0 true=1", "false");
  params.addParam<MultiMooseEnum>(
      "evaluate_std",
      estd,
      "Whether or not to evaluate standard deviation associated with each sample, a single entry "
      "will use it for every model. Warning: not every model can compute standard deviation.");
  return params;
}

EvaluateSurrogate::EvaluateSurrogate(const InputParameters & parameters)
  : StochasticReporter(parameters),
    SurrogateModelInterface(this),
    _sampler(getSampler("sampler")),
    _response_types(getParam<MultiMooseEnum>("response_type"))
{
  const auto & model_names = getParam<std::vector<UserObjectName>>("model");
  _model.reserve(model_names.size());
  for (const auto & nm : model_names)
    _model.push_back(&getSurrogateModelByName(nm));

  if (_response_types.size() != 1 && _response_types.size() != _model.size())
    paramError("response_type",
               "Number of entries must be 1 or equal to the number of entries in 'model'.");

  const auto & estd = getParam<MultiMooseEnum>("evaluate_std");
  if (estd.size() != 1 && estd.size() != _model.size())
    paramError("evaluate_std",
               "Number of entries must be 1 or equal to the number of entries in 'model'.");
  _doing_std.resize(_model.size());
  for (const auto i : make_range(_model.size()))
    _doing_std[i] = estd.size() == 1 ? estd[0] == "true" : estd[i] == "true";

  _real_values.resize(_model.size(), nullptr);
  _real_std.resize(_model.size(), nullptr);
  _vector_real_values.resize(_model.size(), nullptr);
  _vector_real_std.resize(_model.size(), nullptr);
  for (const auto i : make_range(_model.size()))
  {
    const std::string rtype = _response_types.size() == 1 ? _response_types[0] : _response_types[i];
    if (rtype == "real")
    {
      _real_values[i] = &declareStochasticReporter<Real>(model_names[i], _sampler);
      if (_doing_std[i])
        _real_std[i] = &declareStochasticReporter<Real>(model_names[i] + "_std", _sampler);
    }
    else if (rtype == "vector_real")
    {
      _vector_real_values[i] =
          &declareStochasticReporter<std::vector<Real>>(model_names[i], _sampler);
      if (_doing_std[i])
        _vector_real_std[i] =
            &declareStochasticReporter<std::vector<Real>>(model_names[i] + "_std", _sampler);
    }
    else
      // Report the resolved type; indexing _response_types with i is invalid for a single entry
      paramError("response_type", "Unknown response type ", rtype);
  }
}

void
EvaluateSurrogate::execute()
{
  // Loop over samples
  for (const auto ind : make_range(_sampler.getNumberOfLocalRows()))
  {
    const std::vector<Real> data = _sampler.getNextLocalRow();
    for (const auto m : make_range(_model.size()))
    {
      if (_real_values[m] && _real_std[m])
        (*_real_values[m])[ind] = _model[m]->evaluate(data, (*_real_std[m])[ind]);
      else if (_real_values[m])
        (*_real_values[m])[ind] = _model[m]->evaluate(data);
      else if (_vector_real_values[m] && _vector_real_std[m])
        _model[m]->evaluate(data, (*_vector_real_values[m])[ind], (*_vector_real_std[m])[ind]);
      else if (_vector_real_values[m])
        _model[m]->evaluate(data, (*_vector_real_values[m])[ind]);
    }
  }
}
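
The constructor above lets response_type and evaluate_std hold either a single entry, which applies to every model, or one entry per model. A minimal standalone sketch of that rule follows. The function expandPerModel is a hypothetical name introduced for illustration, and plain strings stand in for MultiMooseEnum entries.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of MOOSE: expands an option list so that it has
// exactly one entry per model. A single entry is repeated for every model.
std::vector<std::string>
expandPerModel(const std::vector<std::string> & entries, std::size_t n_models)
{
  if (entries.size() != 1 && entries.size() != n_models)
    throw std::invalid_argument("Number of entries must be 1 or equal to the number of models.");

  std::vector<std::string> expanded(n_models);
  for (std::size_t i = 0; i < n_models; ++i)
    // Always index through the size check so a single entry is never read out of range
    expanded[i] = entries.size() == 1 ? entries[0] : entries[i];

  return expanded;
}
```

Resolving the per-model value once and reusing it everywhere, including in error messages, avoids indexing the raw option list with a model index that only exists when one entry per model was supplied.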
