PolynomialRegressionTrainer

Computes coefficients for a polynomial regression model.

Overview

This class determines the coefficients of a multi-dimensional polynomial that approximates the behavior of the Quantity of Interest (QoI) in the parameter space. To do so, the object needs training points in the parameter space. These training points can be characterized by their coordinates in the parameter space:

$$\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N]^T,$$

where $N$ is the number of training points, $D$ denotes the dimension of the parameter space and $\mathbf{x}_i$ is a $D$-dimensional vector containing the coordinates in each dimension. Similarly to other Trainer classes, PolynomialRegressionTrainer accesses this matrix from a Sampler object. Of course, the trainer has to know the values of the QoI at these coordinates as well:

$$\mathbf{y} = [y_1, y_2, \dots, y_N]^T.$$

This data is accessed through a VectorPostprocessor. Now that all data is available, the unknown function $y = f(\mathbf{x})$ is approximated using a polynomial expression of the following form:

$$\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_p} c_k P(\mathbf{x}, \mathbf{i}_k), \tag{1}$$

where $N_p$ is the number of polynomial terms in the approximation, $c_k$ are the unknown coefficients and $\mathbf{c} = [c_1, c_2, \dots, c_{N_p}]^T$. The polynomials used in this case can be defined as

$$P(\mathbf{x}, \mathbf{i}_k) = \prod_{d=1}^{D} x_d^{i_{k,d}},$$

where $x_d$ denotes the $d$-th coordinate of parameter vector $\mathbf{x}$, while $\mathbf{i}_k = (i_{k,1}, \dots, i_{k,D})$ is a $D$-dimensional tuple containing the powers for each coordinate. This tuple is the same as described in PolynomialChaos. To determine these tuples, the trainer needs an additional input parameter, namely the maximum degree of the polynomial. This limits the number of polynomial terms in Eq. (1). If this number is fixed, the only unknown parameters are the elements of $\mathbf{c}$.

To determine these, a regression matrix can be defined as:

$$\mathbf{R}_{j,k} = P(\mathbf{x}_j, \mathbf{i}_k), \quad j = 1, \dots, N, \quad k = 1, \dots, N_p.$$
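The tuple construction and the regression matrix described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the trainer's actual implementation: the function names `generate_tuples` and `regression_matrix` are hypothetical, and it assumes that the maximum degree bounds the sum of the powers in each tuple (total degree).

```python
from itertools import product

import numpy as np


def generate_tuples(n_dims, max_degree):
    """Return all power tuples whose total degree does not exceed max_degree.

    Assumption: 'maximum degree' is read as a bound on the sum of the powers.
    """
    candidates = product(range(max_degree + 1), repeat=n_dims)
    kept = (t for t in candidates if sum(t) <= max_degree)
    # Order by total degree first, then lexicographically.
    return sorted(kept, key=lambda t: (sum(t), t))


def regression_matrix(X, tuples):
    """Build R with R[j, k] = prod_d X[j, d] ** tuples[k][d]."""
    X = np.asarray(X, dtype=float)
    powers = np.asarray(tuples)  # shape (N_p, D)
    # Broadcast to shape (N, N_p, D), then multiply over the dimension axis.
    return np.prod(X[:, None, :] ** powers[None, :, :], axis=2)


tuples = generate_tuples(n_dims=2, max_degree=2)
X = np.array([[0.0, 1.0], [2.0, 3.0], [1.0, 1.0]])
R = regression_matrix(X, tuples)
print(tuples)
print(R.shape)
```

Under the total-degree assumption, the number of terms is the binomial coefficient $\binom{D + p}{p}$ for maximum degree $p$, so the three-parameter, degree-3 case used later on this page would have 20 terms.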

Ordinary Least Squares (OLS) regression

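Using the regression matrix $\mathbf{R}$ and the Ordinary Least Squares (OLS) approach, described in detail on Wikipedia, the unknown coefficients can be determined as follows:

$$\mathbf{c} = \left(\mathbf{R}^T \mathbf{R}\right)^{-1} \mathbf{R}^T \mathbf{y}.$$

Finally, it must be mentioned that this method is only applicable if $\mathbf{R}^T \mathbf{R}$ is invertible, which requires at least as many training points as polynomial terms ($N \geq N_p$); keeping $N$ comfortably larger than $N_p$ is recommended.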
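As an illustration of the OLS solve, the following sketch forms and solves the normal equations with NumPy. The function name `fit_ols` and the one-dimensional test data are hypothetical and chosen only so that the exact coefficients are known in advance.

```python
import numpy as np


def fit_ols(R, y):
    """Solve the normal equations (R^T R) c = R^T y for c."""
    R = np.asarray(R, dtype=float)
    y = np.asarray(y, dtype=float)
    if R.shape[0] < R.shape[1]:
        raise ValueError("OLS needs at least as many samples as polynomial terms")
    return np.linalg.solve(R.T @ R, R.T @ y)


# Hypothetical 1D data sampled from y = 1 + 2x + 3x^2.
x = np.linspace(-1.0, 1.0, 7)
R = np.column_stack([x**0, x**1, x**2])
y = 1.0 + 2.0 * x + 3.0 * x**2
c = fit_ols(R, y)
print(np.round(c, 6))
```

Solving the normal equations directly mirrors the formula above. For poorly conditioned regression matrices, a QR- or SVD-based routine such as `np.linalg.lstsq(R, y, rcond=None)` is usually the more robust choice.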

Ridge regression

Unfortunately, the OLS approach is known to have some issues:

- It is prone to overfitting the data.
- It yields inaccurate results if the input variables are correlated.
- It is sensitive to outliers.

To tackle these problems, an $L^2$ regularization (or Tikhonov regularization) is adopted to make sure that the coefficients of the expansion do not have uncontrollably high values. This extended least squares regression is often referred to as Ridge Regression. In this scenario the coefficients can be determined by solving:

$$\mathbf{c} = \left(\mathbf{R}^T \mathbf{R} + \lambda \mathbf{I}\right)^{-1} \mathbf{R}^T \mathbf{y},$$

where $\lambda$ is a penalty parameter which penalizes coefficients with large magnitudes and $\mathbf{I}$ is the $N_p \times N_p$ identity matrix. As $\lambda \rightarrow 0$, Ridge regression converges to OLS.
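The effect of the penalty can be seen in a small NumPy sketch. The function name `fit_ridge`, the noisy sine data and the penalty value are hypothetical; the sketch only illustrates the regularized normal equations and is not the trainer's implementation.

```python
import numpy as np


def fit_ridge(R, y, penalty):
    """Solve (R^T R + penalty * I) c = R^T y for c."""
    R = np.asarray(R, dtype=float)
    y = np.asarray(y, dtype=float)
    n_terms = R.shape[1]
    return np.linalg.solve(R.T @ R + penalty * np.eye(n_terms), R.T @ y)


# Hypothetical noisy 1D data fitted with a degree-5 polynomial.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.size)
R = np.vander(x, 6, increasing=True)  # columns: x^0, x^1, ..., x^5

c_ols = fit_ridge(R, y, penalty=0.0)    # zero penalty reduces to OLS
c_ridge = fit_ridge(R, y, penalty=1.0)  # positive penalty shrinks the coefficients
print(np.linalg.norm(c_ols), np.linalg.norm(c_ridge))
```

With a zero penalty the function reproduces the OLS solution, and the Euclidean norm of the coefficient vector decreases as the penalty grows.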

Example Input File Syntax

To get the necessary data, two essential blocks have to be included in the master input file. The first, the sampler defined in Samplers, creates the coordinates in matrix $\mathbf{X}$, while the objects in VectorPostprocessors create, fill and store the result vector $\mathbf{y}$.

[Samplers]
  [sample]
    type = CartesianProduct
    linear_space_items = '0 1 10
                          0 1 10
                          0 1 10'
  []
[]
(modules/stochastic_tools/test/tests/surrogates/polynomial_regression/train.i)

[VectorPostprocessors]
  [values]
    type = GFunction
    sampler = sample
    q_vector = '0 0 0'
    execute_on = INITIAL
    outputs = none
  []
[]
(modules/stochastic_tools/test/tests/surrogates/polynomial_regression/train.i)

Similarly to NearestPointTrainer, a GFunction vector postprocessor from SobolStatistics is used to emulate a full-order model. This simply evaluates a function at sample points.
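For orientation, the sketch below evaluates the standard Sobol G-function, $g(\mathbf{x}) = \prod_{d=1}^{D} \frac{|4 x_d - 2| + q_d}{1 + q_d}$, at a few sample rows. That the GFunction object computes exactly this expression is an assumption made here for illustration; the function name `g_function` and the sample points are hypothetical.

```python
import numpy as np


def g_function(X, q):
    """Evaluate the Sobol G-function row-wise.

    g(x) = prod_d (|4 x_d - 2| + q_d) / (1 + q_d)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    q = np.asarray(q, dtype=float)
    return np.prod((np.abs(4.0 * X - 2.0) + q) / (1.0 + q), axis=1)


# Three hypothetical sample rows, with q_vector = '0 0 0' as in the input file.
samples = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 2.0, 3.0]])
values = g_function(samples, q=[0.0, 0.0, 0.0])
print(values)
```

Each row of `samples` plays the role of one sampler row, and `values` plays the role of the response vector passed to the trainer.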

Using this data and the maximum degree setting (max_degree in the input file), the trainer computes the model coefficients $\mathbf{c}$. To control the type of regression, the user has to set regression_type to either 'ols' or 'ridge' in the input file:

[Trainers]
  [train]
    type = PolynomialRegressionTrainer
    regression_type = "ols"
    sampler = sample
    response = values/g_values
    max_degree = 3
  []
[]
(modules/stochastic_tools/test/tests/surrogates/polynomial_regression/train.i)

Input Parameters

Required Parameters

  • max_degree

    C++ Type: unsigned int

    Controllable: No

    Description: Maximum polynomial degree to use for the regression.

  • regression_type

    C++ Type: MooseEnum

    Options: ols, ridge

    Controllable: No

    Description: The type of regression to perform.

  • response

    C++ Type: ReporterName

    Controllable: No

    Description: Reporter value of response results, can be vpp with <vpp_name>/<vector_name> or sampler column with 'sampler/col_<index>'.

  • sampler

    C++ Type: SamplerName

    Controllable: No

    Description: Sampler used to create predictor and response data.

Optional Parameters

  • converged_reporter

    C++ Type: ReporterName

    Controllable: No

    Description: Reporter value used to determine if a sample's multiapp solve converged.

  • cv_n_trials

    Default: 1

    C++ Type: unsigned int

    Controllable: No

    Description: Number of repeated trials of cross-validation to perform.

  • cv_seed

    Default: 4294967295

    C++ Type: unsigned int

    Controllable: No

    Description: Seed used to initialize random number generator for data splitting during cross validation.

  • cv_splits

    Default: 10

    C++ Type: unsigned int

    Controllable: No

    Description: Number of splits (k) to use in k-fold cross-validation.

  • cv_surrogate

    C++ Type: UserObjectName

    Controllable: No

    Description: Name of Surrogate object used for model cross-validation.

  • cv_type

    Default: none

    C++ Type: MooseEnum

    Options: none, k_fold

    Controllable: No

    Description: Cross-validation method to use for dataset. Options are 'none' or 'k_fold'.

  • filename

    C++ Type: FileName

    Controllable: No

    Description: The name of the file which will be associated with the saved/loaded data.

  • penalty

    Default: 0

    C++ Type: double

    Unit: (no unit assumed)

    Controllable: No

    Description: Penalty for Ridge regularization.

  • predictor_cols

    C++ Type: std::vector<unsigned int>

    Controllable: No

    Description: Sampler columns used as the independent random variables. If 'predictors' and 'predictor_cols' are both empty, all sampler columns are used.

  • predictors

    C++ Type: std::vector<ReporterName>

    Controllable: No

    Description: Reporter values used as the independent random variables. If 'predictors' and 'predictor_cols' are both empty, all sampler columns are used.

  • response_type

    Default: real

    C++ Type: MooseEnum

    Options: real, vector_real

    Controllable: No

    Description: Response data type.

  • skip_unconverged_samples

    Default: False

    C++ Type: bool

    Controllable: No

    Description: True to skip samples where the multiapp did not converge, 'stochastic_reporter' is required to do this.

Execution Scheduling Parameters

  • allow_duplicate_execution_on_initial

    Default: False

    C++ Type: bool

    Controllable: No

    Description: In the case where this UserObject is depended upon by an initial condition, allow it to be executed twice during the initial setup (once before the IC and again after mesh adaptivity (if applicable)).

  • execute_on

    Default: TIMESTEP_END

    C++ Type: ExecFlagEnum

    Options: XFEM_MARK, FORWARD, ADJOINT, HOMOGENEOUS_FORWARD, ADJOINT_TIMESTEP_BEGIN, ADJOINT_TIMESTEP_END, NONE, INITIAL, LINEAR, LINEAR_CONVERGENCE, NONLINEAR, NONLINEAR_CONVERGENCE, POSTCHECK, TIMESTEP_END, TIMESTEP_BEGIN, MULTIAPP_FIXED_POINT_END, MULTIAPP_FIXED_POINT_BEGIN, MULTIAPP_FIXED_POINT_CONVERGENCE, FINAL, CUSTOM

    Controllable: No

    Description: The list of flag(s) indicating when this object should be executed. For a description of each flag, see https://mooseframework.inl.gov/source/interfaces/SetupInterface.html.

  • execution_order_group

    Default: 0

    C++ Type: int

    Controllable: No

    Description: Execution order groups are executed in increasing order (e.g., the lowest number is executed first). Note that negative group numbers may be used to execute groups before the default (0) group. Please refer to the user object documentation for ordering of user object execution within a group.

  • force_postaux

    Default: False

    C++ Type: bool

    Controllable: No

    Description: Forces the UserObject to be executed in POSTAUX.

  • force_preaux

    Default: False

    C++ Type: bool

    Controllable: No

    Description: Forces the UserObject to be executed in PREAUX.

  • force_preic

    Default: False

    C++ Type: bool

    Controllable: No

    Description: Forces the UserObject to be executed in PREIC during initial setup.

Advanced Parameters

  • control_tags

    C++ Type: std::vector<std::string>

    Controllable: No

    Description: Adds user-defined labels for accessing object parameters via control logic.

  • enable

    Default: True

    C++ Type: bool

    Controllable: Yes

    Description: Set the enabled status of the MooseObject.

  • use_displaced_mesh

    Default: False

    C++ Type: bool

    Controllable: No

    Description: Whether or not this object should use the displaced mesh for computation. Note that in the case this is true but no displacements are provided in the Mesh block the undisplaced mesh will still be used.

Material Property Retrieval Parameters

  • prop_getter_suffix

    C++ Type: MaterialPropertyName

    Unit: (no unit assumed)

    Controllable: No

    Description: An optional suffix parameter that can be appended to any attempt to retrieve/get material properties. The suffix will be prepended with a '_' character.

  • use_interpolated_state

    Default: False

    C++ Type: bool

    Controllable: No

    Description: For the old and older state use projected material properties interpolated at the quadrature points. To set up projection use the ProjectedStatefulMaterialStorageAction.

Input Files