LibtorchDRLControlTrainer

Trains a neural network controller using the Proximal Policy Optimization (PPO) algorithm.

Overview

This object trains a Deep Reinforcement Learning (DRL) controller using the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017).
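
As an illustration of the clipped objective at the core of PPO, the following minimal Python sketch evaluates the per-sample surrogate described by Schulman et al. (2017). It is not taken from the trainer's source code: the function name and its arguments are hypothetical, and the default clip value only mirrors the default of clip_parameter.

```python
import math


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip=0.2):
    """Per-sample PPO clipped objective (a quantity to be maximized)."""
    # Probability ratio between the updated policy and the policy that
    # generated the data, computed from the two log-probabilities.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Restrict the ratio to the interval [1 - clip, 1 + clip].
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    # Keep the more pessimistic of the unclipped and clipped estimates.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds 1 + clip, which discourages policy updates that move too far from the policy that collected the data.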

Example Input File Syntax

Input Parameters

action_standard_deviations
C++ Type:std::vector<double>
Unit:(no unit assumed)
Controllable:No
Description:Standard deviation value used while sampling the actions.
control
C++ Type:std::vector<ReporterName>
Controllable:No
Description:Reporters containing the values of the controlled quantities (control signals) from the model simulations.
control_learning_rate
C++ Type:double
Unit:(no unit assumed)
Controllable:No
Description:Learning rate (relaxation) for the control neural net training.
critic_learning_rate
C++ Type:double
Unit:(no unit assumed)
Controllable:No
Description:Learning rate (relaxation) for the critic neural net training.
log_probability
C++ Type:std::vector<ReporterName>
Controllable:No
Description:Reporters containing the log probabilities of the actions taken during the simulations.
num_control_neurons_per_layer
C++ Type:std::vector<unsigned int>
Controllable:No
Description:Number of neurons per layer for the control neural network.
num_critic_neurons_per_layer
C++ Type:std::vector<unsigned int>
Controllable:No
Description:Number of neurons per layer in the critic neural net.
num_epochs
C++ Type:unsigned int
Controllable:No
Description:Number of epochs for the training.
response
C++ Type:std::vector<ReporterName>
Controllable:No
Description:Reporters containing the response values from the model.
reward
C++ Type:ReporterName
Controllable:No
Description:Reporter containing the earned time-dependent rewards from the simulation.
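
The action_standard_deviations and log_probability parameters point to a stochastic policy whose actions are sampled and whose log-probabilities are recorded. The Python sketch below shows how such a pair could be produced, assuming a Gaussian policy whose mean is the output of the control neural net. That assumption, as well as the function names, is illustrative and not confirmed by this page.

```python
import math
import random


def gaussian_log_prob(action, mean, std):
    """Log-density of `action` under a normal distribution N(mean, std**2)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sample_action(mean, std, rng):
    """Draw one action and return it together with its log-probability."""
    action = rng.gauss(mean, std)
    return action, gaussian_log_prob(action, mean, std)


# Hypothetical usage: the mean would come from the control neural net,
# the standard deviation from action_standard_deviations.
rng = random.Random(11)
action, log_prob = sample_action(mean=0.0, std=0.1, rng=rng)
```

A smaller standard deviation concentrates the sampled actions around the network output, which reduces exploration during training.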

Required Parameters

clip_parameter
Default:0.2
C++ Type:double
Unit:(no unit assumed)
Controllable:No
Description:Clip parameter used while clamping the advantage value.
control_activation_functions
Default:relu
C++ Type:std::vector<std::string>
Controllable:No
Description:The type of activation functions to use in the control neural net. It is either one value or one value per hidden layer.
critic_activation_functions
Default:relu
C++ Type:std::vector<std::string>
Controllable:No
Description:The type of activation functions to use in the critic neural net. It is either one value or one value per hidden layer.
decay_factor
Default:1
C++ Type:double
Unit:(no unit assumed)
Controllable:No
Description:Decay factor for calculating the return. This accounts for decreased reward values from the later steps.
filename
C++ Type:FileName
Controllable:No
Description:The name of the file which will be associated with the saved/loaded data.
filename_base
C++ Type:std::string
Controllable:No
Description:Filename used to output the neural net parameters.
input_timesteps
Default:1
C++ Type:unsigned int
Controllable:No
Description:Number of time steps to use in the input data. If larger than 1, data from the previous time steps are also used as inputs in the training.
loss_print_frequency
Default:0
C++ Type:unsigned int
Controllable:No
Description:The frequency at which the loss values are printed. If 0, the loss values are not printed.
read_from_file
Default:False
C++ Type:bool
Controllable:No
Description:Switch to read the neural network parameters from a file.
response_scaling_factors
C++ Type:std::vector<double>
Unit:(no unit assumed)
Controllable:No
Description:A normalization constant which will be used to divide the response values. This is used for the manipulation of the neural net inputs for better training efficiency.
response_shift_factors
C++ Type:std::vector<double>
Unit:(no unit assumed)
Controllable:No
Description:A shift constant which will be used to shift the response values. This is used for the manipulation of the neural net inputs for better training efficiency.
seed
Default:11
C++ Type:unsigned int
Controllable:No
Description:Random number generator seed for stochastic optimizers.
shift_outputs
Default:True
C++ Type:bool
Controllable:No
Description:Whether to shift the outputs to realign the input-output pairs.
skip_num_rows
Default:1
C++ Type:unsigned int
Controllable:No
Description:Number of rows to ignore in the training. The first row from the reporter is usually skipped since it contains only initial values.
standardize_advantage
Default:True
C++ Type:bool
Controllable:No
Description:Switch to enable the shifting and normalization of the advantages in the PPO algorithm.
update_frequency
Default:1
C++ Type:unsigned int
Controllable:No
Description:Number of transient simulations whose data are collected before the controller neural network is updated.
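
The roles of decay_factor and standardize_advantage can be illustrated with a short Python sketch. It assumes the common recursive definition of the discounted return, R_t = r_t + decay_factor * R_(t+1), and a standardization based on the population standard deviation. Both choices, as well as the function names, are assumptions made for illustration and may differ from what the trainer does internally.

```python
def discounted_returns(rewards, decay_factor):
    """Discounted return at every step, accumulated backwards in time."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        # Current reward plus the decayed return of all later steps.
        running = rewards[i] + decay_factor * running
        returns[i] = running
    return returns


def standardize(values, eps=1e-10):
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    # eps guards against division by zero when all values are equal.
    return [(v - mean) / (std + eps) for v in values]
```

With decay_factor = 0, each return reduces to the reward of the same step, while decay_factor = 1 sums all later rewards without discounting.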

Optional Parameters

allow_duplicate_execution_on_initial
Default:False
C++ Type:bool
Controllable:No
Description:In the case where this UserObject is depended upon by an initial condition, allow it to be executed twice during the initial setup (once before the IC and again after mesh adaptivity, if applicable).
execute_on
Default:TIMESTEP_END
C++ Type:ExecFlagEnum
Options:XFEM_MARK, FORWARD, ADJOINT, HOMOGENEOUS_FORWARD, ADJOINT_TIMESTEP_BEGIN, ADJOINT_TIMESTEP_END, NONE, INITIAL, LINEAR, LINEAR_CONVERGENCE, NONLINEAR, NONLINEAR_CONVERGENCE, POSTCHECK, TIMESTEP_END, TIMESTEP_BEGIN, MULTIAPP_FIXED_POINT_END, MULTIAPP_FIXED_POINT_BEGIN, MULTIAPP_FIXED_POINT_CONVERGENCE, FINAL, CUSTOM
Controllable:No
Description:The list of flag(s) indicating when this object should be executed. For a description of each flag, see https://mooseframework.inl.gov/source/interfaces/SetupInterface.html.
execution_order_group
Default:0
C++ Type:int
Controllable:No
Description:Execution order groups are executed in increasing order (e.g., the lowest number is executed first). Note that negative group numbers may be used to execute groups before the default (0) group. Please refer to the user object documentation for ordering of user object execution within a group.
force_postaux
Default:False
C++ Type:bool
Controllable:No
Description:Forces the UserObject to be executed in POSTAUX
force_preaux
Default:False
C++ Type:bool
Controllable:No
Description:Forces the UserObject to be executed in PREAUX
force_preic
Default:False
C++ Type:bool
Controllable:No
Description:Forces the UserObject to be executed in PREIC during initial setup

Execution Scheduling Parameters

control_tags
C++ Type:std::vector<std::string>
Controllable:No
Description:Adds user-defined labels for accessing object parameters via control logic.
enable
Default:True
C++ Type:bool
Controllable:Yes
Description:Set the enabled status of the MooseObject.
use_displaced_mesh
Default:False
C++ Type:bool
Controllable:No
Description:Whether or not this object should use the displaced mesh for computation. Note that if this is true but no displacements are provided in the Mesh block, the undisplaced mesh will still be used.

Advanced Parameters

prop_getter_suffix
C++ Type:MaterialPropertyName
Unit:(no unit assumed)
Controllable:No
Description:An optional suffix parameter that can be appended to any attempt to retrieve/get material properties. The suffix will be prepended with a '_' character.
use_interpolated_state
Default:False
C++ Type:bool
Controllable:No
Description:For the old and older states, use projected material properties interpolated at the quadrature points. To set up projection, use the ProjectedStatefulMaterialStorageAction.

Material Property Retrieval Parameters

Input Files

(modules/stochastic_tools/test/tests/transfers/libtorch_nn_transfer/libtorch_drl_control_trainer.i)
(modules/stochastic_tools/examples/libtorch_drl_control/libtorch_drl_control_trainer.i)

References

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

(modules/stochastic_tools/test/tests/transfers/libtorch_nn_transfer/libtorch_drl_control_trainer.i)

[StochasticTools]
[]

[Samplers]
  [dummy]
    type = CartesianProduct
    linear_space_items = '0 0.01 1'
  []
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = dummy
    input_files = 'libtorch_drl_control_sub.i'
  []
[]

[Transfers]
  [nn_transfer]
    type = LibtorchNeuralNetControlTransfer
    to_multi_app = runner
    trainer_name = nn_trainer
    control_name = src_control
  []
  [r_transfer]
    type = MultiAppReporterTransfer
    from_multi_app = runner
    to_reporters = 'results/center_temp results/env_temp results/reward results/left_flux results/log_prob_left_flux'
    from_reporters = 'T_reporter/center_temp_tend:value T_reporter/env_temp:value T_reporter/reward:value T_reporter/left_flux:value T_reporter/log_prob_left_flux:value'
  []
[]

[Trainers]
  [nn_trainer]
    type = LibtorchDRLControlTrainer
    response = 'results/center_temp results/env_temp'
    control = 'results/left_flux'
    log_probability = 'results/log_prob_left_flux'
    reward = 'results/reward'

    num_epochs = 10
    update_frequency = 2
    decay_factor = 0.0

    loss_print_frequency = 3

    critic_learning_rate = 0.0005
    num_critic_neurons_per_layer = '4 2'

    control_learning_rate = 0.0005
    num_control_neurons_per_layer = '4 2'

    # keep consistent with LibtorchNeuralNetControl
    input_timesteps = 2
    response_scaling_factors = '0.03 0.03'
    response_shift_factors = '270 270'
    action_standard_deviations = '0.1'

    read_from_file = false
  []
[]

[Reporters]
  [results]
    type = ConstantReporter
    real_vector_names = 'center_temp env_temp reward left_flux log_prob_left_flux'
    real_vector_values = '0; 0; 0; 0; 0'
    outputs = 'csv_out'
    execute_on = timestep_begin
  []
  [nn_parameters]
    type = DRLControlNeuralNetParameters
    trainer_name = nn_trainer
    outputs = json_out
  []
[]

[Executioner]
  type = Transient
  num_steps = 1
[]

[Outputs]
  file_base = train_out
  [json_out]
    type = JSON
    execute_on = TIMESTEP_BEGIN
    execute_system_information_on = NONE
  []
[]
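
In the input above, input_timesteps = 2 means that each training input combines the responses of the current time step with those of the previous one. The Python sketch below shows one possible way to build such inputs from a time series. The ordering (newest step first) and the dropping of early steps that lack a full history are assumptions made for illustration, and the function name is hypothetical.

```python
def stack_inputs(responses, input_timesteps):
    """Build one flat input per step from the last `input_timesteps` rows.

    `responses` is a list of per-step lists, for example
    [center_temp, env_temp] at every time step.
    """
    stacked = []
    # Start at the first step that has a complete history behind it.
    for t in range(input_timesteps - 1, len(responses)):
        window = []
        for lag in range(input_timesteps):
            # lag = 0 is the current step, lag = 1 the previous one, ...
            window.extend(responses[t - lag])
        stacked.append(window)
    return stacked
```

For two responses and input_timesteps = 2, every stacked input in this sketch has four entries, so the first layer of both neural networks would see four input values.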

(modules/stochastic_tools/examples/libtorch_drl_control/libtorch_drl_control_trainer.i)

[StochasticTools]
[]

[Samplers]
  [dummy]
    type = CartesianProduct
    linear_space_items = '0 0.01 1'
  []
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = dummy
    input_files = 'libtorch_drl_control_sub.i'
  []
[]

[Transfers]
  [nn_transfer]
    type = LibtorchNeuralNetControlTransfer
    to_multi_app = runner
    trainer_name = nn_trainer
    control_name = src_control
  []
  [r_transfer]
    type = MultiAppReporterTransfer
    from_multi_app = runner
    to_reporters = 'results/center_temp results/env_temp results/reward results/top_flux results/log_prob_top_flux'
    from_reporters = 'T_reporter/center_temp_tend:value T_reporter/env_temp:value T_reporter/reward:value T_reporter/top_flux:value T_reporter/log_prob_top_flux:value'
  []
[]

[Trainers]
  [nn_trainer]
    type = LibtorchDRLControlTrainer
    response = 'results/center_temp results/env_temp'
    control = 'results/top_flux'
    log_probability = 'results/log_prob_top_flux'
    reward = 'results/reward'

    num_epochs = 1000
    update_frequency = 10
    decay_factor = 0.0

    loss_print_frequency = 10

    critic_learning_rate = 0.0001
    num_critic_neurons_per_layer = '64 27'

    control_learning_rate = 0.0005
    num_control_neurons_per_layer = '16 6'

    # keep consistent with LibtorchNeuralNetControl
    input_timesteps = 2
    response_scaling_factors = '0.03 0.03'
    response_shift_factors = '290 290'
    action_standard_deviations = '0.02'

    standardize_advantage = true

    read_from_file = false
  []
[]

[Reporters]
  [results]
    type = ConstantReporter
    real_vector_names = 'center_temp env_temp reward top_flux log_prob_top_flux'
    real_vector_values = '0; 0; 0; 0; 0'
    outputs = csv
    execute_on = timestep_begin
  []
  [reward]
    type = DRLRewardReporter
    drl_trainer_name = nn_trainer
  []
[]

[Executioner]
  type = Transient
  num_steps = 440
[]

[Outputs]
  file_base = output/train_out
  csv = true
  time_step_interval = 10
[]
