Interface for clipping the reward to a value between the specified minimum and maximum (clipping is implemented as \( g_{\text{clipped}} = \max(g_{\text{min}}, \min(g_{\text{max}}, g)) \)).
#include <reward_clipping.hpp>
using State = typename EnvironmentType::State | Convenient typedef for state. |
using Action = typename EnvironmentType::Action | Convenient typedef for action. |
template<typename EnvironmentType>
class mlpack::rl::RewardClipping< EnvironmentType >
Interface for clipping the reward to a value between the specified minimum and maximum (clipping is implemented as \( g_{\text{clipped}} = \max(g_{\text{min}}, \min(g_{\text{max}}, g)) \)).
- Template Parameters
-
EnvironmentType | The type of the environment being wrapped. |
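The clipping rule in the description can be illustrated in isolation. The following is a minimal sketch, not mlpack's implementation: `ClipReward` is a hypothetical free function invented here, and the `gMin <= gMax` precondition is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of mlpack): clamps g into [gMin, gMax]
// using g_clipped = max(g_min, min(g_max, g)).
inline double ClipReward(const double g, const double gMin, const double gMax)
{
  assert(gMin <= gMax);  // Assumed precondition for this sketch.
  return std::max(gMin, std::min(gMax, g));
}
```

With bounds of -1 and 1, a reward of 5 would be clipped to 1, a reward of -3 to -1, and a reward of 0.25 would pass through unchanged.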
◆ RewardClipping()
template<typename EnvironmentType >
Constructor for creating a RewardClipping instance.
- Parameters
-
minReward | Minimum possible value of clipped reward. |
maxReward | Maximum possible value of clipped reward. |
environment | An instance of the environment used for actual simulations. |
◆ InitialSample()
template<typename EnvironmentType >
The InitialSample method is called by the environment to initialize the starting state.
Returns the initial sample produced by the wrapped environment.
◆ IsTerminal()
template<typename EnvironmentType >
Checks whether given state is a terminal state.
Returns the value obtained by calling the corresponding method of the wrapped environment.
- Parameters
-
state | The state to check. |
- Returns
- true if state is a terminal state, otherwise false.
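Both InitialSample() and IsTerminal() are described as passing through to the wrapped environment. A rough sketch of that delegation pattern follows. `ToyEnvironment` and `ForwardingWrapper` are hypothetical names invented for this illustration, and the signatures shown are assumptions rather than the class's actual declarations.

```cpp
#include <cassert>

// Hypothetical toy environment, invented for this sketch only.
struct ToyEnvironment
{
  struct State { int position = 0; };
  enum class Action { Left, Right };

  // Every episode starts at position 0.
  State InitialSample() { return State{}; }

  // The episode ends once position 3 is reached.
  bool IsTerminal(const State& state) const { return state.position >= 3; }
};

// Sketch of a wrapper that forwards both calls unchanged.
template<typename EnvironmentType>
class ForwardingWrapper
{
 public:
  using State = typename EnvironmentType::State;
  using Action = typename EnvironmentType::Action;

  explicit ForwardingWrapper(EnvironmentType& environment) :
      environment(environment)
  { /* Nothing to do. */ }

  // Returns whatever the wrapped environment returns.
  State InitialSample() { return environment.InitialSample(); }

  // Asks the wrapped environment whether the state is terminal.
  bool IsTerminal(const State& state) const
  {
    return environment.IsTerminal(state);
  }

 private:
  // Reference to the wrapped environment; it must outlive the wrapper.
  EnvironmentType& environment;
};
```

Holding the environment by reference is a design choice of this sketch: the wrapper adds no state of its own to these two calls, so the caller keeps ownership of the environment.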
◆ Sample() [1/2]
template<typename EnvironmentType >
Dynamics of Environment.
The rewards returned from the base environment are clipped according to the specified minimum and maximum values.
- Parameters
-
state | The current state. |
action | The current action. |
nextState | The next state. |
- Returns
- clippedReward, the reward clipped to the range [minReward, maxReward].
◆ Sample() [2/2]
template<typename EnvironmentType >
Dynamics of Environment.
The rewards returned from the base environment are clipped according to the specified minimum and maximum values.
- Parameters
-
state | The current state. |
action | The current action. |
- Returns
- clippedReward, the reward clipped to the range [minReward, maxReward].
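The two Sample() overloads can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `ToyEnvironment` and `ClippedSampler` are hypothetical names, the constructor's parameter order is chosen arbitrarily, and the two-argument overload is assumed to discard the next state by forwarding to the three-argument one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical toy environment, invented for this sketch only.
struct ToyEnvironment
{
  struct State { int position = 0; };
  enum class Action { Left, Right };

  // Moves one step and returns a deliberately large reward of +10 or -10.
  double Sample(const State& state, const Action& action, State& nextState)
  {
    const int step = (action == Action::Right) ? 1 : -1;
    nextState.position = state.position + step;
    return 10.0 * step;
  }
};

// Sketch of a wrapper that clips the rewards of the wrapped environment.
template<typename EnvironmentType>
class ClippedSampler
{
 public:
  using State = typename EnvironmentType::State;
  using Action = typename EnvironmentType::Action;

  ClippedSampler(EnvironmentType& environment,
                 const double minReward,
                 const double maxReward) :
      environment(environment),
      minReward(minReward),
      maxReward(maxReward)
  {
    assert(minReward <= maxReward);  // Assumed precondition for this sketch.
  }

  // Samples the wrapped environment, then clips the reward.
  double Sample(const State& state, const Action& action, State& nextState)
  {
    const double reward = environment.Sample(state, action, nextState);
    return std::max(minReward, std::min(maxReward, reward));
  }

  // Same as above, but the next state is discarded.
  double Sample(const State& state, const Action& action)
  {
    State nextState;
    return Sample(state, action, nextState);
  }

 private:
  EnvironmentType& environment;
  double minReward;
  double maxReward;
};
```

With bounds of -1 and 1, the toy environment's rewards of +10 and -10 would come back as 1 and -1, while the state transition itself is left untouched.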