Implementation of the epsilon-greedy policy.
#include <greedy_policy.hpp>
Public Types

  using ActionType = typename EnvironmentType::Action
      Convenient typedef for the action type.

Public Member Functions

  GreedyPolicy(const double initialEpsilon, const size_t annealInterval,
               const double minEpsilon, const double decayRate = 1.0)
      Constructor for the epsilon-greedy policy class.

  ActionType Sample(const arma::colvec& actionValue, bool deterministic = false,
                    const bool isNoisy = false)
      Sample an action based on the given action values.

  void Anneal()
      Anneal the exploration probability at each step.

  const double& Epsilon() const
      Get the current exploration probability.
template<typename EnvironmentType>
class mlpack::rl::GreedyPolicy< EnvironmentType >
Implementation of the epsilon-greedy policy.
In general, the action with the highest action value is selected greedily; occasionally, however, a random action is selected instead to encourage exploration.
Template Parameters
    EnvironmentType    The reinforcement learning task.
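To make the selection rule concrete, here is a standalone sketch of epsilon-greedy action selection. It is not mlpack's implementation: the `EpsilonGreedy` function, the use of `std::vector<double>` in place of `arma::colvec`, and the `std::mt19937` generator are all assumptions for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of epsilon-greedy selection (not mlpack's actual code).
// With probability epsilon, a uniformly random action is chosen (exploration);
// otherwise the action with the highest estimated value is chosen (exploitation).
std::size_t EpsilonGreedy(const std::vector<double>& actionValue,
                          const double epsilon,
                          std::mt19937& rng)
{
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  if (unif(rng) < epsilon)
  {
    // Explore: pick a uniformly random action index.
    std::uniform_int_distribution<std::size_t> pick(0, actionValue.size() - 1);
    return pick(rng);
  }

  // Exploit: pick the action with the highest action value.
  std::size_t best = 0;
  for (std::size_t i = 1; i < actionValue.size(); ++i)
  {
    if (actionValue[i] > actionValue[best])
      best = i;
  }
  return best;
}
```

With `epsilon = 0` the selection is purely greedy; with `epsilon = 1` it is purely random.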
◆ GreedyPolicy()
template<typename EnvironmentType >
Constructor for the epsilon-greedy policy class.
Parameters
    initialEpsilon    The initial probability to explore (select a random action).
    annealInterval    The number of steps over which the exploration probability anneals.
    minEpsilon        Epsilon will never fall below this value.
    decayRate         The multiplicative factor applied to epsilon at each annealing step.
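One plausible reading of how these four parameters interact is sketched below: each annealing step multiplies epsilon by `decayRate`, subtracts a linear step of `(initialEpsilon - minEpsilon) / annealInterval`, and floors the result at `minEpsilon`. The `AnnealEpsilon` function and this exact schedule are assumptions for illustration, not mlpack's confirmed update rule.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical annealing schedule sketched from the parameter descriptions
// above; this is an assumption, not mlpack's exact implementation.
double AnnealEpsilon(const double epsilon,
                     const double initialEpsilon,
                     const double minEpsilon,
                     const std::size_t annealInterval,
                     const double decayRate = 1.0)
{
  // Linear decrement spread evenly over annealInterval steps.
  const double delta = (initialEpsilon - minEpsilon) / annealInterval;

  // Apply multiplicative decay, then the linear step, floored at minEpsilon.
  return std::max(minEpsilon, epsilon * decayRate - delta);
}
```

For example, with `initialEpsilon = 1.0`, `minEpsilon = 0.1`, and `annealInterval = 9`, each step under this schedule lowers epsilon by 0.1 until it settles at 0.1.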
◆ Epsilon()
template<typename EnvironmentType >
Returns
    The current probability to explore.
◆ Sample()
template<typename EnvironmentType >
Sample an action based on given action values.
Parameters
    actionValue      Values for each action.
    deterministic    If true, always select the action greedily.
    isNoisy          Specifies whether the network used is noisy.
Returns
    The sampled action.
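The effect of the `deterministic` flag can be sketched as follows. This is a simplified stand-in, not mlpack's actual `Sample` member: the free function, the `std::vector<double>` container (in place of `arma::colvec`), and the explicit `rng` parameter are assumptions for illustration, and the `isNoisy` case is omitted.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of Sample's deterministic flag (not mlpack's code).
// When deterministic is true, epsilon is ignored and the highest-valued
// action is always returned; otherwise epsilon-greedy selection applies.
std::size_t Sample(const std::vector<double>& actionValue,
                   const double epsilon,
                   std::mt19937& rng,
                   const bool deterministic = false)
{
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  if (!deterministic && unif(rng) < epsilon)
  {
    // Explore: pick a uniformly random action index.
    std::uniform_int_distribution<std::size_t> pick(0, actionValue.size() - 1);
    return pick(rng);
  }

  // Exploit: pick the action with the highest action value.
  std::size_t best = 0;
  for (std::size_t i = 1; i < actionValue.size(); ++i)
  {
    if (actionValue[i] > actionValue[best])
      best = i;
  }
  return best;
}
```

Passing `deterministic = true` is typically useful at evaluation time, when exploration is no longer desired.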
The documentation for this class was generated from the following file: greedy_policy.hpp