Implementation of the epsilon-greedy policy.
#include <greedy_policy.hpp>
using ActionType = typename EnvironmentType::Action
    Convenient typedef for action.
GreedyPolicy (const double initialEpsilon, const size_t annealInterval, const double minEpsilon, const double decayRate=1.0)
    Constructor for the epsilon-greedy policy class.
ActionType Sample (const arma::colvec &actionValue, bool deterministic=false, const bool isNoisy=false)
    Sample an action based on given action values.
void Anneal ()
    Anneal the exploration probability by one step.
const double & Epsilon () const
    Get the current exploration probability.
template<typename EnvironmentType>
class mlpack::rl::GreedyPolicy< EnvironmentType >
Implementation of the epsilon-greedy policy.
In general we select an action greedily based on the action value; with probability epsilon, however, we select a random action instead, to encourage exploration.
- Template Parameters
    EnvironmentType | The reinforcement learning task.
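A minimal usage sketch, assuming mlpack's CartPole environment as the EnvironmentType (the header paths and the exact shape of CartPole::Action vary between mlpack versions, so treat those details as assumptions):

#include <mlpack/methods/reinforcement_learning/policy/greedy_policy.hpp>
#include <mlpack/methods/reinforcement_learning/environment/cart_pole.hpp>

using namespace mlpack::rl;

int main()
{
  // Start fully exploratory (epsilon = 1.0), anneal over 1000 steps, and
  // never let the exploration probability fall below 0.1.
  GreedyPolicy<CartPole> policy(1.0, 1000, 0.1);

  // Hypothetical action-value estimates for CartPole's two actions.
  arma::colvec actionValue("0.3 0.7");

  // Usually returns the greedy action (index 1 here); with probability
  // epsilon, a uniformly random action is returned instead.
  CartPole::Action action = policy.Sample(actionValue);

  // Move the exploration probability one annealing step toward minEpsilon.
  policy.Anneal();
}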
◆ GreedyPolicy()
template<typename EnvironmentType >
Constructor for the epsilon-greedy policy class.
- Parameters
    initialEpsilon | The initial probability to explore (select a random action).
    annealInterval | The number of steps over which the probability to explore anneals.
    minEpsilon | Epsilon will never be less than this value.
    decayRate | The multiplicative factor applied to the annealing step size each time Anneal() is called; the default of 1.0 gives a constant (linear) anneal from initialEpsilon to minEpsilon.
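A sketch of how the parameters interact, under the assumption that decayRate = 1.0 yields a linear schedule: epsilon falls from initialEpsilon to minEpsilon over annealInterval calls to Anneal() and is clamped there afterwards.

// Hypothetical schedule: epsilon decays from 1.0 to 0.05 over 500 steps.
GreedyPolicy<CartPole> policy(/* initialEpsilon */ 1.0,
                              /* annealInterval */ 500,
                              /* minEpsilon */ 0.05,
                              /* decayRate */ 1.0);

for (size_t step = 0; step < 1000; ++step)
  policy.Anneal();

// Epsilon() is now clamped at minEpsilon (0.05).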
◆ Epsilon()
template<typename EnvironmentType >
- Returns
- The current probability to explore.
◆ Sample()
template<typename EnvironmentType >
Sample an action based on given action values.
- Parameters
    actionValue | Values for each action.
    deterministic | If true, always select the action greedily.
    isNoisy | Specifies whether the network used is noisy.
- Returns
- Sampled action.
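For evaluation runs it is common to switch off exploration entirely. A short sketch, reusing the hypothetical policy and actionValue from the constructor example above:

// deterministic = true bypasses the epsilon check, so Sample() always
// returns the arg max of actionValue regardless of the current epsilon.
CartPole::Action greedyAction = policy.Sample(actionValue, true);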
The documentation for this class was generated from the following file:
- greedy_policy.hpp