mlpack::rl::QLearning< EnvironmentType, NetworkType, UpdaterType, PolicyType, ReplayType > Class Template Reference
Implementation of various Q-Learning algorithms, such as DQN and double DQN.
#include <q_learning.hpp>
Public Types

using StateType = typename EnvironmentType::State
    Convenient typedef for state.

using ActionType = typename EnvironmentType::Action
    Convenient typedef for action.
Public Member Functions

QLearning (TrainingConfig &config, NetworkType &network, PolicyType &policy, ReplayType &replayMethod, UpdaterType updater=UpdaterType(), EnvironmentType environment=EnvironmentType())
    Create the QLearning object with the given settings.

~QLearning ()
    Clean memory.

void TrainAgent ()
    Trains the DQN agent (non-categorical).

void TrainCategoricalAgent ()
    Trains the DQN agent of categorical type.

void SelectAction ()
    Select an action, given an agent.

double Episode ()
    Execute an episode.

size_t & TotalSteps ()
    Modify total steps from beginning.

const size_t & TotalSteps () const
    Get total steps from beginning.

StateType & State ()
    Modify the state of the agent.

const StateType & State () const
    Get the state of the agent.

const ActionType & Action () const
    Get the action of the agent.

EnvironmentType & Environment ()
    Modify the environment in which the agent is.

const EnvironmentType & Environment () const
    Get the environment in which the agent is.

bool & Deterministic ()
    Modify the training mode / test mode indicator.

const bool & Deterministic () const
    Get the indicator of training mode / test mode.

const NetworkType & Network () const
    Return the learning network.

NetworkType & Network ()
    Modify the learning network.
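Most of these members are simple accessors. A caller that drives the agent by hand, rather than through Episode(), might use them as in the sketch below; it assumes an agent object constructed as shown in the constructor documentation further down, and an environment type that provides InitialSample().

    // Sketch: manual interaction through the accessors.
    agent.Deterministic() = false;                        // Training mode.
    agent.State() = agent.Environment().InitialSample();  // Reset to a start state.

    agent.SelectAction();                // Let the behavior policy pick an action.
    const auto chosen = agent.Action();  // Action selected for agent.State().

    const size_t steps = agent.TotalSteps();  // Steps taken since construction.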
Implementation of various Q-Learning algorithms, such as DQN and double DQN.
For more details, see the following:
Template Parameters

EnvironmentType    The environment of the reinforcement learning task.
NetworkType        The network to compute action value.
UpdaterType        How to apply gradients when training.
PolicyType         Behavior policy of the agent.
ReplayType         Experience replay method.
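As an illustration, these parameters might be filled in with mlpack's CartPole environment, a feed-forward value network, the ensmallen AdamUpdate optimizer, a greedy behavior policy, and uniform random replay. The sketch below assumes the types CartPole, GreedyPolicy, RandomReplay, FFN, and ens::AdamUpdate are available and mutually compatible; it is an assumed example, not the canonical instantiation.

    // Sketch only: concrete template arguments for a CartPole agent.
    using QNetworkType = mlpack::ann::FFN<mlpack::ann::MeanSquaredError<>,
                                          mlpack::ann::GaussianInitialization>;

    using CartPoleAgent = mlpack::rl::QLearning<
        mlpack::rl::CartPole,                            // EnvironmentType
        QNetworkType,                                    // NetworkType
        ens::AdamUpdate,                                 // UpdaterType
        mlpack::rl::GreedyPolicy<mlpack::rl::CartPole>,  // PolicyType
        mlpack::rl::RandomReplay<mlpack::rl::CartPole>>; // ReplayType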
mlpack::rl::QLearning< EnvironmentType, NetworkType, UpdaterType, PolicyType, ReplayType >::QLearning (
    TrainingConfig & config,
    NetworkType & network,
    PolicyType & policy,
    ReplayType & replayMethod,
    UpdaterType updater = UpdaterType(),
    EnvironmentType environment = EnvironmentType()
)
Create the QLearning object with the given settings.
If you want to pass in a parameter and discard the original parameter object, be sure to use std::move to avoid an unnecessary copy.
Parameters

config          Hyper-parameters for training.
network         The network to compute action value.
policy          Behavior policy of the agent.
replayMethod    Experience replay method.
updater         How to apply gradients when training.
environment     Reinforcement learning task.
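Building on the CartPoleAgent alias sketched under the template parameters, construction might look as follows. The TrainingConfig setters, policy constructor arguments, and replay buffer sizes shown here follow common mlpack examples but are assumptions; note that config, network, policy, and replayMethod are held by reference, so they must outlive the agent.

    // Sketch: assemble the components and construct the agent.
    mlpack::rl::TrainingConfig config;
    config.StepSize() = 0.01;
    config.Discount() = 0.99;
    config.TargetNetworkSyncInterval() = 100;
    config.ExplorationSteps() = 100;

    QNetworkType network;  // Value network; layer setup omitted in this sketch.
    mlpack::rl::GreedyPolicy<mlpack::rl::CartPole> policy(
        1.0 /* initial epsilon */, 1000 /* anneal interval */,
        0.1 /* minimum epsilon */);
    mlpack::rl::RandomReplay<mlpack::rl::CartPole> replayMethod(
        32 /* batch size */, 10000 /* capacity */);

    CartPoleAgent agent(config, network, policy, replayMethod);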
double mlpack::rl::QLearning< EnvironmentType, NetworkType, UpdaterType, BehaviorPolicyType, ReplayType >::Episode ( )
Execute an episode.
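Episode() runs one full interaction episode, selecting actions, stepping the environment, storing transitions in the replay buffer and training along the way, and returns the accumulated episode return. A typical driver loops over it as in the sketch below; the episode budget, smoothing factor, and return threshold are placeholders, not values prescribed by the class.

    // Sketch: drive training through repeated calls to Episode().
    double averageReturn = 0.0;
    for (size_t i = 0; i < 1000 /* placeholder episode budget */; ++i)
    {
      const double episodeReturn = agent.Episode();
      // Smooth the per-episode return to judge convergence.
      averageReturn = 0.9 * averageReturn + 0.1 * episodeReturn;
      if (averageReturn > 35.0 /* task-specific threshold, placeholder */)
        break;
    }

    // Greedy evaluation: no exploration and no parameter updates.
    agent.Deterministic() = true;
    const double testReturn = agent.Episode();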
void mlpack::rl::QLearning< EnvironmentType, NetworkType, UpdaterType, BehaviorPolicyType, ReplayType >::TrainAgent ( )
Trains the DQN agent (non-categorical).
If the agent is at a terminal state, then we don't need to add the discounted future reward, since the agent won't perform any further action at a terminal state.
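In other words, the terminal flag masks out the bootstrapped term of the one-step target, so transitions that end an episode are trained towards the immediate reward alone. The snippet below is purely schematic; isTerminal, reward, discount, and MaxTargetQ are illustrative names, not members of this class.

    // Schematic of the target for one sampled transition.
    double target = reward;
    if (!isTerminal)
    {
      // Non-terminal transition: bootstrap from the best next-state
      // action value predicted by the target network.
      target += discount * MaxTargetQ(nextState);
    }
    // The learning network is then regressed towards `target` for the
    // action actually taken in the transition.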