mlpack
mlpack::rl::SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType > Class Template Reference

Implementation of Soft Actor-Critic, a model-free, off-policy, actor-critic deep reinforcement learning algorithm. More...

#include <sac.hpp>

Public Types

using StateType = typename EnvironmentType::State
 Convenient typedef for state.
 
using ActionType = typename EnvironmentType::Action
 Convenient typedef for action.
 

Public Member Functions

 SAC (TrainingConfig &config, QNetworkType &learningQ1Network, PolicyNetworkType &policyNetwork, ReplayType &replayMethod, UpdaterType qNetworkUpdater=UpdaterType(), UpdaterType policyNetworkUpdater=UpdaterType(), EnvironmentType environment=EnvironmentType())
 Create the SAC object with given settings. More...
 
 ~SAC ()
 Clean memory.
 
void SoftUpdate (double rho)
 Softly update the target Q network parameters toward the learning Q network parameters. More...
 
void Update ()
 Update the Q and policy networks.
 
void SelectAction ()
 Select an action according to the agent's current state.
 
double Episode ()
 Execute an episode. More...
 
size_t & TotalSteps ()
 Modify total steps from beginning.
 
const size_t & TotalSteps () const
 Get total steps from beginning.
 
StateType & State ()
 Modify the state of the agent.
 
const StateType & State () const
 Get the state of the agent.
 
const ActionType & Action () const
 Get the action of the agent.
 
bool & Deterministic ()
 Modify the training mode / test mode indicator.
 
const bool & Deterministic () const
 Get the indicator of training mode / test mode.
 

Detailed Description

template<typename EnvironmentType, typename QNetworkType, typename PolicyNetworkType, typename UpdaterType, typename ReplayType = RandomReplay<EnvironmentType>>
class mlpack::rl::SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType >

Implementation of Soft Actor-Critic, a model-free, off-policy, actor-critic deep reinforcement learning algorithm.

For more details, see the following:

@misc{haarnoja2018soft,
  author = {Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and
            George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and
            Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  title  = {Soft Actor-Critic Algorithms and Applications},
  year   = {2018},
  url    = {https://arxiv.org/abs/1812.05905}
}
Template Parameters
    EnvironmentType      The environment of the reinforcement learning task.
    QNetworkType         The network to compute action value.
    PolicyNetworkType    The network to produce an action given a state.
    UpdaterType          How to apply gradients when training.
    ReplayType           Experience replay method.
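
A minimal usage sketch, loosely patterned on mlpack's SAC test setup and assuming the continuous-action Pendulum environment that ships with mlpack; the layer sizes, hyper-parameters, and episode count below are illustrative, not prescriptive:

#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/loss_functions/empty_loss.hpp>
#include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
#include <mlpack/methods/reinforcement_learning/sac.hpp>
#include <mlpack/methods/reinforcement_learning/environment/pendulum.hpp>
#include <mlpack/methods/reinforcement_learning/training_config.hpp>
#include <ensmallen.hpp>
#include <iostream>

using namespace mlpack::ann;
using namespace mlpack::rl;

int main()
{
  // Policy network: maps the 3-dimensional Pendulum state to a
  // 1-dimensional action squashed into [-1, 1] by TanH.
  FFN<EmptyLoss<>, GaussianInitialization> policyNetwork(
      EmptyLoss<>(), GaussianInitialization(0, 0.1));
  policyNetwork.Add<Linear<>>(3, 128);
  policyNetwork.Add<ReLULayer<>>();
  policyNetwork.Add<Linear<>>(128, 1);
  policyNetwork.Add<TanHLayer<>>();

  // Q network: scores a concatenated (state, action) pair.
  FFN<EmptyLoss<>, GaussianInitialization> qNetwork(
      EmptyLoss<>(), GaussianInitialization(0, 0.1));
  qNetwork.Add<Linear<>>(3 + 1, 128);
  qNetwork.Add<ReLULayer<>>();
  qNetwork.Add<Linear<>>(128, 1);

  // Replay buffer: batches of 32 sampled from up to 10000 transitions.
  RandomReplay<Pendulum> replayMethod(32, 10000);

  TrainingConfig config;
  config.StepSize() = 0.001;
  config.TargetNetworkSyncInterval() = 1;
  config.UpdateInterval() = 3;

  SAC<Pendulum, decltype(qNetwork), decltype(policyNetwork), ens::AdamUpdate>
      agent(config, qNetwork, policyNetwork, replayMethod);

  // Train: each call to Episode() runs one full episode and returns its
  // cumulative reward.
  agent.Deterministic() = false;
  for (size_t i = 0; i < 100; ++i)
    std::cout << "Episode return: " << agent.Episode() << std::endl;
}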

Constructor & Destructor Documentation

◆ SAC()

template<typename EnvironmentType , typename QNetworkType , typename PolicyNetworkType , typename UpdaterType , typename ReplayType >
mlpack::rl::SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType >::SAC ( TrainingConfig &  config,
QNetworkType &  learningQ1Network,
PolicyNetworkType &  policyNetwork,
ReplayType &  replayMethod,
UpdaterType  qNetworkUpdater = UpdaterType(),
UpdaterType  policyNetworkUpdater = UpdaterType(),
EnvironmentType  environment = EnvironmentType() 
)

Create the SAC object with given settings.

Because the constructor takes the networks and the replay method by reference, no copies are made: the agent trains and queries the very objects you pass in, so they must outlive the SAC object.

Parameters
    config                  Hyper-parameters for training.
    learningQ1Network       The network to compute action value.
    policyNetwork           The network to produce an action given a state.
    replayMethod            Experience replay method.
    qNetworkUpdater         How to apply gradients to the Q network when training.
    policyNetworkUpdater    How to apply gradients to the policy network when training.
    environment             Reinforcement learning task.
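
For instance, custom updaters can be supplied for the two networks. A sketch, reusing the objects from the earlier example; the updaters are taken by value, so temporaries are fine, while the networks and replay method must be lvalues:

// Separate Adam updaters for the Q and policy networks (the arguments shown
// are ensmallen's defaults: epsilon, beta1, beta2).
SAC<Pendulum, decltype(qNetwork), decltype(policyNetwork), ens::AdamUpdate>
    agent(config, qNetwork, policyNetwork, replayMethod,
          ens::AdamUpdate(1e-8, 0.9, 0.999),   // qNetworkUpdater
          ens::AdamUpdate(1e-8, 0.9, 0.999));  // policyNetworkUpdater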

Member Function Documentation

◆ Episode()

template<typename EnvironmentType , typename QNetworkType , typename PolicyNetworkType , typename UpdaterType , typename ReplayType >
double mlpack::rl::SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType >::Episode ( )

Execute an episode.

Returns
Return of the episode.
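
Deterministic() toggles between training and evaluation behavior, so a typical evaluation loop looks like the sketch below, reusing the agent from the earlier example (the episode count is arbitrary):

// Evaluate the learned policy without exploration.
agent.Deterministic() = true;
double averageReturn = 0.0;
for (size_t i = 0; i < 10; ++i)
  averageReturn += agent.Episode() / 10.0;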

◆ SoftUpdate()

template<typename EnvironmentType , typename QNetworkType , typename PolicyNetworkType , typename UpdaterType , typename ReplayType >
void mlpack::rl::SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType >::SoftUpdate ( double  rho)

Softly update the target Q network parameters toward the learning Q network parameters.

Parameters
    rho    How "softly" the parameters should be copied.
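
This is the conventional Polyak averaging step; a sketch of the rule with illustrative member names (rho = 0 leaves the target untouched, rho = 1 copies the learning network outright):

// Blend a fraction rho of the learning network into the target network.
targetQNetwork.Parameters() = (1 - rho) * targetQNetwork.Parameters() +
    rho * learningQNetwork.Parameters();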
