Mungojerrie  1.0
Learner Class Reference

Public Member Functions

 Learner (Gym gym, LearnerOptions options)
 Constructor for learner.
 
void SarsaLambda (double lambda, bool replacingTrace, unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
 Runs the Sarsa($\lambda$) algorithm.
 
void DoubleQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
 Runs the Double Q-learning algorithm.
 
void QLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
 Runs the Q-learning algorithm.
 
void DifferentialQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double epsilon, double linearExploreDecay, double eta, double initValue)
 Runs the Differential Q-learning algorithm.
 

Friends

std::ostream & operator<< (std::ostream &os, Qtype const &Q)
 

Member Function Documentation

◆ DifferentialQLearning()

void Learner::DifferentialQLearning(unsigned int numEpisodes,
                                    double alpha,
                                    double linearAlphaDecay,
                                    double epsilon,
                                    double linearExploreDecay,
                                    double eta,
                                    double initValue)

Runs the Differential Q-learning algorithm.

This algorithm learns optimal average-reward strategies in communicating MDPs. Algorithmic details can be found here.

Parameters
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
eta: The constant multiplied by the learning rate when updating $\bar R$.
initValue: The value to initialize the Q-table to.

◆ DoubleQLearning()

void Learner::DoubleQLearning(unsigned int numEpisodes,
                              double alpha,
                              double linearAlphaDecay,
                              double discount,
                              double epsilon,
                              double linearExploreDecay,
                              double initValue)

Runs the Double Q-learning algorithm.

Parameters
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
discount: The discount factor.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
initValue: The value to initialize the Q-tables to.

◆ QLearning()

void Learner::QLearning(unsigned int numEpisodes,
                        double alpha,
                        double linearAlphaDecay,
                        double discount,
                        double epsilon,
                        double linearExploreDecay,
                        double initValue)

Runs the Q-learning algorithm.

Parameters
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
discount: The discount factor.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
initValue: The value to initialize the Q-table to.

◆ SarsaLambda()

void Learner::SarsaLambda(double lambda,
                          bool replacingTrace,
                          unsigned int numEpisodes,
                          double alpha,
                          double linearAlphaDecay,
                          double discount,
                          double epsilon,
                          double linearExploreDecay,
                          double initValue)

Runs the Sarsa($\lambda$) algorithm.

Parameters
lambda: The lambda parameter in Sarsa($\lambda$).
replacingTrace: If true, replacing traces are used; otherwise accumulating traces are used.
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
discount: The discount factor.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
initValue: The value to initialize the Q-table to.

The documentation for this class was generated from the following files: