Mungojerrie  1.0
Learner Class Reference

Public Member Functions

 Learner (Gym gym, LearnerOptions options)
 Constructor for learner.
 
void SarsaLambda (double lambda, bool replacingTrace, unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
 Runs the Sarsa($\lambda$) algorithm.
 
void DoubleQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
 Runs the Double Q-learning algorithm.
 
void QLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
 Runs the Q-learning algorithm.
 
void DifferentialQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double epsilon, double linearExploreDecay, double eta, double initValue)
 Runs the Differential Q-learning algorithm.
 

Friends

std::ostream & operator<< (std::ostream &os, Qtype const &Q)
 

Member Function Documentation

◆ DifferentialQLearning()

void Learner::DifferentialQLearning(unsigned int numEpisodes,
                                    double alpha,
                                    double linearAlphaDecay,
                                    double epsilon,
                                    double linearExploreDecay,
                                    double eta,
                                    double initValue)

Runs the Differential Q-learning algorithm.

This algorithm learns optimal average-reward strategies in communicating MDPs. Algorithmic details can be found here.

Parameters
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
eta: The constant multiplied by the learning rate when updating $\bar R$.
initValue: The value to initialize the Q-table to.

◆ DoubleQLearning()

void Learner::DoubleQLearning(unsigned int numEpisodes,
                              double alpha,
                              double linearAlphaDecay,
                              double discount,
                              double epsilon,
                              double linearExploreDecay,
                              double initValue)

Runs the Double Q-learning algorithm.

Parameters
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
discount: The discount factor.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
initValue: The value to initialize the Q-tables to.

◆ QLearning()

void Learner::QLearning(unsigned int numEpisodes,
                        double alpha,
                        double linearAlphaDecay,
                        double discount,
                        double epsilon,
                        double linearExploreDecay,
                        double initValue)

Runs the Q-learning algorithm.

Parameters
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
discount: The discount factor.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
initValue: The value to initialize the Q-table to.

◆ SarsaLambda()

void Learner::SarsaLambda(double lambda,
                          bool replacingTrace,
                          unsigned int numEpisodes,
                          double alpha,
                          double linearAlphaDecay,
                          double discount,
                          double epsilon,
                          double linearExploreDecay,
                          double initValue)

Runs the Sarsa($\lambda$) algorithm.

Parameters
lambda: The lambda parameter in Sarsa($\lambda$).
replacingTrace: If true, replacing traces are used; otherwise accumulating traces are used.
numEpisodes: The number of episodes to train for.
alpha: The learning rate.
linearAlphaDecay: The final value of alpha to decay linearly to over the course of learning. Negative values indicate no decay.
discount: The discount factor.
epsilon: The epsilon parameter used in epsilon-greedy action selection.
linearExploreDecay: The final value of epsilon to decay linearly to over the course of learning. Negative values indicate no decay.
initValue: The value to initialize the Q-table to.

The documentation for this class was generated from the following files: