Learner (Gym gym, LearnerOptions options)
        Constructor for the learner.

void SarsaLambda (double lambda, bool replacingTrace, unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
        Runs the Sarsa(λ) algorithm. More...

void DoubleQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
        Runs the Double Q-learning algorithm. More...

void QLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
        Runs the Q-learning algorithm. More...

void DifferentialQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double epsilon, double linearExploreDecay, double eta, double initValue)
        Runs the Differential Q-learning algorithm. More...

std::ostream & operator<< (std::ostream &os, Qtype const &Q)
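A minimal usage sketch tying the members together. The header names and the default constructors of Gym and LearnerOptions are assumptions not shown in this reference; only the Learner constructor and the QLearning signature come from the listing above.

    #include "Gym.h"        // hypothetical header name
    #include "Learner.h"    // hypothetical header name

    int main() {
        Gym gym;                        // assumed default-constructible
        LearnerOptions options;         // assumed default-constructible
        Learner learner(gym, options);

        // 5000 episodes, alpha 0.1 decaying linearly to 0.01, discount 0.99,
        // epsilon 1.0 decaying linearly to 0.05, Q-table initialized to 0.
        learner.QLearning(5000, 0.1, 0.01, 0.99, 1.0, 0.05, 0.0);
        return 0;
    }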
◆ DifferentialQLearning()
void Learner::DifferentialQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double epsilon, double linearExploreDecay, double eta, double initValue)
Runs the Differential Q-learning algorithm.

This algorithm learns optimal average-reward policies in communicating MDPs. Algorithmic details can be found here.

Parameters
    numEpisodes         The number of episodes to train for.
    alpha               The learning rate.
    linearAlphaDecay    The final value that alpha decays to linearly over the course of learning. Negative values indicate no decay.
    epsilon             The epsilon parameter used in epsilon-greedy action selection.
    linearExploreDecay  The final value that epsilon decays to linearly over the course of learning. Negative values indicate no decay.
    eta                 The constant multiplied by the learning rate when updating the average-reward estimate.
    initValue           The value to initialize the Q-table to.
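Since the algorithmic details sit behind the link above, the following is a hedged one-step sketch of tabular Differential Q-learning (after Wan, Naik, and Sutton) showing where eta enters; all names are illustrative, none are the library's internals.

    #include <algorithm>
    #include <vector>

    // Illustrative single update of tabular Differential Q-learning.
    // eta scales the learning rate in the average-reward update, matching
    // the eta parameter documented above.
    void differentialQStep(std::vector<std::vector<double>>& Q, double& avgReward,
                           int s, int a, double reward, int s2,
                           double alpha, double eta) {
        double maxNext = *std::max_element(Q[s2].begin(), Q[s2].end());
        double delta = reward - avgReward + maxNext - Q[s][a];  // TD error
        Q[s][a] += alpha * delta;          // action-value update
        avgReward += eta * alpha * delta;  // average-reward estimate update
    }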
◆ DoubleQLearning()
void Learner::DoubleQLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
Runs the Double Q-learning algorithm.

Parameters
    numEpisodes         The number of episodes to train for.
    alpha               The learning rate.
    linearAlphaDecay    The final value that alpha decays to linearly over the course of learning. Negative values indicate no decay.
    discount            The discount factor.
    epsilon             The epsilon parameter used in epsilon-greedy action selection.
    linearExploreDecay  The final value that epsilon decays to linearly over the course of learning. Negative values indicate no decay.
    initValue           The value to initialize the Q-tables to.
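As a hedged sketch of why this method maintains two Q-tables (van Hasselt's Double Q-learning), one possible tabular update step is shown below; the names and data layout are illustrative, not the library's.

    #include <algorithm>
    #include <random>
    #include <vector>

    using QTable = std::vector<std::vector<double>>;

    // Illustrative single update of tabular Double Q-learning: one randomly
    // chosen table selects the greedy next action, the other evaluates it,
    // which counters the overestimation bias of ordinary Q-learning.
    void doubleQStep(QTable& QA, QTable& QB, int s, int a, double r, int s2,
                     double alpha, double discount, std::mt19937& rng) {
        std::bernoulli_distribution coin(0.5);
        QTable& upd  = coin(rng) ? QA : QB;      // table being updated
        QTable& eval = (&upd == &QA) ? QB : QA;  // table evaluating the action
        auto aStar = std::max_element(upd[s2].begin(), upd[s2].end())
                     - upd[s2].begin();          // greedy action under upd
        upd[s][a] += alpha * (r + discount * eval[s2][aStar] - upd[s][a]);
    }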
◆ QLearning()
void Learner::QLearning (unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
Runs the Q-learning algorithm.

Parameters
    numEpisodes         The number of episodes to train for.
    alpha               The learning rate.
    linearAlphaDecay    The final value that alpha decays to linearly over the course of learning. Negative values indicate no decay.
    discount            The discount factor.
    epsilon             The epsilon parameter used in epsilon-greedy action selection.
    linearExploreDecay  The final value that epsilon decays to linearly over the course of learning. Negative values indicate no decay.
    initValue           The value to initialize the Q-table to.
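For reference, a hedged sketch of the standard tabular Q-learning update this method presumably performs once per step; the helper name and data layout are illustrative.

    #include <algorithm>
    #include <vector>

    // Illustrative single update of tabular Q-learning: move Q(s, a) toward
    // the reward plus the discounted best next-state value.
    void qLearningStep(std::vector<std::vector<double>>& Q, int s, int a,
                       double r, int s2, double alpha, double discount) {
        double maxNext = *std::max_element(Q[s2].begin(), Q[s2].end());
        Q[s][a] += alpha * (r + discount * maxNext - Q[s][a]);  // TD update
    }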
◆ SarsaLambda()
void Learner::SarsaLambda (double lambda, bool replacingTrace, unsigned int numEpisodes, double alpha, double linearAlphaDecay, double discount, double epsilon, double linearExploreDecay, double initValue)
Runs the Sarsa(λ) algorithm.

Parameters
    lambda              The lambda parameter in Sarsa(λ).
    replacingTrace      If true, replacing traces are used; if false, accumulating traces are used.
    numEpisodes         The number of episodes to train for.
    alpha               The learning rate.
    linearAlphaDecay    The final value that alpha decays to linearly over the course of learning. Negative values indicate no decay.
    discount            The discount factor.
    epsilon             The epsilon parameter used in epsilon-greedy action selection.
    linearExploreDecay  The final value that epsilon decays to linearly over the course of learning. Negative values indicate no decay.
    initValue           The value to initialize the Q-table to.
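A hedged sketch of how the replacingTrace flag typically changes a tabular Sarsa(λ) update; the names and data layout are illustrative, not the library's.

    #include <vector>

    // Illustrative single update of tabular Sarsa(lambda). E holds the
    // eligibility traces; replacingTrace selects how a visit updates them.
    void sarsaLambdaStep(std::vector<std::vector<double>>& Q,
                         std::vector<std::vector<double>>& E,
                         int s, int a, double r, int s2, int a2,
                         double alpha, double discount, double lambda,
                         bool replacingTrace) {
        double delta = r + discount * Q[s2][a2] - Q[s][a];  // on-policy TD error
        E[s][a] = replacingTrace ? 1.0 : E[s][a] + 1.0;     // replace vs accumulate
        for (std::size_t i = 0; i < Q.size(); ++i)
            for (std::size_t j = 0; j < Q[i].size(); ++j) {
                Q[i][j] += alpha * delta * E[i][j];  // credit all traced pairs
                E[i][j] *= discount * lambda;        // decay traces
            }
    }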