Mungojerrie 1.1

GymOptions Struct Reference
Public Types

enum class GymRewardTypes { default_type = 0, prism = 1, zeta_reach = 2, zeta_acc = 3, zeta_discount = 4, reward_on_acc = 5, multi_discount = 6, parity = 7, pri_tracker = 8, lexo = 9, avg = 10 }
Public Attributes

Type | Name
---|---
unsigned int | episodeLength
double | zeta
double | gammaB
std::vector<double> | tolerance
double | priEpsilon
GymRewardTypes | rewardType
bool | noResetOnAcc
bool | terminalUpdateOnTimeLimit
bool | p1NotStrategic
bool | concatActionsInCSV
double | fInvScale
double | resetPenalty
bool | randInit
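Since these are plain public data members, a run is configured by assigning to them directly. The sketch below reconstructs the declarations from the tables above purely for illustration (the authoritative definitions live in Mungojerrie's own headers; default values, default-constructibility, and the comments marked as assumptions are not documented here):

```cpp
#include <vector>

// Illustrative reconstruction of the documented interface;
// the real definitions are in Mungojerrie's headers.
struct GymOptions {
  enum class GymRewardTypes {
    default_type = 0, prism = 1, zeta_reach = 2, zeta_acc = 3,
    zeta_discount = 4, reward_on_acc = 5, multi_discount = 6,
    parity = 7, pri_tracker = 8, lexo = 9, avg = 10
  };

  unsigned int episodeLength;
  double zeta;
  double gammaB;
  std::vector<double> tolerance;
  double priEpsilon;
  GymRewardTypes rewardType;
  bool noResetOnAcc;
  bool terminalUpdateOnTimeLimit;
  bool p1NotStrategic;
  bool concatActionsInCSV;
  double fInvScale;
  double resetPenalty;
  bool randInit;
};

int main() {
  GymOptions options{};  // zero-initialize; real defaults are set by Mungojerrie
  options.episodeLength = 30;                             // cap on steps per episode
  options.rewardType = GymOptions::GymRewardTypes::zeta_reach;
  options.zeta = 0.99;                                    // parameter of the zeta-based schemes
  options.tolerance = {0.01};                             // assumed: tolerance(s) for strategy verification
  options.noResetOnAcc = false;                           // keep the step-counter reset on accepting edges
  options.terminalUpdateOnTimeLimit = false;              // time-limit ends are not a zero-value sink
  options.p1NotStrategic = false;                         // let player 1 best-respond during verification
  options.randInit = true;                                // assumed: randomize the initial state
  return 0;
}
```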
Member Enumeration Documentation

enum class GymOptions::GymRewardTypes

Enumerator | Description
---|---
default_type | Default type is zeta-reach for 1½-player games and parity for 2½-player games.
prism | Reward from the PRISM file. Continuing (non-episodic) setting; automaton epsilon edges receive zero reward.
zeta_reach | Zeta-based reachability reward. See "Omega-Regular Objectives in Model-Free Reinforcement Learning".
zeta_acc | Zeta-based reachability with reward on accepting transitions. See "Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives".
zeta_discount | Zeta-based discounted reward on accepting transitions. See "Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives".
reward_on_acc | Reward on each accepting transition. MAY LEAD TO INCORRECT STRATEGIES. See "A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications".
multi_discount | Multi-discount reward. See "Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning".
parity | Reward from parity objectives. See "Model-Free Reinforcement Learning for Stochastic Parity Games".
pri_tracker | Reward from the priority-tracker gadget. See "Model-Free Reinforcement Learning for Stochastic Parity Games".
lexo | Reward from lexicographic objectives. See "Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives".
avg | Average reward for absolute liveness properties. See "Translating Omega-Regular Specifications to Average Objectives for Model-Free Reinforcement Learning".
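As a concrete example, selecting the zeta_discount scheme might look like the following sketch. It reuses the illustrative GymOptions reconstruction shown earlier; the assumption that zeta and gammaB are the parameters read by this scheme is based only on the field and scheme names, not on documented behavior:

```cpp
// Sketch: configure a zeta-based discounted reward run
// (uses the illustrative GymOptions reconstruction above).
GymOptions makeZetaDiscountOptions() {
  GymOptions options{};
  options.rewardType = GymOptions::GymRewardTypes::zeta_discount;
  options.zeta = 0.99;    // assumed: continuation parameter of the zeta-based schemes
  options.gammaB = 0.99;  // assumed: discount applied on accepting transitions
  options.episodeLength = 30;
  return options;
}
```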
Member Data Documentation

bool GymOptions::noResetOnAcc
Turns off resetting of the episode step counter when an accepting edge is traversed, for zeta-reach and zeta-acc (not recommended).
bool GymOptions::p1NotStrategic |
During verification of the learned strategies, prevents player 1 from switching to the optimal counter-strategy against player 0; instead, player 1 plays its learned strategy.
bool GymOptions::terminalUpdateOnTimeLimit |
Treats episodes that end because the episode step limit was reached as transitioning to a zero-value sink (not recommended).