Mungojerrie 1.1
GymOptions Struct Reference

Public Types

enum  GymRewardTypes {
  GymRewardTypes::default_type = 0, GymRewardTypes::prism = 1, GymRewardTypes::zeta_reach = 2, GymRewardTypes::zeta_acc = 3,
  GymRewardTypes::zeta_discount = 4, GymRewardTypes::reward_on_acc = 5, GymRewardTypes::multi_discount = 6, GymRewardTypes::parity = 7,
  GymRewardTypes::pri_tracker = 8, GymRewardTypes::lexo = 9, GymRewardTypes::avg = 10
}
 

Public Attributes

unsigned int episodeLength
 
double zeta
 
double gammaB
 
std::vector< double > tolerance
 
double priEpsilon
 
GymRewardTypes rewardType
 
bool noResetOnAcc
 
bool terminalUpdateOnTimeLimit
 
bool p1NotStrategic
 
bool concatActionsInCSV
 
double fInvScale
 
double resetPenalty
 
bool randInit
 

Member Enumeration Documentation

◆ GymRewardTypes

Enumerator
default_type 

The default type is zeta-reach for 1-1/2-player games and parity for 2-1/2-player games.

prism 

Reward from PRISM file. Continuing (non-episodic) setting. Automaton epsilon edges receive zero reward.

zeta_reach 

Zeta-based reachability reward. See "Omega-Regular Objectives in Model-Free Reinforcement Learning".

zeta_acc 

Zeta-based reachability with reward on accepting transitions. See "Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives".

zeta_discount 

Zeta-based discounted reward on accepting transitions. See "Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives".

reward_on_acc 

Reward on each accepting transition. MAY LEAD TO INCORRECT STRATEGIES. See "A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications".

multi_discount 

Multi-discount reward. See "Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning".

parity 

Reward from parity objectives. See "Model-Free Reinforcement Learning for Stochastic Parity Games".

pri_tracker 

Reward from the priority-tracker gadget. See "Model-Free Reinforcement Learning for Stochastic Parity Games".

lexo 

Reward from lexicographic objectives. See "Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives".

avg 

Average reward for absolute liveness properties. See "Translating Omega-Regular Specifications to Average Objectives for Model-Free Reinforcement Learning".

Member Data Documentation

◆ noResetOnAcc

bool GymOptions::noResetOnAcc

Turns off resetting the episode step counter when an accepting edge is traversed, for zeta-reach and zeta-acc (not recommended)

◆ p1NotStrategic

bool GymOptions::p1NotStrategic

Prevents player 1 from switching to the optimal counter-strategy against player 0 during verification of the learned strategies; instead, player 1 uses its learned strategy.

◆ terminalUpdateOnTimeLimit

bool GymOptions::terminalUpdateOnTimeLimit

Treats episodes that end because of the episode step limit as transitioning to a zero-value sink (not recommended)

