Mungojerrie 1.1
GymOptions Struct Reference

Public Types

enum  GymRewardTypes {
  GymRewardTypes::default_type = 0, GymRewardTypes::prism = 1, GymRewardTypes::zeta_reach = 2, GymRewardTypes::zeta_acc = 3,
  GymRewardTypes::zeta_discount = 4, GymRewardTypes::reward_on_acc = 5, GymRewardTypes::multi_discount = 6, GymRewardTypes::parity = 7,
  GymRewardTypes::pri_tracker = 8, GymRewardTypes::lexo = 9, GymRewardTypes::avg = 10
}
 

Public Attributes

unsigned int episodeLength
 
double zeta
 
double gammaB
 
std::vector< double > tolerance
 
double priEpsilon
 
GymRewardTypes rewardType
 
bool noResetOnAcc
 
bool terminalUpdateOnTimeLimit
 
bool p1NotStrategic
 
bool concatActionsInCSV
 
double fInvScale
 
double resetPenalty
 
bool randInit
 

Member Enumeration Documentation

◆ GymRewardTypes

Enumerator
default_type 

The default type is zeta-reach for 1-1/2-player games and parity for 2-1/2-player games.

prism 

Reward from PRISM file. Continuing (non-episodic) setting. Automaton epsilon edges receive zero reward.

zeta_reach 

Zeta-based reachability reward. See "Omega-Regular Objectives in Model-Free Reinforcement Learning".

zeta_acc 

Zeta-based reachability with reward on accepting transitions. See "Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives".

zeta_discount 

Zeta-based discounted reward on accepting transitions. See "Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives".

reward_on_acc 

Reward on each accepting transition. MAY LEAD TO INCORRECT STRATEGIES. See "A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications".

multi_discount 

Multi-discount reward. See "Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning".

parity 

Reward from parity objectives. See "Model-Free Reinforcement Learning for Stochastic Parity Games".

pri_tracker 

Reward from the priority-tracker gadget. See "Model-Free Reinforcement Learning for Stochastic Parity Games".

lexo 

Reward from lexicographic objectives. See "Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives".

avg 

Average reward for absolute liveness properties. See "Translating Omega-Regular Specifications to Average Objectives for Model-Free Reinforcement Learning".

Member Data Documentation

◆ noResetOnAcc

bool GymOptions::noResetOnAcc

Turns off resetting the episode step counter when an accepting edge is traversed, for zeta-reach and zeta-acc (not recommended)

◆ p1NotStrategic

bool GymOptions::p1NotStrategic

Prevents player 1 from switching to the optimal counter-strategy against player 0 during verification of the learned strategies; instead, player 1 uses its learned strategy.

◆ terminalUpdateOnTimeLimit

bool GymOptions::terminalUpdateOnTimeLimit

Treats episodes that end because of the episode step limit as transitioning to a zero-value sink (not recommended)

