Translating Omega-Regular Specifications to Average Objectives for Model-Free Reinforcement Learning — Case Studies
All files can be downloaded as a
zip archive
and are available in the "examples" directory of Mungojerrie.
"adverse" consists of two long connected chains, one of which is misleading for satisfying the property (
adverse.prism
+
adverse.hoa
).
"frozenSmall"/"frozenLarge" are inspired by the frozen-lake example from OpenAI Gym (
frozenSmall.prism
/
frozenLarge.prism
+
frozenSmall_avg.hoa
).
“windy”/”windyStoch” are inspired by the example “Windy Gridworld” from Reinforcement Learning: An Introduction 2nd edition by Sutton and Barto on page 130 (
windy.prism
/
windyStoch.prism
+
windy_avg.hoa
).
“grid5x5” is from Sadigh et. al’s
“A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications”
with the property to continue visiting the start and goal states while eventually always avoiding the traps (
grid5x5.prism
+
grid5x5_avg.hoa
).
“ishift” is a small example from Aidan Harding's thesis (
ishift.prism
+
ishift.hoa
).
“doublegrid” is a gridworld where a central controller controls two agents (
doublegrid.prism
+
doublegrid_avg.hoa
).
“busyRingMC2”/“busyRingMC4” are concerned with completing the strategy to a busy-ring asynchronous arbiter (
busyRingMC2.prism
+
busyRing2Comb.hoa
/
busyRingMC4.prism
+
busyRing4Comb.hoa
).