Reinforcement learning under partial observability guided by learned environment models

Edi Muskardin; Ingo Pill; Martin Tappler; Bernhard K. Aichernig

Trustworthy & Adaptive Computing (TAC)

Research output: Conference proceeding/Chapter in Book/Report › Conference Paper › peer-review

Abstract

In practical applications, we can rarely assume full observability of a system’s environment, despite such knowledge being important for determining a reactive control system’s precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities.
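
The kind of ambiguity meant here can be illustrated with a small sketch. It assumes a hypothetical five-cell corridor (not taken from the paper) whose goal lies in the middle cell; the two outer cells both emit `wall` and the two inner non-goal cells both emit `corridor`. The names `OBSERVATION`, `step` and `best_action` are illustrative only. The code groups the optimal action of each hidden cell by the observation that cell emits:

```python
from collections import defaultdict

# Hypothetical five-cell corridor with the goal in the middle (cell 2).
# Cells 0 and 4 both emit "wall"; cells 1 and 3 both emit "corridor".
OBSERVATION = {0: "wall", 1: "corridor", 2: "goal", 3: "corridor", 4: "wall"}
GOAL = 2
ACTIONS = ("left", "right")


def step(state: int, action: str) -> int:
    """Deterministic move, clipped at both ends of the corridor."""
    delta = 1 if action == "right" else -1
    return min(max(state + delta, 0), 4)


def best_action(state: int) -> str:
    """Action that brings the agent closest to the goal."""
    return min(ACTIONS, key=lambda a: abs(step(state, a) - GOAL))


# Collect the optimal actions of all non-goal hidden cells per observation.
needed = defaultdict(set)
for state, obs in OBSERVATION.items():
    if state != GOAL:
        needed[obs].add(best_action(state))

# An observation is ambiguous if the hidden cells behind it need different actions.
ambiguous = {obs for obs, acts in needed.items() if len(acts) > 1}
print(sorted(ambiguous))  # ['corridor', 'wall']
```

Because `corridor` (and likewise `wall`) stands for cells that require opposite actions, any policy that maps the current observation alone to an action fails on one side of the goal; telling the cells apart requires information about earlier interactions.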
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
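
The central idea, giving Q-learning an abstract state from a learned model alongside the raw observation, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the "learned" MDP (`LABEL`, `MODEL`) is hand-written and merely stands in for a model IoAlergia might return, the hidden environment is simulated by sampling from that same model, and all names (`track`, `train`, the states `q0`, `jL`, ...) are hypothetical. It assumes, as in the deterministic labelled MDPs that Alergia-style learners produce, that the successors of each (state, action) pair carry distinct labels, so the agent can follow the model state from its observations. The simulated experiences mentioned above and any fallback for unseen observations are not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical learned MDP (a stand-in for an IoAlergia result). Each model
# state carries an observation label; successors of a (state, action) pair
# have distinct labels, so (state, action, observation) fixes the next state.
LABEL = {"q0": "start", "qL": "cue_left", "qR": "cue_right",
         "jL": "junction", "jR": "junction", "win": "goal", "lose": "fail"}
MODEL = {  # (model state, action) -> {successor: probability}
    ("q0", "fwd"): {"qL": 0.5, "qR": 0.5},
    ("qL", "fwd"): {"jL": 1.0},
    ("qR", "fwd"): {"jR": 1.0},
    ("jL", "left"): {"win": 1.0}, ("jL", "right"): {"lose": 1.0},
    ("jR", "left"): {"lose": 1.0}, ("jR", "right"): {"win": 1.0},
}
TERMINAL = {"goal", "fail"}


def available(state):
    """Actions the model defines in `state`, in insertion order."""
    return [a for (s, a) in MODEL if s == state]


def sample(state, action, rng):
    """Draw a successor; used here only to simulate the hidden environment."""
    succs = MODEL[(state, action)]
    return rng.choices(list(succs), weights=list(succs.values()))[0]


def track(state, action, obs):
    """Follow the successor of `state` under `action` whose label is `obs`."""
    for succ in MODEL.get((state, action), {}):
        if LABEL[succ] == obs:
            return succ
    return None  # unseen behaviour; a real agent would need a fallback here


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning over (model state, observation) keys."""
    rng = random.Random(seed)
    q = defaultdict(float)  # ((model state, observation), action) -> value
    for _ in range(episodes):
        hidden = model_state = "q0"  # the agent never reads `hidden` directly
        obs = LABEL[hidden]
        while obs not in TERMINAL:
            ctx = (model_state, obs)  # observation augmented with abstract state
            actions = available(model_state)
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(ctx, a)])
            hidden = sample(hidden, action, rng)
            obs = LABEL[hidden]
            reward = 1.0 if obs == "goal" else 0.0
            model_state = track(model_state, action, obs)
            if obs in TERMINAL:
                target = reward
            else:
                nxt = (model_state, obs)
                target = reward + gamma * max(q[(nxt, a)] for a in available(model_state))
            q[(ctx, action)] += alpha * (target - q[(ctx, action)])
    return q


q = train()
for state in ("jL", "jR"):
    ctx = (state, "junction")
    print(state, max(("left", "right"), key=lambda a: q[(ctx, a)]))
```

The cue is observed only once, yet at the shared `junction` observation the tracked model states `jL` and `jR` differ, so the Q-table can hold opposite preferred actions for them; a table keyed on the observation `junction` alone could not.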

Original language	English
Title of host publication	18th International Conference on integrated Formal Methods
Publisher	Springer
Volume	14300
Edition	LNCS
Publication status	Published - Sept 2023
Event	18th International Conference on integrated Formal Methods (iFM) - Duration: 13 Nov 2023 → 15 Nov 2023 https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html

Conference

Conference	18th International Conference on integrated Formal Methods (iFM)
Abbreviated title	iFM
Period	13/11/23 → 15/11/23
Internet address	https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html

Keywords

Automata Learning
Reinforcement Learning
Partial Observability
Markov Decision Processes

Access to Document

https://link.springer.com/chapter/10.1007/978-3-031-47705-8_14

Cite this

@inproceedings{9afa8bf7001348dfa369d787869b5d3e,

title = "Reinforcement learning under partial observability guided by learned environment models",

abstract = "In practical applications, we can rarely assume full observability of a system{\textquoteright}s environment, despite such knowledge being important for determining a reactive control system{\textquoteright}s precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities. Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.",

keywords = "Automata Learning, Reinforcement Learning, Partial Observability, Markov Decision Processes",

author = "Edi Muskardin and Ingo Pill and Martin Tappler and Aichernig, {Bernhard K.}",

year = "2023",

month = sep,

language = "English",

volume = "14300",

booktitle = "18th International Conference on integrated Formal Methods",

publisher = "Springer",

address = "Germany",

series = "Lecture Notes in Computer Science",

note = "18th International Conference on integrated Formal Methods (iFM), iFM ; Conference date: 13-11-2023 Through 15-11-2023",

url = "https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html",

}

Muskardin, E, Pill, I, Tappler, M & Aichernig, BK 2023, Reinforcement learning under partial observability guided by learned environment models. in 18th International Conference on integrated Formal Methods. LNCS edn, vol. 14300, Springer, 18th International Conference on integrated Formal Methods (iFM), 13/11/23. <https://link.springer.com/chapter/10.1007/978-3-031-47705-8_14>

TY - GEN

T1 - Reinforcement learning under partial observability guided by learned environment models

AU - Muskardin, Edi

AU - Pill, Ingo

AU - Tappler, Martin

AU - Aichernig, Bernhard K.

PY - 2023/9

Y1 - 2023/9

N2 - In practical applications, we can rarely assume full observability of a system’s environment, despite such knowledge being important for determining a reactive control system’s precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities. Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.

AB - In practical applications, we can rarely assume full observability of a system’s environment, despite such knowledge being important for determining a reactive control system’s precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities. Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.

KW - Automata Learning

KW - Reinforcement Learning

KW - Partial Observability

KW - Markov Decision Processes

M3 - Conference Paper

VL - 14300

BT - 18th International Conference on integrated Formal Methods

PB - Springer

T2 - 18th International Conference on integrated Formal Methods (iFM)

Y2 - 13 November 2023 through 15 November 2023

ER -
