Abstract
In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
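The core idea the abstract describes — augmenting the RL agent's state with the abstract state of a learned environment model, instead of giving the agent explicit memory — can be illustrated roughly as follows. This is a minimal sketch, not the paper's implementation: the names `AbstractMDP`, `TMaze`, and `q_learning_with_model` are hypothetical, the deterministic tracking table merely stands in for an MDP actually learned with IoAlergia, and the environment is a toy T-maze where a cue must be remembered past an ambiguous observation.

```python
import random
from collections import defaultdict

class AbstractMDP:
    """Toy stand-in for a model learned with IoAlergia: abstract states with
    a deterministic tracking table keyed by (state, action, observation)."""
    def __init__(self, transitions, initial=0):
        self.transitions = transitions
        self.initial = initial
        self.state = initial

    def reset(self):
        self.state = self.initial
        return self.state

    def step(self, action, obs):
        # Follow the learned transition; stay in place for unseen combinations.
        self.state = self.transitions.get((self.state, action, obs), self.state)
        return self.state

class TMaze:
    """Minimal partially observable environment: a cue reveals the rewarded
    side, then the agent only sees an ambiguous 'corridor' observation."""
    actions = ['left', 'right']

    def reset(self):
        self.goal = random.choice(self.actions)
        self.t = 0
        return 'start'

    def step(self, action):
        if self.t == 0:
            self.t = 1
            return self.goal, 0.0, False      # cue observation
        if self.t == 1:
            self.t = 2
            return 'corridor', 0.0, False     # ambiguous observation
        return 'end', 1.0 if action == self.goal else 0.0, True

def q_learning_with_model(env, model, episodes=500, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning over augmented states (observation, abstract state)."""
    q = defaultdict(float)
    for _ in range(episodes):
        obs, a_state = env.reset(), model.reset()
        done = False
        while not done:
            s = (obs, a_state)
            if random.random() < eps:
                act = random.choice(env.actions)
            else:
                act = max(env.actions, key=lambda a: q[(s, a)])
            obs, reward, done = env.step(act)
            a_state = model.step(act, obs)    # track the abstract model state
            s_next = (obs, a_state)
            target = reward if done else reward + gamma * max(
                q[(s_next, a)] for a in env.actions)
            q[(s, act)] += alpha * (target - q[(s, act)])
    return q
```

Because the model's abstract state differs depending on which cue was observed, the Q-table can distinguish the two otherwise identical 'corridor' situations — which is exactly the kind of ambiguity that plain memoryless Q-learning cannot resolve.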
| Original language | English |
| --- | --- |
| Title | 18th International Conference on integrated Formal Methods |
| Publisher | Springer |
| Volume | 14300 |
| Edition | LNCS |
| Publication status | Published - Sep. 2023 |
| Event | 18th International Conference on integrated Formal Methods (iFM) - Duration: 13 Nov. 2023 → 15 Nov. 2023, https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html |
Conference

| Conference | 18th International Conference on integrated Formal Methods (iFM) |
| --- | --- |
| Short title | iFM |
| Period | 13/11/23 → 15/11/23 |
| Web address | |