Reinforcement learning under partial observability guided by learned environment models

Edi Muskardin, Ingo Pill, Martin Tappler, Bernhard K. Aichernig

Publication: Conference proceedings / Contribution in book or report › Conference article › Peer-reviewed

Abstract

In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
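The core idea from the abstract — augmenting the agent's observation with an abstract state tracked on a learned environment model — can be illustrated with a minimal sketch. This is not the paper's implementation: the two-state environment, the hand-made transition table standing in for a model that IoAlergia would learn from episodes, and all names are hypothetical, chosen only to show why tracking a model state resolves observational ambiguity.

```python
import random
from collections import defaultdict

# Toy partially observable environment: two hidden states emit the SAME
# observation "o", so raw observations alone cannot distinguish them.
ACTIONS = ["go", "stay"]

def env_step(hidden, action):
    # "go" toggles the hidden state; "stay" keeps it. Reward is earned in s1.
    if action == "go":
        hidden = "s1" if hidden == "s0" else "s0"
    reward = 1.0 if hidden == "s1" else 0.0
    return hidden, "o", reward  # the observation is always the ambiguous "o"

# Hypothetical learned MDP transition table (a stand-in for what IoAlergia
# would learn from agent episodes); here it mirrors the true dynamics.
model_next = {("s0", "go"): "s1", ("s1", "go"): "s0",
              ("s0", "stay"): "s0", ("s1", "stay"): "s1"}

Q = defaultdict(float)
random.seed(0)
for _ in range(2000):
    hidden, model_state, obs = "s0", "s0", "o"
    for _ in range(10):
        # Augmented state: raw observation plus abstract model state.
        state = (obs, model_state)
        action = (random.choice(ACTIONS) if random.random() < 0.2
                  else max(ACTIONS, key=lambda a: Q[(state, a)]))
        hidden, obs, reward = env_step(hidden, action)
        # Track the abstract state by simulating the action on the model.
        model_state = model_next[(model_state, action)]
        nxt = (obs, model_state)
        # Standard tabular Q-learning update on the augmented state space.
        Q[(state, action)] += 0.1 * (
            reward + 0.95 * max(Q[(nxt, a)] for a in ACTIONS)
            - Q[(state, action)])
```

Although both hidden states look identical to the agent, the tracked model state splits them into two augmented states, so plain tabular Q-learning can learn to "go" in s0 (to reach the reward) and "stay" in s1 — without any recurrent network or explicit memory of past interactions.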
Original language: English
Title: 18th International Conference on integrated Formal Methods
Publisher: Springer
Volume: 14300
Edition: LNCS
Publication status: Published - Sep. 2023
Event: 18th International Conference on integrated Formal Methods (iFM)
Duration: 13 Nov. 2023 – 15 Nov. 2023
https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html

Conference

Conference: 18th International Conference on integrated Formal Methods (iFM)
Short title: iFM
Period: 13/11/23 – 15/11/23
