Reinforcement learning under partial observability guided by learned environment models

Edi Muskardin, Ingo Pill, Martin Tappler, Bernhard K. Aichernig

Research output: Conference proceeding/Chapter in Book/Report/Conference Paper, peer-review

Abstract

In practical applications, we can rarely assume full observability of a system’s environment, despite such knowledge being important for determining a reactive control system’s precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities.

Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDP). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
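The core idea of the abstract, Q-learning over observations extended with an abstract state tracked on a learned model, might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `ModelTracker` class, its transition table, and all action and observation names are hypothetical, and the step that learns the model with IoAlergia is not shown (a hand-written table stands in for it).

```python
import random
from collections import defaultdict


class ModelTracker:
    """Hypothetical stand-in for a learned MDP model of the environment.

    It only tracks which abstract state the current episode has reached.
    """

    def __init__(self, transitions, initial_state):
        # transitions: (state, action, observation) -> next abstract state
        self.transitions = transitions
        self.initial_state = initial_state
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state
        return self.state

    def step(self, action, observation):
        # If the trace leaves the model, the abstract state becomes None
        # and stays None for the rest of the episode.
        if self.state is not None:
            self.state = self.transitions.get((self.state, action, observation))
        return self.state


def q_update(q, key, action, reward, next_key, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on extended observations."""
    best_next = max(q[(next_key, a)] for a in actions)
    q[(key, action)] += alpha * (reward + gamma * best_next - q[(key, action)])


def epsilon_greedy(q, key, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(key, a)])


# One illustrative step: the Q-table key is (observation, abstract state),
# so equal observations reached in different model states are kept apart.
actions = ["left", "right"]
q = defaultdict(float)
tracker = ModelTracker({("s0", "right", "wall"): "s1"}, "s0")

key = ("wall", tracker.reset())
next_key = ("wall", tracker.step("right", "wall"))
q_update(q, key, "right", 1.0, next_key, actions)
greedy_action = epsilon_greedy(q, key, actions, 0.0, random.Random(0))
```

In this sketch the agent observes "wall" both before and after moving, but the two situations receive different Q-table entries because the tracked abstract state differs. No explicit memory of past interactions is kept; the model state carries that information.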
Original language: English
Title of host publication: 18th International Conference on integrated Formal Methods
Publisher: Springer
Volume: 14300
Edition: LNCS
Publication status: Published - Sept 2023
Event: 18th International Conference on integrated Formal Methods (iFM)
Duration: 13 Nov 2023 to 15 Nov 2023
https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html

Conference

Conference: 18th International Conference on integrated Formal Methods (iFM)
Abbreviated title: iFM
Period: 13/11/23 to 15/11/23

Keywords

  • Automata Learning
  • Reinforcement Learning
  • Partial Observability
  • Markov Decision Processes
