Abstract
In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. Instead, we provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
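The core idea can be illustrated with a small, hypothetical sketch (this is not the authors' implementation; the IoAlergia learning step is replaced by a hand-written stand-in model): a tabular Q-learning agent whose state is the pair of the current raw observation and the state of a learned MDP model that is tracked alongside the real environment. The toy T-maze-style environment, the model's transitions, and all names and hyperparameters below are invented for illustration.

```python
# Hypothetical sketch, not the paper's implementation: Q-learning over
# (observation, tracked-model-state) pairs. The "learned" MDP is written
# by hand here as a stand-in for a model produced by IoAlergia.
import random
from collections import defaultdict


class LearnedMDP:
    """Tracks an abstract model state along the agent's real trace."""

    def __init__(self, transitions, initial_state):
        self.transitions = transitions  # (state, action, obs) -> next state
        self.initial_state = initial_state
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, action, observation):
        # Stay in place if the trace is not explained by the model
        # (a design choice for this sketch).
        self.state = self.transitions.get(
            (self.state, action, observation), self.state)
        return self.state


class TMazeEnv:
    """Toy partially observable task: a cue ('L'/'R') is shown once,
    then hidden; the final action must match the cue."""

    actions = ['left', 'right']

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.cue = self.rng.choice(['L', 'R'])
        self.t = 0
        return 'start'

    def step(self, action):
        self.t += 1
        if self.t == 1:
            return self.cue, 0.0, False      # cue revealed once
        if self.t == 2:
            return 'choice', 0.0, False      # cue now hidden
        correct = (self.cue == 'L') == (action == 'left')
        return 'end', (1.0 if correct else 0.0), True


def q_learning(env, model, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    rng = random.Random(42)
    q = defaultdict(float)  # keys: ((obs, model_state), action)
    for _ in range(episodes):
        obs, done = env.reset(), False
        model.reset()
        s = (obs, model.state)
        while not done:
            if rng.random() < eps:
                a = rng.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: q[(s, x)])
            obs, reward, done = env.step(a)
            s_next = (obs, model.step(a, obs))  # augment with model state
            target = reward if done else reward + gamma * max(
                q[(s_next, x)] for x in env.actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q


def greedy_success(env, model, q, trials=100):
    wins = 0.0
    for _ in range(trials):
        obs, done = env.reset(), False
        model.reset()
        s = (obs, model.state)
        while not done:
            a = max(env.actions, key=lambda x: q[(s, x)])
            obs, r, done = env.step(a)
            s = (obs, model.step(a, obs))
        wins += r
    return wins / trials


# Hand-written stand-in for a model learned by IoAlergia from episodes.
transitions = {}
for a in TMazeEnv.actions:
    transitions[('q0', a, 'L')] = 'qL'
    transitions[('q0', a, 'R')] = 'qR'
    transitions[('qL', a, 'choice')] = 'qLc'
    transitions[('qR', a, 'choice')] = 'qRc'
    transitions[('qLc', a, 'end')] = 'qT'
    transitions[('qRc', a, 'end')] = 'qT'

env = TMazeEnv(seed=1)
model = LearnedMDP(transitions, 'q0')
q = q_learning(env, model)
```

Because the ambiguous observation `'choice'` is paired with distinct model states `qLc`/`qRc`, the tabular agent can act on the hidden cue without any explicit memory of past interactions.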
| Original language | English |
| --- | --- |
| Title of host publication | 18th International Conference on integrated Formal Methods |
| Publisher | Springer |
| Volume | 14300 |
| Edition | LNCS |
| Publication status | Published - Sept 2023 |
| Event | 18th International Conference on integrated Formal Methods (iFM), 13 Nov 2023 → 15 Nov 2023, https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html |
Conference

| Conference | 18th International Conference on integrated Formal Methods (iFM) |
| --- | --- |
| Abbreviated title | iFM |
| Period | 13/11/23 → 15/11/23 |
| Internet address | https://liacs.leidenuniv.nl/~bonsanguemm/ifm23/index.html |
Keywords
- Automata Learning
- Reinforcement Learning
- Partial Observability
- Markov Decision Processes