Research

ECAI 2024, Santiago de Compostela, Spain

I study reinforcement learning with deep neural networks, focusing on transformer-based agents, representation learning, and interpretability. My work is part of the 3AI Project between hessian.AI and the Artificial Intelligence and Machine Learning Lab at TU Darmstadt.

Research Topics

A snapshot of the themes that guide my current work.

Representation Learning

Learning compact and semantically meaningful state representations that make decision-making more data-efficient and robust.

Transformer-based Agents

Exploring attention-based architectures for long-horizon reasoning and generalization in RL.

Interpretability

Understanding why agents act the way they do by emphasizing object-centric views and causal cues.

Robustness

Designing agents that remain reliable under distribution shifts and noisy, imperfect environments.

Publications

Grouped by venue type for quick scanning.

Journals

  • Jannis Blüml*, Cedric Derstroff*, Bjarne Gregori, Elisabeth Dillies, Quentin Delfosse, Kristian Kersting (2026): Do Object Channels Improve Robustness in Deep Reinforcement Learning? Transactions on Machine Learning Research (TMLR). More Information
  • F. Helfenstein*, J. Czech*, J. Blüml*, M. Eisel, K. Kersting (2026): Checkmating One, by Using Many: Combining Mixture of Experts With MCTS to Improve in Chess. IEEE Transactions on Games.
  • Quentin Delfosse*, Jannis Blüml*, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting (2024): OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments. Reinforcement Learning Journal Vol. 1 (RLJ). More Information
  • Jannis Blüml, Johannes Czech, Kristian Kersting (2023): AlphaZe**: AlphaZero-like baselines for imperfect information games are surprisingly strong. Frontiers in Artificial Intelligence 6. More Information

Conferences

  • Jannis Blüml, Moritz Huppert, Nora Khayata, Joachim Schmidt, Thomas Schneider (2026): SING: Improving the Efficiency of MPC Protocol Assignment using Graph Neural Networks. In Proceedings of the 2026 USENIX Security Symposium.
  • Johannes Czech, Jannis Blüml, Kristian Kersting, Hedinn Steingrimsson (2024): Representation Matters for Mastering Chess: Improved Feature Representation in AlphaZero Outperforms Changing to Transformers. In Proceedings of the 27th European Conference on Artificial Intelligence (ECAI). More Information

Workshops

  • Raban Emunds, Jannis Blüml, Quentin Delfosse, Kristian Kersting (2025): Interpretable Reinforcement Learning via Meta-Policy Guidance. At 17th European Workshop on Reinforcement Learning (EWRL 2025).
  • Elisabeth Dillies*, Quentin Delfosse*, Jannis Blüml, Raban Emunds, Florian Peter Busch, Kristian Kersting (2025): Better Decisions through the Right Causal World Model. At RLDM 2025. More Information
  • Jannis Blüml, Cedric Derstroff, Elisabeth Dillies, Quentin Delfosse, Kristian Kersting (2025): Balancing Abstraction and Spatial Relationships for Robust Reinforcement Learning. At RLDM 2025.
  • Cedric Derstroff, Jannis Brugger, Jannis Blüml, Mira Mezini, Stefan Kramer, Kristian Kersting (2025): Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown. At RLDM 2025.
  • Timo Kaufmann*, Jannis Blüml*, Antonia Wüst*, Quentin Delfosse*, Kristian Kersting, Eyke Hüllermeier (2024): OCALM: Object-Centric Assessment with Language Models. At RLC 2024 Workshop on Reinforcement Learning Beyond Rewards. More Information
  • Quentin Delfosse*, Jannis Blüml*, Bjarne Gregori, Kristian Kersting (2024): HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning. At RLC 2024 Workshop on Interpretable Policies in Reinforcement Learning. More Information

Preprints

  • Can Cömer, Jannis Blüml, Cedric Derstroff, Kristian Kersting (2025): Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions. arXiv preprint arXiv:2502.06900
  • Yannik Keller, Jannis Blüml, Gopika Sudhakaran, Kristian Kersting (2023): From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex? arXiv preprint arXiv:2311.13414

Reviewing

Serving as a reviewer for RLC, AAAI, IEEE ToG, and NeurIPS.

Projects

Selected ongoing work and recently completed projects.

Object-centric Representation for Interpretability and Robustness

The Arcade Learning Environment (ALE) is widely used to train deep RL agents. Object-centric environments help to build interpretable agents and to compare against object-aware baselines. The results can be found here. We also develop new methods that adapt environments to search for misalignment and shortcut learning, enabling more robust agents while focusing on object-centric input representations.

Feature Representation in AlphaZero

While transformers have gained a reputation as the "Swiss army knife of AI", no one has challenged them to master chess, one of the classic AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero is not enough to master the game, mainly because ViTs are too slow. Even making them more efficient by combining MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, which yields a boost of up to 180 Elo points over AlphaZero. The results can be found here. Furthermore, we investigated how information in games such as Stratego (information sets) or Hex (graph-based inputs) can be represented so that methods like AlphaZero can be applied.