Full Description
This thesis introduces Auto-BENEDICT, a novel, fully automated methodology for generating human-comprehensible causal explanations for model-free Reinforcement Learning (RL) agents. The system addresses the trade-off between high performance and transparency in RL by combining Bayesian Networks for causal inference with Recurrent Neural Networks that forecast future states and actions. On this basis, the method answers both "Why" and "Why not" questions, improving interpretability and user trust. The work also introduces enhanced importance metrics, both Q-value-based and graph-based, for detecting distal information: critical sequences of states or actions that are key to solving a task. Fusing these metrics with the causal explanation framework yields Auto-BENEDICT, which not only explains an agent's behaviour but also automatically recognizes high-risk or critical states. Validation through computational experiments and a human evaluation study shows that Auto-BENEDICT significantly outperforms traditional methods in comprehensibility and trustworthiness, marking a substantial advance in Explainable Reinforcement Learning (XRL).
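To make the causal-inference component concrete, the sketch below shows how a contrastive "Why"/"Why not" query over a learned Bayesian Network could be posed, here using the pgmpy library. The toy network structure, the variable names (obstacle, action, success), and all probabilities are illustrative assumptions, not the models learned in the thesis.

```python
# Minimal sketch: contrastive "Why / Why not" queries over a toy Bayesian
# Network with pgmpy. Structure, names, and probabilities are illustrative
# assumptions, not the networks learned in the thesis.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Toy causal structure: a state feature influences the chosen action,
# and both influence the outcome.
model = BayesianNetwork([("obstacle", "action"),
                         ("action", "success"),
                         ("obstacle", "success")])

model.add_cpds(
    TabularCPD("obstacle", 2, [[0.7], [0.3]]),   # P(obstacle)
    TabularCPD("action", 2,                      # P(action | obstacle)
               [[0.9, 0.2],    # action=0 ("go")
                [0.1, 0.8]],   # action=1 ("turn")
               evidence=["obstacle"], evidence_card=[2]),
    TabularCPD("success", 2,                     # P(success | action, obstacle)
               [[0.1, 0.3, 0.8, 0.2],    # success=0
                [0.9, 0.7, 0.2, 0.8]],   # success=1
               evidence=["action", "obstacle"], evidence_card=[2, 2]),
)
assert model.check_model()

infer = VariableElimination(model)

# "Why turn?" / "Why not go?": compare the outcome under the taken action
# with the outcome under the untaken alternative, given the observed state.
taken = infer.query(["success"], evidence={"obstacle": 1, "action": 1})
alt = infer.query(["success"], evidence={"obstacle": 1, "action": 0})
print("P(success | obstacle, turn):", taken.values[1])
print("P(success | obstacle, go):  ", alt.values[1])
```

The contrastive reading is then direct: the action was taken because, under the observed evidence, it yields a higher probability of the desired outcome than the foil.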

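The forecasting component can be sketched in the same spirit. Below is a minimal PyTorch example of a recurrent network that predicts the next state from a window of past states; the architecture, dimensions, and synthetic training data are assumptions for illustration only, not the forecaster used in the thesis.

```python
# Minimal sketch of an RNN state forecaster in PyTorch. All dimensions and
# the synthetic training data are illustrative assumptions.
import torch
import torch.nn as nn

class StateForecaster(nn.Module):
    """GRU mapping a window of past states to a prediction of the next state."""
    def __init__(self, state_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.rnn = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (batch, time, state_dim) -> predicted next state (batch, state_dim)
        _, h = self.rnn(states)
        return self.head(h[-1])

state_dim, T, batch = 4, 10, 64
model = StateForecaster(state_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):
    traj = torch.randn(batch, T + 1, state_dim)  # placeholder trajectories
    pred = model(traj[:, :-1])                   # condition on the first T states
    loss = loss_fn(pred, traj[:, -1])            # regress the final state
    opt.zero_grad()
    loss.backward()
    opt.step()
```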

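A widely used Q-value-based importance score treats a state as critical when the gap between its best and worst action values is large, since choosing badly there is costly. The sketch below illustrates only this general idea; the thesis's enhanced metrics are not reproduced, and the Q-table and threshold are hypothetical.

```python
# Minimal sketch of a Q-value-based importance score: a state is critical
# when max_a Q(s, a) - min_a Q(s, a) is large. The Q-table and threshold
# below are hypothetical.
from typing import Dict, List, Tuple

def state_importance(q_values: Dict[str, List[float]]) -> Dict[str, float]:
    """Map each state to max_a Q(s, a) - min_a Q(s, a)."""
    return {s: max(qs) - min(qs) for s, qs in q_values.items()}

def critical_states(q_values: Dict[str, List[float]],
                    threshold: float) -> List[Tuple[str, float]]:
    """Return (state, importance) pairs above the threshold, most critical first."""
    scores = state_importance(q_values)
    return sorted(((s, v) for s, v in scores.items() if v >= threshold),
                  key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical Q-table: only "s2" has a large action-value gap.
    q = {"s0": [1.0, 1.1, 0.9], "s1": [0.5, 0.6, 0.4], "s2": [2.0, -1.5, 0.1]}
    print(critical_states(q, threshold=1.0))  # [('s2', 3.5)]
```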

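Graph-based importance can be illustrated similarly: build a state-transition graph from observed trajectories and rank states by a centrality measure, so bottleneck states that successful paths must traverse score highest. The sketch below uses networkx betweenness centrality as one plausible choice; the trajectories and the specific metric are assumptions, not the graph-based metric defined in the thesis.

```python
# Minimal sketch of a graph-based importance score: rank states of a
# transition graph by betweenness centrality. Trajectories are hypothetical.
import networkx as nx

# Hypothetical trajectories: lists of visited state identifiers.
trajectories = [
    ["start", "a", "gate", "b", "goal"],
    ["start", "c", "gate", "d", "goal"],
    ["start", "a", "gate", "d", "goal"],
]

G = nx.DiGraph()
for traj in trajectories:
    for s, s_next in zip(traj, traj[1:]):
        G.add_edge(s, s_next)

centrality = nx.betweenness_centrality(G)
for state, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{state:5s} {score:.3f}")  # "gate" ranks highest: every path uses it
```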
