Abstract
In decision-making processes, an intelligent agent equipped with causal knowledge can prune its action space to avoid unnecessary exploration. The structural causal bandit framework provides guidance on how to eliminate actions that cannot maximize the reward by leveraging prior knowledge of the underlying causal structure among actions. A key assumption of this framework is that the agent has access to a fully specified causal diagram of the target system. In this paper, we extend structural causal bandits to scenarios where the agent has access only to a Markov equivalence class of causal diagrams. In such cases, the causal structure is provided to the agent in the form of a partial ancestral graph (PAG). We propose a generalized framework for identifying potentially optimal actions under this graphical representation, thereby broadening the applicability of structural causal bandits.