What if you didn’t have any hard goals? And got rewards continually? And had stochastic actions? MDPs as utility-based problem-solving agents.
Abductive Logic Programming Agents
Making complex decisions