Explore policy gradient methods for optimizing agent behavior. Compare value-based vs policy-based reinforcement learning approaches.
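As a starting point for that comparison, here is a minimal sketch in Python on a hypothetical two-armed bandit. The arm means, learning rate, step counts, and function names are all assumptions chosen for illustration and are not taken from any particular library. The policy-based learner nudges softmax action preferences along a REINFORCE-style gradient estimate (without a baseline); the value-based learner estimates each arm's value and acts epsilon-greedily on those estimates.

```python
import math
import random

# Hypothetical two-armed bandit: arm 1 pays off more often than arm 0.
ARM_MEANS = [0.2, 0.8]


def pull(arm, rng):
    """Return a Bernoulli reward: 1.0 with probability ARM_MEANS[arm], else 0.0."""
    return 1.0 if rng.random() < ARM_MEANS[arm] else 0.0


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy_gradient(steps=5000, lr=0.05, seed=0):
    """Policy-based: update softmax preferences along the REINFORCE gradient."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(prefs)
        arm = 0 if rng.random() < probs[0] else 1  # sample from the policy
        reward = pull(arm, rng)
        # d/d prefs[k] of log pi(arm) is (1 if k == arm else 0) - probs[k]
        for k in range(len(prefs)):
            indicator = 1.0 if k == arm else 0.0
            prefs[k] += lr * reward * (indicator - probs[k])
    return softmax(prefs)


def train_value_based(steps=5000, lr=0.05, epsilon=0.1, seed=0):
    """Value-based: learn per-arm value estimates, act epsilon-greedily on them."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(q))  # explore
        else:
            arm = max(range(len(q)), key=lambda k: q[k])  # exploit
        reward = pull(arm, rng)
        q[arm] += lr * (reward - q[arm])  # move the estimate toward the reward
    return q


if __name__ == "__main__":
    print("policy-based action probabilities:", train_policy_gradient())
    print("value-based value estimates:", train_value_based())
```

Note what each function returns: train_policy_gradient() yields action probabilities directly, while train_value_based() yields value estimates from which a policy has to be derived (here, by taking the highest-valued arm). That difference in what is learned is the core of the contrast between the two families.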