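Implement SARSA, an on-policy temporal-difference control algorithm. Learn how State-Action-Reward-State-Action tuples (s, a, r, s', a') drive the Q-value update, and why learning from the actions the agent actually takes, exploration included, tends to produce more cautious policies in reinforcement learning (RL) environments.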
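
As a rough illustration of the State-Action-Reward-State-Action update, here is a minimal tabular sketch in Python. The chain environment (`chain_step`), the helper names, and every hyperparameter value are assumptions made for this example only; they do not come from a specific library or from the tutorial itself.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    # Break ties randomly so early episodes do not get stuck on action 0.
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done):
    """One SARSA step: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    # On-policy: the target uses a_next, the action actually chosen in s_next.
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the rightmost state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train(n_states=5, n_actions=2, episodes=500,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular SARSA on the toy chain and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, n_actions, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = chain_step(s, a, n_states)
            # Choose the next action before updating; it is part of the tuple.
            a_next = epsilon_greedy(q, s_next, n_actions, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done)
            s, a = s_next, a_next
    return q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, [round(v, 3) for v in values])
```

The only difference from Q-learning in this sketch is the target: SARSA bootstraps from `q[s_next][a_next]`, the value of the action the epsilon-greedy policy actually selected, rather than from the maximum over actions in `s_next`.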