Learn value-based reinforcement learning. Compare Q-Learning and SARSA algorithms, and understand off-policy vs. on-policy learning for solving MDPs.
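As a rough illustration of the off-policy vs. on-policy distinction, the two update rules can be sketched side by side. This is a minimal sketch, not a full agent: the function names, the list-of-lists Q-table, and the default values for the step size `alpha` and discount `gamma` are all assumptions made for this example.

```python
# Minimal tabular sketch. Q is a list of per-state lists of action values.
# alpha (step size) and gamma (discount) defaults are illustrative assumptions.

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy: bootstrap from the greedy (max) action value in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: bootstrap from the action a_next actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def fresh_q():
    # Hypothetical 2-state, 2-action table: state 1 has values 1.0 and 3.0.
    return [[0.0, 0.0], [1.0, 3.0]]


# Same transition (s=0, a=0, r=1.0, s_next=1) fed to both rules.
q_off = fresh_q()
q_learning_update(q_off, s=0, a=0, r=1.0, s_next=1)

q_on = fresh_q()
# Suppose the behavior policy explored and picked the non-greedy action 0.
sarsa_update(q_on, s=0, a=0, r=1.0, s_next=1, a_next=0)

print("Q-learning:", q_off[0][0])
print("SARSA:     ", q_on[0][0])
```

On this single transition the Q-learning target is 1 + 0.9 × 3 = 3.7, because it uses the best action in the next state, while the SARSA target is 1 + 0.9 × 1 = 1.9, because it uses the action the policy actually took. The two rules coincide whenever the next action happens to be the greedy one.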