Master Q-Learning, one of the most widely used off-policy temporal-difference (TD) control algorithms. Learn how the max operator in its update rule lets an agent learn the optimal policy independently of the behavior policy it follows while exploring.
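
As a quick illustration of the role the max operator plays, here is a minimal sketch of a single tabular Q-learning update. It assumes states and actions are small integers and that `Q` is a list of per-state action-value lists; the function name `q_learning_update` and the default values for `alpha` (step size) and `gamma` (discount factor) are illustrative choices, not something this text specifies.

```python
def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.5, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the new value.

    Q is assumed to be a list of lists: Q[state][action] -> estimated value.
    """
    # Bootstrap from the greedy (max) value of the next state. This is what
    # makes the update off-policy: the target does not depend on the action
    # the behavior policy actually takes in next_state.
    best_next = 0.0 if terminal else max(Q[next_state])
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the TD target.
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Usage: two states, two actions, values initialised by hand for the example.
Q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_learning_update(Q, state=0, action=1, reward=1.0, next_state=1)
print(new_value)
```

Whatever exploratory action the agent takes next (for example under an epsilon-greedy behavior policy), the target above still uses the best available action value in `next_state`, which is why the learned values track the greedy policy rather than the behavior policy.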