Understand why Layer Normalization is critical for training deep Transformers. Learn how it differs from Batch Normalization.
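As a rough illustration of that difference, here is a minimal NumPy sketch, assuming a 2-D input of shape (batch, features). The function names `layer_norm` and `batch_norm` are illustrative and not taken from any library, and the learnable scale and shift parameters, as well as Batch Normalization's running statistics, are left out for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Layer Normalization: statistics are computed per example,
    # across the feature axis, so each row is normalized on its own.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Batch Normalization (training-mode statistics only): statistics are
    # computed per feature, across the batch axis, so each column is normalized.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))  # (batch, features)

ln = layer_norm(x)  # every row has approximately zero mean
bn = batch_norm(x)  # every column has approximately zero mean

# Layer Normalization gives the same result for an example regardless of
# what else is in the batch; Batch Normalization does not.
single_ln = layer_norm(x[:1])
single_bn = batch_norm(x[:1])
```

In this sketch, `single_ln` matches the first row of `ln`, while `single_bn` does not match the first row of `bn`, because the Batch Normalization statistics change with the batch contents.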