Deep dive into the Transformer architecture. Learn about encoder and decoder stacks and the multi-head attention mechanism that powers models like BERT.
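To make the central idea concrete, here is a minimal sketch of multi-head self-attention in NumPy. The shapes, weight names (`w_q`, `w_k`, `w_v`, `w_o`), and the batch-free single-sequence layout are illustrative assumptions, not the internals of any particular model like BERT.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head self-attention for one sequence (no batch dimension).

    x: (seq_len, d_model); each weight matrix: (d_model, d_model).
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    heads = weights @ v                  # (num_heads, seq_len, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 16, 5, 4
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.1
                      for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (5, 16): same sequence length, same model width
```

Splitting `d_model` into `num_heads` subspaces lets each head attend to different relationships in the sequence; an encoder applies this to the input, while a decoder additionally attends over the encoder's output.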