Make your LLM faster and cheaper. Learn about quantization (INT8/FP4), pruning, and optimized kernels like FlashAttention.
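Of the techniques named above, quantization is the easiest to see end to end. A minimal sketch of symmetric per-tensor INT8 weight quantization follows; the function names are illustrative, not from any particular library, and real deployments use per-channel scales and calibration on top of this core idea:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# restored approximates weights to within one quantization step (scale)
```

Storing `q` instead of `weights` cuts memory 4x versus FP32 (2x versus FP16), which is where the speed and cost savings come from; INT4/FP4 formats push the same trade-off further at the price of more quantization error.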