DeepSpeed Megatron Distributed Training