Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
,
Mostofa Patwary
,
Raul Puri
,
Patrick LeGresley
,
Jared Casper
,
Bryan Catanzaro
2019
arXiv (Cornell University)
815 citations