[Tensor Parallelism] Megatron-LM to transformers · Issue #10321 · huggingface/transformers
🚀 Feature request
Splitting out the discussion that started here: #10301 (comment) to track the potential future feature of Tensor Parallelism (Horizontal Model Parallelism) in `transformers` - for bigger context please see the Parallelism notes.
Let's start with important clarification: MP can mean many different things
- Vertical MP - slice the layers vertically - one or more full layers are placed on each gpu - in which case Vertical MP is a simple version of PP with chunks=1
- Horizontal MP - slice the layers horizontally - place a slice of the full model on each gpu - example: Megatron-LM (see the sketch after this list)
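
To make the Horizontal MP idea concrete, here is a minimal toy sketch (not the Megatron-LM implementation) of sharding a single linear layer column-wise across workers. In a real setup each shard would live on its own gpu and an all-gather would collect the partial outputs; here everything runs sequentially on one device just to show the math works out.

```python
import torch
import torch.nn as nn

hidden, out_features, world_size = 8, 16, 2

full = nn.Linear(hidden, out_features, bias=False)
x = torch.randn(4, hidden)

# shard the weight along the output dimension ("columns" of the output)
shards = full.weight.chunk(world_size, dim=0)

partial_outputs = []
for rank, w in enumerate(shards):
    # each "gpu" holds only its (out_features // world_size, hidden) slice
    partial_outputs.append(x @ w.t())

y_parallel = torch.cat(partial_outputs, dim=-1)  # the all-gather step
assert torch.allclose(y_parallel, full(x), atol=1e-6)
```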
At the moment I think it's only Megatron-LM that implements Horizontal MP. @anton-l has ported that model to `transformers`, except the Horizontal MP parts, since currently `transformers` doesn't yet have support for it. There is already naive Vertical MP in t5 and gpt2 thanks to @alexorona's work (see the rough sketch below), I ported Bart too but it's unmerged, and there is an ongoing effort to figure out how to implement the Pipeline. All of these will have to cooperate with each other and also share common tools.
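
For reference, here is a rough sketch of how the existing naive Vertical MP in gpt2 is used (the `parallelize()`/`deparallelize()` API); the particular device map below is just an illustrative split for a 2-gpu machine, not a recommendation.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")  # 12 transformer blocks
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# whole layers are assigned to gpus - this is the "vertical" slicing
device_map = {
    0: list(range(0, 6)),   # blocks 0-5 on gpu 0
    1: list(range(6, 12)),  # blocks 6-11 on gpu 1
}
model.parallelize(device_map)

inputs = tokenizer("Model parallelism lets us", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0]))

model.deparallelize()  # move everything back to a single device
```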
@anton-l started sharing what needs to be done to make that important feature available - and then down the road potentially make it available to other (all?) `transformers` models.
@anton-l, the floor is yours.