Mamba + Tensor Parallel Support by haileyschoelkopf · Pull Request #1184 · EleutherAI/gpt-neox (original) (raw)
Conversation
This PR adds Mamba + TP support.
Loss curves comparing TP=2 to TP=1, with and without mamba_inner_func_fusion:
For comparison, the curves from before the fix, when the all-reduce was missing with inner func fusion turned on:
Also tested that PP seems to work.
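For context on why the all-reduce matters, here is a minimal sketch of where the reduction sits when the Mamba block's output projection is sharded across tensor-parallel ranks and folded into the fused inner function: each rank then produces only a partial sum, so the result has to be all-reduced over the TP group before it is returned. The `fused_mamba_forward_tp` wrapper, `inner_fn`, and `tp_group` below are hypothetical names for illustration, not the actual gpt-neox code paths touched by this PR.

```python
import torch
import torch.distributed as dist


def fused_mamba_forward_tp(hidden_states, inner_fn, tp_group=None):
    """Hypothetical wrapper: run a fused Mamba inner function whose output
    projection is row-parallel, then all-reduce the partial results.

    Args:
        hidden_states: activations already split column-parallel upstream.
        inner_fn: callable implementing the fused scan + out_proj on this
            rank's weight shard (stand-in for mamba_inner_func_fusion).
        tp_group: tensor-parallel process group (None -> default group).
    """
    # Each TP rank computes a partial output because out_proj's weight is
    # sharded along its input dimension inside the fused kernel.
    out = inner_fn(hidden_states)

    # Without this reduction, the partial sums get used as-is, which is
    # consistent with the diverging loss curve shown above.
    if dist.is_initialized() and dist.get_world_size(tp_group) > 1:
        dist.all_reduce(out, op=dist.ReduceOp.SUM, group=tp_group)
    return out
```

In gpt-neox the reduction would presumably go through the existing row-parallel linear machinery rather than a bare `all_reduce`; the sketch only shows where in the forward pass the reduction has to happen.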
LGTM, no comments.
@haileyschoelkopf -- As a final check, I'd like to verify that TP gives the expected memory benefits. Can you link the wandb here so I can take a look?