Mamba + Tensor Parallel Support by haileyschoelkopf · Pull Request #1184 · EleutherAI/gpt-neox



Conversation


haileyschoelkopf

This PR adds tensor parallel (TP) support for Mamba.

Loss curves comparing TP=2 to TP=1, with and without mamba_inner_func_fusion:

[loss curve comparison plot]

For comparison, the curves when the all-reduce was missing and inner func fusion was turned on:

[loss curve plot with the all-reduce missing]
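The failure mode in the second plot, roughly: with mamba_inner_func_fusion enabled, the fused kernel also applies out_proj, whose weight is split along its input dimension under TP (row-parallel), so each rank produces only a partial sum and the result has to be all-reduced over the tensor-parallel group before the residual add. A minimal sketch of that pattern, with illustrative names rather than the exact code in this PR:

```python
import torch
import torch.distributed as dist

def row_parallel_out_proj(y_local: torch.Tensor,
                          out_proj_weight_shard: torch.Tensor,
                          tp_group: dist.ProcessGroup) -> torch.Tensor:
    """Row-parallel output projection of a Mamba block (illustrative sketch).

    y_local:               [batch, seq, d_inner // tp_size] activations on this rank
    out_proj_weight_shard: [d_model, d_inner // tp_size] shard of out_proj.weight
    Returns a [batch, seq, d_model] tensor that is identical on every TP rank.
    """
    # Each rank multiplies against its own slice of d_inner, giving a partial sum.
    partial = y_local @ out_proj_weight_shard.t()

    # Sum the partial results across the tensor-parallel group. Skipping this
    # all-reduce leaves every rank with only its partial sum, which is what the
    # second plot shows.
    dist.all_reduce(partial, op=dist.ReduceOp.SUM, group=tp_group)
    return partial
```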

Also tested that PP (pipeline parallelism) seems to work.

@Quentin-Anthony

LGTM, no comments.

@haileyschoelkopf -- As a final check, I'd like to verify that TP gives the expected memory benefits. Can you link the wandb here so I can take a look?
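As a rough target for what to expect: per-rank parameter memory for the TP-sharded projections should drop by about the TP degree, while replicated parameters (e.g. norm weights) stay the same. A back-of-the-envelope helper, with illustrative names and assumptions:

```python
def expected_param_bytes_per_rank(sharded_params: int,
                                  replicated_params: int,
                                  tp_size: int,
                                  bytes_per_param: int = 2) -> int:
    """Rough per-rank parameter memory under tensor parallelism (illustrative).

    sharded_params:    parameters in TP-sharded layers (e.g. Mamba's in_proj/out_proj)
    replicated_params: parameters replicated on every rank (e.g. norm weights)
    bytes_per_param:   2 for bf16/fp16 weights
    """
    return (sharded_params // tp_size + replicated_params) * bytes_per_param
```

With most parameters in sharded projections, TP=2 should land close to half the TP=1 per-rank parameter footprint; the wandb memory traces should show roughly that.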

@haileyschoelkopf

@Quentin-Anthony

