[Perf] API-server scaleout with many-to-many server-engine comms by njhill · Pull Request #17546 · vllm-project/vllm
added 19 commits
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
…-engines
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
vllm/v1/engine/core_client.py
vllm/v1/utils.py
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
vllm/config.py
vllm/engine/arg_utils.py
vllm/v1/engine/core.py
vllm/v1/engine/core_client.py
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
vllm/v1/engine/core.py
vllm/v1/engine/core_client.py
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
…-engines
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
vllm/config.py
vllm/v1/engine/core.py
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
vllm/v1/engine/core_client.py
vllm/v1/utils.py
…-engines
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
vllm/v1/engine/core.py
vllm/v1/engine/core_client.py
Signed-off-by: Nick Hill <nhill@redhat.com>
Avoid exception, but more work is still needed to make this functional with multiple API server processes.
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Conflicts:
tests/v1/engine/test_engine_core.py
vllm/v1/engine/core.py
vllm/v1/engine/core_client.py
njhill added a commit to njhill/vllm that referenced this pull request
Introduced in vllm-project#17546. We should only call mark_process_dead when we're using Prometheus multiprocessing mode (with more than one API server).
Signed-off-by: Nick Hill <nhill@redhat.com>
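The guard described in this commit message could look roughly like the sketch below. This is not the code from the referenced commit: the helper name `maybe_mark_process_dead` and the use of the `PROMETHEUS_MULTIPROC_DIR` environment variable as the signal for multiprocessing mode are assumptions made for illustration. The cleanup function is passed in as a parameter so the sketch runs without `prometheus_client` installed.

```python
import os
from typing import Callable, Mapping, Optional

# Environment variable that prometheus_client reads to enable multiprocess mode.
# Using it as the "are we in multiprocessing mode?" check is an assumption here;
# the PR may detect this differently (e.g. from the API server count).
MULTIPROC_ENV_VAR = "PROMETHEUS_MULTIPROC_DIR"


def maybe_mark_process_dead(
    pid: int,
    mark_dead: Callable[[int], None],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Call mark_dead(pid) only when Prometheus multiprocessing mode is active.

    Hypothetical helper: returns True if the cleanup ran, False if it was skipped.
    """
    env = os.environ if environ is None else environ
    if not env.get(MULTIPROC_ENV_VAR):
        # Single API server: no per-process metric files exist to clean up.
        return False
    mark_dead(pid)
    return True


if __name__ == "__main__":
    cleaned = []
    # Single-server case: the cleanup callable is never invoked.
    maybe_mark_process_dead(1234, cleaned.append, environ={})
    # Multiprocess case: the cleanup callable receives the dead process's pid.
    maybe_mark_process_dead(
        1234, cleaned.append, environ={MULTIPROC_ENV_VAR: "/tmp/prom"}
    )
    print(cleaned)
```

In a real deployment the callable would presumably be `prometheus_client.multiprocess.mark_process_dead`, invoked once an API server process has exited.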
amitm02 pushed a commit to amitm02/vllm that referenced this pull request
amitm02 pushed a commit to amitm02/vllm that referenced this pull request
0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request