[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens by sfc-gh-zhwang · Pull Request #17033 · vllm-project/vllm

[2025-04-24T05:49:22Z] FAILED spec_decode/e2e/test_multistep_correctness.py::test_spec_decode_e2e_greedy_correctness_with_preemption[1-4-256-test_llm_kwargs1-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] - AssertionError: function <function test_spec_decode_e2e_greedy_correctness_with_preemption at 0x7f4d58803880> failed when called with args () and kwargs {'vllm_runner': <class 'tests.conftest.VllmRunner'>, 'common_llm_kwargs': {'block_size': 8, 'num_gpu_blocks_override': 34, 'max_model_len': 272, 'enforce_eager': True}, 'per_test_common_llm_kwargs': {'model_name': 'JackFram/llama-160m'}, 'baseline_llm_kwargs': {}, 'test_llm_kwargs': {'speculative_config': {'model': 'JackFram/llama-68m', 'num_speculative_tokens': 5}, 'enable_chunked_prefill': True, 'max_num_batched_tokens': 4, 'max_num_seqs': 4}, 'batch_size': 4, 'output_len': 256, 'seed': 1}
[2025-04-24T05:49:22Z] FAILED spec_decode/e2e/test_multistep_correctness.py::test_spec_decode_different_block_size[1-32-2-test_llm_kwargs1-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] - AssertionError: function <function test_spec_decode_different_block_size at 0x7f4d588039c0> failed when called with args () and kwargs {'vllm_runner': <class 'tests.conftest.VllmRunner'>, 'common_llm_kwargs': {'model_name': 'JackFram/llama-160m', 'enforce_eager': True}, 'per_test_common_llm_kwargs': {'block_size': 8}, 'baseline_llm_kwargs': {}, 'test_llm_kwargs': {'speculative_config': {'model': 'JackFram/llama-68m', 'num_speculative_tokens': 5}, 'enable_chunked_prefill': True, 'max_num_batched_tokens': 4, 'max_num_seqs': 4}, 'batch_size': 2, 'output_len': 32, 'seed': 1}