# Example Run Script — TensorRT-LLM
To build and run the AutoDeploy example, use the `examples/auto_deploy/build_and_run_ad.py` script:

```bash
cd examples/auto_deploy
python build_and_run_ad.py --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```
You can configure your experiment with various options. Use the `-h`/`--help` flag to see the available options:

```bash
python build_and_run_ad.py --help
```
For a list of common configuration options, their default values, and additional settings, refer to the `ExperimentConfig` class in `examples/auto_deploy/build_and_run_ad.py`.
The following is a more complete example of using the script:

```bash
cd examples/auto_deploy
python build_and_run_ad.py \
  --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
  --args.world-size 2 \
  --args.runtime "demollm" \
  --args.compile-backend "torch-compile" \
  --args.attn-backend "flashinfer" \
  --benchmark.enabled True
```
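The dotted flags above (`--args.world-size`, `--benchmark.enabled`) follow a common nested-configuration CLI pattern: each dot segment selects a sub-section of the config. The sketch below is purely illustrative — it is not the script's actual parser, and `parse_dotted_args` is a hypothetical helper — but it shows how such flags can be folded into a nested dictionary:

```python
def parse_dotted_args(argv):
    """Fold alternating "--a.b value" pairs into a nested dict.

    Illustrative only; the real script uses its own config machinery.
    """
    config = {}
    i = 0
    while i < len(argv):
        key = argv[i].lstrip("-")      # "args.world-size" -> "args.world-size"
        value = argv[i + 1]            # values kept as strings for simplicity
        parts = key.split(".")
        node = config
        for part in parts[:-1]:        # walk/create intermediate sections
            node = node.setdefault(part, {})
        node[parts[-1]] = value        # assign the leaf value
        i += 2
    return config

cfg = parse_dotted_args([
    "--model", "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "--args.world-size", "2",
    "--benchmark.enabled", "True",
])
```

Here `cfg["args"]["world-size"]` holds `"2"` and `cfg["benchmark"]["enabled"]` holds `"True"`, mirroring how the dotted flags group related settings under `args` and `benchmark`.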