# Example Run Script — TensorRT-LLM
To build and run the AutoDeploy example, use the `examples/auto_deploy/build_and_run_ad.py` script:

```bash
cd examples/auto_deploy
python build_and_run_ad.py --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```
You can configure your experiment with various options. Use the `-h`/`--help` flag to see the available options:

```bash
python build_and_run_ad.py --help
```
For a list of common configuration options, their default values, and additional settings, refer to the `ExperimentConfig` class in `examples/auto_deploy/build_and_run_ad.py`.
The following is a more complete example of using the script:

```bash
cd examples/auto_deploy
python build_and_run_ad.py \
  --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
  --args.world-size 2 \
  --args.runtime "demollm" \
  --args.compile-backend "torch-compile" \
  --args.attn-backend "flashinfer" \
  --benchmark.enabled True
```
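The dotted flags above (`--args.world-size`, `--benchmark.enabled`) follow a common nested-configuration CLI pattern: each dot segment selects a sub-section of the config. The sketch below is purely illustrative — it is not the script's actual parser, and `parse_dotted_args` is a hypothetical helper — but it shows how such flags can be folded into a nested dictionary:

```python
def parse_dotted_args(argv):
    """Fold alternating "--a.b value" pairs into a nested dict.

    Illustrative only; the real script uses its own config machinery.
    """
    config = {}
    i = 0
    while i < len(argv):
        key = argv[i].lstrip("-")      # "args.world-size" -> "args.world-size"
        value = argv[i + 1]            # values kept as strings for simplicity
        parts = key.split(".")
        node = config
        for part in parts[:-1]:        # walk/create intermediate sections
            node = node.setdefault(part, {})
        node[parts[-1]] = value        # assign the leaf value
        i += 2
    return config

cfg = parse_dotted_args([
    "--model", "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "--args.world-size", "2",
    "--benchmark.enabled", "True",
])
```

Here `cfg["args"]["world-size"]` holds `"2"` and `cfg["benchmark"]["enabled"]` holds `"True"`, mirroring how the dotted flags group related settings under `args` and `benchmark`.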