mims-harvard/TxAgent-T1-Llama-3.1-8B · Hugging Face

TxAgent-T1 Model

Model Information

TxAgent-T1-Llama-3.1-8B is instruction-tuned on the TxAgent-Instruct dataset, a diverse synthetic dataset of multi-step reasoning traces and large-scale function calls anchored in biomedical knowledge. The model is fine-tuned from Llama-3.1-8B-Instruct.

Introduction

Precision therapeutics require multimodal adaptive models that generate personalized treatment recommendations. We introduce TxAgent, an AI agent that leverages multi-step reasoning and real-time biomedical knowledge retrieval across a toolbox of 211 tools to analyze drug interactions, contraindications, and patient-specific treatment strategies.

TxAgent outperforms leading LLMs, tool-use models, and reasoning agents across five new benchmarks: DrugPC, BrandPC, GenericPC, TreatmentPC, and DescriptionPC, covering 3,168 drug reasoning tasks and 456 personalized treatment scenarios.

By integrating multi-step inference, real-time knowledge grounding, and tool-assisted decision-making, TxAgent ensures that treatment recommendations align with established clinical guidelines and real-world evidence, reducing the risk of adverse events and improving therapeutic decision-making.

Model Training and Data

To generate the TxAgent-Instruct dataset, we construct three datasets using auxiliary agent systems: a tooling dataset, a comprehensive therapeutic question dataset, and a reasoning trace dataset.

The tooling dataset consists of augmented versions of the 211 tools in ToolUniverse, where each tool's description is randomly rephrased to increase variability. This enables TxAgent to learn how to use new tools rather than simply memorizing those in ToolUniverse.

The comprehensive therapeutic question dataset includes 85,340 therapeutic questions and functional instructions designed to train TxAgent's abilities. These are generated by the QuestionGen agent system.

The reasoning trace dataset comprises 85,340 detailed reasoning traces for answering the therapeutic questions. Together, these traces contain 177,626 reasoning steps and 281,695 function calls, all generated by the TraceGen agent system.

By processing the data from these three datasets, we construct the TxAgent-Instruct dataset, which comprises 378,027 instruction-tuning data samples.

The agent systems generate training data grounded in biomedical knowledge and therapeutic scenarios by randomly sampling drug and disease information from verified sources. Drug information is sourced from FDA drug labeling documents, while the disease list is derived from PrimeKG. Associations among drugs, diseases, phenotypes, and targets are compiled from Open Targets. To avoid leakage of evaluation data into the training set, we exclude all drugs approved after 2023 from the training set and use drugs approved in 2024 as the source of evaluation data.
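
The leakage rule described above (train only on drugs approved up to 2023, evaluate on drugs approved in 2024) can be illustrated with a small sketch. This is not the authors' pipeline: the `Drug` record, the `split_by_approval_year` helper, and the placeholder drug names are all hypothetical and exist only to show the idea of a temporal split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drug:
    # Hypothetical record; the real data comes from FDA drug labeling documents.
    name: str
    approval_year: int


def split_by_approval_year(drugs, train_cutoff=2023, eval_year=2024):
    """Split drugs into a training pool and an evaluation pool by approval year.

    Drugs approved in or before `train_cutoff` go to training; drugs approved
    in `eval_year` go to evaluation. Anything else is left out of both pools.
    """
    train = [d for d in drugs if d.approval_year <= train_cutoff]
    evaluation = [d for d in drugs if d.approval_year == eval_year]
    return train, evaluation


# Placeholder entries, not real drugs.
drugs = [
    Drug("drug_a", 2019),
    Drug("drug_b", 2023),
    Drug("drug_c", 2024),
]

train, evaluation = split_by_approval_year(drugs)

# The two pools must not share any drug, otherwise evaluation data leaks.
assert not set(train) & set(evaluation)

print("train:", [d.name for d in train])
print("evaluation:", [d.name for d in evaluation])
```

Splitting on approval year rather than at random means that every evaluation drug is one the model could not have seen during instruction tuning.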

Please refer to our project page for more details.

How to use

Install ToolUniverse:

# Install from source code:
git clone https://github.com/mims-harvard/ToolUniverse.git
cd ToolUniverse
python -m pip install . --no-cache-dir

# Install from PyPI:
pip install tooluniverse

Install TxAgent:

# Install from source code:
git clone https://github.com/mims-harvard/TxAgent.git
cd TxAgent
python -m pip install . --no-cache-dir

# Install from PyPI:
pip install txagent
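
After installing, it can be useful to confirm that both packages are visible to the Python interpreter before running the examples. The sketch below uses only the standard library. It assumes the importable module names match the PyPI package names (`tooluniverse` and `txagent`); the `missing_packages` helper is hypothetical and not part of either project.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names; adjust if the installed modules are named differently.
missing = missing_packages(["tooluniverse", "txagent"])

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("ToolUniverse and TxAgent are both importable.")
```

The check only looks up the modules and does not import them, so it runs quickly and does not load any model weights.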

Run the example with run_example.py.

Run the Gradio demo with run_txagent_app.py.

Citation

@misc{gao2025txagent,
      title={TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools}, 
      author={Shanghua Gao and Richard Zhu and Zhenglun Kong and Ayush Noori and Xiaorui Su and Curtis Ginder and Theodoros Tsiligkaridis and Marinka Zitnik},
      year={2025},
      eprint={2503.10970},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.10970}, 
}

Contact

If you have any questions or suggestions, please email Shanghua Gao and Marinka Zitnik.