Fix minor benchmark script bugs by suryabdev · Pull Request #1822 · huggingface/smolagents
Found the following minor bugs when running the benchmark script
'ChatMessage' object is not iterable
Running the benchmark script with the following command produces an error:
python3 ./run.py --model-id Qwen/Qwen2.5-Coder-32B-Instruct --provider together
Every answer is "'ChatMessage' object is not iterable". The entries in the output files look like this:
{"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct", "agent_action_type": "code", "question": "What year was the municipality of Ramiriqu\u00ed, Boyac\u00e1, Colombia, founded? Answer with only the final number.", "original_question": "What year was the municipality of Ramiriqu\u00ed, Boyac\u00e1, Colombia, founded?", "answer": "'ChatMessage' object is not iterable", "true_answer": "1541", "source": "SimpleQA", "intermediate_steps": [], "start_time": 1760503458.6643338, "end_time": "2025-10-15 04:44:22", "token_counts": {"input": 0, "output": 0}}
Similar to #1763, the following line, which causes the error, has to be updated from dict(message) to message.dict():
intermediate_steps = [dict(message) for message in agent.write_memory_to_messages()]
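The failure and the fix can be reproduced without smolagents. The sketch below uses a hypothetical stand-in dataclass named ChatMessage (not the library's class) whose dict() method is assumed to behave like the message.dict() call mentioned above:

```python
from dataclasses import asdict, dataclass


@dataclass
class ChatMessage:
    """Hypothetical stand-in for illustration; not the smolagents class."""

    role: str
    content: str

    def dict(self):
        # Assumed to mirror the message.dict() call named in this PR.
        return asdict(self)


message = ChatMessage(role="assistant", content="1541")

# The built-in dict() needs a mapping or an iterable of key/value pairs,
# so passing a plain object raises TypeError.
try:
    dict(message)
    error_text = ""
except TypeError as err:
    error_text = str(err)

# The object's own conversion method returns a regular dictionary.
intermediate_step = message.dict()

print(error_text)
print(intermediate_step)
```

The caught message matches the string that ended up in the "answer" field of the output files, which suggests the script stored the exception text as the answer.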
After that change, the output files contain the expected answers:
{"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct", "agent_action_type": "code", "question": "What is the counter strength value for the Fume Sword in Dark Souls II? Answer with only the final number.", "original_question": "What is the counter strength value for the Fume Sword in Dark Souls II?", "answer": "120", "true_answer": "120", "source": "SimpleQA", "intermediate_steps": ..., "start_time": 1760507832.4037542, "end_time": "2025-10-15 05:57:17", "token_counts": {"input_tokens": 5341, "output_tokens": 113, "total_tokens": 5454}}
ToolCallingAgent unexpected keyword argument 'additional_authorized_imports'
ToolCallingAgent does not accept additional_authorized_imports, so the argument has to be removed from the ToolCallingAgent initialization.
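A minimal sketch of the problem and one way to avoid it, using hypothetical stand-in classes rather than the real smolagents agents. The build_agent helper, the "tool-calling" label, and the constructor signatures are assumptions made for illustration only:

```python
class CodeAgent:
    """Hypothetical stand-in that accepts the extra keyword."""

    def __init__(self, tools, model, additional_authorized_imports=None):
        self.tools = tools
        self.model = model
        self.additional_authorized_imports = additional_authorized_imports or []


class ToolCallingAgent:
    """Hypothetical stand-in that does not accept the extra keyword."""

    def __init__(self, tools, model):
        self.tools = tools
        self.model = model


def build_agent(action_type, tools, model):
    # Hypothetical helper: only the code agent receives the import list.
    if action_type == "code":
        return CodeAgent(
            tools=tools, model=model, additional_authorized_imports=["numpy"]
        )
    return ToolCallingAgent(tools=tools, model=model)


# Passing the keyword to a constructor that does not declare it fails.
try:
    ToolCallingAgent(tools=[], model=None, additional_authorized_imports=["numpy"])
    error_text = ""
except TypeError as err:
    error_text = str(err)

# Without the keyword, construction succeeds.
agent = build_agent("tool-calling", tools=[], model=None)
print(error_text)
```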
Remove default InferenceClient provider
The default provider hf-inference does not support all models. For example, Qwen/Qwen3-Next-80B-A3B-Thinking failed with:
Error in generating model output:\n404 Client Error: Not Found for url: https://router.huggingface.co
Removing the default provider and letting the API pick the provider is a better default behaviour.
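One way to express "no default provider" is to leave the command-line option unset and forward it only when the user supplies it. The sketch below reuses the --model-id and --provider flags from the command shown earlier; parse_arguments and client_kwargs are hypothetical helpers, and whether run.py is structured this way is an assumption:

```python
import argparse


def parse_arguments(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-id", required=True)
    # No default provider: None means the choice is left to the API.
    parser.add_argument("--provider", default=None)
    return parser.parse_args(argv)


def client_kwargs(args):
    # Hypothetical helper: forward the provider only when given explicitly.
    kwargs = {"model_id": args.model_id}
    if args.provider is not None:
        kwargs["provider"] = args.provider
    return kwargs


explicit = client_kwargs(
    parse_arguments(
        ["--model-id", "Qwen/Qwen2.5-Coder-32B-Instruct", "--provider", "together"]
    )
)
implicit = client_kwargs(
    parse_arguments(["--model-id", "Qwen/Qwen3-Next-80B-A3B-Thinking"])
)

print(explicit)
print(implicit)
```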
Datetime import issue
When running the score.ipynb notebook, the line datetime.date.today().isoformat() failed with:
AttributeError: 'method_descriptor' object has no attribute 'today'
Changing the import from "from datetime import datetime" to "import datetime" fixed the issue.
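The cause is a name clash: with "from datetime import datetime", the name datetime refers to the class, and datetime.date is then the class's date() method rather than the date type. A small reproduction, with the aliases datetime_module and datetime_class introduced here only to show both cases side by side:

```python
import datetime as datetime_module
from datetime import datetime as datetime_class

# With the class bound to the name, .date is a method of datetime objects,
# and a method has no .today attribute.
try:
    datetime_class.date.today()
    error_text = ""
except AttributeError as err:
    error_text = str(err)

# With the module bound to the name, .date is the date type, so this works.
today = datetime_module.date.today().isoformat()

print(error_text)
print(today)
```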