Fix minor benchmark script bugs by suryabdev · Pull Request #1822 · huggingface/smolagents

I found the following minor bugs when running the benchmark script.

'ChatMessage' object is not iterable

Running the benchmark script with the following command produces an error:

python3 ./run.py --model-id Qwen/Qwen2.5-Coder-32B-Instruct --provider together

Every answer is recorded as "'ChatMessage' object is not iterable". Entries in the output files look like this:

{"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct", "agent_action_type": "code", "question": "What year was the municipality of Ramiriqu\u00ed, Boyac\u00e1, Colombia, founded? Answer with only the final number.", "original_question": "What year was the municipality of Ramiriqu\u00ed, Boyac\u00e1, Colombia, founded?", "answer": "'ChatMessage' object is not iterable", "true_answer": "1541", "source": "SimpleQA", "intermediate_steps": [], "start_time": 1760503458.6643338, "end_time": "2025-10-15 04:44:22", "token_counts": {"input": 0, "output": 0}}

Similar to #1763, the following line causes the error and has to be updated from dict(message) to message.dict():

intermediate_steps = [dict(message) for message in agent.write_memory_to_messages()]

After that change, the output files contain the expected answers:

{"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct", "agent_action_type": "code", "question": "What is the counter strength value for the Fume Sword in Dark Souls II? Answer with only the final number.", "original_question": "What is the counter strength value for the Fume Sword in Dark Souls II?", "answer": "120", "true_answer": "120", "source": "SimpleQA", "intermediate_steps": ..., "start_time": 1760507832.4037542, "end_time": "2025-10-15 05:57:17", "token_counts": {"input_tokens": 5341, "output_tokens": 113, "total_tokens": 5454}}

ToolCallingAgent unexpected keyword argument 'additional_authorized_imports'

ToolCallingAgent does not accept additional_authorized_imports, so the argument has to be removed from the ToolCallingAgent initialization.
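As a rough illustration of this fix, the sketch below builds agents from shared keyword arguments and adds the extra argument only for the agent type that accepts it. Both classes are hypothetical stand-ins with simplified signatures, and the helper build_agent, the action type strings, and the import list are assumptions made for the example, not the benchmark's actual code.

```python
class StandInCodeAgent:
    """Hypothetical stand-in that accepts additional_authorized_imports."""

    def __init__(self, tools, model, additional_authorized_imports=None, max_steps=10):
        self.tools = tools
        self.model = model
        self.additional_authorized_imports = additional_authorized_imports or []
        self.max_steps = max_steps


class StandInToolCallingAgent:
    """Hypothetical stand-in that does NOT accept additional_authorized_imports."""

    def __init__(self, tools, model, max_steps=10):
        self.tools = tools
        self.model = model
        self.max_steps = max_steps


def build_agent(action_type, model):
    # Arguments that both agent types understand.
    common = {"tools": [], "model": model, "max_steps": 10}
    if action_type == "code":
        # Illustrative import list; only the code agent receives it.
        return StandInCodeAgent(additional_authorized_imports=["numpy"], **common)
    if action_type == "tool-calling":
        return StandInToolCallingAgent(**common)
    raise ValueError(f"Unknown action type: {action_type}")


# Passing the extra keyword to the tool-calling stand-in reproduces the error shape.
try:
    StandInToolCallingAgent(tools=[], model="m", additional_authorized_imports=["numpy"])
except TypeError as exc:
    failure = str(exc)

tool_agent = build_agent("tool-calling", model="m")
code_agent = build_agent("code", model="m")
```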

Remove default InferenceClient provider

The default provider, hf-inference, does not support all models. For example, I hit the following error with Qwen/Qwen3-Next-80B-A3B-Thinking:

Error in generating model output:\n404 Client Error: Not Found for url: https://router.huggingface.co

Removing the default provider and letting the API pick the provider is a good default behaviour.
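One way to express "no default provider" is to let the command-line option default to None and forward it only when the user sets it. The sketch below reuses the --model-id and --provider flags from the command shown earlier; the helpers parse_arguments and model_kwargs are hypothetical, and how the benchmark actually passes the value to its model class is not shown here.

```python
import argparse


def parse_arguments(argv=None):
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    parser.add_argument("--model-id", required=True, help="Model to evaluate")
    parser.add_argument(
        "--provider",
        default=None,  # no hard-coded provider; None means "let the API choose"
        help="Inference provider; omit to let the API pick one",
    )
    return parser.parse_args(argv)


def model_kwargs(args):
    # Forward the provider only when it was given explicitly.
    kwargs = {"model_id": args.model_id}
    if args.provider is not None:
        kwargs["provider"] = args.provider
    return kwargs


without_provider = model_kwargs(
    parse_arguments(["--model-id", "Qwen/Qwen3-Next-80B-A3B-Thinking"])
)
with_provider = model_kwargs(
    parse_arguments(
        ["--model-id", "Qwen/Qwen2.5-Coder-32B-Instruct", "--provider", "together"]
    )
)
```

With this shape, an explicit --provider together still works as before, while omitting the flag no longer pins requests to a provider that may not serve the model.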

Datetime import issue

When running the score.ipynb notebook, the line datetime.date.today().isoformat() raised the following error:

AttributeError: 'method_descriptor' object has no attribute 'today'

Changing the import from "from datetime import datetime" to "import datetime" fixed the issue.
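The error comes from a name clash: after "from datetime import datetime", the name datetime refers to the class, and datetime.date is then the class's date() method rather than the date type from the module. A minimal reproduction using only the standard library, with aliases added here to keep the two meanings apart in one script:

```python
import datetime as datetime_module
from datetime import datetime as datetime_class

# With the class bound to the name, `.date` is an unbound method of the
# datetime class (a method descriptor), which has no `today` attribute.
try:
    datetime_class.date.today()
except AttributeError as exc:
    failure = str(exc)

# With the module bound to the name, `.date` is the date type, so the
# original expression works and yields a YYYY-MM-DD string.
today_iso = datetime_module.date.today().isoformat()
```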