Smolagents Tool Development: Complete Guide to Building AI Agent Tools
Note: these articles are auto generated from my Obsidian notebook by Claude
Smolagents is Hugging Face's lightweight framework for building AI agents. This guide shows how to create tools that extend smolagents' capabilities, from simple utility functions to complex stateful operations.
Overview
Smolagents provides two approaches to tool creation, which makes it suitable for both rapid prototyping and production-grade AI development.
The framework offers two primary methods for creating tools:
- @tool decorator: Perfect for simple, single-function tools that need minimal setup
- Tool class: Ideal for complex tools requiring multiple methods, state management, or sophisticated workflows
Tools are the primary interface through which agents interact with external systems, databases, and APIs. They transform smolagents from a simple conversation framework into a powerful automation platform.
Tool Structure
Core Requirements
Every smolagents tool must include these five essential elements:
- Clear name: Descriptive function or class name that agents can understand
- Type hints: All parameters and return types must be annotated
- Comprehensive docstring: Must include a description and an Args entry for every parameter; a Returns section is strongly recommended
- Self-contained imports: Keep imports inside the function/method so the tool still works when it is serialized or shared
- String return value: Tools should return strings for agent consumption
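The docstring requirement in the list above can be checked mechanically before a function is ever handed to an agent. The sketch below uses only the standard library; `missing_arg_docs` and `example_tool` are hypothetical names, not part of smolagents, and the check assumes the Google-style `Args:` layout used throughout this guide.

```python
import inspect
import re


def missing_arg_docs(func) -> list[str]:
    """Return parameter names that have no entry in the docstring's Args section."""
    doc = inspect.getdoc(func) or ""
    # Take the text between "Args:" and "Returns:" (or the end of the docstring)
    match = re.search(r"Args:\s*\n(.*?)(?:\n\s*Returns:|\Z)", doc, re.DOTALL)
    args_block = match.group(1) if match else ""
    # Each documented argument starts a line as "name: description"
    documented = set(re.findall(r"^\s*(\w+)\s*(?:\([^)]*\))?:", args_block, re.MULTILINE))
    params = [p for p in inspect.signature(func).parameters if p != "self"]
    return [p for p in params if p not in documented]


def example_tool(query: str, limit: int = 5) -> str:
    """Search stored notes.

    Args:
        query: Text to search for
    Returns:
        Matching notes as text
    """
    return f"{query}:{limit}"


# 'limit' is in the signature but not described under Args
print(missing_arg_docs(example_tool))
```

Running a check like this in a unit test catches undocumented parameters early, which is cheaper than debugging an agent that misuses a tool.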
Docstring Format
The docstring is crucial for agent discovery and understanding. Here's the required format:
"""Brief one-line description of what the tool does

Longer description explaining when and why to use this tool.
Include any important context or limitations.

Args:
    param1: Description of first parameter
    param2: Description of second parameter (default: value)

Returns:
    Description of what the tool returns
"""
Creating Tools with @tool Decorator
Basic Structure
The @tool decorator is the simplest way to create tools for smolagents. Here's the basic pattern:
from smolagents import tool
import logging

logger = logging.getLogger(__name__)

@tool
def my_tool_name(param1: str, param2: int = 5) -> str:
    """Brief description of tool functionality

    This tool does X when given Y. It's useful for Z scenarios.

    Args:
        param1: The main input parameter
        param2: Optional configuration (default: 5)

    Returns:
        Formatted string with results
    """
    # All imports go inside the function
    import os
    import json
    from datetime import datetime

    logger.info(f"Starting my_tool with {param1}")
    try:
        # Tool implementation
        result = process_something(param1, param2)
        return f"Successfully processed: {result}"
    except Exception as e:
        logger.error(f"Error in my_tool: {e}")
        return f"Error: {str(e)}"
Real Example: URL Filtering Tool
Here's a practical example from a production codebase showing how to create a tool that filters URLs for web scraping:
@tool
def filter_urls_for_scraping(company_name: str, max_urls: int = 5) -> str:
    """Select the most relevant URLs from the stored lead report

    This tool retrieves the latest report for a company and intelligently
    filters URLs to return only those most likely to contain real information
    about the target company.

    Args:
        company_name: Name of the company to filter URLs for
        max_urls: Maximum number of URLs to return (default: 5)

    Returns:
        Formatted list of URLs with relevance scores and reasons
    """
    import os
    import json
    from datetime import datetime

    logger.info(f"🎯 Filtering URLs for {company_name} (max: {max_urls})")
    # Implementation would include:
    # 1. Load stored reports
    # 2. Score URLs based on relevance
    # 3. Format and return top URLs
    try:
        # Load data, score URLs, format results
        results = get_scored_urls(company_name, max_urls)
        return format_url_results(results)
    except Exception as e:
        logger.error(f"Failed to filter URLs: {e}")
        return f"Error filtering URLs: {str(e)}"
Creating Tools with Tool Class
When to Use Tool Class
Use the Tool class approach when you need:
- Multiple helper methods
- State management between calls
- Complex initialization or configuration
- Reusable components across methods
Basic Structure
from smolagents import Tool
import logging

logger = logging.getLogger(__name__)

class ComplexTool(Tool):
    name = "complex_tool"
    description = """
    This tool performs complex operations requiring multiple steps.
    It maintains state and can handle sophisticated workflows.
    """
    inputs = {
        "query": {
            "type": "string",
            "description": "The search query to process"
        },
        "max_results": {
            "type": "integer",
            "description": "Maximum number of results to return",
            # Arguments with a default value in forward() are marked nullable
            "nullable": True
        }
    }
    output_type = "string"

    def __init__(self):
        super().__init__()
        # Initialize any state or configuration
        self.config = self._load_config()

    def forward(self, query: str, max_results: int = 10) -> str:
        """Execute the tool's main functionality"""
        import requests
        import json

        try:
            results = self._process_query(query, max_results)
            return self._format_results(results)
        except Exception as e:
            logger.error(f"Error in {self.name}: {e}")
            return f"Error: {str(e)}"

    def _load_config(self) -> dict:
        """Helper method for loading configuration"""
        # Implementation details
        return {}

    def _process_query(self, query: str, max_results: int):
        """Helper method for processing queries"""
        # Implementation details
        pass

    def _format_results(self, results):
        """Helper method for formatting output"""
        # Implementation details
        pass
Best Practices
1. Error Handling
Always implement comprehensive error handling to ensure your tools fail gracefully:
@tool
def safe_tool(text: str) -> str:
    """Tool with proper error handling

    Args:
        text: The input to process
    """
    try:
        # Main logic (risky_operation, format_success, and SpecificError are placeholders)
        result = risky_operation(text)
        return format_success(result)
    except SpecificError as e:
        logger.warning(f"Expected error: {e}")
        return f"Could not process: {e}"
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        return f"Error: {str(e)}"
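When many functions need the same try/except layout, the pattern can be factored into a decorator. The following is a minimal sketch on plain Python functions; `errors_as_text` and `parse_port` are hypothetical names, `ValueError` stands in for the "expected" error class, and whether such a wrapper can be stacked directly under `@tool` is not verified here.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def errors_as_text(func):
    """Wrap a function so that exceptions come back as readable strings."""

    @functools.wraps(func)  # keeps the original name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ValueError as e:
            # Expected failure: bad input from the caller
            logger.warning("Expected error in %s: %s", func.__name__, e)
            return f"Could not process: {e}"
        except Exception as e:
            # Anything else is unexpected; log the traceback for debugging
            logger.exception("Unexpected error in %s", func.__name__)
            return f"Error: {e}"

    return wrapper


@errors_as_text
def parse_port(value: str) -> str:
    """Parse a TCP port number from text."""
    port = int(value)  # raises ValueError for non-numeric text
    if not 0 < port < 65536:
        raise ValueError(f"port {port} is out of range")
    return f"Port: {port}"


print(parse_port("8080"))
print(parse_port("70000"))
```

If stacking decorators turns out not to work with the framework's introspection, the same two-tier except layout can simply be written inside each tool body.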
2. Logging Best Practices
Use logging for debugging without cluttering the agent's output:
logger.info("🔍 Starting operation") # Use emojis sparingly for key events
logger.debug(f"Processing {len(items)} items") # Detailed debug info
logger.error(f"Failed to connect: {e}") # Always log errors
3. Return Value Formatting
Format outputs for optimal agent readability:
@tool
def get_data(query: str) -> str:
    """Get formatted data

    Args:
        query: The search query
    """
    data = fetch_data(query)
    # Format as readable text
    output = f"Found {len(data)} results for '{query}'\n"
    output += "=" * 50 + "\n\n"
    for i, item in enumerate(data[:5], 1):
        output += f"{i}. {item['title']}\n"
        output += f"   URL: {item['url']}\n"
        output += f"   Score: {item['score']}/100\n\n"
    return output
4. Database Connections
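Always use context managers and handle connection errors properly. Note that sqlite3's own context manager only commits or rolls back the transaction; it does not close the connection, so wrap the connection in contextlib.closing as well:
@tool
def database_tool(query: str) -> str:
    """Tool that uses database

    Args:
        query: SQL statement to execute
    """
    import sqlite3
    import os
    from contextlib import closing

    db_path = os.path.join(os.path.dirname(__file__), '..', 'data', 'app.db')
    try:
        # closing() closes the connection; the inner "conn" manages the transaction
        with closing(sqlite3.connect(db_path)) as conn, conn:
            cursor = conn.cursor()
            cursor.execute(query)
            results = cursor.fetchall()
            return format_results(results)
    except sqlite3.Error as e:
        return f"Database error: {e}"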
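The tool above executes whatever SQL text it receives. When the agent only needs to supply values, a parameterized query is a safer shape. The sketch below assumes a hypothetical `contacts` table and a hypothetical `find_contacts` helper; neither comes from the original codebase.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def find_contacts(db_path: str, company: str) -> str:
    """Look up contacts for one company using a parameterized query."""
    try:
        # closing() guarantees conn.close() when the block exits
        with closing(sqlite3.connect(db_path)) as conn:
            rows = conn.execute(
                "SELECT name, email FROM contacts WHERE company = ? ORDER BY name",
                (company,),  # the value is bound, never formatted into the SQL
            ).fetchall()
    except sqlite3.Error as e:
        return f"Database error: {e}"
    if not rows:
        return f"No contacts found for {company}"
    return "\n".join(f"{name} <{email}>" for name, email in rows)


# Demo with a throwaway database file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.db")
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE contacts (name TEXT, company TEXT, email TEXT)")
        conn.execute("INSERT INTO contacts VALUES ('Ada', 'Acme', 'ada@acme.test')")
        conn.commit()
    print(find_contacts(path, "Acme"))
    print(find_contacts(path, "x' OR '1'='1"))  # treated as a literal company name
```

Because the company name is passed as a bound parameter, an injection-style string is matched literally and simply finds no rows.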
5. Environment Variables
Load environment variables inside the tool for better portability:
@tool
def api_tool(endpoint: str) -> str:
    """Tool that uses API

    Args:
        endpoint: Full URL of the API endpoint to call
    """
    import os
    import requests
    from dotenv import load_dotenv

    # Load environment variables
    env_path = os.path.join(os.path.dirname(__file__), '..', '.env')
    load_dotenv(env_path)
    api_key = os.getenv('API_KEY')
    if not api_key:
        return "Error: API_KEY not found in environment"

    # Use the API key...
    headers = {"Authorization": f"Bearer {api_key}"}
    try:
        # A timeout keeps a stalled request from hanging the agent
        response = requests.get(endpoint, headers=headers, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        return f"Error: {e}"
Examples from Production Codebase
1. Contact Storage Tool
This example shows how to handle database operations with validation and deduplication:
from typing import Optional

@tool
def store_contact(
    name: str,
    company: str,
    email: Optional[str] = None,
    phone: Optional[str] = None,
    title: Optional[str] = None,
    linkedin_url: Optional[str] = None,
    source_url: Optional[str] = None,
    confidence: float = 0.5
) -> str:
    """Store a business contact in the database

    Validates and stores contact information with deduplication.

    Args:
        name: Full name of the contact
        company: Company name
        email: Email address (optional)
        phone: Phone number (optional)
        title: Job title (optional)
        linkedin_url: LinkedIn profile URL (optional)
        source_url: Where this contact was found (optional)
        confidence: Confidence score 0-1 (default: 0.5)

    Returns:
        Success or error message
    """
    import sqlite3
    import re
    from datetime import datetime

    # Validation
    if not name or not company:
        return "Error: Name and company are required"
    # Email validation
    if email and not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
        return f"Error: Invalid email format: {email}"
    # Implementation continues with deduplication and storage...
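The original implementation of the deduplication step is not shown. One common way to get that behavior, sketched here under assumptions, is to let SQLite enforce uniqueness and use `INSERT OR IGNORE`. The table schema, the choice of (name, company) as the duplicate key, and the `save_contact` helper are all hypothetical.

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timezone


def save_contact(conn: sqlite3.Connection, name: str, company: str, email: str = "") -> str:
    """Insert a contact unless the same (name, company) pair already exists."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS contacts (
               name TEXT NOT NULL,
               company TEXT NOT NULL,
               email TEXT,
               created_at TEXT,
               UNIQUE (name, company)
           )"""
    )
    # Normalize whitespace so 'Ada ' and 'Ada' count as the same person
    name, company = name.strip(), company.strip()
    cursor = conn.execute(
        "INSERT OR IGNORE INTO contacts (name, company, email, created_at) VALUES (?, ?, ?, ?)",
        (name, company, email.strip(), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be skipped
    if cursor.rowcount == 0:
        return f"Skipped duplicate: {name} at {company}"
    return f"Stored contact: {name} at {company}"


with closing(sqlite3.connect(":memory:")) as conn:
    print(save_contact(conn, "Ada Lovelace", "Acme", "ada@acme.test"))
    print(save_contact(conn, "Ada Lovelace ", "Acme"))
```

Pushing the uniqueness rule into the schema keeps the check atomic, so two calls arriving close together cannot both insert the same contact.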
2. Playwright Scraper Tool
This example demonstrates how to wrap async operations for synchronous agents:
# Optional appears in the signature, so it must be imported before the function is defined
from typing import Optional

@tool
def playwright_scraper(url: str, wait_time: int = 5000, referer: Optional[str] = None) -> str:
    """Scrape URL using Playwright with stealth mode

    Args:
        url: Target page to scrape
        wait_time: Extra delay in ms after page load (default: 5000)
        referer: Optional referer header

    Returns:
        Raw HTML content or error message
    """
    import asyncio

    async def _scrape(url: str, wait_time: int, referer: Optional[str]):
        from playwright.async_api import async_playwright
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            try:
                context = await browser.new_context(
                    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                )
                if referer:
                    await context.set_extra_http_headers({"Referer": referer})
                page = await context.new_page()
                await page.goto(url, wait_until='networkidle')
                await page.wait_for_timeout(wait_time)
                return await page.content()
            finally:
                # Close the browser even if navigation fails
                await browser.close()

    try:
        # Wrap async operation for sync agents
        return asyncio.run(_scrape(url, wait_time, referer))
    except Exception as e:
        logger.exception("Scrape failed: %s", e)
        return f"Error: {e}"
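One caveat with `asyncio.run()` is that it raises `RuntimeError` when called from a thread that already has a running event loop, as in a Jupyter notebook. The sketch below shows one possible workaround using only the standard library: fall back to a helper thread with its own loop. `run_coro_sync` and `fetch_fake_page` are hypothetical names, and the fake coroutine stands in for real browser work.

```python
import asyncio
import threading


def run_coro_sync(coro):
    """Run a coroutine to completion from synchronous code.

    Uses asyncio.run() directly when no loop is running in this thread;
    otherwise runs the coroutine on a fresh loop in a helper thread.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # normal case: no loop is active

    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as e:  # hand the failure back to the caller
            outcome["error"] = e

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # blocks the calling thread until the coroutine finishes
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fetch_fake_page(url: str) -> str:
    await asyncio.sleep(0.01)  # stands in for real network or browser work
    return f"<html>{url}</html>"


async def caller_inside_loop() -> str:
    # A bare asyncio.run() here would raise RuntimeError; the helper does not
    return run_coro_sync(fetch_fake_page("https://example.org"))


print(run_coro_sync(fetch_fake_page("https://example.com")))
print(asyncio.run(caller_inside_loop()))
```

The trade-off is that the calling thread blocks while the helper thread works, which is acceptable for a synchronous tool call but would stall an outer event loop for the duration.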
Integration with Agents
1. Importing Tools
First, import your tools into your agent script:
# In your agent file
from tools.my_tool import my_tool_function
from tools.complex_tool import ComplexTool
# Create tool instances
simple_tool = my_tool_function # Decorated function
complex_tool = ComplexTool() # Class instance
2. Creating an Agent with Tools
Add your tools to the agent during initialization:
from smolagents import ToolCallingAgent, LiteLLMModel, GoogleSearchTool

# Configure model
model = LiteLLMModel(model_id="gpt-4")

# Create agent with tools
agent = ToolCallingAgent(
    model=model,
    tools=[
        simple_tool,
        complex_tool,
        GoogleSearchTool(provider="serper"),  # Built-in tool
    ],
    max_steps=10
)

# Run agent
result = agent.run("Use my tools to accomplish X")
print(result)
3. How Agents Discover Tools
Agents discover and understand tools through their metadata:
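- The function name (for @tool) or the class's name attribute (for Tool subclasses) becomes the tool identifier
- The docstring (or the description attribute) becomes the tool description in the agent's prompt
- Type hints (or the inputs dictionary) define parameter types and validation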
- The agent's system prompt includes all available tool descriptions
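To make the list above concrete, here is a stdlib-only sketch of how this kind of metadata can be read off a plain function. It illustrates the idea and is not smolagents' actual implementation; `describe_function` and `lookup_company` are hypothetical names.

```python
import inspect
from typing import get_type_hints


def describe_function(func) -> dict:
    """Collect the metadata a prompt builder might need from a plain function."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    inputs = {}
    for name, param in inspect.signature(func).parameters.items():
        inputs[name] = {
            # Fall back to "any" when a parameter has no type hint
            "type": getattr(hints.get(name), "__name__", "any"),
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "inputs": inputs,
        "output_type": getattr(hints.get("return"), "__name__", "any"),
    }


def lookup_company(company_name: str, max_urls: int = 5) -> str:
    """Find URLs that mention a company.

    Args:
        company_name: Name of the company
        max_urls: Maximum number of URLs to return (default: 5)
    """
    return ""


spec = describe_function(lookup_company)
print(spec["name"], spec["inputs"])
```

Seeing the extracted metadata this way makes it clear why a vague name or a missing type hint directly degrades what the agent knows about a tool.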
Testing Tools
1. Standalone Testing
Test your tools independently before integration:
if __name__ == "__main__":
    # Test the tool directly
    import logging
    logging.basicConfig(level=logging.INFO)

    # Test with sample inputs
    result = my_tool("test input", 10)
    print(f"Result: {result}")

    # Test error cases
    error_result = my_tool("", -1)
    print(f"Error case: {error_result}")
2. Unit Testing
Create comprehensive unit tests for your tools:
import pytest
from tools.my_tool import my_tool

def test_my_tool_success():
    result = my_tool("valid input")
    assert "Successfully" in result
    assert not result.startswith("Error:")

def test_my_tool_error():
    result = my_tool("")
    assert result.startswith("Error:")

def test_my_tool_formatting():
    result = my_tool("test", max_results=3)
    lines = result.split('\n')
    assert len([line for line in lines if line.strip()]) <= 10
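Tools that call external services are easier to unit test when the external call is replaced with a mock. The sketch below uses `unittest.mock` from the standard library; `fetch_data` and `search_tool` are hypothetical stand-ins defined in the same file. In a real package the usual form would be `patch("tools.my_tool.fetch_data")`, assuming that is where the helper lives.

```python
from unittest.mock import MagicMock, patch


def fetch_data(query: str) -> list:
    """Stand-in for a network call; a real tool would hit an API here."""
    raise ConnectionError("network disabled in this sketch")


def search_tool(query: str) -> str:
    """Return a short summary of results, or an error string."""
    try:
        data = fetch_data(query)
        return f"Found {len(data)} results for '{query}'"
    except Exception as e:
        return f"Error: {e}"


def test_search_tool_success():
    fake = MagicMock(return_value=[{"title": "a"}, {"title": "b"}])
    # Swap the module-level name for the duration of the test
    with patch.dict(globals(), {"fetch_data": fake}):
        assert search_tool("acme") == "Found 2 results for 'acme'"
    fake.assert_called_once_with("acme")


def test_search_tool_error():
    fake = MagicMock(side_effect=TimeoutError("timed out"))
    with patch.dict(globals(), {"fetch_data": fake}):
        assert search_tool("acme") == "Error: timed out"


test_search_tool_success()
test_search_tool_error()
print("all mock-based tests passed")
```

Mocking both the success and failure paths verifies the tool's error-to-string behavior without any network access.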
3. Integration Testing
Test tools within the agent context:
def test_tool_with_agent():
    from smolagents import ToolCallingAgent, LiteLLMModel

    model = LiteLLMModel(model_id="gpt-3.5-turbo")
    agent = ToolCallingAgent(
        model=model,
        tools=[my_tool],
        max_steps=3
    )
    result = agent.run("Use my_tool to process 'test data'")
    assert "test data" in result
Common Patterns
1. URL Filtering and Scoring
from typing import Tuple

def _score_url(url: str, company_name: str) -> Tuple[int, str]:
    """Score URL relevance (0-100)"""
    url_lower = url.lower()
    company_lower = company_name.lower()
    # Compare lowercased values so the match is case-insensitive
    if company_lower in url_lower:
        return 100, "Official company domain"
    elif 'linkedin.com/company/' in url_lower:
        return 90, "LinkedIn company page"
    elif 'crunchbase.com' in url_lower and company_lower in url_lower:
        return 85, "Crunchbase profile"
    elif 'github.com' in url_lower:
        return 80, "GitHub organization"
    else:
        return 50, "General mention"
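A scoring function is usually paired with a ranking step that removes duplicates and keeps the top few URLs. The following is a minimal sketch of that step; `score_url` is a simplified stand-in for the scorer above, and `top_urls` is a hypothetical helper.

```python
from typing import List, Tuple


def score_url(url: str, company_name: str) -> Tuple[int, str]:
    """Simplified scorer used only for this sketch."""
    url_lower = url.lower()
    if company_name.lower() in url_lower:
        return 100, "Company name in URL"
    if "linkedin.com/company/" in url_lower:
        return 90, "LinkedIn company page"
    return 50, "General mention"


def top_urls(urls: List[str], company_name: str, max_urls: int = 5) -> str:
    """Score, de-duplicate, and rank URLs, returning a numbered text list."""
    unique = list(dict.fromkeys(urls))  # drop repeats, keep first-seen order
    scored = [(url, *score_url(url, company_name)) for url in unique]
    # sorted() is stable, so ties keep their original order
    ranked = sorted(scored, key=lambda row: row[1], reverse=True)[:max_urls]
    return "\n".join(
        f"{i}. {url} ({score}/100, {reason})"
        for i, (url, score, reason) in enumerate(ranked, 1)
    )


print(top_urls(
    [
        "https://news.example.com/story",
        "https://www.linkedin.com/company/some-page",
        "https://acme.com/about",
        "https://acme.com/about",
    ],
    "Acme",
    max_urls=2,
))
```

Returning the reason alongside the score gives the agent something to cite when it explains why it chose a particular URL.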
2. Data Formatting for Agents
from typing import Dict, List

def _format_results(data: List[Dict]) -> str:
    """Format data for agent consumption"""
    output = f"Found {len(data)} results\n"
    output += "=" * 60 + "\n\n"
    for i, item in enumerate(data, 1):
        output += f"{i}. {item['name']}\n"
        for key, value in item.items():
            if key != 'name' and value:
                output += f"   {key.title()}: {value}\n"
        output += "\n"
    return output
3. Progress Tracking
from typing import List

@tool
def long_running_tool(items: List[str]) -> str:
    """Process multiple items with progress tracking

    Args:
        items: The items to process
    """
    results = []
    for i, item in enumerate(items, 1):
        logger.info(f"Processing {i}/{len(items)}: {item}")
        try:
            result = process_item(item)
            # _format_results expects a 'name' key on every entry
            results.append({"name": item, "status": "success", "result": result})
        except Exception as e:
            results.append({"name": item, "status": "error", "error": str(e)})
    return _format_results(results)
Troubleshooting
Common Issues and Solutions
- Import Errors: Ensure all imports are inside the function/method body
- Type Errors: Verify all parameters have proper type hints
- Agent Can't Find Tool: Check tool is properly imported and added to the tools list
- Tool Not Being Called: Improve docstring clarity and parameter descriptions
- Async Issues: Wrap async code with asyncio.run()
Debugging Tips
# 1. Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# 2. Add debug prints in your tool
@tool
def debug_tool(text: str) -> str:
    """Tool with debug output

    Args:
        text: The input to inspect
    """
    logger.debug(f"Received input: {text}")
    logger.debug(f"Input type: {type(text)}")
    # Process... (placeholder: echo the input back)
    result = text
    logger.debug(f"Returning result of type: {type(result)}")
    return result

# 3. Test tools in isolation first
# 4. Use descriptive error messages
# 5. Log all external API calls
The Bottom Line
Smolagents tool development is straightforward yet powerful. Start with the @tool decorator for simple functions, graduate to Tool classes for complex operations, and always prioritize clear documentation and error handling. With these patterns, you can extend smolagents to interact with any system, which makes it one of the best AI agent frameworks for practical AI development projects.