Scalable Agent Architecture: Optimizing Tool Selection with Amazon S3 Vectors
By Braincuber Team
Published on February 11, 2026
The hallmark of a great AI agent is its toolbox. But what happens when that toolbox grows from 10 simple functions to 500+ complex APIs? Sending every single tool definition to the LLM (Large Language Model) in every prompt is a recipe for disaster: it explodes your token costs, increases latency, and confuses the model with irrelevant noise.
The solution is Semantic Tool Retrieval. Instead of showing the agent everything, we use vector search to show it only the top 5 tools relevant to the current task. In this guide, we'll implement this architecture using the newly released Amazon S3 Vectors integration with Bedrock Knowledge Bases. We'll build a "CloudOps Agent" capable of managing hundreds of infrastructure tasks efficiently.
The Scaling Problem
- Context Limits: 500 tool definitions can easily exceed 30k tokens.
- Cost: Paying for those tokens on every turn is financially unsustainable.
- Accuracy: LLMs struggle to "pick the needle from the haystack" when presented with hundreds of similar options.
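To put rough numbers on the cost point, here is a back-of-envelope sketch. The per-tool token count is an assumed figure for illustration only; real schemas vary.

```python
# Back-of-envelope comparison: sending all 500 tool schemas every turn
# versus a retrieved top 5. AVG_TOKENS_PER_TOOL is an assumption.
AVG_TOKENS_PER_TOOL = 60
NUM_TOOLS = 500
TOP_K = 5

all_tools_tokens = AVG_TOKENS_PER_TOOL * NUM_TOOLS  # tokens spent on tool defs per turn
retrieved_tokens = AVG_TOKENS_PER_TOOL * TOP_K

reduction = 1 - retrieved_tokens / all_tools_tokens
print(f"{all_tools_tokens} tokens -> {retrieved_tokens} tokens ({reduction:.0%} reduction)")
```

Under these assumptions, retrieval cuts the per-turn tool-definition footprint from 30,000 tokens to 300, a 99% reduction that compounds across every turn of every conversation.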
Step 1: Defining the Tool Corpus
First, we treat our tools as documents. Each tool definition (JSON schema) is a "document" that we will embed and store in S3. Here is a sample of our CloudOps toolset.
[
  {
    "tool_name": "restart_ec2_instance",
    "description": "Restarts a specific EC2 instance given its ID. Use this when a server is unresponsive.",
    "parameters": { "instance_id": "string" }
  },
  {
    "tool_name": "fetch_cloudwatch_metrics",
    "description": "Retrieves CPU, Memory, or Disk metrics for a resource over a time period.",
    "parameters": { "namespace": "string", "metric_name": "string" }
  },
  {
    "tool_name": "scale_auto_scaling_group",
    "description": "Updates the desired capacity of an Auto Scaling Group.",
    "parameters": { "asg_name": "string", "capacity": "integer" }
  }
  // ... imagine 500 more tools like this
]
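For ingestion, a simple convention is one S3 object per tool, so that each retrieved chunk maps one-to-one to a complete schema. Here is a minimal sketch of that convention; the bucket name and key prefix are hypothetical.

```python
import json

BUCKET = "cloudops-tool-corpus"  # hypothetical bucket backing the Knowledge Base
PREFIX = "tools/"

def object_key(tool):
    """Deterministic S3 key for a tool definition, keyed by its name."""
    return f"{PREFIX}{tool['tool_name']}.json"

tool = {
    "tool_name": "restart_ec2_instance",
    "description": "Restarts a specific EC2 instance given its ID.",
    "parameters": {"instance_id": "string"},
}
print(object_key(tool))  # tools/restart_ec2_instance.json

# Upload with boto3 (requires AWS credentials, so it is left commented out here):
# import boto3
# boto3.client("s3").put_object(
#     Bucket=BUCKET, Key=object_key(tool),
#     Body=json.dumps(tool), ContentType="application/json",
# )
```

Keeping each definition small and self-contained means the Knowledge Base chunker is unlikely to split a tool across chunks, which keeps the retrieval step in the next section simple.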
Step 2: The Retrieval Workflow
Instead of hardcoding tools in the system prompt, we insert a retrieval step. When the user asks a question, we first query the Bedrock Knowledge Base (backed by S3 Vectors) to find relevant JSON schemas.
import boto3
import json

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
bedrock_runtime = boto3.client('bedrock-runtime')

def get_relevant_tools(user_query, kb_id):
    """Retrieves the top 5 relevant tools from the S3 Vectors-backed KB"""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': user_query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {'numberOfResults': 5}
        }
    )
    # Parse the JSON tool definitions from the retrieved text
    tools = []
    for result in response['retrievalResults']:
        # Assuming each chunk stores one complete tool definition as JSON
        tools.append(json.loads(result['content']['text']))
    return tools

def run_agent_turn(user_query, kb_id):
    # 1. Retrieve only the tools relevant to this query
    relevant_tools = get_relevant_tools(user_query, kb_id)
    print(f"DEBUG: Retrieved {len(relevant_tools)} tools: {[t['tool_name'] for t in relevant_tools]}")

    # 2. Construct the system prompt with ONLY these tools
    system_prompt = f"""
You are a CloudOps Assistant. You have access to the following tools:
{json.dumps(relevant_tools, indent=2)}
Select the best tool to answer the user's request.
"""

    # 3. Call the LLM via the Converse API
    response = bedrock_runtime.converse(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        messages=[{'role': 'user', 'content': [{'text': user_query}]}],
        system=[{'text': system_prompt}]
    )
    return response['output']['message']
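One caveat in the workflow above: `get_relevant_tools` assumes every retrieved chunk is valid JSON, so a single truncated chunk would crash the turn with a `JSONDecodeError`. A defensive variant, shown here against a mocked response shape rather than a live AWS call, simply skips anything that does not parse into a tool schema:

```python
import json

def parse_tool_results(retrieval_results):
    """Parse tool schemas from retrieve() results, skipping chunks that
    are not complete JSON tool definitions."""
    tools = []
    for result in retrieval_results:
        text = result.get("content", {}).get("text", "")
        try:
            tool = json.loads(text)
        except json.JSONDecodeError:
            continue  # chunk was split mid-document or is not JSON
        if isinstance(tool, dict) and "tool_name" in tool:
            tools.append(tool)
    return tools

# Mocked results, mirroring the shape of response['retrievalResults']:
fake_results = [
    {"content": {"text": '{"tool_name": "restart_ec2_instance", "parameters": {"instance_id": "string"}}'}},
    {"content": {"text": "Retrieves CPU, Memory, or"}},  # a truncated chunk
]
print([t["tool_name"] for t in parse_tool_results(fake_results)])  # ['restart_ec2_instance']
```

Dropping a malformed chunk costs you one candidate tool; crashing the turn costs you the whole answer, so failing soft is usually the right trade here.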
Step 3: Why S3 Vectors?
You might wonder, "Why not use OpenSearch Serverless or Pinecone?" For this specific use case of tool selection, S3 Vectors offers two compelling advantages: simplicity and cost.
Tool libraries are typically small datasets (thousands of records, not millions). S3 Vectors allows you to skip the complex cluster management of a dedicated vector database. You simply drop your JSON files in a bucket, enable the integration, and Bedrock handles the indexing automatically.
Conclusion
By decoupling tool storage from the agent's context window, we've built a "CloudOps Agent" that can theoretically scale to 10,000 tools without getting slower or more expensive. The S3 Vector integration makes this architecture accessible without needing a dedicated team to manage your vector infrastructure.
Scaling Your AI Agents?
Building enterprise-grade agents requires more than just prompt engineering. Let our architects help you design scalable, cost-effective retrieval systems.
