Scalable Agent Architecture: Optimizing Tool Selection with Amazon S3 Vectors
By Braincuber Team
Published on February 11, 2026
The hallmark of a great AI agent is its toolbox. But what happens when that toolbox grows from 10 simple functions to 500+ complex APIs? Sending every single tool definition to the LLM (Large Language Model) in every prompt is a recipe for disaster: it explodes your token costs, increases latency, and confuses the model with irrelevant noise.
The solution is Semantic Tool Retrieval. Instead of showing the agent everything, we use vector search to show it only the top 5 tools relevant to the current task. In this guide, we'll implement this architecture using the newly released Amazon S3 Vectors integration with Bedrock Knowledge Bases. We'll build a "CloudOps Agent" capable of managing hundreds of infrastructure tasks efficiently.
The Scaling Problem
- Context Limits: 500 tool definitions can easily exceed 30k tokens.
- Cost: Paying for those tokens on every turn is financially unsustainable.
- Accuracy: LLMs struggle to "pick the needle from the haystack" when presented with hundreds of similar options.
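To put rough numbers on the cost point, here is a back-of-envelope sketch. The per-tool token count is an assumed figure for illustration only; real schemas vary.

```python
# Back-of-envelope comparison: sending all 500 tool schemas every turn
# versus a retrieved top 5. AVG_TOKENS_PER_TOOL is an assumption.
AVG_TOKENS_PER_TOOL = 60
NUM_TOOLS = 500
TOP_K = 5

all_tools_tokens = AVG_TOKENS_PER_TOOL * NUM_TOOLS  # tokens spent on tool defs per turn
retrieved_tokens = AVG_TOKENS_PER_TOOL * TOP_K

reduction = 1 - retrieved_tokens / all_tools_tokens
print(f"{all_tools_tokens} tokens -> {retrieved_tokens} tokens ({reduction:.0%} reduction)")
```

Under these assumptions, retrieval cuts the per-turn tool-definition footprint from 30,000 tokens to 300, a 99% reduction that compounds across every turn of every conversation.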
Step 1: Defining the Tool Corpus
First, we treat our tools as documents. Each tool definition (JSON schema) is a "document" that we will embed and store in S3. Here is a sample of our CloudOps toolset.
[
  {
    "tool_name": "restart_ec2_instance",
    "description": "Restarts a specific EC2 instance given its ID. Use this when a server is unresponsive.",
    "parameters": { "instance_id": "string" }
  },
  {
    "tool_name": "fetch_cloudwatch_metrics",
    "description": "Retrieves CPU, Memory, or Disk metrics for a resource over a time period.",
    "parameters": { "namespace": "string", "metric_name": "string" }
  },
  {
    "tool_name": "scale_auto_scaling_group",
    "description": "Updates the desired capacity of an Auto Scaling Group.",
    "parameters": { "asg_name": "string", "capacity": "integer" }
  }
  // ... imagine 500 more tools like this
]
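For ingestion, a simple convention is one S3 object per tool, so that each retrieved chunk maps one-to-one to a complete schema. Here is a minimal sketch of that convention; the bucket name and key prefix are hypothetical.

```python
import json

BUCKET = "cloudops-tool-corpus"  # hypothetical bucket backing the Knowledge Base
PREFIX = "tools/"

def object_key(tool):
    """Deterministic S3 key for a tool definition, keyed by its name."""
    return f"{PREFIX}{tool['tool_name']}.json"

tool = {
    "tool_name": "restart_ec2_instance",
    "description": "Restarts a specific EC2 instance given its ID.",
    "parameters": {"instance_id": "string"},
}
print(object_key(tool))  # tools/restart_ec2_instance.json

# Upload with boto3 (requires AWS credentials, so it is left commented out here):
# import boto3
# boto3.client("s3").put_object(
#     Bucket=BUCKET, Key=object_key(tool),
#     Body=json.dumps(tool), ContentType="application/json",
# )
```

Keeping each definition small and self-contained means the Knowledge Base chunker is unlikely to split a tool across chunks, which keeps the retrieval step in the next section simple.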
Step 2: The Retrieval Workflow
Instead of hardcoding tools in the system prompt, we insert a retrieval step. When the user asks a question, we first query the Bedrock Knowledge Base (backed by S3 Vectors) to find relevant JSON schemas.
import boto3
import json

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
bedrock_runtime = boto3.client('bedrock-runtime')

def get_relevant_tools(user_query, kb_id):
    """Retrieves the top 5 relevant tools from the S3 Vectors-backed KB"""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': user_query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {'numberOfResults': 5}
        }
    )
    # Parse the JSON tool definitions from the retrieved text
    tools = []
    for result in response['retrievalResults']:
        # Assuming each chunk stores one complete tool definition as JSON
        tools.append(json.loads(result['content']['text']))
    return tools

def run_agent_turn(user_query, kb_id):
    # 1. Retrieve only the tools relevant to this query
    relevant_tools = get_relevant_tools(user_query, kb_id)
    print(f"DEBUG: Retrieved {len(relevant_tools)} tools: {[t['tool_name'] for t in relevant_tools]}")

    # 2. Construct the system prompt with ONLY these tools
    system_prompt = f"""
You are a CloudOps Assistant. You have access to the following tools:
{json.dumps(relevant_tools, indent=2)}
Select the best tool to answer the user's request.
"""

    # 3. Call the LLM via the Converse API
    response = bedrock_runtime.converse(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        messages=[{'role': 'user', 'content': [{'text': user_query}]}],
        system=[{'text': system_prompt}]
    )
    return response['output']['message']
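One caveat in the workflow above: `get_relevant_tools` assumes every retrieved chunk is valid JSON, so a single truncated chunk would crash the turn with a `JSONDecodeError`. A defensive variant, shown here against a mocked response shape rather than a live AWS call, simply skips anything that does not parse into a tool schema:

```python
import json

def parse_tool_results(retrieval_results):
    """Parse tool schemas from retrieve() results, skipping chunks that
    are not complete JSON tool definitions."""
    tools = []
    for result in retrieval_results:
        text = result.get("content", {}).get("text", "")
        try:
            tool = json.loads(text)
        except json.JSONDecodeError:
            continue  # chunk was split mid-document or is not JSON
        if isinstance(tool, dict) and "tool_name" in tool:
            tools.append(tool)
    return tools

# Mocked results, mirroring the shape of response['retrievalResults']:
fake_results = [
    {"content": {"text": '{"tool_name": "restart_ec2_instance", "parameters": {"instance_id": "string"}}'}},
    {"content": {"text": "Retrieves CPU, Memory, or"}},  # a truncated chunk
]
print([t["tool_name"] for t in parse_tool_results(fake_results)])  # ['restart_ec2_instance']
```

Dropping a malformed chunk costs you one candidate tool; crashing the turn costs you the whole answer, so failing soft is usually the right trade here.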
Step 3: Why S3 Vectors?
You might wonder, "Why not use OpenSearch Serverless or Pinecone?" For this specific use case of tool selection, S3 Vectors offers two compelling advantages: simplicity and cost.
Tool libraries are typically small datasets (thousands of records, not millions). S3 Vectors allows you to skip the complex cluster management of a dedicated vector database. You simply drop your JSON files in a bucket, enable the integration, and Bedrock handles the indexing automatically.
Conclusion
By decoupling tool storage from the agent's context window, we've built a "CloudOps Agent" that can theoretically scale to 10,000 tools without getting slower or more expensive. The S3 Vector integration makes this architecture accessible without needing a dedicated team to manage your vector infrastructure.
Scaling Your AI Agents?
Building enterprise-grade agents requires more than just prompt engineering. Let our architects help you design scalable, cost-effective retrieval systems.
