How to Use Supermemory for Persistent AI Agent Memory
By Braincuber Team
Published on May 12, 2026
Most LLM agents restart from zero on every run. They forget the user's name, their last conversation, and the file they were working on. Anything user-specific has to be re-established turn by turn. This tutorial shows you how to add persistent memory to an AI agent so the next run starts where the last one ended. The memory layer is Supermemory, a hosted API that stores per-user facts and returns them in one call. By the end, you will have built a personal exercise trainer in Python that logs workouts, remembers preferences, and suggests the next session across separate script runs.
What You Will Learn:
- What Supermemory is and how it differs from plain vector databases and RAG
- How to set up a Supermemory account and get your API key
- How to initialise a Python project with the Supermemory SDK and OpenAI Agents SDK
- How the static-vs-dynamic profile split drives personalised agent behaviour
- How to build two memory-backed agent tools: log_workout and suggest_next_session
- How to verify multi-session recall across separate Python process runs
- Next steps for scoping memory per real user and handling production failures
What Is Supermemory?
Supermemory is an AI memory API for agents. When you hand Supermemory strings about your user, it later returns a compact view of who that user is and what they have recently been doing. Embedding, indexing, and retrieval all run inside Supermemory, so your agent code stays small. The LongMemEval benchmark tests how well a memory system answers questions over a long conversation history. Supermemory recalls 81.6% of the right facts. Zep, the next-best system, scores 71.2%, a 10-point gap that translates to roughly 1 extra correct answer per 10 user questions. The open-source repository has 22k+ GitHub stars, another signal of real-world use.
Memory vs RAG: Two Different Jobs
Most developers reaching for an agent memory tool have used RAG before. It helps to place Supermemory next to it. RAG and memory solve different problems, and they often live in the same agent.
| Aspect | RAG | Supermemory |
|---|---|---|
| Data Source | Document corpus (product manuals, support articles) | Per-user facts (preferences, history, activity) |
| Update Frequency | Loaded at deploy time, rarely changes | Grows with every conversation |
| Purpose | Answer questions the product knows the answer to | Answer questions only the user can answer |
| Example Query | "What is your refund policy?" | "What was my bench press last week?" |
In a real product, the two run side-by-side. RAG over a company knowledge base answers product questions. Supermemory over the user answers personal questions. Same agent, two data stores, two jobs.
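A minimal sketch of that division of labor, assuming a hypothetical search_docs() stand-in for whatever RAG lookup you already run (the Supermemory call matches the client used throughout this tutorial):

from supermemory import Supermemory

client = Supermemory()  # reads SUPERMEMORY_API_KEY from the environment

def search_docs(question: str) -> str:
    # Hypothetical stand-in for an existing RAG pipeline over product docs.
    return "...relevant passage from the knowledge base..."

def answer_product_question(question: str) -> str:
    # RAG job: the corpus knows the answer, the user does not.
    return search_docs(question)

def answer_personal_question(user_id: str, question: str) -> str:
    # Memory job: only this user's history knows the answer.
    prof = client.profile(container_tag=user_id, q=question)
    return "\n".join(prof.profile.static + prof.profile.dynamic)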
User Profiles: Static and Dynamic Facts
Supermemory's main idea is the user profile. Every log gets sorted into two buckets: static facts that rarely change, and dynamic facts about current activity. Recurring patterns get promoted into the static bucket. Recent activity stays in the dynamic bucket. When the agent reads the profile, one call returns both buckets plus the matching memory chunks.
| Static Facts | Dynamic Facts |
|---|---|
| Trains at home with dumbbells and a pull-up bar | Current focus: upper body strength |
| Left knee injury, no deep squats | Last bench: 4 sets of 5 reps at 185 lb |
| Wants to add 20 lb to bench by year-end | Working on grease-the-groove pull-ups this week |
| Trains evenings only, never mornings | Ran 5k in 28 minutes yesterday |
The split matters because static and dynamic facts answer different questions about the same user. A workout suggester needs both. The static side rules out gym-only exercises. The dynamic side picks today's session.
Embedding & Indexing
Every raw memory chunk is embedded and indexed automatically. You never touch a vector or a dimension count in your code.
Semantic Search
Similarity search runs at read time with a single query string. Results come back sorted with relevance scores.
Profile Extraction
Supermemory rewrites raw sentences into first-person facts. "The user trains at home" becomes "Trains at home instead of a gym."
Container Tag Scoping
Every memory is tagged with a string you choose. Every read passes the same tag back. Memory stays scoped per user.
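A short sketch of that scoping in practice (the two user IDs are illustrative):

from supermemory import Supermemory

client = Supermemory()

# Writes land in separate per-user scopes via the tag.
client.add(content="The user prefers morning runs.", container_tag="user_alice")
client.add(content="The user prefers evening lifting.", container_tag="user_bob")

# Reads pass the same tag back and only see that user's memories.
alice = client.profile(container_tag="user_alice", q="training time")
# alice.profile.dynamic mentions morning runs and nothing about Bob.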
Setting Up Your Supermemory Environment
The trainer needs two API keys (Supermemory and OpenAI) and a Python project with three dependencies. A quick round-trip script proves both keys work before any agent code goes near them.
Getting Your API Keys
The Supermemory API key lives at console.supermemory.ai, NOT at app.supermemory.ai. The app subdomain is the consumer memory product for saving notes and browsing your space. It has no API key page. Skip it and go straight to the console.
Create a Supermemory API Key
Go to console.supermemory.ai. Sign in. Click API Keys in the sidebar. Click Create API Key. Name it (for example, supermemory-tutorial). Copy the resulting key. It starts with sm_.
Get Your OpenAI API Key
You need an OpenAI key for the agent's LLM calls. Grab one at platform.openai.com/api-keys if you do not have one already. On the Supermemory side, the free tier covers this tutorial without entering payment info.
Create the .env File
Create a .env file in your project root with both keys. Do not commit this file to version control.
SUPERMEMORY_API_KEY=sm_your_key_here
OPENAI_API_KEY=sk-your_key_here
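If the project's generated .gitignore does not already cover it, one extra line keeps the key file out of commits:

# .gitignore
.env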
Installing Dependencies
This tutorial uses uv for project setup and execution. If you do not have uv, install it once with the one-liner from astral.sh/uv. You need Python 3.10 or newer.
uv init supermemory-trainer
cd supermemory-trainer
uv add supermemory==3.37.0 openai-agents python-dotenv
Three dependencies: supermemory==3.37.0 is the memory client pinned to the version verified for this tutorial. openai-agents is the OpenAI Agents SDK. python-dotenv reads the .env file. The resulting pyproject.toml should look like this:
[project]
name = "supermemory-trainer"
version = "0.1.0"
description = "Personal exercise trainer agent built with Supermemory and the OpenAI Agents SDK."
requires-python = ">=3.10"
dependencies = [
    "openai-agents>=0.10.2",
    "python-dotenv>=1.2.1",
    "supermemory==3.37.0",
]
Verifying Your Setup with a Warm-Up Script
Before writing any agent code, verify Supermemory works on a single sentence. The script below, saved as hello.py, sends one fact, waits for the pipeline, then reads the profile back. If this runs cleanly, the keys work and the SDK is reachable. The output also gives you a first look at what Supermemory does with raw text.
import time
from dotenv import load_dotenv
from supermemory import Supermemory
load_dotenv()
client = Supermemory()
USER_ID = "demo_warmup"
response = client.add(
    content="The user is learning Supermemory by building a personal trainer agent.",
    container_tag=USER_ID,
)
print(f"client.add() -> id={response.id} status={response.status}")
Now add the wait and the read at the bottom of the same file:
print("Waiting 20 seconds for processing...")
time.sleep(20)
prof = client.profile(container_tag=USER_ID, q="learning")
print(f"profile.static ({len(prof.profile.static)}): {prof.profile.static}")
print(f"profile.dynamic ({len(prof.profile.dynamic)}): {prof.profile.dynamic}")
print(f"search_results.results ({len(prof.search_results.results)}):")
for r in prof.search_results.results[:3]:
    print(f" - {r['memory']} (similarity={r['similarity']:.3f})")
The 20-second sleep gives Supermemory's embed-and-extract pipeline time to process the new memory. Without it, the read returns nothing and the script looks broken when it is not. Run the file:
uv run python hello.py
Expected output:
client.add() -> id=zNLsJBrY1PZupAeZ3Qn6EL status=queued
Waiting 20 seconds for processing...
profile.static (0): []
profile.dynamic (1): ['Building a personal trainer agent to learn Supermemory.']
search_results.results (1):
- Building a personal trainer agent to learn Supermemory. (similarity=0.650)
Three details matter. client.add() returns immediately with status="queued", since Supermemory processes documents asynchronously. The 20-second wait covers the embed-and-extract pipeline. The interesting line is profile.dynamic. The input was "The user is learning Supermemory by building a personal trainer agent." The output is "Building a personal trainer agent to learn Supermemory." Supermemory rewrote a third-person sentence into a first-person fact about the user. That is the profile extractor doing its job. The profile.static list is empty because static facts only consolidate after a handful of related logs accumulate.
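The fixed sleep is fine for a warm-up script. If you would rather not guess at the pipeline's latency, one option is to poll client.profile() until something comes back. The sketch below uses only the calls already shown, with an arbitrary 60-second ceiling:

import time

def wait_for_memory(client, container_tag: str, q: str, timeout: float = 60.0):
    """Poll until the embed-and-extract pipeline has produced something."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        prof = client.profile(container_tag=container_tag, q=q)
        if prof.profile.dynamic or prof.search_results.results:
            return prof
        time.sleep(5)  # processing is asynchronous; re-check every few seconds
    raise TimeoutError("memory did not become searchable within the timeout")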
Building the Supermemory Agent: Personal Exercise Trainer
The trainer wraps client.add() and client.profile() in two agent tools, so reads and writes happen automatically as the user chats. Workout history fits memory well. Equipment, injuries, and recent lifts do not live in the LLM's training data, and they accumulate session by session.
Project Structure
The trainer is small enough that the whole project fits in two Python files plus the pyproject.toml:
supermemory-trainer/
    .env              # your real keys (gitignored)
    .env.example      # placeholders, committed
    .gitignore
    .python-version
    main.py           # agent definition, system prompt, REPL loop
    pyproject.toml
    tools.py          # log_workout and suggest_next_session
tools.py holds the two memory-backed tools. log_workout writes a workout to Supermemory via client.add(). suggest_next_session reads the user's profile via client.profile(). main.py imports both and wires the agent.
Writing main.py: The Agent and System Prompt
One sentence in the system prompt does the Supermemory work: every fact about the user must come back through tool calls. The agent is told it has no memory of its own. That single rule is what makes the trainer memory-backed.
import asyncio
from agents import Agent, Runner, SQLiteSession
from tools import log_workout, suggest_next_session
SYSTEM_PROMPT = """You are a personal exercise trainer who logs the user's
workouts and recommends what to do next.
You have no memory of the user's history on your own. Every fact about the
user lives in Supermemory and reaches you only through tool calls.
Two rules, no exceptions:
1. Whenever the user reports completing a workout, call log_workout
immediately, before responding. Extract the exercise, sets, reps, weight,
and any notes from what they said. If a value is missing, ask one short
follow-up question instead of guessing. After logging, confirm in one short
sentence and stop. Do NOT recommend the next session unless the user asks
for one.
2. When the user explicitly asks what to do next (or asks for a
recommendation, suggestion, or plan), call suggest_next_session first.
Never recommend from your own training data. The tool returns the user's
recent activity, stable preferences, and matching past sessions. Reference
those facts directly in your reply.
Keep replies concise (2-4 sentences). Be specific: name the exercise, sets,
reps, and weight. Honor any injuries or equipment constraints the tool
surfaces.
"""
def build_agent() -> Agent:
    return Agent(
        name="Trainer",
        instructions=SYSTEM_PROMPT,
        tools=[log_workout, suggest_next_session],
        model="gpt-5",
    )

async def chat() -> None:
    agent = build_agent()
    session = SQLiteSession(session_id="trainer-cli")
    print("Trainer ready. Type a message, or 'exit' to quit.\n")
    while True:
        try:
            message = input("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            print()
            break
        if not message:
            continue
        if message.lower() in {"exit", "quit"}:
            break
        result = await Runner.run(agent, message, session=session)
        print(f"\nTrainer: {result.final_output}\n")

if __name__ == "__main__":
    asyncio.run(chat())
Both rules in the system prompt route the model through Supermemory. Rule 1 forces a log_workout write whenever the user reports a workout, so every workout reaches the memory store. Rule 2 forces a suggest_next_session read before any recommendation, so every recommendation is grounded in what Supermemory knows. Skip those rules and the agent answers from its training data, which defeats the point of a memory layer.
Important: Jupyter vs Script
Run main.py as a script, not in a Jupyter cell, since Jupyter's event loop conflicts with asyncio.run(). The synchronous Supermemory client works inside async tool functions because the Agents SDK runs tools in a thread pool.
Writing tools.py: The Memory-Backed Tools
Start with the imports and a single shared client. load_dotenv() runs at import time so SUPERMEMORY_API_KEY is in the environment before Supermemory() is constructed. Both tool functions share one client and one USER_ID constant.
from agents import function_tool
from dotenv import load_dotenv
from supermemory import Supermemory
load_dotenv()
USER_ID = "demo_user"
client = Supermemory()
The log_workout Tool
log_workout is the write side of the agent's memory. The function takes structured arguments from the agent (exercise name, sets, reps, weight, optional notes), turns them into one short English sentence, and hands the sentence to Supermemory through client.add(). The embed-and-extract pipeline runs inside Supermemory after that and needs nothing from the trainer.
@function_tool
def log_workout(
    exercise: str,
    sets: int,
    reps: int,
    weight: float,
    notes: str = "",
) -> str:
    """Log a completed workout to the user's memory.

    Args:
        exercise: Name of the exercise.
        sets: Number of sets performed.
        reps: Number of reps per set.
        weight: Weight in pounds. Pass 0 for bodyweight or cardio.
        notes: Optional notes about the session.
    """
    print(f"[log_workout] {exercise=} {sets=} {reps=} {weight=} {notes=}")
    content = f"Performed {exercise}: {sets} sets of {reps} reps at {weight} lbs."
    if notes:
        content += f" Notes: {notes}"
    response = client.add(content=content, container_tag=USER_ID)
    print(f"[log_workout] -> id={response.id} status={response.status}")
    return f"Logged {exercise} ({sets}x{reps} @ {weight} lb)."
The @function_tool docstring is what the LLM sees when it decides whether to call the tool. The Args block maps to per-parameter descriptions. Both are part of the agent's contract with the function.
The tool sends a plain sentence to client.add(), not JSON. Supermemory's profile extractor reads natural language and infers facts from it. JSON technically works, but the extraction quality drops because the model has no narrative to summarise. "Performed bench press: 4 sets of 5 reps at 185.0 lbs" gives the extractor a clean sentence to work with.
The suggest_next_session Tool
suggest_next_session is the read side, and this is where the static-and-dynamic split pays off. One client.profile(container_tag=USER_ID, q=focus) call returns three views of the user in a single round trip. Stable preferences come back as profile.static, current activity as profile.dynamic, and the closest matching past memories as search_results.results. The tool's job is to flatten those three views into one block of context the agent can quote.
@function_tool
def suggest_next_session(focus: str) -> str:
    """Fetch the user's training history and preferences for a given focus.

    Returns a context string the agent can use to recommend the next session.
    The agent is responsible for the actual recommendation. This tool only
    surfaces what Supermemory knows about the user.

    Args:
        focus: What the user wants to train next (e.g. "upper body", "legs",
            "cardio", "today"). Drives semantic search against past logs.
    """
    print(f"[suggest_next_session] focus={focus!r}")
    profile = client.profile(container_tag=USER_ID, q=focus)
    static_facts = profile.profile.static
    dynamic_facts = profile.profile.dynamic
    matches = profile.search_results.results
    print(
        f"[suggest_next_session] static={len(static_facts)} "
        f"dynamic={len(dynamic_facts)} matches={len(matches)}"
    )
    sections = []
    if static_facts:
        sections.append("Stable preferences and constraints:")
        sections.extend(f"- {fact}" for fact in static_facts)
    if dynamic_facts:
        sections.append("Recent activity:")
        sections.extend(f"- {fact}" for fact in dynamic_facts)
    if matches:
        sections.append("Closest matching past entries:")
        for r in matches[:5]:
            sections.append(f"- {r['memory']}")
    if not sections:
        return (
            "No prior training history found for this user. "
            "Ask the user about their goals, equipment, and recent training."
        )
    return "\n".join(sections)
Bracket Access vs Attribute Access
In supermemory==3.37.0, each search result is a Python dict, not a Pydantic object. Use r["memory"] for the text and r["similarity"] for the score. Attribute access like r.memory raises AttributeError in this version.
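If you want read code that survives a future SDK upgrade, a small accessor can hedge across both shapes. The attribute branch is speculative; in 3.37.0 only the dict branch runs:

def result_fields(r) -> tuple[str, float]:
    """Return (memory, similarity) from a search result, whatever its shape."""
    if isinstance(r, dict):  # supermemory==3.37.0: plain dicts
        return r["memory"], r["similarity"]
    return r.memory, r.similarity  # guess at a future typed-object version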
Running the Agent: Two Sessions Across Processes
Session 1: Logging Workouts
Start session 1 and log a few workouts to fill Supermemory with something to read back later. Run the script:
uv run python main.py
Log bench press, then a 5k run, then deadlift, plus one preference statement: "I only train at home, no gym." The agent fires log_workout once per workout, and the tool's print lines make every call visible in the terminal. Each status=queued line is the moment Supermemory takes over: one document moving through the embed-and-extract pipeline on Supermemory's side. For short text logs like these, the document becomes searchable through client.profile() within about 12 seconds.
Type exit to close session 1. The Python process ends, and the SQLiteSession is gone with it. The workout logs and the preference statement now live in Supermemory under container_tag="demo_user", separate from the script that wrote them.
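That vanishing act is expected: SQLiteSession defaults to an in-memory SQLite database, so the raw transcript dies with the process and Supermemory is the only store that persists. If you ever want the chat history itself to survive as well, the Agents SDK lets the session point at a database file (trainer.db below is an assumed name):

# Persistent transcript, separate from Supermemory's fact store.
session = SQLiteSession(session_id="trainer-cli", db_path="trainer.db")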
Verifying Recall Between Sessions
Before session 2, confirm that the facts from session 1 are queryable. Open a fresh Python REPL or save this as a short verification script:
from dotenv import load_dotenv
from supermemory import Supermemory

load_dotenv()
client = Supermemory()

prof = client.profile(container_tag="demo_user", q="training")
print(f"static ({len(prof.profile.static)}): {prof.profile.static}")
print(f"dynamic ({len(prof.profile.dynamic)}):")
for fact in prof.profile.dynamic:
    print(f" - {fact}")
print(f"matches ({len(prof.search_results.results)}):")
for r in prof.search_results.results[:5]:
    print(f" - {r['memory']} (similarity={r['similarity']:.3f})")
Expected output captured between the two sessions:
static (0): []
dynamic (5):
- Trains at home instead of a gym
- Performed deadlift: 3 sets of 5 reps at 225.0 lbs
- Performed 5k run in 26 minutes
- Reports no knee pain during bench press
- Performed bench press: 4 sets of 5 reps at 185.0 lbs
matches (5):
- Trains at home instead of a gym (similarity=0.682)
- Performed deadlift: 3 sets of 5 reps at 225.0 lbs (similarity=0.643)
- Performed bench press: 4 sets of 5 reps at 185.0 lbs (similarity=0.631)
- Performed 5k run in 26 minutes (similarity=0.585)
- Reports no knee pain during bench press (similarity=0.585)
Look at what Supermemory's extractor produced. The user said "I only train at home, no gym," once. The extractor turned that into the dynamic fact "Trains at home instead of a gym." The bench press log included a notes field about no knee pain. The extractor split that single log into two dynamic facts: one for the workout, one for the absence of pain. Four logs became five normalised dynamic facts plus five matching memory chunks with similarity scores between 0.585 and 0.682. None of that splitting, normalisation, or matching ran in the trainer's code.
Session 2: Fresh Process, Full Recall
Now start session 2 in a brand-new process. This is a fresh Python interpreter. No shared memory with session 1. No warm cache. Anything the agent recalls comes from Supermemory.
uv run python main.py
Send one message: "What should I do for my workout today?" The agent calls suggest_next_session("today"). The tool prints static=0 dynamic=5 matches=5. The captured run replied with a lower-body session at home (squats, lunges, step-ups). The recommendation lined up with the previous logs because Supermemory's profile told the agent what they were: the recent sessions covered bench press, deadlift, and a 5k run, and the user only trains at home. Both the recent activity and the home-only constraint came back from the same client.profile() call.
Next Steps for Your Supermemory Agent
The demo is one user, two tools, and a CLI. A real version of the trainer extends in three Supermemory-shaped directions before it touches the agent loop.
Scope Memory Per Real User
Replace USER_ID = "demo_user" with computed tags like container_tag="user_sarah" or container_tag=customer_id. Memory between users stays separate because every read passes the tag back. One change in tools.py, no other code moves.
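One way to make the change without letting the LLM supply the ID is a factory that closes over the real user's tag. A sketch, assuming user_id arrives from your login flow or chat platform:

from agents import function_tool
from supermemory import Supermemory

client = Supermemory()

def make_log_workout(user_id: str):
    """Build a log_workout tool bound to one real user's container tag."""
    tag = f"user_{user_id}"  # e.g. "user_sarah", or use a customer ID directly

    @function_tool
    def log_workout(exercise: str, sets: int, reps: int,
                    weight: float, notes: str = "") -> str:
        """Log a completed workout to this user's memory."""
        content = f"Performed {exercise}: {sets} sets of {reps} reps at {weight} lbs."
        if notes:
            content += f" Notes: {notes}"
        client.add(content=content, container_tag=tag)
        return f"Logged {exercise} ({sets}x{reps} @ {weight} lb)."

    return log_workout

The agent is then built per user, e.g. tools=[make_log_workout("sarah"), ...], and suggest_next_session can follow the same pattern.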
Add More Memory-Backed Tools
Deload weeks, PR tracking, weekly mobility prompts. Each one is another @function_tool function that calls client.add() for writes and client.profile() for reads against the same container_tag. The shape stays the same. Only what the agent records and asks for changes.
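For example, a personal-record tracker follows the same shape. A hedged sketch, dropped into tools.py next to the existing tools and reusing the module's client and USER_ID:

@function_tool
def log_personal_record(exercise: str, weight: float, reps: int) -> str:
    """Record a new personal record so future suggestions can reference it.

    Args:
        exercise: Name of the exercise.
        weight: Weight in pounds for the PR set.
        reps: Reps completed at that weight.
    """
    content = f"Set a new personal record: {exercise}, {reps} reps at {weight} lbs."
    response = client.add(content=content, container_tag=USER_ID)
    return f"PR logged: {exercise}, {reps} reps at {weight} lb (status={response.status})."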
Handle Supermemory Failures
Wrap client.add() and client.profile() in try/except supermemory.APIError so transient failures do not crash the agent. Set per-request timeouts if your agent runs in a constrained environment.
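A sketch of that wrapping. The supermemory.APIError class is the one named above; the constructor-level timeout kwarg is an assumption based on the SDK's standard client options, so check your installed version:

import supermemory
from supermemory import Supermemory

# Assumed client-wide timeout in seconds; verify against your SDK version.
client = Supermemory(timeout=10.0)

def safe_add(content: str, container_tag: str) -> str:
    """client.add() with failure handling the agent can relay to the user."""
    try:
        response = client.add(content=content, container_tag=container_tag)
        return f"Logged (status={response.status})."
    except supermemory.APIError as exc:
        print(f"[safe_add] Supermemory error: {exc}")
        return "Memory write failed; this entry was NOT saved. Try again later."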
Beyond the Trainer: Other Use Cases for the Static-and-Dynamic Split
The static-and-dynamic split that powers the workout suggester also fits other domains:
| Domain | Static Facts | Dynamic Facts |
|---|---|---|
| Customer Support Agent | Known issues, account preferences | Open tickets, recent contacts |
| Coding Agent | Preferred languages, frameworks | Current task, recently-touched files |
| Learning Tutor | Primary learning style, weak topics | Last lesson, quiz scores this week |
The agent-loop side of the work is independent of Supermemory and can change later. Front the CLI with Telegram, Discord, or Slack so the user texts a workout and the bot calls Runner.run(). Or swap the framework: Supermemory has a LangChain integration if your stack is already on LangChain agents, and the memory code does not change. Pair it with RAG and the same agent answers questions about both the user and the product.
Frequently Asked Questions
What is Supermemory?
Supermemory is a hosted memory API that stores, indexes, and retrieves long-term context for AI agents so they can remember users across sessions with one API call.
How is Supermemory different from plain vector databases?
Supermemory adds embeddings, indexing, semantic search, and profile-style fact extraction on top of storage, so you get usable memories back with one API call instead of managing vector operations yourself.
Can Supermemory handle both RAG and user memory?
Yes, it supports document-style RAG over files and URLs as well as user-centric memories, letting the same API power product knowledge and personal history in a single agent.
Do I need to manage my own embedding models with Supermemory?
No, Supermemory runs the embedding and retrieval pipeline for you. You send raw text or content and query it later without handling models or indexes yourself.
Is there a free tier for trying Supermemory?
Yes, there is a free plan intended for testing and small projects, with monthly token and query limits so you can integrate and experiment before upgrading.
Need Help with AI Agent Development?
Our experts can help you build memory-backed AI agents, integrate Supermemory with your stack, and design production-ready agent architectures.
