How to Build an AI Job Search Assistant with Kimi K2.6 API
By Braincuber Team
Published on May 8, 2026
Job hunting is overwhelming when you have to check ten different job boards, read thirty listings, and manually decide which ones actually fit your profile. This complete tutorial shows you how to build JobFit AI, an intelligent agent that reads your CV, searches live job postings, scrapes job pages, and generates a ranked fit report in under a minute. You will use Kimi K2.6, Moonshot AIs latest open-source agentic model, paired with Olostep for live web search and the OpenAI Agents SDK for tool orchestration.
What You Will Learn:
- How to set up the Kimi K2.6 API with an OpenAI-compatible client
- How to integrate Olostep for live job search and page scraping
- How to build a tool-using agent with the OpenAI Agents SDK
- How to extract text from PDF CVs using pypdf
- How to create a Gradio web interface for your agent
- How to structure agent instructions for reliable job-fit analysis
What is Kimi K2.6?
Kimi K2.6 is Moonshot AIs latest open-weight agentic model, released in April 2026. It is built on a Mixture-of-Experts (MoE) architecture with roughly 1 trillion total parameters, activating 32 billion per forward pass. The model is optimized for coding, tool use, and long-horizon agent tasks, making it ideal for building practical AI applications that need reasoning, tool calling, and structured outputs.
Kimi K2.6 supports a 262,144-token context window and scores 80.2 on SWE-Bench Verified, placing it just behind closed models like Claude Opus 4.6 and Gemini 3.1 Pro. On BrowseComp, it scores 83.2, rising to 86.3 with Agent Swarm mode. The model is available through Kimi.com, the Kimi app, Kimi Code, and the API, making it accessible for both experimentation and production deployment.
For this tutorial, you use Kimi K2.6 through Moonshot AIs OpenAI-compatible API endpoint. This means any tool that works with the OpenAI API such as the OpenAI Agents SDK can work with Kimi K2.6 by simply changing the base URL and model name.
Prerequisites
| Requirement | Details |
|---|---|
| Python | 3.11 or higher |
| Kimi API Key | At least $5 credit in your Moonshot AI account |
| Olostep API Key | Free account at olostep.com (Starter plan $9/mo with 5,000 requests) |
| CV File | Your resume in PDF format |
| Basic Python Knowledge | Familiarity with Python, APIs, and async programming |
Tool-Use Optimized
Kimi K2.6 excels at calling external tools and APIs. It can search the web, scrape pages, and process results in a single agentic loop without losing context.
Long Context Window
The 256K token context window handles entire CVs, multiple job listings, and detailed analysis all in one session without needing chunking or summarization.
OpenAI-Compatible API
Kimi K2.6 works with any OpenAI-compatible client library. You can use the OpenAI Python SDK, LangChain, or the OpenAI Agents SDK with minimal configuration changes.
Cost-Effective Agentic AI
Input tokens at $0.95/1M (cache miss) and output at $4.00/1M tokens. Third-party providers offer even lower rates starting around $0.60/1M input and $2.80/1M output.
Step 1: Set Up Your Python Environment
Create a new project folder and install the required packages. Open your terminal and run:
mkdir JobFit-AI
cd JobFit-AI
pip install gradio openai pypdf openai-agents
Package overview: gradio creates the web interface, openai connects to OpenAI-compatible APIs, pypdf extracts text from PDF files, and openai-agents provides the framework for building tool-using AI agents.
Next, create accounts and generate your API keys. Sign up for a free Olostep account at olostep.com and generate an API key from the dashboard. The Starter plan at $9/month includes 5,000 requests, enough to test and deploy your app. Then go to the Kimi platform at platform.kimi.ai, add at least $5 credit, and generate your API key.
Set your API keys as environment variables. The MOONSHOT_API_KEY accesses the Kimi K2.6 API, while OLOSTEP_API_KEY powers the web search and scraping tools.
Define the Project Configuration
Launch Jupyter Notebook and add the required imports and project configuration. The KIMI_MODEL and KIMI_BASE_URL values tell the app to use Kimi K2.6 through Moonshot AIs OpenAI-compatible endpoint. The Olostep URLs are used for live job search and page scraping.
import json
import os
import requests
from agents import Agent, AsyncOpenAI, ModelSettings, OpenAIChatCompletionsModel, RunConfig, Runner, function_tool, set_tracing_disabled
from IPython.display import Markdown, display
from pypdf import PdfReader
KIMI_MODEL = "kimi-k2.6"
KIMI_BASE_URL = "https://api.moonshot.ai/v1"
OLOSTEP_SEARCH_URL = "https://api.olostep.com/v1/searches"
OLOSTEP_SCRAPE_URL = "https://api.olostep.com/v1/scrapes"
MAX_AGENT_TURNS = 25
cv_path = "abid-resume.pdf"
preferences = """
Remote data science, AI writer, or technical writer roles in AI, machine learning, data science, or cloud.
Prefer technical content, tutorials, developer education, research writing, and AI product storytelling.
""".strip()
set_tracing_disabled(True)
The cv_path variable points to your resume PDF. Make sure the file is saved in the same project folder or update the path. The preferences variable tells the agent what types of jobs to search for. Customize this based on your target role, industry, location, and seniority level. Tracing is disabled because the OpenAI Agents SDK tracing feature is designed for OpenAIs backend and causes issues with third-party model providers.
Connect to Kimi K2.6 API via OpenAI-Compatible Client
Create an AsyncOpenAI client pointing to Moonshots API endpoint, then wrap it in an OpenAIChatCompletionsModel so the OpenAI Agents SDK can use Kimi K2.6 as its underlying model. Configure ModelSettings with tool_choice="auto", parallel_tool_calls=True, and disable thinking mode for cleaner structured output.
kimi_client = AsyncOpenAI(
api_key=os.environ["MOONSHOT_API_KEY"],
base_url=KIMI_BASE_URL,
)
kimi_model = OpenAIChatCompletionsModel(
model=KIMI_MODEL,
openai_client=kimi_client,
)
model_settings = ModelSettings(
tool_choice="auto",
parallel_tool_calls=True,
extra_body={"thinking": {"type": "disabled"}},
)
run_config = RunConfig(
workflow_name="JobFit AI Kimi Search",
tracing_disabled=True,
)
Thinking Mode
Kimi K2.6 supports both a thinking mode (extended reasoning) and instant mode (faster responses). This tutorial disables thinking with extra_body={"thinking": {"type": "disabled"}} to keep the output cleaner and better suited for structured job-fit reports. Enable it if you need more detailed reasoning from the model.
Extract CV Text and Define Agent Instructions
Use PdfReader from pypdf to load your CV and extract text from each page. The [:12000] limit keeps the CV short enough to fit inside the agent prompt while providing enough context about experience, skills, and preferences.
reader = PdfReader(cv_path)
cv_text = "
".join(page.extract_text() or "" for page in reader.pages)[:12000]
print(f"Loaded {len(cv_text):,} characters from {cv_path}")
AGENT_INSTRUCTIONS = """
You are JobFit AI, a focused job-search agent.
Tool plan:
- Call search_jobs exactly once with limit 8.
- Read at most 3 direct job pages with read_job_page.
- After reading up to 3 pages, stop using tools and write the report.
- Search again only if the first search returns zero usable jobs.
- Avoid broad search pages, expired jobs, and LinkedIn unless no better source exists.
Report rules:
- Keep the report simple, clear, and practical.
- Use short bullets.
- Do not use em dashes.
- Do not use contractions.
- Do not add text before or after the report.
- End after the final Job Notes entry.
- Include at least 5 ranked jobs if the search results contain at least 5 usable jobs.
- If only 3 pages were scraped, use backup jobs from search results when they look usable.
- Every job must include a clickable Markdown link.
- Every job must have one apply decision: Apply, Maybe, or Do not apply.
"""
RUN_PROMPT_TEMPLATE = """
Find current job postings for this candidate and rank them by fit.
Keep the run simple:
- one search
- up to three page reads
- final report
The final report must follow AGENT_INSTRUCTIONS exactly.
Use simple wording. Do not use em dashes. Do not use contractions.
Candidate CV:
{cv_text}
Preferences:
{preferences}
""".strip()
The agent instructions are critical for reliable behavior. They limit the workflow to one job search, up to three job page reads, and a fixed Markdown report structure. The report rules require short bullets, clickable links, fit scores, and a clear apply decision for each role. Without these guardrails, the agent may search excessively or produce inconsistent output formats.
Add Live Web Search and Page Scraping Tools with Olostep
Create two function tools decorated with @function_tool that the agent can call. The first tool searches for job listings and returns compact JSON results. The second tool scrapes a specific job URL and returns the page content in Markdown format, limited to 8,000 characters to stay within processing limits.
@function_tool
def search_jobs(query: str, limit: int = 8) -> str:
"""Search the web for job listings and return compact JSON results."""
response = requests.post(
OLOSTEP_SEARCH_URL,
headers={"Authorization": f"Bearer {os.environ['OLOSTEP_API_KEY']}", "Content-Type": "application/json"},
json={"query": query},
timeout=60,
)
response.raise_for_status()
links = response.json().get("result", {}).get("links", [])[:limit]
results = [
{"title": item.get("title", "Untitled"), "url": item.get("url"), "description": item.get("description", "")}
for item in links
if isinstance(item, dict) and item.get("url")
]
return json.dumps(results, ensure_ascii=False)
@function_tool
def read_job_page(url: str) -> str:
"""Scrape one job listing URL and return markdown text."""
response = requests.post(
OLOSTEP_SCRAPE_URL,
headers={"Authorization": f"Bearer {os.environ['OLOSTEP_API_KEY']}", "Content-Type": "application/json"},
json={"url_to_scrape": url, "formats": ["markdown"]},
timeout=120,
)
response.raise_for_status()
markdown = response.json().get("result", {}).get("markdown_content") or ""
return markdown[:8000]
Create the JobFit AI Agent and Run the Workflow
Instantiate the Agent with the Kimi model, model settings, Olostep tools, and agent instructions. Use Runner.run_streamed() to start the agent workflow with streaming events. The stream lets you see when the agent searches for jobs, reads pages, and generates the final report. The loop prints progress updates to show tool calls and output sizes.
agent = Agent(
name="JobFit AI",
model=kimi_model,
model_settings=model_settings,
tools=[search_jobs, read_job_page],
instructions=AGENT_INSTRUCTIONS,
)
prompt = RUN_PROMPT_TEMPLATE.format(cv_text=cv_text, preferences=preferences)
print("Starting agent run")
result = Runner.run_streamed(
agent,
prompt,
max_turns=MAX_AGENT_TURNS,
run_config=run_config,
)
async for event in result.stream_events():
if event.type == "agent_updated_stream_event":
print(f"Agent: {event.new_agent.name}")
elif event.type == "run_item_stream_event":
item = event.item
if event.name == "tool_called":
raw = item.raw_item
tool_name = raw.get("name") if isinstance(raw, dict) else getattr(raw, "name", "tool")
arguments = raw.get("arguments") if isinstance(raw, dict) else getattr(raw, "arguments", "")
arguments = str(arguments).replace(chr(10), " ")[:500]
print(f"Tool call: {tool_name}")
if arguments:
print(f"Parameters: {arguments}")
elif event.name == "tool_output":
print(f"Tool output: {len(str(item.output)):,} chars")
elif event.name == "message_output_created":
print("Final message ready")
report = result.final_output
print("Run complete")
print(f"Model responses: {len(result.raw_responses)}")
print(f"Run items: {len(result.new_items)}")
print(f"Final output: {len(str(report)):,} chars")
Turn the Agent Into a Gradio Web App
After testing the workflow in a Jupyter notebook, you can package it into a simple Gradio web application. Create an app.py file that contains all the code from the notebook, wrapped with a Gradio interface. The app provides:
CV PDF Upload
Users upload their resume in PDF format. The app extracts text using pypdf and passes it to the agent as context.
Job Preferences Input
A text box lets users describe their target role type, industry, location, seniority level, and preferred topics.
Live Progress Log
A hidden progress log shows each step: reading CV, extracting text, calling tools, and receiving outputs.
Ranked Fit Report
The final Markdown report shows the best match, ranked jobs table, fit scores, apply decisions, concerns, and application angles.
Run the app with python app.py and open the displayed local URL (typically http://127.0.0.1:7860/) in your browser. Upload a CV, enter your job preferences, and click Generate JobFit Report. The app runs the full agent workflow and displays the report in under a minute.
Kimi K2.6 Model Specs and Pricing
| Specification | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE), ~1T total params, 32B active |
| Context Window | 262,144 tokens (256K) |
| SWE-Bench Verified | 80.2% |
| BrowseComp | 83.2% (86.3% with Agent Swarm) |
| Input Price (direct) | $0.95/1M tokens (cache miss), $0.16/1M (cache hit) |
| Output Price (direct) | $4.00/1M tokens |
| Third-Party Price | From $0.60/1M input, $2.80/1M output |
| Release Date | April 2026 |
Cost Optimization Tip
Kimi K2.6 offers a substantial cache hit discount ($0.16 vs $0.95 per 1M input tokens). For production deployments, structure your prompts to maximize cache hits by keeping system messages and instructions consistent across calls. Third-party providers like Together AI and Fireworks often offer lower rates than the direct API.
Performance Tuning and Agent Behavior
The JobFit AI agent is configured with a maximum of 25 turns. During testing, when the agent was allowed up to 25 turns, it searched and scraped multiple pages in depth but took around five minutes. To balance quality and speed, the workflow is restricted to one search and up to three page reads, generating a report in under a minute.
You can improve the quality of the job report by increasing the number of allowed steps, search results, or page reads. Increasing the agent limit to 30 turns with more page reads produces a deeper report with more roles and stronger recommendations, at the cost of longer runtime and higher API usage.
Agent Turn Limits
The agent may exceed the specified turn limit if it is in the middle of a tool call when the limit is reached. The OpenAI Agents SDK allows the current tool call to complete before terminating. Set MAX_AGENT_TURNS to 30-35 for production use to allow for more thorough job analysis.
Frequently Asked Questions
What makes Kimi K2.6 different from other open-source LLMs?
Kimi K2.6 uses a Mixture-of-Experts architecture with 1 trillion total parameters but only activates 32 billion per forward pass, making it efficient for deployment. It scores 80.2 on SWE-Bench Verified and 83.2 on BrowseComp, placing it among the top open-source models for agentic coding and web browsing tasks, behind only closed models like Claude Opus 4.6 and Gemini 3.1 Pro.
Can I use the OpenAI Agents SDK with non-OpenAI models?
Yes, the OpenAI Agents SDK supports any OpenAI-compatible API endpoint. You wrap the third-party model using OpenAIChatCompletionsModel with a custom client pointing to the providers base URL. This works with Kimi K2.6, DeepSeek, Qwen, and other models that expose an OpenAI-compatible API.
How do I get a Kimi API key for K2.6?
Go to platform.kimi.ai, create an account, add at least $5 in credits, and generate an API key from the dashboard. The key is used as the MOONSHOT_API_KEY environment variable with the base URL https://api.moonshot.ai/v1. The K2.6 model identifier is kimi-k2.6.
What is thinking mode in Kimi K2.6 and when should I use it?
Thinking mode enables extended chain-of-thought reasoning where the model shows its internal reasoning process before producing the final answer. Use it for complex problem-solving, code generation, and multi-step analysis. Disable it for faster responses when you need structured, formatted output like the job-fit report in this tutorial.
How reliable is the Olostep API for job search and scraping?
Olostep provides reliable web search and scraping with 5,000 requests per month on the Starter plan at $9. The search API returns structured results with titles, URLs, and descriptions. The scrape API converts pages to Markdown format. For production use, consider adding error handling and retry logic for occasional timeouts or rate limits.
Need Help Building AI Agents?
Our experts can help you design and deploy custom AI agents using Kimi K2.6, OpenAI Agents SDK, and other cutting-edge models for your specific business needs.
