How to Build a Repo Analyzer with Nemotron 3 Super: Complete Step by Step Guide
By Braincuber Team
Published on April 21, 2026
Long-context models unlock a different way to analyze codebases. Instead of breaking a repository into chunks, embedding them, and retrieving relevant pieces, you can pack an entire small-to-medium repo into a single prompt and let the model reason over it end-to-end. This complete tutorial shows you how to build a GitHub repository analyzer using NVIDIA Nemotron 3 Super with a Gradio interface.
What You'll Learn:
- Understanding NVIDIA Nemotron 3 Super and its architecture
- Setting up Nemotron 3 Super via OpenRouter
- Building a repository cloning and filtering pipeline
- Implementing prompt packing for whole-codebase analysis
- Creating task schemas for structured JSON outputs
- Building a Gradio interface for the repo analyzer
What is NVIDIA Nemotron 3 Super?
Nemotron 3 Super is NVIDIA's open reasoning model for agentic and long-context workflows. It uses a hybrid Mamba-Transformer Latent MoE architecture with 120B total parameters and 12B active parameters, includes multi-token prediction (MTP), and is designed for reasoning-heavy tasks such as coding, tool use, and long-context analysis.
NVIDIA positions it specifically for complex multi-agent applications and whole-codebase reasoning. Some core ideas behind Nemotron 3 Super include:
Hybrid Mamba-Transformer
A hybrid backbone for long-sequence efficiency and precise retrieval.
Latent MoE
Routes tokens through more specialists at the same effective cost.
Multi-Token Prediction
MTP improves generation speed and reasoning quality.
1M Token Context
Native 1M-token context window in NVIDIA-supported deployments.
Nemotron 3 Family Overview
NVIDIA's Nemotron 3 family was introduced as an open model family for agentic AI systems, with different sizes aimed at different deployment and reasoning needs.
| Model | Use Case |
|---|---|
| Nemotron 3 Nano | Lighter model for efficient targeted tasks and lower-cost inference |
| Nemotron 3 Super | Middle tier for dense technical reasoning, complex coding, and long-context analysis |
| Nemotron 3 Ultra | Larger tier for heavier reasoning and deployment scenarios |
For this tutorial, we will use OpenRouter's hosted free endpoint since both the full and quantized variants of the model typically require at least an H100 GPU for practical inference. The hosted endpoint currently supports a 262K token context window (instead of the full 1M), which is still sufficient for analyzing medium-sized repositories.
Step 1: Install Dependencies
Before building the repo analyzer, we need a minimal environment that can handle UI rendering, repository processing, and model interaction. Since we are not running Nemotron locally, the setup stays lightweight.
pip install gradio openai tiktoken gitpython orjson pandas
We use the following packages:
- openai - Interact with OpenRouter API
- gradio - Build the interactive UI
- tiktoken - Estimate token usage
- orjson - Fast JSON parsing
- pandas - Render structured results as tables
- gitpython - Clone and work with repositories
Step 2: Initialize the OpenRouter Client
Set up Nemotron 3 Super via OpenRouter and define core constants for repository processing.
import os
import tiktoken
from openai import OpenAI

OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY", "")
MODEL = "nvidia/nemotron-3-super-120b-a12b:free"
WORKDIR = "/content/repo_ui"

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)

# cl100k_base gives a reasonable token estimate for budgeting the packed prompt
enc = tiktoken.get_encoding("cl100k_base")
Important Notes
OpenRouter's free endpoint logs both prompts and outputs, so use it only with public repositories. The available context window is ~262K tokens, so large repositories may need filtering to fit.
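Given the ~262K limit, a small guard can reject oversized repos before any API call is made. This is a minimal sketch; `CONTEXT_BUDGET` and `fits_budget` are illustrative names, not part of the tutorial's exact code:

```python
CONTEXT_BUDGET = 262_144  # OpenRouter's hosted context window for this model

def fits_budget(token_count: int, max_output: int = 5000, margin: int = 2000) -> bool:
    """Check a packed prompt fits, leaving headroom for the system prompt,
    task instructions, and the model's reply."""
    return token_count + max_output + margin <= CONTEXT_BUDGET
```

Calling this right after token estimation lets the UI fail fast with a clear status message instead of burning a doomed request.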
Step 3: Repo Filters and System Prompt
We need two safeguards: file filtering and prompt grounding. File filtering prevents wasting context on irrelevant artifacts, while prompt grounding ensures the model stays anchored to the actual code.
BASE_IGNORE_DIRS = {
    ".git", ".github", "__pycache__", "node_modules", ".venv", "venv",
    "dist", "build", ".mypy_cache", ".pytest_cache", ".idea", ".vscode",
}
IGNORE_SUFFIXES = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf", ".zip", ".gz", ".tar",
    ".mp4", ".mov", ".onnx", ".pt", ".bin", ".parquet", ".feather",
    ".ico", ".svg", ".lock",
}
MAX_FILE_BYTES = 250_000
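These constants can be applied with a single predicate during the repo walk. A minimal sketch (the constants are repeated from above so it runs standalone, and `keep_file` is an illustrative name):

```python
import os

BASE_IGNORE_DIRS = {
    ".git", ".github", "__pycache__", "node_modules", ".venv", "venv",
    "dist", "build", ".mypy_cache", ".pytest_cache", ".idea", ".vscode",
}
IGNORE_SUFFIXES = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf", ".zip", ".gz", ".tar",
    ".mp4", ".mov", ".onnx", ".pt", ".bin", ".parquet", ".feather",
    ".ico", ".svg", ".lock",
}
MAX_FILE_BYTES = 250_000

def keep_file(rel_path: str, size: int = 0) -> bool:
    """Return True if a file is worth packing into the prompt."""
    # Reject anything inside an ignored directory at any depth
    parts = rel_path.replace("\\", "/").split("/")
    if any(p in BASE_IGNORE_DIRS for p in parts):
        return False
    # Reject binary/media assets by extension
    if os.path.splitext(rel_path)[1].lower() in IGNORE_SUFFIXES:
        return False
    # Reject files too large to be worth their token cost
    return size <= MAX_FILE_BYTES
```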
Step 4: Define Task Schemas
Define a set of repository-wide tasks, each paired with a structured JSON schema. This lets the model know exactly what output to produce and allows the app to parse results reliably.
| Task | Description |
|---|---|
| Architecture Overview | Explain architecture, major modules, entrypoints, dependencies |
| Code Duplication | Focus on repeated logic and refactor opportunities |
| Improvement Opportunities | Identify maintainability and modularity gains |
| Testing Gaps | Highlight under-tested areas |
| Onboarding Guide | Explain how a new engineer should approach the codebase |
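One way to wire the table above into code is a task registry that pairs each instruction with the JSON shape the model must return. The schema fields below are illustrative, not the tutorial's exact ones:

```python
import json

# Each task pairs an instruction with the JSON shape the model must emit.
TASKS = {
    "Architecture Overview": {
        "instruction": "Explain the architecture, major modules, entrypoints, and dependencies.",
        "schema": {"summary": "string",
                   "modules": [{"name": "string", "purpose": "string", "depends_on": ["string"]}]},
    },
    "Code Duplication": {
        "instruction": "Find repeated logic and suggest refactoring opportunities.",
        "schema": {"duplicates": [{"files": ["string"], "pattern": "string", "suggestion": "string"}]},
    },
    "Improvement Opportunities": {
        "instruction": "Identify maintainability and modularity improvements.",
        "schema": {"improvements": [{"area": "string", "recommendation": "string"}]},
    },
    "Testing Gaps": {
        "instruction": "Highlight under-tested areas of the codebase.",
        "schema": {"gaps": [{"module": "string", "risk": "string"}]},
    },
    "Onboarding Guide": {
        "instruction": "Explain how a new engineer should approach this codebase.",
        "schema": {"reading_order": ["string"], "key_concepts": ["string"]},
    },
}

def task_instructions(task_name: str) -> str:
    """Render one task as instructions plus the JSON schema the model must follow."""
    t = TASKS[task_name]
    return f"{t['instruction']}\nRespond with valid JSON matching:\n{json.dumps(t['schema'], indent=2)}"
```

Embedding the schema directly in the prompt is what makes the app's output parseable without fragile free-text scraping.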
Step 5: Clone and Pack Repository
Turn a GitHub repository into a single structured prompt. This step handles cloning, filtering, extracting text, and packing everything for the model.
Clone the Repository
Use shallow clone to avoid pulling full history. The output is a local path that the rest of the pipeline operates on.
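A minimal clone helper using gitpython from Step 1. The function and helper names here are illustrative:

```python
import os
import shutil

def normalize_repo_url(url: str) -> str:
    """Accept URLs pasted with or without a trailing slash or .git suffix."""
    url = url.strip().rstrip("/")
    return url if url.endswith(".git") else url + ".git"

def clone_repo(url: str, branch: str = "main", dest: str = "/content/repo_ui/src") -> str:
    """Shallow-clone a single branch so we never pull full history."""
    from git import Repo  # gitpython; imported lazily so the URL helper works without it
    if os.path.exists(dest):
        shutil.rmtree(dest)  # wipe any previous checkout
    Repo.clone_from(normalize_repo_url(url), dest,
                    branch=branch, depth=1, single_branch=True)
    return dest
```

The `depth=1` and `single_branch=True` keyword arguments are forwarded by gitpython to `git clone` as `--depth 1 --single-branch`, which keeps the checkout small and fast.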
Filter Irrelevant Files
Exclude system directories, unsupported file types, and optionally filter out tests, documentation, and notebooks.
Build Packed Prompt
Combine repo tree structure with file contents wrapped in markers. Estimate token usage with tiktoken.
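The packing step can be sketched as follows. The marker format and function names are illustrative; the token estimator prefers tiktoken but falls back to a rough chars/4 heuristic if the encoding is unavailable:

```python
import os

def estimate_tokens(text: str) -> int:
    """Prefer tiktoken; fall back to a ~4-chars-per-token heuristic if unavailable."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        return max(1, len(text) // 4)

def pack_repo(root: str, files: list[str]) -> tuple[str, int]:
    """Join a tree listing and marker-wrapped file bodies into one prompt."""
    parts = ["Repository tree:"] + [f"  {p}" for p in sorted(files)]
    for rel in sorted(files):
        try:
            with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # unreadable files are simply skipped
        parts.append(f"\n===== FILE: {rel} =====\n{text}\n===== END FILE =====")
    prompt = "\n".join(parts)
    return prompt, estimate_tokens(prompt)
```

Putting the tree listing first gives the model a map of the repo before it sees any file bodies, which helps with cross-file reasoning.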
Step 6: Model Inference
Send the packed prompt to Nemotron 3 Super via OpenRouter. Instead of chunking or retrieving parts of the repo, we send the entire filtered codebase in a single request.
resp = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ],
    temperature=0.2,
    max_tokens=5000,
)

# The reply text lives on the first choice
answer = resp.choices[0].message.content
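Because the reply comes back as text, and models often wrap JSON in a ```json fence, it needs defensive parsing before rendering. A sketch using the stdlib `json` module (`orjson.loads` from Step 1 is a drop-in speedup); `parse_model_json` is an illustrative name:

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Extract the first JSON object from a model reply, tolerating ```json fences."""
    # Unwrap a markdown code fence if present
    m = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if m:
        raw = m.group(1)
    # Trim any chatter before the first '{' and after the last '}'
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])
```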
Step 7: Build Gradio Interface
Wrap everything in a Gradio interface with three tabs: Analyze (input), Report (output tables), and Raw JSON (debugging).
with gr.Blocks(title="Nemotron Repo Analyzer") as demo:
    gr.Markdown("# Nemotron Repo Analyzer")
    with gr.Tab("Analyze"):
        repo_url = gr.Textbox(label="GitHub Repo URL")
        branch = gr.Textbox(label="Branch", value="main")
        task = gr.Dropdown(choices=TASK_CHOICES, value="Full Review")
        analyze_btn = gr.Button("Analyze Repo", variant="primary")
        status = gr.Textbox(label="Status", lines=7)
    with gr.Tab("Report"):
        report_md = gr.Markdown()
        modules_df = gr.Dataframe(label="Modules")
    with gr.Tab("Raw JSON"):
        raw_json = gr.JSON(label="Raw Model Output")
    # analyze_repo returns (status text, report markdown, modules table, raw JSON)
    analyze_btn.click(
        analyze_repo,
        inputs=[repo_url, branch, task],
        outputs=[status, report_md, modules_df, raw_json],
    )

demo.launch(debug=True, share=True)
What the App Can Do
The finished app can:
- Clone and preprocess a public GitHub repository
- Estimate token count and check that the packed prompt fits the context budget
- Run one or more repo analysis tasks
- Present structured results through a task-aware UI
Supported analysis tasks include:
- Architecture analysis - identify modules, dependencies, issues
- Code duplication detection - find repeated logic
- Testing gap analysis - highlight under-tested areas
- Onboarding guidance - explain the codebase to new engineers
Frequently Asked Questions
Does this tutorial use the full 1M-token Nemotron 3 Super context window?
No. NVIDIA documents up to 1M tokens for official deployments, but OpenRouter's hosted free endpoint currently exposes a 262,144-token context window, which is still enough for analyzing medium-sized repositories.
Why not run Nemotron 3 Super locally?
Running Nemotron 3 Super locally typically requires expensive hardware like H100 GPUs with significant VRAM. Using OpenRouter's hosted endpoint keeps the setup lightweight and accessible.
What is prompt packing vs RAG?
Instead of chunking, embedding, and retrieving relevant pieces, prompt packing puts the entire repository into a single long-context prompt. This works well for tasks needing a global view of the codebase.
Can I analyze private repositories?
This setup uses OpenRouter's free endpoint which logs prompts and outputs. For private repositories, you should run Nemotron locally or use a private API endpoint.
What programming languages are supported?
Nemotron 3 Super can analyze any programming language. The app filters by file type and content, so it works with Python, JavaScript, Java, C++, and more.
Need Help Building AI-Powered Applications?
Our experts can help you build custom AI applications using NVIDIA models, implement prompt engineering strategies, and deploy production-ready LLM solutions.
