How to Build a Repo Analyzer with Nemotron 3 Super: Complete Step by Step Guide
By Braincuber Team
Published on April 21, 2026
Long-context models unlock a different way to analyze codebases. Instead of breaking a repository into chunks, embedding them, and retrieving relevant pieces, you can pack an entire small-to-medium repo into a single prompt and let the model reason over it end-to-end. This complete tutorial shows you how to build a GitHub repository analyzer using NVIDIA Nemotron 3 Super with a Gradio interface.
What You'll Learn:
- Understanding NVIDIA Nemotron 3 Super and its architecture
- Setting up Nemotron 3 Super via OpenRouter
- Building a repository cloning and filtering pipeline
- Implementing prompt packing for whole-codebase analysis
- Creating task schemas for structured JSON outputs
- Building a Gradio interface for the repo analyzer
What is NVIDIA Nemotron 3 Super?
Nemotron 3 Super is NVIDIA's open reasoning model for agentic and long-context workflows. It uses a hybrid Mamba-Transformer Latent MoE architecture with 120B total parameters and 12B active parameters, includes multi-token prediction (MTP), and is designed for reasoning-heavy tasks such as coding, tool use, and long-context analysis.
NVIDIA positions it specifically for complex multi-agent applications and whole-codebase reasoning. Some core ideas behind Nemotron 3 Super include:
Hybrid Mamba-Transformer
A hybrid backbone for long-sequence efficiency and precise retrieval.
Latent MoE
Routes tokens through more specialists at the same effective cost.
Multi-Token Prediction
MTP improves generation speed and reasoning quality.
1M Token Context
Native 1M-token context window in NVIDIA-supported deployments.
Nemotron 3 Family Overview
NVIDIA's Nemotron 3 family was introduced as an open model family for agentic AI systems, with different sizes aimed at different deployment and reasoning needs.
| Model | Use Case |
|---|---|
| Nemotron 3 Nano | Lighter model for efficient targeted tasks and lower-cost inference |
| Nemotron 3 Super | Middle tier for dense technical reasoning, complex coding, and long-context analysis |
| Nemotron 3 Ultra | Larger tier for heavier reasoning and deployment scenarios |
For this tutorial, we will use OpenRouter's hosted free endpoint since both the full and quantized variants of the model typically require at least an H100 GPU for practical inference. The hosted endpoint currently supports a 262K token context window (instead of the full 1M), which is still sufficient for analyzing medium-sized repositories.
Step 1: Install Dependencies
Before building the repo analyzer, we need a minimal environment that can handle UI rendering, repository processing, and model interaction. Since we are not running Nemotron locally, the setup stays lightweight.
pip install gradio openai tiktoken gitpython orjson pandas
We use the following packages:
- openai - Interact with OpenRouter API
- gradio - Build the interactive UI
- tiktoken - Estimate token usage
- orjson - Fast JSON parsing
- pandas - Render structured results as tables
- gitpython - Clone and work with repositories
Step 2: Initialize the OpenRouter Client
Set up Nemotron 3 Super via OpenRouter and define core constants for repository processing.
import os
import tiktoken
from openai import OpenAI

OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY", "")
MODEL = "nvidia/nemotron-3-super-120b-a12b:free"
WORKDIR = "/content/repo_ui"

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)

# cl100k_base gives a reasonable token estimate for budgeting the packed prompt
enc = tiktoken.get_encoding("cl100k_base")
Important Notes
OpenRouter's free endpoint logs both prompts and outputs, so use it only with public repositories. The available context window is ~262K tokens, so large repositories may need filtering to fit.
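Given the ~262K limit, a small guard can reject oversized repos before any API call is made. This is a minimal sketch; `CONTEXT_BUDGET` and `fits_budget` are illustrative names, not part of the tutorial's exact code:

```python
CONTEXT_BUDGET = 262_144  # OpenRouter's hosted context window for this model

def fits_budget(token_count: int, max_output: int = 5000, margin: int = 2000) -> bool:
    """Check a packed prompt fits, leaving headroom for the system prompt,
    task instructions, and the model's reply."""
    return token_count + max_output + margin <= CONTEXT_BUDGET
```

Calling this right after token estimation lets the UI fail fast with a clear status message instead of burning a doomed request.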
Step 3: Repo Filters and System Prompt
We need two safeguards: file filtering and prompt grounding. File filtering prevents wasting context on irrelevant artifacts, while prompt grounding ensures the model stays anchored to the actual code.
BASE_IGNORE_DIRS = {
    ".git", ".github", "__pycache__", "node_modules", ".venv", "venv",
    "dist", "build", ".mypy_cache", ".pytest_cache", ".idea", ".vscode",
}
IGNORE_SUFFIXES = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf", ".zip", ".gz", ".tar",
    ".mp4", ".mov", ".onnx", ".pt", ".bin", ".parquet", ".feather",
    ".ico", ".svg", ".lock",
}
MAX_FILE_BYTES = 250_000
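These constants can be applied with a single predicate during the repo walk. A minimal sketch (the constants are repeated from above so it runs standalone, and `keep_file` is an illustrative name):

```python
import os

BASE_IGNORE_DIRS = {
    ".git", ".github", "__pycache__", "node_modules", ".venv", "venv",
    "dist", "build", ".mypy_cache", ".pytest_cache", ".idea", ".vscode",
}
IGNORE_SUFFIXES = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf", ".zip", ".gz", ".tar",
    ".mp4", ".mov", ".onnx", ".pt", ".bin", ".parquet", ".feather",
    ".ico", ".svg", ".lock",
}
MAX_FILE_BYTES = 250_000

def keep_file(rel_path: str, size: int = 0) -> bool:
    """Return True if a file is worth packing into the prompt."""
    # Reject anything inside an ignored directory at any depth
    parts = rel_path.replace("\\", "/").split("/")
    if any(p in BASE_IGNORE_DIRS for p in parts):
        return False
    # Reject binary/media assets by extension
    if os.path.splitext(rel_path)[1].lower() in IGNORE_SUFFIXES:
        return False
    # Reject files too large to be worth their token cost
    return size <= MAX_FILE_BYTES
```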
Step 4: Define Task Schemas
Define a set of repository-wide tasks, each paired with a structured JSON schema. This lets the model know exactly what output to produce and allows the app to parse results reliably.
| Task | Description |
|---|---|
| Architecture Overview | Explain architecture, major modules, entrypoints, dependencies |
| Code Duplication | Focus on repeated logic and refactor opportunities |
| Improvement Opportunities | Identify maintainability and modularity gains |
| Testing Gaps | Highlight under-tested areas |
| Onboarding Guide | Explain how a new engineer should approach the codebase |
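One way to wire the table above into code is a task registry that pairs each instruction with the JSON shape the model must return. The schema fields below are illustrative, not the tutorial's exact ones:

```python
import json

# Each task pairs an instruction with the JSON shape the model must emit.
TASKS = {
    "Architecture Overview": {
        "instruction": "Explain the architecture, major modules, entrypoints, and dependencies.",
        "schema": {"summary": "string",
                   "modules": [{"name": "string", "purpose": "string", "depends_on": ["string"]}]},
    },
    "Code Duplication": {
        "instruction": "Find repeated logic and suggest refactoring opportunities.",
        "schema": {"duplicates": [{"files": ["string"], "pattern": "string", "suggestion": "string"}]},
    },
    "Improvement Opportunities": {
        "instruction": "Identify maintainability and modularity improvements.",
        "schema": {"improvements": [{"area": "string", "recommendation": "string"}]},
    },
    "Testing Gaps": {
        "instruction": "Highlight under-tested areas of the codebase.",
        "schema": {"gaps": [{"module": "string", "risk": "string"}]},
    },
    "Onboarding Guide": {
        "instruction": "Explain how a new engineer should approach this codebase.",
        "schema": {"reading_order": ["string"], "key_concepts": ["string"]},
    },
}

def task_instructions(task_name: str) -> str:
    """Render one task as instructions plus the JSON schema the model must follow."""
    t = TASKS[task_name]
    return f"{t['instruction']}\nRespond with valid JSON matching:\n{json.dumps(t['schema'], indent=2)}"
```

Embedding the schema directly in the prompt is what makes the app's output parseable without fragile free-text scraping.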
Step 5: Clone and Pack Repository
Turn a GitHub repository into a single structured prompt. This step handles cloning, filtering, extracting text, and packing everything for the model.
Clone the Repository
Use shallow clone to avoid pulling full history. The output is a local path that the rest of the pipeline operates on.
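A minimal clone helper using gitpython from Step 1. The function and helper names here are illustrative:

```python
import os
import shutil

def normalize_repo_url(url: str) -> str:
    """Accept URLs pasted with or without a trailing slash or .git suffix."""
    url = url.strip().rstrip("/")
    return url if url.endswith(".git") else url + ".git"

def clone_repo(url: str, branch: str = "main", dest: str = "/content/repo_ui/src") -> str:
    """Shallow-clone a single branch so we never pull full history."""
    from git import Repo  # gitpython; imported lazily so the URL helper works without it
    if os.path.exists(dest):
        shutil.rmtree(dest)  # wipe any previous checkout
    Repo.clone_from(normalize_repo_url(url), dest,
                    branch=branch, depth=1, single_branch=True)
    return dest
```

The `depth=1` and `single_branch=True` keyword arguments are forwarded by gitpython to `git clone` as `--depth 1 --single-branch`, which keeps the checkout small and fast.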
Filter Irrelevant Files
Exclude system directories, unsupported file types, and optionally filter out tests, documentation, and notebooks.
Build Packed Prompt
Combine repo tree structure with file contents wrapped in markers. Estimate token usage with tiktoken.
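The packing step can be sketched as follows. The marker format and function names are illustrative; the token estimator prefers tiktoken but falls back to a rough chars/4 heuristic if the encoding is unavailable:

```python
import os

def estimate_tokens(text: str) -> int:
    """Prefer tiktoken; fall back to a ~4-chars-per-token heuristic if unavailable."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        return max(1, len(text) // 4)

def pack_repo(root: str, files: list[str]) -> tuple[str, int]:
    """Join a tree listing and marker-wrapped file bodies into one prompt."""
    parts = ["Repository tree:"] + [f"  {p}" for p in sorted(files)]
    for rel in sorted(files):
        try:
            with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # unreadable files are simply skipped
        parts.append(f"\n===== FILE: {rel} =====\n{text}\n===== END FILE =====")
    prompt = "\n".join(parts)
    return prompt, estimate_tokens(prompt)
```

Putting the tree listing first gives the model a map of the repo before it sees any file bodies, which helps with cross-file reasoning.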
Step 6: Model Inference
Send the packed prompt to Nemotron 3 Super via OpenRouter. Instead of chunking or retrieving parts of the repo, we send the entire filtered codebase in a single request.
resp = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ],
    temperature=0.2,
    max_tokens=5000,
)

# The reply text lives on the first choice
answer = resp.choices[0].message.content
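Because the reply comes back as text, and models often wrap JSON in a ```json fence, it needs defensive parsing before rendering. A sketch using the stdlib `json` module (`orjson.loads` from Step 1 is a drop-in speedup); `parse_model_json` is an illustrative name:

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Extract the first JSON object from a model reply, tolerating ```json fences."""
    # Unwrap a markdown code fence if present
    m = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if m:
        raw = m.group(1)
    # Trim any chatter before the first '{' and after the last '}'
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])
```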
Step 7: Build Gradio Interface
Wrap everything in a Gradio interface with three tabs: Analyze (input), Report (output tables), and Raw JSON (debugging).
with gr.Blocks(title="Nemotron Repo Analyzer") as demo:
    gr.Markdown("# Nemotron Repo Analyzer")
    with gr.Tab("Analyze"):
        repo_url = gr.Textbox(label="GitHub Repo URL")
        branch = gr.Textbox(label="Branch", value="main")
        task = gr.Dropdown(choices=TASK_CHOICES, value="Full Review")
        analyze_btn = gr.Button("Analyze Repo", variant="primary")
        status = gr.Textbox(label="Status", lines=7)
    with gr.Tab("Report"):
        report_md = gr.Markdown()
        modules_df = gr.Dataframe(label="Modules")
    with gr.Tab("Raw JSON"):
        raw_json = gr.JSON(label="Raw Model Output")
    # analyze_repo returns (status text, report markdown, modules table, raw JSON)
    analyze_btn.click(
        analyze_repo,
        inputs=[repo_url, branch, task],
        outputs=[status, report_md, modules_df, raw_json],
    )

demo.launch(debug=True, share=True)
What the App Can Do
The finished app can:
- Clone and preprocess a public GitHub repository
- Estimate token count and check that the packed prompt fits the context budget
- Run one or more repo analysis tasks
- Present structured results through a task-aware UI
Supported analysis tasks include:
- Architecture analysis - identify modules, dependencies, issues
- Code duplication detection - find repeated logic
- Testing gap analysis - highlight under-tested areas
- Onboarding guidance - explain the codebase to new engineers
Frequently Asked Questions
Does this tutorial use the full 1M-token Nemotron 3 Super context window?
No. NVIDIA documents up to 1M tokens for official deployments, but OpenRouter's hosted free endpoint currently exposes a 262,144-token context window, which is still enough for analyzing medium-sized repositories.
Why not run Nemotron 3 Super locally?
Running Nemotron 3 Super locally typically requires expensive hardware like H100 GPUs with significant VRAM. Using OpenRouter's hosted endpoint keeps the setup lightweight and accessible.
What is prompt packing vs RAG?
Instead of chunking, embedding, and retrieving relevant pieces, prompt packing puts the entire repository into a single long-context prompt. This works well for tasks needing a global view of the codebase.
Can I analyze private repositories?
This setup uses OpenRouter's free endpoint which logs prompts and outputs. For private repositories, you should run Nemotron locally or use a private API endpoint.
What programming languages are supported?
Nemotron 3 Super can analyze any programming language. The app filters by file type and content, so it works with Python, JavaScript, Java, C++, and more.
Need Help Building AI-Powered Applications?
Our experts can help you build custom AI applications using NVIDIA models, implement prompt engineering strategies, and deploy production-ready LLM solutions.
