How to Build a Gemini Multimodal Document Agent for AI Hackathons
By Braincuber Team
Published on May 12, 2026
Enterprise teams spend thousands of hours manually extracting data from invoices, contracts, and scanned documents. Gemini 2.5 Flash can read those documents natively, and Google's Agent Development Kit (ADK) gives you a clean framework to turn that capability into a production service. This complete beginner's guide walks you through building a document extraction agent, wrapping it in a FastAPI service, containerizing it, and shipping it to a live Vultr server. By the end you will have a public API endpoint that accepts a PDF, image, or text file and returns structured JSON with every relevant field pulled out of the document.
What You Will Learn:
- How to get a Gemini API key and set up the project structure
- How to define Pydantic response schemas for structured extraction
- How to build three ADK extraction tools for invoices, contracts, and general documents
- How to wire the tools into an ADK agent with InMemoryRunner
- How to build a FastAPI service with file upload and validation
- How to containerize the agent with Docker and docker-compose
- How to provision a Vultr Cloud Compute instance and deploy
- How to test the live API endpoint with curl and inspect responses
What You Will Build
A containerized FastAPI service backed by a Google ADK agent that accepts file uploads (PDF, image, plain text), identifies the document type automatically, calls the appropriate extraction tool (invoice, contract, or general), and returns clean structured JSON. The stack is Python 3.11, Google ADK 1.18, Gemini 2.5 Flash, FastAPI, Docker, and Vultr Cloud Compute.
This kind of document intelligence agent is particularly valuable for teams at 2026 AI hackathons, where shipping a working prototype in 48 hours is the whole game. Whether you are building a fintech tool, a legal document processor, or a contract analyzer, this stack gives you a production-ready foundation within a single sprint.
Prerequisites
Gemini API Key
A Google AI Studio account and API key from aistudio.google.com/app/apikey.
Vultr Account
A Vultr account with a billing method added to deploy the containerized agent to a cloud instance.
Docker Installed
Docker installed locally for building and testing the container image before deployment.
Python 3.10+
Python 3.10 or higher and basic familiarity with FastAPI and async Python.
Step 1: Get Your Gemini API Key
Get Your Gemini API Key
Go to aistudio.google.com/app/apikey, sign in, and click Get API Key. Create a new project if prompted, then copy the key. Keep it in a safe place as you will need it for both local development and the Vultr deployment.
Step 2: Set Up the Project
Create Project Directory and Install Dependencies
Create the project directory, set up a Python virtual environment, and create the requirements.txt with google-adk==1.18.0, fastapi, uvicorn, python-multipart, pydantic, and python-dotenv. Install with pip and create a .env file for the API key.
mkdir gemini-multimodal-document-agent
cd gemini-multimodal-document-agent
python3.10 -m venv .venv
source .venv/bin/activate

# requirements.txt
google-adk==1.18.0
fastapi>=0.111.0
uvicorn[standard]>=0.29.0
python-multipart>=0.0.9
pydantic>=2.7.0
python-dotenv>=1.0.0

pip install -r requirements.txt

# .env
GOOGLE_API_KEY=your-gemini-api-key
Step 3: Define the Response Schemas
Create app/schemas.py with a Pydantic AnalysisResponse model that defines the shape of the JSON returned by the API endpoint. The response includes the document type, filename, extracted data dictionary, a human-readable summary, and optional processing notes.
from pydantic import BaseModel
from typing import Optional, Any

class AnalysisResponse(BaseModel):
    document_type: str
    filename: str
    extracted_data: dict[str, Any]
    summary: str
    processing_notes: Optional[str] = None
Step 4: Build the Extraction Tools
The ADK agent uses function-calling tools to return structured data. Each tool corresponds to a document type. When the agent reads a document, it decides which tool to call and passes every extracted field as typed arguments. The tool writes those arguments into the session state, which we read back after the agent finishes.
The Invoice Extraction Tool
save_invoice_extraction accepts fields like vendor name, invoice number, dates, total amount, currency, line items, payment terms, and billing address. All list parameters are typed as list[str] because the Gemini API requires concrete generic types in tool schemas.
from typing import Optional

from google.adk.tools import ToolContext

def save_invoice_extraction(
    tool_context: ToolContext,
    vendor_name: Optional[str] = None,
    invoice_number: Optional[str] = None,
    invoice_date: Optional[str] = None,
    due_date: Optional[str] = None,
    total_amount: Optional[str] = None,
    currency: Optional[str] = None,
    subtotal: Optional[str] = None,
    tax_amount: Optional[str] = None,
    line_items: Optional[list[str]] = None,
    payment_terms: Optional[str] = None,
    billing_address: Optional[str] = None,
    notes: Optional[str] = None,
) -> str:
    """Save structured data extracted from an invoice document."""
    tool_context.state["extraction_result"] = {
        "document_type": "invoice",
        "extracted_data": {
            "vendor_name": vendor_name,
            "invoice_number": invoice_number,
            "invoice_date": invoice_date,
            "due_date": due_date,
            "total_amount": total_amount,
            "currency": currency,
            "subtotal": subtotal,
            "tax_amount": tax_amount,
            "line_items": line_items or [],
            "payment_terms": payment_terms,
            "billing_address": billing_address,
            "notes": notes,
        },
    }
    return "Invoice extraction saved."
The Contract and General Extraction Tools
save_contract_extraction handles contracts, agreements, NDAs, and MOUs with fields for parties, effective date, expiration date, contract type, key obligations, termination conditions, governing law, and jurisdiction. save_general_extraction handles everything else including reports, images, and plain text with fields for document title, summary, key entities, dates mentioned, key figures, and main topics.
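Following the same pattern as the invoice tool, a sketch of save_contract_extraction might look like the code below. The field names are chosen to match the list above; the try/except fallback only exists so the sketch can run without google-adk installed.

```python
from typing import Optional

try:
    from google.adk.tools import ToolContext
except ImportError:  # fallback so this sketch runs without google-adk installed
    ToolContext = object


def save_contract_extraction(
    tool_context: ToolContext,
    parties: Optional[list[str]] = None,
    effective_date: Optional[str] = None,
    expiration_date: Optional[str] = None,
    contract_type: Optional[str] = None,
    key_obligations: Optional[list[str]] = None,
    termination_conditions: Optional[str] = None,
    governing_law: Optional[str] = None,
    jurisdiction: Optional[str] = None,
) -> str:
    """Save structured data extracted from a contract, agreement, NDA, or MOU."""
    tool_context.state["extraction_result"] = {
        "document_type": "contract",
        "extracted_data": {
            "parties": parties or [],
            "effective_date": effective_date,
            "expiration_date": expiration_date,
            "contract_type": contract_type,
            "key_obligations": key_obligations or [],
            "termination_conditions": termination_conditions,
            "governing_law": governing_law,
            "jurisdiction": jurisdiction,
        },
    }
    return "Contract extraction saved."
```

save_general_extraction follows the identical structure with its own field set.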
Important: Typed List Parameters
All list parameters must be typed as list[str] rather than just list. The Gemini API generates a JSON schema from your tool's type annotations. An untyped list produces a schema without an items field, which the API rejects with a 400 INVALID_ARGUMENT error.
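A minimal before/after pair (hypothetical tool signatures) illustrates the difference:

```python
from typing import Optional

# Rejected: bare `list` generates an array schema with no `items` field,
# which the Gemini API refuses with 400 INVALID_ARGUMENT.
def save_topics_untyped(topics: Optional[list] = None) -> str:
    return "saved"

# Accepted: `list[str]` generates a complete array-of-strings schema.
def save_topics_typed(topics: Optional[list[str]] = None) -> str:
    return "saved"
```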
Step 5: Build the ADK Agent
Create app/agent.py with the agent instruction, runner setup, and the analyze_document function. The agent uses InMemoryRunner for session management, event routing, and LLM calls. types.Part.from_bytes passes raw file bytes directly to Gemini, which reads PDFs, images, and text natively without any preprocessing.
import uuid

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types

from app.tools import (
    save_contract_extraction,
    save_general_extraction,
    save_invoice_extraction,
)

INSTRUCTION = """
You are an enterprise document intelligence agent. Your job is to analyze
uploaded documents and extract all relevant structured data from them.
When you receive a document, follow these steps:
1. Identify the document type: invoice, contract, or general.
2. Read the document carefully and extract every relevant field.
3. Call exactly ONE of the following tools with the extracted data.
Rules:
- Extract ALL fields you can find. If a field is missing, pass null.
- For line_items in invoices, format as: "Description | Qty | Unit Price | Total"
- For scanned images or photos, read all visible text before extracting.
- Always call one of the save tools. Never respond without calling a tool.
- Be precise with amounts, dates, and names. Do not infer missing values.
"""
def create_runner() -> InMemoryRunner:
    agent = Agent(
        model="gemini-2.5-flash",
        name="document_agent",
        instruction=INSTRUCTION,
        tools=[
            save_invoice_extraction,
            save_contract_extraction,
            save_general_extraction,
        ],
    )
    return InMemoryRunner(agent=agent, app_name="document_agent")
async def analyze_document(
    runner: InMemoryRunner, file_bytes: bytes,
    mime_type: str, filename: str,
) -> dict:
    user_id = "api_user"
    session_id = str(uuid.uuid4())
    await runner.session_service.create_session(
        app_name="document_agent",
        user_id=user_id, session_id=session_id,
    )
    content = types.Content(
        role="user",
        parts=[
            types.Part.from_bytes(data=file_bytes, mime_type=mime_type),
            types.Part.from_text(text=f"Analyze this document: {filename}"),
        ],
    )
    async for _ in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=content,
    ):
        pass
    session = await runner.session_service.get_session(
        app_name="document_agent", user_id=user_id, session_id=session_id,
    )
    result = session.state.get("extraction_result")
    if not result:
        return {"document_type": "unknown", "extracted_data": {},
                "summary": "Could not extract structured data from this document."}
    return result
Step 6: Build the FastAPI Service
Create app/main.py with the FastAPI application, file validation, and the /analyze endpoint. The service supports PDF, JPEG, PNG, WebP, plain text, and Markdown files up to 20 MB. The runner is created once at startup and reused across requests via the lifespan context manager.
import os
from contextlib import asynccontextmanager

from dotenv import load_dotenv
from fastapi import FastAPI, File, HTTPException, UploadFile

from app.agent import analyze_document, create_runner
from app.schemas import AnalysisResponse

load_dotenv()

SUPPORTED_MIME_TYPES = {
    "application/pdf", "image/jpeg", "image/jpg",
    "image/png", "image/webp", "text/plain", "text/markdown",
}
MAX_FILE_SIZE_MB = 20

def _build_summary(document_type: str, data: dict, filename: str) -> str:
    """Build a one-line human-readable summary from the extracted fields."""
    filled = sum(1 for v in data.values() if v)
    return f"Extracted {filled} fields from {document_type or 'unknown'} document '{filename}'."

@asynccontextmanager
async def lifespan(app: FastAPI):
    if not os.getenv("GOOGLE_API_KEY"):
        raise RuntimeError("GOOGLE_API_KEY is not set.")
    app.state.runner = create_runner()
    yield

app = FastAPI(title="Document Intelligence Agent", lifespan=lifespan)

@app.get("/health")
async def health():
    return {"status": "ok"}

@app.post("/analyze", response_model=AnalysisResponse)
async def analyze(file: UploadFile = File(...)):
    if file.content_type not in SUPPORTED_MIME_TYPES:
        raise HTTPException(status_code=415, detail="Unsupported file type.")
    file_bytes = await file.read()
    if len(file_bytes) > MAX_FILE_SIZE_MB * 1024 * 1024:
        raise HTTPException(status_code=413, detail="File too large. Max 20MB.")
    if len(file_bytes) == 0:
        raise HTTPException(status_code=400, detail="Uploaded file is empty.")
    result = await analyze_document(
        runner=app.state.runner, file_bytes=file_bytes,
        mime_type=file.content_type, filename=file.filename or "document",
    )
    return AnalysisResponse(
        document_type=result.get("document_type", "unknown"),
        filename=file.filename or "document",
        extracted_data=result.get("extracted_data", {}),
        summary=_build_summary(result.get("document_type", ""),
                               result.get("extracted_data", {}),
                               file.filename or "document"),
    )
Test it locally before deploying by running uvicorn app.main:app --host 0.0.0.0 --port 8000 and sending a test file with curl. FastAPI also provides an interactive docs UI at http://localhost:8000/docs where you can upload files and inspect responses without writing any curl commands.
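Collected as a copy-paste sketch, the local smoke test looks like this (assuming a sample_invoice.txt in the current directory):

```shell
# Terminal 1: start the service
uvicorn app.main:app --host 0.0.0.0 --port 8000

# Terminal 2: send a test document
curl -X POST http://localhost:8000/analyze \
  -F "file=@sample_invoice.txt;type=text/plain"
```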
Step 7: Containerize with Docker
Create Dockerfile and docker-compose.yml
Create a Dockerfile based on python:3.11-slim that copies requirements.txt, installs dependencies, copies the app directory, and runs uvicorn on port 8000. Then create docker-compose.yml that builds the image, maps port 8000, loads the .env file, and sets restart to unless-stopped. Build and verify locally with docker compose up --build.
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

# docker-compose.yml
services:
  app:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env
    restart: unless-stopped
Step 8: Provision a Vultr Instance
Deploy a Vultr Cloud Compute Instance
Add a billing method to your Vultr account first. Then log in to console.vultr.com, click Quick Deploy then Instances. Select Shared CPU, location Amsterdam, image Ubuntu 24.04 LTS, plan vc2-1c-1gb ($5/month), hostname document-agent. Click Deploy and wait about 60 seconds for the status to change from Installing to Running. The IP address and root password appear on the instance overview page.
Step 9: Deploy the Agent on Vultr
SSH Into the Server and Install Docker
SSH into the server using the IP and root password from the dashboard. Install Docker with curl -fsSL https://get.docker.com | sh and enable it with systemctl. Back on your local machine, copy the project to the server with scp -r ./gemini-multimodal-document-agent root@YOUR_VULTR_IP:/opt/document-agent. On the server, create the .env file with your API key and run docker compose up -d --build. The first build takes 2 to 3 minutes.
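The deployment steps above, collected as a copy-paste sketch (replace YOUR_VULTR_IP and the API key placeholder with your own values):

```shell
# On the server
ssh root@YOUR_VULTR_IP
curl -fsSL https://get.docker.com | sh
systemctl enable --now docker

# Back on your local machine
scp -r ./gemini-multimodal-document-agent root@YOUR_VULTR_IP:/opt/document-agent

# On the server again
cd /opt/document-agent
echo "GOOGLE_API_KEY=your-gemini-api-key" > .env
docker compose up -d --build
```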
Step 10: Test the Live API
Send Documents to the Public Endpoint
From your local machine, send a real document with curl -X POST http://YOUR_VULTR_IP:8000/analyze -F "file=@sample_invoice.txt;type=text/plain". The agent identifies the document type and returns structured JSON. Try a contract file to see the agent switch tools automatically and return parties, obligations, termination conditions, and governing law instead.
Expected response from the invoice endpoint:
{
  "document_type": "invoice",
  "filename": "sample_invoice.txt",
  "extracted_data": {
    "vendor_name": "Acme Solutions Ltd.",
    "invoice_number": "INV-2026-0042",
    "invoice_date": "2026-05-05",
    "due_date": "2026-06-04",
    "total_amount": "$6,032.00",
    "currency": "USD",
    "line_items": [
      "API Integration Services | 1 | $2,500.00 | $2,500.00",
      "Cloud Infrastructure Setup | 1 | $1,200.00 | $1,200.00"
    ],
    "payment_terms": "Net 30"
  },
  "summary": "Invoice #INV-2026-0042 from Acme Solutions Ltd. for USD $6,032.00."
}
What Is Happening Under the Hood
When a file hits the /analyze endpoint, here is the execution path: FastAPI reads the file bytes and validates the MIME type. analyze_document creates a new ADK session and sends the file to Gemini via InMemoryRunner. The agent reads the document using Gemini's native multimodal understanding. Based on what it reads, the agent calls one of the three extraction tools. The tool writes structured data into the session state. After the agent finishes, we read that state and return it as JSON.
The key design decision is that the tools do not receive the document. Gemini has already read it from the multimodal message context. The tools only receive the extracted fields as typed arguments, which forces the model to commit to specific values rather than returning freeform text.
Next Steps
Add a Firewall
On Vultr, create a Firewall Group under Network to restrict port 8000 to trusted IPs, or put Nginx in front as a reverse proxy with SSL termination.
Handle Larger Files
For files over 20MB, swap Part.from_bytes for the Gemini Files API and pass a file URI. Gemini supports PDFs up to 1,000 pages.
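A sketch of that swap, assuming the google-genai SDK's Files API (imports are kept inside the function so the snippet stands alone; verify the exact upload signature against the current SDK docs):

```python
def build_large_file_part(path: str, mime_type: str = "application/pdf"):
    """Upload a large file via the Gemini Files API and return a Part
    that references its URI instead of inlining the raw bytes."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GOOGLE_API_KEY from the environment
    uploaded = client.files.upload(file=path)
    return types.Part.from_uri(file_uri=uploaded.uri, mime_type=mime_type)
```

In analyze_document, this Part would replace the types.Part.from_bytes call for oversized uploads.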
Add More Document Types
Define a new tool function, for example save_purchase_order_extraction, add it to the agent, and update the instruction to describe when to call it.
Persist Results
Swap InMemorySessionService for a database-backed session service and store extraction results in Postgres or Supabase for audit trails and historical queries.
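A sketch of that swap, assuming ADK's DatabaseSessionService and a Postgres connection string (imports kept inside the function so the snippet stands alone; the db_url is a placeholder):

```python
def create_persistent_runner(agent):
    """Build a runner whose sessions live in Postgres instead of memory,
    so extraction results survive restarts and can be queried later."""
    from google.adk.runners import Runner
    from google.adk.sessions import DatabaseSessionService

    session_service = DatabaseSessionService(
        db_url="postgresql://user:password@localhost:5432/document_agent"  # placeholder
    )
    return Runner(
        agent=agent,
        app_name="document_agent",
        session_service=session_service,
    )
```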
Frequently Asked Questions
What file types does the /analyze endpoint accept?
PDF, JPEG, PNG, WebP, plain text (.txt), and Markdown (.md) files up to 20 MB. For larger files, swap Part.from_bytes for the Gemini Files API and pass a file URI instead.
Why must list parameters in ADK tools be typed as list[str]?
The Gemini API generates a JSON schema from your tool annotations. An untyped list produces a schema without an items field, which the API rejects with a 400 INVALID_ARGUMENT error. Use list[str] for concrete generic types.
Can I use this stack in an AI hackathon project?
Yes, that is the point. The Docker Compose setup deploys to any cloud instance in one command, and the ADK tool-calling pattern makes it easy to extend with new document types or swap Gemini for another model.
How do I add support for a new document type like purchase orders?
Create a new extraction function in app/tools.py following the same pattern with typed parameters and ToolContext, then add it to the tools list in app/agent.py and update the instruction.
Do I need a GPU or expensive hardware to run this?
No, all the AI processing happens on Google's servers via the Gemini API. The Vultr instance only needs the $5/month plan with 1 vCPU and 1 GB RAM to run FastAPI and Docker.
Need Help with AI Agent Development?
Our experts can help you build multimodal document agents, deploy with Docker on cloud infrastructure, and design production-ready extraction pipelines for your AI hackathon projects.
