How to Deploy an AI App on AMD MI300X as HuggingFace Space
By Braincuber Team
Published on May 12, 2026
The AMD Developer Cloud gets you to a live vLLM API endpoint running on AMD MI300X hardware in under 30 minutes. That is your backend sorted. But a raw API endpoint is not a demo: judges cannot click on it, teammates cannot try it, and it cannot win the HuggingFace Category Prize. This beginner-friendly guide picks up from that point. You will build a Gradio chat interface that connects to your vLLM endpoint, push it to HuggingFace as a Space, and end up with a live, publicly accessible demo that anyone can use without touching your GPU.
What You Will Learn:
- How to open port 8000 on your AMD droplet for external access
- How to build a 30-line Gradio chat app with streaming responses
- How to structure the three key files: app.py, requirements.txt, and README.md
- How to test your chat interface locally before pushing to production
- How to create a HuggingFace Space and push your code
- How to configure Space secrets for secure endpoint access
- How to debug common issues like connection refused and timeouts
- How to tag your Space for hackathon visibility and judging
Prerequisites
- Running vLLM endpoint: a working vLLM endpoint on AMD MI300X with its public IP and port (e.g. http://129.x.x.x:8000/v1).
- HuggingFace account: a HuggingFace account with access to create Spaces under the hackathon organisation or your personal namespace.
- Python 3.10+: Python 3.10 or higher installed locally for testing the Gradio app before deployment.
- SSH access to droplet: SSH access to your AMD droplet to open firewall port 8000 and verify the vLLM endpoint status.
Step 1: Open Port 8000 on Your AMD Droplet
By default, the AMD Developer Cloud droplet blocks all ports except 22, 80, and 443. Your Gradio Space needs to reach port 8000 to talk to vLLM. SSH into your droplet and allow port 8000 through the firewall.
Allow Port 8000 and Verify the Endpoint
Run ufw allow 8000 on the droplet to open the port. Then verify the endpoint is reachable from outside with curl -s http://YOUR_DROPLET_IP:8000/v1/models. If you see a JSON response listing your loaded model, the endpoint is publicly accessible.
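If you prefer to script the check, here is a minimal Python sketch of the same verification; it assumes the requests library is installed and that YOUR_DROPLET_IP is replaced with your droplet's public IP:

```python
# Minimal sketch: verify the vLLM endpoint is reachable from outside the droplet.
# Assumes `requests` is installed and YOUR_DROPLET_IP is your droplet's public IP.
import requests

resp = requests.get("http://YOUR_DROPLET_IP:8000/v1/models", timeout=10)
resp.raise_for_status()  # raises if the port is blocked or vLLM is down
print([m["id"] for m in resp.json()["data"]])  # the model ID(s) vLLM loaded
```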
Step 2: Create the Project Files
Create a new folder called amd-gradio-demo on your local machine. You need exactly three files: app.py, requirements.txt, and README.md.
The Chat Application (app.py)
This is the entire chat application. About 30 lines of Python. The VLLM_BASE_URL and MODEL_NAME are read from environment variables so you do not hardcode your endpoint. You configure them via HuggingFace Space secrets instead. The OpenAI client works directly with vLLM because vLLM exposes an OpenAI-compatible API at /v1. The chat function is a generator that yields partial responses as they stream in, giving you the typing effect in the UI.
```python
import os

import gradio as gr
from openai import OpenAI

# Endpoint and model come from environment variables so nothing is
# hardcoded; on HuggingFace you set them as Space secrets.
VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")

# vLLM exposes an OpenAI-compatible API at /v1, so the standard client works.
# The api_key is a placeholder because vLLM does not check it by default.
client = OpenAI(base_url=VLLM_BASE_URL, api_key="not-required")


def chat(message, history):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    # Gradio may pass history as dicts (messages format) or (user, bot) pairs.
    for item in history:
        if isinstance(item, dict):
            messages.append({"role": item["role"], "content": item["content"]})
        else:
            messages.append({"role": "user", "content": item[0]})
            if item[1]:
                messages.append({"role": "assistant", "content": item[1]})
    messages.append({"role": "user", "content": message})

    stream = client.chat.completions.create(
        model=MODEL_NAME, messages=messages, stream=True,
    )
    # Yield the accumulated text after each chunk for the typing effect.
    partial = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            partial += delta
            yield partial


demo = gr.ChatInterface(
    fn=chat,
    title="AMD MI300X AI Demo",
    description="Chat with an LLM running on AMD MI300X GPU via vLLM.",
    examples=["Explain what AMD MI300X is.", "Write a Python hello world."],
    cache_examples=False,
)

if __name__ == "__main__":
    demo.launch()
```
requirements.txt and README.md
The requirements file only needs openai>=1.0.0. You do not list Gradio here because HuggingFace Spaces installs it automatically based on the sdk_version in your README. The README contains a YAML block that HuggingFace reads to configure your Space, including the SDK, its version, the app file, and tags like amd-hackathon-2026 for hackathon discoverability.
requirements.txt:

```text
openai>=1.0.0
```

README.md:

```markdown
---
title: AMD HuggingFace Demo
emoji: 🚀
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
tags:
  - amd
  - amd-hackathon-2026
  - vllm
  - gradio
---

# AMD MI300X AI Demo

A Gradio chat interface connected to a vLLM endpoint running on AMD MI300X GPU.

## Setup

Add these as Space secrets (Settings > Variables and secrets):

| Secret | Value |
|---|---|
| VLLM_BASE_URL | Your AMD vLLM endpoint, e.g. http://your-ip:8000/v1 |
| MODEL_NAME | Model ID loaded by vLLM, e.g. Qwen/Qwen2.5-1.5B-Instruct |
```
Step 3: Test Locally Before Pushing
Test Locally with Your AMD Endpoint
Install the dependencies in a Python 3.10+ virtual environment: pip install "gradio>=5.0.0" openai. Run the app with your AMD endpoint: VLLM_BASE_URL="http://YOUR_DROPLET_IP:8000/v1" MODEL_NAME="Qwen/Qwen2.5-1.5B-Instruct" python app.py. Open http://127.0.0.1:7860 in your browser and send a message. Testing locally first saves you a round-trip of pushing to the Space, waiting for the build, and debugging in the logs.
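Before launching the UI, you can also fire a one-off request at the endpoint with the same OpenAI client the app uses. A quick smoke-test sketch, assuming the same environment variables are set:

```python
# Smoke test (a sketch): confirm the endpoint answers a chat request,
# using the same env vars that app.py reads.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["VLLM_BASE_URL"],  # e.g. http://YOUR_DROPLET_IP:8000/v1
    api_key="not-required",                # vLLM does not check the key by default
)
resp = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```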
Common Problems at Local Test Stage
- Connection refused: vLLM is not running inside the container. SSH into the droplet and check with docker exec rocm ps aux | grep vllm.
- Timeout: port 8000 is still blocked. Run ufw allow 8000 on the droplet.
- Model not found: MODEL_NAME does not match the model ID vLLM loaded. Check the exact ID with curl -s http://YOUR_DROPLET_IP:8000/v1/models.
Step 4: Create the HuggingFace Space
Create a New Space on HuggingFace
Go to huggingface.co/new-space. Select the owner (for hackathons use the organisation like lablab-ai-amd-developer-hackathon). Choose a Space name. Set SDK to Gradio. Set Visibility to Public (required for hackathon prize eligibility) or Private during development. Once created, you will have an empty git repository at huggingface.co/spaces/org/your-space-name.
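If you prefer scripting to the web form, the Space can also be created with the huggingface_hub library. A sketch, assuming you are authenticated (e.g. via huggingface-cli login) and substitute your own repo ID:

```python
# A sketch: create the Space programmatically instead of via huggingface.co/new-space.
# Assumes you are authenticated, e.g. with `huggingface-cli login`.
from huggingface_hub import create_repo

create_repo(
    "org/your-space-name",  # replace with your org or username and Space name
    repo_type="space",
    space_sdk="gradio",
    private=False,          # public is required for hackathon prize eligibility
)
```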
Step 5: Push Your Files to the Space
Upload Files via huggingface_hub or Git
Push your files using the huggingface_hub Python library or via git. With the library, iterate through app.py, requirements.txt, and README.md, calling api.upload_file() for each. Or initialise a git repo, add the remote origin pointing to your Space URL, commit, and push to main. The Space starts building immediately after the push.
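Here is what the library route looks like; a sketch, assuming you are authenticated and running from the amd-gradio-demo folder:

```python
# A sketch of the huggingface_hub route: upload the three files to the Space.
# Assumes you are authenticated and the files sit in the current directory.
from huggingface_hub import HfApi

api = HfApi()
for filename in ["app.py", "requirements.txt", "README.md"]:
    api.upload_file(
        path_or_fileobj=filename,
        path_in_repo=filename,
        repo_id="org/your-space-name",  # replace with your Space
        repo_type="space",
    )
```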
Step 6: Add Your Endpoint as Space Secrets
Configure Space Secrets for Secure Access
Your app reads VLLM_BASE_URL and MODEL_NAME from environment variables. Go to your Space, open Settings, then Variables and secrets, and click New secret. Add VLLM_BASE_URL with your AMD droplet endpoint and MODEL_NAME with the model ID loaded by vLLM. Add them as Secrets (not Variables) so they remain private. The Space restarts automatically once you save. If you prefer to script this step, see the sketch after the table.
| Secret Name | Example Value |
|---|---|
| VLLM_BASE_URL | http://129.x.x.x:8000/v1 |
| MODEL_NAME | Qwen/Qwen2.5-1.5B-Instruct |
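As an alternative to the Settings UI, huggingface_hub can set Space secrets too. A sketch, assuming you are authenticated and have substituted your own values:

```python
# A sketch: add the two secrets programmatically instead of via the Settings UI.
# Assumes you are authenticated and replace the placeholder values with your own.
from huggingface_hub import HfApi

api = HfApi()
api.add_space_secret("org/your-space-name", "VLLM_BASE_URL", "http://129.x.x.x:8000/v1")
api.add_space_secret("org/your-space-name", "MODEL_NAME", "Qwen/Qwen2.5-1.5B-Instruct")
```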
Step 7: Verify the Live Space
Open Your Space URL and Send a Message
Open your Space URL at huggingface.co/spaces/org/your-space-name and send a message. You should see streaming responses from the model running on your AMD MI300X. If the Space shows a build error, check the Logs tab. The most common issues are wrong sdk_version in README.md, missing secrets, or port 8000 still blocked on the droplet.
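You can also exercise the live Space from a script with the gradio_client package (not part of the app's requirements). A sketch, assuming the default ChatInterface API route /chat and your own Space ID:

```python
# A sketch: call the live Space programmatically with gradio_client.
# Assumes the default ChatInterface endpoint name "/chat" and your own Space ID.
from gradio_client import Client

client = Client("org/your-space-name")  # replace with your Space
reply = client.predict("Explain what AMD MI300X is.", api_name="/chat")
print(reply)
```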
Hackathon Submission Tip
If you are submitting to the AMD Developer Hackathon, make sure your Space is public and tagged with amd-hackathon-2026 before the deadline. The HuggingFace Category Prize goes to the Space with the most likes, so share your link early. The complete demo Space is available at huggingface.co/spaces/lablab-ai-amd-developer-hackathon/amd-huggingface-demo.
What You Built
- Live chat application: a 30-line Gradio chat app with streaming responses, connected to a vLLM endpoint on AMD MI300X hardware and deployed as a public HuggingFace Space.
- Secure configuration: the endpoint URL and model name are stored as HuggingFace Space secrets, keeping your GPU infrastructure private while the demo remains publicly accessible.
- Hackathon-ready demo: a public, shareable URL that judges and teammates can interact with directly, tagged with amd-hackathon-2026 for full hackathon discoverability.
- Streaming responses: the chat function is a Python generator that yields partial responses as they stream from vLLM, giving users a natural typing effect in the Gradio UI.
Frequently Asked Questions
Do I need to list Gradio in requirements.txt?
No. HuggingFace Spaces installs Gradio automatically based on the sdk_version field in your README.md YAML block. You only need to list additional dependencies like the OpenAI Python client.
Why does the OpenAI client work with vLLM?
vLLM exposes an OpenAI-compatible API at the /v1 endpoint, so the standard OpenAI Python library works as a drop-in client without any modifications. Set base_url to your vLLM endpoint and api_key to any placeholder value.
My HuggingFace Space shows a build error. What should I check?
Check the Logs tab. The most common issues are wrong sdk_version in README.md, missing VLLM_BASE_URL secret, port 8000 still blocked on the droplet, or vLLM not running inside the container.
How do I make my Space eligible for the HuggingFace Category Prize?
Make your Space public, tag it with amd-hackathon-2026 in the README.md tags section, and submit it before the hackathon deadline. The prize goes to the Space with the most likes, so share your link widely.
Can I deploy this without an AMD MI300X endpoint?
The Gradio app itself works with any OpenAI-compatible API endpoint. You can swap the VLLM_BASE_URL for any provider that exposes an OpenAI-compatible chat completions API, but the tutorial specifically covers AMD MI300X deployment via the AMD Developer Cloud.
Need Help with AI Deployment?
Our experts can help you deploy AI applications on AMD GPUs, configure HuggingFace Spaces, and build production-ready demos for hackathons and enterprise use.
