Gradio 5: Build AI Demo Interfaces in Minutes
April 12, 2026, 5:30 p.m.

Gradio 5 has turned the once‑tedious process of building AI demos into a matter of minutes. Whether you’re a researcher showcasing a new model or a developer prototyping a product, the library now offers a more intuitive API, a richer component library, and seamless deployment options. In this guide we’ll walk through the core concepts, build two end‑to‑end demos, and explore real‑world scenarios where Gradio shines.

What’s New in Gradio 5?

Gradio 5 introduces a modular Blocks architecture that lets you compose interfaces from reusable building blocks. The new Theme system provides dark mode, custom colors, and font choices out of the box. Additionally, the library now supports stateful interactions, streaming outputs, and built‑in analytics for tracking usage.

Another highlight is the Live Share feature, which generates a public URL with a single line of code. This eliminates the need for complex tunneling tools like ngrok. Finally, Gradio 5 tightens its integration with Hugging Face Hub, making it effortless to push demos directly to Spaces.

Getting Started: Installation & Setup

Before diving into code, ensure you have Python 3.9 or newer. Install Gradio via pip and optionally add torch or tensorflow depending on your model backend.

pip install "gradio>=5.0,<6.0"
# Optional: install a deep learning framework
pip install torch torchvision  # for PyTorch models
# or
pip install tensorflow        # for TensorFlow models

After installation, verify the version to confirm you’re running Gradio 5.

import gradio as gr
print(gr.__version__)  # should output 5.x.y

With the environment ready, let’s build our first demo.

Demo 1: Image Classification with a Pre‑trained ResNet

This example shows how to wrap a PyTorch model in a Gradio interface using the new Blocks API. We’ll load a ResNet‑50 model from torchvision, preprocess the input, and display the top‑5 predictions.
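Before wiring anything to the UI, it helps to see the post‑processing in isolation. This framework‑free sketch (the function names are ours, purely for illustration) mirrors what the softmax and top‑k calls in the PyTorch code compute:

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=5):
    # Pair each label with its probability and keep the k most likely
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

The real demo does the same thing with `torch.nn.functional.softmax` and `torch.topk`, just on tensors instead of lists.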

Step‑by‑step implementation

  1. Import required libraries and load the model.
  2. Define a prediction function that handles preprocessing and post‑processing.
  3. Construct the UI with an Image input and a Label output.
  4. Launch the demo locally or share it publicly.
import torch
import torchvision.transforms as T
from torchvision import models
import gradio as gr

# 1️⃣ Load pre‑trained ResNet‑50 (the weights enum replaces the deprecated pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# 2️⃣ Pre‑processing pipeline
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# 3️⃣ Prediction function
def classify_image(img):
    # img is a PIL.Image object
    tensor = preprocess(img).unsqueeze(0)  # add batch dimension
    with torch.no_grad():
        logits = model(tensor)
    probs = torch.nn.functional.softmax(logits, dim=1)[0]
    # Load ImageNet class names
    with open("imagenet_classes.txt") as f:
        classes = [line.strip() for line in f.readlines()]
    top5_prob, top5_idx = torch.topk(probs, 5)
    return {classes[idx]: float(prob) for idx, prob in zip(top5_idx, top5_prob)}

# 4️⃣ Build the Gradio interface using Blocks
with gr.Blocks(theme="default") as demo:
    gr.Markdown("# ResNet‑50 Image Classifier")
    with gr.Row():
        img_input = gr.Image(label="Upload an image", type="pil")
        label_output = gr.Label(num_top_classes=5, label="Top‑5 predictions")
    img_input.upload(fn=classify_image, inputs=img_input, outputs=label_output)  # .upload avoids firing with None when the image is cleared

# 5️⃣ Launch
demo.launch(share=True)

The interface presents a clean two‑column layout: upload an image on the left, and instantly see the model’s top‑5 guesses on the right. By setting share=True, Gradio generates a public HTTPS URL that anyone can access without installing anything.

Pro tip: Store the ImageNet class list in a cached file or embed it directly in the script to avoid I/O overhead on every request.
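One way to apply this tip (a sketch; the helper name is invented) is to memoize the file read so it happens only on the first request:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_imagenet_classes(path="imagenet_classes.txt"):
    # The file is read on the first call only; every later call hits the cache
    with open(path) as f:
        return tuple(line.strip() for line in f)
```

Inside `classify_image`, replace the `open(...)` block with `classes = load_imagenet_classes()`.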

Demo 2: Text Generation with a Hugging Face Transformer

Next, we’ll create a chat‑style text generation demo using the gpt2 model from the Hugging Face Hub. Gradio 5’s streaming output makes the response appear character‑by‑character, mimicking a real conversation.

Preparing the model

Install the transformers library if you haven’t already. We’ll use the pipeline API for simplicity.

pip install transformers

Now, set up the pipeline and a small helper to stream tokens.

from transformers import pipeline
import gradio as gr

# Load the GPT‑2 text‑generation pipeline
generator = pipeline("text-generation", model="gpt2")

def generate_text(prompt, max_length=100):
    # The pipeline returns a list of dictionaries; we take the first result
    result = generator(prompt, max_new_tokens=max_length, do_sample=True, temperature=0.7)[0]["generated_text"]
    # Yield the result progressively for streaming effect
    for i in range(1, len(result) + 1):
        yield result[:i]
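Yielding one character per step sends a great many small updates to the browser. A chunked variant (a framework‑agnostic sketch; the function name is ours) trades a little smoothness for far fewer round trips:

```python
def stream_in_chunks(text, chunk_size=8):
    # Yield progressively longer prefixes, several characters at a time,
    # so the UI updates less often than with per-character streaming
    for end in range(chunk_size, len(text) + chunk_size, chunk_size):
        yield text[:end]
```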

Building the streaming UI

When an event handler is a generator, Gradio streams each yielded value to the output component automatically; no extra flag is required. We’ll pair the Textbox with a simple Button to trigger generation.

with gr.Blocks(theme="default") as chat_demo:
    gr.Markdown("## GPT‑2 Text Generator")
    with gr.Row():
        prompt_box = gr.Textbox(label="Prompt", placeholder="Enter a sentence...", lines=2)
        generate_btn = gr.Button("Generate")
    output_box = gr.Textbox(label="Generated Text", lines=10, interactive=False)

    # Bind the button click to the generator function
    generate_btn.click(
        fn=generate_text,
        inputs=prompt_box,
        outputs=output_box
    )

chat_demo.launch(share=True)

The result is a sleek chat‑like interface where the model’s response unfolds in real time. Users can experiment with different prompts, expose max_length through a gr.Slider, or swap in a larger model (e.g., EleutherAI/gpt-neo-2.7B) with minimal code changes.

Pro tip: For production use, consider wrapping the underlying model in torch.compile (PyTorch 2.x) to shave off latency; measure before and after, since compilation helps most on repeated, fixed‑shape workloads.

Real‑World Use Cases

Gradio’s flexibility makes it a go‑to solution across many domains. Below are three common scenarios where developers leverage Gradio 5 to accelerate delivery.

  • Education: Instructors build interactive notebooks that let students experiment with NLP or computer vision models without writing boilerplate code.
  • Healthcare: Radiologists can upload DICOM images to a Gradio demo that runs a segmentation model, instantly visualizing organ boundaries.
  • Business Intelligence: Sales teams use a Gradio interface to query a fine‑tuned BERT model for product recommendation explanations, turning black‑box predictions into actionable insights.

Because the demos are web‑based, they can be embedded in learning management systems, internal dashboards, or even shared on social media for broader outreach.

Customizing the Look & Feel

Gradio 5 ships with a theme argument that accepts either a preset name (such as "default" or "soft") or a gr.themes.Theme object. Theme constructors take high‑level options like the primary hue and font, while .set() overrides individual CSS variables; dark mode is available on any theme by appending ?__theme=dark to the URL.

custom_theme = gr.themes.Soft(
    primary_hue="blue",
    font=gr.themes.GoogleFont("Inter"),
).set(
    body_background_fill="#F0F4F8",
)

with gr.Blocks(theme=custom_theme) as themed_demo:
    # UI components go here
    ...

For more granular control, pass raw CSS to gr.Blocks via its css argument. This is useful when aligning the demo with corporate branding guidelines.

custom_css = """
.gradio-container { max-width: 800px; margin: auto; }
"""

with gr.Blocks(css=custom_css) as styled_demo:
    # rest of the UI
    ...

Advanced Features: State, Streaming, and Batching

Gradio 5 introduces a State component that persists data across interactions. This is essential for building multi‑turn chatbots, iterative image editors, or any workflow where the output of one step feeds into the next.

with gr.Blocks() as multi_step_demo:
    state = gr.State([])  # holds a list of previous messages
    user_input = gr.Textbox(label="Your message")
    send_btn = gr.Button("Send")
    chat_history = gr.Chatbot(label="Conversation")

    def update_chat(message, history):
        # Simulate a model reply (replace with actual inference)
        reply = "Echo: " + message
        # The Chatbot component expects a list of (user, bot) message pairs
        history.append((message, reply))
        return history, history

    send_btn.click(
        fn=update_chat,
        inputs=[user_input, state],
        outputs=[chat_history, state]
    )

Streaming isn’t limited to text; you can stream audio waveforms or video frames as they are generated. Pair this with the new Audio and Video components to create real‑time speech synthesis demos.
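As a sketch of the audio case (the tone generator is invented for illustration, and numpy is assumed), a handler can yield (sample_rate, samples) tuples, which is the chunk format Gradio’s Audio component consumes when streaming:

```python
import numpy as np

SAMPLE_RATE = 16000

def stream_sine(duration_s=1.0, freq=440.0, chunk_s=0.25):
    # Yield (sample_rate, samples) tuples; each chunk covers chunk_s seconds
    n_chunks = int(duration_s / chunk_s)
    samples_per_chunk = int(SAMPLE_RATE * chunk_s)
    for i in range(n_chunks):
        t = (np.arange(samples_per_chunk) + i * samples_per_chunk) / SAMPLE_RATE
        yield (SAMPLE_RATE, np.sin(2 * np.pi * freq * t).astype(np.float32))
```

Wired to a gr.Audio output with streaming enabled, each yielded chunk plays as soon as it is produced.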

Batching is another performance booster. By passing batch=True to an event listener, Gradio accumulates concurrent requests and processes them together, reducing GPU overhead for large models.
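The calling convention is easiest to see on a toy function (the names here are invented): each input argument arrives as a list of values, and the return value wraps one list per output component:

```python
def batch_word_counts(prompts):
    # With batch=True, Gradio passes each input as a list of values...
    counts = [len(p.split()) for p in prompts]
    # ...and expects a list of output lists, one inner list per output component
    return [counts]
```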

def batch_predict(images):
    # With batch=True, the argument arrives as a list of PIL images
    tensors = torch.stack([preprocess(img) for img in images])
    with torch.no_grad():
        logits = model(tensors)
    probs = torch.nn.functional.softmax(logits, dim=1)
    top_idx = torch.argmax(probs, dim=1)
    # Return a list of lists: one inner list per output component
    # (classes is the ImageNet label list loaded earlier)
    return [[classes[i] for i in top_idx]]

image_input = gr.Image(label="Upload", type="pil")
output = gr.Label(num_top_classes=1)
image_input.change(fn=batch_predict, inputs=image_input, outputs=output,
                   batch=True, max_batch_size=8)

Deploying Gradio Demos at Scale

Once your demo is polished, you have several deployment pathways.

  • Gradio Share URL: Ideal for quick prototypes; the tunnel lives only as long as your local process, and the link itself expires after about a week.
  • Hugging Face Spaces: Push the entire project to a public or private Space with a single git push. Gradio handles the container build automatically.
  • Docker: For on‑premises or cloud‑native environments, wrap your app in a lightweight Docker image.
  • Managed containers: Deploy the same Docker image to autoscaling services such as Google Cloud Run or AWS App Runner when you outgrow a single host.

Here’s a minimal Dockerfile for a Gradio app.

# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 7860
CMD ["python", "app.py"]

After building the image, run it with docker run -p 7860:7860 your-image. Gradio will be reachable at http://localhost:7860. For production, consider adding a reverse proxy (NGINX) and TLS termination.

Pro tip: Pass analytics_enabled=False to gr.Blocks() (or set the GRADIO_ANALYTICS_ENABLED=False environment variable) when deploying to environments with strict privacy requirements. This disables Gradio’s built‑in usage tracking.

Testing & CI Integration

Gradio interfaces can be unit‑tested with pytest by calling the underlying prediction functions directly. For end‑to‑end testing, use playwright or selenium to simulate user interactions.

def test_classify_image():
    from PIL import Image
    img = Image.open("tests/cat.jpg")
    result = classify_image(img)
    assert isinstance(result, dict)
    assert any("tabby" in name for name in result)  # assuming a cat image

# Playwright example (pseudo‑code)
# await page.goto(demo_url)
# await page.setInputFiles('input', 'tests/dog.jpg')
# await page.waitForSelector('.gr-label')
# const label = await page.textContent('.gr-label')
# expect(label).toContain('dog')

Integrating these tests into GitHub Actions ensures that every push maintains a functional demo, catching regressions before they reach end users.

Best Practices for Production‑Ready Demos

  • Model Loading: Load heavy models once at startup, not per request. Use lazy loading if memory is a concern.
  • Input Validation: Guard against malformed files or excessively long text to prevent DoS attacks.
  • Rate Limiting: Apply a simple token bucket or integrate with a gateway (e.g., FastAPI + Redis) to throttle abuse.
  • Logging & Monitoring: Leverage Gradio’s analytics callbacks or export metrics to Prometheus.
  • Security: Serve over HTTPS, hide API keys using environment variables, and avoid exposing raw model weights.
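As one concrete instance of the input‑validation point above (the limit and function name are illustrative), a small guard can run before any inference:

```python
MAX_PROMPT_CHARS = 2000

def validate_prompt(prompt):
    # Reject empty or oversized input before it ever reaches the model
    if prompt is None or not prompt.strip():
        raise ValueError("Prompt must not be empty.")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt is limited to {MAX_PROMPT_CHARS} characters.")
    return prompt.strip()
```

Inside a Gradio handler you would typically raise gr.Error instead of ValueError so the message surfaces nicely in the UI.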

Pro Tips & Gotchas

1️⃣ Use type="numpy" for fast array transfer. When dealing with large images, Gradio can hand your function a NumPy array directly, skipping a PIL round‑trip and its attendant conversion overhead.

2️⃣ Cache expensive preprocessing. Wrap preprocessing steps with functools.lru_cache when the same hashable inputs (strings, tuples) recur frequently; unhashable inputs such as PIL images need an explicit cache keyed by a content hash.

3️⃣ Keep the UI lightweight. Avoid rendering high‑resolution images at full size; constrain the display with the Image component’s height and width parameters, and downscale before running inference.

4️⃣ Keep the event loop responsive. For CPU‑bound models, run inference in a worker thread or process pool (for example via concurrent.futures) so the server can keep accepting requests while a prediction is in flight.